University of Galway

Data design and management of historical manuscripts

The Challenge

How to turn information from historical manuscripts, scattered across fragmented sources accumulated over centuries, into data for academic research and public access

The Stemma Project faced a monumental challenge: the digitisation and analytical exploration of early modern English manuscripts. The difficulty lay in the diversity and range of source information from across the world.

These manuscripts, fragmented and preserved in various unstructured and legacy formats, represented a rich cultural heritage yet were inaccessible for comprehensive study. For example, we found 3291 different date formats in these data sets alone.
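
To give a sense of what handling this many date formats involves, here is a minimal illustrative sketch in Python. It is not the project's actual pipeline: the format list and the function name normalise_date are assumptions made for this example, and a real system would need far more patterns as well as handling for approximate dates and Old Style calendar dating.

```python
from datetime import datetime
from typing import Optional

# Hypothetical sample of formats, for illustration only.
# The real data sets contained thousands of variants.
CANDIDATE_FORMATS = [
    "%Y-%m-%d",   # 1633-03-12
    "%d %B %Y",   # 12 March 1633
    "%d %b %Y",   # 12 Mar 1633
    "%B %d, %Y",  # March 12, 1633
    "%d/%m/%Y",   # 12/03/1633
]


def normalise_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date string, or None if no known format matches."""
    # Collapse runs of whitespace and trim stray trailing punctuation.
    text = " ".join(raw.split()).strip(".,;")
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
        return parsed.date().isoformat()
    # Unrecognised values are returned as None so they can be reviewed by hand.
    return None
```

Trying each format in turn keeps the rules easy to audit, and returning None rather than guessing leaves ambiguous dates for a person to check.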

The task at hand was not just about converting these disparate pieces of history into a digital format but also about creating a structured, searchable, and analytically useful repository of information.  

This required innovative approaches to data ingestion, cleansing, and reconciliation, along with sophisticated database design to ensure both academic researchers and the public could explore these literary treasures effectively.  

What we did

Cataloguing complex information to support Academic Research

We addressed the Stemma Project's challenge by architecting an ingestion system with sophisticated, scalable algorithmic data-cleansing processes.

This initiative was aimed at deciphering the complexities of early modern English manuscripts, which were scattered across various formats and sources, posing a significant challenge in terms of digitisation and analysis.  

Through data ingestion, cleansing, and reconciliation processes, we transformed these fragmented, historically significant manuscripts into a cohesive, digitally accessible archive.  

The solution not only preserved the invaluable literary heritage but also made it readily available for academic research and public exploration. By implementing state-of-the-art database design and employing advanced data management techniques, we bridged the gap between historical manuscripts and modern digital accessibility.


A comprehensive and accessible digital archive

The Stemma Project achieved a transformative outcome by digitising and structuring fragmented, historical poetry manuscripts into a comprehensive digital archive.

This endeavour not only preserved a valuable segment of literary heritage but also made it accessible for scholarly research and public exploration. Through data processing and innovative database solutions, we overcame significant challenges related to data fragmentation and legacy formats, enabling researchers to carry out digital humanities research focused on early modern English poetry.

We’d like to thank the team from the University of Galway for their hard work and collaboration with us. Their expertise, dedication, and consistent commitment have been essential to reaching the project's goals.

Here’s what Erin McCarthy, Established Professor of English Literature and Computational Humanities at the University of Galway and PI of STEMMA, had to say about working with our team:

The Ember team has exceeded my expectations, not only because of their technical and project management expertise but because of their curiosity about the historical source material and the project’s bigger implications for those interested in it.

Lastly, we’d like to extend a huge thank you to the Irish Research Council and the European Research Council, as the success of The Stemma Project would not have been possible without their support. We are incredibly grateful for their contributions and partnership throughout the project.
