Addressing the Problems of Digitizing Latin Incunables

This project, funded by the National Endowment for the Humanities Division of Preservation and Access, studies the problems of digitizing books printed in Latin before 1500 (incunables), problems confronted both by those who need access to the texts and by historians of the book.

In this project, the project team will create model digital facsimiles, transcriptions, and character sets for nine representative early printed books. The team will also establish protocols and electronic tools for transcribing these works, deciphering abbreviations, and editing the texts. The project will produce an online dictionary of 15th-century Latin and look-up tools to help readers find variant spellings. These digital facsimiles, protocols, and tools will be made freely available at http://daedalus.umkc.edu/incunables/index.html to anyone interested in digitizing or reading medieval and early modern printed books.

Project Description: Early books printed in Latin are a major component of our early modern cultural heritage. Before 1600, considerably more than half of the books printed in England alone were printed in Latin, as were the majority of books traded at international book fairs and marketed internationally. The ability to create digital editions of these texts is, therefore, essential for preserving the greater portion of the intellectual heritage of the early modern period. Digitization of these books, however, poses unique and difficult problems: characters and ligatures are printed using graphs not represented in ASCII or Unicode; figures and pictures that are essential for understanding a passage are embedded within the texts; words that carry from one line to the next may not be hyphenated; and common words and letter combinations are abbreviated with a system of brevigraphs based on medieval handwritten manuscripts. In recent years, projects such as the Making of America have developed techniques for rapid and cost-effective digitization of large corpora of printed works from the nineteenth and twentieth centuries, while Early English Books Online and the Text Creation Partnership have addressed the problems of early books printed in English. At the same time, efforts such as the Newton Project and the Digital Scriptorium have developed extensive knowledge about best practices for transcribing and cataloging manuscript material. The unique problems of Latin incunables, however, remain to be addressed. Our project will have the following specific deliverables:

Digital facsimile editions of a collection of incunabula containing texts by Al-Qabisi, Bernard of Gordon, Diogenes Laertius, Sebastian Brant, Isidore of Seville, Petrarch, Pliny the Elder, Suetonius, and Jacobus de Voragine. These facsimile editions represent a wide variety of scientific and literary Latin from many time periods and geographic locations, all published in the first fifty years of printing. They will serve both as testbeds for developing our tools and as demonstrations of the results that we will be able to achieve.
Tools that can automatically or semi-automatically address the typographical difficulties posed by early printed works, including:
1. Tools for the automatic identification of abbreviations and broken words;
2. Integration of these tools with a text editor allowing for interactive editing and disambiguation of uncertain abbreviations;
3. A digitized edition of a dictionary essential for reading fifteenth-century Latin based on Du Cange's standard medieval Latin dictionary;
4. Extremely flexible look-up tools for dealing with the wide variety of orthographic variation in early printed Latin texts.
Guidelines for data entry and encoding brevigraphs in early printed Latin texts that can be shared with others digitizing similar material.
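
To make the look-up and broken-word tools more concrete, the following is a minimal sketch in Python, not the project's actual software. It assumes that spelling variants can be collapsed into a shared look-up key and that a word split across a line break can be recognized when the joined form, but not the first fragment, matches a dictionary entry. The rule table, the function names (`lookup_key`, `build_index`, `find_headwords`, `rejoin_broken_words`), and the toy word list are all hypothetical illustrations.

```python
import unicodedata

# Illustrative variant rules only (an assumption); a real rule set would
# come from the project's own transcription guidelines.
VARIANT_RULES = (
    ("ſ", "s"),   # long s
    ("æ", "e"),   # ligatures and digraphs often reduced to e
    ("œ", "e"),
    ("ae", "e"),
    ("oe", "e"),
    ("j", "i"),   # i/j and u/v were not distinguished consistently
    ("v", "u"),
)

def lookup_key(word):
    """Collapse a spelling into a normalized key shared by its variants."""
    # Lowercase, decompose, and drop combining marks such as macrons.
    decomposed = unicodedata.normalize("NFD", word.lower())
    key = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    for old, new in VARIANT_RULES:
        key = key.replace(old, new)
    return key

def build_index(headwords):
    """Map each normalized key to the dictionary forms that share it."""
    index = {}
    for headword in headwords:
        index.setdefault(lookup_key(headword), []).append(headword)
    return index

def find_headwords(form, index):
    """Return the dictionary forms whose key matches the printed form."""
    return index.get(lookup_key(form), [])

def rejoin_broken_words(lines, index):
    """Join a line-final fragment to the next line's first token when the
    joined form is known to the index and the fragment alone is not."""
    rows = [line.split() for line in lines]
    for cur, nxt in zip(rows, rows[1:]):
        if not cur or not nxt:
            continue
        joined = cur[-1] + nxt[0]
        if lookup_key(joined) in index and lookup_key(cur[-1]) not in index:
            cur[-1] = joined
            del nxt[0]
    return [" ".join(row) for row in rows]

# Toy word list standing in for a real dictionary (hypothetical data).
LEXICON = ["caelum", "creavit", "deus", "in", "iustitia", "principio"]
index = build_index(LEXICON)

print(find_headwords("celum", index))     # ['caelum']
print(find_headwords("Justitia", index))  # ['iustitia']
print(rejoin_broken_words(["in principio cre", "auit deus celum"], index))
# ['in principio creauit', 'deus celum']
```

A key-based approach like this deliberately over-matches (for example, it treats every "ae" and "e" as equivalent), which is why uncertain cases would still need the interactive disambiguation described in item 2.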

Results of our work will be disseminated in several ways.

First, we will publish high resolution images of our early printed books alongside digital transcriptions on the web using the software infrastructure developed by the Perseus project (http://www.perseus.tufts.edu/).
Second, we will return our TEI-conformant XML transcriptions of the texts to the libraries that provide us with the images of the books so that they can be disseminated via their own web sites as well as through our e-publication infrastructure.
Third, we will release all tools for public use both via the internet and as stand-alone applications so that scholars in rare book rooms without easy internet access will be able to use them.
Fourth, the source code for our tools will be made available to any interested researcher under a Creative Commons license, allowing other scholars to adapt and extend them for their own work.
Finally, we will conduct a detailed analysis of both our workflow and the ways that users interact with our digital texts so that we can provide a clear understanding of the technical requirements for building large digital collections of rare and complicated books. Our ultimate goal will be to produce documentation that details data entry methods and encoding standards for early printed Latin works so that librarians and scholars can digitize their own early printed holdings.
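
As an illustration of what an encoding standard for abbreviations might record, the sketch below pairs an abbreviated form with its expansion using the TEI `choice`, `abbr`, and `expan` elements. It is an assumption about one possible encoding, not the project's published guidelines; the helper name `encode_abbreviation` and the sample word are hypothetical, and TEI namespace declarations are omitted for brevity.

```python
import xml.etree.ElementTree as ET

def encode_abbreviation(abbreviated, expanded):
    """Build a TEI-style <choice> element holding both readings."""
    choice = ET.Element("choice")
    ET.SubElement(choice, "abbr").text = abbreviated   # form as printed
    ET.SubElement(choice, "expan").text = expanded     # editor's expansion
    return choice

# "dn" + combining macron + "s", a common abbreviation of "dominus".
element = encode_abbreviation("dn\u0304s", "dominus")
markup = ET.tostring(element, encoding="unicode")
print(markup)
```

Keeping both readings in the transcription lets a display layer show either the printed form or the expansion, and lets later editors revise an uncertain expansion without losing the original evidence.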