Addressing the Problems of Digitizing Latin Incunables

A project funded by the National Endowment for the Humanities Division of Preservation and Access to study the problems of digitizing books printed in Latin before 1500 (incunables), problems confronted both by those who need access to the texts and by historians of the book.

In this project, we will create model digital facsimiles, transcriptions, and character sets for nine representative early printed books. We will also establish protocols and electronic tools for transcribing these works, deciphering abbreviations, and editing the texts. The project will produce an online dictionary of 15th-century Latin and look-up tools to help readers find variant spellings. These digital facsimiles, protocols, and tools will be made freely available on the internet to anyone interested in digitizing or reading medieval and early modern printed books at http://daedalus.umkc.edu/incunables/index.html.

Project Description: Early books printed in Latin are a major component of our early modern cultural heritage. Before 1600, considerably more than half of the books printed in England alone were printed in Latin, as were the majority of books traded at international book fairs and marketed internationally. The ability to create digital editions of these texts is, therefore, essential for preserving the greater portion of the intellectual heritage of the early modern period. Digitization of these books, however, poses unique and difficult problems: characters and ligatures are printed using graphs not represented in ASCII or Unicode, figures and pictures that are essential for understanding a passage are embedded within the texts, words that carry from one line to the next may not be hyphenated, and common words and letter combinations are abbreviated with a system of brevigraphs based on medieval manuscript practice. In recent years, projects such as the Making of America have developed techniques for rapid and cost-effective digitization of large corpora of printed works from the nineteenth and twentieth centuries, while Early English Books Online and the Text Creation Partnership have addressed problems of early books printed in English. At the same time, efforts such as the Newton Project and the Digital Scriptorium have developed extensive knowledge about best practices for transcribing and cataloging manuscript material. The unique problems of Latin incunables, however, have yet to be addressed.

Our project will have the following specific deliverables:

  1. Digital facsimile editions of a collection of incunabula containing texts by Al-Qabisi, Bernard of Gordon, Diogenes Laertius, Sebastian Brant, Isidore of Seville, Petrarch, Pliny the Elder, Suetonius, and Jacobus de Voragine. These facsimile editions represent a wide variety of scientific and literary Latin from many time periods and geographic locations, all published in the first fifty years of printing. They will serve both as testbeds for developing our tools and as demonstrations of the results that we will be able to achieve.
  2. Tools that can automatically or semi-automatically address the typographical difficulties posed by early printed works, including:
    1. Tools for the automatic identification of abbreviations and broken words;
    2. Integration of these tools with a text editor allowing for interactive editing and disambiguation of uncertain abbreviations;
    3. A digitized edition of a dictionary essential for reading fifteenth-century Latin based on Du Cange's standard medieval Latin dictionary;
    4. Extremely flexible look-up tools for dealing with the wide variety of orthographic variation in early printed Latin texts.
  3. Guidelines for data entry and encoding brevigraphs in early printed Latin texts that can be shared with others digitizing similar material.
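
To make the second deliverable more concrete, the following is a minimal sketch of how a variant-spelling look-up and the rejoining of unhyphenated broken words might work. It is not the project's implementation. The normalization rules, the function names (spelling_key, build_index, look_up, join_broken_words), and the toy word list are all assumptions made for illustration; a real tool would draw its word forms from a full dictionary and would need a far richer rule set.

```python
import re


def spelling_key(word: str) -> str:
    """Reduce a Latin word to a key under which variant spellings collide.

    The rules below are illustrative assumptions, not the project's rules.
    """
    key = word.lower()
    key = key.replace("j", "i").replace("v", "u")    # i/j and u/v not distinguished
    key = key.replace("ae", "e").replace("oe", "e")  # diphthongs often printed as e
    key = key.replace("y", "i")                      # y printed for i
    key = re.sub(r"ci(?=[aeiou])", "ti", key)        # gracia / gratia interchange
    key = re.sub(r"(.)\1+", r"\1", key)              # littera / litera
    return key


def build_index(word_forms):
    """Map each spelling key to the list of known forms that share it."""
    index = {}
    for form in word_forms:
        index.setdefault(spelling_key(form), []).append(form)
    return index


def look_up(index, word):
    """Return the known forms matching a possibly variant spelling."""
    return index.get(spelling_key(word), [])


def join_broken_words(lines, index):
    """Rejoin words split across a line break without a hyphen.

    Heuristic (an assumption): merge the last token of a line with the
    first token of the next when the merged form is known and the two
    halves are not both known words in their own right.
    """
    tokens = []
    carry = None  # last token of the previous line, held back for inspection
    for line in lines:
        words = line.split()
        if not words:
            continue
        if carry is not None:
            joined = carry + words[0]
            both_known = look_up(index, carry) and look_up(index, words[0])
            if look_up(index, joined) and not both_known:
                words[0] = joined
            else:
                tokens.append(carry)
        carry = words.pop()
        tokens.extend(words)
    if carry is not None:
        tokens.append(carry)
    return tokens


# Toy word list standing in for a real dictionary of word forms.
index = build_index(["in", "principio", "creavit", "deus", "caelum", "gratia", "imago"])
print(look_up(index, "gracia"))  # ['gratia']
print(look_up(index, "ymago"))   # ['imago']
print(join_broken_words(["in principio crea", "uit deus celum"], index))
# ['in', 'principio', 'creauit', 'deus', 'celum']
```

The sketch keeps the printed spelling in the transcription and normalizes only the look-up key, so the orthography of the original is preserved. Cases where both halves of a candidate join are themselves known words are left unmerged here; those are the kind of uncertain readings that interactive editing would be expected to resolve.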

Results of our work will be disseminated in several ways.