Description and Research Questions
As of August 2022, more than 700 books are known to have been in Galileo’s library. Many of these have been digitized as part of the Google Books project, and through visits to other libraries and scanning requests, I have obtained copies of more than 400 of them. Even though Google Books runs Optical Character Recognition (OCR) on its scans to support full-text search, the output is seldom immediately usable for text analysis.
In order to build an intentional corpus, one that bypasses 19th-century collecting practices and reflects known 17th-century collecting practices, I have been working with students and contractors to create a reliable full-text corpus of the books in Galileo’s library. Priority has been given to Galileo’s works, his opponents’ works, poetry (known influences and minor works), drama, and possibly influential texts across form and genre.
The guiding principle in creating the texts has been to preserve the sequence of words. Words broken across lines are represented without the hyphen. Printers’ catchwords at the bottom of a page, typed editorial marginalia, headers, and illustrations have been omitted. This omission is temporary: although I relish these details, I made this decision to accommodate the varying skills of the team members and to prioritize a proof of concept. Paratexts and page numbers have been retained. Punctuation, accents, and capitalization have been maintained. Paragraphs have been retained for prose, and line breaks for poetry and drama. The u/v interchangeability in early modern printing is handled by code that prepares the text for analysis.
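As an illustration of how u/v interchangeability might be handled at the analysis stage, here is a minimal sketch in Python. It is not the project's actual code: the function names `fold_uv` and `tokens_for_analysis` are hypothetical, and folding every "v" to "u" (rather than some other rule) is an assumption chosen for simplicity. The transcriptions themselves stay untouched; only the tokens passed to analysis are folded.

```python
import re
import unicodedata
from typing import List

# Assumption: both letters are folded to "u"; the project's own rule may differ.
_UV_TABLE = str.maketrans({"v": "u", "V": "U"})


def fold_uv(text: str) -> str:
    """Collapse the u/v distinction so variant spellings compare equal."""
    return text.translate(_UV_TABLE)


def tokens_for_analysis(text: str) -> List[str]:
    """Lowercase, fold u/v, and split into word tokens, keeping accents."""
    # NFC keeps an accented letter as one code point, so \w matches it whole.
    normalized = unicodedata.normalize("NFC", text)
    folded = fold_uv(normalized.lower())
    return re.findall(r"\w+", folded)


# Two spellings of the same word now yield the same token.
print(tokens_for_analysis("Vniuerso e universo"))
# ['uniuerso', 'e', 'uniuerso']
```

Folding at analysis time, rather than in the transcription, keeps the retained capitalization and accents of the source texts available for other uses.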
A separate subcorpus has been created for just the prefatory letters of these books.
A link to the metadata file is available in relevant publications.
- Did Galileo Galilei’s prose sound archaic, innovative, poetic, or dramatic to contemporary readers? (Answered in part by my article “Contextualizing Galileo’s Verbal Battles via Stylometry.”)
- To what extent were Galileo and his opponents implicitly citing other sources? (Developing alongside the Seicento Lexicon project.)
Sources for Book Images
- Biblioteca dell’Archiginnasio di Bologna
- Biblioteca Nazionale Centrale di Firenze
- Biblioteca Riccardiana
- Bibliothèque nationale de France
- Google Books
- Library of Congress
- Linda Hall Library
- National Library of Medicine
- Warburg Library