Description and Research Questions

Texts in this corpus range from Niccolò de’ Rossi’s short collection of poetry (431 words) to Ramusio’s voluminous accounts of voyages (2 million words). The digital texts are drawn almost entirely from the Biblioteca Italiana projects at the Sapienza in Rome. When I was building the corpus, the site had been under maintenance for an indeterminate number of months, so I relied on archived copies in the Wayback Machine. At that time, the Advanced Search did not allow filtering by period, so I searched for authors alphabetically. Not every author from the period had an associated plain-text or XML file. Six additional texts came from Early English Books Online, even though the works are in Italian.

Of the 437 files, 388 were created from modern editions of Italian late Renaissance and early modern books. Only 49 are diplomatic editions of books printed between 1500 and 1675, though these still total roughly 2.9 million words. The primary challenge of these early editions is that some authors are over-represented (Ludovico Ariosto, Giovanni Botero, Giraldi Cinzio, Alessandro Piccolomini, and Petruccio Ubaldini). Descriptive metadata can be found here. The collection at La Sapienza has grown in the intervening years and remains valuable as, in the project’s own words, a collection of representative texts of different periods in the history of Italian literature.

Working with this corpus raised several challenges and questions that motivate my current projects and explain why I ultimately set the corpus aside:

Digital Humanities Questions

  • Should this corpus be treated as two subcorpora based on presence of modern editorial interventions?
  • How do the canonical, over-represented authors skew the results of computational text analysis such as stylometry and topic modeling?
  • Can existing tools account for the fluidity of influence and inspiration across forms (drama, lyric poetry, letters, treatises, long-form prose fiction, epic poetry) in the period?

Italian Studies Questions

  • Did Galileo Galilei’s prose sound archaic, innovative, poetic, or dramatic to contemporary readers? (Answered in part by my article “Contextualizing Galileo’s Verbal Battles via Stylometry.”)
  • To what extent do any results reflect the Sapienza’s digital collection priorities rather than trends in early modern literature overall? This question is why I have stopped using this data set, in spite of the time invested in creating it.