“N + 1: A Plea for Cross-Domain Data in the Digital Humanities.” Keynote Panel on “Data, Corpora, and Stewardship,” Digital Humanities at Berkeley Summer Institute, University of California, Berkeley, 17 August 2015.
Abstract: In experimenting with text analysis, machine learning, visualization, and other methods, digital humanists often study materials collected from specific segments of the human documentary record. For example, a study corpus might consist of just one of the following at a time: novels, poems, letters, newspapers, historical maps, crime records, or political speeches. Such corpora also tend to be tuned to the specific domain of a scholar’s expertise (e.g., novels of a particular century and nation). In this short, speculative talk, Liu asks: what could be gained methodologically and theoretically by deliberately hybridizing domains, for example by pairing any two or three kinds, periods, or nationalities of materials in a controlled way? What would be involved, in other words, in giving digital humanities corpora some of the mixed quality of their uncanny doubles (alike yet dissimilar): “archives” in the strict sense and “corpora” in the corpus linguistics sense?
The talk concludes with a presentation of aspects of the 4Humanities.org “WhatEvery1Says” research project (topic modeling public discourse about the humanities) that bear on the theme of cross-domain knowledge.