August 2015

“N + 1: A Plea for Cross-Domain Data in the Digital Humanities.” Keynote Panel on “Data, Corpora, and Stewardship,” Digital Humanities at Berkeley Summer Institute, University of California, Berkeley, 17 August 2015.

  • Abstract: In experimenting with text analysis, machine learning, visualization, and other methods, digital humanists often study materials collected from specific segments of the human documentary record. A study corpus typically consists of just one of the following at a time: novels, poems, letters, newspapers, historical maps, crime records, political speeches, etc. Such corpora also tend to be tuned to the specific domain of a scholar’s expertise (e.g., novels of a particular century and nation). In this short, speculative talk, Liu asks: what could be gained methodologically and theoretically by deliberately hybridizing domains, for example by pairing any two or three kinds, periods, or nationalities of materials in a controlled way? What would be involved, in other words, in giving digital humanities corpora some of the mixed quality of their uncanny doubles (alike yet dissimilar): “archives” in the strict sense and “corpora” in the corpus linguistics sense?
            The talk concludes with a presentation of aspects of the “WhatEvery1Says” research project (topic modeling public discourse about the humanities) that bear on the theme of cross-domain knowledge.

Citation: Research Report: “How Public Media in the U.S. and U.K. Compare in Their Terminology For the Humanities.” WhatEvery1Says Project (3 August 2015).


While assembling a study corpus of public discourse in English about the humanities (since about 1990 when newspapers began fully digitizing articles), the 4Humanities “WhatEvery1Says” Project (WE1S) encountered the following questions of linguistic usage:

  • How are the humanities referred to in newspapers, magazines, and other media in the U.S. compared to the U.K. (and other Commonwealth nations)? Especially, what from a comparative perspective is the overlap/difference between the terms “humanities,” “liberal arts,” “arts,” and “the arts”?
  • Do the proportions of such terms change over time in each nation?
  • Most practically, which terms (“humanities,” “liberal arts,” “arts,” and “the arts”) should the WE1S project use for searches in newspaper APIs and other resources as it locates texts for its corpus? (Since public discourse in newspapers, magazines, and other media is too ample to be collected in toto, WE1S aims to collect just what might be called the “neighborhood” of discussion of the humanities. The project will then apply text analysis methodology to this neighborhood to refine its understanding of the way the humanities are discussed.)
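To make the comparison in these questions concrete, here is a minimal Python sketch of how the relative proportions of the four terms might be tallied for a set of articles. It is an illustration only, not the WE1S project's actual method: the function names `count_terms` and `term_proportions` are hypothetical, and the matching rule (longer phrases such as “liberal arts” and “the arts” take precedence over bare “arts”) is an assumption made here so that each occurrence is counted once.

```python
import re
from collections import Counter

# Search terms from the questions above. Longer phrases come first so that
# "liberal arts" and "the arts" are not also counted as bare "arts"
# (an assumption of this sketch, not a documented WE1S rule).
TERMS = ["liberal arts", "the arts", "humanities", "arts"]

# Allow any run of whitespace inside multi-word terms; match whole words only.
PATTERN = re.compile(
    r"\b(" + "|".join(t.replace(" ", r"\s+") for t in TERMS) + r")\b",
    re.IGNORECASE,
)


def count_terms(texts):
    """Count case-insensitive occurrences of each term across the texts."""
    counts = Counter({t: 0 for t in TERMS})
    for text in texts:
        for match in PATTERN.finditer(text):
            # Normalize case and internal whitespace before tallying.
            term = " ".join(match.group(1).lower().split())
            counts[term] += 1
    return counts


def term_proportions(texts):
    """Return each term's share of all term occurrences (0.0 if none found)."""
    counts = count_terms(texts)
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0) for t in TERMS}
```

Running `term_proportions` separately on articles grouped by nation and by year would give the kind of proportions the first two questions ask about; how the articles are gathered and grouped is left open here.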

The following is a preliminary study focused on comparing linguistic usage in the U.S. and U.K. It is conducted by Alan Liu with assistance from other members of the WE1S research team and the co-leaders of […]. The study will be extended and revised as WE1S research continues.
