March 2020


Citation: “Data Moves: Libraries and Data Science Workflows.” Libraries and Archives in the Digital Age. Ed. Susan Mizruchi. Cham: Palgrave Macmillan, 2020: 211-219.

  • Abstract: Library-based collections and repositories are today advancing well beyond accumulating resources in digital form for the purposes of searching, reading, and other primary access. New advances toward treating collections as “always already data” facilitate next-generation computational uses of digitized materials—for example, treating collections as datasets for advanced datamining analysis.
            In considering how library collections can serve as data for a variety of data ingestion, transformation, analysis, reproduction, presentation, and circulation purposes, it may be useful to compare examples of data workflows across disciplines to identify common data-analysis “moves,” as well as points in the data trajectory that are especially in need of library support because they are, for a variety of reasons, brittle. Drawing on the precedent of so-called in silico science—which has had a ten-year head start on developing methods and standards for tracking the provenance of data, annotating and visualizing data analysis workflows for reproducibility, and comparing data workflows in different fields—Liu argues that other disciplines, such as the humanities and social sciences, can exploit today’s library data collections in similar ways. The goal is twofold: first, open, shareable, and reproducible data scholarship; and second, higher or meta-level analysis of such scholarship. For example, might methods for comparing data workflows in the sciences (seeing, e.g., how astrophysics compares with medical science in using data) be extended across the disciplines to the digital humanities, digital arts, and digital social sciences? Beyond borrowing science data paradigms for other disciplines, Liu also thinks in the reverse direction. He draws on the twentieth-century tradition of literary and ethnographic analysis (for example, the idea of the narrative “motif” or “move,” in the Russian: mov) to suggest that humanities and social science approaches to data workflows are just as crucial as scientific ones. After all, however one analyzes data (and in whichever field), one ultimately has to tell the story of that workflow and its results. That puts the problem squarely in the domain of narrative motifs and moves, which Liu argues can be matched to data workflow moves.

 

“Humans in the Loop: Humanities Hermeneutics and Machine Learning.” Keynote for DHd2020 (7th Annual Conference of the German Society for Digital Humanities), University of Paderborn, 6 March 2020.

  • Abstract: As indicated by the emergent research fields of computational “interpretability” and “explainability,” machine learning creates fundamental hermeneutical problems. One of the least understood aspects of machine learning is how humans learn from machine learning. How does an individual, team, organization, or society “read” computational “distant reading” when it is performed by complex algorithms on immense datasets? Can methods of interpretation familiar to the humanities (e.g., traditional or poststructuralist ways of relating the general and the specific, the abstract and the concrete, the structure and the event, or the same and the different) be applied to machine learning? Further, can such traditions be applied with the explicitness, standardization, and reproducibility needed to engage meaningfully with the different Spielräume – scope for “play” (as in the “play of a rope,” “wiggle room,” or machine-part “tolerance”) – of computation? If so, how might that change the hermeneutics of the humanities themselves?
    In his keynote lecture, Alan Liu uses the example of the formalized “interpretation protocol” for topic models that he is developing for the Mellon Foundation-funded WhatEvery1Says project (which is text-analyzing millions of newspaper articles mentioning the humanities) to reflect on how humanistic traditions of interpretation can contribute to machine learning. But he also suggests how machine learning changes humanistic interpretation through fresh ideas about wholes and parts, mimetic representation and probabilistic modeling, and similarity and difference (or identity and culture).
  • Video: Video of lecture