March 3, 2026 | Open Access

Putting the friction back in: Minimal computing approaches to corpus construction

Key Points

Corpus construction involves practical steps and curatorial decisions that shape the computational text analysis that follows.
The proposed framework leads students to interrogate the resources and labor required to build textual corpora.
By exploring the Google Ngram Viewer and the Google Books data that underlies it, students come to understand the limitations of Google's digitization project and the importance of reliable metadata and robust OCR.
This approach not only fosters collaboration but also lays the groundwork for developing critical AI literacy in the digital humanities.

Abstract

This article sets out ways that corpus literacy can be taught in the digital humanities classroom to illuminate for students the practical steps and curatorial decisions that go into constructing a corpus, and the implications of these decisions for the computational text analysis that follows. It proposes a framework that resonates with the principles of minimal computing while also leading students to interrogate the resources and the labor required for constructing textual corpora. It suggests critical readings that can be used in tandem with an exploration of the Google Ngram Viewer and the Google Books project whose data underlies it, as a way into understanding the limitations of Google's digitization project and the importance of reliable metadata and robust OCR (optical character recognition), as well as the historical contingency of projects claiming to widen access to information. It lays out ways to lead students through the practicality of building their own corpus, from undertaking OCR on their own devices to the cleaning and structuring steps that, undertaken collaboratively with others, bring awareness to concerns including file naming conventions, logical directory structures, accurate metadata, and version control, while also fostering the crucial digital humanities (DH) skill of being able to work collaboratively. This kind of corpus literacy is, I argue, not only compatible with a minimal computing approach but one of the starting points from which a broader program of critical AI literacy might begin.
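The cleaning and structuring concerns named in the abstract (file naming conventions and accurate metadata) can be illustrated with a minimal sketch. It is not taken from the article: the naming convention, the `check_corpus` function, and the `filename` metadata key are all hypothetical, chosen only to show the kind of check a class might agree on when building a shared corpus.

```python
import re

# Hypothetical naming convention (an assumption, not from the article):
# lowercase surname, four-digit year, short hyphenated title,
# e.g. "woolf_1925_mrs-dalloway.txt".
FILENAME_PATTERN = re.compile(r"^[a-z]+_\d{4}_[a-z0-9-]+\.txt$")


def check_corpus(filenames, metadata_rows):
    """Compare corpus file names against a naming convention and a metadata table.

    filenames: iterable of file names (without directories).
    metadata_rows: iterable of dicts, each with a "filename" key
        (for example, rows read from a shared metadata spreadsheet).
    Returns a dict mapping each problem type to a sorted list of file names.
    """
    files = set(filenames)
    described = {row["filename"] for row in metadata_rows}
    return {
        # Files that do not follow the agreed convention.
        "badly_named": sorted(f for f in files if not FILENAME_PATTERN.match(f)),
        # Files present in the corpus but absent from the metadata table.
        "missing_metadata": sorted(files - described),
        # Metadata rows that point at a file nobody has added yet.
        "missing_file": sorted(described - files),
    }


if __name__ == "__main__":
    report = check_corpus(
        ["woolf_1925_mrs-dalloway.txt", "Scan 3 FINAL.txt"],
        [
            {"filename": "woolf_1925_mrs-dalloway.txt"},
            {"filename": "joyce_1922_ulysses.txt"},
        ],
    )
    for problem, names in report.items():
        print(problem, names)
```

A check of this kind keeps the decisions visible: the pattern has to be agreed on by the group, and every mismatch it reports points back to a curatorial choice someone made by hand.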

Cite this study

Anouk Lang (ORCID: 0000-0001-9597-1026) studied this question.

www.synapsesocial.com/papers/69a7616dc6e9836116a2f5ea

Authors

Anouk Lang (ORCID: 0000-0001-9597-1026)
