Resources from August DH Pedagogy Workshop

Rotunda Press Digital Archives

Thanks to Patricia Searl for the following:

Data Services

UVA Library Research Data Services -- Data Discovery and Acquisition:

Thanks to Christine Ruotolo for the following:

Collections of Datasets

DH Toychest (Alan Liu):

Includes demo corpora, which are “sample or toy collections of texts that are ready-to-go for demonstration purposes or hands-on tutorials--e.g., for teaching text analysis, topic modeling, etc.”

Journal of Open Humanities Data

“features peer reviewed publications describing humanities data or techniques with high potential for reuse”

Kaggle Datasets

Eclectic collection of hundreds of datasets in many fields

Digital Text Collections

Featured data sets:

Eighteenth Century Collections Online (available in TEI and in plain text)

Over 2,000 texts made available by the ECCO Text Creation Partnership

Internet Archive

Modern English Collection: Public domain texts digitized by the UVA Library

(link to download will be posted soon; titles can be browsed here)
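For anyone choosing between the TEI and plain-text versions of a collection such as ECCO, the following is a minimal sketch of how plain text might be pulled out of a TEI file using only Python's standard library. The sample document, the function names, and the assumption that the transcription sits inside a `text` element are illustrative; real files from any of the collections above may be structured differently and should be inspected first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample; not taken from any real collection.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>First   paragraph.</p>
      <p>Second <hi>emphasized</hi> paragraph.</p>
    </body>
  </text>
</TEI>"""


def local_name(tag):
    """Drop any '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def tei_body_text(xml_string):
    """Return the text inside the first <text> element, with whitespace collapsed."""
    root = ET.fromstring(xml_string)
    for elem in root.iter():
        if local_name(elem.tag) == "text":
            raw = "".join(elem.itertext())
            return " ".join(raw.split())
    return ""


print(tei_body_text(SAMPLE_TEI))
```

Skipping the header keeps catalog metadata out of the text that is passed on to analysis tools; the markup itself is discarded, so this approach only suits tasks that do not need it.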

Twitter Datasets

Documenting the Now:

(datasets of tweet IDs that can be rehydrated back into full tweets)
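As a rough illustration of what preparing a tweet ID dataset for rehydration can involve, the sketch below reads IDs and groups them into batches. It assumes one numeric ID per line and a batch size of 100; both are assumptions for illustration, as are the function names. The lookup step itself, which requires API access or a dedicated rehydration tool, is deliberately left out.

```python
def read_tweet_ids(lines):
    """Keep only lines that look like numeric tweet IDs (assumed one per line)."""
    ids = []
    for line in lines:
        candidate = line.strip()
        if candidate.isdigit():
            ids.append(candidate)
    return ids


def batches(items, size=100):
    """Yield consecutive slices of at most `size` items (size is an assumed limit)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical input; in practice the lines would come from a downloaded ID file.
sample_lines = ["1001\n", "\n", "not-an-id\n", " 1002 \n", "1003\n"]
ids = read_tweet_ids(sample_lines)
for group in batches(ids, size=2):
    print(group)
```

Keeping the IDs as strings avoids any loss of precision if they are later written to formats that treat large numbers as floating point.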

Museum Datasets

Data Repositories

Dataverse: UVA’s Dataverse; Harvard’s Dataverse (includes data from many institutions)

Share, publish, and archive your data. Find and cite data across all research fields.

Tools for Cleaning Your Data