
Digital Humanities at the University of Virginia


Paul Vierthaler of the University of Leiden specializes in the digital humanities and Ming and Qing dynasty Chinese literature.

Paul Vierthaler Reports New Method of Text Analytics in Humanities Informatics/DH Lecture

Innovative Stylometrics Method Extends to Corpora in Any Language

By Christian Howard

In his talk entitled "Where Did All These Rumors Come From? Computationally Identifying Intertextuality and Machine-Classifying Its Source in a Late Imperial Chinese Corpus," Paul Vierthaler presented a new approach to the well-established DH technique of stylometry, proposing a way to identify which texts supplied the phrases that other texts inherited or borrowed.

To trace how the politically powerful Chinese eunuch Wei Zhongxian (1568-1627) is described across a large set of historical and fictional texts, Vierthaler used a form of stylometry that he calls "automated intertextuality detection." Hypothesizing that "a quote should look more like the text it came from than the rest of the text into which it is inserted," Vierthaler applied intertextuality detection across a corpus of texts sharing descriptions of Wei Zhongxian and found that the machine correctly identified which text was the source and which was the borrower for 18 of the 19 shared quotes.
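The hypothesis quoted above can be illustrated with a minimal Python sketch. It is not Vierthaler's classifier, whose details this report does not give. As an assumption for illustration only, it compares character-bigram frequency profiles by cosine similarity; the function names (`ngram_profile`, `cosine`, `guess_source`) are hypothetical, and the sketch assumes the quote appears verbatim in both texts.

```python
from collections import Counter
from math import sqrt


def ngram_profile(text, n=2):
    """Count overlapping character n-grams in a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(p, q):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(p[g] * q[g] for g in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def guess_source(quote, text_a, text_b, n=2):
    """Guess which text a shared quote came from.

    Assumption (illustrative only): the quote is removed from each text,
    and the text whose remainder looks more like the quote is the source.
    Ties go to text A.
    """
    quote_profile = ngram_profile(quote, n)
    # Replace the quote with a space so the two sides are not joined.
    rest_a = ngram_profile(text_a.replace(quote, " "), n)
    rest_b = ngram_profile(text_b.replace(quote, " "), n)
    return "a" if cosine(quote_profile, rest_a) >= cosine(quote_profile, rest_b) else "b"


# Toy strings, not real corpus data: the quote shares its style with text A.
quote = "abababab"
text_a = "abab abba abab " + quote + " baba"
text_b = "xyzxyz " + quote + " zyxzyx"
print(guess_source(quote, text_a, text_b))
```

Character n-grams are used here because they need no word segmentation, which makes the sketch usable on Chinese text; any stylometric feature set could be substituted.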

Vierthaler bases automated intertextuality detection on BLAST, a tool more commonly used in bioinformatics. Applied to Chinese texts, the method can establish that two texts are either quoting one another or quoting a common source when a shared passage is at least ten characters long and at least 90% identical. While string-length requirements vary by language, the technique can be used to identify textual reuse in almost any text corpus.
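The two thresholds described above can be sketched in Python. This is a simplified stand-in, not BLAST itself and not Vierthaler's code: it borrows only the general seed-then-compare idea, and it uses the standard library's `difflib.SequenceMatcher` as the similarity measure. The function name `find_reuse` and the `seed` parameter are assumptions made for illustration.

```python
from difflib import SequenceMatcher

MIN_LENGTH = 10        # minimum passage length in characters, as reported in the talk
MIN_SIMILARITY = 0.9   # minimum similarity ratio, as reported in the talk


def find_reuse(text_a, text_b, min_length=MIN_LENGTH,
               min_similarity=MIN_SIMILARITY, seed=4):
    """Return (pos_a, pos_b, ratio) for window pairs that pass both thresholds.

    Assumption (illustrative only): a candidate pair must begin with an
    identical `seed`-character string, which avoids comparing every window
    in text A against every window in text B.
    """
    # Index every seed-length substring of text B by its start position.
    index = {}
    for j in range(len(text_b) - seed + 1):
        index.setdefault(text_b[j:j + seed], []).append(j)

    hits = []
    for i in range(len(text_a) - min_length + 1):
        window_a = text_a[i:i + min_length]
        for j in index.get(window_a[:seed], []):
            window_b = text_b[j:j + min_length]
            if len(window_b) < min_length:
                continue  # too close to the end of text B
            ratio = SequenceMatcher(None, window_a, window_b, autojunk=False).ratio()
            if ratio >= min_similarity:
                hits.append((i, j, ratio))
    return hits


# Toy strings, not real corpus data: both texts contain "ABCDEFGHIJ".
hits = find_reuse("xxxxABCDEFGHIJyyyy", "zzABCDEFGHIJww")
print(hits)
```

A fuller version would merge overlapping hits into a single longer passage; the sketch reports each passing window separately to keep the threshold logic visible.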

Paul Vierthaler's talk took place on Thursday, October 11, at the University of Virginia. On October 12, Vierthaler led a workshop in which he demonstrated how participants could apply his stylometric method to their own textual corpora. Vierthaler's talk and workshop were sponsored by the Network/Corpus group of the Humanities Informatics Lab; the Institute of the Humanities and Global Cultures; Ron Hutchins, Vice President for IT; John Unsworth, Dean of the Library; Brie Gertler, Interim Associate Dean of Arts and Humanities in the Graduate School of Arts & Sciences; and Archie Holmes, Vice Provost for Academic Affairs.

Paul Vierthaler joined the Leiden Institute for Area Studies and Leiden University Centre for Linguistics in Fall 2016 to help found the Leiden University Centre for Digital Humanities, and he is a University Lecturer (Assistant Professor) of the Digital Humanities at Leiden University in the Netherlands. He earned his PhD in East Asian Languages and Literatures from Yale University in 2014. In his current monograph project, he analyzes how historical events are represented in "quasi-histories" written in late imperial China. More about his research can be found here.