Flash Fellowship: Computational Citation Categorization

It’s no secret that research is rhetorical. In rhetorical terms, research practices build on a discipline’s assumptions and warrants to gather evidence and to make claims based on that evidence. For most research, evidence appears in the form of citations. But are all citations created equal? Citations of method foreground the testability of an argument: the idea that rhetoric can and should be tested. Citations of results, however, foreground rhetoric as untestable, in that such citations take prior conclusions or outcomes as given.

Categorizing citations by type has the potential to contribute to two key dimensions of research: replicability and scalability. Replicability is an essential part of the construction of facts: responsible researchers should replicate studies to test hypotheses independently of the results of previous studies. Although facts, like fictions, are constructed, facts differ from fictions in that they can be tested and do not depend on isolated findings to bolster their claims. Accordingly, this flash fellowship project takes a computational approach to categorizing citations based on the information included in each citation (the stuff between the quotation marks).

In my pilot study, I found four basic citation types and noted the linguistic patterns that typify them. Citations of method, for instance, tended to include the word “use” and its various forms and conjugations more frequently than the other categories, while citations of results more frequently included words such as “find”, “conclude”, “suggest”, “imply”, and “interpret” and their various forms. For my flash fellowship project, these patterns will serve as the basis for a computer program that can sort citations into the four types in a similar way and scale that sorting to larger data sets.
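
To make the idea concrete, here is a minimal sketch in Python of how cue words like these could drive a first-pass sorter. It is an illustration under stated assumptions, not the program this project will produce: the function name `categorize_citation`, the spelled-out word forms, and the tie-breaking rule are all hypothetical, and only the two citation types whose cue words are named above are covered. Anything else falls into a catch-all "unclassified" label.

```python
import re

# Cue words come from the pilot study described above. The specific
# inflected forms listed here are an assumption about what "various
# forms" would include.
CUES = {
    "method": {"use", "uses", "used", "using"},
    "results": {
        "find", "finds", "found", "finding", "findings",
        "conclude", "concludes", "concluded", "concluding",
        "suggest", "suggests", "suggested", "suggesting",
        "imply", "implies", "implied", "implying",
        "interpret", "interprets", "interpreted", "interpreting",
    },
}


def categorize_citation(text):
    """Label a citation by whichever cue-word set it matches most often.

    Returns "unclassified" when no cue word appears or when the two
    sets tie (a hypothetical rule chosen to keep the sketch simple).
    """
    # Lowercase the text and split it into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Count how many tokens fall in each category's cue set.
    scores = {
        label: sum(token in cues for token in tokens)
        for label, cues in CUES.items()
    }
    top = max(scores.values())
    winners = [label for label, score in scores.items() if score == top]
    if top == 0 or len(winners) > 1:
        return "unclassified"
    return winners[0]


# Made-up example sentences, for illustration only.
for sample in [
    "we used the coding scheme from the earlier study",
    "the earlier study found that students revise more often",
    "the earlier study appeared in a special issue",
]:
    print(categorize_citation(sample), "<-", sample)
```

Counting raw word matches is the simplest possible design, and it would misfire on sentences that mix cue words from both sets. Scaling to larger data sets would likely call for weighting the cues or training a classifier on hand-labeled citations.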
