In the last few years, literary studies have experienced what we could call the rise of quantitative evidence. This had happened before of course, without producing lasting effects, but this time it’s probably going to be different, because this time we have digital databases, and automated data retrieval. As Michel’s and Lieberman’s recent article on "Culturomics" made clear, the width of the corpus and the speed of the search have increased beyond all expectations: today, we can replicate in a few minutes investigations that took a giant like Leo Spitzer months and years of work. When it comes to phenomena of language and style, we can do things that previous generations could only dream of.
When it comes to language and style. But if you work on novels or plays, style is only part of the picture. What about plot – how can that be quantified? This paper is the beginning of an answer, and the beginning of the beginning is network theory. This is a theory that studies connections within large groups of objects: the objects can be just about anything – banks, neurons, film actors, research papers, friends... – and are usually called nodes or vertices; their connections are usually called edges; and the analysis of how vertices are linked by edges has revealed many unexpected features of large systems, the most famous one being the so-called "small-world" property, or "six degrees of separation": the uncanny rapidity with which one can reach any vertex in the network from any other vertex. The theory proper requires a level of mathematical intelligence which I unfortunately lack; and it typically uses vast quantities of data which will also be missing from my paper. But this is only the first in a series of studies we’re doing at the Stanford Literary Lab; and then, even at this early stage, a few things emerge.