Quantifying literature
September 3, 2010 | by Dale Keiger

Jesse Rosenthal began his scholarly career by earning a bachelor’s degree from Swarthmore College in English, no surprise for someone recently added to the Krieger School’s English faculty as an assistant professor. But his minor was mathematics, and he programmed computers for fun. So the idea of applying computers and quantitative analysis to the study of literature held some fascination for him. His doctoral dissertation at Columbia University, “Moral Sensibilities: Ethical Feeling and Narrative Form in the Victorian Novel,” examined how Victorian readers felt as a novel’s narrative carried them toward what they believed would be an ethically or morally better state of affairs. The scholarship was done in the customary way: Rosenthal alone with a stack of books, reading, making notes, thinking, reading, making more notes. But when he needed some distraction, there was always the computer and his lingering interest in mathematics. He says, “I got interested in quantitative analysis as a way to blow off steam while working on the dissertation.”

He has continued to think about how he might apply computers to supplement his standard, scholarly close reading. For example, “Something that I’d like to do would be to really get a firmer handle on the styles of the novelists I deal with and the genres that I teach,” he says. “Let me give you a kind of silly example. I was playing around with scatter graphs”—data points plotted on the axes of two variables—“of Dickens’ use of words over his career. I was just taking words and sticking them in various statistical black boxes.” One thing he played with was the author’s use of “that” and “which.” He says, “In poring over these graphs, if I took all of his books and [using the computer] divided the ‘that’s’ by the ‘which’s’ and looked for a pattern, there was a straight line—you could tell when a novel of his was written, almost down to the year. At a very simple level, I would love to know these kinds of facts about the novelists I read all the time.

“There’s a huge amount to know about these low-level things that are true but hard to put your finger on as a reader. [With data analysis] you come up with results that sometimes back up what you know, and sometimes show things you don’t know that leave you scratching your head, saying, ‘Why is this so significant and consistent across this body of work?’” Rosenthal cites work by critic Stephen Ramsay that graphed how characters move from location to location in Shakespeare’s dramas. “It’s almost impossible to notice, while you’re reading, the pattern that the computer picks up, but it can, with near perfect precision, separate the comedies from the tragedies. It has a hard time separating the tragedies from the romances.” Conventional critics have long noted how hard it is to draw a line between the tragedies and the romances, so the computer’s trouble there agrees with established scholarship, which counts as a successful test of the new methodology.
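For readers curious what the “that”/“which” experiment might look like in practice, here is a minimal sketch in Python. It is not Rosenthal’s code, which the article does not show. The function names, the simple word-splitting rule, and the straight-line fit are all assumptions made for illustration, and the three tiny “texts” are invented placeholders, not passages from Dickens.

```python
import re


def that_which_ratio(text):
    """Return count('that') / count('which'), matching whole words case-insensitively."""
    words = re.findall(r"[a-z']+", text.lower())
    that_count = words.count("that")
    which_count = words.count("which")
    if which_count == 0:
        return None  # the ratio is undefined if "which" never appears
    return that_count / which_count


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_year(ratio, slope, intercept):
    """Invert the fitted line to guess a year from an observed ratio."""
    return (ratio - intercept) / slope


# Hypothetical placeholder corpus of (year, text) pairs. The years and
# strings are made up for this sketch and are NOT real Dickens data.
toy_corpus = [
    (1840, "that which that which"),
    (1850, "that that which that that which"),
    (1860, "that that that which"),
]

years = [year for year, _ in toy_corpus]
ratios = [that_which_ratio(text) for _, text in toy_corpus]
slope, intercept = fit_line(years, ratios)
```

With real novels in place of the toy strings, plotting `ratios` against `years` would give the kind of scatter graph Rosenthal describes, and `estimate_year` would turn a ratio measured in an undated text into a rough guess at when it was written. Whether the points actually fall near a straight line is an empirical question the sketch does not answer.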

Data analysis can deal with problems caused by an overwhelming volume of material. During the Victorian era, presses churned out a seemingly endless stream of novels, most of them awful but, nevertheless, the literary background to the classic works. “There’s this huge amount of text and you find yourself studying the little bits that float to the top while just gesturing at the larger body that it’s part of,” Rosenthal says. Among the novels he considered in his dissertation was Dickens’ Oliver Twist. “Though I have a sense and can point out in traditional ways how Oliver Twist does things differently,” he says, “I couldn’t actually show any real evidence for it being different from the larger body of the genre.” He doesn’t have enough hours in his lifetime to analyze more than a tiny sample of what he calls “the inert piles of text just sitting there,” but computers do, and there is an ever-expanding electronic library of digitized texts that, like any other data set, can be mined by computer for meaningful patterns, associations, and defining features.

Rosenthal does not expect this sort of analysis to supplant his standard way of working, but he’s curious to see what might be possible. “I have a computer and I can program it, so let’s see what I can do.”