Predicting Literary Success: A Quantitative Exploration

Remember in high school when teachers told you to use more exciting verbs and adverbs because that’s what makes good writing? Why, they proclaimed, should a character just “say” something when he can “exclaim,” “cry” or “cheer” something?

Turns out your high school writing teacher might have been wrong about this one.

A new paper from Stony Brook University’s Department of Computer Science has found a correlation between writing style and literary success, and it doesn’t look good for exciting verbs and adverbs. Assistant Professor Yejin Choi, a co-author of the paper, titled “Success with Style: Using Writing Style to Predict the Success of Novels,” examined 800 novels from eight different genres and found several predictors of literary success.

Less successful books contain a higher percentage of verbs, adverbs and foreign words (so maybe trying to sound sophisticated by peppering your stories with nods to French cuisine isn’t the best choice). Less successful books also use extreme descriptions and typical locations, and they “rely more on topical words,” like ‘love,’ that “could be almost cliché.” Verbs that explicitly describe actions or emotions, like “wanted,” “took,” or “promised,” appear more often in less successful books, while simpler verbs like “say” or “said” appear more often in more successful books. More successful books also make frequent use of adjectives, and of conjunctions such as “and,” “but,” and “or” to join sentences.
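For readers who want to try this kind of counting on a text of their own, here is a minimal sketch in Python. It is not the researchers' method: the three word lists contain only the examples quoted above, and the function name `rate_per_thousand` and the simple regex tokenizer are assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative word lists built only from the examples quoted in the article.
# They are NOT the feature set used in the paper.
PLAIN_VERBS = {"say", "said"}
EXPLICIT_VERBS = {"wanted", "took", "promised"}
CONJUNCTIONS = {"and", "but", "or"}


def rate_per_thousand(text, vocabulary):
    """Return how often words from `vocabulary` occur per 1,000 tokens of `text`."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[word] for word in vocabulary)
    return 1000.0 * hits / len(tokens)


sample = '"I wanted to go," she said, and he said nothing.'
for label, vocabulary in [
    ("plain verbs", PLAIN_VERBS),
    ("explicit verbs", EXPLICIT_VERBS),
    ("conjunctions", CONJUNCTIONS),
]:
    print(label, rate_per_thousand(sample, vocabulary))
```

Comparing these rates between two books only hints at the pattern described above; a one-sentence sample like this one is far too short to say anything about style.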

Choi and her colleagues defined “success” by download counts from Project Gutenberg, a donation-run website that offers over 42,000 titles for free download in electronic format.

The researchers took the first 1,000 sentences of each book. They performed systematic analyses based on lexical and syntactic features that have proven effective in Natural Language Processing (NLP) tasks such as authorship attribution, genre detection, gender identification, and native language detection.
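The sampling step can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the paper's pipeline: the article does not say how sentences were segmented or which tools were used, so the regex-based splitter and the helper names `first_sentences` and `unigram_features` are hypothetical.

```python
import re
from collections import Counter


def first_sentences(text, limit=1000):
    """Keep the first `limit` sentences of `text`.

    Sentences are split naively after '.', '!' or '?' followed by whitespace.
    This splitter is an assumption and will mishandle abbreviations.
    """
    pieces = re.split(r"(?<=[.!?])\s+", text)
    sentences = [piece.strip() for piece in pieces if piece.strip()]
    return sentences[:limit]


def unigram_features(sentences):
    """Map each lowercase word to its relative frequency in `sentences`."""
    tokens = [
        token
        for sentence in sentences
        for token in re.findall(r"[a-z']+", sentence.lower())
    ]
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {word: count / len(tokens) for word, count in counts.items()}


book = "It was late. She said no. He said yes! Nobody reads this part."
opening = first_sentences(book, limit=3)  # the study used 1,000 per book
features = unigram_features(opening)
print(opening)
print(features)
```

Word frequencies are only the simplest kind of lexical feature. The syntactic features the article mentions would need a part-of-speech tagger or parser, which this sketch leaves out.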

“To the best of our knowledge, our work is the first that provides quantitative insights into the connection between the writing style and success of literary works,” Choi said. “Our work examines a considerably larger collection, 800 books, over multiple genres, providing insights into lexical, syntactic, and discourse patterns that characterize the writing styles commonly shared among the successful literature.” Their analytic system was able to predict, with 84% accuracy, which books were more successful.

Choi and her colleagues also made an unexpected discovery: readability and literary success “correlate in opposite directions.” “We conjecture that the conceptual complexity of highly successful literary work might require syntactic complexity that goes against readability,” Choi said.

Finally, my struggles reading The Classics are validated.