5 Ways Big Data Analytics Caught J.K. Rowling in the Act: Pseudonyms Can’t Hide
By now you surely know that Robert Galbraith, the first-time author of a new crime novel called The Cuckoo’s Calling, is neither a Robert nor a first-time author. Robert Galbraith is none other than J.K. Rowling, the superstar author of the Harry Potter series. The secret was unraveled by the UK’s Sunday Times after Richard Brooks, the paper’s arts editor, received an anonymous tweet claiming Robert Galbraith was in truth J.K. Rowling. He then set out on a mission fit for a private detective (a lot like the one in The Cuckoo’s Calling) to prove the tweet’s claim.
Lo and behold, Big Data analytics cracked Rowling’s coded secret. Mr. Brooks enlisted the help of two experts in computer-assisted forensic linguistics to see if there were any similarities between “The Cuckoo’s Calling,” “The Casual Vacancy” and the last Harry Potter novel, “Harry Potter and the Deathly Hallows.” This is when the story gets about as juicy as juicy gets for a geek.
Pieces of you in texts, tweets and updates
Think that blog post you wrote, or that tweet you sent three years ago, is insignificant? With computers capable of running sophisticated statistical analyses, researchers are mining all sorts of famous texts for clues about their authors. That isn’t the only thing researchers are mining.
*Newsflash* consumers: They’re also mining not-so-famous texts: blogs, tweets, Facebook updates, chat forums and even Amazon reviews, all for clues about people’s lifestyles and buying habits. Whether or not you realize it, you choose words deliberately, to convey specific messages. And despite even the best attempts to hide who is conveying the message, little bits and pieces of personal information leak out with each new message.
From a National Geographic story about the computer scientist consulted to prove it was J.K. Rowling:
“There’s a kind of fascination with the thought that a computer sleuth can discover things that are hidden there in the text. Things about the style of the writing that the reader can’t detect and the author can’t do anything about, a kind of signature or DNA or fingerprint of the way they write,” says Peter Millican of Oxford University, one of the experts consulted by the Sunday Times.
The other expert contacted by Brooks was Patrick Juola, who ran the Rowling books through a computer program called JGAAP, which he and his students have been crafting for 10 years. Collectively, these data sleuths were able to tell the Sunday Times that they were pretty certain the true author of The Cuckoo’s Calling was indeed J.K. Rowling.
Here’s a short list of the analytics methods and clues that outed Rowling’s secret.
5 ways Big Data can get you caught up
- Comparing all of the word pairings, or sets of adjacent words, in each book.
- Searching for “character n-grams,” or sequences of adjacent characters.
- Tallying the 100 most common words in each book and comparing the small differences in frequency.
- Separating words from their meaning entirely by sorting them simply by length.
- Running a Principal Component Analysis that compares all of the books on six features: word length, sentence length, paragraph length, letter frequency, punctuation frequency, and word usage.
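For the curious, the first four checks in this list can be roughed out in a few lines of Python. This is only a minimal sketch to show the idea, not the code the experts ran and not how JGAAP works internally. Every function name here, the 4-character n-gram size, and the simple distance score are assumptions made for illustration, and the sample strings are placeholders rather than real book text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_pairs(words):
    """Count each pair of adjacent words."""
    return Counter(zip(words, words[1:]))


def char_ngrams(text, n=4):
    """Count overlapping runs of n adjacent characters (n=4 is an assumption)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def top_words(words, k=100):
    """Keep only the k most common words and their counts."""
    return Counter(dict(Counter(words).most_common(k)))


def word_lengths(words):
    """Ignore meaning entirely: count words by length alone."""
    return Counter(len(word) for word in words)


def relative(counts):
    """Turn raw counts into shares of the total so long and short texts compare."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}


def distance(a, b):
    """Sum of absolute differences between two profiles.

    0 means identical profiles; 2 means they share nothing.
    This scoring rule is an illustrative choice, not the experts' method.
    """
    pa, pb = relative(a), relative(b)
    return sum(abs(pa.get(k, 0.0) - pb.get(k, 0.0)) for k in set(pa) | set(pb))


if __name__ == "__main__":
    # Placeholder snippets; a real comparison would load whole books.
    mystery = "The detective walked in. The detective sat down and waited."
    candidate = "The wizard walked in. The wizard sat down and waited."
    other = "Quarterly revenue rose sharply amid strong overseas demand."

    for name, text in [("candidate", candidate), ("other", other)]:
        score = distance(word_lengths(tokenize(mystery)),
                         word_lengths(tokenize(text)))
        print(name, round(score, 3))
```

A lower score means the two texts look more alike on that one feature. No single feature settles anything; the point is to run several of them and see whether they keep pointing at the same author.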
In 5 hours, computer scientists were able to apply forensic linguistics to Big Data and establish, with enough confidence to put their names behind it, that Robert Galbraith was none other than J.K. Rowling.
Try as you may, you can’t hide from Big Data.
[Editor’s Note: Feature image photo credit Sarah_Ackerman. -mrh]