Associated Press Taps Big Data to Monetize News Content


Whether for compliance reasons or simply because they didn’t know what else to do with it, enterprises have been storing large volumes of data for ages. With the advent of Big Data, those large volumes have grown to gargantuan volumes thanks to social media data, machine-generated data and unstructured content.

The promise of Big Data approaches like Hadoop and NoSQL database technologies is the ability for enterprises to access, analyze and, most importantly, monetize all that data that’s just been sitting around for all those years. Media companies in particular are in a position to capitalize on Big Data to monetize years’ worth of unstructured content, namely news articles.

The Associated Press is one such media company. Its core business, of course, is covering and distributing daily news stories to outlets across the globe. Not surprisingly, the AP has quite a collection of news content going back decades and companies of all types are interested in getting their hands on it. To that end, the AP packages and sells its content to enterprise customers that use it to perform historical market research and develop competitive intelligence.

Until recently, the AP stored all its content in a traditional relational database, but searching unstructured content force-fed into a relational database proved ineffective, to say the least. The company recently ditched the relational approach for a document-oriented database from MarkLogic. MarkLogic Server uses a schema-less, document-oriented data model on top of a shared-nothing, scale-out cluster architecture.

While not new (the company has been around since 2003), MarkLogic shares a number of characteristics with the new breed of NoSQL databases, whose raison d’être is to efficiently store and provide access to multi-structured data. Not all NoSQL databases are designed for the same purpose, however, with MarkLogic’s specialty being storing and providing search and discovery capabilities against large volumes of documents. In other words, it’s tailor-made for media companies looking to monetize their news content.

The AP has built a content analysis application on top of MarkLogic that runs complex Boolean queries against hundreds of millions of news articles going back to the 1970s. This allows the AP to quickly pull together targeted content for its enterprise customers. The company is looking to add real-time capabilities to update the application when new, relevant stories are filed, according to David Gorbet, Vice President of Product Strategy at MarkLogic.
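
To make the idea of a Boolean query over a document collection concrete, here is a minimal sketch in Python. It is not MarkLogic's API or the AP's application; the `build_index` and `search` functions and the three sample headlines are hypothetical, invented purely for illustration. It shows the general technique: index each word to the documents that contain it, then combine those sets with AND, OR and NOT.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, all_ids, must=(), any_of=(), must_not=()):
    """Return ids matching: every `must` term AND at least one
    `any_of` term (if given) AND none of the `must_not` terms."""
    result = set(all_ids)
    for term in must:                      # AND
        result &= index.get(term, set())
    if any_of:                             # OR group
        result &= set().union(*(index.get(t, set()) for t in any_of))
    for term in must_not:                  # NOT
        result -= index.get(term, set())
    return result

# Hypothetical sample headlines, not real AP content.
docs = {
    1: "Oil prices rise as OPEC cuts output",
    2: "Tech stocks rally while oil prices fall",
    3: "OPEC meeting ends without a deal on output",
}
index = build_index(docs)

# oil AND (opec OR stocks) AND NOT rally
hits = search(index, docs.keys(), must=["oil"],
              any_of=["opec", "stocks"], must_not=["rally"])
print(sorted(hits))
```

A production document database would add far more than this sketch does, such as phrase and structure-aware search and distribution across a cluster, but the set-combination logic behind a Boolean query is the same.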


The AP’s approach illustrates an important use of Big Data: monetizing existing assets. While not its core business, packaging and selling existing content has proven a successful revenue generator for the AP. Likewise, all enterprises should consider how Big Data could enable the development of new services that leverage existing data assets to drive new business.