Just over two years after launching its fantastic semantic Open Calais platform, Thomson Reuters is seeing great adoption of the commercial side of the service. Having started out by exposing Open Calais as a free web service for anyone to use, and by focusing on building a community around the platform, Thomson Reuters has also been signing up large customers for its commercial service left and right. If you have any amount of unstructured data you want to make sense of, it's pretty much the service for you. I caught up with project lead Thomas Tague at the SemTech 2010 conference to hear about the latest wins they are announcing, and to talk shop about the infrastructure they and their customers are using to mine the vast amounts of data passing through the system.
The service, which now processes more than 5 million documents per day and stores over 90 billion triples (semantic data structures), is being used by companies for a wide range of use cases beyond standard web publishers looking for auto-tagged content. Magus Ltd, a leading UK web governance solution provider, is using the service to bring reasoning to droves of unstructured content for customers such as Unilever, Shell, and ING.
In the spirit of the World Cup, they are also announcing that Prefix Technologies, South Africa’s largest content management company, is using Open Calais to efficiently scan and link related content across 10 years of archives.
In catching up with Thomas, it was apparent that the service has become so successful, and has so much data flowing through it, that they must keep ratcheting up the background services that find deeper meaning in the unstructured data. He talked about how they use a wide range of infrastructure setups for their main SaaS service and, when the data calls for it, sometimes leverage AWS to run experimental jobs that test out new possibilities.
Because they run a SaaS service and also sell the package for on-premise deployment, he pointed out that most of their larger customers demand that things run on closely scrutinized infrastructure, which does not lend itself to public cloud providers. This of course screams for a private cloud solution, but as he pointed out, it's still early adoption days, and companies are still figuring out how to move their internal cloud efforts into production.
The service is not just for large content producers. Those of you interested in or running the popular Drupal publishing platform can also check out the Drupal plugin that interfaces with Open Calais, or the newly launched Open Publish project, which offers a Drupal vertical focused on the publishing industry.