UPDATED 14:16 EDT / APRIL 15 2011

It’s Time to Bring Unstructured Content in from the Cold

For too long, unstructured content has played the role of neglected stepchild in the world of business intelligence (BI) and data analytics.

That’s unfortunate. Unstructured content makes up around 80% of all corporate data, and probably around 99.9% of all data when you include the web, as my colleague David Floyer put it, exaggerating just a bit.

Unstructured content, most notably text contained in emails, blog posts, Word documents, wikis and elsewhere, has been largely ignored because it’s difficult to process and analyze, not because of any inherent lack of value.

“This stuff is incredibly valuable and has deep information in it,” said Sid Probstein, chief technology officer at Attivio, a Newton, Mass.-based BI vendor. “It really is the essential missing element of analytics today.”

Take, for example, warranty information. A database full of structured data can tell you the number of warranty claims processed in a given time frame broken down by product. And that’s important information. But the actual warranty claims, in the form of PDF or Word documents, often include free-flow text explaining in detail why a customer filed the claim. Combining the two data sources for analysis could provide better insights into design flaws.
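As a rough illustration of the warranty example, the sketch below joins structured claim counts with issue terms found in the free-text claims. Everything in it is hypothetical: the product names, the claim text, the keyword list and the function names are made up for illustration, and simple keyword matching stands in for a real text analytics engine. It does not represent how any vendor's platform works.

```python
from collections import Counter, defaultdict

# Hypothetical structured data: claim counts per product, as a BI database might report them.
claim_counts = {"blender": 3, "toaster": 1}

# Hypothetical unstructured data: free-text explanations taken from the claim documents.
claim_texts = [
    ("blender", "Lid cracked after two weeks of normal use."),
    ("blender", "The lid cracked and the motor overheated."),
    ("blender", "Motor overheated during first use."),
    ("toaster", "Lever stuck in the down position."),
]

# Illustrative keyword list (an assumption); real text analytics would extract terms itself.
ISSUE_TERMS = ("cracked", "overheated", "stuck")


def issues_by_product(texts, terms=ISSUE_TERMS):
    """Count how many claims per product mention each issue term."""
    counts = defaultdict(Counter)
    for product, text in texts:
        lowered = text.lower()
        for term in terms:
            if term in lowered:
                counts[product][term] += 1
    return counts


def combined_report(counts_by_product, texts):
    """Join structured claim totals with issue mentions found in the text."""
    issues = issues_by_product(texts)
    return {
        product: {"claims": total, "issues": dict(issues[product])}
        for product, total in counts_by_product.items()
    }


report = combined_report(claim_counts, claim_texts)
for product, row in report.items():
    print(product, row)
```

The structured count alone says the blender drew three claims. The combined view adds that those claims cluster around a cracked lid and an overheating motor, which is the kind of design-flaw signal described above.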

Most BI platforms are unable to process unstructured content, however, and that’s usually where the conversation ends. Those few users who want to put in the effort to dig into unstructured content often need to use a separate text analytics application or content management system. The result is two sets of analysis – one gleaned from structured data, one from unstructured content – that are difficult to correlate with one another.

But there are other options. Attivio is one of a handful of BI and data analytics vendors that have embraced the challenge posed by unstructured content, attempting to bring together both structured and unstructured data analysis in one platform. Attivio’s core platform, Active Intelligence Engine, melds traditional BI capabilities with enterprise search and text analytics to provide what the vendor calls unified information access, or UIA.

Endeca is another vendor in the UIA market. The Cambridge, Mass.-based vendor’s Information Access Platform includes a database tuned to integrate and model semi-structured data. Exalead, based in Paris, is a third player in the UIA market.

Call it what you will, the ability to bring unstructured content into your BI and data analytics environment will be key to staying ahead of the competition in the years to come. On a micro level, this means using platforms like those mentioned above to build applications that bring mostly internal unstructured content together with traditional data warehousing environments. On a macro level, users should tap into Hadoop and other Big Data technologies to analyze the huge volume of unstructured social media data distributed throughout the web.

Each of the players in the UIA market takes a slightly different approach, so companies should evaluate the vendors closely to match the best platform to their data landscape.

To get a more in-depth look at unified information access, watch this video interview my Wikibon colleague Dave Vellante conducted with Attivio’s Probstein at the vendor’s Newton, Mass., headquarters.

