UPDATED 14:00 EST / FEBRUARY 28 2012

What’s the Key to Simplifying Big Data? Objectivity Founder Tells All

Many of us have an idea of what big data means, but it’s the execution that seems to trip people up.  If you ask Objectivity, the company behind InfiniteGraph, one of the biggest challenges around big data isn’t the volume but the relationship between data sets.  And if you ask me, relational analysis is what truly contextualizes big data for today’s (and tomorrow’s) market.

In today’s Profile Snapshot we hear from Leon Guzenda, founder of Objectivity, who talks about the challenges of explaining big data as an emerging concept.  Guzenda also tackles the key to simplifying data, and shares his favorite way to get the weekend started.

What aspect of big data is the most difficult to explain?

Everybody understands the volume, variety and velocity (rate of arrival) problems, but very few of the current business intelligence systems can cope with relationships between data items.

  1. Relational databases aren’t actually very good at dealing with complex relationships. They employ join tables and indices to allow queries to find related data by value, which is highly inefficient.
  2. Key-value stores exploit hashing techniques to find individual pieces of data, but, once again, relationships can only be formed by storing sets of keys, so it is generally impossible to store and retrieve graph structures efficiently.
  3. Document databases use hyperlinks to deal with the relatively few links within and between documents, so they don’t scale well with highly interconnected sets of data, such as email repositories.
  4. File-based systems, such as Hadoop, cope very well with data that is streamed or constantly scanned, but are very bad at handling the random I/O patterns that graph structures demand.
  5. The instrumentation and diagnostic tools for the newer “NoSQL” tools aren’t yet mature enough to meet the demands of high end production systems.
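
To make the first point concrete, here is a minimal Python sketch of the difference between finding related data by value and following stored links. It is not InfiniteGraph or Objectivity/DB code; the sample data and the function names (`neighbors_by_value`, `build_adjacency`, `reachable`) are assumptions made up for illustration.

```python
from collections import defaultdict, deque

# Hypothetical data: (sender, recipient) pairs, as in an email repository.
EDGES = [("ann", "bob"), ("bob", "cy"), ("cy", "dee"), ("ann", "eve")]


def neighbors_by_value(edges, node):
    """Join-table style: scan every row and match on the key value."""
    return [dst for src, dst in edges if src == node]


def build_adjacency(edges):
    """Graph style: each node keeps a direct list of its neighbors."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    return adjacency


def reachable(adjacency, start, max_hops):
    """Breadth-first traversal that follows stored links hop by hop."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return seen - {start}
```

In this sketch, `neighbors_by_value` re-reads the whole edge list for every hop, while `reachable` only touches the nodes it actually visits. That is roughly the gap the list above describes once a query needs to follow several links in a row.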

Object databases are built to handle masses of highly interconnected objects that may have many variants, which is why InfiniteGraph is uniquely positioned in the Relationship Analytics world. It is built on Objectivity/DB, which, with roots in the telecom and process control equipment worlds, is built to be deployed in distributed environments and to require very little administration. Its genesis in engineering and Big Science applications makes it ideal for handling huge graphs.

What’s the key to simplifying complex data?

a) The object-oriented paradigm helps a lot. Inheritance helps simplify the task of dealing with data types that have multiple variants.
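
As a rough illustration of that point, the Python sketch below models one data type with two variants. The class names (`Message`, `Email`, `TextMessage`) and the `participants` helper are hypothetical and are not taken from any Objectivity product.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Fields and behavior shared by every variant."""
    sender: str
    recipient: str

    def summary(self) -> str:
        return f"{type(self).__name__}: {self.sender} -> {self.recipient}"


@dataclass
class Email(Message):
    """Variant that adds a subject line."""
    subject: str = ""


@dataclass
class TextMessage(Message):
    """Variant that adds a carrier name."""
    carrier: str = ""


def participants(messages):
    """Works on any Message variant without checking its concrete type."""
    people = set()
    for message in messages:
        people.update((message.sender, message.recipient))
    return people
```

Code such as `participants` is written once against the base type, so adding another variant later should not require touching it.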

b) Having a unique and efficient universal object identifier makes it much easier to create complex structures, traverse them and find things very efficiently. It’s worth noting that InfiniteGraph and Objectivity/DB do not rely on a single hash-table to decode the object identifiers (OID) and the OID was designed to deal with distributed environments, ranging from geographic to clustered, grid and cloud-based systems.
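
One way to picture an identifier that does not depend on a single lookup table is to let the identifier itself say where the object lives. The sketch below is only an illustration of that general idea: the field layout (`node`, `container`, `slot`), the bit widths and the nested-dictionary storage are assumptions, not the actual Objectivity/DB OID format.

```python
from typing import NamedTuple


class Oid(NamedTuple):
    """Hypothetical identifier made of location fields."""
    node: int       # which machine or database file (assumed 16 bits)
    container: int  # which group of objects on that node (assumed 32 bits)
    slot: int       # position inside the container (assumed 16 bits)


def pack(oid: Oid) -> int:
    """Encode the three fields into one 64-bit integer."""
    return (oid.node << 48) | (oid.container << 16) | oid.slot


def unpack(value: int) -> Oid:
    """Recover the location fields from the packed integer."""
    return Oid(value >> 48, (value >> 16) & 0xFFFFFFFF, value & 0xFFFF)


def resolve(storage, value):
    """Route to the object step by step using the decoded fields.

    The nested dicts are in-memory stand-ins for nodes and containers;
    in a real distributed system each step could be a different machine.
    """
    oid = unpack(value)
    return storage[oid.node][oid.container][oid.slot]
```

Because the identifier carries its own routing information, a reference held by one object can be followed to another without consulting a central directory, which is the property the answer above highlights for distributed deployments.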

What’s the most surprising/innovative app you’ve seen developed from your platform?

That’s a tough question. There have been many ground-breaking applications over the years, ranging from chip design systems to Big Science and Intelligence Community systems. The one that had the most impact was probably the Iridium Low Earth Orbit satellite communications system. It was designed from scratch using object-oriented principles. It has been in operation for over a decade and has never failed.

InfiniteGraph is a brand new product, but there are already some very interesting applications. Many of them are in the cybersecurity and intelligence space, but the most novel so far will try to combine a livestock genealogy database with farming, veterinary and dietary information to make it possible to find links between diseases/conditions and environmental factors.

We should also give an honorable mention to the winner of our recent competition for novel uses of InfiniteGraph. “InfiniteCommits” was developed by William Cheung and allows users of GitHub to quickly obtain useful information that isn’t currently available through the GitHub web interface. This application uses Play (a rapid Java and Scala web development framework) to generate a report of the most active files in a GitHub repository, while using InfiniteGraph’s Data Visualizer to see the latest changes and notes associated with each change.

Contextualizing data is a promising way to plan for the unknown. What does “balancing the future” mean to you?

A few decades ago the IT world spent a lot of energy trying to put everything into grand, centralized databases and “eliminating the paper.” The WWW has helped enormously with the latter challenge, but, if anything, data complexity, variety and volumes have gotten out of hand. Some of the companies that we are working with are startups, so they’re trying to build flexible frameworks that will allow them to incorporate, fine-tune or replace technologies as things change. The largest organizations need to focus on refactoring their current infrastructure using commodity components, including the cloud, while exploring new opportunities with techniques such as NoSQL.

One of our long-term eXtremely Large Database (XLDB) customers is currently standing up a proof of concept in the tens of petabytes. It will act as a front-end processing factory for multiple legacy silos, allowing them to gradually move systems to a more modern environment. The keys, as always, are good planning, not relying on a single technology and taking small steps with multiple “off ramps.”

