UPDATED 13:27 EDT / OCTOBER 10 2012

Cold, Old & Bold – Real World Examples on Wide Range Use of Big Data

While most of us agree that Big Data is characterized by some combination of the 4 V’s, it also has other attributes, which are generally driven by the specific industry and overall business use-case. According to Gartner Group’s Big Data Hype Cycle report issued in July: “Although Big Data is not necessarily just about MapReduce and Hadoop, many organizations consider these distributed processing technologies to be the only relevant ‘big data technology,’ there are alternatives.”

While Hadoop as a platform is taking off, not all enterprises are yet fully enabled to adopt Hadoop in a serious way. Many Hadoop production environments don’t actually have the multi-structured data types (the Variety V) that it was originally architected to manage. While this may change over time, I also believe that structured and semi-structured transactions are the heartbeat of most organizations today and therefore the source of the richest enterprise information.

Many organizations have gotten their Hadoop feet wet with use-cases such as pre-processing data or staging it before loading it into a traditional data warehouse, or what I previously coined “Raw is More.” Enterprise Hadoop adoption does tend to center around the business challenge of “we’ve got many new sources of data that we need to quickly capture, retain in raw form, query and integrate with more traditional data sets to derive new business value.” Very quickly, the Hadoop requirements for solving this problem become focused on uptime, reliability, security, and overall performance or real-time access. In fact, Hortonworks’ recent open source product announcement focused on exactly that.

Big data in the real world

We know Big Data use-cases come in numerous forms, and probably the biggest challenge with the 4 V’s definition is that it focuses more on describing attributes of the data itself and less on what you are trying to do with it. Quickly rising to the top of the list are overall query performance and a move away from the batch approach that is MapReduce. Most analytical environments aim for faster analytics performance, and predictive analytics, while now positioned on the plateau of productivity in Gartner’s Big Data Hype Cycle, is something that most have strived toward for many years.

Let’s examine a few real-world examples to shed light on the wide range of Big Data use-cases and how business needs vary in terms of data “consumption” at different stages or ages of data.

Retail predictions

Consider a large online retailer that needs to track customer behavior in order to determine target segments for a new product offering. It will likely track current behavior by examining web click-stream data, which products were purchased over perhaps the last two quarters, and average sales prices, and will include social or sentiment data from Twitter, Facebook or emails, which gives the marketing organization a good measure of market fit.

By contrast, the same retailer might also want to understand seasonal spikes. If they are about to issue a new product during the holiday season, they may want to find out what happened in a year when Christmas Eve fell on a Saturday. With this type of analysis, they may need to go back as far as seven years (to a year when Christmas Eve fell on a Saturday) to determine exact buying patterns. However, the current data warehouse may not have data that old, and so we have a problem.

If it’s stored on offline tape, good luck retrieving it quickly; now you need to bake in additional weeks to retrieve, reinstate and analyze it. The historical data in this use-case is really important for accurate decision-making. It’s got nothing to do with what’s happening in the business now, last week or last quarter.
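The calendar lookup behind the retail example can be checked mechanically before anyone requests tapes. The snippet below is a minimal sketch using only Python's standard library; the function name and the year range are illustrative assumptions, not part of any retailer's system.

```python
from datetime import date

SATURDAY = 5  # date.weekday() numbers Monday as 0, so Saturday is 5


def years_with_christmas_eve_on(weekday, start_year, end_year):
    """Return the years in [start_year, end_year] where 24 December falls on `weekday`."""
    return [
        year
        for year in range(start_year, end_year + 1)
        if date(year, 12, 24).weekday() == weekday
    ]


if __name__ == "__main__":
    # Hypothetical range: how far back would the warehouse need to reach?
    print(years_with_christmas_eve_on(SATURDAY, 1990, 2012))
```

Knowing the matching years up front tells the analyst which historical periods must be online before the analysis starts.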

Banking and the law

Let’s take a banking example where you have a specific SEC requirement to retain all customer transactions for five years after the account is inactive. This data is required by law to be available for query and audit, so storing it in an environment with current customer data will not only impact storage capacity and hardware but will likely impact overall analytical performance.

Performing record-level deletion of those 5-year-old inactive transactions is a unique banking requirement and is quite different to the Big Data problem of storing all transactions for current analytics and KPIs. In fact, storing inactive banking customer data doesn’t even affect the bank’s ability to drive revenue or margin; it’s an ongoing operational cost that simply cannot be avoided. There are many more big data use-cases that focus on how organizations handle their older, colder data for greater business benefit.
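To make the retention rule concrete, here is a minimal sketch of separating records that must stay queryable from those eligible for record-level deletion. The `Transaction` type, its field names and the helper functions are hypothetical, and the five-year window is simply the figure used in the example above; real retention rules vary by regulator and region.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

RETENTION_YEARS = 5  # figure from the article's example, not legal guidance


@dataclass
class Transaction:
    account_id: str
    amount: float
    account_inactive_since: Optional[date]  # None means the account is still active


def retention_deadline(inactive_since: date, years: int = RETENTION_YEARS) -> date:
    """Date on which the retention window for an inactive account ends."""
    try:
        return inactive_since.replace(year=inactive_since.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year; use 28 February.
        return inactive_since.replace(year=inactive_since.year + years, day=28)


def partition(
    transactions: List[Transaction], today: date
) -> Tuple[List[Transaction], List[Transaction]]:
    """Split records into (must keep online, eligible for deletion)."""
    keep, eligible = [], []
    for tx in transactions:
        inactive = tx.account_inactive_since
        if inactive is None or today < retention_deadline(inactive):
            keep.append(tx)
        else:
            eligible.append(tx)
    return keep, eligible
```

The point of the sketch is that the decision is made per record, based on the account's inactivity date, which is a different workload from bulk analytics over current customers.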

Banking, financial services, utilities and communications don’t have as much freedom, as they are heavily regulated and must keep historical data online for specific timeframes that vary by region. Of course, there are use-cases in financial services where older data is actually a very necessary asset for richer decision-making, such as high-frequency trading environments where quants need access to years of historical data sets in order to build better algorithms.

A similar use of older, colder Big Data is a research company that needs historical data to produce richer research outcomes for, say, a genome sequencing project, spurring highly lucrative medical and commercial developments.

Teaching old data new tricks

There is no question that figuring out the best technology approach to storing data going back five to 10 or even 20-plus years is a painful undertaking for IT. All too often I meet with large banks struggling to retain hundreds of terabytes of older customer data in their central data warehouse, which becomes very costly even at 10-15k per terabyte (depending on the chosen solution). This is exactly where Hadoop becomes the ideal platform for storing the older, colder data and where scale simply happens. I am looking forward to the day when enterprises employ a well-thought-out strategy for approaching information management, with best-fit technologies chosen for business purpose and value.
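A rough back-of-the-envelope calculation shows how quickly the warehouse figures above add up. This is only a sketch: the 300-terabyte volume is a hypothetical stand-in for "hundreds of terabytes," the function name is illustrative, and the per-terabyte range is taken as quoted, with no currency assumed.

```python
def warehouse_cost_range(terabytes, low_per_tb=10_000, high_per_tb=15_000):
    """Return (low, high) storage cost for a given volume, in the article's cost units.

    The default per-terabyte bounds are the 10-15k range quoted in the text.
    """
    return terabytes * low_per_tb, terabytes * high_per_tb


if __name__ == "__main__":
    # Hypothetical volume of older customer data held in the central warehouse.
    low, high = warehouse_cost_range(300)
    print(f"300 TB costs between {low:,} and {high:,}")
```

Under those assumptions, 300 terabytes lands between 3 million and 4.5 million, all of it spent on data that drives no current revenue.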

According to Gartner’s Big Data Hype Cycle report: “Organizations with information management practices in place should not try to apply them as-is to these new information types, as they will likely cause existing governance structures and quality programs to collapse under the additional weight of the information being processed.” The report goes on to state: “Gartner believes that through 2015 organizations integrating high-value, diverse, new information types and sources into a coherent information management infrastructure will outperform their industry peers financially by more than 20 percent.”

If you want to be bold in your business, employ the best information management technology so business users can continue to access your cold, old data for deeper insights.

About the Author
John Bantleman, CEO, RainStor


John Bantleman has more than 20 years’ experience in the management of software companies. Before overseeing RainStor, John transformed LBMS into a $45 million business prior to its successful NASDAQ flotation in 1997. Today, LBMS’ technology is part of CA’s product portfolio.

The following year John was instrumental in the launch of Evolve, and drove the company through to a successful IPO on NASDAQ.
Returning to the UK in 2003, John spent 12 months working on the advisory boards of venture capital organizations such as Apax Partners. He joined RainStor Inc. as Chairman in 2004, became CEO at the start of 2007, and relocated back to the US in 2009 to head up worldwide operations.

