UPDATED 10:08 EDT / MARCH 11 2015

Recognizing the layers of critical insight data offers

Data is an interesting concept.  A recent #RealDataStories CrowdChat, a crowdsourced conversation that inspired this awesome blog post by Joseph George and Leo Leung, instantly created a dialog about changing server-based storage, big data, emerging technology, and customer needs.  The topic then developed into a forum on data and its inherent qualities, and this notion of the Data Vector emerged.  Of course, I started with the Data Lake vs. Data Ocean argument, meaning that data is not as simple as a “lake,” which implies batch processing.  Data is an amazing opportunity and a lever for value creation.

The discussion led us to realize that data actually has a number of attributes that define it – similar to how a vector has both a direction and magnitude.

Here are a few of the attributes that we uncovered as we delved into this notion of data as a vector: Data Gravity, Relative Location with Similar Data, and Time.

Data Gravity: This was a concept developed by my friend, Dave McCrory, a few years ago, and it is a burgeoning area of study today. The idea is that as data is accumulated, additional services and applications are attracted to this data – similar to how a planet’s gravitational pull attracts objects to it. An example would be the number 10. If the “years old” context is “attracted” to that original data point, it adds a certain meaning to it. If the “who” context is applied to a dog vs. a human being, it takes on additional meaning.

Relative Location with Similar Data: You could argue that this is related to data gravity, but I see it as a point poignant enough to bear calling out on its own. At a Hadoop World conference many years ago, I heard Tim O’Reilly make the comment that our data is most meaningful when it’s around other data. A good example of this is medical data. Health information about a single individual (one person) may lead to some insights, but when placed together with data from the members of a family, co-workers at a job site, or the citizens of a town, you are able to draw meaningful conclusions. When around other data, individual pieces of data take on more meaning.

Time: This came up when someone posed the question, “does anyone delete data anymore?” With storage costs at scale becoming more and more affordable, we concluded that there really is no economic need to delete data (though there may be regulatory reasons to do so). Then came the question of determining what data is not valuable enough to keep, which led to the epiphany that data viewed as not valuable today may be significantly valuable tomorrow. Medical information is a good example here as well – capturing that certain individuals in the 1800s were plagued with a specific medical condition may not have seemed meaningful at the time, until you’ve tracked data on descendants of those families being plagued by similar ills over the next few centuries. It is difficult to quantify the value of specific data at the time of its creation.
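To make the vector analogy a bit more concrete, here is a minimal sketch in Python of how a single data point might carry the three attributes above. The DataPoint class and all of its field names are my own illustrative assumptions, not anything defined in the CrowdChat:

```python
# A minimal, hypothetical sketch of "data as a vector": a raw value plus the
# three attributes discussed above. All class and field names are illustrative.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataPoint:
    value: object                                  # the raw datum, e.g. the number 10
    context: dict = field(default_factory=dict)    # "data gravity": meaning attracted to the datum
    neighbors: list = field(default_factory=list)  # relative location: similar data placed nearby
    created: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )                                              # time: value may only become clear later


# The bare number 10 means little on its own...
age = DataPoint(value=10)

# ...but context "attracted" to it ("years old", applied to a dog vs. a human)
# changes its meaning,
age.context.update({"unit": "years old", "subject": "dog"})

# and placing it near similar data (a family, co-workers, a town) adds more still.
age.neighbors.extend([DataPoint(value=9), DataPoint(value=12)])

print(age.value, age.context, len(age.neighbors), age.created.year)
```

The point of the sketch is simply that the same raw value gains meaning as context, neighboring data, and history accumulate around it over time.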

In discussing this with my industry colleagues, it became very clear how early we are in the evolution of data, big data, and software-defined storage.  With so many angles yet to be discussed and discovered, the possibilities are endless.

This is why it is critical that you start your own journey to unlock the critical insights your data offers.  It can help you drive efficiency in product development, it can help you better serve your constituents, and it can help you solve seemingly unsolvable problems.  Technologies like object storage, cloud-based storage, Hadoop, and more are allowing us to learn from our data in ways we couldn’t imagine 10 years ago.

And there’s a lot happening today – it’s not science fiction.  In fact, we are seeing customers implement these technologies and make a turn for the better: figuring out how to treat more patients, enabling student researchers to share data across geographic boundaries, helping media companies stream content across the web, and allowing financial institutions to detect fraud as it happens.  Though the technologies may be considered “emerging,” the results are very, very real.

Over the next few months, we’ll be blogging about how specific customers are making this work in their environments, sharing tips for implementing these innovative technologies, highlighting some unique innovations that we’ve developed in both server hardware and open source software, and maybe even offering some best practices that we’ve developed after deploying so many of these big data solutions.

Stay tuned.

Until next time,

Contributors to this post:  

Joseph George, Director, HP Servers – @jbgeorge

Leo Leung, VP, Scality – @lleung  

Look for more on this topic at Crowdchat.net/RealDataStories, a thought-leader conversation series.  All CrowdChats are held in the open, in public, and on the record, so come “join the conversation.”

The raw transcript from the Feb. 19 crowdsourced conversation is attached below.

#RealDataStories CrowdChat, held Feb. 19, 2015, by leaders at Scality and HP with thought leaders and influencers in the community.

