UPDATED 11:17 EDT / AUGUST 26 2011

IBM’s Vision of Data Scientists: Our Bridge to the Future

What is the role of a data scientist?  That’s something we’ve been trying to pin down as the world of big data grows wider, and the need for professionals who understand this semi-vague trend grows in tandem.  The role of the data scientist means many things to many people, and many corporations are anxious to explore how a data scientist can fit into their long-term goals for leveraging all the data, and the analysis processes, made available by today’s technology.  IBM is one corporation that’s had a forward-looking attitude about the data scientist, readily feeding it resources to help this new profession find its way.

“I see the data scientist as a key influencer in a company because you need a cultural shift in that they have to be enabled to really battle big data,” says Anjul Bhambhri, Vice President of Big Data Products at IBM.

“The application developer will have to now either write new apps which are in the big data space, or be able to extend their existing downstream applications.  The IT admin that were first working with relational databases will now have to manage these clusters with volumes of data.  What we’re providing is an enablement of all these different parts to tackle big data, but it starts with the data scientist.”

The concept Bhambhri describes is one I’m familiar with, starting with my highly varied class schedule in college.  I’d go from neuropsychopharmacology class to Eastern European film and literature, embryology to African politics.  My mother raised an eyebrow at my school day antics, but my avid curiosity across topics has allowed me to make connections across history and interests that elude others.  That’s helped me in my own career, researching news topics and managing an editorial schedule that often spans several areas at once.  Chronicling the story of big data offers an interesting perspective on this particular time in history, but big data’s application to the field of journalism has also given me an opportunity to experience this major shift firsthand.


Big data technology, from collection and storage to retrieval and analysis, is a distinguishing factor of our era, as important as the introduction of the internet itself.  And the beauty of big data is that it is an all-encompassing technology that affects as many fields as you can count: healthcare, bioinformatics, art history, financial planning, retail, advertising, astrophysics and political science.  The list goes on, and IBM is well aware of the potential big data has to reshape each of these industries.  IBM has clients in several different fields, and they all want insight into their own data, as well as the vast data now available on the web.

“There’s a recognition that leaving this data out of the decision-making process will be detrimental to their business,” Bhambhri explains.  “They won’t have a complete understanding of their customers or their own processes.  They want to know if it’s helping to connect with their customers or not.  It’s super critical that their platforms include not just structured data, but can bring in data sources that support unstructured data.”

Indeed, from business intelligence to smarter internal processing, data plays an important role in the future of business management, marketing and research.  From a technology standpoint, IBM has products that enable IT and application developers as well as business analysts and management.  IBM is expanding its platforms from just structured data to support the injection of data that may be external to the enterprise, be it Facebook Likes or brand-related tweets.  Loading additional data sources expands the scope of information a company can apply to its analysis, without having to do the heavy lifting or rely solely on the information it has internally.  And speaking of internal data, Bhambhri says that many companies aren’t even analyzing all the data they already have.  From document files to other rich data, there’s plenty of unstructured information lying around a company.  Better understanding relationships within an enterprise structure will help a company operate more efficiently.

Speed is important here, as big data must be processed quickly, and often in real time.  Not only is data being continuously generated at high speeds (think log data), but the complexity of unstructured data can slow its analysis down.  The advancements IBM has made address these issues, providing historical, real-time and predictive analysis on structured and unstructured data.


Bhambhri is also quick to point out that big data isn’t limited to marketing and social media.  She outlines the Smart Baby project, implemented at an Ontario hospital, where premature babies were being monitored.  The ward already had a great deal of information on the babies, but lacked the technology to contextualize the data at a large scale.  Once IBM provided the tools to do so, the ward identified patterns that preceded an infection in the premature babies, and could then take action up to 24 hours in advance.

Indeed, big data is finding ways to empower many self-propelled projects, as I’ve learned from my own work with LiveUnchained.  Co-founder Kathryn Buford, a doctoral student at the University of Maryland, is leveraging big data to explore African and Caribbean art’s journey around the globe, and is also looking to big data as a way to shrink the network surrounding this topic.  We’re actually running a panel on the topic at SXSW – be sure to vote for it!  From the historical perspective to predicting the future, big data can home in on patterns that would never have otherwise been discovered.  What’s important to note here is the enabling factor of ready data sets, largely provided by the world’s activity on the web.  Companies like IBM are helping enterprises and individuals alike to put that data to work, for countless purposes.

So when it comes to data scientists, they are the pioneers in exploring data, and finding ways to make it applicable across industries, across topics and across departments.  The data scientist is the person who will bridge our past processes with the future of technology, enabling its adoption in a compatible manner.  IBM hopes to be at the center of this transition, and it’s well placed to be.  IBM’s been around for a century, witnessing numerous iterations of technology and the ways in which it affects the corporate structure and the consumer culture.  The data scientist is the human element in adding structure and familiarity to this wide world of unstructured data, and as data scientists get their arms around the data at hand, they’ll be able to help the rest of us along the way.

