UPDATED 11:06 EDT / JUNE 14 2012

Klout and Hadoop, the Pros and Cons

Everybody wants to improve their Klout score, a value that represents a user’s influence across their social network. And while Klout describes to its users how the number is calculated, few people understand how the platform behind the score really works. To lend some insight, Dave Mariani, vice president of engineering at Klout, joined John Furrier and Jeff Kelly at The Cube, broadcasting during Hadoop Summit 2012 in San Jose, Calif.

Mariani explained how Hadoop’s distributed file system has enabled his start-up to not just process, but also store data cheaply. Hadoop is horizontally scalable, meaning if an organization wants to increase the capacity or speed to process its data, it can increase the number of machines in its Hadoop cluster without changing anything in the underlying software.

Hadoop lets small companies wrestle with huge amounts of data. Klout prefers to work with Hadoop inside its own hosted data center, but for organizations lacking the resources that Klout has at its disposal, Hadoop can run on top of Amazon EC2. “It’s very inexpensive and very easy out of the gate to get scale,” Mariani said. “We can’t do what we’re doing without Hadoop. We’re out of business without that infrastructure.”

But Mariani also wasn’t shy about naming what he sees as Hadoop’s current limitations, and what he would like to see from the open-source framework moving forward. In a nutshell, platforms like Hadoop, or HBase and Hive for that matter, lack robust business intelligence capabilities. “You still need schemas on the unstructured data to get the most out of it,” Mariani said.

A company like Klout, which collects a billion “signals” from its registered users every day, craves real-time business intelligence to develop better social media analytics that will ultimately lead to more satisfied customers and larger profits. The problem with Hadoop is that it is a batch processing system that struggles in the “real-time world,” Mariani said. As a result, he is waiting for developers to create analytical engines that can run on top of Hadoop and let it perform interactive queries.

In the meantime, Klout turns to SQL Server Analysis Services to conduct that sought-after business intelligence. But Mariani would love to see this functionality available in Hadoop. “If you think about what makes Hadoop so great, when you store a piece of data — let’s just say it’s a file — it appears virtually to you as a file…but that actually is distributed across as many nodes as you have in the cluster…So when I do a query…it’s a massive parallel table scan across all these individual hard disks that are out there that I get to take advantage of…So that’s what I want to do with [business intelligence]…versus trying to pipe it and load it into something else.”
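For readers who want to picture the “parallel table scan” Mariani describes, here is a minimal toy sketch in Python. It is not Hadoop code and uses no Hadoop API: the function names, the round-robin placement, and the sample “signals” are all assumptions made for illustration, and threads stand in for cluster nodes. Real HDFS splits files into fixed-size blocks and replicates them across machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, num_nodes):
    """Spread records round-robin across num_nodes hypothetical nodes."""
    blocks = [[] for _ in range(num_nodes)]
    for i, record in enumerate(records):
        blocks[i % num_nodes].append(record)
    return blocks


def scan_block(block, predicate):
    """Scan a single node's block and return the matching records."""
    return [r for r in block if predicate(r)]


def parallel_scan(blocks, predicate):
    """Scan every block concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        partials = pool.map(lambda b: scan_block(b, predicate), blocks)
        return [r for part in partials for r in part]


# Made-up sample data: one logical "file" of 1,000 signals.
signals = [{"user": f"u{i}", "score": i % 100} for i in range(1000)]

# The caller sees one dataset; underneath it is spread over four "nodes".
blocks = split_into_blocks(signals, num_nodes=4)
high_scores = parallel_scan(blocks, lambda r: r["score"] >= 90)
```

In this toy model, adding capacity means raising `num_nodes` while the query code stays the same, which mirrors the horizontal scaling described earlier in the article.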

