Klout and Hadoop, the Pros and Cons
Everybody wants to improve their Klout score, a value that represents a user’s influence across their social network. And while Klout describes to its users how the number is calculated, few people understand how the platform behind the score really works. To lend some insight, Dave Mariani, vice president of engineering at Klout, joined John Furrier and Jeff Kelly at The Cube, broadcasting during Hadoop Summit 2012 in San Jose, Calif. (full video below).
Mariani explained how Hadoop’s distributed file system has enabled his start-up to not just process, but also store data cheaply. Hadoop is horizontally scalable, meaning if an organization wants to increase the capacity or speed to process its data, it can increase the number of machines in its Hadoop cluster without changing anything in the underlying software.
Hadoop lets small companies wrestle with huge amounts of data. Klout prefers to work with Hadoop inside its own hosted data center, but for organizations lacking the resources that Klout has at its disposal, Hadoop can run on top of Amazon EC2. “It’s very inexpensive and very easy out of the gate to get scale,” Mariani said. “We can’t do what we’re doing without Hadoop. We’re out of business without that infrastructure.”
But Mariani was also candid about what he sees as Hadoop’s current limitations and what he would like to see from the open-source framework moving forward. In a nutshell, platforms like Hadoop (or HBase and Hive, for that matter) lack robust business intelligence capabilities. “You still need schemas on the unstructured data to get the most out of it,” Mariani said.
Klout, which collects a billion “signals” from its registered users every day, craves real-time business intelligence so it can develop better social media analytics, which it expects will ultimately lead to more satisfied customers and larger profits. The problem with Hadoop is that it is a batch processing system that struggles in the “real-time world,” Mariani said. As a result, he is waiting for developers to create analytical engines that can run on top of Hadoop and enable it to perform interactive queries.
In the meantime, Klout turns to SQL Server Analysis Services to conduct that sought-after business intelligence. But Mariani would love to see this functionality available in Hadoop. “If you think about what makes Hadoop so great, when you store a piece of data — let’s just say it’s a file — it appears virtually to you as a file…but that actually is distributed across as many nodes as you have in the cluster…So when I do a query…it’s a massive parallel table scan across all these individual hard disks that are out there that I get to take advantage of…So that’s what I want to do with [business intelligence]…versus trying to pipe it and load it into something else.”
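The idea Mariani describes, one logical file spread across many nodes and scanned in parallel, can be pictured with a small toy model. The Python sketch below is not Hadoop code and does not reflect how Klout actually runs its queries. The function names (`split_into_blocks`, `scan_block`, `parallel_scan`) and the sample "signal" records are hypothetical, and threads stand in for cluster nodes purely to show the shape of the computation.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, num_nodes):
    """Spread one logical file's records round-robin across num_nodes blocks.

    Each block stands in for the portion of the file stored on one node.
    """
    return [records[i::num_nodes] for i in range(num_nodes)]


def scan_block(block, predicate):
    """Scan a single node's block and count the records that match."""
    return sum(1 for record in block if predicate(record))


def parallel_scan(blocks, predicate):
    """Scan every block concurrently, then combine the partial counts."""
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        partial_counts = pool.map(lambda b: scan_block(b, predicate), blocks)
        return sum(partial_counts)


if __name__ == "__main__":
    # Hypothetical signal records; every third one is a retweet.
    signals = [
        {"user": i % 5, "kind": "retweet" if i % 3 == 0 else "like"}
        for i in range(30)
    ]
    blocks = split_into_blocks(signals, num_nodes=4)
    retweets = parallel_scan(blocks, lambda s: s["kind"] == "retweet")
    print(retweets)
```

In this model, adding nodes only changes `num_nodes`. The query logic stays the same, which mirrors the horizontal scaling described earlier in the article. The scan still touches every record on every run, and that is one way to see why Mariani calls Hadoop a batch system rather than an interactive one.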