Big data is here to stay, and Hadoop has come to play a central role in this story, though the enterprise readiness of this open-source analytics platform remains a hotly debated topic. One company that’s had a front-row seat to big data and Hadoop implementation is Cloudera, among the earliest to provide enterprise solutions for Hadoop. Cloudera consumer solutions chief Omer Trajman recently spoke with John Furrier at the Intel Developer Forum and provided a firsthand perspective on the state of Hadoop today (full video below).
Trajman draws a clear distinction between two forces in big data. On one end is the catalyst behind the industry-wide shift towards analytics, with Hadoop at its center: the growing number of endpoints and the machine-generated data that enterprises are looking to tap into. On the other is the actual change we’re seeing in the data center today: the adoption of flash storage, 10GbE and other technologies that facilitate meaning extraction on a massive scale.
These innovations all represent parts of a transition from what Trajman calls traditional “big iron” to open industry standards. Intel’s Xeon chips power four out of five servers in the data center, and Hadoop itself is a part of the open Apache ecosystem.
Now, the trend is evolving towards converged storage and compute: IO streams are being built into the servers, and companies like Fusion-io are adding the capacity component to the mix. The executive explains that this is vital from a technical standpoint because Hadoop is not a standalone platform: it connects to the data streams, and it connects to the devices that consume what comes out on the other end. The framework and big data apps in general are very storage intensive, which is why Trajman says it’s imperative to keep storage right next to the compute.
Continuing this train of thought, the Cloudera executive says that IT departments have two choices when it comes to their data. They can either tuck it away in a backend storage tier or leverage it, a task that will almost always involve Hadoop and the active data pipeline that’s needed to operate the platform.
Check out the full video for the rest of Trajman’s interview, including insight into HBase and Cloudera’s internal growth.