

In an e-mail interview this week with Forrester Senior Analyst James Kobielus, I asked about Hadoop’s real-time capabilities. The conversation turned to what he sees as five distinct Hadoop deployment patterns.
It’s a good primer for Hadoop World next week, where SiliconAngle will live stream from theCube.
Here they are:
Leverage Hadoop Proprietary Distros: Use the proprietary near-real-time/real-time features of some commercial Hadoop distros (e.g., HStreaming, Outerthought, Hadapt).
Leverage Hadoop Core Sub-Projects: Use HBase as the database/storage layer for near-real-time analysis and Cassandra for real-time requirements beneath your MapReduce modeling/execution abstraction layer.
Leverage Hadoop and Other NoSQL Databases: Supplement and/or replace HBase/Cassandra with Membase, Couchbase, or other real-time and/or in-memory databases under MapReduce.
Leverage Hadoop and Real-Time Features of Commercial Enterprise Databases and Data Warehousing Platforms: Support batch or real-time features of Hadoop (open-source and/or proprietary distros) with change data capture, complex event processing, or other real-time data ingest/processing features of commercial enterprise data warehouse (EDW) platforms such as Teradata, Oracle Exadata, IBM Smart Analytic System, EMC Greenplum Database and other commercial offerings.
Leverage Hadoop and Stand-Alone Complex Event Processing or Message-Oriented Middleware: Support batch or real-time features of Hadoop (open-source and/or proprietary distros) with complex event processing and/or message-oriented middleware (MOM) from IBM, SAP/Sybase, StreamBase, TIBCO, etc.
One thing that Kobielus points out is Hadoop’s immaturity. For example, in his report, Enterprise Hadoop: The Emerging Core of Big Data, Kobielus says that among Hadoop specifications, only Cassandra extends transactional functionality to a wider range of enterprise applications, beyond Hadoop’s core focus on advanced analytics. The proprietary vendors have added features, such as two-phase commit and rollback, to bring online transaction processing functionality to their offerings.
Kobielus says vendors are offering their own extensions, such as real-time and high-availability features, to address limitations of the current Apache Hadoop open-source distribution. “The Hadoop community is evolving the core codebase to address these deficiencies, but it may take several years before the open-source distribution becomes a more robust cloud analytics and transaction platform.”
The reality: Hadoop is still very early in its development. But is it too slow? That’s a question we will be asking a lot next week at Hadoop World.