5 Distinct Hadoop Deployment Patterns
In an e-mail interview this week with Forrester Senior Analyst James Kobielus, I asked about Hadoop’s real-time capabilities. The conversation turned to what he sees as five distinct Hadoop deployment patterns.
It’s a good primer for Hadoop World next week, where SiliconAngle will live stream from theCube.
Here they are:
Leverage Hadoop Proprietary Distros: Use proprietary near-real-time/real-time features of some commercial Hadoop distros (e.g., HStreaming, Outerthought, Hadapt).
Leverage Hadoop Core Sub-Projects: Use HBase as the database/storage layer for near-real-time analysis and Cassandra for real-time requirements beneath your MapReduce modeling/execution abstraction layer.
Leverage Hadoop and Other NoSQL Databases: Supplement and/or replace HBase/Cassandra with Membase, Couchbase, or other real-time and/or in-memory databases under MapReduce.
Leverage Hadoop and Real-Time Features of Commercial Enterprise Databases and Data Warehousing Platforms: Support batch or real-time features of Hadoop (open-source and/or proprietary distros) with change data capture, complex event processing, or other real-time data ingest/processing features of commercial enterprise data warehouse (EDW) platforms such as Teradata, Oracle Exadata, IBM Smart Analytic System, EMC Greenplum Database, and other commercial offerings.
Leverage Hadoop and Stand-Alone Complex Event Processing or Message-Oriented Middleware: Support batch or real-time features of Hadoop (open-source and/or proprietary distros) with complex event processing and/or message-oriented middleware (MOM) from IBM, SAP/Sybase, StreamBase, TIBCO, etc.
Services Angle
One thing that Kobielus points out is Hadoop’s immaturity. For example, in his report, “Enterprise Hadoop: The Emerging Core of Big Data,” Kobielus says that among Hadoop specifications, only Cassandra extends transactional functionality to a wider range of enterprise applications, above and beyond Hadoop’s core focus on advanced analytics. The proprietary vendors have added features, such as two-phase commit and rollback, that bring online transaction processing functionality to their offerings.
Kobielus says vendors are offering their own extensions, such as real-time and high-availability features, to address limitations of the current Apache Hadoop open-source distribution. “The Hadoop community is evolving the core codebase to address these deficiencies, but it may take several years before the open-source distribution becomes a more robust cloud analytics and transaction platform.”
The reality: Hadoop is still very early in its development. But is it too slow? That’s a question we will be asking a lot next week at Hadoop World.