

One of the most noteworthy findings from Wikibon’s annual update to our big data market forecast was how seldom Hadoop was mentioned in vendors’ roadmaps.
I wouldn’t say that Hadoop — open-source software for storing data and running applications on large hardware clusters — is entirely dead. Most big-data analytics platform and cloud providers still support such Hadoop pillars as YARN, Pig, Hive, HBase, ZooKeeper and Ambari.
However, none of those really represents the core of this open-source platform in the way that the Hadoop Distributed File System, or HDFS, does. And HDFS is increasingly missing from big data analytics vendors’ core platform strategies.
The core reason HDFS is receding from vendors’ big data roadmaps is that their customers have moved well beyond the data-at-rest architectures it presupposes. Data-at-rest deployments such as HDFS-based data lakes are becoming less central to enterprise data strategies; when you hear “data lake” these days, it’s far more likely a reference to some enterprise’s data stored in Amazon S3, Microsoft Azure Data Lake Storage, Google Cloud Storage or the like.
Even a Hadoop stalwart such as Hortonworks Inc. sees the writing on the wall, which is why its recent 3.0 release emphasizes heterogeneous object storage. The new Hortonworks Data Platform 3.0 supports data storage in all of the major public-cloud object stores, including Amazon S3, Azure Blob Storage, Azure Data Lake Storage, Google Cloud Storage and the Amazon EMR File System.
HDP’s latest storage enhancements include a consistency layer, NameNode enhancements to support scale-out persistence of billions of files with lower storage overhead, and storage-efficiency enhancements such as support for erasure coding across heterogeneous volumes. HDP workloads access non-HDFS cloud storage environments via the Hadoop Compatible File System API.
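To make that concrete, here is a minimal sketch, assuming a Spark-on-HDP cluster that already has the S3A connector on its classpath, of how a workload can read and write an S3-hosted data lake through the Hadoop-compatible s3a:// scheme rather than hdfs:// paths. The bucket name and inline credential settings are hypothetical placeholders; in practice credentials would come from instance roles or core-site.xml.

```python
# Minimal sketch: accessing Amazon S3 through the Hadoop-compatible s3a://
# connector from PySpark. Bucket name and credentials are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("object-store-data-lake-sketch")
    # Shown only to indicate which knobs are involved; prefer IAM roles or
    # cluster configuration over inline keys.
    .config("spark.hadoop.fs.s3a.access.key", "<ACCESS_KEY>")
    .config("spark.hadoop.fs.s3a.secret.key", "<SECRET_KEY>")
    .getOrCreate()
)

# Same DataFrame API as before; only the filesystem scheme changes from
# hdfs:// to s3a://.
events = spark.read.parquet("s3a://example-data-lake/raw/events/")
daily = events.groupBy("event_date").count()
daily.write.mode("overwrite").parquet("s3a://example-data-lake/curated/daily_counts/")
```

The point of the sketch is that the application code barely notices whether the lake lives in HDFS or in a cloud object store, which is exactly why HDFS has become easy to drop.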
So it was no surprise when MapR Technologies Inc. recently unveiled its 6.1 data platform update, still in beta, with scarcely a reference to HDFS or any other core component of the Hadoop ecosystem apart from Hive 2.3. MapR has always distanced itself somewhat from the by-the-book Hadoop vendors, and it goes even further in this latest release, offering a robust next-generation cloud data platform grounded in its own pillar technologies rather than the stock Hadoop stack.
Object storage is the core platform for big data now, but it’s very likely that it will be eclipsed in importance by stream computing over the coming decade. As I noted in this recent SiliconANGLE article, streaming is as fundamental to today’s always-on economy as relational data architectures were to the prior era of enterprise computing. In Wikibon’s big data market update, we uncovered several business technology trends that point toward a new era in which stream computing is the foundation of most data architectures:
Enterprises are expanding their investments in in-memory computing, continuous computing, change data capture and other low-latency technologies while converging those investments with their at-rest big data environments, including Hadoop, NoSQL and RDBMSs. Within the coming decade, the database as we know it will be ancient history from an architectural standpoint, in a world where streaming, in-memory, edge and serverless infrastructures reign supreme.
The last wall of the Hadoop castle is built on at-rest architectures in support of stateful and transactional applications, but it appears likely that streaming environments such as Kafka will address more of those requirements robustly, perhaps in conjunction with blockchain as a persistent metadata log.
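As one illustration of how a streaming platform can take on transactional duties, here is a minimal sketch using the confluent-kafka Python client: a transactional producer writes a small batch of records atomically, so consumers reading with read_committed isolation see all of them or none. The broker address, transactional ID and topic name are hypothetical placeholders.

```python
# Minimal sketch: an atomic, exactly-once batch write to Kafka using a
# transactional producer (confluent-kafka Python client). Broker address,
# transactional.id and topic name are placeholders.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "orders-writer-1",  # enables transactions for this producer
    "enable.idempotence": True,
})

producer.init_transactions()
producer.begin_transaction()
try:
    # These records commit together or not at all; consumers configured with
    # isolation.level=read_committed never observe a partial batch.
    for order_id, amount in [("o-1", "19.99"), ("o-2", "5.00"), ("o-3", "42.50")]:
        producer.produce("orders", key=order_id, value=amount)
    producer.commit_transaction()
except Exception:
    producer.abort_transaction()
    raise
```

That commit-or-abort discipline is how Kafka-based pipelines approximate the atomic writes that at-rest databases have traditionally provided to stateful applications.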
In fact, a database-free world may await us in coming decades as streams, object stores, blockchain and the IoT pervade all applications. Check out this thought-provoking article for a discussion of how that’s already possible, and this article for how different stream types can support transactional data apps.
Hadoop may still have plenty of useful life left in it. Databases may endure as pillars of many application architectures. But we’ve entered a new era where these familiar landmarks are receding. It’s an era in which stream computing cuts new channels through every application environment and massive object stores anchor it all.