Now more than eight years old, the Apache Hadoop platform for processing and storing Big Data is on the verge of hitting the big time. Or at least that’s what the industry keeps on telling us anyway. But how big is Hadoop really? Are many organizations actually using it? If so, what are they using it for?
Reliable figures on the actual size of the Hadoop market are hard to come by, although the companies leading the Hadoop charge insist that they’re beating customers away with a stick. Cloudera alone claims to be adding between 50 and 60 new customers every quarter.
The clues certainly point to rapid adoption. You only need to look at the money being thrown around (for example, Intel’s $900 million investment in Cloudera and HP’s deal with Hortonworks) and the size of the largest players to see that the market is growing extremely fast.
Who’s using Hadoop?
It’s almost impossible to pin down just how many Hadoop users there are, but it’s clear that adoption isn’t quite as widespread as some have claimed. A few years ago, Deloitte somewhat optimistically forecast that by the end of 2012 more than 90 percent of the Fortune 500 would likely have at least some big data initiatives under way, which would imply that Hadoop was part of the mix.
But IDC’s most recent “Trends in Enterprise Hadoop Deployments” report found that only 32 percent of enterprises had actually deployed Hadoop, with another 36 percent planning to do so in the next 12 months. IDC’s findings seem to tally with a similar report from Gartner last year, which found that 30 percent of large organizations had already invested in Big Data technology, with an additional 34 percent planning to do so in the next 24 months. However, Gartner estimated that the number of organizations that had actually deployed Hadoop was far lower than expected.
“Adoption is still at the early stages with less than eight percent of all respondents indicating their organization has deployed big data solutions,” said Frank Buytendijk, research vice president at Gartner. “Twenty percent are piloting and experimenting, 18 percent are developing a strategy, 19 percent are knowledge gathering, while the remainder has no plans or don’t know.”
Further evidence of Hadoop’s modest adoption comes from InformationWeek’s 2014 State of Database Technology Survey, which states “Hadoop is in production or pilot by only 13 percent of the 956 respondents”. Compare that with traditional databases like Microsoft SQL Server (75 percent) or Oracle (47 percent) and it’s clear Hadoop still has a ways to go.
Hadoop in the real world
So what about the organizations that do use Hadoop? What are they actually using it for? IDC says the vast majority of users combine Hadoop with other databases to perform Big Data analysis. Nearly 39 percent of respondents say they use NoSQL databases like HBase, Cassandra and MongoDB, and nearly 36 percent say they are using Greenplum and Vertica in conjunction with Hadoop.
Moving beyond “traditional Hadoop”, Gartner recently conducted a survey among existing Hadoop users to find out what the second most-popular type of processing on Hadoop was, after MapReduce. Here’s what it found:
- 53 percent are doing interactive SQL
- 18 percent are running database management systems
- 14 percent are doing stream processing
- 9 percent are running search
- 6 percent are running graph applications
That interactive SQL has become so popular with Hadoop users is a sign of how far things have come. Hadoop vendors are recognizing the platform’s limitations and seeking to address them. Marilyn Matz, CEO and co-founder of Paradigm4, recently described how most major vendors are adding SQL functionality to address the limitations of MapReduce, and to accommodate a preference for a higher-level query language over low-level programming languages like Java.
Use cases of Hadoop
It’s important to look at the industries that are running Hadoop too. A recent CB Insights survey of 350 venture capital-backed companies sheds some light on this. Not surprisingly, Business Intelligence, Analytics & Performance Management was the leader, closely followed by two ad tech related areas — Advertising, Sales and Marketing Tech and Advertising Networks & Exchanges.
Image credit: CBInsights
More interesting, perhaps, are the kinds of projects these industries are running with Hadoop. It turns out Hadoop is an extremely versatile tool with potentially hundreds of different applications. A 2012 article in GigaOM illustrates ten of the most common use cases of Hadoop besides advertising. They include eCommerce, infrastructure management, energy discovery, energy savings, image processing, fraud detection and health care.
Cloudera has a number of case studies on its site highlighting the different things its customers are doing with Hadoop. These include the eCommerce site Shopzilla, which deployed Cloudera’s solution to accommodate its requirement to process and deliver insights on millions of pageviews or ten billion ad bid requests daily; and Treato, a health information portal that uses Hadoop to streamline access to thousands of community sites and forums.
With so many industries seeing value in Hadoop despite its relatively low rate of current enterprise adoption, it’s easy to see why there’s so much optimism about the future.
In its Big Data Vendor Revenue and Market Forecast 2013-2017, Wikibon said it expects rapid growth for the Big Data market as a whole, with revenues set to rise from $18.6 billion in 2013 to $50.1 billion by 2017. Furthermore, the evidence suggests that Hadoop will account for a large slice of this market, with Wikibon noting that 62 percent of respondents to its survey expect to optimize enterprise data warehouses by offloading data and batch workloads (ETL) to Hadoop, and 69 percent expect to make enterprise-wide data available for analytics in Hadoop.
These findings were echoed by more recent research from Allied Market Research, which forecasts that the global Hadoop market will grow at a compound annual growth rate (CAGR) of 58.2 percent between 2013 and 2020. It put Hadoop’s market value at $2.0 billion in 2013, rising to a staggering $50.2 billion by 2020.
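For readers who want to sanity-check that growth rate, here is a minimal Python sketch of the standard compound annual growth rate formula, applied to the two dollar figures quoted above. The function and variable names are illustrative only, and the inputs are the rounded values from the article, not Allied Market Research’s underlying data.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10 percent)."""
    return (end_value / start_value) ** (1 / years) - 1


# Rounded figures quoted in the article, in billions of US dollars.
start_2013 = 2.0
end_2020 = 50.2
years = 2020 - 2013  # seven annual compounding periods

hadoop_cagr = cagr(start_2013, end_2020, years)
print(f"Implied Hadoop market CAGR, 2013-2020: {hadoop_cagr:.1%}")
```

The result comes out at a little over 58 percent a year, in line with the quoted forecast. Any small gap is plausibly down to rounding in the published dollar figures.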
With that kind of money on the table, it seems the race for Hadoop dominance has only just begun.