Just because a technology is largely open source doesn’t mean a vendor can’t build a (lucrative) business around it. Just ask Sun Microsystems.
Cloudera is attempting to do just that with Hadoop, the open-source framework for storing and processing big data. The company today released an updated version of its core data management platform called Cloudera’s Distribution including Apache Hadoop v3, or CDH3. CDH3 attempts to bring some structure and manageability to Hadoop, which can prove unwieldy for the uninitiated.
“CDH3 integrates all components and functions to interoperate through standard APIs, manages required component versions and dependencies and is maintained by Cloudera with regular patches for enterprise-class reliability,” Cloudera said in a statement accompanying the release.
As part of the upgrade, Cloudera has wrapped its platform in a layer of related open source tools, including HBase, a database that provides real-time read/write access to big data; Hive, a Hadoop data warehouse platform with extract, transform and load (ETL) capabilities; and Pig, an open source platform whose high-level language, Pig Latin, is used to perform analysis on big data sets.
The upgrade is an important one because it marks a significant step toward successfully commercializing Hadoop and bringing it to a wider spectrum of users. Like a lot of open source technologies, Hadoop was conceived in a lab by a really smart guy, in this case Doug Cutting (who now works at Cloudera), and developed by dedicated, sharp contributors. While developing the core technology is a hard enough job, bringing it from the lab to the enterprise can sometimes be more difficult still.
Most enterprises don’t have the resources to run a complicated framework like Hadoop in-house without the relatively easy-to-use front-end tools that vendors like Cloudera are developing. Even enterprises with the financial resources to hire an army of engineers would have a hard time finding enough talented Hadoop engineers to get the job done.
The more start-ups and VC-backed trailblazers like Cloudera that take their chances with Hadoop, the better the odds that the open source framework will break into the mainstream. The good news for Hadoop fans is that a handful of vendors in addition to Cloudera are doing so, including Jaspersoft. The open source business intelligence vendor added a Hadoop connector to its core reporting platform in January. Others experimenting with commercializing Hadoop include Datameer and Pentaho. In true open source fashion, a number of these vendors are working together on Hadoop initiatives.
In the case of Hadoop, in addition to making the technology easier to manage and run, I think vendors like Cloudera also need to broaden its appeal beyond Web 2.0 companies that generate large amounts of social media data. Otherwise it may remain a niche technology, albeit a powerful one, relevant only to the Groupons and Facebooks of the world. If vendors can broaden that appeal, there’s little doubt Hadoop will join the list of open source technologies that have spawned successful commercial vendors.