UPDATED 10:00 EDT / FEBRUARY 03 2016

NEWS

Hadoop creator buoyant despite framework’s complexity woes

When Doug Cutting created the open-source Hadoop framework while working at Yahoo! Inc. in 2006, he didn’t imagine his invention would spawn an ecosystem of breathtaking scope and complexity. The dozens of projects that have sprung up to extend and enhance the core framework have brought big data to the mass market, in the process becoming both Hadoop’s greatest asset and also its most vexing problem.

In an interview with SiliconANGLE on the occasion of Hadoop’s 10th anniversary, Cutting remarked on the challenges of navigating a complex ecosystem that is constantly adapting to new and disruptive technology. “The management challenge frightens me,” he said. “It’s tricky, like we’re on the edge of control.”

But he also expressed confidence that these growing pains will subside as Hadoop enters the mainstream. Platform transitions invariably involve a certain level of chaos, he said, noting that it was a decade before the relational database market matured to the degree that third-party applications emerged. “It’s the nature of the beast,” he said.

Complexity is often cited as the biggest barrier to widespread Hadoop adoption, but it has also sparked a burgeoning cottage industry of Hadoop-in-the-cloud services, as well as complexity-management tools like those developed by Cloudera Inc., where Cutting is chief architect.

The father of Hadoop doesn’t do much coding these days, but he remains closely involved with the industry he helped create. “I’m not the innovator anymore,” he said. “My job is to keep an eye on what’s happening and to listen to how things are being received. When there’s a groundswell, we need to be part of it.”

That task grows more complex every year. Hadoop is a poster child of open-source innovation. With no central governance or supervisory authority, the ecosystem must constantly adapt to new influences. “Things don’t seem to be getting simpler. If anything, they’re getting more complicated,” Cutting said.

Take Intel Corp.’s new 3D XPoint technology, which delivers low-latency storage memory at the speed of DRAM and the cost of solid-state storage. Cutting believes it will send shock waves across the entire big data ecosystem. “Every component of the stack was designed with hard drives and DRAM in mind,” Cutting said. “Those all now need to be reconsidered.”

But open source has a knack for accommodating disruption, and Cutting stays focused on the bigger reality that Hadoop is “an incredibly powerful platform that lets people do things they couldn’t do before. When companies are building new things [with big data] they aren’t looking to Oracle and Teradata,” he noted. “Those guys are going to struggle.”

Struggle, perhaps, but not disappear. Relational technology still does the job for transactional applications, and despite the development of relational-on-Hadoop projects like Apache Phoenix, those legacy systems will be around for a long time. “People who are building new transactional systems are less likely to use traditional RDBMS, but you’re not likely to see existing workloads moving over” to Hadoop, he predicted.

What’s next for Hadoop

Cutting believes two big developments will define the next stage of Hadoop’s evolution. One is the emergence of cloud services that deliver big data functionality without all the complexity. Cloudera expects to be a cloud player with “a stack in the cloud that’s agnostic to the platform and will give people cloud portability but also manageability,” he said.

The second will be applications on top of Hadoop. Up until now the action has mostly been in the stack, but as the platform stabilizes, innovation will move up the stack. Cloudera doesn’t plan to get into the applications business, but the arrival of market- and function-specific applications will be an important indication that Hadoop has truly arrived. “Turnkey applications will be a major turning point,” Cutting said. “Companies don’t want to reinvent everything.”

There’s still plenty of work to be done in management, and Cutting’s most immediate focus is there. “We’re selling a dream that you can integrate much more of your data into one place and combine data sources that weren’t combinable before, with services sharing data that couldn’t be shared before,” he said. “The idea that one big management system can handle everything is probably unrealistic, but it’s a dream we have to have.”

Cutting’s role as Hadoop’s original developer doesn’t give him any particular authority over the framework’s direction; these days he mainly listens and helps guide Cloudera’s course. “Mostly our job is to follow, but we do try to lead now and then,” he said, pointing to the Kudu storage layer as an example.

That’s what’s different about open source: vendors don’t set the priorities. As he noted in a recent post on the Cloudera blog, customers mingle freely with developers in Cloudera’s offices. The vendor’s job is to track and adapt to their direction.

And Cutting says he doesn’t miss the coding life. “It would be hard to top Hadoop, frankly,” he said. “It’s done far better than I ever expected.”

