Q&A: Inside IBM’s data virtualization, data everywhere strategy
Data virtualization has been one of IBM Corp.’s primary concerns since Red Hat Inc.’s OpenShift open-source container application platform gained steam across the industry. Essentially, the combination of OpenShift and IBM’s Cloud Private enterprise-grade private cloud platform resulted in Big Blue’s big move to the hybrid cloud.
In doing so, IBM was able to include data virtualization through the Cloud Private for Data platform, consequently producing a strong relationship between data science and artificial intelligence, according to Daniel Hernandez (pictured), vice president of IBM Analytics.
Hernandez spoke with Dave Vellante (@dvellante), host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during IBM Change the Game — Winning With AI event in NYC. They discussed how IBM is offering a single-integrated platform through its Cloud Private for Data platform, as well as how that switch to the hybrid cloud has impacted client data and IBM in general. (* Disclosure below.)
[Editor’s note: The following answers have been condensed for clarity.]
Give us the update on momentum in your business.
Hernandez: So when we last talked we were just introducing something called IBM Cloud Private for Data. The basic idea is anybody who wants to do data science, do data engineering, or build apps with data anywhere, we’re going to give them a single integrated platform to get that done. It’s going to be the most efficient, best way to do those jobs.
We had [Hortonworks CEO] Rob Bearden on our program, and he talked a lot about the IBM, Red Hat and Hortonworks relationship. Certainly, they talked about it on their earnings call and there seems to be clear momentum in the marketplace. But give us your perspective on that announcement. What exactly is it all about?
Hernandez: You go back to June last year; we entered into a relationship with Hortonworks where the basic premise was customers care about data and any data-driven initiative was going to require data science. The other element of that was we’re going to bring our data science and machine learning tools and runtimes to where the data is, including Hadoop. That’s been a resounding success. The next step up is how do we proliferate that single integrated stack everywhere, including private cloud or preferred clouds like OpenShift.
So there were two elements of the announcement. We did the hybrid cloud architecture initiative, which is taking the Hadoop data stack and bringing it to containers and Kubernetes. And the other was we’re going to bring that whole stack onto OpenShift. So, on IBM’s side, with IBM Cloud Private for Data, we are driving certification of that entire stack on OpenShift so any customer that’s betting on OpenShift as their cloud infrastructure can benefit from that and the single integrated data stack.
Everybody’s talking about containers, Kubernetes and multicloud. Those are the hot trends. I presume you’ve seen the same thing?
Hernandez: 100 percent. If data is imperative for you, you better run your data analytics stack wherever you need to, and that means multicloud by definition. So you’ve got a choice. You can port that workload to every distinct programming model and data stack, or you can have one data stack everywhere, including multicloud and OpenShift in this case.
The cloud, whether it’s on-prem or in the public cloud, expands now to the edge; you’ve also got this concept of data virtualization. What’s it all about?
Hernandez: Data virtualization has been going on for a long time. Its basic intent is to help you access data through whatever tools, no matter where the data is.
Traditional approaches to data virtualization are pretty limiting. They work relatively well when you’ve got small data sets. But when you’ve got highly fragmented data, which is the case in virtually every enterprise that exists, a lot of the underlying technology for data virtualization breaks down.
We’ve been incubating technology under this project codenamed Queryplex. It was pretty clear that this is a game-changing method for data virtualization that allows you to drive the benefits of accessing your data wherever it is, pushing down queries to where the data is and getting the benefits of that across a highly fragmented data landscape. And so what we’ve done is take that extremely innovative next-generation data virtualization technology, include it in our data platform, called IBM Cloud Private for Data, and make it a critical feature inside of that.
So what’s the secret sauce of Queryplex and data virtualization? How does it all work? What’s the tech behind it?
Hernandez: Technically, instead of data coming in and getting funneled through one node, think of your data as a kind of graph of computational data nodes. What Queryplex does is take advantage of that computational mesh to do queries and analytics. It distributes out that workload. The computing power in aggregate is probably going to be higher than whatever you can put into that single node.
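[Editor’s note: The general idea Hernandez describes, pushing a query down to the nodes that hold the data and combining the partial results, can be illustrated with a minimal sketch. This is not Queryplex or any IBM API. The `sales` table, the helper names and the use of in-memory SQLite databases as stand-ins for data nodes are all assumptions made for illustration.]

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_node(rows):
    """Hypothetical data node: an in-memory SQLite database holding one fragment."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def pushed_down_partial(conn):
    """Run the aggregation where the data lives; only small summaries leave the node."""
    query = "SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region"
    return conn.execute(query).fetchall()


def federated_totals(nodes):
    """Fan the query out to every node in parallel, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(pushed_down_partial, nodes))
    totals = {}
    for partial in partials:
        for region, amount_sum, row_count in partial:
            running_sum, running_count = totals.get(region, (0.0, 0))
            totals[region] = (running_sum + amount_sum, running_count + row_count)
    return totals


def funneled_totals(nodes):
    """Baseline for comparison: pull every raw row through one node and aggregate there."""
    totals = {}
    for conn in nodes:
        for region, amount in conn.execute("SELECT region, amount FROM sales"):
            running_sum, running_count = totals.get(region, (0.0, 0))
            totals[region] = (running_sum + amount, running_count + 1)
    return totals


# Three fragments of the same logical table, spread across three nodes.
nodes = [
    make_node([("east", 10.0), ("west", 5.0)]),
    make_node([("east", 2.5), ("north", 7.0)]),
    make_node([("west", 1.5)]),
]
result = federated_totals(nodes)
```

[Both functions return the same per-region totals. The difference is where the work happens: the pushed-down version ships one summary row per region from each node, while the funneled version moves every raw row to a single place before aggregating.]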
And how do customers access these services? How long does it take? They get this capability as part of what?
Hernandez: It would look like a standard query interface to them. So this is all magic behind the scenes in IBM Cloud Private for Data. It’s going to be a feature: project Queryplex is introduced as next-generation data virtualization technology, which just becomes a part of IBM Cloud Private for Data.
Can we talk about the business impact of Queryplex and data virtualization?
Hernandez: Better economics. You don’t have to do ETL in this particular case. So data at rest [is] getting consumed because of this online technology. Two, performance. Because of the way this works, you’re actually going to get faster response times. [And] three, you’re going to be able to query more data simply because this technology allows you to access all your data in a fragmented way without having to consolidate it.
Now [that] the conversation has moved to AI, your thoughts about where that innovation is coming from and what the potential is for clients?
Hernandez: You need data; you need algorithms; you need compute. Bringing those together is exactly the combination that you need to implement any AI system. You already have data and computational grids here. You’ve got algorithms. Bringing them together and solving some problem that matters to a customer is the natural next step.
OK, let’s talk about Stack Overflow. Why Stack Overflow? Are you targeting developers?
Hernandez: So instead of having a distinct community for AI that’s focused on AI machine developers, why not bring the artificial intelligence community to where the developers already are, which is Stack Overflow? So, if you go to AI.stackexchange.com, it’s going to be the place for you to go to get answers to any question around artificial intelligence and, of course, IBM is going to be there in the community helping out.
Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of IBM Change the Game — Winning With AI event. (* Disclosure: TheCUBE is a paid media partner for the IBM Change the Game — Winning With AI event. Neither IBM, the event sponsor, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)
Photo: SiliconANGLE