The next data cycle: Harnessing vast unstructured data to feed the AI/ML machine
While the crucial role of data in the enterprise will never diminish, how companies harness the data continues to evolve.
A radical shift is underway, however: structured data, long the bulk of what is stored in warehouses, is giving way to unstructured data. Driven by the growing demand for artificial intelligence-enabled capabilities, this shift is the “next data cycle,” according to David Flynn (pictured), chief executive officer of Hammerspace Inc.
“We went from a world with data warehouses, maybe cloud-hosted data warehouses, highly structured data, a lot of painstaking maintenance of it, to a world now where unstructured data rules the massive amounts of data that you have,” he explained. “That’s because with AI and ML, you don’t need it to be pre-structured for you. So a lot more value can be extracted from much larger quantities of data. We view this as just a new data cycle — it’s the next data cycle.”
Flynn spoke with theCUBE industry analyst John Furrier at the Supercloud 3: Security, AI and the Supercloud event, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed the new approach to processing data and orchestrating it to harness AI’s benefits.
The new world of data
The current approach to data is procedural: Data is sourced, stored, copied and merged with other resources as required. The shift Flynn describes implies instead that data “simply exists” and is orchestrated transparently.
“Moving from store-copy-merge to orchestrated is a radical change,” he noted. “What it allows you to do is to have data be present everywhere that you need it, with high performance, local access so you can feed it into AI training and inference but have it at any site that you need it at.”
Data, for the most part, has always been a product of the infrastructure or platform within which it resides. In the new paradigm, the effectiveness and operability of the data go beyond platforms, Flynn added.
“The first thing is that data is a platform thing,” he explained. “And yet data has been a mirage that is rendered by the infrastructure. Infrastructure storage systems have been the captors of data, so you have an inversion between platform and infrastructure. With data orchestration, you finally have data that can transcend the infrastructure [and] is not captive to the infrastructure.”
The technologies powering this data paradigm shift
Every new wave in enterprise computing rests on an underlying set of new technologies. In this case, it’s a combination of advances in graphics processing units, new data platform capabilities and improved machine learning models, according to Flynn.
“[It’s] obviously GPUs [and] other specialized processing for being able to work with those large amounts of data,” he said. “The unsung hero behind it though is the data platform and the ability to orchestrate the data to where you have the GPU clusters, whether that’s rented in the cloud or on-premise. I think some of the key things are the GPUs, AI and ML, the models and such that we work with for training, and the data orchestration piece for making the data.”
The orchestration piece of the puzzle is where Hammerspace comes in. It’s a globally accessible conduit connecting users with their applications and data across any existing data center infrastructure or on Azure, AWS or Google Cloud infrastructure.
“Fundamentally, it’s two things: It’s a file system that has finished the progression that file systems have been on since the beginning,” Flynn said. “Now what we’re talking about is a file system that can span everything. It can sit on top of any form of storage, third-party storage, vendor-neutral and can span across whole data centers. The second part is that it then allows you to take the data movement; where you place and move data can now be done behind the file system to where it can now be fully automated and is non-disruptive to the use.”
In other words, data orchestration happens when data placement and movement take place behind the file system, as opposed to out in front of it. There are many use cases where data orchestration can deliver considerable gains, and one of them is distributed workforces, according to Flynn.
“Number one is to have a workforce that can be anywhere in the world and be able to interact with the data,” he said. “And number two is to have the compute be anywhere in the world [and to] be able to take data that’s collected at one site and do the ML training at another site, to have researchers at other locations be able to work with it — [with] each of them able to have local high-performance access to that data from different computing centers.”
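To make the idea of placement "behind the file system" concrete, here is a minimal Python sketch. It is an illustrative toy, not Hammerspace's product or API: the class `GlobalNamespace`, its methods, the site names and the file path are all hypothetical. It shows only the core property Flynn describes, namely that a replica can be placed at another site while the logical path that users and applications see stays the same.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalNamespace:
    """Toy model (hypothetical): one logical path, replicas at many sites."""

    # site name -> {logical path -> file contents}
    sites: dict[str, dict[str, bytes]] = field(default_factory=dict)

    def write(self, path: str, data: bytes, site: str) -> None:
        # Data lands at the site where it was collected.
        self.sites.setdefault(site, {})[path] = data

    def locations(self, path: str) -> list[str]:
        # Sites that currently hold a replica of this path.
        return sorted(s for s, files in self.sites.items() if path in files)

    def orchestrate(self, path: str, target_site: str) -> None:
        # Place a replica at target_site behind the namespace.
        # The logical path that readers use does not change.
        holders = self.locations(path)
        if not holders:
            raise FileNotFoundError(path)
        data = self.sites[holders[0]][path]
        self.sites.setdefault(target_site, {})[path] = data

    def read(self, path: str, from_site: str) -> tuple[bytes, bool]:
        # Return (data, served_locally), preferring a local replica.
        local = self.sites.get(from_site, {})
        if path in local:
            return local[path], True
        holders = self.locations(path)
        if not holders:
            raise FileNotFoundError(path)
        return self.sites[holders[0]][path], False


# Hypothetical usage: data collected at one site, training at another.
ns = GlobalNamespace()
path = "/datasets/images/batch-001.bin"
ns.write(path, b"raw-bytes", site="lab-a")

before = ns.read(path, from_site="gpu-cluster")  # served from a remote site
ns.orchestrate(path, target_site="gpu-cluster")  # placement changes, path does not
after = ns.read(path, from_site="gpu-cluster")   # now served locally
```

In this sketch the reader at the hypothetical `gpu-cluster` site uses the same path before and after the replica is placed; only whether the read is served locally changes. A real system would also have to handle consistency, metadata and access control, which this toy ignores.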
Photo: SiliconANGLE