Solving the decentralized data strategy’s pitfalls: an analysis of Nextdata’s hourglass construct
Companies striving for data platform success face a strategic dilemma: how to combine the flexibility, scalability and rich feature set of open standards with the tighter, more coherent governance controls of monolithic data strategies.
Is there a middle ground that accommodates the best of both worlds and supports the new wave of data-intensive applications set to power the future? That’s the question currently occupying Nextdata Technology Inc.
“The diversity and proliferation of application of AI and machine learning analytics across the business and then the diversity and the complexity of sourcing data are the two pressure points that constantly test how we manage that middle piece, that data management,” said Zhamak Dehghani (pictured), founder and chief executive officer of Nextdata. “The point here is that we’ve got to step back and look at that middle part … find a data stack that caters to this essential complexity of left and right and can survive those pressure points.”
Dehghani spoke with theCUBE Research’s Dave Vellante and George Gilbert at the Supercloud 7: Get Ready for the Next Data Platform event, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed the need to embrace decentralization and create standards that support a diverse and dynamic ecosystem by standardizing the middle layer of the data stack.
The modern data stack’s complexity
The current data stack is being pressured from two ends, according to Dehghani. On one side, there’s the diversity of data sources, with data existing in various formats and locations across different organizations. On the other side, there’s the proliferation of increasingly intelligent data applications. This dual pressure complicates data management, necessitating a more adaptable and resilient data stack.
“Traditionally, we’ve had and still have [extract, transform and load],” Dehghani said. “Then we have a generation of data tools around metadata and context creation. Then we realized we don’t have a link between how the business speaks about the data and the esoteric encoding of that data [in storage], so we layer with semantic graphs. Then we have to solve the discoverability and the governance and control, so we layer with catalog technologies that allow access control, discoverability, getting people to the source of data.”
Traditional ETL processes and centralized data warehouses have evolved into complex ecosystems of tools for metadata creation, context layering, and cataloging. However, the complexity and brittleness of these systems often lead to inefficiencies and bottlenecks.
The future of data management lies in embracing decentralization and creating standards that support a diverse and dynamic ecosystem, according to Dehghani. Standardizing the middle layer of the data stack gives organizations greater flexibility and efficiency in managing data across varied formats and locations.
“Our aspirations are really around helping and contributing to this narrow waist of innovation in the data stack and creating the standards that help with that higher layer within that hourglass,” she said. “I think the rest of the industry is doing an amazing job. We’ve seen Iceberg, we’ve seen work around Delta [Lake], which helps with the lower layers of that stack around file format, and I think where we could be applied is the higher layer of the stack.”
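Those lower-layer standards are already concrete: an open table format such as Iceberg lets any compatible engine read the same underlying files. Here is a minimal sketch using the PyIceberg client, assuming a catalog named “default” is already configured and a table “analytics.events” exists; both names are placeholders:

```python
# Minimal sketch: reading an open-format (Iceberg) table through a
# standard client. Assumes a PyIceberg catalog named "default" is
# configured (e.g., in ~/.pyiceberg.yaml) and that the illustrative
# table "analytics.events" already exists.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")               # resolve the configured catalog
table = catalog.load_table("analytics.events")  # placeholder identifier

# Any engine that speaks the table-format standard sees the same data;
# here a filtered scan is materialized as an Arrow table.
arrow_table = table.scan(row_filter="event_type = 'click'").to_arrow()
print(arrow_table.num_rows)
```

Because the table format, rather than any single engine, defines the layout, engines such as Spark, Trino or DuckDB could read the same table without copying it.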
The hourglass data platform model: Simplifying complexity
In Dehghani’s hourglass model, the narrow waist represents a suite of standardized capabilities that provide coherence, akin to TCP/IP for the internet. The model abstracts the complexity of data storage and computation at the bottom while providing diverse application experiences at the top. The key is standardizing the middle layer to facilitate interoperability and reduce the friction of managing diverse data environments, according to Dehghani.
“The same story can be told again with compilers, like [LLVM]; compilers allow programmers to write applications in many different languages and run those applications on many different types of hardware, but they standardize that middle bit,” she said. “That simplifies working with different complex infrastructures and gives you optionality of experience. I think the point here is that we need to push down some of those capabilities that are being bundled into this single stack integrated catalog-plus-plus type of experiences into that standard narrow waist of this hourglass.”
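Both analogies describe the same shape: one small, stable contract in the middle, with diversity above and below. The sketch below is purely illustrative rather than a Nextdata or industry API; the DataProduct protocol and the WarehouseTable backend are hypothetical names invented for this example:

```python
# Illustrative sketch of a "narrow waist": one small, standard contract
# between many storage backends (below) and many applications (above).
# The DataProduct protocol and WarehouseTable backend are hypothetical,
# not a real Nextdata or industry API.
from typing import Iterator, Protocol


class DataProduct(Protocol):
    """The narrow waist: every backend exposes the same minimal surface."""

    def schema(self) -> dict[str, str]: ...
    def read(self) -> Iterator[dict]: ...


class WarehouseTable:
    """One hypothetical backend: rows materialized from a warehouse query."""

    def __init__(self, rows: list[dict]):
        self._rows = rows

    def schema(self) -> dict[str, str]:
        return {key: type(value).__name__ for key, value in self._rows[0].items()}

    def read(self) -> Iterator[dict]:
        return iter(self._rows)


def count_rows(product: DataProduct) -> int:
    """An application written once against the waist, not any one backend."""
    return sum(1 for _ in product.read())


print(count_rows(WarehouseTable([{"id": 1}, {"id": 2}])))  # prints 2
```

Swapping WarehouseTable for an object-store or streaming backend would leave count_rows untouched, which is the optionality the narrow waist is meant to buy.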
While catalogs have utility in exposing and managing metadata across different technology stacks, they can become problematic when positioned as the single source of truth. A decentralized approach, where discoverability and control are embedded close to the data, would solve this, Dehghani added.
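To make “embedded close to the data” concrete, imagine a self-describing manifest that travels with the dataset itself; a central catalog can crawl and index it without becoming the single source of truth. A hypothetical sketch, with every field name and path invented for illustration:

```python
# Hypothetical sketch: discoverability and access metadata embedded next
# to the data rather than held solely in a central catalog. All field
# names and paths are invented for illustration.
import json
from pathlib import Path

dataset_dir = Path("products/orders_v1")
dataset_dir.mkdir(parents=True, exist_ok=True)

manifest = {
    "name": "orders_v1",
    "owner": "checkout-team",
    "schema": {"order_id": "string", "amount": "double"},
    "access": {"allowed_roles": ["analyst", "finance"]},
}

# The manifest lives alongside the data files; catalogs may index it,
# but the authoritative copy stays with the data product itself.
(dataset_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))

print(json.loads((dataset_dir / "manifest.json").read_text())["owner"])
```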
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE Research’s coverage of the Supercloud 7: Get Ready for the Next Data Platform event:
Photo: SiliconANGLE