Composable data systems: The future of intelligent data management
In a diverse and dynamic technology landscape, how can companies create a more intelligent approach to data management? Composable data systems based on open standards may be the next big thing for infrastructure modernization.
Organizations are seeking new ways to build out today’s modern data stacks, which have become increasingly diverse. A recent survey of 105 joint Databricks Inc. and Snowflake Inc. customers, conducted in partnership with Enterprise Technology Research, revealed two key trends: More than a third of respondents said they use at least one modern data platform in addition to Databricks or Snowflake, and half said they continue to rely on on-premises or hybrid cloud platforms. These findings highlight the need for multi-platform approaches when building the modern data stack.
Big data frameworks typically already decouple the storage and compute layers. Some companies, however, are pushing composability further by separating out the application programming interface layer as well, according to Josh Patterson, co-founder and chief executive officer of Voltron Data Inc.
“Even Snowflake and Databricks are starting to adopt … composable standards,” Patterson said. “Composability is really about freedom — freedom to take your code and run it across a myriad of different engines but also have your data use different engines as well,” he added.
Patterson and Rodrigo Aramburu, co-founder and field chief technology officer of Voltron Data, spoke with theCUBE Research’s Rob Strechay, principal analyst, and George Gilbert, senior analyst, during an AnalystANGLE segment on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed how data platforms are being reshaped by the growing adoption of composable architectures, open standards and leading-edge execution engines.
Open standards simplify composable data systems
Companies are also evolving toward more open standards, according to Aramburu. Databricks, for instance, was an early evangelist for the open-source Apache Arrow project as the de facto standard for in-memory tabular data representation.
“This really big movement allows companies with all these vendor products to choose the right tools for the right job,” he said. “This is actually what incepted the Ibis project, a Pythonic [framework] for targeting multiple different engines.”
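To make that idea concrete, here is a minimal sketch (not drawn from the interview) of the write-once, run-on-many-engines pattern using the open-source Ibis library. The in-memory data and column names are hypothetical, and the snippet assumes a recent ibis-framework release with its default local DuckDB backend.

```python
import ibis

# Hypothetical in-memory data standing in for a warehouse table.
orders = ibis.memtable({"region": ["east", "west", "east"], "amount": [10.0, 20.0, 5.0]})

# One dataframe expression, written once in Python.
summary = orders.group_by("region").aggregate(total=orders.amount.sum())

# The same expression can be rendered as SQL for a different engine...
print(ibis.to_sql(summary, dialect="snowflake"))

# ...or executed directly on the default local backend (DuckDB).
print(summary.execute())
```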
The complexity of today’s data landscape, with its proliferation of data products and apps, requires a more modular data stack, according to Aramburu. To manage multiple engines, many companies have built hard-to-maintain, in-house abstraction layers with their own domain-specific languages.
“A project like Ibis really takes [complexity] out of the hands of the independent corporate company and puts it [into] an open-source community that allows everyone to really benefit off of that labor,” Aramburu said. “Companies are starting to use [open table formats] such as Apache Iceberg with both Snowflake and Databricks and standardizing a common data lake across both of them,” he added.
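As a small sketch of that pattern (again, not from the interview), the snippet below reads one Iceberg table from Python regardless of which engine wrote it. It assumes the open-source pyiceberg package and a REST-style catalog; the catalog URI, namespace and table name are hypothetical.

```python
from pyiceberg.catalog import load_catalog

# Hypothetical REST catalog shared by every engine that touches the lake.
catalog = load_catalog("lake", uri="http://localhost:8181")

# The same table could have been written by Snowflake, Databricks or Spark.
orders = catalog.load_table("sales.orders")

# Scan a slice of it into Apache Arrow, the in-memory standard mentioned above.
arrow_table = orders.scan(limit=1_000).to_arrow()
print(arrow_table.schema)
```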
With the standardization of APIs, organizations can generate structured query language that runs across different systems. Along with standardized APIs, accelerated hardware is essential for modern data platforms, particularly for artificial intelligence, according to Patterson. Training large language models requires immense graphics processing unit power, which directly impacts energy consumption. Theseus, a distributed query engine developed by Voltron Data, uses GPUs to process large data volumes with less energy.
“With our current architecture using A100s … [Theseus] is able to do really large-scale data analytics for about 80% less power,” Patterson said. “And we’re working on some really amazing things that we’re going to be announcing at the end of the quarter on how we could probably push that to 95% power reduction.”
Modular, interoperable and composable data systems lower the barrier to entry for adopting AI-related technologies, according to Patterson. Another benefit is that people can use Theseus without having to change their APIs or data formats, so they can achieve faster performance with fewer servers.
“With modular, interoperable, composable, extensible systems, it gets easier. [Companies] can actually shrink their data center footprint and they can save energy, or they can transfer that energy that they were using for big data into AI,” Patterson added.
Innovation at the data-management level
Composable data systems can separate the storage layer in addition to the compute and data layers, which enables scalability, according to Patterson. With a decomposed execution engine, multiple APIs can be supported, and multiple engines can access the same data. And because everything runs on accelerated hardware, companies can see better price performance and energy efficiency, which opens up new possibilities at the data management level.
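As a small illustration of that decoupling, using open-source pieces rather than Theseus itself, the sketch below holds one dataset in the Apache Arrow columnar format and queries it from two different engines without copying or converting the data. It assumes pyarrow, duckdb and a recent polars release are installed; the dataset is hypothetical.

```python
import pyarrow as pa
import duckdb
import polars as pl

# Hypothetical dataset held once, in the Arrow columnar format.
events = pa.table({"user": ["a", "b", "a"], "ms": [120, 340, 95]})

# Engine 1: DuckDB scans the in-memory Arrow table in place via SQL.
print(duckdb.sql("SELECT user, avg(ms) AS avg_ms FROM events GROUP BY user"))

# Engine 2: Polars wraps the same Arrow buffers with a dataframe API.
print(pl.from_arrow(events).group_by("user").agg(pl.col("ms").mean()))
```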
“It makes it possible [for organizations] to just start building domain-specific data systems that are otherwise prohibitively expensive to build,” Patterson said. “And now we [also] have really great innovation at the data management layer.”
When networking, storage and data management are accelerated from the ground up, those layers can keep pace with the compute engine, Patterson noted. Theseus is an example of that level of performance.
“It acts as a query engine that is meant to be [original equipment manufactured] by others so they can build these domain-specific applications on top of it where you can have a much smaller footprint, faster, [and with] less energy,” he said. “You can go after business use cases that were otherwise prohibitively expensive.”
The future of data analytics and AI
Every industry will begin to specialize within the database and data analytics ecosystems, according to Aramburu. Because of this, companies have been forced to act as though they were database companies.
“Domain expertise has nothing necessarily to do with data analytics … it’s about what the data analytics enable you to do,” Aramburu said. “At the end of the day … it’s all about building out these incredibly complex data platforms to be able to service that domain expertise,” he added.
As data analytics improve with products such as Voltron Data’s Theseus query engine, networking will become far more important, and companies will start to see denser, faster storage, Patterson predicted. High-speed networking and faster storage will also pave the way for both AI and data analytics and shrink big data problems into a smaller footprint.
“Where there is denser storage, [you have] faster storage, with more throughput,” Patterson said. “I actually see a convergence of AI and big data.”
Here’s theCUBE’s complete AnalystANGLE conversation with Josh Patterson and Rodrigo Aramburu:
https://www.youtube.com/watch?v=_4c62vOJtEg
Image: alengo from Getty Images Signature