UPDATED 15:48 EDT / JUNE 13 2024

Databricks’ strategic shifts in data engineering and interoperability: theCUBE keynote analysis from Data + AI Summit

The latest advancements and strategic shifts in data engineering and data interoperability are setting the stage for a transformative era in tech.

Industry leaders, such as Databricks Inc., are prioritizing open data formats, flexible systems and the emerging power of small language models to create more seamless and efficient data environments. These developments promise to redefine how data is managed and utilized, driving innovation and competitive dynamics across the tech landscape.

“It isn’t about controlling the storage layer. It’s about what you don’t want to have multiple copies of … how you bring the data together,” said Rob Strechay (pictured, second from left), principal analyst at theCUBE Research. “It’s not about the data. The formats have to converge.”

Strechay was joined by his co-analysts John Furrier (right), Savannah Peterson (second from right) and George Gilbert (second from left) for live analysis from the Data + AI Summit, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They talked about the day’s keynote address, as well as data engineering, interoperability and the emerging role of small language models. They also took a look at the significant investments made in achieving data compatibility, the competitive dynamics between Databricks and Snowflake Inc., and the introduction of Databricks’ Unity Catalog as a game-changing tool for unified data governance.

Data interoperability and open formats: The new battleground

The ongoing efforts to achieve interoperability within data systems have become crucial for industry leaders, driving significant investments and innovations. This focus aims to create seamless data environments that enhance efficiency and reduce fragmentation across platforms.

“Ali [Ghodsi] said yesterday at the analyst briefing, ‘We’re not putting our thumb on the scale, but the goal is Delta interoperability, not Snowflake interoperability,’” Gilbert said.

Ryan Blue, the original creator and PMC chair of Apache Iceberg, spoke during the keynote about interoperability efforts aimed at eliminating concerns surrounding data compatibility, particularly between Databricks’ Delta and Snowflake. He emphasized that this focus on interoperability is essential for creating a more unified and efficient data environment.

“Snowflake is trying to open up within the perimeters of what Iceberg can support,” Gilbert said. “But Iceberg is now going to be oriented around Delta interoperability, not Snowflake compatibility.”

This shift signifies a strategic pivot toward enhancing data interoperability across platforms, ensuring that data systems can communicate and operate seamlessly.

The analysts also discussed the significant investment Databricks has made to ensure interoperability: its acquisition of Tabular. This financial commitment underscores Databricks’ dedication to breaking down data silos and fostering a more integrated and collaborative data ecosystem.

“Buying Tabular is Ali saying, ‘This has to work,’” Furrier said. “And the commitment, even on stage this morning on the keynote, essentially was, ‘Let’s get to work.’ And the goal is still the same: not care about the data format.”

This analysis highlights Databricks’ dedication to creating a unified data environment, reducing the complexity and fragmentation that currently plagues the industry.

The rise of small language models and multimodal systems

Small language models are rising in prominence, alongside a notable shift toward multimodal systems. The industry is rapidly recognizing the potential of these smaller, more efficient models.

“We were the first ones to point out small language models. Our power law is playing out,” Furrier said. “What Ali did was interesting. He laid out … that essentially small language models is real, and … the models will interact.”

The conversation also touched on the strategic implications of these advancements for Databricks. These innovations are expected to enhance Databricks’ competitive edge and drive further developments in data management and analytics.

“That sets up the North Star for all the developers who are doing the work,” Furrier said.

This shift toward small language models is setting the stage for a new era in data management, where efficiency and interoperability are prioritized.

Another hot topic was the concept of multimodal systems, which integrate multiple models to create stronger and more efficient data systems. This approach allows for greater flexibility and accuracy in data processing and analysis, setting a new standard for the industry.

“They’re trying to differentiate from folks who just say, I need a frontier model and a vector database, and I’m done,” Gilbert said.

This differentiation underscores the competitive edge that multimodal systems can provide, allowing for more complex and comprehensive data management solutions.
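
As a rough illustration of a system that integrates multiple models, the Python sketch below routes requests to small, specialized components and chains them together. Everything in it is hypothetical: the `summarize` and `classify_sentiment` functions are trivial stand-ins for real models, and the `MODELS` registry, `run_system` and `run_pipeline` names are invented for this example rather than taken from any Databricks product.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for small, specialized models.
# A real system would call actual models here.
def summarize(text: str) -> str:
    # Keep only the first sentence as a crude "summary".
    return text.split(".")[0].strip() + "."


def classify_sentiment(text: str) -> str:
    # Crude keyword check standing in for a sentiment model.
    return "positive" if "good" in text.lower() else "neutral"


# Registry mapping a task name to the component that handles it.
MODELS: Dict[str, Callable[[str], str]] = {
    "summary": summarize,
    "sentiment": classify_sentiment,
}


def run_system(task: str, text: str) -> str:
    """Route a request to the component registered for the task."""
    if task not in MODELS:
        raise ValueError(f"no model registered for task: {task}")
    return MODELS[task](text)


def run_pipeline(text: str) -> Dict[str, str]:
    """Chain components: summarize first, then classify the summary."""
    summary = run_system("summary", text)
    return {"summary": summary, "sentiment": run_system("sentiment", summary)}


result = run_pipeline("The keynote was good. It ran long.")
```

The design point is that each component stays small and replaceable, while the routing and chaining logic determines how the overall system behaves.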

Unity Catalog and the future of data engineering

The keynote announcement that Databricks is open-sourcing Unity Catalog was another major highlight. The catalog aims to unify and govern data more effectively, creating a single source of truth for enterprise data.

“The significance is that the point of control, the source of truth for your data estate is moved from the database to the catalog,” Gilbert explained. “All the tools look to see what the data means, where it is, what’s its status, and how do I update it. They all go through the catalog now.”
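
To make Gilbert’s point concrete, here is a minimal Python sketch of a catalog acting as the point of control. It is not Unity Catalog’s API. The `Catalog` and `TableEntry` classes, their fields, and the example table name and storage path are all hypothetical, chosen only to show tools resolving what data means, where it is and what its status is through one registry instead of through a database.

```python
from dataclasses import dataclass


@dataclass
class TableEntry:
    """Hypothetical catalog record: what the data means, where it is, its status."""
    description: str
    location: str
    table_format: str
    status: str = "active"


class Catalog:
    """Toy in-memory catalog; illustrative only, not a real governance API."""

    def __init__(self) -> None:
        self._tables = {}

    def register(self, name: str, entry: TableEntry) -> None:
        self._tables[name] = entry

    def resolve(self, name: str) -> TableEntry:
        # Every tool asks the catalog first instead of hard-coding a path.
        return self._tables[name]

    def set_status(self, name: str, status: str) -> None:
        # Updates also go through the catalog, so all tools see the same state.
        self._tables[name].status = status


catalog = Catalog()
catalog.register(
    "sales.orders",  # hypothetical table name
    TableEntry(
        description="One row per customer order",
        location="s3://example-bucket/sales/orders",  # hypothetical path
        table_format="delta",
    ),
)

# Two different "tools" resolve the same entry rather than keeping their own copy.
for_query_engine = catalog.resolve("sales.orders")
for_ml_pipeline = catalog.resolve("sales.orders")

catalog.set_status("sales.orders", "deprecated")
```

Because both callers hold the same entry, the status change made through the catalog is visible to each of them without any tool keeping its own copy of the metadata.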

The Unity Catalog’s open-source nature marks a significant step toward democratizing data access and management. By making these powerful tools available to a broader range of users, Databricks is fostering innovation and collaboration across the tech community, enabling organizations of all sizes to leverage advanced data capabilities.

“The unified governance was clearly the message here, and open sourcing it was a big strategy, saying, ‘Whether using our compute engine or not, use this for data engineering,’” Furrier said. “We talked about platform engineering a lot at the KubeCons of the world. Now it’s a data engineering conversation.”

Here’s the complete video interview, part of SiliconANGLE’s and theCUBE Research’s coverage of the Data + AI Summit:

Photo: SiliconANGLE
