UPDATED 16:03 EDT / JULY 31 2024

Dipti Borkar, vice president and general manager at Microsoft, discussed open data formats during Supercloud 7.

Microsoft advances data management with open formats and AI integration

Five years ago, talking about open data formats or governance might have put listeners to sleep. Today, it has become the most important conversation going.

It’s clear that data has evolved. That evolution poses certain advantages for customers, according to Dipti Borkar (pictured), vice president and general manager at Microsoft Corp.

“These data formats and table formats, on top of the file formats, essentially give our customers a choice,” Borkar said. “It’s opened up, which means that they can have computes that they can choose on top as well. Multiple different computes can run on these formats. That’s the beauty of it. That’s a great value to customers, which means they can do more with their data.”

Borkar spoke with theCUBE Research’s John Furrier and Sanjeev Mohan at the Supercloud 7: Get Ready for the Next Data Platform event, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed the importance of open data formats and the evolving role of data management in the cloud.

Microsoft shifts to open data formats

Microsoft has decided to move from its closed-source format to pure open formats, particularly with Microsoft Fabric. That was a pretty dramatic change, according to Borkar.

“[We moved] all our engines to reengineer these computes, to now read these native formats,” she said. “We support Delta Lake, and Iceberg is landing very soon. The reason that these are important, again, customers get a choice.”

Companies can run a variety of engines on top of these formats and interoperate across platforms. That includes running AI with Databricks or Snowflake, according to Borkar.

“You can interop. We have a layer with OneLake that supports these open formats, which allow customers to interop, so that you’re not locked in, you can do more with your data. You don’t have to move it around,” she said. “You can actually leave it in place, reduce your cost and get value.”
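To make the idea of several engines sharing one copy of the data more concrete, here is a minimal sketch of a table format as a commit log layered over plain data files. It is an illustration under stated assumptions, not the actual Delta Lake or Iceberg protocol: the `_log` directory layout, the JSON-lines data files (standing in for Parquet) and all function names are hypothetical.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def commit(table_dir: Path, rows: list) -> None:
    """Write one data file, then record it in the next log entry."""
    log_dir = table_dir / "_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    version = len(list(log_dir.glob("*.json")))
    data_file = table_dir / f"part-{version:05d}.jsonl"
    data_file.write_text("\n".join(json.dumps(r) for r in rows))
    entry = {"add": data_file.name}
    (log_dir / f"{version:05d}.json").write_text(json.dumps(entry))


def live_files(table_dir: Path) -> list:
    """Replay the log in order to find the files that make up the table."""
    entries = sorted((table_dir / "_log").glob("*.json"))
    return [table_dir / json.loads(e.read_text())["add"] for e in entries]


def scan(table_dir: Path) -> list:
    """Read every row from the files the log says are part of the table."""
    rows = []
    for f in live_files(table_dir):
        rows.extend(json.loads(line) for line in f.read_text().splitlines())
    return rows


# Two hypothetical "engines" that only share the format, not any code path
# beyond the reader: each answers a different question over the same files.
def engine_total(table_dir: Path) -> int:
    return sum(r["amount"] for r in scan(table_dir))


def engine_count_by_region(table_dir: Path) -> dict:
    return dict(Counter(r["region"] for r in scan(table_dir)))


table = Path(tempfile.mkdtemp()) / "sales"
commit(table, [{"region": "east", "amount": 10}, {"region": "west", "amount": 5}])
commit(table, [{"region": "east", "amount": 7}])

total = engine_total(table)
by_region = engine_count_by_region(table)
```

Both functions read the same files in place. Neither needs its own copy of the data, which is the property Borkar describes.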

There are three main open table formats: Delta Lake, Apache Iceberg and Apache Hudi. All three have their own specific way of writing data, and all were built for different use cases, according to Mohan.

“Hudi was built for streaming ingest, and Iceberg … does not support streaming ingest. So when you write in a particular table format, that becomes your primary format,” Mohan said. “The compatibility is only at read-only level.”

The reason is that one cannot write a piece of data into Delta and then have it copied into the other formats, because the latency would be too high, according to Mohan.

“The fine print … is so important,” he said. “Anytime anyone says this is open source, this is compatible, you really have to take it to the next level of detail to understand what is open-source, what is compatible.”

Combining structured, semi-structured, unstructured data efficiently

Today, Microsoft is seeing a combination of structured, semi-structured and unstructured data going into the lake, according to Borkar. The structured data is essentially stored in open table formats.

“Typically, you would build semantic models on top. For example, with Power BI you have a semantic model, and our Copilot then operates on that semantic model and is available for natural language questions,” Borkar said. “Just using that approach, you can essentially use English to come up with a dashboard, right? Instantaneously.”

Semi-structured and unstructured data is where models that operate directly on top of the data come in, according to Borkar. For Microsoft, that includes Azure AI Search.

“[That provides] both the vector indexing capabilities directly on this data, but also keyword-based indexing. So, it’s actually a combination, which is very powerful, because in some cases you might need one,” Borkar said. “In some cases, vector indexing is more powerful, and it applies an internal ranking and gives the best results back out. So, AI Search, on top of OneLake, for example, is one of the patterns that we are also starting to see.”
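The combination Borkar describes, keyword and vector results merged by a ranking step, can be sketched in a few lines. The example below assumes Reciprocal Rank Fusion as the merging method, which is one widely used approach to hybrid ranking. It does not call the Azure AI Search API, and the documents, two-dimensional embeddings and function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical corpus and toy 2-d embeddings, for illustration only.
DOCS = {
    "d1": "delta lake table format on parquet files",
    "d2": "iceberg table format for large analytic tables",
    "d3": "streaming ingest pipeline with hudi",
}
VECS = {"d1": (0.9, 0.1), "d2": (0.8, 0.3), "d3": (0.1, 0.9)}


def keyword_rank(query: str) -> list:
    """Rank documents by how many query-term occurrences they contain."""
    terms = query.lower().split()
    scores = {
        doc_id: sum(Counter(text.split())[t] for t in terms)
        for doc_id, text in DOCS.items()
    }
    hits = [d for d in scores if scores[d] > 0]
    return sorted(hits, key=lambda d: -scores[d])


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def vector_rank(query_vec) -> list:
    """Rank every document by cosine similarity to the query embedding."""
    return sorted(VECS, key=lambda d: -cosine(query_vec, VECS[d]))


def rrf(rankings, k: int = 60) -> list:
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank)."""
    fused = Counter()
    for ranking in rankings:
        for pos, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + pos)
    return sorted(fused, key=lambda d: -fused[d])


keyword_hits = keyword_rank("hudi streaming")
vector_hits = vector_rank((0.2, 0.8))
hybrid = rrf([keyword_hits, vector_hits])
```

Here the keyword index finds only the document that literally mentions the query terms, while the vector index orders every document by similarity. The fused list puts the document both methods agree on first and still returns the rest.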

This is done, essentially, using the ChatGPT versions of Copilot, according to Borkar. All told, it’s a development that has evolved very quickly.

“Now you have a stream of structured data, you’ve thrown in your semi-structured and unstructured data,” Borkar said. “Your vector index is on top of that, and now you’re building generative AI applications.”

Here’s the complete video interview, part of SiliconANGLE’s and theCUBE Research’s coverage of the Supercloud 7: Get Ready for the Next Data Platform event:

Photo: SiliconANGLE
