BIG DATA
Snowflake Inc. is expanding its open data architecture strategy with a set of interoperability enhancements that aim to reduce data movement, simplify governance and improve how artificial intelligence systems access enterprise data.
Today’s announcement centers on the ability for organizations to access, govern and analyze data across multiple platforms without being constrained by proprietary systems. The company says existing architectures force organizations to move data between platforms, creating operational complexity, security risks and higher costs while limiting the effectiveness of AI workloads.
“When teams can’t act on data in place, they are forced to move it,” the company said in a blog post published today. Fragmented pipelines and governance models can undermine AI initiatives by depriving systems of consistent, well-governed data.
The company’s emphasis on interoperability reflects growing pressure on enterprises to unify data environments as AI adoption accelerates. Snowflake says duplicated pipelines, inconsistent governance and siloed semantics create what it describes as a “tax” on both data architecture and AI investment.
“Agency over data can’t be accomplished by a single vendor or with just data interoperability,” the blog post said. “It requires interoperability at each layer of an architecture [with] solutions grounded on widely accepted open and community-driven initiatives that prioritize vendor-neutral interoperability.”
At the core of the announcement is expanded support for the Apache Iceberg version 3 open table format, with availability planned soon. Iceberg has gained traction as a standard for managing large-scale analytic datasets across multiple engines. Snowflake is positioning its implementation as more production-ready than competing offerings.
Iceberg V3 introduces features such as support for semi-structured data through a “Variant” data type, support for geospatial data types, row-level lineage tracking for change data capture, improved delete operations through deletion vectors and nanosecond-level timestamp precision.
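Of these features, deletion vectors are the most mechanical: instead of rewriting an entire immutable data file to delete a row, a reader merges the file with a compact bitmap of deleted positions. The following is a conceptual Python sketch of that idea only; the class names are invented for illustration and this is not Iceberg's actual on-disk format.

```python
# Conceptual sketch of a deletion vector: row-level deletes are recorded
# in a small bitmap kept alongside an immutable data file, so a delete
# never rewrites the file itself. Illustrative only, not Iceberg's format.

class DataFile:
    """An immutable batch of rows, as written once to object storage."""
    def __init__(self, rows):
        self._rows = tuple(rows)  # never mutated after creation

    def __len__(self):
        return len(self._rows)

    def row(self, pos):
        return self._rows[pos]


class DeletionVector:
    """A bitmap of row positions that have been logically deleted."""
    def __init__(self, num_rows):
        self._bits = bytearray((num_rows + 7) // 8)

    def delete(self, pos):
        self._bits[pos // 8] |= 1 << (pos % 8)

    def is_deleted(self, pos):
        return bool(self._bits[pos // 8] & (1 << (pos % 8)))


def scan(data_file, dv):
    """A reader applies the deletion vector on the fly while scanning."""
    return [data_file.row(i) for i in range(len(data_file))
            if not dv.is_deleted(i)]


file = DataFile([("a", 1), ("b", 2), ("c", 3)])
dv = DeletionVector(len(file))
dv.delete(1)            # logically delete row "b"; the file is untouched
print(scan(file, dv))   # [('a', 1), ('c', 3)]
```

Because the bitmap is tiny relative to the data file, a delete becomes a cheap metadata-style write rather than a full file rewrite, which is why the feature matters for change-heavy workloads.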
Snowflake said these enhancements will work across both Snowflake-managed tables and external Iceberg catalogs, enabling what it describes as a “portable” data experience across environments.
The update reflects a shift from basic interoperability toward production-grade capabilities, said James Rowland-Jones, director of product management at Snowflake.
“What’s new here is the expansion from foundational interoperability to more complete, production-ready interoperability across data, governance and semantics,” Rowland-Jones said in written comments. “This means customers can start running more advanced, real-world workloads on open, interoperable data, not just experimenting with it.”
Snowflake is also extending interoperability beyond data formats to include governance and business logic, areas that have historically been tightly coupled to individual platforms. The company is promoting Apache Polaris, a catalog it developed and released to open source two years ago, as a mechanism for making governance policies portable across systems.
The company argues that while Iceberg standardizes how data is stored, it does not address how access controls, lineage and semantic context are managed. Polaris is intended to fill that gap by enabling policies to move with the data rather than remaining tied to a specific engine.
Snowflake said it is working on several mechanisms to enable this portability, including policy exchange standards, governance federation and read restriction application programming interfaces (APIs). These improvements are intended to allow one system to share pre-evaluated access rules that can be enforced by another system without requiring data to be copied or reprocessed.
Rowland-Jones said this approach addresses longstanding inefficiencies in how governed data is shared.
“Currently, the only ‘safe’ way to share data governed by fine-grained access control with an external engine is to use an API to materialize intermediate results,” he said. “That process is operationally inefficient, costly and often unpredictable. We are breaking this cycle with Apache Polaris.”
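The pattern being described can be sketched in a few lines: the producing system resolves a fine-grained policy into a plain, serializable restriction, and a different engine enforces that restriction while reading the shared data in place, rather than receiving a materialized copy. The rule format and function names below are hypothetical, invented for illustration; this is not the Apache Polaris API.

```python
# Illustrative sketch of sharing a pre-evaluated access rule between two
# engines. The producer evaluates its fine-grained policy once; the
# consumer enforces the result without a data copy. All names and the
# rule schema are hypothetical, not Apache Polaris.
import json

def evaluate_policy(principal):
    """Producer side: resolve a fine-grained policy for one principal
    into a serializable read restriction (hypothetical format)."""
    rule = {
        "allowed_columns": ["region", "revenue"],  # column-level masking
        "row_filter": {"region": ["EMEA"]},        # row-level filtering
    }
    return json.dumps(rule)  # plain JSON survives the system boundary

def scan_with_restriction(rows, rule_json):
    """Consumer side: a different engine applies the restriction while
    reading the shared data in place."""
    rule = json.loads(rule_json)
    cols = rule["allowed_columns"]
    out = []
    for row in rows:
        if all(row.get(col) in allowed
               for col, allowed in rule["row_filter"].items()):
            out.append({c: row[c] for c in cols})
    return out

table = [
    {"region": "EMEA", "revenue": 10, "customer": "acme"},
    {"region": "APAC", "revenue": 7,  "customer": "globex"},
]
print(scan_with_restriction(table, evaluate_policy("analyst")))
# [{'region': 'EMEA', 'revenue': 10}]
```

The point of the sketch is the shape of the exchange: what crosses the boundary is a small, pre-evaluated rule, not intermediate query results, which is the inefficiency Rowland-Jones describes.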
Another component of the announcement is pg_lake, an open-source PostgreSQL extension announced last November to bridge transactional and analytical systems. It enables PostgreSQL databases to query data lake formats such as Parquet and CSV directly and to write data into Iceberg tables without requiring extract, transform and load (ETL) processes.
Snowflake said eliminating ETL pipelines between transactional and analytical systems can reduce latency and operational overhead while simplifying architecture. Instead of maintaining separate systems for different workloads, organizations can operate on a shared data layer.
“The goal with pg_lake is to simplify the architecture by removing the need for complex pipelines,” Rowland-Jones said.
The company is also investing in emerging standards aimed at improving how AI systems interpret data. These include OpenLineage, which tracks data movement across systems, and Open Semantic Interchange, a specification designed to standardize business definitions such as metrics and dimensions.
Snowflake asserted that inconsistent semantics force AI models to repeatedly infer meaning from raw data, increasing computing costs and reducing accuracy. By making semantic context portable, the company said organizations can improve model performance and reduce redundant processing.
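The argument for portable semantics can be made concrete with a toy example: when two engines consume one shared metric definition, they cannot drift apart on what a number like "net revenue" means. The definition schema below is invented for illustration and is not the Open Semantic Interchange specification.

```python
# Sketch of portable semantics: one shared metric definition, applied
# identically by any engine that honors it. The schema is hypothetical,
# not the Open Semantic Interchange specification.

SHARED_METRIC = {
    "name": "net_revenue",
    "source_column": "amount",
    "filters": {"status": "complete"},  # the agreed-on business rule
    "aggregation": "sum",
}

def compute(rows, metric):
    """Any engine honoring the shared definition applies the same filter
    and aggregation, so results agree across systems."""
    matching = [r[metric["source_column"]] for r in rows
                if all(r.get(k) == v for k, v in metric["filters"].items())]
    assert metric["aggregation"] == "sum"  # only sum in this sketch
    return sum(matching)

orders = [
    {"amount": 100, "status": "complete"},
    {"amount": 40,  "status": "refunded"},
    {"amount": 60,  "status": "complete"},
]

# Two "engines" reusing the same definition return the same number,
# with no need for an AI system to re-infer the business rule.
print(compute(orders, SHARED_METRIC))  # 160
```

Without the shared definition, each engine (or each AI model) would have to guess whether refunded orders count, which is exactly the kind of repeated inference the company says inflates cost and hurts accuracy.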
Rowland-Jones acknowledged that Open Semantic Interchange is still in its early stages but said industry participation suggests strong demand.
“The first specification is now available under an Apache 2 license and backed by a coalition of more than 35 industry partners,” he said. “When models have access to consistent definitions, they produce more accurate results and require less rework.”
Snowflake is diversifying beyond its proprietary roots and framing these efforts as part of a broader shift toward open, community-driven data architectures. The company said its engineers have made more than 9,000 contributions to open-source projects over the past two years and are actively involved in shaping future Iceberg capabilities, including planned enhancements in version 4.
Those are expected to include improvements to metadata performance, support for column-level updates and expanded indexing options, all aimed at improving performance for streaming, machine learning and search workloads.
Snowflake’s strategy positions open standards as a competitive differentiator, even as it continues to offer managed services layered on top of those standards. The company said its proprietary Horizon Catalog integrates Polaris to provide centralized governance while maintaining compatibility with external systems.