![](https://d15shllkswkct0.cloudfront.net/wp-content/blogs.dir/1/files/2024/08/241-_-Breaking-Analysis-_-The-Emerging-Data-Stack-Brings-Opportunities-and-Risk-2-1536x863.jpg)
The so-called modern data stack is getting a facelift and perhaps a complete body makeover.
As the point of control shifts from the database management system to the governance layer, we cite three dynamics that are reshaping today’s data landscape: 1) Key data players are disrupting established norms as they expand their aspirations; 2) Data platform vendors that used to compete mainly with one another are entering new competitive environments up the stack as they pursue market expansion; and 3) These market and stack dislocations cause confusion for customers, which presents both opportunities and risks.
In this Breaking Analysis we review our learnings from Supercloud 7, Get Ready for the Next Data Platform, which featured the top voices and thought leaders in data. We’ll present a view of the shifting data stack as we see it today, review some data points from a recent Enterprise Technology Research survey and close with some final thoughts on what to look for going forward.
Our analysis coming out of Supercloud 7 provided several insights from the community which reinforced many key points of our premise. Specifically, we see today’s modern data stack, typified by cloud infrastructure and the separation of compute from storage, evolving in critical ways that will have an impact on customer decisions in the near to mid-term. Leveraging survey research from ETR that we introduced last week, we explored the sentiments of joint Databricks Inc. and Snowflake Inc. customers, going deeper into customer perspectives and future plans around open table formats, governance and generative artificial intelligence. The following comments summarize our current views.
The modern data stack is undergoing a significant transformation, with control points shifting toward governance layers and data platform vendors, specifically Databricks and Snowflake, attempting to expand their total available market. As these platforms move up the stack, they face new competition, particularly from hyperscalers and legacy software vendors. The complexity of the many open and proprietary data and governance choices highlights the importance of data harmonization. We believe that organizations must navigate these changes carefully to harness the full potential of their data assets. However, the path today is uncertain due to a lack of clear standards.
Watch this conversation George Gilbert had with Muralidhar Krishnaprasad of Salesforce to better understand the increasing levels of competition Databricks and Snowflake face as they move up the stack: Building a Metadata-Centric Platform for Intelligent Applications.
“Open data is turning data platforms inside out,” says Gilbert. “Customers, not vendors, now own the data. Operational catalogs such as Unity and Horizon/Polaris are intermediate stopgaps as vendor choke points. Customers can now choose which tools and engines they want to use to extract value from their data. To take just one example, both Snowflake and Databricks made many announcements about allowing non-technical users to query their data using natural language via LLMs. But as long as BI tool vendors do a better job formally defining that data, end-users will get much better results through their BI tools or third-party semantic layers.”
In last week’s Breaking Analysis, we introduced a flash survey conducted with ETR, based on data from 105 joint Databricks and Snowflake accounts. The survey aimed to uncover prevailing sentiments regarding security, governance and tool selection in data management. We use the following slide from that survey to highlight the diverse and often conflicting priorities that organizations face as they navigate the complexities of modern data governance.
A notable 39% of respondents plan to keep core data intellectual property on-premises for at least the next 12 months.
The survey and our analysis reveal a landscape fraught with conflicting priorities and personas, complicating the path toward cohesive data governance. Organizations must navigate these tensions, balancing the need for security and governance with the desire for flexibility and innovation.
As data platforms such as Snowflake and Databricks continue to evolve, the industry must address these challenges head-on to achieve harmonized and effective data management strategies. Organizations must evaluate the quality, efficacy and maturity of open source governance solutions and develop strategies that align with their existing governance approach.
Nearly 30% of respondents in the survey said they are comfortable managing their data silos. We generally believe this approach is suboptimal for putting data at the core of operations, but it may bring time-to-market advantages for individual business units and will likely remain a viable strategy.
As we examine the emerging data stack, it’s evident to us that the so-called modern data stack is evolving rapidly, introducing new complexities and competitive dynamics. While foundational elements like cloud infrastructure and data warehouses are well-established, the layers above are where significant action and innovation are unfolding. The following points summarize our thinking on how the data stack is evolving and the changes it portends.
The modern data stack is undergoing a significant transformation, characterized by increasing fragmentation and complexity, particularly in the governance and semantic layers. While foundational elements are established, the competitive landscape is intensifying as companies like Snowflake and Databricks expand their capabilities and face new challengers in the upper layers of the stack. Organizations must navigate these dynamics carefully, leveraging robust governance frameworks and strategic partnerships to harness the full potential of their data ecosystems.
Watch this conversation with visionary data leader Zhamak Dehghani on what’s missing in the emerging data stack.
SanjMo Principal Sanjeev Mohan added that the rise of open table formats not only levels the playing field for Snowflake and Databricks but also allows many other players, such as Fivetran Inc., Confluent Inc. and Salesforce, to offer managed lakehouses. “Now customers don’t have to move their data into proprietary formats and can bring any combination of compute engines to meet their data engineering, analytics and AI needs,” he said. “For example, for some use cases, customers can analyze data on object stores using DuckDB and for other use cases, use Snowflake. This flexibility can lower costs for the end-users.”
That said, he added, “open-source catalogs on top of table formats are a different story. While the concept of an open-source catalog is appealing, the current offerings are not ready for prime time. It is still very early days as these catalogs are being built and have limited functionality. Please read the fine print before committing to them.”
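Mohan’s point about bringing different engines to the same open data can be illustrated with a toy sketch. This is not how Iceberg, DuckDB or Snowflake work internally: here a CSV file stands in for an open table format, and plain Python and SQLite stand in for two independent compute engines. The `sales` table, its columns and its values are hypothetical, invented only for illustration. Note also that SQLite copies the rows into memory, whereas real lakehouse engines typically read open-format files in place.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical sample data, invented for illustration only.
ROWS = [
    {"region": "east", "amount": 120},
    {"region": "west", "amount": 80},
    {"region": "east", "amount": 50},
]


def write_open_file(path: Path) -> None:
    """Write the rows once, in a format any engine can read (CSV as a stand-in)."""
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["region", "amount"])
        writer.writeheader()
        writer.writerows(ROWS)


def totals_with_plain_python(path: Path) -> dict[str, int]:
    """Engine 1: aggregate by scanning the file directly."""
    totals: dict[str, int] = {}
    with path.open(newline="") as f:
        for row in csv.DictReader(f):
            totals[row["region"]] = totals.get(row["region"], 0) + int(row["amount"])
    return totals


def totals_with_sql(path: Path) -> dict[str, int]:
    """Engine 2: load the same file into an in-memory SQL engine and aggregate there."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        with path.open(newline="") as f:
            conn.executemany(
                "INSERT INTO sales VALUES (?, ?)",
                [(r["region"], int(r["amount"])) for r in csv.DictReader(f)],
            )
        rows = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        ).fetchall()
        return dict(rows)
    finally:
        conn.close()


def compare_engines() -> tuple[dict[str, int], dict[str, int]]:
    """Write the data once, then answer the same question with both engines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sales.csv"
        write_open_file(path)
        return totals_with_plain_python(path), totals_with_sql(path)


if __name__ == "__main__":
    python_totals, sql_totals = compare_engines()
    print(python_totals == sql_totals)
```

Because the data is written once in a format neither engine owns, either engine can be swapped out without rewriting the file, which is the flexibility the quote describes. Governance across those engines, the catalog question Mohan raises, is deliberately left out of this sketch.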
The transformation of the data landscape is a journey that won’t be completed overnight. As industry leaders such as Molham Aref and Zhamak Dehghani have pointed out, this evolution is expected to take three to five years, with numerous challenges and missing pieces along the way. Moreover, we believe the hyperscalers, with their resources and advanced capabilities in machine learning and AI, will play a crucial role in shaping this future.
We continue to believe the journey toward a fully realized new modern data stack is ongoing, marked by a blend of opportunities and risks. Hyperscalers, with their advanced capabilities, will be pivotal players in this evolution along with Databricks, Snowflake and their respective ecosystems.
The entrenched nature of core data platforms, coupled with shifting cost dynamics and the potential for AI-driven ROI, will influence strategic decisions for customers and shape spending patterns. As the industry navigates this increasingly complex landscape, those who can cut through the noise and leverage data to their advantage will emerge as leaders in the next phase of innovation.
What do you think? How are you handling governance and security of your data? Do you lean toward more integrated and closed platforms like Snowflake because they are “safer,” or do you feel that open formats are the way to go and you can manage the governance concerns over time? And where do the hyperscalers fit in your plans?
Please let us know how you’re thinking about the future of data in your organization.