UPDATED 11:02 EST / AUGUST 04 2024

The emerging data stack brings opportunities and risk for buyers and sellers

The so-called modern data stack is getting a facelift and perhaps a complete body makeover.

As the point of control shifts from the database management system to the governance layer, we cite three dynamics that highlight a reshaping of today’s data landscape: 1) Key data players are disrupting the established norm as they expand their aspirations; 2) Data platform vendors that used to compete with one another are entering new competitive environments up the stack as they pursue market expansion; and 3) These market and stack dislocations cause confusion for customers, which presents both opportunities and risks.

In this Breaking Analysis we review our learnings from Supercloud 7, Get Ready for the Next Data Platform, which featured the top voices and thought leaders in data. We’ll present a view of the shifting data stack as we see it today, review some data points from a recent Enterprise Technology Research survey and close with some final thoughts on what to look for going forward.

Shifting points of control in the data stack

Our analysis coming out of Supercloud 7 provided several insights from the community which reinforced many key points of our premise. Specifically, we see today’s modern data stack, typified by cloud infrastructure and the separation of compute from storage, evolving in critical ways that will have an impact on customer decisions in the near to mid-term. Leveraging survey research from ETR that we introduced last week, we explored the sentiments of joint Databricks Inc. and Snowflake Inc. customers, going deeper into customer perspectives and future plans around open table formats, governance and generative artificial intelligence. The following comments summarize our current views.

Key takeaways:

  • Shifting points of control: Traditionally centered in the DBMS, the control point is moving toward the governance layer. This shift is catalyzing dislocations within the industry and will affect customers’ spending patterns.
  • Governance layer dynamics: Established players with decades-long experience are now facing new open-source dynamics highlighted by Unity and Polaris, two emerging governance solutions vying for leadership.
  • Open vs. proprietary formats: Organizations are managing a blend of open and proprietary table formats, adding layers of complexity to governance solutions. This mix includes cloud vendor governance, emerging open-source governance and multifaceted catalog solutions. Iceberg appears to have the early lead but adoption is still nascent.
  • Market expansion impacts: Data platforms are moving beyond traditional metrics, analytics, and dashboards, aiming to build intelligent data apps and construct digital representations of businesses. This necessitates engagement with operational data from legacy systems (for example, Salesforce, Oracle, SAP and others), thereby opening up new opportunities and competitive pressures.
  • Data pipelines and the harmonization layer: Data pipelines have played an important role in consolidating data, but complexity has proved problematic for many customers. In our view, complexity is actually increasing, which necessitates new thinking around how to address governance and open formats.

Bottom line

The modern data stack is undergoing a significant transformation, with control points shifting toward governance layers, and data platform vendors, specifically Databricks and Snowflake, attempting to expand their total available market. As these platforms move up the stack, they face new competition, particularly from hyperscalers and legacy software vendors. The complexity of many open and proprietary data and governance choices highlights the importance of data harmonization. We believe that organizations must navigate these changes carefully to harness the full potential of their data assets; however, the path today is uncertain due to a lack of clear standards.

Watch this conversation George Gilbert had with Muralidhar Krishnaprasad of Salesforce to better understand the increasing levels of competition Databricks and Snowflake face as they move up the stack: Building a Metadata-Centric Platform for Intelligent Applications.

“Open data is turning data platforms inside out,” says Gilbert. “Customers, not vendors, now own the data. Operational catalogs such as Unity and Horizon/Polaris are intermediate stopgaps as vendor choke points. Customers can now choose which tools and engines they want to use to extract value from their data. To take just one example, both Snowflake and Databricks made many announcements about allowing non-technical users to query their data using natural language via LLMs. But as long as BI tool vendors do a better job formally defining that data, end-users will get much better results through their BI tools or third-party semantic layers.”

Conflicting priorities and personas in data governance

In last week’s Breaking Analysis, we introduced a flash survey conducted with ETR, based on data from 105 joint Databricks and Snowflake accounts. The survey aimed to uncover prevailing sentiments regarding security, governance and tool selection in data management. We use the following slide from that survey to highlight the diverse and often conflicting priorities that organizations face as they navigate the complexities of modern data governance.

Key takeaways

  • Security and governance are fundamental: A significant majority (86% for security and 70% for governance) of respondents prioritize security and governance above all else. Our view is this inclination tends to favor more integrated platforms like Snowflake, which require customers to put their data into Snowflake to take advantage of the most comprehensive governance solutions.
  • Avoiding lock-in: Conversely, a substantial cohort is focused on avoiding vendor lock-in at all costs, aligning more with Databricks’ open-source ethos.
  • Consolidation vs. flexibility: There is a stark divide, with 45% of respondents indicating a preference for consolidating data into a single tech stack, even at the expense of flexibility. Meanwhile, others prioritize the freedom for analysts to choose their tools, highlighting a fundamental tension within organizations.
  • Persona alignment challenges: The survey data underscores the internal conflicts between different personas within organizations, each with distinct priorities. Aligning these personas through governance and reorganization is a critical but challenging task. Lack of alignment will, in our view, expose firms to greater risk.
  • On-premises vs. cloud: A notable 39% plan to keep core intellectual property data on-premises for the next year, while others advocate for robust data warehousing systems that minimize the need for open table formats.
  • Data rebels and innovation: A segment of respondents, referred to as “data rebels,” prioritizes rapid innovation over stringent data security and governance. Notably, these data rebels were the most open-minded to moving off Snowflake to Databricks.

Bottom line

The survey and our analysis reveal a landscape fraught with conflicting priorities and personas, complicating the path toward cohesive data governance. Organizations must navigate these tensions, balancing the need for security and governance with the desire for flexibility and innovation.

As data platforms such as Snowflake and Databricks continue to evolve, the industry must address these challenges head-on to achieve harmonized and effective data management strategies. Organizations must evaluate the quality, efficacy and maturity of open source governance solutions and develop strategies that align with their existing governance approach.

Nearly 30% of respondents in the survey cited comfort with managing their data silos. We generally believe this approach is suboptimal for putting data at the core of operations, but it may bring time to market advantages for individual business units and will likely remain a viable strategy.

The evolution and fragmentation of the modern data stack

As we examine the emerging data stack, it’s evident to us that the so-called modern data stack is evolving rapidly, introducing new complexities and competitive dynamics. While foundational elements like cloud infrastructure and data warehouses are well-established, the layers above are where significant action and innovation are unfolding. The following points summarize our thinking on how the data stack is evolving and the changes it portends.

Key takeaways

  • Cloud infrastructure: Amazon Web Services Inc. set the gold standard for cloud infrastructure. Competitors such as Google LLC, Microsoft Corp. and Oracle Corp. are advancing by learning from AWS’ strengths and weaknesses and developing differentiated strategies at the infrastructure level. Regardless, this layer of the stack is fairly well understood and mature.
  • Data warehousing and pipelines: Snowflake has cemented its place as the leader in cloud DBMS, while Databricks has dominated the data pipeline segment with Spark and other tooling.
  • Open table formats: Though it’s still early days, interest in adopting open table formats, particularly Iceberg, is on the rise, with 70% of respondents indicating a shift toward this format.
  • Governance layer: The governance layer is becoming the new strategic control point, moving beyond the traditional DBMS. Key players are attempting to make this the new “moat,” in our view. This includes Databricks’ Unity Catalog and Snowflake’s Polaris, which must coexist with a variety of solutions from Google, Microsoft, AWS, Informatica Inc., Collibra Inc., Alation Inc. and others. The governance landscape remains highly fragmented, with a plethora of solutions and a complex ecosystem of partnerships and standards. In addition, solutions such as Microsoft Purview are attempting to become the “catalog of catalogs,” leaving the governance wars to others.
  • Semantic layer: For lack of a better term, we often refer to this as the semantic layer, which involves data harmonization to support creating digital representations of business entities. This layer is still nascent, with significant development needed to achieve a mature and functional state. We believe that full realization of this layer is still years away, but the industry is attempting to create this harmonization capability. We note that there is a metrics layer that could possibly be drawn on the above graphic below the governance box, hence the arrows from governance as it touches pieces below and above.
  • Intelligent data apps and products: The upper layers of the stack, namely data products, agents and intelligent apps, are seeing new competition as data platforms expand their TAM. Players such as Palantir Inc., Salesforce Inc. and Microsoft are advancing their capabilities in this space, creating rich metadata and unified data environments. As Databricks and Snowflake expand their aspirations, they increasingly run into these traditional software companies with products that contain business logic and critical data. Being able to connect to this data is fundamental to building intelligent data apps and these legacy firms are unlikely to cede the market to Databricks and Snowflake.

Bottom line

The modern data stack is undergoing a significant transformation, characterized by increasing fragmentation and complexity, particularly in the governance and semantic layers. While foundational elements are established, the competitive landscape is intensifying as companies like Snowflake and Databricks expand their capabilities and face new challengers in the upper layers of the stack. Organizations must navigate these dynamics carefully, leveraging robust governance frameworks and strategic partnerships to harness the full potential of their data ecosystems.

Watch this conversation with visionary data leader Zhamak Dehghani on what’s missing in the emerging data stack.

SanjMo Principal Sanjeev Mohan added that the rise of open table formats not only levels the playing field for Snowflake and Databricks but also allows many other players, such as Fivetran Inc., Confluent Inc. and Salesforce, to offer managed lakehouses. “Now customers don’t have to move their data into proprietary formats and can bring any combination of compute engines to meet their data engineering, analytics and AI needs,” he said. “For example, for some use cases, customers can analyze data on object stores using DuckDB and for other use cases, use Snowflake. This flexibility can lower costs for the end-users.”

That said, he added, “open-source catalogs on top of table formats are a different story. While the concept of an open-source catalog is appealing, the current offerings are not ready for prime time. It is still very early days as these catalogs are being built and have limited functionality. Please read the fine print before committing to them.”

The journey ahead and the role of hyperscalers

The transformation of the data landscape is a journey that won’t be completed overnight. As industry leaders such as Molham Aref and Zhamak Dehghani have pointed out, this evolution is expected to take three to five years, with numerous challenges and missing pieces along the way. Moreover, we believe the hyperscalers, with their resources and advanced capabilities in machine learning and AI, will play a crucial role in shaping this future.

Key takeaways

  • Three- to five-year journey: The path to a mature data governance framework is long and complex. Key industry figures anticipate significant developments over the next few years but acknowledge the current gaps and challenges.
  • Hyperscalers as major players: Over a third of surveyed Databricks and Snowflake accounts recognize the strong machine learning and AI capabilities of hyperscalers and indicate a leaning in this direction. This positions the hyperscalers as significant influencers and potential disruptors in the data platform ecosystem.
  • Stickiness of data platforms: Core data platforms remain deeply entrenched and difficult to displace. While there may be optimization shifts in data engineering and pipeline workloads, the core functionalities are likely to remain stable.
  • Cost and ROI dynamics: The cost factor currently influences decision-making, but the emergence of AI-driven return on investment could alter the landscape significantly, driving further investment and adoption.
  • Confusion and chaos as opportunities: The current state of flux presents both risks and opportunities. Companies that navigate this chaos effectively can capitalize on new market opportunities and drive significant value.
  • Future outlook and Supercloud 8: The ongoing data evolution will continue to be a focal point in upcoming industry discussions, such as Supercloud 8. Innovation and acquisitions by hyperscalers, along with new players transitioning from on-prem to hybrid models, will shape the competitive dynamics.

Bottom line

We continue to believe the journey toward a fully realized new modern data stack is ongoing, marked by a blend of opportunities and risks. Hyperscalers, with their advanced capabilities, will be pivotal players in this evolution along with Databricks, Snowflake and their respective ecosystems.

The entrenched nature of core data platforms, coupled with shifting cost dynamics and the potential for AI-driven ROI, will influence strategic decisions for customers and shape spending patterns. As the industry navigates this increasingly complex landscape, those who can cut through the noise and leverage data to their advantage will emerge as leaders in the next phase of innovation.

What do you think? How are you handling governance and security of your data? Do you lean toward more integrated and closed platforms like Snowflake because they are “safer,” or do you feel that open formats are the way to go and you can manage the governance concerns over time? And where do the hyperscalers fit in your plans?

Please let us know how you’re thinking about the future of data in your organization.

Disclaimer

All statements made regarding companies or securities are strictly beliefs, points of view and opinions held by SiliconANGLE Media, Enterprise Technology Research, other guests on theCUBE and guest writers. Such statements are not recommendations by these individuals to buy, sell or hold any security. The content presented does not constitute investment advice and should not be used as the basis for any investment decision. You and only you are responsible for your investment decisions.

Disclosure: Many of the companies cited in Breaking Analysis are sponsors of theCUBE and/or clients of Wikibon. None of these firms or other companies have any editorial control over or advance viewing of what’s published in Breaking Analysis.

Image: theCUBE Research
