Predictions 2024: Deciphering gen AI’s effect on data, governance and skill gaps
With Sanjeev Mohan, Tony Baer, Carl Olofson, Dave Menninger and Doug Henschen
In the words of famous people such as Nobel laureate Niels Bohr and baseball legend Yogi Berra, predictions are very difficult, especially if they’re about the future.
In this special Breaking Analysis, we’re pleased to host our third annual data predictions power panel with some of our collaborators in theCUBE Collective and members of the data gang. With us today are five of the top industry analysts focused on data platforms: Sanjeev Mohan of Sanjmo, Tony Baer of dbInsight, IDC’s Carl Olofson, Dave Menninger of Ventana Research, now part of ISG, and Doug Henschen with Constellation Research.
A scan of the top analytics, BI, data and machine learning platforms
Before we get into it, let’s share some data from Enterprise Technology Research’s October survey of more than 1,700 information technology decision makers.
This graphic shows Net Score, or spending momentum, on the vertical axis and Overlap on the horizontal axis. Overlap measures the presence of these platforms within those 1,700 accounts, representing their pervasiveness within the data set. This data is specifically for the analytics, business intelligence, database/data warehouse and machine learning/artificial intelligence sectors. We’ve chosen a subset of the companies in this group of sectors that are representative of leading vendors, many included in today’s discussion. That red line at 40% indicates highly elevated spending velocity on a platform.
A few quick points: 1) the presence of Microsoft Corp. and Amazon Web Services Inc. in these combined sectors is notable and well ahead of Google Cloud; 2) the momentum of OpenAI, at a Net Score of nearly 80%, is astoundingly impressive and its presence on the X axis represents about seven times the account penetration of Anthropic PBC, which you see on the left-hand side of this chart just above Dataiku Inc.; 3) Snowflake Inc. and Databricks Inc. remain above the 40% mark with strong momentum; and 4) you can see a number of companies that we’ll discuss directly and indirectly across this graphic in the basket of sectors, including MongoDB Inc., SAP SE, IBM Watson, as well as governance, metadata, pipeline and extract/transform/load tools such as Informatica Inc., Collibra NV, Alation Inc., Alteryx Inc. and others. There are also business intelligence platforms such as Thoughtspot Inc., QlikTech International AB, Tableau and Looker, and of course a number of database and data analytic platforms such as Couchbase Inc., Cloudera Inc., SAS Institute Inc. and notably Oracle Corp. and SAP.
This gives you a general quantitative sense of the relative position of these platforms in what is a multihundred-billion-dollar total available market.
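To make the vertical axis more concrete, here is a minimal sketch of how a Net Score-style metric could be computed. It assumes the commonly described approach of subtracting the share of accounts spending less from the share spending more. The category labels, the sample responses and the function names are our own illustrative assumptions, not ETR's actual data or code.

```python
from collections import Counter

# Hypothetical response categories (assumed labels, for illustration only).
SPENDING_MORE = {"adopting", "increasing"}
SPENDING_LESS = {"decreasing", "replacing"}


def net_score(responses):
    """Return (% of accounts spending more) minus (% spending less)."""
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)
    more = sum(counts[c] for c in SPENDING_MORE)
    less = sum(counts[c] for c in SPENDING_LESS)
    # Flat spenders count in the denominator but in neither group.
    return 100.0 * (more - less) / len(responses)


def is_highly_elevated(score, threshold=40.0):
    """Mirror the 40% red line described above."""
    return score > threshold


# Made-up sample of 10 accounts for a single platform.
sample = ["adopting"] * 2 + ["increasing"] * 5 + ["flat"] * 2 + ["decreasing"]
score = net_score(sample)
print(score, is_highly_elevated(score))
```

In this made-up sample, seven accounts spend more and one spends less, so the platform would sit above the 40% line.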
Revisiting the 2023 data analysts’ predictions
Let’s get started by looking back at our 2023 predictions and looking at how the analysts fared.
The graphic below shows all of the 2023 predictions for each analyst in one table with some commentary on evidence of whether the prediction was a direct hit (green), a glancing blow (yellow) or a miss (red). So a quick scan of the heat map shows you the data gang did pretty well – notwithstanding these were self-evaluated by each analyst.
Unified metadata becomes the kingmaker and data products rise
Let’s get into each of the 2023 predictions starting with Sanjeev Mohan.
Above we show Sanjeev’s predictions about unified metadata becoming the kingmaker and his expectation that data products would rise in popularity. Sanjeev cites as evidence Microsoft Fabric, Databricks Unity Catalog and some other proof points. We further summarize Mohan’s predictions as follows:
Summary of data catalogs and data products analysis (2023 look-back)
Reflecting on the previous year’s predictions, it’s evident that the expectations around data catalogs and data products were not only met but exceeded, particularly in the context of AI’s rapid advancements. The transformation of data catalogs into multifaceted tools and the mainstreaming of data products highlight significant progress in these areas.
- Evolution of data catalogs: Data catalogs have expanded their functionality beyond traditional roles, incorporating features such as data quality, security and privacy. Notable developments include Unity Catalog’s integration of AI model catalogs with data catalogs and Microsoft Fabric’s unified architecture approach.
- Mainstreaming of data products: Data products have become increasingly central to data strategies, as illustrated by companies such as Intuit Inc. with 900 data products, and the mandate to access data exclusively through these products.
- Expansion in definition and examples: The scope of what constitutes a data product has broadened, now encompassing elements such as retrieval-augmented generation or RAG pipelines and AI agents. The integration of large language model inference into data products is a notable example of this trend.
- Integration of AI: The unexpected surge in AI relevance has further propelled the importance and capabilities of both data catalogs and data products, offering new dimensions and use cases.
Data catalogs and data products have not only met the expectations set last year but have also significantly evolved, especially in light of AI’s growing influence. Data catalogs are now more than mere repositories, playing pivotal roles in various aspects of data management. Similarly, data products have transitioned from niche concepts to mainstream tools integral to data strategies, with their scope and application continuously expanding. This progress indicates a positive trend toward more integrated, AI-enhanced data management solutions.
Rethinking the modern data stack
Next we go to Tony Baer, who predicted that the industry would begin to rethink the modern data stack. He cites some evidence of that with a mix of green, yellow and red.
Tony further defended his assessment, and we summarize his thoughts below.
Summary of the modern data stack analysis (2023 look-back)
Baer’s analysis of the modern data stack’s performance in the past year reveals a mixed picture of progress and challenges. The concept, aimed at modularizing the transition from transactional to analytic data, brought significant advancements in some areas but also introduced complexities.
- Modularization and complexity: The intent to modularize data processes led to added complexity in execution.
- Progress in flattening analytics and transaction data: Notable advancements were seen in combining analytic and transaction databases, as evidenced by Oracle’s MySQL enhancements and Google’s work with Postgres in AlloyDB.
- Amazon’s integration efforts: AWS made strides in database integration, using technologies such as Aurora’s log-based replication to connect databases, including expansion to Postgres and DynamoDB.
- Database machine learning: This area saw substantial growth, with various implementations such as Redshift’s integration with SageMaker and Oracle and Google BigQuery’s in-database machine learning models.
- Data transformation and streaming: ELT (Extract/Load/Transform) has progressed, especially in cloud environments, but the continued use of tools from the likes of Fivetran Inc. and dbt Labs Inc. indicates ongoing challenges in tool integration.
- Lagging in streaming and data pipeline management: Limited progress in streaming and data pipeline management, with potential future improvements through generative AI.
While the modern data stack has achieved impressive advancements in certain areas like database integration and machine learning, it still faces hurdles in simplifying complexities and seamlessly integrating various tools. The evolving landscape suggests a potential role for generative AI in addressing these challenges, particularly in data pipeline management.
SQL is back!
OK, moving right along, Carl Olofson predicted that in 2023, SQL will be back! And he shows a sea of green in his evidence column. We asked Carl, “Was SQL ever gone?”
Here’s how we summarize Carl’s 2023 prediction and the related proof points that led to the direct hit evaluation.
Summary of the resurgence and relevance of SQL (2023 look-back)
In contrast to earlier predictions made by some about SQL’s demise, the past year has demonstrated not only its resilience but also its growing relevance in the data management landscape. Despite initial claims of its obsolescence, major players in the database industry have increasingly embraced SQL, underscoring its enduring importance.
- MongoDB’s pivot to SQL: MongoDB, once dismissive of SQL, introduced a SQL query mechanism, signaling a significant shift in attitude by aligning with customer requirements.
- Competitors embracing SQL: Couchbase released a column-based SQL analytics engine, and Redis Labs Inc. has supported SQL for a few years, further indicating this broader industry trend.
- Databricks’ adoption of SQL: Once focused solely on Spark, Databricks has developed its own SQL capabilities, highlighting a change in strategic direction.
- Popularity of SQL-based DBMS engines: Leading database management system engines such as Oracle, MySQL, Microsoft SQL Server and PostgreSQL remain SQL-based. Oracle has expanded its SQL offerings, further investing in technologies such as MySQL with HeatWave and Postgres, suggesting a growing market opportunity.
- Application developer preferences: Despite a preference for document-oriented databases such as MongoDB among application developers, SQL continues to be relevant.
- Multimodel future: The industry is moving toward a multimodel approach, where databases support multiple data formats and access methods. AI and generative AI trends are reinforcing this shift.
- SQL’s analytical strength: SQL remains a powerful tool for business data analysis, capable of handling diverse data inquiries without requiring predetermined database structures.
Far from becoming obsolete, SQL has experienced a resurgence, with major database companies increasingly integrating it into their platforms. The data management industry is evolving toward a multimodel future, where SQL’s versatility in data analysis continues to be invaluable. This trend supports our thesis that SQL will maintain its prominence as a primary tool for business data analysis, despite the growing diversity in database technologies and formats.
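As a small illustration of SQL's analytical strength, the sketch below asks two unrelated ad hoc questions of the same table without reshaping it for either one. The table, its columns, the sample rows and the `revenue_by` helper are hypothetical, and SQLite simply stands in for any SQL engine.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("east", "widget", 100.0),
        ("east", "gadget", 50.0),
        ("west", "widget", 75.0),
        ("west", "widget", 25.0),
    ],
)


def revenue_by(column):
    """Total revenue grouped by one column, decided at query time."""
    # Identifiers cannot be bound as parameters, so restrict them to known names.
    if column not in {"region", "product"}:
        raise ValueError(f"unsupported column: {column}")
    rows = conn.execute(
        f"SELECT {column}, SUM(amount) FROM orders "
        f"GROUP BY {column} ORDER BY {column}"
    ).fetchall()
    return dict(rows)


# Two different questions, one unchanged table.
print(revenue_by("region"))
print(revenue_by("product"))
```

Neither question had to be anticipated when the table was created, which is the flexibility the analysts credit to SQL.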
The definition of data expands in 2023
David Menninger predicted that the definition of data is expanding, using metric stores, feature stores, model management and data sharing as examples of what we could expect. His self-evaluation shows a mostly green level of accuracy for his prediction.
Menninger provided additional detail, which we summarize as follows:
Summary of data definitions and gen AI impact (2023 look-back)
The past year has seen a significant expansion in the definition of data, largely influenced by the advent and integration of generative AI. Menninger notes that there is some degree of red in all the analysts’ look-back predictions because generative AI was not emphasized nearly to the degree the market has witnessed. Regardless, this shift has sparked increased attention in various AI-related processes and concepts, though some areas like metrics stores have received comparatively less focus.
- Expanding definition of data: Gen AI has played a crucial role in broadening the understanding and scope of data.
- Increased focus on AI processes: There’s heightened interest in feature stores and model management, particularly in managing LLMs and other AI models.
- Data sharing standards: The emergence of standards for data sharing, driven by competition between Databricks and Snowflake, aligns with previous discussions on data products.
- Vendors focused on metrics stores: While there are vendors specializing in metrics stores, this area has not garnered as much attention in terms of governance and integration as other aspects of data analytics.
- Potential for greater inclusion in catalogs: We expect broader inclusion of AI-related processes in data catalogs, which has not been fully realized, leading to a more cautious assessment of progress in this area.
The impact of gen AI on the data landscape is evident, with notable progress in AI-related processes and data sharing standards. However, areas like metrics stores and the comprehensive integration of AI processes into data catalogs have not received as much attention or development as anticipated. This mixed progress highlights the dynamic nature of the data industry, where certain trends gain prominence while others await further exploration and investment.
Dashboarding gets commoditized, embedding and automation rise in 2023
Last but not least for the 2023 look-back, Doug Henschen forecast last year that BI analytics reporting and dashboarding would be commoditized; and that embedding and automation would ascend. He shared some examples below of evidence for his all green evaluation.
Our summary of Henschen’s rationale is below.
Summary of embedded BI and analytics trends (2023 look-back)
In 2023, the trend of embedded BI and analytics continued its upward trajectory, aligning with previous predictions. This year’s progress focused on integrating insights directly into decision-making processes within applications, rather than relying on separate reports and dashboards.
- Integration at decision points: A significant shift toward embedding insights directly at decision points within applications, moving away from separate analytical tools.
- Development tools expansion: Increased availability of software development kits and granular application programming interfaces, facilitating the integration of analytics into apps.
- Enhancements in GitHub and CI/CD: Improved integration with GitHub and continuous integration/continuous deployment capabilities, alongside the rise of low-code and no-code development options.
- Workflow automation and event architecture: BI and analytics vendors have started incorporating workflow automation using event architecture, enabling actions to be triggered directly within applications.
- Enterprise apps incorporating insights: Major enterprise application vendors such as Oracle, SAP, Salesforce Inc. and Workday Inc. have increasingly embedded insights within their platforms.
- Late 2023 developments: The announcement of tools such as Microsoft Copilot in Teams, Power BI’s natural language query, Tableau’s Pulse and Amazon Q, though many are still in preview stages, indicates a trajectory for further advancements in 2024.
Embedded BI and analytics have seen significant strides in 2023, with a clear focus on making data-driven insights more accessible and actionable within the workflow of enterprise applications. The development of new tools and the integration of analytics into widely used enterprise platforms suggest a continuing trend toward more seamless, efficient and user-friendly data analysis methods in the business environment. This trend is poised to evolve further in 2024, with upcoming advancements in natural language processing and AI integration.
2024 data analysts’ predictions
Keeping the same analyst order, each analyst presents his prediction, and we made time for one or two other analysts to chime in.
Below we show a table of all the predictions for 2024. All of them have AI included, but the forecasts span new data platforms, governance, metadata, database, skills gaps and more, so let’s get into it.
The rise of an intelligent data platform
Sanjeev Mohan predicts the emergence of a new data platform, the convergence of governance for AI and data and open-source LLMs catching up to proprietary foundation models. On Breaking Analysis, we’ve been talking about a next data platform beyond the so-called modern data platforms of Snowflake, Databricks, Google, AWS, Microsoft — and let’s include Oracle in that mix, as it is the king of databases.
Sanjeev had a lot to cover and we summarize his prediction below with a reaction from Doug Henschen and Dave Menninger.
Summary of intelligent data platform prediction (2024 prediction)
Mohan’s primary prediction for 2024 is the emergence and adoption of the “intelligent data platform,” which represents a significant advancement in integrating AI into existing data stacks. This concept focuses on minimizing data movement and integrating various components, including AI models and analytical engines, into a unified platform and aligns with theCUBE Research work around the so-called sixth data platform.
- AI integration into data stack: AI is being directly integrated into the existing data stack, reducing the need for separate AI-specific data movement.
- Infrastructure and storage layers: The platform includes a cross-cloud infrastructure layer and a unified storage layer, with storage and compute being separated.
- Analytical engines and AI models: Inclusion of various analytical engines (e.g., Spark, SQL) and AI models (open-source and proprietary), such as from OpenAI or Hugging Face.
- Data products and AI agents: Continuation of data products and BI dashboards, supplemented by AI agents capable of orchestrating tasks.
- AI and data governance convergence: A shift toward AI governance, building upon traditional data governance, including model certifications and use-case associations.
The intelligent data platform is envisioned as a comprehensive, integrated system that combines data management and AI capabilities. This platform is expected to streamline processes, enhance analytics, and provide a more cohesive governance structure for both data and AI models.
Doug Henschen and Dave Menninger responded to the prediction with a mix of appreciation for its ambition and caution about its immediate feasibility.
Key analyst insights
- Ambitious but premature: Analysts see the vision as ambitious and feel the market is far from realizing such sophistication, suitable only for the top tier of companies.
- Database vendors and gen AI: There is a trend of database vendors developing their own generative models, but the industry is still in the early stages of gen AI implementation.
- Market awareness and adoption: Analysts note a gap between the ambitious vision and the current market awareness and readiness, with many companies still unfamiliar with advanced data platforms.
- Skill set challenges: Concerns focus on the varied skill sets required for such a platform, suggesting that tooling around analytical processing might remain separate.
- Open source versus commercial models: Analysts observe that open-source models are being adopted at levels similar to commercial models, indicating a diverse approach in the AI landscape.
Overall, while acknowledging the innovative potential of the intelligent data platform, analysts caution against overestimating current market readiness and emphasize the gradual nature of such a significant technological shift.
Gen AI simplifies database design, deployment and operations
Tony Baer predicts that gen AI will make things simpler for database practitioners.
We asked him to explain how so and we summarize his response below.
Summary of gen AI and machine learning in database operations (2024 prediction)
Tony’s prediction for 2024 focuses on the deeper integration of generative AI and machine learning into database operations, transforming how databases are managed and interacted with. This integration is expected to bring subtle, incremental improvements rather than drastic changes, enhancing automation and efficiency in database design and management.
- Invisible automation in databases: Gen AI and machine learning will become more embedded in databases, leading to automation improvements.
- Incremental improvements: Anticipated changes include incremental improvements in database design, such as entity extraction and data modeling.
- Synthetic data generation: Use of gen AI for synthetic data generation based on existing data characteristics.
- Initial steps in code generation: The start of using gen AI for data transformation pipeline creation, with more complex implementations expected in the future.
- Application in governance: Gen AI will be applied to database management governance, enhancing metadata discovery and documentation.
Gen AI and machine learning are poised to further permeate database operations, offering more sophisticated, automated and efficient ways to handle complex data tasks. This trend represents a shift toward simplifying interactions with complex data systems through intelligent technology.
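One simple way to picture synthetic data generation "based on existing data characteristics" is sketched below. Note that it deliberately swaps in a basic statistical technique (fitting a mean and standard deviation, then sampling from a normal distribution) rather than a generative model. The sample values and the `synthesize` function are assumptions for illustration only.

```python
import random
import statistics


def synthesize(values, n, seed=0):
    """Draw n synthetic values from a normal distribution fitted to `values`."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Hypothetical "real" column, such as order values.
observed = [120.0, 135.0, 150.0, 110.0, 145.0]
synthetic = synthesize(observed, 2000)

# Compare the fitted characteristic with its synthetic counterpart.
print(round(statistics.mean(observed), 1), round(statistics.mean(synthetic), 1))
```

A production approach would need to preserve correlations across columns and protect confidential values, which this single-column sketch does not attempt.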
Carl Olofson’s response added the following to the prediction. He affirms its potential while emphasizing its synergistic relationship with broader data management trends.
Key analyst insights
- Complexity simplification: The integration of gen AI in databases is seen as a way to simplify complex enterprise data interactions.
- Enhancing data platforms: These advancements are viewed as critical steps toward building intelligent data platforms, enabling more precise and efficient data management.
- Human-machine synergy: Carl highlights the benefits of gen AI in overcoming human limitations in data projects, such as fatigue and boredom, suggesting a more seamless and continuous data management process.
Olofson generally agrees with the prediction, seeing it as a realistic and practical evolution of database management that aligns well with the broader movement toward intelligent and automated data platforms.
Data unification catalyzes rationalization – focus on data security and governance
Carl Olofson’s 2024 prediction is shown below. He predicts that gen AI and other developments will catalyze a rationalization of data silos to enable combinatorial data use cases which will ultimately create governance challenges. So while some may see this as obvious, we asked if Carl is predicting organizations will be able to succeed in 2024, or will this governance challenge create insurmountable barriers to positive outcomes?
Olofson provided additional color on his prediction, which we summarize as follows:
Summary of data organization and generative AI challenges (2024 prediction)
Carl’s prediction for 2024 delves into the complexities of data organization within enterprises, particularly in the context of generative AI. It emphasizes the current disarray in enterprise data ecosystems and the potential challenges that will arise as generative AI begins to combine data in unprecedented ways.
- Current state of data disorganization: Enterprises face a chaotic data environment, with data being created and used in a fragmented manner across various applications.
- Generative AI inducing complexity: The introduction of generative AI is expected to combine data in novel, sometimes irrational ways, leading to unforeseen challenges.
- Legacy data concerns: Special attention is needed for legacy data, especially in terms of confidentiality and rationalization.
- Long-term rationalization process: Rationalizing data to fully leverage generative AI is seen as a lengthy, potentially decade-long endeavor, involving significant human effort.
The integration of generative AI in enterprise data systems is not a simple add-on; it requires a fundamental reassessment and restructuring of how data is organized and managed. This process is expected to be complex and time-consuming, necessitating careful planning and execution.
Tony Baer and Doug Henschen reacted to this prediction by acknowledging the inherent complexities and echoing the concerns about integrating generative AI into chaotic data environments.
Key analyst insights
- Complexity and data lineage: The analysts concur that generative AI will lead enterprises into more complex data scenarios, emphasizing the importance of data lineage to understand data provenance.
- Heterogeneity in enterprises: Recognizing the diversity of enterprise data environments, analysts agree that no single data platform can uniformly address all needs.
- Data environment adaptability: The future data environment should be adaptable, able to handle varying contexts and truths, reflecting the dynamic nature of enterprise data.
The analysts generally agree with the prediction, highlighting the challenges in harmonizing heterogeneous data environments and the need for adaptable, multifaceted data management approaches.
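The lineage point can be sketched in a few lines: if each data set records what it was derived from, provenance becomes a graph walk. The data set names and the `upstream_sources` helper below are hypothetical, and real lineage tools track far more detail (columns, jobs, timestamps).

```python
# Hypothetical lineage map: each data set lists what it is derived from.
LINEAGE = {
    "churn_report": ["customer_features"],
    "customer_features": ["crm_accounts", "billing_events"],
    "crm_accounts": [],
    "billing_events": [],
}


def upstream_sources(dataset, lineage):
    """Return the original (root) data sets that ultimately feed `dataset`."""
    seen = set()
    roots = set()
    stack = [dataset]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        parents = lineage.get(node, [])
        if not parents:
            roots.add(node)  # nothing upstream, so this is an origin
        stack.extend(parents)
    return roots


print(sorted(upstream_sources("churn_report", LINEAGE)))
```

With a map like this, an answer produced from `churn_report` can be traced back to the systems that supplied its data.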
Gen AI doesn’t replace traditional AI in demanding use cases
Dave Menninger predicts that despite all the hype around gen AI, it won’t replace traditional AI in the most demanding use cases. He predicts a continued AI skills gap. This is another prediction that feels like a lock, so we asked Dave to add some data points to increase the degree of difficulty for this call.
We summarize his response as follows:
Summary of gen AI limitations and opportunities (2024 prediction)
Dave’s prediction for 2024 highlights the limitations of gen AI in demanding use cases, despite its rapid advancements and growing popularity. It emphasizes the need for a balanced approach in adopting gen AI, acknowledging its strengths in certain areas while recognizing its current limitations in more complex, specialized fields.
- Generative AI’s impact varies by use case: Gen AI shows promise in areas like document summarization and natural language assistance but falls short in more demanding fields such as banking.
- Need for advanced skills in traditional AI: Despite the ease gen AI brings to some areas, developing predictive AI models for complex tasks still requires specialized skills and knowledge.
- Skill shortage in AI development: A significant skill gap exists in AI model development, with many organizations lacking the necessary expertise.
- Cautious adoption advised: It’s recommended not to overly rely on gen AI for critical and specialized applications, given its current limitations.
Though generative AI presents exciting advancements, it’s important to recognize its limitations and not overly depend on it for complex and critical tasks. A balanced approach, valuing both gen AI and traditional AI skills, is crucial for effective and responsible AI adoption.
Sanjeev Mohan and Carl Olofson responded with analysis that underscores the limitations of gen AI and advocates for a balance between technological advancement and skilled human intervention.
Key analyst insights
- Historical perspective on AI development: Comparisons to the early stages of the World Wide Web are instructive, suggesting that gen AI is at a similar nascent stage and could evolve significantly in the coming years.
- Opportunities for skilled professionals: Analysts agree that gen AI, while automating many tasks, will create opportunities for highly trained individuals to guide its development and application.
- Importance of human oversight: The necessity of human intervention and expertise in managing complex situations where gen AI may fall short is highlighted.
The analysts concur that though gen AI is a significant development, it is not a panacea for all challenges. Skilled human oversight remains essential, especially in sophisticated and nuanced applications.
Gen AI has a meaningful impact on the BI and predictive analytics value chain
The last prediction comes from Doug Henschen, who is predicting that gen AI will have a material impact on how organizations approach BI and predictive analytics. Doug’s prediction has big implications for data analysts, data pros working in the pipeline, Tableau jocks and business end-users. We wanted to know from Doug if he’s predicting that we’ll see a measurable transformation before the end of the year.
Henschen provided additional details about this prediction and we summarize his analysis below.
Summary of embedding insights and natural language query in BI analytics (2024 prediction)
The prediction for 2024 is a continuation of the 2023 trend, focusing on the increasing integration of insights and natural language queries within BI analytics. This trend is particularly characterized by the augmentation of natural language query capabilities, fueled by advancements in generative AI.
- Increased embedding of insights: There’s a growing trend of embedding insights directly where people work, moving away from traditional BI platforms.
- Enhancement of natural language query: Generative AI is expected to significantly improve the accuracy, verbosity and interpretative abilities of natural language queries.
- General availability of gen AI tools: After a year of announcements and previews in 2023, 2024 is anticipated to see the general availability of various gen AI-enhanced tools.
- Changing role of analysts: Analysts are expected to focus more on curating data, questions and prompts, guiding mainstream business users in their interactions with these advanced tools.
The integration of generative AI into BI analytics is poised to revolutionize how insights are accessed and interacted with, enhancing the natural language querying experience and embedding these capabilities across various applications.
Dave Menninger, Tony Baer and Sanjeev Mohan had reactions that underscore the potential of this trend in democratizing access to analytics and shifting the role of analysts.
Key analyst insights
- Accessibility of analytics to wider workforce: The analysts see gen AI as a key to expanding analytics access to a larger portion of the workforce.
- Importance of curating data and questions: The role of data analysts in this new environment will pivot to curating data and formulating the right questions.
- Evolution of prompt engineering: Drawing parallels to the early days of internet search, the analysts foresee prompt engineering becoming more intuitive and less complex over time.
The analysts generally agree with the prediction, highlighting the transformative potential of generative AI in making analytics more accessible and shifting the focus of analysts toward more nuanced aspects of data interaction.
What do you think of our 2023 data predictions, the assessment of their accuracy by the analysts and the relevance of our 2024 predictions? Let us know and thanks for reading!
Keep in touch
Thanks to Alex Myerson and Ken Shifman on production, podcasts and media workflows for Breaking Analysis. Special thanks to Kristen Martin and Cheryl Knight, who help us keep our community informed and get the word out, and to Rob Hof, our editor in chief at SiliconANGLE.
Also, check out this ETR Tutorial we created, which explains the spending methodology in more detail. Note: ETR is a separate company from theCUBE Research and SiliconANGLE. If you would like to cite or republish any of the company’s data, or inquire about its services, please contact ETR at firstname.lastname@example.org or email@example.com.
All statements made regarding companies or securities are strictly beliefs, points of view and opinions held by SiliconANGLE Media, Enterprise Technology Research, other guests on theCUBE and guest writers. Such statements are not recommendations by these individuals to buy, sell or hold any security. The content presented does not constitute investment advice and should not be used as the basis for any investment decision. You and only you are responsible for your investment decisions.
Disclosure: Many of the companies cited in Breaking Analysis are sponsors of theCUBE and/or clients of Wikibon. None of these firms or other companies have any editorial control over or advanced viewing of what’s published in Breaking Analysis.
Image: theCUBE Research