UPDATED 16:53 EST / MARCH 25 2021

Autonomy gives data control to the user as competition heats up in the cloud data warehouse market

In today’s business environment, survival depends on competing against data-driven digital natives whose every move is made based on a barrage of insights gained from intelligent analysis of customer data. Bulky, expensive on-premises data warehouses with siloed data and static data lakes just don’t work for companies wanting fast, simple access to their data resources.

Powered by the demand for real-time business insights, an evolutionary leap is bringing intelligence into the data warehouse. As “self-driving” databases enter the market, theCUBE, SiliconANGLE Media’s livestreaming studio, takes a look at the major players in the cloud data warehouse market and how automation is opening a new arena for the cloud wars.

Data warehousing joins the cloud revolution

Traditional data warehouses became the dominant architectural choice for business intelligence in the era of big data. But their relevance started to crumble as cloud computing brought with it a mandate for agility and flexibility. As the hybrid model gained in popularity, the need for cloud data warehousing grew stronger. With no upfront investment in physical hardware required, entry costs were low, setup was fast, and queries returned much faster thanks to the introduction of massively parallel processing (MPP), column stores and other cloud scaling techniques.

In October 2020, market research company Gartner Inc. created a new Magic Quadrant for cloud database management systems, predicting that by 2022, 75% of all databases will be deployed or migrated to a cloud platform and by 2023, cloud revenue will account for 50% of the total market revenue. The current global market has a compound annual growth rate of 16.4% and is forecast to reach $3.5 billion by 2025.

“Everybody’s [data warehousing] business is going to go to the cloud,” said Gartner analyst Adam Ronthal.

Unlike other areas of cloud, where upstarts have cornered new market niches, the major data warehouse players have stayed in the cloud data warehouse game. At the top of the leaders’ quadrant are the usual cloud suspects Amazon Web Services Inc. and Microsoft, with Google (a.k.a. Alphabet Inc.) and Oracle Corp. jostling for position in their ability to execute and completeness of vision. In the challengers’ corner are industry disrupter Snowflake Inc. and open-source database management unicorn Redis Labs Inc.

Amazon offers variety, but choice can lead to complexity

The leader of the cloud data warehouse pack is Amazon Redshift. In keeping with AWS’ “right tool for the right job” philosophy, Redshift is supported by a slew of data collection, preparation and storage services that include Amazon DynamoDB, Amazon EMR, Amazon Kinesis Firehose and Amazon Simple Storage Service (Amazon S3).

Rather than start over, AWS based Redshift on ParAccel, an existing traditional data warehouse solution that the company licensed from Actian Corp. in 2011. This decision allowed a faster time-to-market, and when Redshift was launched in 2013, it was the first cloud data warehousing platform out there. But the platform wasn’t originally architected for the cloud, and while it is a fully managed solution, it still requires a lot of hands-on effort, especially for larger organizations.

“There are a lot of constraints to running large systems on AWS,” David Floyer, chief technology officer of SiliconANGLE’s sister company Wikibon Inc., stated in an analysis of the cloud data warehouse market. Noting Amazon’s lack of a true tier-one database to handle mission-critical systems, Floyer said that AWS would solve the problem by encouraging customers to migrate data to the Aurora relational database, which was built for the cloud. But that means breaking down applications and stitching them back together with microservices, leaving the user responsible for testing and maintenance overhead.

“In my opinion, AWS has to invest enormously to make the whole ecosystem much better,” Floyer said.

Redshift is not the only cloud data warehouse to demand high administrator oversight. Microsoft’s Azure Synapse Analytics was also originally based on a non-cloud solution and has inherited its complexities. In fact, of all the competitors in the market, Snowflake and Google BigQuery are two of the most prominent solutions that were truly born in the cloud. Both are built on architectures that take advantage of the native benefits of public cloud, which makes them massively scalable and governed for shared access.

“We burned the ship behind us,” Snowflake Chairman and Chief Executive Officer Frank Slootman told theCUBE’s Dave Vellante in a recent CUBE Conversation. “People will come to the public cloud a lot sooner than we will ever come to the private cloud.”

“[Snowflake’s] trying to create this data cloud notion to facilitate data sharing, put data in the hands of business owners, and provide better access to data product builders,” Vellante said. However, “trying” is still the key word for Snowflake. Although revenue is climbing, the company is still trying to prove it can manage its finances and operate at a profit.

Putting Snowflake’s potential aside, statistics show that the leading cloud data warehouse platforms are far from cohesively linked or easy to manage. On average, database administrator costs are estimated to account for 40% of the total cost of operations. Complex and time-consuming tasks include performance tuning, loading data via ETL, cluster optimization, and setting up and monitoring backup and recovery. As the amount of data generated increases, the complexity of managing database systems will quickly exceed human capabilities.

Automation democratizes cloud data warehousing

It doesn’t take much of a leap to ask: What would it take to make these onerous tasks go away? Following a pattern seen across the technology industry, data warehousing needs to abstract away the complexity and make querying data a simple task that can be done by “citizen data scientists.”

Advances in machine learning and artificial intelligence, alongside hardware improvements, have made the autonomous data warehouse an achievable goal. The database research group at Carnegie Mellon University sponsors the open-source project NoisePage, a self-driving database that aims to “continue to operate on its own long after you are dead.”

“We define a self-driving database as a database that can configure, tune and optimize all system aspects without any human intervention,” said project contributor and doctoral candidate Lin Ma in an online lecture describing how NoisePage has been developed.

The analogy to autonomous vehicles is based on the sequence of actions required to achieve the desired result: perception, action modeling and planning. For the car, this involves input from sensors that allow it to anticipate events and predict potential actions to take. Modeling these actions, the vehicle can plan a sequence of actions to take it on the safest and most efficient course. For a self-driving database, the sequence involves workload forecasting, behavior modeling and action planning.
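
The three stages can be illustrated with a toy Python sketch. It is not NoisePage’s implementation: the `Action` class, the moving-average forecast, the linear latency model and every number in it are assumptions chosen only to show how a forecast feeds a behavior model, which in turn feeds a planner.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Action:
    """A hypothetical tuning action the system could apply."""
    name: str
    speedup: float        # assumed factor by which per-query latency is divided
    one_time_cost: float  # assumed cost, in ms, of applying the action


def forecast_workload(history, window=3):
    """Workload forecasting: predict the next interval's query count
    as the mean of the most recent `window` observations."""
    return mean(history[-window:])


def predict_total_latency(queries, action, base_latency_ms=10.0):
    """Behavior modeling: estimate total latency (ms) for the forecast
    workload if `action` were applied. The linear model is an assumption."""
    return queries * base_latency_ms / action.speedup + action.one_time_cost


def plan_action(history, actions):
    """Action planning: pick the action with the lowest predicted latency."""
    queries = forecast_workload(history)
    return min(actions, key=lambda a: predict_total_latency(queries, a))


# Illustrative candidate actions; names and numbers are made up.
ACTIONS = [
    Action("do_nothing", speedup=1.0, one_time_cost=0.0),
    Action("build_index", speedup=4.0, one_time_cost=5000.0),
]

light = plan_action([100, 120, 110], ACTIONS)
heavy = plan_action([900, 1000, 1100], ACTIONS)
print(light.name, heavy.name)  # prints: do_nothing build_index
```

Under these assumed numbers, the planner leaves a light workload alone because building the index would cost more than it saves, and chooses the index once the forecast load is high enough to repay it.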

“In this data-driven era, not only can the database system store more data, but it can collect more metrics and more stats about the system itself … it can process data much faster and also retrieve data faster and perform calculations faster,” Ma stated.

Hands-off, humans! True autonomy takes control

Oracle describes its cloud data warehouse platform, the Oracle Autonomous Data Warehouse, as “the industry’s first self-driving database.” Released in March 2018, it is a data-warehouse-as-a-service solution that leverages machine learning to be not only self-driving, but also self-securing and self-repairing.

“[Oracle] is pushing out to the lines of business, and it’s simplifying things,” Wikibon’s Floyer stated. “Business lines can manage their own data and not rely on an IT person from headquarters to help them.”

Creating a true database-as-a-service platform means a hands-off approach to database management, where the “engine compartment” is tightly sealed to hide away what’s happening on the inside. All the database administrators and systems administrators have to do is sit back and enjoy the ride.

“An autonomous system must operate without any human assistance, including upgrades, tuning, patching, performance, security and more,” said Neil Mendelson, vice president of modern data warehouse business development at Oracle, in an exclusive interview with theCUBE. “Oracle is the only cloud vendor producing production-grade systems that are fully autonomous. Other cloud vendors are still filling critical capability functionality gaps in their service.”

Mendelson acknowledged Snowflake’s data-masking capability, but noted that the function is not truly autonomous as it requires a security specialist responsible for manually identifying and fixing security issues.

“Without a commitment to an autonomous future, Snowflake and others will continue to add missing functionality and with it additional technical debt,” Mendelson said.

Oracle ADW puts citizen data scientists in the driver’s seat

The latest release of Oracle ADW brings that commitment, adding a new level of autonomy to the platform. Its integrated tools use a drag-and-drop interface, making it simple for “citizen developers” to work with data and generate insights without help from the information technology team. Most significant is the browser-based, low-code Oracle Application Express (APEX) tool, available in Oracle Database 21c.

This ease of use is strategically important because it shows Oracle’s focus on the capability of the end-user, according to Floyer. “It’s really important that you reach out to the developer as they are and what tools they want to use,” he said. “Don’t try to exclude other people; be a platform and be an ecosystem for the end users.”

By lowering the entry bar so that those who use business insights have direct access to the analytics tools that create them, Oracle ADW eliminates wasted admin time, provides for better customer experiences, and cuts costs dramatically. Research conducted by Wikibon found that traditional data center architectures have a 96% higher total cost of ownership than the Oracle Autonomous Data Warehouse on its Exadata Cloud@Customer X8M solution. This is mainly due to the reduction in admin costs, which Oracle customers have seen drop by up to 90%. In a real-world study conducted by International Data Corp., ADW users lowered their total cost of operations by an average of 63% after adopting the platform. Another study showed companies receiving a 417% return on investment over five years, adding up to millions of dollars in savings.

The future of cloud is autonomous

It seems that the cloud wars have gained a new lease on life as unexpected players enter the battle. The biggest surprise among the late entries is Oracle, which many had written off as a non-starter in the race for cloud relevancy. After being named a “cloud giant” by Barron’s in February, the company’s stock shot up 18% over nine trading days, and its Gen2 cloud portfolio is driving triple-digit growth.

But while its strengths in mission-critical workloads and focus on democratizing data analysis make Oracle a viable contender, big hitters AWS, Google and Microsoft aren’t going to share the cloud crown easily. And headline-grabbing Snowflake is determined to join their ranks. It’s a safe bet that Oracle will continue to add features and mature the self-driving capabilities of its ADW platform, but other companies are bound to counter with innovations of their own, leaving the field wide open.

“The concept of autonomy is not limited to data warehousing; it spans all use-cases: transaction processing, data warehousing and mixed workloads. Every day we are seeing examples of machine learning changing the way we work,” Mendelson said. “The very definition of what it means to be a cloud service will evolve to require embedded ML/AI to decrease security risks, lower labor costs, and increase productivity.”

Image: Khakimullin Aleksandr
