UPDATED 15:10 EDT / NOVEMBER 30 2022

AWS rolls out new database and analytics tools for data management at scale

Amazon Web Services Inc. today announced five new capabilities across its database and analytics products at AWS re:Invent, designed to give customers the tools they need to manage and analyze data at petabyte scale.

Amazon announced capabilities coming to DocumentDB, OpenSearch Service and the interactive query service Athena that are designed to improve high-performance database analytics at scale. In addition, the data integration service AWS Glue is getting a preview feature that automatically manages data quality, and Redshift, a managed data warehouse product, now supports high-availability configurations across multiple AWS availability zones.

“Data is inherently dynamic, and harnessing it to its full potential requires an end-to-end data strategy that can scale with a customer’s needs and accommodate all types of use cases — both now and in the future,” said Swami Sivasubramanian (pictured), vice president of databases, analytics and machine learning at AWS. “To help customers make the most of their growing volume and variety of data, we are committed to offering the broadest and deepest set of database and analytics services.”

Sivasubramanian emphasized Amazon’s commitment to building tools that help customers manage and query data at scale so they can make better decisions with it.

According to Amazon, customers today face ever-increasing data needs as they create and store petabytes, and even exabytes, of data from numerous sources. The tools needed to access, query and analyze that data have grown more complex at every stage, from integrating the data to storing it and finally making it available to generate insights.

Amazon DocumentDB has released a new type of cluster that allows customers to elastically scale their document databases. The new capability, known as DocumentDB Elastic Clusters, lets customers scale document databases within minutes to handle millions of reads and writes per second and up to 2 petabytes of storage, beyond the capacity of a single node. Previously, customers had to write specialized code to spread demanding workloads across multiple nodes. Elastic Clusters now handles this automatically for customers who need the capability.
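
For illustration, the "specialized code" customers previously wrote often took the form of application-side sharding. The following is a minimal sketch, assuming simple hash-based routing of documents by a shard key across a fixed list of nodes; the node names and the `route` and `partition` helpers are hypothetical and are not part of any DocumentDB API.

```python
import hashlib

# Hypothetical node names; in practice each would be a separate database endpoint.
NODES = ["node-a", "node-b", "node-c"]


def route(shard_key: str, nodes: list[str] = NODES) -> str:
    """Pick a node for a document by hashing its shard key (hypothetical helper)."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    # Treat the first 8 bytes of the hash as an unsigned integer and map it onto the node list.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def partition(docs: list[dict], key_field: str, nodes: list[str] = NODES) -> dict[str, list[dict]]:
    """Group documents by destination node so each node receives only its share of writes."""
    buckets: dict[str, list[dict]] = {node: [] for node in nodes}
    for doc in docs:
        buckets[route(str(doc[key_field]), nodes)].append(doc)
    return buckets


if __name__ == "__main__":
    orders = [{"customer_id": f"cust-{i}", "total": i * 10} for i in range(9)]
    for node, docs in partition(orders, "customer_id").items():
        print(node, [d["customer_id"] for d in docs])
```

One drawback of this simple modulo scheme is that changing the number of nodes remaps most keys, so adding capacity means moving data by hand. That rebalancing burden is the kind of work the article says Elastic Clusters now manages automatically.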

With the release of Amazon OpenSearch Serverless, the resources behind customers’ search indexes are automatically provisioned, configured and scaled, enabling petabyte-scale search. The service does this by decoupling indexing from search, so each side can scale rapidly and independently without degrading performance during large workload spikes on either side. OpenSearch Serverless customers get this scalability along with standard features such as built-in data visualization for understanding log data and search relevance rankings.

Amazon Athena now supports Apache Spark, an open-source processing framework for big data workloads, extending Athena’s role as one of the fastest ways to interactively query petabytes of data stored in Amazon Simple Storage Service (S3). Spark support should help developers write applications in the languages they prefer, such as Java, Scala, Python and R, without needing to set up, manage and scale their own Spark infrastructure every time they want to run a query. With Apache Spark support in Athena, customers can run queries and complex analyses and quickly visualize the results.

AWS Glue, a serverless, scalable data integration service for integrating and managing data from multiple sources, is getting a preview of AWS Glue Data Quality. The feature automatically analyzes data and gathers statistics, then recommends data quality rules to get customers started. Customers can also define their own rules. If the quality of incoming data falls below set thresholds, customers are alerted so they can take action.
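
The workflow described above (gather statistics, recommend starting rules, score new data against them, alert below a threshold) can be sketched in a few lines. This is a conceptual illustration only, assuming "quality" means per-column completeness (the share of non-null values); the `Rule` type, the helper functions and the 0.05 slack and 0.9 threshold values are hypothetical and do not reflect the AWS Glue Data Quality API or its rule language.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    column: str
    min_completeness: float  # minimum share of rows where the column is non-null


def gather_stats(rows: list[dict], columns: list[str]) -> dict[str, float]:
    """Compute completeness (share of non-null values) for each column."""
    total = len(rows)
    return {
        col: (sum(1 for r in rows if r.get(col) is not None) / total) if total else 0.0
        for col in columns
    }


def recommend_rules(stats: dict[str, float], slack: float = 0.05) -> list[Rule]:
    """Suggest one starting rule per column, allowing a small drop below the observed completeness."""
    return [Rule(col, max(0.0, round(value - slack, 4))) for col, value in stats.items()]


def quality_score(rows: list[dict], rules: list[Rule]) -> float:
    """Return the fraction of rules that pass on a batch of rows."""
    if not rules:
        return 1.0
    stats = gather_stats(rows, [r.column for r in rules])
    passed = sum(1 for r in rules if stats[r.column] >= r.min_completeness)
    return passed / len(rules)


def check_batch(rows: list[dict], rules: list[Rule], threshold: float = 0.9) -> bool:
    """Alert (here, just print) when a batch's quality score falls below the threshold."""
    score = quality_score(rows, rules)
    if score < threshold:
        print(f"ALERT: data quality score {score:.2f} is below threshold {threshold:.2f}")
        return False
    return True


if __name__ == "__main__":
    baseline = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]
    rules = recommend_rules(gather_stats(baseline, ["id", "email"]))
    new_batch = [{"id": 3, "email": None}, {"id": 4, "email": "d@example.com"}]
    check_batch(new_batch, rules)  # half the emails are missing, so the email rule fails
```

Here the recommended rules come from a known-good baseline, and customers could append their own `Rule` entries alongside them, mirroring the article's point that suggested rules are only a starting point.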

Finally, AWS announced that Amazon Redshift, its fully managed, large-scale data warehouse service, now supports deployment across multiple availability zones with Redshift Multi-AZ. Redshift already improves availability and reliability by automatically backing up clusters in case of critical failures, and it can relocate workloads to other clusters without applications noticing. With Multi-AZ, however, clusters are deployed across multiple availability zones simultaneously while still being managed as a single data warehouse with one endpoint. As a result, if one zone fails, operations can shift quickly to another zone.

“The new capabilities announced today build on this by making it even easier for customers to query, manage, and scale their data to make faster, data-driven decisions,” said Sivasubramanian.

Photo: Robert Hof/SiliconANGLE