UPDATED 08:00 EDT / OCTOBER 11 2022

BIG DATA

Google touts open data cloud to unify information from every source

Google Cloud has ambitions to build what it says will be the most open, extensible and powerful data cloud of all, as part of its mission to ensure customers can make use of all of their data, from any source, no matter where it is or what format it’s in.

Google announced this “data cloud” vision today at Google Cloud Next 2022, where it introduced an avalanche of updates to its existing data services, as well as some new ones. The new updates are all designed to make this vision of an open and extensible data cloud a reality.

“Every company is a big-data company now,” Gerrit Kazmaier, vice president and general manager of Data Analytics at Google Cloud, told SiliconANGLE in an interview. “This is a call for a data ecosystem. It will be a key underpinning of the modern enterprise.”

One of the first steps in fulfilling that vision is to ensure that customers can indeed make use of all of their data. To that end, Google’s data warehouse service BigQuery has gained the ability to analyze unstructured and streaming data for the first time.

BigQuery can now ingest every kind of data, regardless of its storage format or environment. Google said that’s vital because most teams today can only work with structured data from operational databases and applications such as ServiceNow, Salesforce, Workday and so on.

But unstructured data, such as video from television archives, audio from call centers and radio, and paper documents, accounts for more than 90% of all information available to organizations today. This data, which was previously left gathering dust, can now be analyzed in BigQuery and used to power services such as machine learning, speech recognition, translation, text processing and data analytics via a familiar Structured Query Language interface.

It’s a big step, but far from the only one. To further its aims, Google says, it’s adding support for major data formats such as Apache Iceberg, Delta Lake and Apache Hudi in its BigLake storage engine. “By supporting these widely adopted data formats, we can help eliminate barriers that prevent organizations from getting the full value from their data,” said Kazmaier. “With BigLake, you get the ability to manage data across multiple clouds. We’ll meet you where you are.”

Meanwhile, BigQuery gets a new integration with Apache Spark that will enable data scientists to improve data processing times significantly. Datastream is being integrated with BigQuery too, in a move that will enable customers to more effectively replicate data from sources such as AlloyDB, PostgreSQL, MySQL and other third-party databases such as Oracle.

To ensure users have greater confidence in their data, Google said, it’s expanding the capabilities of its Dataplex service, giving it the ability to automate processes associated with improving data quality and lineage. “For instance, users will now be able to more easily understand data lineage — where data originates and how it has transformed and moved over time — reducing the need for manual, time-consuming processes,” Kazmaier said.

Unified business intelligence

Making data more accessible is one thing, but customers also need to be able to work with that data. To that end, Google said it will unify its portfolio of business intelligence tools under the Looker umbrella. Looker will be integrated with Data Studio and other core BI tools to simplify how people can get insights from their data.

As part of the integration, Data Studio is being rebranded as Looker Studio, helping customers to go beyond looking at dashboards by infusing their workflows and applications with ready-made intelligence to aid in data-driven decision-making, Google said. Looker will, for example, be integrated with Google Workspace, providing easier access to insights from within productivity tools such as Sheets.

In addition, Google said, it will make it simpler for customers to work with the BI tools of their choice. Looker already integrates with Tableau Software, for example, and soon it will do the same with Microsoft Power BI.

Powering artificial intelligence

One of the most common use cases for data today is powering AI services — one area where Google is a clear leader. It’s not planning on letting go of that lead anytime soon, either. In an effort to make AI-based computer vision and image recognition more accessible, Google is launching a new service called Vertex AI Vision.

The service extends the capabilities of Vertex AI, providing an end-to-end application development environment for ingesting, analyzing and storing visual data. So users will be able to stream video from manufacturing plants to create AI models that can improve safety, or else take video footage from store shelves to better manage product inventory, Google said.

“Vertex AI Vision can reduce the time to create computer vision applications from weeks to hours at one-tenth the cost of current offerings,” Kazmaier explained. “To achieve these efficiencies, Vertex AI Vision provides an easy-to-use, drag-and-drop interface and a library of pre-trained ML models for common tasks such as occupancy counting, product recognition and object detection.”

For less technical users, Google is introducing more “AI agents,” tools that let almost anyone apply AI models to common business tasks.

The new AI agents include Translation Hub, which enables self-service document translation with support for an impressive 135 languages at launch. Translation Hub incorporates technologies such as Google’s Neural Machine Translation and AutoML and works by ingesting and translating content from multiple document types, including Google Docs, Word documents, Slides and PDFs. Not only does it preserve the exact layout and formatting, but it also comes with granular management controls, including support for post-editing human-in-the-loop feedback and document review.

Using Translation Hub, researchers will be able to share important documents with their colleagues across the world, while goods and services providers will be able to reach underserved markets. Moreover, Google said, public sector administrators will be able to reach more community members in their native language.

A second new AI agent is Document AI Workbench, which makes it easier to build custom document parsers that can be trained to extract and summarize key information from large documents. “Document AI Workbench can remove the barriers around building custom document parsers, helping organizations extract fields of interest that are specific to their business needs,” said June Yang, vice president of cloud AI and industry solutions.

Google also introduced Document AI Warehouse, which is designed to eliminate the challenge of tagging and extracting data from documents.

Expanded integrations

Finally, Google said it’s expanding its integrations with some of the most popular enterprise data platforms to make sure information stored within them is also accessible to its customers.

Kazmaier explained that providing customers with the flexibility to work across any data platform is critical to ensure choice and prevent data lock-in. With that in mind, he said, Google is committed to working with all major enterprise data platform providers, including the likes of Collibra NV, Databricks Inc., Elastic NV, Fivetran Inc., MongoDB Inc., Reltio Inc. and Striim Inc., to ensure its tools work with their products.

David Meyer, senior vice president of product management at Databricks, told SiliconANGLE in an interview that the company has been working with Google for about two years on BigQuery supporting Databricks’ Delta Lake, following similar work with Amazon Web Services Inc. and Microsoft Corp.’s Azure.

“Making it so you don’t have to move the data out of your data lake reduces the cost and complexity,” Meyer said. “We see this as an inflection point.” Even so, he added, this is just the start of work with Google Cloud, and the two companies will be working on solving other challenges, such as joint governance efforts.

Kazmaier said the company is also working with the 17 members of the Data Cloud Alliance to promote open standards and interoperability in the data industry. It’s also continuing support for open-source database engines such as MongoDB, MySQL, PostgreSQL and Redis, as well as Google Cloud databases such as AlloyDB for PostgreSQL, Cloud Bigtable, Firestore and Cloud Spanner.

With reporting from Robert Hof

Images: Google
