UPDATED 12:00 EDT / JUNE 14 2022

BIG DATA

Snowflake bids for transactional and data science workloads in broad set of enhancements

Snowflake Inc. is leading off its Snowflake Summit 2022 conference today with a brace of announcements aimed at making its platform more programmable, flexible and accommodating of a greater variety of workloads, including transactions.

The company, which has been chastened by slowing growth and a nearly 66% decline in its stock price since the beginning of the year, is attempting to expand the scope of workloads it supports while bringing data scientists more closely into the fold.

The company is launching Unistore, a new type of workload that enables users to combine transactional and analytical data in a single platform with consistent governance at large scale. Unistore enables Snowflake Data Cloud users to bring transactional uses such as application state and data serving into their analytics workflows, bridging the silos that have traditionally separated those two environments.

“We have a broad North Star for Snowflake, which is to provide a platform that can help customers organize and reason through all types of data,” Christian Kleinerman, senior vice president of product at Snowflake, said in a briefing. “With your transactional and analytical data all inside the data cloud, you can now start to perform analytical queries on top of that data and discover insights that were never previously possible because it’s on both your transactional and analytical data at the same time.” The upshot is a simpler environment with unified security and management within a single engine, he said.

As part of that broadening of its cloud data warehouse, Snowflake is also introducing Hybrid Tables for fast single-row operations that customers can use to build transactional applications. This enables users to perform analytics directly on transactional data as well as join Hybrid Tables with existing Snowflake Tables for a more complete view, Kleinerman said.
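To make the idea concrete, here is a minimal sketch of the pattern Kleinerman describes: single-row transactional writes followed by an analytic join against a second table. It uses Python's built-in sqlite3 module as a stand-in, not Snowflake or its Hybrid Tables syntax, and the table names, columns and helper functions are all hypothetical.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with two illustrative tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Stand-in for a transactional table keyed for single-row access.
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            amount      REAL NOT NULL,
            status      TEXT NOT NULL
        );
        -- Stand-in for an existing analytic table.
        CREATE TABLE customer_segments (
            customer_id INTEGER PRIMARY KEY,
            segment     TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, 10, 25.0, "open"), (2, 11, 40.0, "open"), (3, 10, 15.0, "open")],
    )
    conn.executemany(
        "INSERT INTO customer_segments VALUES (?, ?)",
        [(10, "retail"), (11, "enterprise")],
    )
    conn.commit()
    return conn


def ship_order(conn: sqlite3.Connection, order_id: int) -> None:
    """Transactional side: update exactly one row by primary key."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE orders SET status = 'shipped' WHERE order_id = ?",
            (order_id,),
        )


def shipped_revenue_by_segment(conn: sqlite3.Connection) -> dict:
    """Analytic side: join the transactional table with the analytic one."""
    rows = conn.execute(
        """
        SELECT s.segment, SUM(o.amount)
        FROM orders AS o
        JOIN customer_segments AS s ON s.customer_id = o.customer_id
        WHERE o.status = 'shipped'
        GROUP BY s.segment
        ORDER BY s.segment
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_demo()
    ship_order(conn, 1)
    ship_order(conn, 2)
    print(shipped_revenue_by_segment(conn))
```

The point of the sketch is only that the analytic query sees the effect of the single-row updates immediately, with no copy step between two systems.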

Courting data scientists

The addition of Python support enhances accessibility and programmability for data scientists, data engineers and developers. Snowpark for Python, released in public preview, is natively integrated with a framework for creating data-focused applications that Snowflake recently acquired with its purchase of Streamlit Inc. Developers can build scalable pipelines, applications and machine learning workflows directly in Snowflake using their preferred languages and libraries. The company said it’s also broadening data access with new enhancements for working with streaming data.

Streaming data support, which is currently in private preview, allows for serverless ingestion and declarative transformation of streaming data. “Snowpipe Streaming will deliver an order-of-magnitude improvement in latency from what we have today, from minutes to single-digit seconds,” Kleinerman said.

A new option called Materialized Tables supports the full set of Snowflake queries on declarative pipeline definitions with the automatic definition of parameters for incremental pipeline maintenance. “With this, we believe we complete the set of options for customers in how you bring data into Snowflake and how you transform it with ease-of-use and maximum expressive power,” Kleinerman said.
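Incremental maintenance, in general terms, means updating a derived result from only the newly arrived rows instead of recomputing it from all of the data. The following is a generic plain-Python sketch of that idea, not a description of how Materialized Tables work internally; the class and function names are hypothetical.

```python
from collections import defaultdict


class IncrementalTotals:
    """Keeps per-key totals up to date by applying only new rows."""

    def __init__(self) -> None:
        self._totals = defaultdict(float)

    def apply(self, new_rows) -> None:
        # Each row is a (key, amount) pair; only the delta is processed.
        for key, amount in new_rows:
            self._totals[key] += amount

    def snapshot(self) -> dict:
        return dict(self._totals)


def full_recompute(all_rows) -> dict:
    """Baseline: rebuild the totals from every row seen so far."""
    totals = defaultdict(float)
    for key, amount in all_rows:
        totals[key] += amount
    return dict(totals)


if __name__ == "__main__":
    batch_1 = [("east", 10.0), ("west", 5.0)]
    batch_2 = [("east", 2.5)]

    view = IncrementalTotals()
    view.apply(batch_1)
    view.apply(batch_2)  # touches only the new batch

    # Both approaches agree; the incremental one does less work per batch.
    print(view.snapshot() == full_recompute(batch_1 + batch_2))
```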

Snowflake also said it plans to expand its external tables, which store data in files in an external stage, to support Apache Iceberg (an open, high-performance format for very large analytic tables) and to allow users to access data in on-premises storage systems directly from the cloud.

Instead of having to deal with a collection of isolated files in different formats, Iceberg unifies that data in one place, Kleinerman said. “This will introduce a first-class table type that has all the characteristics of Snowflake’s traditional tables with very high performance,” he said. “Users will be able to choose on a table-by-table basis what will appear in open formats as opposed to the Snowflake format, which still has some performance advantages.”

Broadened replication

In the area of replication, Snowflake is enhancing its existing cross-region and cross-cloud replication with the ability to implement failover and failback with a client redirect. That eliminates the need for extract/transform/load dashboards to reconnect to a source database. It’s also expanding data collection to include additional information such as users, workloads and timestamps. Pipeline replication allows customers to implement failover without data duplication.

Additional updates give developers the ability to build applications and models in Snowflake’s Snowsight web interface using Python. They can securely execute memory-intensive operations such as feature engineering and model training on large data sets using Python libraries and embed machine learning-powered predictions into their business intelligence and analytics applications. A new geometry data type adds a planar coordinate system to the data platform.

“We’ve had a geography data type which is a round-earth coordinate system; this provides a flat coordinate system,” Kleinerman said. Queries on geospatial data have also been sped up by a factor of five, he said.
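The difference between the two coordinate systems shows up as soon as distances are computed. The sketch below is plain Python, not Snowflake's geospatial functions: it compares a flat Euclidean distance with a great-circle distance from the standard haversine formula. The function names and the mean Earth radius constant are choices made for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, an assumption of this sketch


def planar_distance(a, b) -> float:
    """Flat (geometry-style) distance between two (x, y) points."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def great_circle_km(a, b) -> float:
    """Round-earth (geography-style) distance between two (lon, lat)
    points given in degrees, using the haversine formula."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    h = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


if __name__ == "__main__":
    # One degree of longitude at the equator versus at 60 degrees north.
    equator = ((0.0, 0.0), (1.0, 0.0))
    north = ((0.0, 60.0), (1.0, 60.0))

    # On a flat plane both pairs are exactly one unit apart.
    print(planar_distance(*equator), planar_distance(*north))

    # On a sphere the northern pair is roughly half as far apart.
    print(great_circle_km(*equator), great_circle_km(*north))
```

A flat system suits data that is already projected, such as floor plans or local survey grids, while a round-earth system is the safer default for raw longitude and latitude pairs spanning large areas.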

Consolidated security data

A new Cybersecurity Workload combines the Snowflake platform and software from ecosystem partners to enable cybersecurity teams to natively handle structured, semi-structured and unstructured log data. That means they can store years’ worth of high-volume data and search it with scalable resources using languages like SQL and Python. Security data can also be combined with application data to enable more informed investigations and unified visibility across security vectors.

Finally, a new Native Application Framework, which is also in private preview, is aimed at commercial developers. It enables them to deploy applications to the Snowflake Marketplace that customers can securely install and run directly in their Snowflake instances without moving data.

Developers can build applications using such tools as stored procedures, user-defined functions and user-defined table functions as well as use Streamlit to build interfaces. Telemetry features such as events and alerts for monitoring and troubleshooting are also under development, the company said.

“Getting apps into the hands of your customers has always been hard,” said Snowflake Product Lead Chris Child. “It requires organizations to manage application infrastructure as well as handle potentially sensitive information on behalf of their customers while also figuring out their distribution strategies and billing.”

The goal of the framework “is to allow you to stop bringing your data to your applications and creating new silos and instead bring the applications to your data entirely inside the data cloud,” Child said. “This hasn’t really been possible before because there wasn’t a single data platform that could handle all of your different types of data and also the compute and sharing that is necessary to run your different types of applications in a single place.”

Customers can feel safe giving applications access to the most sensitive data because developers never see the data, Child said.

Responding to customer requests for better visibility over their usage, Snowflake also said it’s adding threshold alerts and resource groups that enable users to combine resources and budgets for spending analysis.

In a separate announcement, ALTR Solutions Inc., a maker of data control and protection software, released a new policy automation engine for managing data access controls in Snowflake and other environments. It allows data engineers and architects to set up data access policies in minutes, manage ongoing updates to data permissions and handle data access requests through ALTR’s no-code platform for data policy management.

Photo: Robert Hof/SiliconANGLE
