Snowflake adds bevy of features for AI development and managed Polaris catalog
Snowflake Inc. today announced at its annual Build 2024 virtual developer conference numerous enhancements to its cloud data platform, many focused on artificial intelligence.
Among them is a natural-language front-end to internal data with agentic capabilities, tools that help developers more quickly build conversational front-ends for managing and accessing structured and unstructured data, enhancements that run batch large language model inferencing more efficiently and the ability to train custom models with graphic processing unit-powered containers within Snowflake Cortex AI, a managed AI development service.
Leading off the list of AI announcements is Snowflake Intelligence, a new platform soon entering private preview that’s meant to enable business users to ask questions about their organization’s data in natural language and to create data agents that take action on the results.
Snowflake Intelligence connects with third-party tools — including internal databases, Microsoft SharePoint document repositories, Salesforce Inc.’s customers relationship management and Slack collaboration application and Google LLC’s Workspace – to combine with business intelligence data in Snowflake.
The company said the toolset addresses fragmented governance across data sources, silos of unstructured and structured data and the shortage of analysts to write code to enable unified access, by replacing it with a single governance layer that accesses both unstructured and structured data sources without the need for custom coding. Data agents analyze and summarize data and to generate new tasks. They can also use application programming interfaces to read and write to Snowflake tables.
Snowflake Intelligence is based on the company’s Cortex AI fully managed artificial intelligence service that contains a suite of generative AI features. It also uses the Cortex Search fully managed search engine to run queries on unstructured data and Cortex Analyst to query structured data.
It’s natively integrated with Snowflake Horizon Catalog, making it compatible with open table formats such as Apache Iceberg and the Apache Polaris catalog. The combination delivers high levels of compliance, security, privacy, discovery and collaboration capabilities, Snowflake said.
Managed catalog
Snowflake is also releasing a managed version of the Apache Polaris catalog, which it introduced and released to open source in June. Snowflake Open Catalog, which is now generally available, allows users to integrate various engines and apply consistent governance controls across multiple table formats such as Apache Iceberg and Apache Hudi.
The open-source catalog is a break from Snowflake’s proprietary history. It reflects customer demand for greater choice in how they manage the large repositories called data lakes that undergird AI development. “Anyone can host Apache Polaris, but the hosted version for customers that want us to deliver a managed service is called Snowflake Open Catalog,” said Christian Kleinerman, Snowflake’s executive vice president of product.
Document AI now available
The second major announcement at today’s conference is the general availability of the Document AI data extraction feature on the Amazon Web Services Inc. and Microsoft Corp. Azure cloud platforms. Document AI leverages Snowflake’s Arctic-TILT large language model to extract and summarize information from text-heavy documents and interpret unstructured elements such as logos, handwritten text and form fills.
A key distinction of Document AI is its self-learning capability, Kleinerman said. “Customers can give feedback on answers and ask Document AI to retrain or fine tune the model and continue to improve based on their feedback,” he said. “Over time the model understands the use case for a given customer better and better and is exclusively trained with customer data.”
Business analysts and data engineers can now preprocess data in PDFs and other documents for AI training using short SQL functions for layout-aware document text extraction and text chunking functions in Cortex Search. Both features are now in public preview.
Unified data platform
The company is also unveiling a new approach to bringing transactional and analytical data together in a single platform called Unistore. It’s powered by Hybrid Tables, a format that supports fast single-row operations for transactional applications. Unistore simplifies data architectures while ensuring consistent security and governance, relieving organizations of the need to manage separate transactional and analytical databases.
Hybrid Tables intelligently identifies whether a query is transactional or analytical and optimizes query performance accordingly. Users can maintain application and workflow states in real time without needing to manage multiple database systems or moving between databases. This enables them to build lightweight transactional applications with Snowflake’s expanded support for transactional capabilities.
Leaked password monitoring
Snowflake has been under pressure to strengthen security since some of its customers were targeted by attackers last spring, although it said at the time that its own defenses weren’t compromised but that attackers had targeted customers that weren’t using multifactor authentication.
Security enhancements being rolled out today include a feature in the managed Horizon Catalog that monitors the dark web and other known attacker hangouts for stolen credentials. “If we see that those match credentials that customers have in Snowflake, we will alert and potentially go all the way to disabling accounts to avoid some of the attacks we saw earlier in the year,” Kleinerman said.
Enhancements to the Snowflake Trust Center include a new Threat Intelligence Scanner Package, now now generally available, that automatically detects which users — whether human or service — are risky and recommends ways to reduce risks. Snowflake is also extending its Trust Center security framework to allow third parties to extend existing security features and sell them as Snowflake Native Applications on the Snowflake Marketplace. The feature will go into private preview soon.
Support for Programmatic Access Tokens for API authentication is being added in Horizon Catalog to simplify application access while enhancing security with scope and expiration for tokens.
Better chat
Conversational applications are getting support for multimodal inputs with images coming first followed by audio and other data types using multimodal LLMs. Internal knowledge bases can be integrated using managed connectors such as the new Snowflake Connector for SharePoint, which is now in public preview, to automatically ingest files without to manually preprocess documents.
The Cortex Chat API is being enhanced to streamline integration between the application front-end and Snowflake. Cortex Chat API combines structured and unstructured data into a single representation state transfer call for use in retrieval-augmented generation and agentic analytics.
New Cortex Knowledge Extensions on Snowflake Marketplace support chat applications using unstructured data from third party content providers with isolation and attribution constructs that are meant to respect publishers’ intellectual property.
With AI Observability for LLM Applications, which is in private preview, users can evaluate and monitor their generative AI applications with more than 20 metrics for relevance, groundedness (the alignment of generated responses with factual, relevant and contextually accurate information), stereotype and latency during development and in production.
Improvements to Cortex Analyst include simplified data analysis with advanced joins and multi-turn conversations and more dynamic retrieval with Cortex Search integration. Multi-turn conversations allow interactions between a chatbot and a user to span multiple exchanges without losing context. The features are in public preview.
Faster AI pipelines
New customization options for large batch text processing support the construction of natural language processing pipelines at large scale. Snowflake is also adding a broader selection of pretrained LLMs, embedding model sizes, context window lengths and supported languages to Cortex AI. They include adding the multilingual embedding model from Voyage AI Innovations Inc., Meta Platforms Inc.’s multimodal Llama 3.1 and 3.2 models, and AI21 Labs Ltd.’s Jamba huge context window models for serverless inferencing.
A new sandbox feature called Cortex Playground, which is now in public preview, provides an integrated chat interface where users can generate and compare responses from different LLMs.
The new Cortex Serverless Fine-Tuning feature allows developers to customize models with proprietary data to generate results with more accurate outputs. Provisioned Throughput, which enters public preview soon, processes large inference jobs with guaranteed throughput.
Snowflake ML, an integrated set of capabilities for machine learning development and inferencing, now supports Container Runtime in a public preview on AWS and in a forthcoming public preview on Azure. This enables more efficient execution of distributed machine learning training jobs on GPUs using any Python framework or language model.
Model Serving in Containers, a feature entering public preview on AWS, enables teams to deploy both internally and externally trained models from the Snowflake Model Registry into Snowpark Container Services using distributed CPUs or GPUs. Snowpark Container Services is a managed offering that enables users to deploy, manage and scale containerized applications directly within the Snowflake ecosystem.
New Storage Lifecycle Policies, now in private preview, reduce storage costs and enhance compliance by introducing new ways to archive or delete data. Snowflake is also enhancing support for data migration from relational database management systems by adding additional views support to its SnowConvert native code conversion tooling.
Simpler sharing
Snowflake’s Internal Marketplace, which is now generally available, enables users to discover data, applications and AI products from other teams and business units within their organizations while preventing unintended sharing with external parties.
The Internal Marketplace also allows users to share fine-tuned large language models to make it easier for them to collaborate on generative AI use cases for use case-specific tasks. The function, which is now in public preview, works securely from within the AI Data Cloud, eliminating the need to make copies of data or transfer it between accounts.
A new Copilot for Listings feature in private preview allows data products listed on an organization’s Internal Marketplace to be easily evaluated using natural language. The AI assistant generates and executes high-quality SQL commands and answers questions that help users quickly determine whether shared data is relevant to their work.
Snowflake Native Application Framework Integration with Snowpark Container Services, now generally available on AWS and in public preview on Azure, allows users to easily build applications in their preferred programming language with customizable user experiences and deploy them on top of configurable GPU and CPU instances. Published applications can be distributed across clouds and regions with observability and security across the development process.
The Snowflake Native Application Framework is also adding support for the Snowpark ML Modeling API, which uses Python frameworks such as scikit-learn, LightGBM, and XGBoost for preprocessing data, feature engineering and training models inside Snowflake. New Secure Model Sharing capabilities now in public preview allow model developers to use the Snowpark ML Modeling API to create and train models, store them in model registries within their accounts, and securely distribute and make money from them on the Snowflake Marketplace.
Image: theCUBE Research/DALL-E 3
A message from John Furrier, co-founder of SiliconANGLE:
Your vote of support is important to us and it helps us keep the content FREE.
One click below supports our mission to provide free, deep, and relevant content.
Join our community on YouTube
Join the community that includes more than 15,000 #CubeAlumni experts, including Amazon.com CEO Andy Jassy, Dell Technologies founder and CEO Michael Dell, Intel CEO Pat Gelsinger, and many more luminaries and experts.
THANK YOU