UPDATED 17:38 EDT / NOVEMBER 29 2023

Amazon Bedrock receives new AI evaluation tool and more foundation models

Amazon Web Services Inc. is rolling out a new tool that will make it easier for developers to compare the foundation models in its Amazon Bedrock service.

The tool was announced today at AWS re:Invent 2023. It’s rolling out to Bedrock alongside three new foundation models: Anthropic PBC’s Claude 2.1, the open-source Llama 2 model from Meta Platforms Inc. and Stability AI Ltd.’s latest Stable Diffusion XL 1.0 image generator. All three neural networks are now available via the Bedrock application programming interface.
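For readers who want a sense of what calling one of these models through the Bedrock API involves, the following is a minimal sketch rather than official AWS sample code. It assumes the boto3 SDK, the `bedrock-runtime` client's `invoke_model` call, the model identifier `anthropic.claude-v2:1` for Claude 2.1 and Anthropic's Human/Assistant text-completion request format. The helper names `build_claude_body` and `ask_claude` are made up for illustration, and the identifier and request fields should be checked against the Bedrock console before use.

```python
import json

# Assumed identifier for Claude 2.1 on Bedrock; verify it in the Bedrock console.
CLAUDE_2_1_MODEL_ID = "anthropic.claude-v2:1"


def build_claude_body(question: str, max_tokens: int = 300) -> str:
    """Serialize a prompt in the Human/Assistant text-completion format (assumed)."""
    return json.dumps({
        "prompt": f"\n\nHuman: {question}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
    })


def ask_claude(question: str, region: str = "us-east-1") -> str:
    """Send the prompt to Bedrock. Needs boto3, AWS credentials and model access."""
    import boto3  # imported lazily so the payload helper works without the SDK

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId=CLAUDE_2_1_MODEL_ID,
        body=build_claude_body(question),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream containing a JSON document.
    return json.loads(response["body"].read())["completion"]
```

Only `ask_claude` touches the network. `build_claude_body` can be run on its own to inspect the request that would be sent.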

To build an AI application, developers typically shortlist several neural networks that could be used in the project and compare them to find the most suitable one. Such assessments often require a significant amount of custom code. Model Evaluation on Amazon Bedrock, the new AI evaluation tool AWS debuted today, makes it possible to compare the foundation models in Bedrock with significantly less manual work.

To use the tool, developers must select one of the models in Bedrock and specify the task for which they plan to use it. In the next step of the workflow, they upload a dataset on which the specified task will be performed. For example, a software team building a customer support chatbot could upload a dataset containing help desk tickets. 

After ingesting the uploaded records, the tool automatically measures how the Bedrock model being evaluated performs across a set of predefined metrics. Those metrics include accuracy, toxicity and robustness, which is a measure of how consistently a model maintains its accuracy in different situations. 
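To illustrate the kind of custom comparison code such a tool is meant to replace, here is a toy sketch that scores two stand-in "models" on a small help desk dataset. It is not how Model Evaluation on Amazon Bedrock works internally. The classifiers, the dataset, the upper-casing perturbation and the "accuracy drop" robustness measure are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]      # hypothetical stand-in for a hosted model
Dataset = List[Tuple[str, str]]   # (ticket text, expected category)


def accuracy(model: Model, dataset: Dataset) -> float:
    """Fraction of tickets for which the model returns the expected category."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    hits = sum(1 for text, expected in dataset if model(text) == expected)
    return hits / len(dataset)


def perturb(text: str) -> str:
    """Toy perturbation: upper-case the ticket to mimic noisy user input."""
    return text.upper()


def evaluate(models: Dict[str, Model], dataset: Dataset) -> Dict[str, Dict[str, float]]:
    """Score every model on clean and perturbed copies of the dataset."""
    noisy = [(perturb(text), expected) for text, expected in dataset]
    report: Dict[str, Dict[str, float]] = {}
    for name, model in models.items():
        clean_acc = accuracy(model, dataset)
        noisy_acc = accuracy(model, noisy)
        report[name] = {
            "accuracy": clean_acc,
            "perturbed_accuracy": noisy_acc,
            # Smaller drop means the model holds its accuracy better.
            "accuracy_drop": clean_acc - noisy_acc,
        }
    return report


def keyword_model(text: str) -> str:
    """Toy classifier that ignores letter case."""
    return "billing" if "refund" in text.lower() else "technical"


def case_sensitive_model(text: str) -> str:
    """Toy classifier that breaks when the input casing changes."""
    return "billing" if "refund" in text else "technical"


MODELS: Dict[str, Model] = {
    "keyword": keyword_model,
    "case_sensitive": case_sensitive_model,
}
DATASET: Dataset = [
    ("I want a refund for my order", "billing"),
    ("My app crashes on login", "technical"),
    ("Please refund the duplicate charge", "billing"),
    ("The page will not load", "technical"),
]

for name, scores in evaluate(MODELS, DATASET).items():
    print(name, scores)
```

Both toy models are equally accurate on the clean tickets, but only one keeps that accuracy once the input is perturbed, which is the gap a robustness metric is designed to expose.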

“With automatic model evaluation, you can bring your own data or use built-in, curated datasets and pre-defined metrics for specific tasks such as content summarization, question and answering, text classification, and text generation,” Antje Barth, AWS’ principal developer advocate for generative AI, detailed in a blog post.

Some AI projects require developers to assess AI models based on metrics that AWS’ new tool doesn’t support out of the box. For such situations, the tool provides the ability to launch so-called human evaluation workflows. Those are evaluations carried out by an AWS-managed team or a company’s own employees. 

New foundation models 

Companies using Bedrock are also receiving access to three new foundation models. Two focus primarily on text processing tasks, while the third is an open-source image generator.

The first addition is Claude 2.1, the latest version of startup Anthropic’s ChatGPT rival. It’s a general-purpose language model optimized for text and code generation tasks. Compared with its predecessor, Claude 2.1 is 30% less likely to generate inaccurate answers, an improvement attributed partly to a 50% decrease in hallucination rates.

The model includes a number of other enhancements as well. Most notably, it can access external applications via their APIs and perform simple actions in those applications, as well as retrieve data. Additionally, Claude 2.1 has a significantly larger context window than its predecessors, which means users can include more information in prompts.

The model is rolling out to Bedrock alongside two open-source neural networks.

The first is the 70 billion-parameter version of Meta’s Llama 2 model, which became available in July. Like Claude, it’s a general-purpose large language model that can generate text and code. Meta trained Llama 2 on 2 trillion tokens, units of data that each comprise a few characters or digits.

As part of today’s update, Bedrock customers are also receiving access to Stable Diffusion XL 1.0, Stability AI’s latest image generation model. It can generate images with a resolution of 1024 by 1024 pixels in a variety of styles. Compared with Stability AI’s earlier models, Stable Diffusion XL 1.0 provides better contrast, lighting and shadows.

The new foundation models and the Model Evaluation on Amazon Bedrock tool are initially available in AWS’ Northern Virginia and Oregon cloud regions. 
