UPDATED 12:00 EDT / AUGUST 14 2024

Anthropic speeds up its AI model access times with prompt caching

Artificial intelligence startup Anthropic PBC, the creator of the generative AI chatbot Claude, today announced the launch of prompt caching, a new feature that improves the response times of its large language models by allowing developers to pass them longer, more detailed prompts without reprocessing that material on every request.

Traditionally, AI engineers must build complex “prompts” — blocks of natural-language context for the model to work from — and send them in full every time they want a response. A prompt can be as simple as “What is today’s weather like?” or as long and complex as an entire document.

When a user wants an LLM to answer questions about a large document, for example, that document needs to be part of the prompt for every subsequent conversation. That means it must be reloaded in its entirety each time, which can eat up a lot of resources as the model slowly ingests it all over again between conversations.

Cached prompts also allow developers to store detailed instructions, example responses and relevant background information. That makes it easy to produce consistent responses across separate instances of the chatbot without injecting the same material on top of the user prompt every time. Anthropic said prompt caching is most effective when a large amount of prompt context is sent once and then referred to repeatedly in subsequent requests.
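For developers, the mechanics are fairly straightforward. The Python sketch below shows roughly what a cached request could look like: a long document in the system prompt is marked as cacheable so follow-up requests can reuse it. The SDK call shape, beta header value, model identifier and file name are assumptions drawn from Anthropic's launch-era developer documentation, not details stated in this announcement.

```python
# Sketch: cache a long document once, then reuse it across requests.
# Assumes the Anthropic Python SDK and the prompt-caching beta header;
# the header value and cache_control shape follow Anthropic's docs at launch.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

long_document = open("annual_report.txt").read()  # hypothetical large document

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You answer questions about the attached report."},
        {
            "type": "text",
            "text": long_document,
            # Marks this block as cacheable so later requests with the same
            # prefix can reuse it instead of re-processing every token.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Summarize the key risks."}],
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)
print(response.content[0].text)
```

Any later request that starts with the same system prompt and document can hit the cache instead of paying to process the full context again.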

That’s because prompts are made up of tokens — chunks of text the LLM must process — and adding more of them to every request slows down response times. According to Anthropic, prompt caching can reduce overall costs for businesses and developers by up to 90% and improve response times by up to two times.
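The savings claim is easiest to see with a rough back-of-the-envelope calculation. The per-token rates below are illustrative assumptions, not Anthropic's published pricing: cached tokens are assumed to be billed at a steep discount, with a small one-time premium to write the cache.

```python
# Back-of-the-envelope illustration of where the savings come from.
# All rates are assumed for illustration only, in dollars per million tokens.
INPUT_RATE = 3.00          # normal input tokens (assumed)
CACHE_WRITE_RATE = 3.75    # tokens written to the cache once (assumed)
CACHE_READ_RATE = 0.30     # cached tokens read back later (assumed)

context_tokens = 100_000   # long document shared by every request
question_tokens = 200      # fresh user question per request
requests = 50              # conversations reusing the same document

without_cache = (context_tokens + question_tokens) * requests * INPUT_RATE / 1e6
with_cache = (
    context_tokens * CACHE_WRITE_RATE / 1e6                    # first request writes the cache
    + (requests - 1) * context_tokens * CACHE_READ_RATE / 1e6  # later requests read it cheaply
    + requests * question_tokens * INPUT_RATE / 1e6            # questions are always full price
)

print(f"without caching: ${without_cache:.2f}")
print(f"with caching:    ${with_cache:.2f}")
print(f"savings:         {1 - with_cache / without_cache:.0%}")
```

Under these assumed numbers the shared document accounts for nearly all of the input bill, so reading it from the cache instead of reprocessing it cuts costs by roughly 87%, in the same ballpark as the figure Anthropic cites.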

Use cases for this type of caching capability include large document processing, as mentioned above: business users can incorporate the same long-form material into multiple conversations without reloading it, so latency and costs don't climb with each exchange. Detailed instruction sets can likewise be defined once and shared across every conversation, fine-tuning Claude's responses without incurring repeated costs.

Prompt caching also has a powerful use case for enhancing the performance of AI agents, where the LLM needs to make multiple calls to third-party tools, execute iterative code changes and step through complex instructions.
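A hedged sketch of what that might look like: an agent loop that marks its tool definitions as cacheable so each tool-calling turn re-sends the shared prefix cheaply. The tool, the run_tool stub and the stop condition are hypothetical, and the cache_control placement should be checked against Anthropic's documentation rather than taken from this announcement.

```python
# Sketch of an agent loop that caches the shared prefix (instructions plus
# tool definitions) so repeated tool-calling turns reuse it instead of
# re-processing it in full each time.
import anthropic

client = anthropic.Anthropic()


def run_tool(name, args):
    # Stub standing in for a real tool call (hypothetical helper).
    return f"result of {name}({args})"


tools = [
    {
        "name": "search_orders",  # hypothetical tool
        "description": "Look up orders by customer id.",
        "input_schema": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
        },
        # Marking the last tool definition as cacheable caches the prefix
        # up to this point (assumed behavior; verify against the docs).
        "cache_control": {"type": "ephemeral"},
    }
]

messages = [{"role": "user", "content": "Find the latest order for customer 42."}]

while True:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        system="You are an order-lookup agent. Use the tools when needed.",
        tools=tools,
        messages=messages,
        extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    )
    if response.stop_reason != "tool_use":
        break
    # Append the assistant turn and the tool result, then loop; only the new
    # tokens at the end of the conversation are processed at full price.
    messages.append({"role": "assistant", "content": response.content})
    tool_use = next(b for b in response.content if b.type == "tool_use")
    messages.append({
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use.id,
            "content": run_tool(tool_use.name, tool_use.input),
        }],
    })
```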

The feature is rolling out in beta today on the Anthropic application programming interface for Claude 3.5 Sonnet, the company's most powerful multimodal model, and the high-speed Claude 3 Haiku.

Image: Anthropic
