August 14th, 2024

Prompt Caching with Claude

Anthropic has launched prompt caching for its Claude API, cutting costs by up to 90% and latency by up to 85% on long prompts. Currently in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, with Claude 3 Opus support upcoming.


Anthropic has introduced prompt caching for its Claude API, allowing developers to store frequently used context between API calls. Caching reduces costs by up to 90% and latency by up to 85% for long prompts. The feature is in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, with Claude 3 Opus support expected soon. It is particularly beneficial for conversational agents, coding assistants, large document processing, and detailed instruction sets, where the same lengthy context is sent with every request. Early adopters report notable improvements in speed and cost efficiency across a range of use cases. Under the beta pricing, writing to the cache costs more than a standard input token, while reading from it costs significantly less. Notion is among the companies using the feature, applying it to its AI assistant, Notion AI, for a faster and cheaper user experience. Developers can join the public beta through Anthropic's documentation and pricing page.
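As a rough illustration of how this works in practice, here is a minimal sketch based on the beta documentation: the beta is enabled with the anthropic-beta: prompt-caching-2024-07-31 request header, and the large, stable portion of the prompt is marked with a cache_control block. The file name and question are placeholders.

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Placeholder: the large, stable context to cache (a codebase, manual, etc.).
    large_document = open("reference.txt").read()

    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        # Opt in to the prompt-caching beta.
        extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
        system=[
            {"type": "text", "text": "Answer questions about the document below."},
            {
                "type": "text",
                "text": large_document,
                # The prompt prefix up to and including this block is cached;
                # later calls with an identical prefix read it from the cache
                # instead of reprocessing it.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        messages=[{"role": "user", "content": "Summarize the key points."}],
    )
    print(response.content[0].text)

The first call pays the higher cache-write rate for the marked prefix; subsequent calls within the cache's lifetime pay the much lower cache-read rate for it.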

- Prompt caching reduces costs by up to 90% and latency by up to 85%.

- Currently available for Claude 3.5 Sonnet and Claude 3 Haiku, with Claude 3 Opus support coming soon.

- Effective for conversational agents, coding assistants, and large document processing.

- Pricing is asymmetric: writing to the cache costs more than a base input token, while reading from it costs far less (see the worked example after this list).

- Notion is implementing prompt caching to improve its AI assistant's performance.
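As a back-of-the-envelope illustration of that pricing, the sketch below uses Claude 3.5 Sonnet's announced beta rates ($3.00 per million input tokens base, $3.75 per million to write to the cache, $0.30 per million to read from it); the prompt size and call count are hypothetical:

    # Hypothetical workload: a 100,000-token prompt reused across 50 calls.
    BASE, WRITE, READ = 3.00, 3.75, 0.30  # $ per million input tokens (beta rates)
    prompt_mtok = 100_000 / 1_000_000     # prompt size in millions of tokens
    calls = 50

    without_cache = calls * prompt_mtok * BASE
    with_cache = prompt_mtok * WRITE + (calls - 1) * prompt_mtok * READ

    print(f"without caching: ${without_cache:.2f}")  # $15.00
    print(f"with caching:    ${with_cache:.2f}")     # ~$1.85
    print(f"saved:           {1 - with_cache / without_cache:.0%}")  # ~88%

Under these rates the write premium is recovered almost immediately: caching already pays for itself on the second call.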

3 comments
By @waldrews - 3 months
I'm very happy this is now available. With a model as smart as Claude 3.5, this opens up the possibility of writing prompts the size of relatively complex programs, prompts with many examples and targeting multiple outputs, and running them repeatedly on different datasets (big code, small data), at a reasonable cost. For example, a classifier with a hundred class labels described by prompts? A questionnaire that turns a piece of unstructured data into a long list of features? A complex custom assistant, with a long prompt listing an individual user's available tools and preferences?

(my previous rants on why we need this: https://news.ycombinator.com/item?id=40034972#40036309 )

Now we just need to merge Anthropic's cache pricing with Google's context length (their cache pricing isn't as good) and OpenAI's strict structured output mode...
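Below is a minimal sketch of the "big code, small data" pattern this comment describes, using the same beta header and cache_control mechanics as the example above; the classifier prompt and file names are hypothetical:

    import anthropic

    client = anthropic.Anthropic()

    # Hypothetical long prompt: instructions plus a hundred labeled classes.
    classifier_prompt = open("hundred_label_classifier_prompt.txt").read()

    def classify(text: str) -> str:
        # The long prompt is written to the cache once, then read cheaply
        # on every subsequent call while the cache entry stays warm.
        response = client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=16,
            extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
            system=[
                {
                    "type": "text",
                    "text": classifier_prompt,
                    "cache_control": {"type": "ephemeral"},
                }
            ],
            messages=[{"role": "user", "content": text}],
        )
        return response.content[0].text.strip()

    # Small data: run the big cached prompt over many small records.
    labels = [classify(line) for line in open("records.txt")]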

By @thierrydamiba - 3 months
This is amazing! You could easily have your RAG app return the top 100 results now. You could probably do 1000 with no issues.

Google set the bar and it’s cool to see the other model companies following suit.

Building AI apps for production use cases has never been easier, and the trend doesn’t appear to be slowing down…