Prompt Caching with Claude
Anthropic has launched prompt caching for its Claude API, significantly reducing costs and latency. It is currently in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, with Claude 3 Opus support coming soon.
Anthropic has introduced prompt caching for its Claude API, allowing developers to store frequently used context between API calls. The feature reduces costs by up to 90% and latency by up to 85% for long prompts. Currently in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, with Claude 3 Opus support expected soon, prompt caching is particularly beneficial for conversational agents, coding assistants, large-document processing, and detailed instruction sets.

By caching context, users can significantly improve response times and cut the cost of repeated API calls, and early adopters report notable speed and cost improvements across a range of use cases. The pricing structure reflects the trade-off: writing to the cache costs more than a standard input, but reading from it costs significantly less. Notion is one of the companies using the feature, making its AI assistant, Notion AI, more efficient. Developers interested in prompt caching can access the public beta through Anthropic's documentation and pricing page.
- Prompt caching reduces costs by up to 90% and latency by up to 85%.
- Currently available for Claude 3.5 Sonnet and Claude 3 Haiku, with Claude 3 Opus support coming soon.
- Effective for conversational agents, coding assistants, and large document processing.
- Pricing includes higher costs for cache writing and lower costs for cache reading.
- Notion is implementing prompt caching to improve its AI assistant's performance.
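In practice, caching is opt-in per content block. The sketch below builds a Messages API request payload in which a large, reused document is marked with a `cache_control` field, as described in the beta documentation; the model name, header, and exact field shapes are from the beta at launch and may change, and `LARGE_DOCUMENT` here is a stand-in for real context.

```python
# Sketch of a prompt-caching request payload for the Anthropic Messages API
# (public beta at time of writing; shapes follow the beta docs and may change).
# Large, reused context blocks are marked with cache_control so subsequent
# calls can read them from the cache at the reduced rate.

LARGE_DOCUMENT = "placeholder for many thousands of tokens of reference text"

def build_cached_request(user_question: str) -> dict:
    """Build a Messages API payload whose bulky system context is cacheable."""
    return {
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": "Answer questions about the document below."},
            {
                "type": "text",
                "text": LARGE_DOCUMENT,
                # Marks a cache breakpoint: the first call pays the (higher)
                # cache-write price; later calls pay the much lower read price.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [{"role": "user", "content": user_question}],
    }

payload = build_cached_request("Summarize section 2.")
```

When sent through the beta, such a payload would also carry the prompt-caching beta header; only the marked block is cached, so per-turn content like the user question stays outside the cached prefix.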
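The write-versus-read pricing split means caching pays for itself after only a couple of reuses. A back-of-envelope sketch, using the launch-time beta figures for Claude 3.5 Sonnet ($3.00 per million input tokens, cache writes roughly 25% more, cache reads roughly 10% of base; these numbers may change):

```python
# Back-of-envelope cost comparison for a repeatedly reused 100K-token prompt.
# Prices are illustrative (Claude 3.5 Sonnet beta launch pricing) and may change.

BASE_INPUT = 3.00 / 1_000_000      # $ per input token
CACHE_WRITE = BASE_INPUT * 1.25    # first call writes the cache (premium)
CACHE_READ = BASE_INPUT * 0.10     # subsequent calls read from the cache

def cost_without_cache(tokens: int, calls: int) -> float:
    # Every call pays full input price for the whole prompt.
    return tokens * BASE_INPUT * calls

def cost_with_cache(tokens: int, calls: int) -> float:
    # One cache write, then (calls - 1) cheap cache reads.
    return tokens * (CACHE_WRITE + CACHE_READ * (calls - 1))

tokens, calls = 100_000, 10
saving = 1 - cost_with_cache(tokens, calls) / cost_without_cache(tokens, calls)
# At 10 calls the saving is already large, and it approaches the quoted
# "up to 90%" figure as the number of reads grows.
```

With these assumed prices, ten calls over a 100K-token prompt cost $3.00 uncached versus roughly $0.65 cached, a saving of nearly 80%.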
Related
Claude 3.5 Sonnet
Claude 3.5 Sonnet, the latest in the model family, excels in customer support, coding, and humor comprehension. It introduces Artifacts on Claude.ai for real-time interactions, prioritizing safety and privacy. Future plans include Claude 3.5 Haiku and Opus, emphasizing user feedback for continuous improvement.
Improving Tiptap's Performance for Anthropic's Claude Interface
Tiptap's Philip reported performance issues with Anthropic's claude.ai using their editor. Tiptap released version 2.5 to improve performance by reducing unnecessary re-renders and optimizing content conversion.
(my previous rants on why we need this: https://news.ycombinator.com/item?id=40034972#40036309 )
Now we just need to merge Anthropic's cache pricing with Google's context length (their cache pricing isn't as good) and OpenAI's strict structured output mode...
Google set the bar and it’s cool to see the other model companies following suit.
Building AI apps for production use cases has never been easier, and the trend doesn't appear to be slowing down…