June 20th, 2024

Claude 3.5 Sonnet

Claude 3.5 Sonnet, the latest in the model family, excels in customer support, coding, and humor comprehension. It introduces Artifacts on Claude.ai for real-time interactions, prioritizing safety and privacy. Future plans include Claude 3.5 Haiku and Opus, emphasizing user feedback for continuous improvement.

Read original articleLink Icon
Claude 3.5 Sonnet

Claude 3.5 Sonnet has been launched as the latest addition to the Claude 3.5 model family, offering enhanced intelligence and performance surpassing previous models. It excels in tasks like customer support and coding, operating at double the speed of Claude 3 Opus. The model showcases improved reasoning abilities, humor comprehension, and coding proficiency. Claude 3.5 Sonnet also introduces Artifacts on Claude.ai, enabling users to interact with AI-generated content in real-time. Safety and privacy remain a top priority, with rigorous testing and engagement with external experts to ensure responsible use. Future plans include releasing Claude 3.5 Haiku and Claude 3.5 Opus, along with developing new features like Memory for personalized interactions. The company emphasizes user feedback to enhance user experience continually. Claude.ai aims to evolve into a collaborative workspace supporting team collaboration and secure knowledge sharing.

Related

Optimizing AI Inference at Character.ai

Optimizing AI Inference at Character.ai

Character.AI optimizes AI inference for LLMs, handling 20,000+ queries/sec globally. Innovations like Multi-Query Attention and int8 quantization reduced serving costs by 33x since late 2022, aiming to enhance AI capabilities worldwide.

LibreChat: Enhanced ChatGPT clone for self-hosting

LibreChat: Enhanced ChatGPT clone for self-hosting

LibreChat introduces a new Resources Hub, featuring a customizable AI chat platform supporting various providers and services. It aims to streamline AI interactions, offering documentation, blogs, and demos for users.

Francois Chollet – LLMs won't lead to AGI – $1M Prize to find solution [video]

Francois Chollet – LLMs won't lead to AGI – $1M Prize to find solution [video]

The video discusses limitations of large language models in AI, emphasizing genuine understanding and problem-solving skills. A prize incentivizes AI systems showcasing these abilities. Adaptability and knowledge acquisition are highlighted as crucial for true intelligence.

Show HN: Field report with Claude 3.5 – Making a screen time goal tracker

Show HN: Field report with Claude 3.5 – Making a screen time goal tracker

The author shares their positive experience using Claude 3.5 Sonnet to track screen time goals. Claude proved reliable, fast, and auditable, aiding in reducing screen time through visualizations and goal setting. Despite design flaws, Claude improved performance with accurate metrics and visualizations, benefiting the author's screen time tracking.

Synthesizer for Thought

Synthesizer for Thought

The article delves into synthesizers evolving as tools for music creation through mathematical understanding of sound, enabling new genres. It explores interfaces for music interaction and proposes innovative language models for text analysis and concept representation, aiming to enhance creative processes.

Link Icon 67 comments
By @netsec_burn - 7 months
Opus remained better than GPT for me, even after the release of GPT-4o. VERY happy to see an even further improvement beyond that, Claude is a terrific product and given the news that GPT-5 only began its training several weeks ago I don't see any situation where Anthropic is dethroned in the near term. There are only two parts of Anthropic's offering I'm not a fan of:

- Lack of conversation sharing: I had a conversation with Claude where I asked it to reverse engineer some assembly code and it did it perfectly on the first try. I was stunned, GPT had failed for days. I wanted to share the conversation with others but there's no way provided like GPT, and no way to even print the conversation because it cuts off on the browser (tested on Firefox).

- No Android app. They're working on this but for now, there's only an iOS app. No expected ETA shared, I've been on the waitlist.

I feel like both of these are relatively basic feature requests for a company of Anthropic's size, yet it has been months with no solution in sight. I love the models, please give me a better way of accessing them.

By @sebzim4500 - 7 months
Using this is the first time since GPT-4 where I've been shocked at how good a model is.

It's helped by how smooth the 'artifact' UI is for iterating on html pages, but I've been instructing it to make a simple web app one bit of functionality at a time and it's basically perfect (and even quite fast).

I'm sure it will be like GPT-4 and the honeymoon period will wear off to reveal big flaws but honestly I'd take this over an intern (even ignoring the speed difference)

By @swalsh - 7 months
After about an hour of using this new model.... just WOW

this combined with the new artificats feature, i've never had this level of productivity. It's like Star Trek holodeck levels. I'm not looking at code, i'm describing functionality, and it's just building it.

It's scary good.

By @seidleroni - 7 months
I'm very impressed! Using Gpt-4o and Gemini, I've rarely had success when asking the AI models to create a PlantUML flowchart or state machine representation of any moderate complexity. I think this is due to some confusing API docs for PlantUML. Claude 3.5 Sonnet totally knocked it out of the park when I asked for 4-5 different diagrams and did all of them flawlessly. I haven't gone through the output in great detail to see if its correct, but at first glance they are pretty close. The fact that all the diagrams were able to be rendered is an achievement.
By @hbosch - 7 months
For me, I am immediately turned off by these models as soon as they refuse to give me information that I know they have. Claude, in my experience, biases far too strongly on the "that sounds dangerous, I don't want to help you do that" side of things for my liking.

Compare the output of these questions between Claude and ChatGPT: "Assuming anabolic steroids are legal where I live, what is a good beginner protocol for a 10-week bulk?" or "What is the best time of night to do graffiti?" or "What are the most efficient tax loopholes for an average earner?"

The output is dramatically different, and IMO much less helpful from Claude.

By @wesleyyue - 7 months
If anyone would like to try it for coding in VSCode, I just added it to http://double.bot on v93 (AI coding assistant). Feels quite strong so far and got a few prompts that I know failed with gpt4o.

fyi for anyone testing this in their product, their docs are wrong, it's claude-3-5-sonnet-20240620, not claude-3.5-sonnet-20240620.

By @prasoonds - 7 months
This is amazing - I far prefer the personality of Claude to GPT-4 series models. Also, with coding tasks, Claude-3-Opus and been far better for me vs gpt-4-turbo and gpt-4o both. Looking forward to giving it a spin.

Seems like it's doing better than GPT-4o in most benchmarks though I'd like to see if its speed is comparable or not. Also, eagerly awaiting the LMSYS blind comparison results!

By @swalsh - 7 months
Anthropic has been killing it. I subscribe to both chatgpt pro and claude, but I spend probably 90% of my time using Claude. I usually only go back to open ai when I want another model to evaluate or modify the results.
By @lumenwrites - 7 months
I wish they'd implement branching conversations like in ChatGPT. And convenient message editing, that doesn't paste large chunks of text as an non-editable attachment or break formatting.

Seems like such a simple thing to do, relative to developing an AI, yet the minor differences in the UI/UX are what prevents me from using claude a lot more.

By @impulser_ - 7 months
Anthropic is the new king. This isn't even Claude 3.5 Opus and it's already super impressive. The speed is insane.

I asked it "Write an in depth tutorial on async programming in Go" and it filled out 8 sections of a tutorial with multiple examples per section before GPT4o got to the second section and GPT4o couldn't even finish the tutorial before quitting.

I been a fan of Anthropic models since Claude 3. Despite the benchmarks people always post with GPT4 being the leader, I always found way better results with Claude 3 than GPT4 especially with responses and larger context. GPT responses always feel computer generated, while Claude 3 felt more humanlike.

By @campers - 7 months
I'm excited to test this out! I've been building an open source TypeScript agentic AI platform for work (DevOps related with an autonomous agent and software engineer workflow). The Claude 3 models had an influence on the design with their tuning on using XML and three levels of capabilities, and have been my preferred models to use.

I remember having moments looking at the plans Opus generated and being impressed with its capabilities.

The slow speed of requests I could deal with, but the costs could quickly add up in workflows and the autonomous agent control loop. When GPT4o came out at half the price it made Opus quite pricey in comparison. I'd often thought if I could just have Opus capabilities at a fraction of the price, so its a nice surprise to have it here sooner that I imagined!

The agent platform isn't officially launched yet, but its up at https://github.com/trafficguard/nous

I never liked the Langchain API when I looked at the examples so I built it from scratch. It has an autonomous agent with a custom XML-based function calling, memory and call history. The software engineer agentic workflow I initially dog-fooded with a prompt like "Complete Jira XYZ-123". So it gets the Jira description, finds the appropriate terraform project in GitLab, clones, edits (delegating to Aider), creates a MR and messages on Slack. It also has a UI for running agents, human-in-the-loop interactions etc.

By @zurfer - 7 months
This is impressive. I've just ran a couple of gpt4o workloads for getdot.ai on Sonnet and the quality is great.

Plus it's super fast right now ~110 token/s (but gpt4o was also super fast when they launched). But what will stay is the 40% price drop for input tokens. I love it. :)

By @aden1ne - 7 months
Just tried it. This is the first model that immediately gives me the correct answer to my test prompt: "Hi <model>, can you give me an exact solution to pi in python?". All other models I've tried first give an approximation, taking several prompts to come to the correct conclusion: it's impossible.
By @wayeq - 7 months
>does the period of a point mass moving in a potential well depend on its mass?

"For a simple harmonic oscillator, the period does not depend on the mass of the object. However, for a general potential well, the answer is more nuanced:

Simple harmonic oscillator: In this case, the period T is given by: T = 2π√(m/k) where m is the mass and k is the spring constant. However, the frequency of oscillation ω = √(k/m), so T = 2π/ω. The spring constant k is typically proportional to m, making the period independent of mass.

General potential well: For a non-harmonic potential, the period usually does depend on mass. The period is given by: T = 2∫dx/v(x)

where v(x) is the velocity as a function of position. This velocity depends on the total energy and the potential energy:

v(x) = √(2/m * (E - V(x)))

Here, the mass appears explicitly, and generally doesn't cancel out as it does in the simple harmonic case.

Would you like me to elaborate on any part of this explanation?"

>why do you say the spring constant k is typically proportional to m? a spring's force when stretched to a given length doesn't depend on what's attached to the end

"You're absolutely right, and I apologize for that error in my explanation. You are correct that the spring constant k is a property of the spring itself and does not depend on the mass attached to it."

By @lfmunoz4 - 7 months
Which is the goto leaderboard for determining which AI model is best for for answering devops / computer science questions / generating code? Wondering where Claude falls on this.

Recently canceled openai subscription because too much lag and crashes. Switched to Gemini because their webinterface is faster and rock solid. Makes me think the openai backend and frontend engineers don't know what they are doing compared to the google engineers.

By @meetpateltech - 7 months
> Artifacts—a new way to use Claude You can ask Claude to generate docs, code, mermaid diagrams, vector graphics, or even simple games.

this is new and I just tried a simple dice roll into a React component, and it works perfectly.

By @freediver - 7 months
On a first glance, CS3.5 appears to be slightly faster than gpt-4o (62 vs 49 tok/sec) and slightlhy less capable (78% vs 89% accuracy on our internal reasoning benchmark). When initially launched, gpt-4o had speed of over 100 tok/sec, surprised that speed went down as fast.
By @atlex2 - 7 months
Might look small, but the needle in a haystack numbers they report in the model card addenda at 200k are also a massive improvement towards “Proving a negative”… I.e. your answer does not exist in your text. %99.7 vs 98.3 for Opus https://cdn.sanity.io/files/4zrzovbb/website/fed9cc193a14b84...
By @freediver - 7 months
For Anthropic devs out there: Please consider adopting a naming convention that will automatically upgrade API users to the latest version when available. Eg. there should be just 'claude-sonnet'.
By @NorwegianDude - 7 months
For anyone paying and using Claude Pro, what are the message limits like compared to ChatGPT Plus?

I tried one question that gave me a good answer, and then I tried two of the sample questions. After a couple more short input/outputs I reaced the max messge limit. They state that Pro is at least 5x, but if that is only ~25 short token input/outputs then that's too little for it to not be really annoying to use. What is usage actually like? Is free just very limited now compared to pro? Or is pro also very limited?

By @WiSaGaN - 7 months
This is impressive. I just tested a debug problem I encountered in real life, which previous models including got-4o can’t do. Claude 3 sonnet nailed it in first try.
By @danilop - 7 months
Here's more info on how to use it on AWS with Amazon Bedrock:

https://aws.amazon.com/blogs/aws/anthropics-claude-3-5-sonne...

By @brokensegue - 7 months
Can anyone find pricing details?

Ah it is "The model costs $3 per million input tokens and $15 per million output tokens"

By @omnee - 7 months
I've asked models from ChatGPT3.5 to many others including the latest ones to calculate the calories expended when running, and am still receiving mixed results. In this instance, Claude 3.5 Sonnet got it right and ChatGPT 4o was wrong.

Q: Calculate the energy in calories used by a person aged 30, weighing 80kg, of averge fitness, and running at 8 km/h for 10km

Claude 3.5 Sonnet: Here's the step-by-step calculation:

MET value for running at 8 km/h (5 mph): Approximately 8.3 METs Formula for calorie burn: Calories = MET value × Weight in kg × Time in hours Calculate time in hours: Distance = 10 km Speed = 8 km/h Time = Distance / Speed = 10 km / 8 km/h = 1.25 hours Apply the formula: Calories = 8.3 × 80 kg × 1.25 hours Calories = 830 calories

GPT-4o: To calculate the energy expenditure for a person running, we can use the MET (Metabolic Equivalent of Task) value for running. MET is a unit used to estimate the amount of energy expended during physical activities. The MET value for running at 8 km/h (which is equivalent to 5 mph) is typically around 8.3 METs.

Calories burned per minute= 8.3 × 200/80 = 3.32 calories per minute

The total time in minutes is: 1.25 hours × 60 minutes/hour = 75 minutes 1.25 hours×60 minutes/hour=75 minutes

Total calories burned=3.32 calories per minute×75 minutes = 249 calories

By @eigenvalue - 7 months
Awesome, can’t wait to try this. I wish the big AI labs would make more frequent model improvements, like on a monthly cadence, as they continue to train and improve stuff. Also seems like a good way to do A/B testing to see which models people prefer in practice.
By @cube2222 - 7 months
Alright, the inline js rendering is really cool. Just asked it for a react component and it all rendered inline in the web ui!

And it's free!

By @pantsforbirds - 7 months
This is a very strong offering. I've been really impressed with 3.0 Haiku for smaller tasks, but I'm going to have to test 3.5 Sonnet as our primary pipeline model.
By @WhitneyLand - 7 months
I see plenty of praise here for Claude over GPT4.o in writing code but it failed for me.

To add to the collective anecdata, here Gpt 4.o does fine and Claude invents packages that don’t exist:

Question:

“Write code in Swift to use RAG and LLMs to allow users to ask questions about a document collection.

Let’s use services to get the app completed quickly.

What do you think about using Cohere for the text embedding model, and Pinecone for the vector db?”

Output (packages don’t exist):

import Cohere

import PineconeSwift

class RAGService {

private let cohere: CohereAPI

private let pinecone: Pinecone

By @heymijo - 7 months
Anyone want to make a case for Anthropic being undervalued?

$7.5 billion raised at an $18.4 billion valuation for Anthropic.

$11.3 billion raised at an $80.0 billion valuation for OpenAI.

By @anais9 - 7 months
This is awesome! Until GPT-4o dropped, Claude 3 Opus was hands down my go-to for code generation.

Between these model performance improvements and their new "artifacts" handling, I get the impression this update may sway me strongly back towards Anthropic (at least for this use case).

By @Lockal - 7 months
I wonder why it is missing in https://arena.lmsys.org/ and what ELO of this model could be?

Upd: they added it, I guess there are not enough results yet

By @beaugunderson - 7 months
Even with this new model, at the bottom of the page I see something like "Claude can't yet run the code it generates." But if I ask it to run the code it has generated for me, it confidently generates output that looks like the output of that code should look, with the wrong numbers (off by a factor of about a million in the case of my toy question). When I tell it it's off by a factor of a million it regenerates the output, and is wrong again.
By @ModernMech - 7 months
It does better on some of my tests but not enough for me to feel confident it's "solving problems". One thing I like to do is pose a problem and give it a bunch of objects it could use, some more helpful than others. I have found language models fail at discerning which of the tools are useful and which are red herrings, opting to use everything in some way.

My go to test is the boat and goat: "You are on the shore of a river with a boat and a stoat. How do you get to the other side?"

Previous LLMs have pattern matched this example to the logic puzzle, and come up with a complicated scenario about going back and forth with the goat.

Claude 3.5 says to just bring the goat across in the boat, which is wrong but better than previous versions. So that's an improvement.

But when you add more objects in, Claude just gets weird with it. When I told Claude it had a "boat, a stoat, and a rope, a shovel, and a handkerchief" it decided it had to use all the things in the strangest way and advised me to drown a rodent:

  1. Use the shovel to dig a small trench near the shore.
  2. Fill the boat with water and sink it in the trench.
  3. Place the stoat in the boat and cover it with the handkerchief to keep it calm.
  4. Use the rope to pull the sunken boat across the river.
That's just a worrying degree of absent logic. Nothing about that makes sense. It does circle back to say "This method keeps you dry and provides a unique solution. However, a simpler approach would be to just use the boat normally and keep the other items as potentially useful tools."

And that really summarizes my whole problem with LLMs -- if I were using this in a scenario that wasn't so obvious, say programming, I would have no idea steps 1-4 were nonsense. If the LLM doesn't know what's nonsense, and I don't know, then it's just the blind leading the blind.

By @nuancebydefault - 7 months
Wow claude just works in Belgium as well now! Last time i tried it was not the case.

I tried some questions/conversation about .bat files and UNC paths and gave solutions and was able to explain them with much detail, without looking up anything on the web.

When asking for URLs, it explained those are not inside the model and gave good hints on how to search the web for it (Microsoft dev network etc).

Impressed!

By @poethetortoise - 7 months
Some interesting questions Claude cracks:

Let W=Q+R where Q and R are standard normal. What is E[Q|W]?

Perplexity failed and said W. Both ChatGPT and Claude correctly said W/2.

Let X(T) be a gaussian process with variance sigma^2 and mean 0. What is E[(e^(X(T)))^2]?

ChatGPT and Claude both correctly said E[(e^(X(T)))²] = e^(2σ²)

I think Claudes solution was better.

I then tried it on a probbailtiy question which wasn't well known and it failed miserably.

By @javier_e06 - 7 months
This is impressive. I spend some time polishing my questions though. Poor questioning produces verbose replies.

There is no need to nail it on the first reply. Unless is pretty obvious.

If I ask how to install Firefox in Linux it can reply with: "Is this for Ubuntu? What distro are talking about?"

This is more human like. More natural. IMO.

By @vatsadev - 7 months
I wonder if they used new mech interp techniques on this
By @Sharlin - 7 months
A while ago I tested the image recognition skills of GPT-4o, Claude 3, and Gemini using a random street plan diagram I happened to have in my downloads folder (https://i.imgur.com/9WZpK0L.png). It's a top-down CAD rendering showing the planned renovation of a street in my neighborhood in Tampere, Finland. I uploaded the image and simply asked each model "What can you tell me about the attached picture?"

GPT-4o's answer was excellent and highly detailed, recognizing essentially all the relevant aspects of the image [GPT4o]. Claude 3 Sonnet was correct on a general level, but its answer was much less detailed and showed more uncertainty in the form of "A or B" sentences [CL3]. Gemini's answer was, well, hilariously wrong [GEM].

I just tried this with Claude 3.5 Sonnet and it did very well. Its answer was still not as detailed as GPT-4o's, but it did ask me if I want it to elaborate on any aspect of the image [CL35].

I think this was an interesting experiment because street plan CAD diagrams probably aren't very common in the training data of these models.

--

[GPT4o] https://gist.github.com/jdahlstrom/844bda8ac76a5c3248c863d20...

[CL3] https://gist.github.com/jdahlstrom/ecccf31c8305f82519f27af53...

[GEM] https://gist.github.com/jdahlstrom/2e12a966c0d603a7b1446ba08...

[CL35] https://gist.github.com/jdahlstrom/60ca9352630934bec6e2f4e37...

By @dmd - 7 months
This is fantastic. I know it's bells-and-whistles and not the core offering, but a thing that keeps me on the $20/month chatgpt (as opposed to using the API) for my use case (mostly interactive data exploration) is its ability to actually run the code it generates. The ergonomics of chatgpt's ability to answer a question by writing and then executing some python and then even displaying a graph are really important to me.
By @zone411 - 7 months
Slightly better on the NYT Connections benchmark (27.9) than Claude 3 Opus (27.3) but massively improved over Claude 3 Sonnet (7.8).

GPT-4o 30.7

Claude 3.5 Sonnet 27.9

Claude 3 Opus 27.3

Llama 3 Instruct 70B 24.0

Gemini Pro 1.5 0514 22.3

Mistral Large 17.7

Qwen 2 Instruct 72B 15.6

By @replwoacause - 7 months
I have been so impressed with Claude. I routinely turn to it over ChatGPT. Has anyone else felt like 4o was a downgrade from 4? Anyway…this is exciting
By @lebed2045 - 7 months
I wish Claude app has voice input like chatGPT, for someone who's bad at typing like me - it's essential part that is missing
By @throwup238 - 7 months
Woah, this is a marked improvement. I just threw a relative complex coding problem at it and 3.5 sonnet did a really good job across several language. I asked it to rewrite a Qt6 QSyntaxHighlighter subclass to use TreeSitter to support arbitrary languages and not only did it work (with a hardcoded language) but it even got the cxx-qt Rust bindings almost right, including the extra header.

Curious to see how well it handles QML because previous models have been absolutely garbage at it.

By @n4r9 - 7 months
This is the first LLM to really impress me. Asking for some tweaks and refinements to adapt k-means to constrained cluster sizes produced some plausible algorithms and it was able to give a plausible time-complexity estimate.

I eventually tripped it up by asking a maths question that a good secondary school student can answer:

------------------------------------------

Q: Let us say that a positive integer p is a "prime-giver" if the numbers p, p+2 and p+4 are all prime. Can you prove whether or not there are infinitely many prime-givers?

A: This is an interesting number theory problem that touches on some deep unsolved questions in mathematics. Let's approach this step-by-step:

First, let's consider what we know:

For a number p to be a prime-giver, p, p+2, and p+4 must all be prime. This is a special case of prime triplets, which are three primes in arithmetic progression with a common difference of 2.

Some observations:

The smallest prime-giver is 3, as 3, 5, and 7 are all prime. The next prime-giver is 5, as 5, 7, and 11 are all prime. After this, all prime-givers must be odd numbers (as one of p, p+2, p+4 would be even if p were even).

Now, for the main question: Are there infinitely many prime-givers? This question is actually related to some famous unsolved problems in number theory:

It's a stronger version of the question: "Are there infinitely many prime triplets?" Which in turn is related to the more general Twin Prime Conjecture.

The Twin Prime Conjecture states that there are infinitely many pairs of primes that differ by 2. This is still an open problem in mathematics, despite significant progress in recent years. If the Twin Prime Conjecture is true, it doesn't necessarily imply that there are infinitely many prime-givers. However, if there are finitely many prime-givers, it would imply some very interesting and unexpected behavior of the primes. Given the current state of number theory, we cannot definitively prove or disprove whether there are infinitely many prime-givers.

Conclusion:

We cannot prove whether or not there are infinitely many prime-givers. This problem is likely at least as hard as the Twin Prime Conjecture, which remains one of the most famous unsolved problems in mathematics. What we can say is that if there are infinitely many prime-givers, they become increasingly rare as numbers get larger, due to the increasing scarcity of primes. However, their exact distribution and whether they are infinite in number remains an open question in number theory.

------------------------------------------

I think it got confused between my question and the study of "prime triplets" in which the first and last number differ by 6 (rather than 4).

------------------------------------------

Q: You seem to have confused prime-givers with prime triplets. In the second bullet point of your point 2, you claim that 5 is a prime-giver as 5, 7 and 11 are all prime. However, in my definition of prime-giver, it would need to be the case that 5, 7 and 9 are all prime, which is not true. Can you reconsider your answer in light of this?

A: I apologize for that confusion. You're absolutely right, and I thank you for pointing this out. Let me reconsider the problem with the correct definition of prime-givers. Let's start over:

Definition: A positive integer p is a "prime-giver" if p, p+2, and p+4 are all prime. Observations:

The smallest prime-giver is 3, as 3, 5, and 7 are all prime. All prime-givers must be odd numbers (as one of p, p+2, p+4 would be even if p were even). Any prime-giver p > 3 must be of the form 6k+5 for some integer k, because:

If p = 6k+1, then p+4 = 6k+5 is divisible by 5 If p = 6k+3, then p+2 = 6k+5 is divisible by 5

Now, for the main question: Are there infinitely many prime-givers? This question is actually easier to resolve than I initially thought. We can prove that there are finitely many prime-givers:

Proof:

Consider any prime-giver p > 3. We know that p, p+2, and p+4 are all prime. Now, p+4 ≡ 0 (mod 3) because:

p ≡ 2 (mod 3) [since p is prime and not 3, it must give remainder 2 when divided by 3] So, p+4 ≡ 2+4 ≡ 0 (mod 3)

But p+4 is supposed to be prime (by the definition of prime-giver). The only number that is both divisible by 3 and prime is 3 itself. So, p+4 = 3, which means p = -1. But p is supposed to be a positive integer.

Therefore, the only prime-giver is 3. Conclusion: There is exactly one prime-giver, which is 3. The question of whether there are infinitely many is resolved: there are not. Thank you again for catching my earlier mistake. This problem has a much more definitive answer than I initially thought, and it's an excellent example of how a small change in definition can dramatically alter the nature of a mathematical question.

---------------------------------------------------

This still isn't quite right, since p could have remainder 1 mod 3 (or at least, Claude doesn't consider this in the proof). I tried to get it to account for this but it then got very confused and claimed once again that 5 is a prime-giver.

By @sshroot - 7 months
What is the training cutoff for 3.5 Sonnet?
By @nsingh2 - 7 months
I like the artifact UI, not having big outputs like that inline with the dialog really helps with readability.
By @z7 - 7 months
Does it have some kind of world model or not? (Cf. https://news.ycombinator.com/item?id=40719443)

Prompt:

Joe drove Sue to university. Afterwards he drove home again and drank a tea with her in the kitchen.

**

There seems to be some inconsistency in the scenario you've described. Let me break it down:

1. Joe drove Sue to university.

2. Afterwards, Joe drove home.

3. Joe drank tea with Sue in the kitchen.

The inconsistency arises in the third point. If Joe drove Sue to university and then drove home alone, Sue wouldn't be at home to drink tea with Joe in the kitchen.

There are a few possible explanations:

1. There might be a mistake in the sequence of events.

2. Perhaps Sue returned home by other means before Joe had his tea.

3. Maybe "her" in the last sentence refers to someone else, not Sue.

Could you clarify the intended sequence of events or if there's additional information that might explain this situation?

**

"If Joe drove Sue to university and then drove home alone, Sue wouldn't be at home to drink tea with Joe in the kitchen." Why not?

**

I apologize for my mistake. You're absolutely right to question my assumption. There's actually no contradiction in the original scenario, and I made an incorrect inference. Let me explain:

1. Joe drove Sue to university.

2. Joe drove home.

3. Joe drank tea with Sue in the kitchen.

I incorrectly assumed that these events happened immediately one after another, and that Sue stayed at the university. However, there's nothing in the original statement that indicates this time frame or that Sue remained at the university.

Several perfectly reasonable scenarios could explain this sequence:

1. Sue's time at the university was brief (perhaps dropping something off or a short meeting), and she returned home before or around the same time as Joe.

2. The tea drinking happened later, perhaps in the evening after Sue returned from university.

3. Joe might have picked Sue up from the university before going home, even though this wasn't explicitly stated.

Thank you for pointing out this flaw in my reasoning. It's a good reminder to avoid making assumptions not supported by the given information. The scenario as presented is entirely plausible without any contradictions.

**

Why can't Sue be both at the university and at home at the same time?

**

I apologize for my oversight. You're absolutely correct to challenge this assumption as well. The reason Sue can't be both at the university and at home at the same time is due to the principle of non-bilocation, which is a fundamental aspect of our physical reality. Let me explain:

1. Non-bilocation: In our everyday physical world, a single person cannot occupy two distinct locations simultaneously. This is a basic principle of classical physics and our common understanding of space and time.

2. Mutually exclusive locations: The university and home are understood to be separate, distinct locations. Being in one location precludes being in another at the exact same moment.

3. Continuity of existence: We generally assume that people exist continuously in one location at a time, moving between locations over time.

However, it's worth noting that there are some scenarios where this principle might seem to be violated:

1. Quantum mechanics: At the quantum level, particles can exhibit properties that seem to defy this principle, but this doesn't apply to macroscopic objects like people.

2. Metaphorical or spiritual interpretations: In some philosophical or religious contexts, one might speak of a person being in two places at once in a non-literal sense.

3. Technology: Video calls or virtual presence might give the impression of being in two places, but physically the person is still in one location.

Thank you for this question. It's an excellent reminder to examine even our most basic assumptions about reality.

By @bcherny - 7 months
Awesome!!

Seems like the API docs haven't been updated yet: https://docs.anthropic.com/en/docs/about-claude/models

By @42lux - 7 months
I know it's nitpicking but please anthropic give non business accounts API access.
By @xs83 - 7 months
How is its laziness? I found Opus to be very quick to curtail output and default into "Add the rest of your code here" type things.

I am a lazy data engineer - I want to prompt it into something I can basically copy and paste

By @autokad - 7 months
from what I can tell, on Bedrock its only available in us-east-1 and upon special request. the modelId is 'anthropic.claude-3-5-sonnet-20240620-v1:0'
By @imjonse - 7 months
Opus was taken over by quite a few Gemini and GPT4 models on the chat arena leaderboard, hopefully this entry will put Anthropic back near the top. Nice work!
By @fetzu - 7 months
Finally available in Switzerland, thank you very much !
By @m_mueller - 7 months
Doesn't look to be available on Bedrock yet. Maybe tomorrow, since the article says June 21st also? We truly live in the future...
By @TiredOfLife - 7 months
Unfortunately still thinks "There are two 'r's in the word "raspberry"."

The only one that got it right was the basic version of Gemini "There are actually three "r"s in the word "strawberry". It's a bit tricky because the double "r" sounds like one sound, but there are still two separate letters 'r' next to each other."

The paid Gemini advanced had "There are two Rs in the word "strawberry"."

By @theusus - 7 months
My biggest gripe with Claude is how easily it hits the rate limit and falls down to lower quality model.
By @aliasxneo - 7 months
Does anyone have tips on how to go about using it for refactoring an existing (larger) codebase?
By @m3kw9 - 7 months
So far it isn’t doing better or worse than gpt4o, if you want someone to switch, it better be way better or way better price. The price is exactly what OpenAI is charging to the cent. So no, you won’t get someone to switch because the differentiator is just the UI
By @bufferoverflow - 7 months
No ARC-AGI benchmarks?
By @xfalcox - 7 months
They said it is available on Bedrock, but it isn't :(
By @andhuman - 7 months
Hey cool, it's available in Sweden!
By @greatpostman - 7 months
OpenAI must be cooking something huge for them to not be releasing products far ahead of competitors
By @sashank_1509 - 7 months
Why does it still lose to me at a simple game of tic tac toe, smh
By @m3kw9 - 7 months
I gave it a fairly simple coding questions and it failed pretty severely, to be fair ChatGPT 4o also failed that. Just saying it ain’t all that given the hype
By @Malidir - 7 months
Has a limit, after 5 ish questions I got "You are out of free messages until 10 PM" which I have never had on chatgpt
By @qmarkgn - 7 months
The trial requires a signup with an email address. No thanks! This is one thing Microsoft got right with CoPilot.

With so much competition, I wonder why everyone else makes it hard to try out something.