August 14th, 2024

Grok-2 Beta Release

Grok-2 and Grok-2 mini have been released in beta on the đť•Ź platform, outperforming other models in benchmarks and enhancing user experience with real-time information and improved interaction capabilities.


Grok-2 has been released as a beta version on the đť•Ź platform, showcasing significant advancements in language processing and reasoning capabilities compared to its predecessor, Grok-1.5. The release includes two models: Grok-2 and Grok-2 mini, both of which have demonstrated superior performance on the LMSYS leaderboard, outperforming notable models like Claude 3.5 Sonnet and GPT-4-Turbo. Grok-2 has been evaluated across various academic benchmarks, showing improvements in reasoning, reading comprehension, math, science, and coding. It excels in visual tasks and document-based question answering.

The models are designed to enhance user experience on the đť•Ź platform, offering real-time information and improved interaction capabilities. Premium users can access these models through a redesigned interface, while developers will have access to Grok-2 via a new enterprise API, which includes enhanced security features and management tools. The rollout aims to integrate Grok's capabilities into various AI-driven features on the platform, with future updates expected to enhance multimodal understanding.

- Grok-2 and Grok-2 mini are now in beta on the đť•Ź platform.

- Grok-2 outperforms other leading models in various benchmarks.

- The models enhance user experience with real-time information and improved interaction.

- Developers can access Grok-2 through a new enterprise API with advanced security features.

- Future updates will focus on integrating multimodal understanding into the Grok experience.
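The announcement says developers will reach Grok-2 through a new enterprise API but gives no endpoint details. As a hedged sketch only: assuming an OpenAI-compatible chat-completions interface (the base URL, the `grok-2` model name, and bearer-token authentication are all assumptions, not confirmed by the release notes), a request could be constructed like this:

```python
import json
import os
import urllib.request

# All endpoint details below are assumptions modeled on common
# OpenAI-compatible chat APIs, not confirmed by the announcement.
API_BASE = "https://api.x.ai/v1"  # assumed base URL
API_KEY = os.environ.get("XAI_API_KEY", "sk-placeholder")


def build_chat_request(prompt: str, model: str = "grok-2") -> urllib.request.Request:
    """Construct (but do not send) a chat-completion style request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("Summarize today's top story on X.")
print(req.full_url)  # https://api.x.ai/v1/chat/completions
```

Sending the request (e.g. via `urllib.request.urlopen(req)`) would require a real API key; the sketch only shows the likely shape of the payload and auth header.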

29 comments
By @espadrine - 5 months
The technology is impressive; achieving such a level requires a great deal of effort in dataset creation, neural architecture design, and GPU shepherding.

What is the company’s ethical position though? It officially stemmed from Mr Musk’s objection that OpenAI was not open-source, but it too is not open-source. It followed Mr Musk’s letter to stop all AI development on frontier models, but it is a frontier model. It followed complaints that OpenAI trained on tweets, but it also trained on tweets.

Companies like Meta, Mistral, or DeepSeek address those complaints better, and all now play in the big league.

By @tgsovlerkhgsel - 5 months
I assume they will have a lot less "safety", i.e. the model will be more likely to actually do what you ask instead of finding a reason why "sorry Dave, I can't do that".

Since these "safety" features tend to also degrade the model, that's likely also helping them catch up in the benchmarks.

By @leroman - 5 months
It's hilarious they put Claude 3.5 Sonnet in the far right corner while it scores the highest and beats most of Grok's numbers.
By @pikseladam - 5 months
It uses FLUX.1 to generate images and it has been fun so far. It's good at writing, can generate very realistic photos, can create memes, and it looks like the hands problem is fixed now.
By @Alifatisk - 5 months
You know what’s also impressive besides this beta release? How Claude 3.5 Sonnet is still able to keep up so well. Grok-2 beat every other LLM except Claude. How did Anthropic achieve this?
By @clarionbell - 5 months
I don't really care. The model may be competitive, but my use cases require speed, local (semi-local) execution, and reliability. None of these seem to be baked into whatever X produced now.

When they make the mini model available for download and quantization, that's when I may be interested. But given the minimal improvement in the past several months, I'm inclined to believe that we have reached the plateau.

By @miki123211 - 5 months
Do we have any info on this model's balance of censorship versus safety?

This is Musk after all, so I wouldn't be surprised if it strayed far from the norm.

By @ssijak - 5 months
Oh, this is great: one more competitor with a top model that will be available via API. I wonder what the pricing will be. OpenAI slashed prices multiple times in the last year and a half I was using it.
By @QuesnayJr - 5 months
I have seen sus-column-r on LMSYS a bunch of times. It seemed pretty good, though not as good as the best Google, Anthropic, or OpenAI models.

I'm surprised they managed to catch up. I guess there really is no moat.

By @sagivo - 5 months
Putting a new tool for developers behind an "enterprise API" gate is a sure way to kill it.
By @mmastrac - 5 months
"Our AI Tutors engage with our models across a variety of tasks that reflect real-world interactions with Grok. During each interaction, the AI Tutors are presented with two responses generated by Grok"

My guess is that they're using one of the third party AI training outfits for this and that they are paying through the nose.

This looks exactly like a training task I got to see on one of those platforms.

By @mupuff1234 - 5 months
So all models seem to converge to a similar level of performance - is this the end of the line for LLMs?
By @infotainment - 5 months
I’m hoping we’ll see an open release of this in 6 months or so, as we saw with Grok-1.

I’m not hugely optimistic, though.

By @pantsforbirds - 5 months
If the X.AI team is able to build out a good enough model with access to real-time tweets, they could have an incredible product. I'd love to be able to ask about current events and get really strong results back based on tweets + community notes.
By @vlaaad - 5 months
But when will it be available in my region (Europe)?
By @jatins - 5 months
If I am reading the table correctly they are claiming it is better than all models but 3.5-Sonnet

Is anyone with X premium able to confirm the vibe check -- Is the model actually good or another case of training on benchmarks?

By @lucasRW - 5 months
Glad to see an uncensored AI able to compete with the other models.
By @ComputerGuru - 5 months
Interesting that they're rolling this out to Twitter/X Premium users; it was previously the biggest differentiator between Premium+ haves and Premium have-nots.
By @Havoc - 5 months
Seems like a solid result & more competition is always better.

That said, I'm still cheering for Mistral and Meta with their more open stance.

By @ZacnyLos - 5 months
Twitter started irreversibly feeding users’ data into its “Grok” AI technology in May 2024, without ever informing them or asking for their consent.

https://noyb.eu/en/twitters-ai-plans-hit-9-more-gdpr-complai...

By @ayakang31415 - 5 months
Can anyone tell me how much censorship grok has? I hate that many other LLMs have too much censorship.
By @machdiamonds - 5 months
Pretty funny to read the comments from xAI's initial announcement now.

https://news.ycombinator.com/item?id=36696473

By @worstspotgain - 5 months
We need an alt-right version of AI like we need a pumpkin spice sushiccino. No thanks but no thanks.
By @CMLab - 5 months
When's the GitHub release?
By @ionwake - 5 months
Guys, come on, you can't keep releasing software in the US and then do a staggered launch where, months later, things become available to users in England, Denmark, etc. There should be no reason for it. I'm sure whatever dumb EU regulations exist can be dealt with easily in the software. These staggered releases (such as ChatGPT having no Memory for EU users until MONTHS down the line) are just a hindrance to progress. It's starting to feel like we live on an island in the middle of nowhere.