December 11th, 2024

Gemini 2.0: our new AI model for the agentic era

Google has launched Gemini 2.0, an advanced AI model with multimodal capabilities, including image and audio output. Wider access is planned for early 2025, with a focus on responsible AI development.

Google has unveiled Gemini 2.0, an advanced AI model designed for the "agentic era," which emphasizes enhanced capabilities in multimodal processing. This new model features native image and audio output, as well as the ability to utilize various tools, making it more versatile than its predecessors. Gemini 2.0 Flash, the experimental version, is currently available to developers and trusted testers, with broader access expected in early 2025. The model aims to facilitate agentic experiences through projects like Astra and Mariner, which explore the potential of AI assistants in everyday tasks. Google emphasizes its commitment to responsible AI development, prioritizing safety and security. The Gemini 2.0 model builds on the success of earlier versions, enhancing performance and response times while supporting a range of inputs and outputs. The introduction of a new Multimodal Live API aims to assist developers in creating interactive applications. Overall, Gemini 2.0 represents a significant step forward in AI technology, with the potential to transform user interactions across various Google products.

- Google has launched Gemini 2.0, an AI model focused on multimodal capabilities.

- The model includes features like native image and audio output and tool usage.

- Gemini 2.0 Flash is available to developers, with wider access planned for early 2025.

- Projects like Astra and Mariner are being developed to enhance AI assistant functionalities.

- Google is committed to responsible AI development, emphasizing safety and security.

AI: What people are saying
The launch of Google Gemini 2.0 has generated a variety of reactions and discussions among users.
  • Many users are impressed with Gemini 2.0's multimodal capabilities, particularly its ability to engage in live conversations and assist with tasks like coding and image recognition.
  • There are concerns about the model's tendency to hallucinate and provide inaccurate information, especially in search results.
  • Users are debating the effectiveness of Gemini 2.0 compared to competitors like OpenAI's models, with mixed opinions on its performance and usability.
  • Some comments highlight the excitement around new features like the Deep Research tool, while others express skepticism about Google's ability to monetize AI without affecting its core advertising business.
  • The naming of the model has sparked confusion and criticism, with some users questioning the choice in relation to existing protocols.
84 comments
By @modeless - 4 months
https://aistudio.google.com/live is by far the coolest thing here. You can just go there and share your screen or camera and have a running live voice conversation with Gemini about anything you're looking at. As much as you want, for free.

I just tried having it teach me how to use Blender. It seems like it could actually be super helpful for beginners, as it has decent knowledge of the toolbars and keyboard shortcuts and can give you advice based on what it sees you doing on your screen. It also watched me play Indiana Jones and the Great Circle, and it successfully identified some of the characters and told me some information about them.

You can enable "Grounding" in the sidebar to let it use Google Search even in voice mode. The video streaming and integrated search make it far more useful than ChatGPT Advanced Voice mode is currently.

By @simonw - 4 months
I released a new llm-gemini plugin with support for the Gemini 2.0 Flash model, here's how to use that in the terminal:

    llm install -U llm-gemini
    llm -m gemini-2.0-flash-exp 'prompt goes here'
LLM installation: https://llm.datasette.io/en/stable/setup.html

Worth noting that the Gemini models have the ability to write and then execute Python code. I tried that like this:

    llm -m gemini-2.0-flash-exp -o code_execution 1 \
      'write and execute python to generate a 80x40 ascii art fractal'
Here's the result: https://gist.github.com/simonw/0d8225d62e8d87ce843fde471d143...

It can't make outbound network calls though, so this fails:

    llm -m gemini-2.0-flash-exp  -o code_execution 1 \
      'write python code to retrieve https://simonwillison.net/ and use a regex to extract the title, run that code'
Amusingly Gemini itself doesn't know that it can't make network calls, so it tries several different approaches before giving up: https://gist.github.com/simonw/2ccfdc68290b5ced24e5e0909563c...

The new model seems very good at vision:

    llm -m gemini-2.0-flash-exp describe -a https://static.simonwillison.net/static/2024/pelicans.jpg
I got back a solid description, see here: https://gist.github.com/simonw/32172b6f8bcf8e55e489f10979f8f...
By @crowcroft - 4 months
Big companies can be slow to pivot, and Google has been famously bad at getting people aligned and driving in one direction.

But once they do get moving in the right direction, they can achieve things that smaller companies can't. Google has an insane amount of talent in this space, and seems to be getting the right results from that now.

It remains to be seen how well they will be able to productize and market, but it's hard to deny that their LLM models are really, really good.

By @serjester - 4 months
Buried in the announcement is the real gem — they’re releasing a new SDK that actually looks like it follows modern best practices. Could be a game-changer for usability.

They’ve had OpenAI-compatible endpoints for a while, but it’s never been clear how serious they were about supporting them long-term. Nice to see another option showing up. For reference, their main repo (not kidding) recommends setting up a Kubernetes cluster and a GCP bucket to submit batch requests.

[1]https://github.com/googleapis/python-genai
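
For anyone curious, here's a minimal sketch of what the new SDK surface might look like, assuming the google-genai package from the repo above and an AI Studio API key; names may differ from the released version, so treat it as illustrative rather than definitive:

    # Minimal sketch, assuming `pip install google-genai` and an AI Studio key.
    from google import genai

    client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

    # One-shot text generation against the experimental Flash model.
    response = client.models.generate_content(
        model="gemini-2.0-flash-exp",
        contents="Summarize the Gemini 2.0 announcement in two sentences.",
    )
    print(response.text)

If it ships like this, a single client object and one method call would be a big usability jump over the Kubernetes-and-bucket workflow mentioned above.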

By @bradhilton - 4 months
Beats Gemini 1.5 Pro at all but two of the listed benchmarks. Google DeepMind is starting to get their bearings in the LLM era. These are the minds behind AlphaGo/Zero/Fold. They control their own hardware destiny with TPUs. Bullish.
By @airstrike - 4 months
OT: I’m not entirely sure why, but "agentic" sets my teeth on edge. I don't mind the concept, but the word itself has that hollow, buzzwordy flavor I associate with overblown LinkedIn jargon, particularly as it is not actually in the dictionary...unlike perfectly serviceable entries such as "versatile", "multifaceted" or "autonomous"
By @losvedir - 4 months
This naming is confusing...

Anyway, I'm glad that this Google release is actually available right away! I pay for Gemini Advanced and I see "Gemini Flash 2.0" as an option in the model selector.

I've been going through Advent of Code this year, and testing each problem with each model (GPT-4o, o1, o1 Pro, Claude Sonnet, Opus, Gemini Pro 1.5). Gemini has done decent, but is probably the weakest of the bunch. It failed (unexpectedly to me) on Day 10, but when I tried Flash 2.0 it got it! So at least in that one benchmark, the new Flash 2.0 edged out Pro 1.5.

I look forward to seeing how it handles upcoming problems!

I should say: Gemini Flash didn't quite get it out of the box. It actually had a syntax error in the for loop, which caused it to fail to compile, which is an unusual failure mode for these models. Maybe it was a different version of Java or something (I'm also trying to learn Java with AoC this year...). But when I gave Flash 2.0 the compilation error, it did fix it.

For the more Java proficient, can someone explain why it may have provided this code:

     for (int[] current = queue.remove(0)) {
which was a compilation error for me? The corrected code it gave me afterwards was just

     for (int[] current : queue) { 
and with that one change the class ran and gave the right solution.
By @og_kalu - 4 months
The Gemini 2 models support native audio and image generation, but the latter won't be generally available till January. Really excited for that as well as 4o's image generation (whenever that comes out). Steerability has lagged behind aesthetics in image generation for a while now and it'd be great to see a big advance in that.

Also a whole lot of computer vision tasks (via LLMs) could be unlocked with this. Think inpainting, style transfer, text editing in the wild, segmentation, edge detection, etc.

They have a demo: https://www.youtube.com/watch?v=7RqFLp0TqV0

By @siliconc0w - 4 months
What's everyone's favorite LLM leaderboard? Gemini 2 seems to be edging out 4o on Chatbot Arena (https://lmarena.ai/?leaderboard).
By @jncfhnb - 4 months
Am I alone in thinking the word “agentic” is dumb as shit?

Most of these things seem to just be a system prompt and a tool that get invoked as part of a pipeline. They’re hardly “agents”.

They’re modules.

By @ofermend - 4 months
Gemini-2.0-Flash does extremely well on the Hallucination Evaluation Leaderboard, at 1.3% hallucination rate https://github.com/vectara/hallucination-leaderboard
By @tkgally - 4 months
I tried accessing Gemini 2.0 Flash through Google AI Studio in the Safari browser on my iPhone, and to my surprise it worked. After I gave it access to my microphone and camera, I was able to have a pretty smooth conversation with it about what it saw through the camera. I pointed the camera at things in my room and asked what they were, and it identified them accurately. It was also able to read text in both English and Japanese. It correctly named a note I played on a piano when I showed it the keyboard with my finger playing the note, but it couldn’t identify notes by sound alone.

The latency was low, though the conversation got cut off a few times.

By @ComputerGuru - 4 months
I've been using gemini-exp-1206 and I notice a lot of similarities to the new gemini-2.0-flash-exp: they're not that much actually smarter but they go out of their way to convince you they are with overly verbose "reasoning" and explanations. The reasoning and explanations aren't necessarily wrong per se, but put them aside and focus on the actual logical reasoning steps and conclusions to your prompts and it's still very much a dumb model.

The models do just fine on "work" but are terrible for "thinking". The verbosity of the explanations (and the sheer amount of praise the models like to give the prompter - I've never had my rear end kissed so much!) should lead one to beware any subjective reviews of their performance rather than objective reviews focusing solely on correct/incorrect.

By @EternalFury - 4 months
Think of Google as a tanker ship. It takes a while to change course, but it has great momentum. Sundar just needs to make sure the course is right.
By @mherrmann - 4 months
Their Mariner tool for controlling the browser sounds scary and exciting. At the moment, it's an extension, which means JavaScript. Some websites block automation that happens this way, and developers resort to tools such as Selenium. These use the Chrome DevTools API to automate the browser. It's better, but can still be distinguished from normal use with very technical details. I wonder if Google, which still owns Chrome, will give extensions better APIs for automation that cannot be distinguished from normal use.
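
As a rough illustration of those "very technical details": a minimal sketch, assuming Python with Selenium installed, of one of the simpler signals a page can check to spot driver-based automation:

    # Sketch: Chrome driven via WebDriver/DevTools exposes signals that pages
    # can inspect; navigator.webdriver is true under automation.
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get("https://example.com")
    # Under normal browsing this is false/undefined; under automation it is true.
    print(driver.execute_script("return navigator.webdriver"))
    driver.quit()

An extension- or DevTools-level automation API that didn't flip flags like this would be much harder to tell apart from normal use.
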
By @mike80it - 4 months
Gemini 1120, 1206, and Gemini 2.0 flash have better coding results than ChatGPT o1 and Claude Sonnet 3.5.

They did it: from now on Google will keep a leadership position.

They have too much data (Search, Maps, Youtube, Chrome, Android, Gmail, etc.), and they have their own servers (it's free!) and now the Willow QPU.

To me, it is evident how the future will look. I'll buy some more Alphabet stock.

By @smallerfish - 4 months
Was this written by an LLM? It's pretty bad copy. Maybe they laid off their copywriting team...?

> "Now millions of developers are building with Gemini. And it’s helping us reimagine all of our products — including all 7 of them with 2 billion users — and to create new ones"

and

> "We’re getting 2.0 into the hands of developers and trusted testers today. And we’re working quickly to get it into our products, leading with Gemini and Search. Starting today our Gemini 2.0 Flash experimental model will be available to all Gemini users."

By @PaulWaldman - 4 months
Anecdotally, using the Gemini App with "Gemini Advanced 2.0 Flash Experimental", the response quality is noticeably improved and it's faster at some basic Python and C# generation.
By @epolanski - 4 months
I'm not gonna lie I like Google's models.

Flash combines speed and cost and is extremely good to build apps on.

People really take that whole benchmarking thing more seriously than necessary.

By @weatherlite - 4 months
Google is on fire lately; the stock went up considerably in the last few days and rightly so. Besides the usual cash cows (Search, YouTube), they made huge progress with GCP, Gemini, and quite possibly Waymo. They have fantastic momentum imo.
By @CSMastermind - 4 months
> We're also launching a new feature called Deep Research, which uses advanced reasoning and long context capabilities to act as a research assistant, exploring complex topics and compiling reports on your behalf. It's available in Gemini Advanced today.

Anyone seeing this? I don't have an option in my dropdown.

By @nycdatasci - 4 months
Gemini 2.0 Flash is available here: https://aistudio.google.com/prompts/new_chat

Based on initial interactions, it's extremely verbose. It seems to be focused on explaining its reasoning, but even after just a few interactions I have seen some surprising hallucinations. For example, to assess current understanding of AI, I mentioned "Why hasn't Anthropic released Claude 3.5 Opus yet?" Gemini responded with text that included "Why haven't they released Claude 3.5 Sonnet First? That's an interesting point." There's clearly some reflection/attempted reasoning happening, but it doesn't feel competitive with o1 or the new Claude 3.5 Sonnet that was trained on 3.5 Opus output.

By @jerpint - 4 months
We’re definitely going to need better benchmarks for agentic tasks, and not just code reasoning. Things that are needlessly painful that humans go through all the time
By @brokensegue - 4 months
Any word on price? I can't find it at https://ai.google.dev/pricing
By @nightski - 4 months
Anyone else annoyed how the ML/AI community just adopted the word "reasoning" when it seems like it is being used very out of context when looking at what the model actually does?
By @jacooper - 4 months
The best thing about Gemini models is the huge context window: you can just throw big documents at them and find stuff really fast, rather than struggling with cut-offs in Perplexity or Claude.
By @dandiep - 4 months
Gemini multimodal live docs here: https://cloud.google.com/vertex-ai/generative-ai/docs/model-...

A little thin...

Also no pricing is live yet. OpenAI's audio inputs/outputs are too expensive to really put in production, so hopefully Gemini will be cheaper. (Not to mention, OAI's doesn't follow instructions very well.)

By @mfonda - 4 months
Does anyone have any insights into how Google selects source material for AI overviews? I run an educational site with lots of excellent information, but it seems to have been passed over entirely for AI overviews. With these becoming an increasingly large part of search--and from the sound of it, now more so with Gemini 2.0--this has me a little worried.

Anyone else run into similar issues or have any tips?

By @irthomasthomas - 4 months
I wrote about a possibly emergent behaviour in Gemini 2 that was only seen previously in sonnet-3.5 https://x.com/xundecidability/status/1867044846839431614
By @gotaran - 4 months
Google beat OpenAI at their own game.
By @sorenjan - 4 months
Published the day after one of the authors, Demis Hassabis, received his Nobel prize in Stockholm.
By @computergert - 4 months
It's funny how even the latest AI models struggle with simple questions.

"What's the first name of Freddy LaStrange"? >> "I do not have enough information about that person to help with your request. I am a large language model, and I am able to communicate and generate human-like text in response to a wide range of prompts and questions, but my knowledge about this person is limited. Is there anything else I can do to help you with this request?"

(Of course, we can't be 100% sure that his first name is Freddy. But I would expect that to be part of the answer then)

By @fuddle - 4 months
I'd be interested to see Gemini 2.0's performance on SWE-Bench.
By @strongpigeon - 4 months
Did anyone get to play with the native image generation part? In my experience, Imagen 3 was much better than the competition so I'm curious to hear people's take on this one.
By @s3p - 4 months
This might be a dumb thought, but the naming of this model naturally suggests we have a Pro model coming out soon. Similar to Claude 3.5 Sonnet, where they never mentioned the Opus model afterward, does anyone think this is all we will see from Google in terms of the 2.0 model family? Does anyone think there will be some 2.0 Pro model that will blow this out of the water? The naming makes it seem so.
By @SubiculumCode - 4 months
Considering so many of us would like more vRAM than NVIDIA is giving us for home compute, is there any future where these Trillium TPUs become commodity hardware?
By @nuz - 4 months
I guess this means we'll have an openai release soon
By @fpgaminer - 4 months
I work with LLMs and MLLMs all day (as part of my work on JoyCaption, an open source VLM). Specifically, I spend a lot of time interacting with multiple models at the same time, so I get the chance to very frequently compare models head-to-head on real tasks.

I'll give Flash 2 a try soon, but I gotta say that Google has been doing a great job catching up with Gemini. Both Gemini 1.5 Pro 002 and Flash 1.5 can trade blows with 4o, and are easily ahead of the vast majority of other major models (Mistral Large, Qwen, Llama, etc). Claude is usually better, but has a major flaw (to be discussed later).

So, here's my current rankings. I base my rankings on my work, not on benchmarks. I think benchmarks are important and they'll get better in time, but most benchmarks for LLMs and MLLMs are quite bad.

1) 4o and its ilk are far and away the best in terms of accuracy, both for textual tasks as well as vision related tasks. Absolutely nothing comes even close to 4o for vision related tasks. The biggest failing of 4o is that it has the worst instruction following of commercial LLMs, and that instruction following gets _even_ worse when an image is involved. A prime example is when I ask 4o to help edit some text, to change certain words, verbiage, etc. No matter how I prompt it, it will often completely re-write the input text to its own style of speaking. It's a really weird failing. It's like their RLHF tuning is hyper focused on keeping it aligned with the "character" of 4o to the point that it injects that character into all its outputs no matter what the user or system instructions state. o1 is a MASSIVE improvement in this regard, and is also really good at inferring things so I don't have to explicitly instruct it on every little detail. I haven't found o1-pro overly useful yet. o1 is basically my daily driver outside of work, even for mundane questions, because it's just better across the board and the speed penalty is negligible. One particular example of o1 being better, which I encountered yesterday: I had it re-wording an image description, and thought it had introduced a detail that wasn't in the original description. Well, I was wrong and had accidentally skimmed over that detail in the original. It _told_ me I was wrong, and didn't update the description! Freaky, but really incredible. 4o never corrects me when I give it an explicit instruction.

4o is fairly easy to jailbreak. They've been turning the screws for awhile so it isn't as easy as day 1, but even o1-pro can be jailbroken.

2) Gemini 1.5 Pro 002 (specifically 002) is second best in my books. I'd guesstimate it at being about 80% as good as 4o on most tasks, including vision. But it's _significantly_ better at instruction following. Its RLHF is a lot lighter than ChatGPT models, so it's easier to get these models to fall back to pretraining, which is really helpful for my work specifically. But in general the Gemini models have come a long way. The ability to turn off model censorship is quite nice, though it does still refuse at times. The Flash variation is interesting; often times on-par with Pro with Pro edging out maybe 30% of the time. I don't frequently use Flash, but it's an impressive model for its size. (Side note: The Gemma models are ... not good. Google's other public models, like so400m and OWLv2 are great, so it's a shame their open LLMs forays are falling behind). Google also has the best AI playground.

Jailbreaking Gemini is a piece of cake.

3) Claude is third on my list. It has the _best_ instruction following of all the models, even slightly better than o1. Though it often requires multi-turn to get it to fully follow instructions, which is annoying. Its overall prowess as an LLM is somewhere between 4o and Gemini. Vision is about the same as Gemini, except for knowledge-based queries which Gemini tends to be quite bad at (who is this person? Where is this? What brand of guitar? etc). But Claude's biggest flaw is the insane "safety" training it underwent, which makes it practically useless. I get false triggers _all_ the time from Claude. And that's to say nothing of how unethical their "ethics" system is to begin with. And what's funny is that Claude is an order of magnitude _smarter_ when it's reasoning about its safety training. It's the only real semblance of reason I've seen from LLMs ... all just to deny my requests.

I've put Claude third out of respect for the _technical_ achievements of the product, but I think the developers need to take a long look in the mirror and ask why they think it's okay for _them_ to decide what people with disabilities are and are not allowed to have access to.

4) Llama 3. What a solid model. It's the best open LLM, hands down. Nowhere near the commercial models above, but for a model that's completely free to use locally? That's invaluable. Their vision variation is ... not worth using. But I think it'll get better with time. The 8B variation far outperforms its weight class. 70B is a respectable model, with better instruction following than 4o. The ability to finetune these models to a task with so little data is a huge plus. I've made task specific models with 200-400 examples.

5) Mistral Large (I forget the specific version for their latest release). I love Mistral as the "under-dog". Their models aren't bad, and behave _very_ differently from all other models out there, which I appreciate. But Mistral never puts any effort into polishing their models; they always come out of the oven half-baked. Which means they frequently glitch out, have very inconsistent behavior, etc. Accuracy and quality are hard to assess because of this inconsistency. On its best days it's up near Gemini, which is quite incredible considering the models are also released publicly. So theoretically you could finetune them to your task and get a commercial grade model to run locally. But I rarely see anyone do that with Mistral, I think partly because of their weird license. Overall, I like seeing them in the race and hope they get better, but I wouldn't use it for anything serious.

Mistral is lightly censored, but fairly easy to jailbreak.

6) Qwen 2 (or 2.5 or whatever the current version is these days). It's an okay model. I've heard a lot of praise for it, but in all my uses thus far it's always been really inconsistent, glitchy, and weak. I've used it both locally and through APIs. I guess in _theory_ it's a good model, based on benchmarks. And it's open, which I appreciate. But I've not found any practical use for it. I even tried finetuning with Qwen 2VL 72B, and my tiny 8B JoyCaption model beat it handily.

That's about the sum of it. AFAIK that's all the major commercial and open models (my focus is mainly on MLLMs). OpenAI are still leading the pack in my experience. I'm glad to see good competition coming from Google finally. I hope Mistral can polish their models and be a real contender.

There are a couple smaller contenders out there like Pixmo/etc from allenai. Allen AI has hands down the _best_ public VQA dataset I've seen, so huge props to them there. Pixmo is ... okayish. I tried Amazon's models a little but didn't see anything useful.

NOTE: I refuse to use Grok models for the obvious reasons, so fucks to be them.

By @jamesponddotco - 4 months
At least when it comes to Go code, I'm pretty impressed by the results so far. It's also pretty good at following directions, which is a problem I have with open source models, and seems to use or handle Claude's XML output very well.

Overall, especially seeing as I haven't paid a dime to use the API yet, I'm pretty impressed.

By @anothernewdude - 4 months
Agents are the worst idea. I think AI will start to progress again when we get better models that drop this whole chat idea, and just focus on completions. Building the tools on top should be the work that is given over to the masses. It's sad that instead the most powerful models have been hamstrung.
By @bluelightning2k - 4 months
Does anyone know how to sign up for the speech output wait list or tester program? I have a decent spend with GCP over the years, if that helps at all. Really want DemoTime videos to use those voices. (I like how truly incredible, best-in-the-world TTS is like a footnote in this larger announcement.)
By @zb3 - 4 months
Is this the gemini-exp model on LMArena?
By @ianbutler - 4 months
Unfortunately the 10rpm quota for this experimental model isn't enough to run an actual Agentic experience on.

That's my main issue with Google: there are several models we want to try with our agent, but quota is limited and we have to jump through hoops to see if we can get it raised.

By @AJRF - 4 months
I think they are really overloading the word "agent". I know there isn't a standard definition - but I think Google is stretching the meaning far thinner than most C-suite execs do when they talk about agents.

I think DeepMind could make progress if they focused on the agent definition of multi-step reasoning plus action through a web browser, and delivered a ton of value there, instead of lumping in the seldom-used "look at the world through a camera" or "multimodal robots" thing.

If Google cracked robots, past plays show that the market for those isn't big enough to interest Google. Like VR, you just can't get a billion people to be interested in robots - so even if they make progress, it won't survive under Google.

The "Look at the world through a camera" thing is a footnote in an Android release.

Agentic computer use _is_ a product a billion people would use, and it's adjacent to the business interests of Google Search.

By @petesergeant - 4 months
Speed looks good vis-a-vis 4o-mini, and quality looks good so far against my eval set. If it's cheaper than 4o-mini too (which, it probably will be?) then OpenAI have a real problem, because switching between them is a value in a config file.
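
To illustrate the "value in a config file" point, a minimal sketch assuming the openai Python package and the OpenAI-compatible Gemini endpoint mentioned elsewhere in the thread (the base URL and env var names here are assumptions; check the provider docs):

    # Sketch: when both providers speak the OpenAI chat-completions protocol,
    # switching between them is just a different set of config values.
    import os
    from openai import OpenAI

    PROVIDERS = {
        "openai": {"base_url": None, "model": "gpt-4o-mini", "key_env": "OPENAI_API_KEY"},
        "gemini": {
            "base_url": "https://generativelanguage.googleapis.com/v1beta/openai/",  # assumed
            "model": "gemini-2.0-flash-exp",
            "key_env": "GEMINI_API_KEY",
        },
    }

    cfg = PROVIDERS["gemini"]  # flip this one key to switch providers
    client = OpenAI(api_key=os.environ[cfg["key_env"]], base_url=cfg["base_url"])
    reply = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": "Say hello in five words."}],
    )
    print(reply.choices[0].message.content)
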
By @thisoneworks - 4 months
"gemini for video games" - here we go again with the AI does the interesting stuff for you rather than the boring stuff
By @nuancebydefault - 4 months
I'm quite impressed with the Flash demo's reasoning capabilities. I did the 20 questions game with it, and it found the computer mouse I had in my head. At first it was confused about our roles and said something weird; it thought that it had to guess its own word. Afterwards I had a meta conversation about that weirdness and it gave impressive insights:

" Why My "I Guess What I Have in Mind" Statement Doesn't Make Sense

    The Problem of Agency: You're spot-on when you say it doesn't make sense for me to state "I guess what I (the computer) have in mind." AI does not have a "mind" in the same way that humans do. My internal state is a complex pattern of numerical data, not conscious thoughts or intentions in the way you have things in mind. My phrasing gave the illusion of having a "thing in mind," but in reality, I don't have this capacity.

    Misinterpretation of the Task: My statement "I guess what I have in mind" misinterprets the task. You were clearly signaling that I needed to ask questions to uncover what you were thinking of. My failure to grasp this created a weird scenario where it seemed like I was trying to determine my own data set!"
By @NoZZz - 4 months
Ugh. You're investing in this, but it disenfranchises the people that are investing in you. Well. Unless you'd like to admit that your income is pure fantasy, which it is.
By @r33b33 - 4 months
Can it work with large codebase? How? I have an Xcode project with python server for online game. Is this sufficient to understand code fully and implement edits correctly?
By @summerlight - 4 months
It is interesting to see that they keep focusing on the cheapest model instead of the frontier model. Probably because of their primary (internal?) customer's need?
By @ipsum2 - 4 months
Tested out Gemini-2 Flash, I had such high hopes that a better base model would help. It still hallucinates like crazy compared to GPT-4o.
By @dangoodmanUT - 4 months
Jules looks like it's going after Devin
By @Animats - 4 months
"Over the last year, we have been investing in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision."

"With your supervision". Thus avoiding Google being held responsible. That's like Teslas Fake Self Driving, where the user must have their hands on the wheel at all times.

By @aantix - 4 months
Their offering is just so... bad. Even the new model. All the data in the world, yet they trail behind.

They have all of these extensions that they use to prop up the results in the web UI.

I was asking for a list of related YouTube videos - the UI returns them.

Ask the API the same prompt, it returns a bunch of made up YouTube titles and descriptions.

How could I ever rely on this product?

By @geodel - 4 months
Just searched for GVP vs SVP and got:

"GVP stands for Good Pharmacovigilance Practice, which is a set of guidelines for monitoring the safety of drugs. SVP stands for Senior Vice President, which is a role in a company that focuses on a specific area of operations."

Seems lot of pharma regulation in my telecom company.

By @stared - 4 months
How does it compare to OpenAI and Anthropic models, on the same benchmarks?
By @topicseed - 4 months
Is it on AI studio already?
By @jgalt212 - 4 months
Can cloudflare turnstile (and others) detect these agents as bots?
By @greenchair - 4 months
shouldn't there be a benchmark category called 'strawberry' by now? or maybe these AIs are still too dumb to handle it which is why it is left off?
By @nopcode - 4 months
> What can you tell me about this sculpture?

> It's located in London.

Mind blowing.

By @beepbooptheory - 4 months
We are moving through eras faster than years these days.
By @eichi - 4 months
Gemini needs to improve quality of the software.
By @MetroWind - 4 months
Yes, "agentic". Very creative word.
By @sachou - 4 months
What is the main difference compared to before?
By @katamari-damacy - 4 months
Is it better than GPT4o? Does it have an API?
By @chrsw - 4 months
Instead of throwing up tables of benchmarks just let me try to do stuff and see if it's useful.
By @wonderfuly - 4 months
By @echohack5 - 4 months
Is this what it feels like to become one of the gray bearded engineers? This sounds like a bunch of intentionally confusing marketing drivel.

When capitalism has pilfered everything from the pockets of working people so people are constantly stressed over healthcare and groceries, and there's little left to further the pockets of plutocrats, the only marketing that makes sense is to appeal to other companies in order to raid their coffers by tricking their Directors to buy a nonsensical product.

Is that what they mean by "agentic era"? Cause that's what it sounds like to me. Also smells a lot like press-release-driven development where the point is to put a feather in the cap of whatever poor Google engineer is chasing their next promotion.

By @paradite - 4 months
Honestly this post makes Google sound like the new IBM. Very corporate.

"Hear from our CEO first, and then our other CEO in charge of this domain and CTO will tell you the actual news."

I haven't seen other tech companies write like that.

By @xyst - 4 months
Side note on Gemini: I pay for Google Workspace simply to enable e-mail capability for a custom domain.

I never used the web interface to access email until recently. To my surprise, all of the AI shit is enabled by default. So it’s very likely Gemini has been training on private data without my explicit consent.

Of course G words it as “personalizing” the experience for me but it’s such a load of shit. I’m tired of these companies stealing our data and never getting rightly compensated.

By @jstummbillig - 4 months
Reminder that implied models are not actual models. Models have failed to materialize repeatedly and vanished without further mention. I assume no one is trying to be misleading but, at this point, maybe overly optimistic.
By @oldpersonintx - 4 months
the demos are amazing

I need to rewire my brain for the power of these tools

this plus the quantum stuff...Google is on a win streak

By @moralestapia - 4 months
>2,000 words of bs

>General availability will follow in January, along with more model sizes.

>Benchmarks against their own models which always underperformed

>No pricing visible anywhere

Completely inept leadership at play.

By @m3kw9 - 4 months
Can these guys lead for once? They are always responding to what OpenAI is doing.
By @transcriptase - 4 months
“Hey google turn on kitchen lights”

“Sure, playing don’t fear the reaper on bathroom speaker”

Ok

By @ryandvm - 4 months
I am sure Google has the resources to compete in this space. What I'm less sure about is whether Google can monetize AI in a way that doesn't cannibalize their advertising income.

Who the hell wants an AI that has the personality of a car salesman?

By @melvinmelih - 4 months
No mention of Perplexity yet in the comments but it's obvious to me that they're targeting Perplexity Pro directly with their new Deep Research feature (https://blog.google/products/gemini/google-gemini-deep-resea...). I still wonder why Perplexity is worth $7 billion when the 800-pound gorilla is pounding on their door (albeit slowly).
By @echelon - 4 months
Gemini in search is answering so many of my search questions wrong.

If I ask natural language yes/no questions, Gemini sometimes tells me outright lies with confidence.

It also presents information as authoritative - locations, science facts, corporate ownership, geography - even when it's pure hallucination.

Right at the top of Google search.

edit:

I can't find the most obnoxious offending queries, but here was one I performed today: "how many islands does georgia have?".

Compare that with "how many islands does georgia have? Skidaway Island".

This is an extremely mild case, but I've seen some wildly wrong results, where Google has claimed companies were founded in the wrong states, that towns were located in the wrong states, etc.

By @tpoacher - 4 months
I know this isn't really a useful comment, but, I'm still sour about the name they chose. They MUST have known about the Gemini protocol. I'm tempted to think it was intentional, even.

It's like Microsoft creating an AI tool and calling it Peertube. "Hurr durr they couldn't possibly be confused; one is a decentralised video platform and the other is an AI tool hurr durr. And ours is already more popular if you 'bing' it hurr durr."