March 25th, 2025

Gemini 2.5: Our most intelligent AI model

Google has launched Gemini 2.5, its most advanced AI model, excelling in reasoning and coding, with a 1 million token context window, available for developers in Google AI Studio.

Read original article

ImpressionExcitementSkepticism

Gemini 2.5: Our most intelligent AI model

Gemini 2.5 has been introduced as Google's most advanced AI model, designed to address complex problems with enhanced reasoning capabilities. The initial release, Gemini 2.5 Pro Experimental, has achieved top rankings on various benchmarks, demonstrating superior reasoning and coding skills. This model is capable of analyzing information, drawing logical conclusions, and making informed decisions, which marks a significant advancement in AI technology. Gemini 2.5 Pro excels in math and science benchmarks and is noted for its ability to create visually appealing web applications and perform code transformations. It features a 1 million token context window, allowing it to process extensive datasets and handle multifaceted problems across different media types. Developers can access Gemini 2.5 Pro through Google AI Studio, with plans for broader availability on Vertex AI. Feedback from users is encouraged to further enhance the model's capabilities.

- Gemini 2.5 is Google's most intelligent AI model, focusing on complex problem-solving.

- The model showcases advanced reasoning and coding capabilities, leading in various benchmarks.

- It features a 1 million token context window for processing large datasets.

- Developers can experiment with Gemini 2.5 Pro in Google AI Studio, with wider availability coming soon.

- User feedback is welcomed to improve the model's performance and features.

Gemini's data-analyzing abilities aren't as good as Google claims

Google's Gemini 1.5 Pro and 1.5 Flash AI models face scrutiny for poor data analysis performance, struggling with large datasets and complex tasks. Research questions Google's marketing claims, highlighting the need for improved model evaluation.

Gemini Pro 1.5 experimental "version 0801" available for early testing

Google DeepMind's Gemini family of AI models, particularly Gemini 1.5 Pro, excels in multimodal understanding and complex tasks, featuring a two million token context window and improved performance in various benchmarks.

Google Gemini 1.5 Pro leaps ahead in AI race, challenging GPT-4o

Google has launched Gemini 1.5 Pro, an advanced AI model excelling in multilingual tasks and coding, now available for testing. It raises concerns about AI safety and ethical use.

Gemini 2.0: our new AI model for the agentic era

Google has launched Gemini 2.0, an advanced AI model with multimodal capabilities, including image and audio output. Wider access is planned for early 2025, focusing on responsible AI development.

Gemini 2.0 is now available to everyone

Google has launched Gemini 2.0, featuring the Flash model for all users, a Pro model for coding, and a cost-efficient Flash-Lite model, all with enhanced safety measures and ongoing updates.

AI: What people are saying

The launch of Google's Gemini 2.5 has generated a mix of excitement and skepticism among users.

Many users praise Gemini 2.5 for its improved reasoning and coding capabilities, with several noting successful benchmarks in complex tasks.
Some users express concerns about the model's performance, citing issues like hallucinations and limitations in understanding recent developments.
There is a notable interest in the model's long context window, with users testing it on extensive datasets and finding it effective.
Comments reflect a desire for better user experience and interface improvements, comparing it unfavorably to competitors like OpenAI.
Users are curious about the model's knowledge cutoff and its implications for practical applications in various fields.

79 comments

By @og_kalu - 12 days

One of the biggest problems with hands off LLM writing (for long horizon stuff like novels) is that you can't really give them any details of your story because they get absolutely neurotic with it.

Imagine for instance you give the LLM the profile of the love interest for your epic fantasy, it will almost always have the main character meeting them within 3 pages (usually page 1) which is of course absolutely nonsensical pacing. No attempt to tell it otherwise changes anything.

This is the first model that after 19 pages generated so far resembles anything like normal pacing even with a TON of details. I've never felt the need to generate anywhere near this much. Extremely impressed.

Edit: Sharing it - https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...

with pastebin - https://pastebin.com/aiWuYcrF

By @malisper - 13 days

I've been using a math puzzle as a way to benchmark the different models. The math puzzle took me ~3 days to solve with a computer. A math major I know took about a day to solve it by hand.

Gemini 2.5 is the first model I tested that was able to solve it and it one-shotted it. I think it's not an exaggeration to say LLMs are now better than 95+% of the population at mathematical reasoning.

For those curious the riddle is: There's three people in a circle. Each person has a positive integer floating above their heads, such that each person can see the other two numbers but not his own. The sum of two of the numbers is equal to the third. The first person is asked for his number, and he says that he doesn't know. The second person is asked for his number, and he says that he doesn't know. The third person is asked for his number, and he says that he doesn't know. Then, the first person is asked for his number again, and he says: 65. What is the product of the three numbers?

By @simonw - 12 days

I'm impressed by this one. I tried it on audio transcription with timestamps and speaker identification (over a 10 minute MP3) and drawing bounding boxes around creatures in a complex photograph and it did extremely well on both of those.

Plus it drew me a very decent pelican riding a bicycle.

Notes here: https://simonwillison.net/2025/Mar/25/gemini/

By @freediver - 12 days

Tops our benchmark in an unprecedented way.

https://help.kagi.com/kagi/ai/llm-benchmark.html

High quality, to the point. Bit on the slow side. Indeed a very strong model.

Google is back in the game big time.

By @anotherpaulg - 12 days

Gemini 2.5 Pro set the SOTA on the aider polyglot coding leaderboard [0] with a score of 73%.

This is well ahead of thinking/reasoning models. A huge jump from prior Gemini models. The first Gemini model to effectively use efficient diff-like editing formats.

[0] https://aider.chat/docs/leaderboards/

By @Oras - 13 days

These announcements have started to look like a template.

- Our state-of-the-art model.

- Benchmarks comparing to X,Y,Z.

- "Better" reasoning.

It might be an excellent model, but reading the exact text repeatedly is taking the excitement away.

By @greatgib - 12 days

If you plan to use Gemini, be warned, here are the usual Big Tech dragons:

   Please don’t enter ...confidential info or any data... you wouldn’t want a reviewer to see or Google to use ...

The full extract of the terms of usage:

   How human reviewers improve Google AI

   To help with quality and improve our products (such as the generative machine-learning models that power Gemini Apps), human reviewers (including third parties) read, annotate, and process your Gemini Apps conversations. We take steps to protect your privacy as part of this process. This includes disconnecting your conversations with Gemini Apps from your Google Account before reviewers see or annotate them. Please don’t enter confidential information in your conversations or any data you wouldn’t want a reviewer to see or Google to use to improve our products, services, and machine-learning technologies.

By @mindwok - 12 days

Just adding to the praise: I have a little test case I've used lately which was to identify the cause of a bug in a Dart library I was encountering by providing the LLM with the entire codebase and description of the bug. It's about 360,000 tokens.

I tried it a month ago on all the major frontier models and none of them correctly identified the fix. This is the first model to identify it correctly.

By @jnd0 - 13 days

> with Gemini 2.5, we've achieved a new level of performance by combining a significantly enhanced base model with improved post-training. Going forward, we’re building these thinking capabilities directly into all of our models, so they can handle more complex problems and support even more capable, context-aware agents.

Been playing around with it and it feels intelligent and up to date. Plus is connected to the internet. A reasoning model by default when it needs to.

I hope they enable support for the recently released canvas mode for this model soon it will be a good match.

By @vineyardmike - 13 days

I wonder what about this one gets the +0.5 to the name. IIRC the 2.0 model isn’t particularly old yet. Is it purely marketing, does it represent new model structure, iteratively more training data over the base 2.0, new serving infrastructure, etc?

I’ve always found the use of the *.5 naming kinda silly when it became a thing. When OpenAI released 3.5, they said they already had 4 underway at the time, they were just tweaking 3 be better for ChatGPT. It felt like a scrappy startup name, and now it’s spread across the industry. Anthropic naming their models Sonnet 3, 3.5, 3.5 (new), 3.7 felt like the worst offender of this naming scheme.

I’m a much bigger fan of semver (not skipping to .5 though), date based (“Gemini Pro 2025”), or number + meaningful letter (eg 4o - “Omni”) for model names.

By @jorl17 - 12 days

Just a couple of days ago I wrote on reddit about how long context models are mostly useless to me, because they start making too many mistakes very fast. They are vaguely helpful for "needle in a haystack" problems, not much more.

I have a "test" which consists in sending it a collection of almost 1000 poems, which currently sit at around ~230k tokens, and then asking a bunch of stuff which requires reasoning over them. Sometimes, it's something as simple as "identify key writing periods and their differences" (the poems are ordered chronologically). Previous models don't usually "see" the final poems — they get lost, hallucinate and are pretty much worthless. I have tried several workaround techniques with varying degrees of success (e.g. randomizing the poems).

Having just tried this model (I have spent the last 3 hours probing it), I can say that, to me, this is a breakthrough moment. Truly a leap. This is the first model that can consistently comb through these poems (200k+ tokens) and analyse them as a whole, without significant issues or problems. I have no idea how they did it, but they did it.

The analysis of this poetic corpus has few mistakes and is very, very, very good. Certainly very good in terms of how quickly it produces an answer — it would take someone days or weeks of thorough analysis.

Of course, this isn't about poetry — it's about passing in huge amounts of information, without RAG, and having a high degree of confidence in whatever reasoning tasks this model performs. It is the first time that I feel confident that I could offload the task of "reasoning" over large corpus of data to an LLM. The mistakes it makes are minute, it hasn't hallucinated, and the analysis is, frankly, better than what I would expect of most people.

Breakthrough moment.

By @nickandbro - 13 days

Wow, was able to nail the pelican riding on a bicycle test:

https://www.svgviewer.dev/s/FImn7kAo

By @falcor84 - 13 days

I'm most impressed by the improvement on Aider Polyglot; I wasn't expecting it to get saturated so quickly.

I'll be looking to see whether Google would be able to use this model (or an adapted version) to tackle ARC-AGI 2.

By @ekojs - 13 days

> This will mark the first experimental model with higher rate limits + billing. Excited for this to land and for folks to really put the model through the paces!

From https://x.com/OfficialLoganK/status/1904583353954882046

The low rate-limit really hampered my usage of 2.0 Pro and the like. Interesting to see how this plays out.

By @zone411 - 12 days

Scores 54.1 on the Extended NYT Connections Benchmark, a large improvement over Gemini 2.0 Flash Thinking Experimental 01-21 (23.1).

1 o1-pro (medium reasoning) 82.3

2 o1 (medium reasoning) 70.8

3 o3-mini-high 61.4

4 Gemini 2.5 Pro Exp 03-25 54.1

5 o3-mini (medium reasoning) 53.6

6 DeepSeek R1 38.6

7 GPT-4.5 Preview 34.2

8 Claude 3.7 Sonnet Thinking 16K 33.6

9 Qwen QwQ-32B 16K 31.4

10 o1-mini 27.0

https://github.com/lechmazur/nyt-connections/

By @og_kalu - 13 days

From the 2.0 line, the Gemini models have been far better at Engineering type questions (fluids etc) than GPT, Claude especially with questions that have Images that require more than just grabbing text. This is even better.

By @M4v3R - 13 days

The Long Context benchmark numbers seem super impressive. 91% vs 49% for GPT 4.5 at 128k context length.

By @nikcub - 12 days

Impressive model - but I'm confused by the knowledge cutoff. AI Studio says it is January 2025 (which would be impressive) but querying it for anything early 2025 or mid/late 2024 and it self-reports that it's cutoff is in 2023 (which can't be right).

This is most evident when querying about fast-moving dev tools like uv or bun. It seems to only know the original uv options like pip and tools, while with bun it is unfamiliar with bun outdated (from Aug 2024), bun workspaces (from around that time?) but does know how to install bun on windows (April 2024).

You'll still need to provide this model with a lot of context to use it with any tooling or libraries with breaking changes or new features from the past ~year - which seems to contradict the AI Studio reported knowledge cutoff.

Were I developing models - I'd prioritise squeezing in the most recent knowledge of popular tools and libraries since development is such a popular (and revenue generating) use case.

By @Dowwie - 13 days

This model is a fucking beast. I am so excited about the opportunities this presents.

By @comex - 13 days

I was recently trying to replicate ClaudePlaysPokemon (which uses Claude 3.7) using Gemini 2.0 Flash Thinking, but it was seemingly getting confused and hallucinating significantly more than Claude, making it unviable (although some of that might be caused by my different setup). I wonder if this new model will do better. But I can't easily test it: for now, even paid users are apparently limited to 50 requests per day [1], which is not really enough when every step in the game is a request. Maybe I'll try it anyway, but really I need to wait for them to "introduce pricing in the coming weeks".

Edit: I did try it anyway and so far the new model is having similar hallucinations. I really need to test my code with Claude 3.7 as a control, to see if it approach the real ClaudePlaysPokemon's semi-competence.

Edit 2: Here's the log if anyone is curious. For some reason it's letting me make more requests than the stated rate limit. Note how at 11:27:11 it hallucinates on-screen text, and earlier it thinks some random offscreen tile is the stairs. Yes, I'm sure this is the right model: gemini-2.5-pro-exp-03-25.

https://a.qoid.us/20250325/

[1] https://ai.google.dev/gemini-api/docs/rate-limits#tier-1

By @ascorbic - 12 days

It can answer my favourite riddle for LLMs:

"Anna, Becca and Clare go to the play park. There is nobody else there. Anna is playing on the see-saw, Becca is playing on the swings. What is Clare doing?" (Sometimes I ask similar questions with the same structure and assumptions but different activities)

About a year ago none of them could answer it. All the latest models can pass it if I tell them to think hard, but previously Gemini could rarely answer it without that extra hint. Gemini 2.5 caveats its answer a bit, but does get it correct. Interestingly GPT-4o initially suggests it will give a wrong answer without thinking, but recognises it's a riddle, so decides to think harder and gets it right.

By @andai - 13 days

How does Gemini have such a big context window?

I thought memory requirement grows exponentially with context size?

By @batata_frita - 12 days

Why do I have the feel that nobody is too much excited to google's models compared to other companies?

By @arjun_krishna1 - 12 days

I've been using Gemini Pro for my University of Waterloo capstone engineering project. Really good understanding of PDF documents and good reasoning as well as structured output Recommend trying it out at aistudio dot google dot com

By @d3nj4l - 12 days

A model that is better on Aider than Sonnet 3.7? For free, right now? I think I'll give it a spin this weekend on a couple of projects, seems too good to be true.

By @summerlight - 12 days

This looks like the first model where Google seriously comes back into the frontier competition? 2.0 flash was nice for the price but it's more focused on efficiency, not the performance.

By @serjester - 13 days

I wish they’d mention pricing - it’s hard to seriously benchmark models when you have no idea what putting it in production would actually cost.

By @Davidzheng - 12 days

On initial thoughts, I think this might be the first AI model to be reliably helpful as a research assistant in pure mathematics (o3-mini-high can be helpful but is more prone to hallucinations)

By @strstr - 12 days

It's a lot better at my standard benchmark "Magic: The Gathering" rules puzzles. Gets the answers right (both the outcome and rationale).

By @asah - 12 days

It nailed my two hard reasoning+linguistic+math questions in one shot, both the kinds of things that LLM struggle but humans do well.

(DM me for the questions)

By @ofermend - 12 days

This model is quite impressive. Not just useful for math/research with great reasoning, it also maintained a very low hallucination rate of 1.1% on Vectara Hallucination Leaderboard: https://github.com/vectara/hallucination-leaderboard

By @jasonpeacock - 13 days

Isn't every new AI model the "most <adjective>"?

Nobody is going to say "Announcing Foobar 7.1 - not our best!"

By @f1shy - 12 days

One test I always do is ask for an absolutely minimal language interpreter with TCO.

This is part of the code output (after several interactions of it not returning actual code):

        // Tail Call Optimization (very basic)
        if(func->type == VAL_FUNCTION){
            return apply(func, args, env); //no stack growth.
        }
        else{
            return apply(func, args, env);
        }

I'm not very impressed.

I pointed out that part of the code, and answered:

You've correctly pointed out that the TCO implementation in the provided C code snippet is essentially a no-op. The if and else blocks do the same thing: they both call apply(func, args, env). This means there's no actual tail call optimization happening; it's just a regular function call.

But then follows with even worst code. It does not even compile!

By @lvl155 - 12 days

With recent pace of model updates, I wonder which factor is more important: hardware assets, software/talent, or data access. Google clearly is in the lead in terms of data access in my view. If I am a top talent in AI, I’d go where I can work with the best data no?

By @Medicineguy - 12 days

While I'm sure the new Gemini model has made improvements, I feel like the user experience outside of the model itself is stagnating. I think OpenAI's interfaces, both web app and mobile app, are quite a bit more polished currently. For example, Gemini's speech recognition struggles with longer pauses and often enough cuts me off mid-sentence. Also, OpenAIs whisper model understands more context (for instance, saying “[...] plex, emby and Jellyfin [...]” is usually understood in whisper, but less often in Gemini) The Gemini web app lacks keyboard shortcuts for basic actions like opening a new chat or toggling the sidebar (good for privacy friendly pair programming). Last point off the top of my head would be the ability to edit messages beyond just the last one. That's possible in ChatGPT, but not in Gemini. Googlers are spending so much money for model training, I would appreciate spending some for making it fun to use :)

By @barrenko - 13 days

The incumbent has awoken.

By @cj - 13 days

Slight tangent: Interesting that they use o3-mini as the comparison rather than o1.

I've been using o1 almost exclusively for the past couple months and have been impressed to the point where I don't feel the need to "upgrade" for a better model.

Are there benchmarks showing o3-mini performing better than o1?

By @jszymborski - 12 days

Gemini refuses to answer any questions on poprtional swing models or anything related to psephology on the grounds that it has to do with elections. Neither Claude nor ChatGPT nor Mistral/Le Chat are that neutered.

By @WasimBhai - 12 days

I do not intend to take anything away from the technical achievement of the team. However, as Satya opined some weeks back, these benchmarks do not mean a lot if we do not see a comparable increase in productivity.

But then there are two questions. First, are the white collar workers specifically consultants, engineers responsible for increase in productivity? Or is the white collar workers at the very right tail e.g., scientists?

I think consultants and engineers are using these technologies a lot. I think biologists at least are using these models a lot.

But then where is the productivity increases?

By @simonw - 12 days

Here's a Gemini 2.5 provided summary of this Hacker News thread as of the moment when it had 269 comments: https://gist.github.com/simonw/3efa62d917370c5038b7acc24b7c7...

I ran this command to create it:

  curl -s "https://hn.algolia.com/api/v1/items/43473489" | \
    jq -r 'recurse(.children[]) | .author + ": " + .text' | \
    llm -m "gemini-2.5-pro-exp-03-25" -s \
    'Summarize the themes of the opinions expressed here.
    For each theme, output a markdown header.
    Include direct "quotations" (with author attribution) where appropriate.
    You MUST quote directly from users when crediting them, with double quotes.
    Fix HTML entities. Output markdown. Go long. Include a section of quotes that illustrate opinions uncommon in the rest of the piece'

Using this script: https://til.simonwillison.net/llms/claude-hacker-news-themes

By @jharohit - 13 days

why not enable Canvas for this model on Gemini.google.com? Arguably the weakest link of Canvas is the terrible code that Gemini 2.0 Flash writes for Canvas to run..

By @TheMagicHorsey - 12 days

I tested out Gemini 2.5 and it failed miserably at calling into tools that we had defined for it. Also, it got into an infinite loop a number of times where it would just spit out the exact same line of text continuously until we hard killed the process. I really don't know how others are getting these amazing results. We had no problems using Claude or OpenAI models in the same scenario. Even Deepseek R1 works just fine.

By @rodolphoarruda - 12 days

I've been trying to use Gemini 2.0 Flash, but I don't think it's possible. The model still thinks it's running the 1.5 Pro model.

Reference: https://rodolphoarruda.pro.br/wp-content/uploads/image-14.pn...

By @joshdavham - 12 days

When these companies release a model “2.5”, are they using some form of semver? Where are these numbers coming from?

By @daquisu - 13 days

Weird, they released Gemini 2.5 but I still can't use 2.0 pro with a reasonable rate limit (5 RPM currently).

By @andai - 13 days

Can anyone share what they're doing with reasoning models? They seem to only make a difference with novel programming problems, like Advent of Code. So this model will help solve slightly harder advent of codes.

By extension it should also be slightly more helpful for research, R&D?

By @xnx - 13 days

It will be huge achievement if models can get to the point where so much selection effort isn't required: gemini.google.com currently lists 2.0 Flash, 2.0 Flash Thinking (experimental), Deep Research, Personalization (experimental), and 2.5 Pro (experimental) for me.

By @DaveMcMartin - 12 days

I love to see this competition between companies trying to get the best LLM, and also, the fact that they’re trying to make them useful as tools, focusing on math, science, coding, and so on

By @afro88 - 13 days

Is this the first model announcement where they show Aider's Polyglot benchmark in the performance comparison table? That's huge for Aider and anotherpaulg!

By @andai - 12 days

I asked it for suggestions for a project, and it was the only model that correctly pointed out serious flaws in the existing proposal. So far so good!

By @t_minus_40 - 12 days

i have asked the direction of friction on ball rolling either up or down on an inclined plan - it gave wrong answer and was adamant about it. Surprisingly, similar to o1.

gave a problem which sounds like monty hall problem but a simple probability question and it nailed it.

asked to tell a joke - horrible joke ever.

much better than o1 but still no where near agi. it has been optimized for logic and reasoning at best.

By @theragra - 11 days

Yeah, and then it says that call of duty is pronounced call of dah-tee when I speak in Russian.

Chatgpt pronounced correctly

By @mclau156 - 12 days

Generated 1000 lines of turn based combat with shop, skills, stats, elements, enemy types, etc. with this one

By @slama - 12 days

Interestingly, the model hallucinated the ability to use a search tool when I was playing around with it

By @dcchambers - 12 days

> Developers and enterprises can start experimenting with Gemini 2.5 Pro in Google AI Studio now, and Gemini Advanced users can select it in the model dropdown on desktop and mobile. It will be available on Vertex AI in the coming weeks.

I'm a Gemini Advanced subscriber, still don't have this in the drop-down model selection in the phone app, though I do see it on the desktop webapp.

By @billforsternz - 12 days

I know next to nothing about AI, but I just experienced an extraordinary hallucination in a google AI search (presumably an older Gemini model right?) as I elaborated in detail in another HN thread. It might be a good test question. https://news.ycombinator.com/item?id=43477710

By @testycool - 12 days

It feels like Gemini 2.0 Pro + Reasoning.

I also see Gemini 2.0 Pro has been replaced completely in AI Studio.

By @honeybadger1 - 13 days

Claude is still the king right now for me. Grok is 2nd in line, but sometimes it's better.

By @Alifatisk - 12 days

Can't wait for the benchmark at artificialanalysis.ai

By @vivzkestrel - 13 days

hi, here is our new AI model, it performs task A x% better than our competitor 1, task B y% better than our competitor 2 seems to be the new hot AI template in town

By @eenchev - 13 days

"My info, the stuff I was trained on, cuts off around early 2023." - Gemini 2.5 to me. Appears that they did a not-so-recent knowledge cutoff in order to use the best possible base model.

By @joelthelion - 13 days

Is this model going to be restricted to paying users?

By @marcus_holmes - 12 days

I tried the beta version of this model to write a business plan (long story).

I was impressed at first. Then it got really hung up on the financial model, and I had to forcibly move it on. After that it wrote a whole section in Indonesian, which I don't speak, and then it crashed. I'd not saved for a while (ever since the financial model thing), and ended up with an outline and a couple of usable sections.

I mean, yes, this is better than nothing. It's impressive that we made a pile of sand do this. And I'm aware that my prompt engineering could improve a lot. But also, this isn't a usable tool yet.

I'm curious to try again, but wary of spending too much time "playing" here.

By @pachico - 12 days

It really surprises me that Google and Amazon, considering their infrastructure and the urge to excel at this, aren't leading the industry.

By @andrewinardeer - 13 days

Google is overly cautious with their guardrails.

Granted, Gemini answers it now, however, this one left me shaking my head.

https://cdn.horizon.pics/PzkqfxGLqU.jpg

By @fourseventy - 13 days

Does it think the founding fathers were a diverse group of mixed races and genders like the last model did?

By @noisy_boy - 13 days

Is Gemini and Bard same? I asked it a question and it said "... areas where I, as Bard, have..."

By @skinkestek - 12 days

Can it now generate images of soldier in typical uniforms from 1940s Germany without having to throw in a few token ethnicities?

Or generate images of the founding fathers of US that at least to some degree resemble the actual ones?

By @meerab - 6 days

The Gemini 2.5 model is truly impressive, especially with its multimodal capability. Its ability to understand audio and video content is amazing—truly groundbreaking.

I spent some time experimenting with Gemini 2.5, and its reasoning abilities blew me away. Here are few standout use cases that showcase its potential:

1. Counting Occurrences in a Video

In one experiment, I tested Gemini 2.5 with a video of an assassination attempt on then-candidate Donald Trump. Could the model accurately count the number of shots fired? This task might sound trivial, but earlier AI models often struggled with simple counting tasks (like identifying the number of "R"s in the word "strawberry").

Gemini 2.5 nailed it! It correctly identified each sound, outputted the timestamps where they appeared, and counted eight shots, providing both visual and audio analysis to back up its answer. This demonstrates not only its ability to process multimodal inputs but also its capacity for precise reasoning—a major leap forward for AI systems.

2. Identifying Background Music and Movie Name

Have you ever heard a song playing in the background of a video and wished you could identify it? Gemini 2.5 can do just that! Acting like an advanced version of Shazam, it analyzes audio tracks embedded in videos and identifies background music. I am also not a big fan of people posting shorts without specifying the movie name. Gemini 2.5 solves that problem for you - no more searching for movie name!

3. OCR Text Recognition

Gemini 2.5 excels at Optical Character Recognition (OCR), making it capable of extracting text from images or videos with precision. I asked the model to output one of Khan Academy's handwritten visuals into a nice table format - and the text was precisely copied from video into a neat little table!

4. Listen to Foreign News Media

The model can translate text from one language to another and give a good translation. I tested the recent official statement from Thai officials about an earthquake in Bangkok, and the latest news from a Marathi news channel. The model was correctly able to translate and output the news synopsis in the language of your choice.

5. Cricket Fans?

Sports fans and analysts alike will appreciate this use case! I tested Gemini 2.5 on an ICC T20 World Cup cricket match video to see how well it could analyze gameplay data. The results were incredible: the model accurately calculated scores, identified the number of fours and sixes, and even pinpointed key moments—all while providing timestamps for each event.

7. Webinar - Generate Slides from Video

Now this blew my mind - video webinars are generated by slide decks and a person talking about the slides. Can we reverse the process? Given a video, can we ask AI to output the slide deck? Google Gemini 2.5 outputted 41 slides for a Stanford webinar!

Bonus: Humor Test

Finally, I put Gemini 2.5 through a humor test using a PG-13 joke from one of my favorite YouTube channels, Mike and Joelle. I wanted to see if the model could understand adult humor and infer punchlines.

At first, the model hesitated to spell out the punchline (perhaps trying to stay appropriate?), but eventually, it got there—and yes, it understood the joke perfectly!

https://videotobe.com/blog/googles-gemini-25

By @cp9 - 12 days

does it still suggest glue on pizza

By @resource_waste - 13 days

I'll try it tonight, but I'm not excited, its just work.

ChatGPT4.5, I was excited.

Deepseek, I was excited. (then later disappointed)

I know Gemini probably wont answer any medical question, even if you are a doctor. ChatGPT will.

I know I've been disappointed at the quality of Google's AI products. They are backup at best.

By @noisy_boy - 13 days

Are Gemini and Bard same? I asked it a question and it said "... areas where I, as Bard, have...."

By @ototot - 13 days

And OpenAI is announcing their ImageGen in 4o

https://news.ycombinator.com/item?id=43474112

By @throwaway13337 - 13 days

Google has this habit of 'releasing' without releasing AI models. This looks to be the same?

I don't see it on the API price list:

https://ai.google.dev/gemini-api/docs/pricing

I can imagine that it's not so interesting to most of us until we can try it with cursor.

I look forward to doing so when it's out. That Aider bench mixed with the speed and a long context window that their other models are known for could be a great mix. But we'll have to wait and see.

More generally, it woud be nice for these kinds of releases to also add speed and context window as a separate benchmark. Or somehow include it in the score. A model that is 90% as good as the best but 10x faster is quite a bit more useful.

These might be hard to mix to an overall score but they're critical for understanding usefulness.

Gemini's data-analyzing abilities aren't as good as Google claims

Gemini Pro 1.5 experimental "version 0801" available for early testing

Google Gemini 1.5 Pro leaps ahead in AI race, challenging GPT-4o

Google has launched Gemini 1.5 Pro, an advanced AI model excelling in multilingual tasks and coding, now available for testing. It raises concerns about AI safety and ethical use.

Gemini 2.0: our new AI model for the agentic era

Google has launched Gemini 2.0, an advanced AI model with multimodal capabilities, including image and audio output. Wider access is planned for early 2025, focusing on responsible AI development.

Gemini 2.0 is now available to everyone

Google has launched Gemini 2.0, featuring the Flash model for all users, a Pro model for coding, and a cost-efficient Flash-Lite model, all with enhanced safety measures and ongoing updates.

Gemini 2.5: Our most intelligent AI model

Related

Gemini's data-analyzing abilities aren't as good as Google claims

Gemini Pro 1.5 experimental "version 0801" available for early testing

Google Gemini 1.5 Pro leaps ahead in AI race, challenging GPT-4o

Gemini 2.0: our new AI model for the agentic era

Gemini 2.0 is now available to everyone

Related

Gemini's data-analyzing abilities aren't as good as Google claims

Gemini Pro 1.5 experimental "version 0801" available for early testing

Google Gemini 1.5 Pro leaps ahead in AI race, challenging GPT-4o

Gemini 2.0: our new AI model for the agentic era

Gemini 2.0 is now available to everyone