How Google built its Gemini robotics models
Google DeepMind has launched Gemini Robotics models that enhance robots' ability to learn complex tasks, with a focus on embodied reasoning for effective interaction and on adaptability for future applications across industries.
Google DeepMind has introduced a new family of Gemini Robotics models designed to enhance the capabilities of robots. These models enable robots to learn and perform complex tasks, such as preparing salads, playing Tic-Tac-Toe, and folding origami. Carolina Parada, head of robotics, highlighted a moment when a bi-arm ALOHA robot successfully executed a "slam dunk" with a toy basketball, demonstrating the model's ability to understand and perform actions it had never encountered before. The Gemini Robotics models are multimodal, integrating physical actions with outputs like text and audio, which allows robots to adapt to new objects and environments without additional training. The Gemini Robotics-ER model focuses on embodied reasoning, enabling robots to recognize and interact with their surroundings effectively. This approach contrasts with traditional methods that train robots for a single task: the Gemini models are trained on a wide range of tasks to promote generalization. That adaptability is crucial for future applications across industries, including complex environments and human-centric spaces. Google aims to develop robots that can assist with everyday tasks, moving closer to a future where robots are integral to daily life.
- Google DeepMind has launched Gemini Robotics models for advanced robotic capabilities.
- The models enable robots to learn and perform complex tasks without prior exposure.
- Gemini Robotics-ER focuses on embodied reasoning for effective interaction with environments.
- The training approach emphasizes broad task learning over single-task training.
- Future applications include assisting in complex industries and human-centric environments.
Related
Gemini Pro 1.5 experimental "version 0801" available for early testing
Google DeepMind's Gemini family of AI models, particularly Gemini 1.5 Pro, excels in multimodal understanding and complex tasks, featuring a two million token context window and improved performance in various benchmarks.
Google Gemini 1.5 Pro leaps ahead in AI race, challenging GPT-4o
Google has launched Gemini 1.5 Pro, an advanced AI model excelling in multilingual tasks and coding, now available for testing. It raises concerns about AI safety and ethical use.
Gemini 2.0: our new AI model for the agentic era
Google has launched Gemini 2.0, an advanced AI model with multimodal capabilities, including image and audio output. Wider access is planned for early 2025, focusing on responsible AI development.
Gemini Robotics brings AI into the physical world
Google DeepMind's Gemini Robotics introduces an AI model enhancing robotic capabilities through advanced vision-language-action integration, focusing on safety, interactivity, and dexterity for practical applications in everyday life.
Gemini 2.5: Our most intelligent AI model
Google has launched Gemini 2.5, its most advanced AI model, excelling in reasoning and coding, with a 1 million token context window, available for developers in Google AI Studio.
(which worked fine with Google Assistant)
Just wondering if anyone has a strong feeling or, better yet, insight on this regarding their robotics efforts.
Outsourcing specific roles such as AI research or robotics engineers can help companies bring top-tier talent into the fold without the burden of full-time recruitment. It's fascinating to see how outsourcing can complement R&D in cutting-edge industries like robotics.
Curious to see how this shifts the industry, especially in terms of scalability and speed to market.
Aaaaw that's nice. Except it's all military under the hood but nice that they try to make us think they'll fold our laundry instead.