Gemini Robotics brings AI into the physical world
Google DeepMind's Gemini Robotics introduces an AI model enhancing robotic capabilities through advanced vision-language-action integration, focusing on safety, interactivity, and dexterity for practical applications in everyday life.
Google DeepMind has introduced Gemini Robotics, a new AI model based on Gemini 2.0, designed to enhance robotics by integrating advanced vision-language-action capabilities. This model aims to enable robots to perform complex tasks in the physical world through embodied reasoning, allowing them to understand and react to their environment similarly to humans. Gemini Robotics features two key components: the main Gemini Robotics model, which controls robots using physical actions, and Gemini Robotics-ER, which focuses on spatial understanding and embodied reasoning. These models significantly improve generality, interactivity, and dexterity, allowing robots to adapt to new situations, understand natural language commands, and perform intricate tasks requiring fine motor skills. The technology is being developed in partnership with Apptronik to create humanoid robots and is being tested by select organizations. Additionally, safety measures are integrated into the models to ensure safe interactions with humans and environments. The initiative also includes the release of a new dataset aimed at improving the safety of embodied AI in robotics. Overall, Gemini Robotics represents a significant advancement in the capabilities of robots, moving closer to practical applications in everyday life.
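The announcement doesn't publish an API, but the architecture it describes, a vision-language-action model driving a robot from camera frames and natural-language commands, reduces to a closed perception-action loop. Here is a minimal sketch of that loop; every name in it (VLAModel, RobotArm, control_loop) is hypothetical, invented purely for illustration:

```python
# Hypothetical sketch of a vision-language-action (VLA) control loop.
# Google has published no API for Gemini Robotics; every name here
# (VLAModel, RobotArm, control_loop) is invented for illustration.
import numpy as np

class VLAModel:
    """Stand-in policy: (camera frame, instruction) -> joint deltas."""
    def predict_action(self, frame: np.ndarray, instruction: str) -> np.ndarray:
        # A real VLA model would run vision-language inference here;
        # returning zeros keeps the sketch runnable.
        return np.zeros(7)  # e.g., deltas for a 7-DoF arm

class RobotArm:
    def read_camera(self) -> np.ndarray:
        return np.zeros((224, 224, 3), dtype=np.uint8)  # dummy RGB frame

    def apply(self, joint_deltas: np.ndarray) -> None:
        pass  # would forward commands to the motor controllers

def control_loop(model: VLAModel, robot: RobotArm, instruction: str, steps: int = 100) -> None:
    # Perceive, decide, act. Re-observing after every action is what lets
    # the policy react to a changing environment rather than replay a script.
    for _ in range(steps):
        frame = robot.read_camera()
        action = model.predict_action(frame, instruction)
        robot.apply(action)

control_loop(VLAModel(), RobotArm(), "fold the towel")
```

The closed loop is the substance behind "react to their environment": the model is re-prompted with a fresh observation at every step instead of executing a pre-planned trajectory.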
- Gemini Robotics integrates advanced AI capabilities for real-world robotic applications.
- The model enhances generality, interactivity, and dexterity in robotic tasks.
- Safety measures are prioritized to ensure safe human-robot interactions.
- Collaboration with Apptronik aims to develop next-generation humanoid robots.
- A new dataset is released to improve safety in embodied AI and robotics.
Related
Gemini Pro 1.5 experimental "version 0801" available for early testing
Google DeepMind's Gemini family of AI models, particularly Gemini 1.5 Pro, excels in multimodal understanding and complex tasks, featuring a two million token context window and improved performance in various benchmarks.
Google Gemini 1.5 Pro leaps ahead in AI race, challenging GPT-4o
Google has launched Gemini 1.5 Pro, an advanced AI model excelling in multilingual tasks and coding, now available for testing. It raises concerns about AI safety and ethical use.
Gemini 2.0: our new AI model for the agentic era
Google has launched Gemini 2.0, an advanced AI model with multimodal capabilities, including image and audio output. Wider access is planned for early 2025, focusing on responsible AI development.
Google releases its own 'reasoning' AI model
Google has launched the Gemini 2.0 Flash Thinking Experimental AI model for reasoning tasks, which is still in development and shows inconsistencies, raising concerns about computational costs and long-term performance.
Gemini 2.0 is now available to everyone
Google has launched Gemini 2.0, featuring the Flash model for all users, a Pro model for coding, and a cost-efficient Flash-Lite model, all with enhanced safety measures and ongoing updates.
- Many users express curiosity about the practical applications of the technology, particularly in areas like recycling and household tasks.
- There is a notable skepticism regarding the authenticity and real-world applicability of the demo videos, with some commenters recalling past instances of misleading demonstrations.
- Concerns about the ethical implications of robotics and AI, including safety and job displacement, are frequently mentioned.
- Several commenters highlight the need for open-source alternatives and accessible robotics for independent developers.
- Overall, there is a call for tangible products rather than promotional content, with users wanting to see real-world applications of the technology.
Turns out he was just writing LLM prompts way ahead of his time.
(Also there seems to be some kind of video auto-play/pause/scroll thing going on with the page? Whatever it is, it's broken.)
https://techcrunch.com/2023/12/07/googles-best-gemini-demo-w...
Time to think bigger.
Can these spatial reasoning and end-effector tasks be reliably repeated, or are we just looking at the robotic equivalent of "trick-shots" where the success rate is in the single digits?
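One way to make that question concrete: with a handful of scripted trials, a binomial confidence interval shows how wide the plausible success-rate range really is. A rough sketch follows; the trial counts are invented for illustration, not taken from the announcement:

```python
# Sketch: distinguishing a reliable policy from a lucky "trick shot".
# Wilson score interval for a binomial success rate; the counts below
# are invented for illustration only.
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

# One flashy success out of 20 attempts is consistent with anything
# from a ~1% to a ~24% true success rate:
print(wilson_interval(1, 20))   # ~(0.009, 0.236)
# 18 successes out of 20 pins the rate much tighter:
print(wilson_interval(18, 20))  # ~(0.699, 0.972)
```

The point is that a demo reel is compatible with either regime; only published trial counts would settle it.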
I'd say Okura and Vinci are the current leaders in multi-axis multi-arm end-effectors and they have nothing like this.
Well, for now, at least.
I know who will be the first shown the door when the next round of layoffs comes: the guy saying "you can't make money that way."
When I see open roles at these companies I think the projects I'm going to work on in the future will be more and more irrelevant to society as a whole.
Anyway, this is amazing. Please delete/remove my post if it seems like this adds nothing to the conversation.
This is a great achievement and I'm not underestimating the work of the people involved. But similar videos have been put together by many research labs and startups for years now.
I feel like Google's a bit lost. And Sundar's leadership hasn't helped, if we're honest.
GOOG trades around the same price as it did in 2022, which means the AI wave passed them by with zero effect on the stock. With other tech companies doubling or tripling their market cap over the same period, Sundar left roughly a trillion dollars of unrealized value on the table (!); also consider that Google had all the cards at one point. Quite mediocre, imo.
These models are nowhere near that for now, but I'll be watching to see if the big investments into synthetic data generation over the next few years get them closer.
I have not read it yet (it's being released in May), so I can't give a true recommendation, but I did preorder it. The state of the art is changing fast, and the book should capture at least a snapshot of it.
Being able to order food and handle bureaucracy in these languages while speaking only English would be amazing. This seems like a simpler problem than tackling robots in 3D space, yet it’s still unsolved.
I expect they are more honest than the Tesla men-in-suits debacle, but my trust is low.
What do we know to be the facts?
Perhaps it escapes the brightest minds at Google that people can grasp things with their eyes closed, that we don't need to see in order to grasp. Yet designing good robots with tactile sensors is apparently too much for our top researchers.
Also, I vaguely remember similar demos without the AI hype. Maybe it was from DeepMind, or another startup, back in 2015.
Fuck it, make the arms big enough and it can do laundry, load/unload dishwasher, clean up after cooking/eating.
I can finally see this happening. Probably Tesla first tho.
"We consulted with ourselves and decided that we're a-okay!"
I would love to experiment with something like this, but every time I try to figure out what hardware to use, there are a thousand cheap no-name options and then bam, $30k+ for the pro ones.
Uhhh, I mean that's nice, but how about: "That's why we will never sell our products to military, police, or other openly violent groups, and will ensure that our robots will always respond instantly to commands like, 'stop, you're hurting me', which they understand in every documented human language on earth, and with which they will comply regardless of who gave the previous command that caused violent behavior."
Who is building the robot cohort that is immune - down to the firmware level - to state coercion and military industry influence?
Now, clean up the kitchen.
1.) Has cutting edge in house AI models (Like OpenAI, Anthropic, Grok, etc.)
2.) Has cutting edge in house AI hardware acceleration (Like Nvidia)
3.) Has (likely) cutting edge robotics (Like Boston Dynamics, Tesla, Figure)
4.) Has industry-leading self-driving taxis (Like Tesla wants)
5.) Has all the other stuff that Google does. (Like insert most tech companies)
The big thing Google lacks is excitement and hype (look at the comments on any of their development showcases). Their veneer has dulled, for totally understandable reasons, but it's just dusty; the fundamentals underneath are still top notch. They are still poised to dominate the future as currently forecast. The things tripping Google up are relatively easy fixes compared to a true tech disadvantage.
I'm not trying to shill, despite how shill-like this post objectively is. It's just an observation that Google has all the right players and really just needs better coaching, something that isn't too difficult to fix, and something shareholders will get eventually.
If scifi authors aren't keeping up, it's hard to expect the rest of us to. But the macro- and microeconomic changes implied by this technology are huge. Very little of our daily lives will be left undisrupted once it propagates and saturates the culture, even with no further fundamental advances.
Can anyone recommend scifi that makes plausible projections around this tech?
It seems unlikely that any company (Google included) will have a robotics moat.
If we see a real-world application that a business actually uses, or that people want to use, that's great. But why announce a prototype with lab demos? It's premature. Better to wait until you have a good real-life working use case to brag about.
Had they focused more on driving innovation rather than profit and staying relevant, they could have had another win instead of another Google+. Instead, we got African-German Nazis.
Google was already an advertising monopoly by the time this happened, and his job is to sell ads and minimize costs... the rest of Google is just there for marketing & public relations.