Google unveils AI models that will power humanoid robots

Google’s journey into artificial intelligence has taken a significant leap forward with the introduction of Gemini Robotics and Gemini Robotics ER, which build on the foundation laid by the original Gemini model launched in 2023. The new models mark Google’s ambition to push AI beyond digital tasks such as generating text and images and into physical action, giving robots a new degree of human-like understanding, dexterity, and adaptability. They are vision-language-action (VLA) systems, designed to let robots comprehend complex instructions, analyze their surroundings, and carry out physical tasks with near-human precision and fluidity.

Google’s core philosophy behind this innovation revolves around "embodied reasoning" — the concept that AI must go beyond passive observation and actively engage with the physical world. The company explains that for AI to be truly helpful in everyday environments, it must learn to interpret the world in real time, predict outcomes, and take action — all while prioritizing safety and efficiency. Essentially, robots powered by Gemini Robotics models won’t just "see and do" — they’ll "see, think, adapt, and do."

Built on the foundation of Gemini 2.0, these new models introduce a paradigm shift by treating physical actions as an output — similar to how traditional AI generates text or images. This enables robots to move and manipulate objects intelligently based on contextual understanding, not just pre-programmed routines. For example, if asked to clean a table, the robot can evaluate the objects on it, determine what to move, clean around fragile items, and even recognize a tipped-over drink that needs cleaning up before continuing the initial task — all without needing separate instructions for each step.
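To make the "actions as an output" idea concrete, here is a minimal illustrative sketch in Python. Every name in it (the `GeminiRoboticsPolicy` class, its `act` method, the `Action` type) is hypothetical: Google has not published a developer API for Gemini Robotics, so this only mirrors the pattern the announcement describes, a model that takes an image and an instruction and emits low-level actions instead of text.

```python
# Hypothetical sketch: a vision-language-action model treats robot actions
# as just another output modality, the way a text model emits tokens.
# All names here are illustrative stand-ins, not a published Google API.
from dataclasses import dataclass

@dataclass
class Action:
    joint_targets: list[float]  # target joint angles, in radians
    gripper: float              # 0.0 = fully open, 1.0 = fully closed

class GeminiRoboticsPolicy:
    """Stand-in for a vision-language-action (VLA) model endpoint."""

    def act(self, image: bytes, instruction: str) -> list[Action]:
        # A real VLA model would condition on the image and the instruction
        # and decode a short chunk of future actions; this stub returns one.
        return [Action(joint_targets=[0.0] * 7, gripper=0.0)]

policy = GeminiRoboticsPolicy()
observation = b"<rgb camera frame>"  # placeholder for a real image

# One high-level instruction; the model decomposes it into steps itself.
for action in policy.act(observation, "Clean the table; be careful with the vase."):
    print(action)  # a real system would stream these to the arm controller
```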

Key pillars: Generality, Interactivity, Dexterity

Google DeepMind highlights three essential pillars that make Gemini Robotics groundbreaking:

  1. Generality: The model isn’t locked into specific tasks. It can adapt to new, unfamiliar situations without requiring programmers to write fresh code each time. This allows the robot to handle diverse tasks — from making coffee to organizing shelves — with the same underlying AI.

  2. Interactivity: The model supports natural language interactions. This means users can give instructions in everyday language, and the robot not only understands but adapts its approach based on tone, context, or follow-up instructions. If you say, "Clean up, but be careful with the vase," the robot interprets that caution and adjusts its grip and movement speed accordingly.

  3. Dexterity: The model marks a step change in the precision of robotic movement. It is designed to handle delicate tasks like folding paper, unscrewing a bottle cap, or picking up a single piece of fruit without crushing it, all of which demand fine motor control and sensitivity.

Steerability: A new level of control

One of the standout features of Gemini Robotics is its "steerability" — the ability for users to directly guide and adjust the robot’s behavior through natural language instructions. If the robot is arranging items on a table but you decide you want the plates stacked differently or the cups moved to the other side, you can simply give a verbal command and watch the robot adapt in real time. This makes collaborating with robot assistants feel more intuitive and human-like than ever before.
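As an illustration only, the loop below sketches how mid-task spoken corrections could be folded into the active instruction so that each new action step is conditioned on the updated goal. The queue and the fixed five-step loop are stand-ins of my own, not anything Google has documented.

```python
# Hypothetical sketch of closed-loop "steerability": instead of running a
# fixed script to completion, the robot re-reads its instruction each step
# and absorbs any correction the user has spoken in the meantime.
import queue

corrections = queue.Queue()
corrections.put("Actually, stack the plates on the left side.")

instruction = "Arrange the items on the table."
for step in range(5):  # stand-in for "until the task is complete"
    try:
        # Fold any newly spoken correction into the active instruction,
        # so the next planning/action step reflects the updated goal.
        instruction += " " + corrections.get_nowait()
    except queue.Empty:
        pass
    print(f"step {step}: acting on {instruction!r}")
```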

Adaptable across hardware

Google designed Gemini Robotics with versatility in mind. Although trained primarily on the ALOHA 2 bi-arm robotic platform, the model also demonstrated compatibility with Franka Emika robotic arms, a setup popular in academic and industrial research labs, and Google announced a partnership with Apptronik to bring the models to the Apollo humanoid robot. This adaptability means Gemini Robotics can work across robotic platforms, making it accessible to a wide range of developers and industries without major hardware changes.

Introducing Gemini Robotics ER

Taking things even further, Google introduced Gemini Robotics ER, where ER stands for embodied reasoning: an enhanced version designed to push spatial reasoning and real-world problem-solving to new heights. This model advances a robot’s ability to understand 3D environments, recognize objects from different angles, and even generate new code on the fly to solve tasks it hasn’t encountered before.

For instance, if shown a coffee mug, Gemini Robotics ER can work out an appropriate grip on its own, positioning two fingers on the handle, and plan a safe, collision-free path to pick it up, even in a cluttered kitchen. It can also navigate dynamic environments, such as crowded rooms or shifting workspaces, avoiding obstacles and recalculating routes in real time.
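The flow just described (perceive, propose a grasp, plan a collision-free path) might look roughly like the sketch below. Both functions are hypothetical stand-ins for the model query and the motion planner; Google has not published this interface, so treat the shapes and values as illustrative only.

```python
# Hypothetical sketch of an embodied-reasoning pipeline: query a model for
# a grasp on a named object, then route around clutter to reach it.
from dataclasses import dataclass

@dataclass
class GraspPose:
    position: tuple[float, float, float]  # meters, in the robot's base frame
    approach: tuple[float, float, float]  # unit vector of approach direction

def propose_grasp(image: bytes, target: str) -> GraspPose:
    """Stand-in for an embodied-reasoning query such as
    'where should two fingers go to lift the mug by its handle?'"""
    return GraspPose(position=(0.42, -0.10, 0.05), approach=(0.0, 0.0, -1.0))

def plan_collision_free_path(goal: GraspPose, obstacles: list[str]) -> list[str]:
    """Stand-in for a motion planner that routes around known clutter."""
    return [f"waypoint clearing the {o}" for o in obstacles] + ["approach goal"]

grasp = propose_grasp(b"<rgb frame>", "coffee mug")
for waypoint in plan_collision_free_path(grasp, ["kettle", "stack of plates"]):
    print(waypoint)
print("close gripper at", grasp.position)
```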

The model handles perception, state estimation, spatial awareness, planning, and code generation all within a single system, eliminating the need for external processors or controllers. This streamlined approach reduces complexity for developers, making it faster and easier to integrate into commercial and research environments alike.

A glimpse into the future

With Gemini Robotics and Gemini Robotics ER, Google isn’t just redefining what robots can do — it’s reimagining the very nature of human-robot collaboration. From assisting people with disabilities and supporting household tasks to performing precision-based industrial work and exploring dangerous environments like disaster zones, the potential applications are vast and transformative.

By combining embodied reasoning, natural language understanding, advanced spatial awareness, and real-time adaptability, Gemini Robotics could lead us into a future where robots become true partners in both everyday life and specialized industries. Whether it’s helping a senior citizen prepare a meal, collaborating with a surgeon in an operating room, or automating complex manufacturing processes — Google’s latest innovation is paving the way for a new era of intelligent, helpful, and versatile robotics.
