Google has launched two new artificial intelligence (AI) models designed to improve how robots perceive, interact with, and operate in the physical world.
The models, Gemini Robotics and Gemini Robotics-ER, are based on Google’s Gemini 2.0 AI framework and are expected to enhance robots’ ability to perform tasks in industrial, commercial, and even domestic environments.
Gemini Robotics is a vision-language-action (VLA) model that allows robots to understand commands, interpret visual data, and execute physical actions.
This means robots built with this model will be able to respond to spoken instructions, recognise objects, and interact with their surroundings more intuitively.
The second model, Gemini Robotics-ER, takes this further by incorporating spatial reasoning abilities, enabling robots to understand complex environments and adjust their movements accordingly.
According to Google, these models have been designed to work with robots of various shapes and sizes, from humanoid robots to industrial machines commonly used in factories and warehouses.
The company has already tested Gemini Robotics on its ALOHA 2 bi-arm robotic platform, demonstrating its ability to handle tasks requiring precision and adaptability.
The model has also been successfully applied to Apptronik’s Apollo humanoid robot, which is being developed for real-world applications.
Google’s focus on robotics follows Figure AI’s recent decision to end its collaboration with OpenAI in favour of its own in-house AI models. With the introduction of Gemini Robotics and Gemini Robotics-ER, Google is positioning itself as a major player in the robotics industry.
One advantage of these models is their potential to reduce development costs for robotics startups. By building on Google’s AI frameworks, companies can shorten the time it takes to bring functional robots to market.
The tech giant has also stressed the flexibility of its models, stating that developers can customise them for specific robotic applications using Gemini’s advanced reasoning capabilities.
Google has partnered with Apptronik to integrate its AI models into humanoid robots, aiming to create machines capable of performing tasks that require both cognitive reasoning and physical dexterity.
In August, Apptronik raised $350 million in a funding round led by B Capital and Capital Factory, with Google also participating to support the development of next-generation humanoid robots.
This is not Google’s first move in robotics. The tech giant acquired Boston Dynamics, a company famous for its dog-like and humanoid robots, in 2013 before selling it to SoftBank in 2017.
However, the launch of these new AI models marks a renewed push into robotics, with Google focusing on AI-driven intelligence rather than hardware development.
With the integration of multimodal AI into robotics, Google aims to bridge the gap between digital intelligence and real-world functionality.
As demand for automation grows in manufacturing, logistics, and service industries, Gemini-powered robots could play a central role in making that work more seamless.