Google has launched Gemini 2.0, described as its most capable artificial intelligence model yet.
Built for what the company refers to as the “agentic era,” Gemini 2.0 is designed to understand its environment, think ahead, and take action while keeping user oversight at its core.
Sundar Pichai, CEO of Google and Alphabet, commented on the possibilities of this new model. In his statement, he said: “Information is at the core of human progress. It’s why we’ve focused for more than 26 years on our mission to organize the world’s information and make it accessible and useful. And it’s why we continue to push the frontiers of AI to organize that information across every input and make it accessible via any output, so that it can be truly useful for you.”
Gemini 2.0 marks a significant evolution, introducing expanded multimodal capabilities and native tool use. With these, Google moves closer to its vision of a universal assistant.
Pichai added: “If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful. I can’t wait to see what this next era brings.”
Enhanced Multimodal Functions
A major advance in Gemini 2.0 is its expanded multimodal capabilities. Unlike Gemini 1.5, the new model not only accepts inputs across text, images, audio, and video but also produces outputs such as natively generated images and multilingual text-to-speech. These capabilities enable richer, more interactive user experiences.
For developers, the experimental Gemini 2.0 Flash model is now available via Google AI Studio and Vertex AI. This version prioritises low latency and high performance, making it well suited to dynamic applications.
Alongside these, the new Multimodal Live API supports real-time audio and video streaming, opening up more immersive use cases.
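For a sense of what access to Gemini 2.0 Flash looks like in practice, the sketch below builds a single-turn text request against the Gemini API's public `generateContent` REST endpoint using only the Python standard library. The model id (`gemini-2.0-flash-exp`), the `GEMINI_API_KEY` environment variable, and the prompt are illustrative assumptions; check Google's current API documentation before relying on any of them.

```python
import json
import os
import urllib.request

# The v1beta generateContent endpoint for the experimental Gemini 2.0
# Flash model. Model ids and API versions may change; verify against
# the current Gemini API docs.
ENDPOINT = ("https://generativelanguage.googleapis.com/v1beta/"
            "models/gemini-2.0-flash-exp:generateContent")


def build_request(prompt: str) -> dict:
    """Build the JSON body for a single-turn text prompt."""
    return {"contents": [{"parts": [{"text": prompt}]}]}


def generate(prompt: str, api_key: str) -> str:
    """Send the request and return the first candidate's text."""
    body = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["candidates"][0]["content"]["parts"][0]["text"]


if __name__ == "__main__":
    # GEMINI_API_KEY is an assumed environment variable name; the call
    # is only attempted when a key is actually present.
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        print(generate("Summarise the Gemini 2.0 announcement.", key))
```

Google also ships official SDKs for Python, Node.js, and other languages that wrap this same endpoint; the raw REST form is shown here only to make the request shape explicit.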
Expanding AI Integration Across Google’s Services
Starting today, Gemini 2.0’s capabilities are available to users of the Gemini app, with broader integration into Google products such as Search expected early next year. A feature called Deep Research is also being introduced, which Pichai described as:
“A research assistant, exploring complex topics and compiling reports on your behalf.”
This feature uses advanced reasoning and long-context capabilities to provide deeper insights. It is available in Gemini Advanced starting today.
AI Overviews in Search will also receive enhancements powered by Gemini 2.0, allowing users to tackle complex queries involving multi-step questions, advanced maths, and multimodal inputs. Pichai noted:
“No product has been transformed more by AI than Search. Our AI Overviews now reach 1 billion people, enabling them to ask entirely new types of questions — quickly becoming one of our most popular Search features ever.”
New AI Agents on the Horizon
Google is leveraging Gemini 2.0 to pioneer a new class of AI agents designed to perform complex tasks, including:
- Project Astra: A prototype universal assistant that combines tools like Google Search and Maps while enhancing memory and multilingual dialogue capabilities.
- Project Mariner: An experimental browser assistant capable of interacting with web pages to complete tasks.
- Jules: A coding assistant that integrates with GitHub, enabling developers to plan and execute coding tasks under supervision.
These projects are currently in testing with trusted testers as Google refines their safety and functionality.
Responsible AI Development
Throughout the development of Gemini 2.0, Google has emphasised ethical practices, rigorous safety measures, and collaboration with internal and external experts.
The model was trained and is served on Trillium, Google’s sixth-generation Tensor Processing Units (TPUs).
Pichai highlighted the importance of these foundational technologies: “2.0’s advances are underpinned by decade-long investments in our differentiated full-stack approach to AI innovation. It’s built on custom hardware like Trillium, our sixth-generation TPUs. TPUs powered 100% of Gemini 2.0 training and inference, and today Trillium is generally available to customers so they can build with it too.”
Gemini 2.0 is set to change how users interact with AI, offering tools that are intuitive, versatile, and capable of enhancing everyday tasks. From simplifying research to assisting with coding, this next-generation model is positioned as a cornerstone of Google’s future AI technologies.
Sundar Pichai encapsulated the company’s vision: “Today we’re excited to launch our next era of models built for this new agentic era: introducing Gemini 2.0, our most capable model yet. With new advances in multimodality — like native image and audio output — and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant.”