OpenAI has launched a new feature within ChatGPT designed to carry out complex tasks on users’ behalf.
The feature, named ChatGPT Agent, is now available to subscribers on the Pro, Plus, and Team plans.
OpenAI’s Agent blends task automation with web interaction, allowing it to operate more like a human assistant than a simple text generator.
In practical terms, the system can shop for outfits for events while factoring in dress codes and weather conditions, prepare research reports from multiple sources, generate slide presentations, manage calendars, and even write code, all from a single conversation.
“We’re introducing ChatGPT Agent—a unified agentic system combining Operator’s action-taking remote browser, deep research’s web synthesis, and ChatGPT’s conversational strengths,” OpenAI announced in a statement posted to X (formerly Twitter).
The Agent is built not only to respond but to act. OpenAI combined elements from its earlier tools, including Operator, which could navigate the web, and Deep Research, which could summarise content from dozens of sites.
Users don’t need to learn new commands; simple natural-language instructions activate the Agent’s capabilities.
For now, only paying customers get access. Once activated through a dropdown menu labelled “agent mode”, the system uses its own virtual computer to interact with the internet and connected applications such as Gmail or GitHub. This means it can search emails, gather files, and retrieve relevant data without requiring multiple prompts.
But the most interesting development may not be what the Agent can do, but how it performs. OpenAI reports that the model behind the Agent scored 41.6% on Humanity’s Last Exam, a challenging assessment designed to test AI reasoning across hundreds of subjects.
That’s nearly double the performance of previous models like o3 and o4-mini. On FrontierMath, regarded as one of the hardest maths tests, the Agent scored 27.4% when tool access was enabled, well ahead of its predecessors.
The company also addressed security concerns. In a safety disclosure, OpenAI acknowledged that the Agent’s broader capabilities could be exploited if misused. Specifically, the Agent is classified as “high capability” in the biological and chemical domains, meaning it could potentially assist in harmful activities.
Though OpenAI said it has no evidence of real-world misuse, it has introduced multiple safeguards. These include real-time monitoring that checks whether a user’s prompt relates to biology. If so, responses are filtered through a secondary system to ensure they do not pose a risk.
In addition, OpenAI has disabled ChatGPT’s memory function within the Agent to prevent sensitive information from being stored or exploited through prompt injection attacks. “We may revisit adding the feature in the future, however,” the company said.
There are doubts, however, about how the Agent will perform outside controlled environments. So far, most AI agents have struggled with real-world complexity, typically underdelivering on their promise.
OpenAI maintains that this version is different, though the company concedes that real-world testing will reveal its true strengths and weaknesses.
Tech giants including Microsoft, Salesforce, and Oracle are also developing autonomous digital assistants, having spent billions trying to build similar tools. Yet until now, even leading models have fallen short when handling multi-step or unpredictable tasks.
OpenAI seems determined to change this.