How OpenAI Lockdown Mode Strengthens AI Security for Businesses

OpenAI has launched a new security setting called Lockdown Mode in ChatGPT, alongside “Elevated Risk” labels for certain features across its products.

The update focuses on one issue, which is prompt injection. In these attacks, hidden instructions are placed inside content an AI system reads. The goal is to mislead the system into revealing sensitive data or taking actions it should not take.

Lockdown Mode is optional and built for a small group of users who face higher security risks, including executives and security teams in major organisations. Most users will not need it.

When switched on, Lockdown Mode restricts how ChatGPT interacts with external systems. It disables certain tools that attackers could exploit to extract data from conversations or connected applications.

Web browsing is limited to cached content. No live network requests leave OpenAI’s controlled network. Some features are disabled entirely where the company says it cannot guarantee data safety.

The setting is now available for ChatGPT Enterprise, ChatGPT Edu, ChatGPT for Healthcare and ChatGPT for Teachers. Workspace administrators can enable it by creating a new role in Workspace Settings. Once activated, it adds tighter limits on top of existing controls.

At the same time, administrators keep granular control. They decide which apps remain available in Lockdown Mode and what actions users can take within those apps. Separately, the Compliance API Logs Platform provides visibility into app usage, shared data and connected sources.

OpenAI said it plans to make Lockdown Mode available to consumer users in the coming months.

The OWASP GenAI Security Project has classified prompt injection as a top vulnerability for large language models. It noted that malicious prompts can alter AI behaviour in unintended ways, even when they appear harmless.

Google’s security team has warned about indirect prompt injections, where hidden instructions are embedded in emails or documents. An AI system may access those sources and leak sensitive data without the user knowing.

Attackers have embedded instructions inside webpages or retrieved documents, causing AI systems to carry out harmful actions. In one case involving Gemini in Google Translate’s Gemini Mode, researchers showed how translation functions could be bypassed to generate dangerous content.

Anthropic recently published findings on prompt injection failure rates. It reported that even advanced models could be breached in certain contexts. In GUI-based systems with extended reasoning enabled, attack success rates exceeded 50% after repeated attempts.

Security researchers have also identified newer forms of attack, including Logic-Layer Prompt Control Injection, which targets deeper parts of AI systems such as persistent memory and retrieval logic.

By restricting live network access and disabling high-risk tools, Lockdown Mode addresses common attack surfaces linked to prompt injection, including browsing and connected apps.

The company said the feature builds on existing safeguards such as sandboxing, monitoring, enforcement, role-based access and audit logs, while adding stricter limits.

Alongside this, OpenAI has standardised “Elevated Risk” labels across ChatGPT, ChatGPT Atlas and Codex. These labels mark features that may introduce additional risk, particularly those involving network access.

For instance, in Codex, developers can grant network access so the system can retrieve documentation or perform actions on the web.

Where that access is enabled, the interface now displays an “Elevated Risk” label explaining what changes, what risks may arise and when such access may be appropriate.

The company said it will continue to review which features carry the label. As security protections improve and risks are reduced, it plans to remove the label from features considered safe for general use.