OpenAI is hesitating to unveil its advanced AI detection system, designed to identify text generated by its ChatGPT model.
Despite having developed a method for text watermarking and an accompanying detection tool, the company remains divided over its possible release.
For approximately a year, OpenAI has had a system ready that incorporates watermarking techniques to tag AI-generated text. This system modifies the prediction patterns of ChatGPT to embed a detectable signature within the text.
According to sources familiar with the project, internal discussions have been ongoing for around two years, with the company now deliberating the implications of making the technology publicly available.
The internal debate within OpenAI centres on balancing ethical considerations with potential business impacts. On one side, releasing the watermarking tool could greatly benefit educators by offering a means to verify whether students are using AI to complete their assignments. On the other, concerns have been raised about possible negative effects on user experience and the company’s revenue.
The watermarking technique involves subtle alterations to how ChatGPT generates text, creating an invisible pattern that can be detected later. This method has reportedly shown high accuracy, with claims of up to 99.9% effectiveness in identifying AI-written content.
Nevertheless, OpenAI acknowledges that the technique is not foolproof. Circumvention methods, such as rephrasing or using alternative models, may undermine the effectiveness of the watermarking.
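OpenAI has not published the details of its scheme, but the general idea of statistical text watermarking is well understood: the model's token choices are nudged toward a pseudorandom "green list" derived from the preceding context, and a detector later measures how often the text lands on those lists. The sketch below is purely illustrative (the vocabulary, seeding scheme, and 50% list fraction are all assumptions, not OpenAI's method):

```python
import hashlib
import random


def green_list(prev_token: str, vocab: list, fraction: float = 0.5) -> set:
    """Pseudorandomly select a 'green' subset of the vocabulary, seeded by
    the previous token, so generator and detector derive the same list."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = vocab[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])


def green_fraction(tokens: list, vocab: list, fraction: float = 0.5) -> float:
    """Detection statistic: the share of tokens drawn from their green list.
    Watermarked text scores well above `fraction`; unmarked text hovers near it."""
    pairs = list(zip(tokens, tokens[1:]))
    hits = sum(1 for prev, tok in pairs if tok in green_list(prev, vocab, fraction))
    return hits / max(len(pairs), 1)
```

In this toy setup, a watermarking generator would bias its sampling toward `green_list(prev_token, vocab)` at each step; rephrasing the text breaks the token-to-list correspondence, which is why paraphrasing can defeat such schemes.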
OpenAI has also weighed the broader ramifications of deploying the system. The company is wary that watermarking could inadvertently stigmatise the use of AI writing tools, particularly for non-native English speakers who rely on them for writing assistance.
There are also fears about user sentiment: surveys reportedly suggest that nearly 30% of ChatGPT users would use the software less if the watermarking feature were introduced.
In response to these issues, OpenAI is exploring alternative methods for detecting AI-generated text. The company has revealed that it is in the early stages of developing cryptographic metadata embedding as a potential solution.
This approach aims to offer a more robust and less intrusive means of verifying the origin of text.
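OpenAI has not described how its cryptographic approach would work. As a rough illustration of the general concept, a provider could attach metadata about a text's origin and sign it with a secret key, so that anyone with the verification service could confirm both the provenance claim and that the text is unmodified. Everything here (the key, field names, and use of HMAC-SHA256) is a hypothetical sketch, not OpenAI's design:

```python
import hashlib
import hmac
import json

# Hypothetical provider-held secret; a real system would manage keys securely.
SECRET_KEY = b"provider-held-secret"


def attach_provenance(text: str, model: str) -> dict:
    """Bundle text with signed metadata identifying its claimed origin."""
    metadata = {
        "model": model,
        "sha256": hashlib.sha256(text.encode()).hexdigest(),
    }
    payload = json.dumps(metadata, sort_keys=True).encode()
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return {"text": text, "metadata": metadata, "signature": signature}


def verify_provenance(bundle: dict) -> bool:
    """Check the signature, and that the text has not been altered since signing."""
    payload = json.dumps(bundle["metadata"], sort_keys=True).encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(expected, bundle["signature"])
        and hashlib.sha256(bundle["text"].encode()).hexdigest()
        == bundle["metadata"]["sha256"]
    )
```

Unlike statistical watermarking, such metadata does not alter the generated text at all, which is one reason it can be less intrusive, though it only works where the metadata travels with the text.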