UPDATED 12:00 EDT / AUGUST 05 2024

AI

OpenAI can detect text generated by ChatGPT, but it’s not ready for prime time

OpenAI, the company behind the ultra-popular generative AI chatbot ChatGPT, has been developing a way to detect text generated by its AI tools but says that it’s not prepared to release the feature to the public.

The Wall Street Journal reported exclusively that the capability has been sitting on the company’s servers for almost two years but has been the subject of intense internal debate.

Such a tool would be useful across numerous industries where AI tools have become increasingly prevalent. Schools, for one, have begun using AI checkers to determine whether students are essentially “cheating” by having bots produce long-form assignments for them, such as essays, research papers or homework. Literary publishers are also seeing more submissions that look AI-generated.

In response to the Journal report, OpenAI updated the blog post it published in May with information about watermarking ChatGPT-generated text and AI detection tools. Up front, the company said it has no plans to release the tool in the near future.

The company said it came up with a watermarking solution that is “highly accurate and even effective against localized tampering, such as paraphrasing,” but discovered that it doesn’t hold up against large-scale tampering, such as rewording the text with another generative model. Similarly, the company noted that translating the generated text into another language and back could obliterate the watermark.

In images, watermarking works by placing visible, or invisible but recognizable, patterns, logos, text or signatures onto a document to deter unauthorized duplication. The most familiar examples are the semitransparent logos that media outlets and stock image sites lay over their pictures and videos. Watermarking text is much harder, but still possible: the model can be nudged to favor particular words or phrasings in patterns that become obvious in a statistically detectable way.
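OpenAI hasn’t disclosed how its watermark works, but published research gives a sense of the idea: a watermarking generator steers its word choices toward a secret, pseudorandom “green” subset of the vocabulary, and a detector checks whether an unusually high share of a text’s words fall in that subset. The sketch below is a minimal illustration of that detection step, not OpenAI’s method; the key, threshold and sample text are all made up.

```python
# Minimal sketch of statistical text-watermark detection (a "green list"
# scheme in the spirit of published research, not OpenAI's undisclosed method).
import hashlib
import math

def is_green(prev_word: str, word: str, secret: str = "demo-key") -> bool:
    """Pseudorandomly assign roughly half of all word pairs to the 'green'
    list, keyed on the previous word and a secret known only to the provider."""
    digest = hashlib.sha256(f"{secret}|{prev_word}|{word}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(text: str) -> float:
    """Fraction of consecutive word pairs that land on the green list."""
    words = text.lower().split()
    if len(words) < 2:
        return 0.0
    hits = sum(is_green(a, b) for a, b in zip(words, words[1:]))
    return hits / (len(words) - 1)

def z_score(fraction: float, n: int, p: float = 0.5) -> float:
    """How many standard deviations the observed green fraction sits above
    the ~50% expected for unwatermarked text."""
    return (fraction - p) / math.sqrt(p * (1 - p) / n)

if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    n = len(sample.split()) - 1
    frac = green_fraction(sample)
    print(f"green fraction: {frac:.2f}, z-score: {z_score(frac, n):.2f}")
    # A real detector needs hundreds of words before the signal is reliable,
    # which is why heavy rewording or round-trip translation can wash it out.
```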

The watermarking technique has been at the center of an internal debate about how the tool might affect users, the company said. According to a spokesperson who spoke to the Journal, some employees were concerned that it would disproportionately affect certain groups, especially nonnative English speakers. People learning English, or less proficient in it, commonly lean on AI tools to help them write for native audiences, and an AI detector would stigmatize that use, they argued.

“The text watermarking method we’re developing is technically promising but has important risks we’re weighing while we research alternatives,” the spokesperson said. “We believe the deliberate approach we’ve taken is necessary given the complexities involved and its likely impact on the broader ecosystem beyond OpenAI.”

The company said it’s also working on text metadata as another way of labeling whether text has been generated by AI. Unlike watermarking, metadata added to a text document would be cryptographically signed, proving that the document is unchanged from its original form and source and leading to no false positives.

“We expect this will be increasingly important as the volume of generated text increases,” the company said. “While text watermarking has a low false positive rate, applying it to large volumes of text would lead to a large number of total false positives.”
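OpenAI hasn’t described its metadata format, so the sketch below is only a rough illustration of why signed metadata avoids false positives: a provenance record is attached to a piece of text and both are signed, and any edit breaks verification. It uses a symmetric HMAC from Python’s standard library as a stand-in for the asymmetric signatures a production provenance scheme (such as one built on C2PA) would use; the key and field names are illustrative.

```python
# Minimal sketch of cryptographically signed text metadata (illustrative only).
import hashlib
import hmac
import json

SIGNING_KEY = b"provider-secret"  # stand-in for a real signing key

def sign_text(text: str, model: str) -> dict:
    """Bundle generated text with provenance metadata and sign both."""
    record = {"text": text, "metadata": {"generator": model}}
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return record

def verify_text(record: dict) -> bool:
    """Return True only if the text and metadata are byte-for-byte unchanged."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("signature", ""))

if __name__ == "__main__":
    record = sign_text("Example generated paragraph.", "example-model")
    print(verify_text(record))   # True: untouched
    record["text"] += " edited"
    print(verify_text(record))   # False: any edit breaks the signature
```

Verification either succeeds or fails outright, with no statistical threshold to tune, which is the property the company is pointing to when it says metadata would lead to no false positives.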

Metadata and watermarking are already in common use for images: OpenAI applies them to DALL-E 3 generations, Meta Platforms Inc. labels AI images on its social networks and Google LLC uses its SynthID system. Most metadata added to AI images follows a standard from an open industry body known as the C2PA, or Coalition for Content Provenance and Authenticity, which helps set guidelines for how AI-generated imagery should be labeled.

Image: SiliconANGLE/Microsoft Designer
