How the ChatGPT Watermark Works and Why It Might Be Defeated

OpenAI’s ChatGPT introduced a way to automatically create content, but plans to introduce a watermarking feature that makes such content easy to detect are making some people nervous. This is how ChatGPT watermarking works and why there may be a way to defeat it.

ChatGPT is an incredible tool that online publishers, affiliates and SEOs simultaneously love and dread.

Some marketers love it because they’re discovering new ways to use it to generate content briefs, outlines and complex articles.

Online publishers are afraid of the prospect of AI content flooding the search results, supplanting expert articles written by humans.

Consequently, news of a watermarking feature that unlocks detection of ChatGPT-authored content is likewise anticipated with both anxiety and hope.

Cryptographic Watermark

A watermark is a semi-transparent mark (a logo or text) that is embedded onto an image. The watermark signals who is the original author of the work.

It’s largely seen in photographs and increasingly in videos.

Watermarking text in ChatGPT involves cryptography, in the form of embedding a pattern of words, letters and punctuation that acts as a secret code.

Scott Aaronson and ChatGPT Watermarking

An influential computer scientist named Scott Aaronson was hired by OpenAI in June 2022 to work on AI Safety and Alignment.

AI Safety is a research field concerned with studying ways that AI might pose a harm to humans and with creating ways to prevent that kind of negative disruption.

The Distill scientific journal, featuring authors affiliated with OpenAI, defines AI Safety like this:

“The goal of long-term artificial intelligence (AI) safety is to ensure that advanced AI systems are reliably aligned with human values – that they reliably do things that people want them to do.”

AI Alignment is the artificial intelligence field concerned with making sure that the AI is aligned with the intended goals.

A large language model (LLM) like ChatGPT can be used in a way that may go contrary to the goals of AI Alignment as defined by OpenAI, which is to create AI that benefits humanity.

Accordingly, the reason for watermarking is to prevent the misuse of AI in a way that harms humanity.

Aaronson explained the reason for watermarking ChatGPT output:

“This could be helpful for preventing academic plagiarism, obviously, but also, for example, mass generation of propaganda …”

How Does ChatGPT Watermarking Work?

ChatGPT watermarking is a system that embeds a statistical pattern, a code, into the choices of words and even punctuation marks.

Content created by artificial intelligence is generated with a fairly predictable pattern of word choice.

The words written by humans and by AI follow a statistical pattern.

Changing the pattern of the words used in generated content is a way to “watermark” the text, making it easy for a system to detect whether it was the product of an AI text generator.

The trick that makes AI content watermarking undetectable is that the distribution of words still has a random appearance similar to normal AI-generated text.

This is referred to as a pseudorandom distribution of words.

Pseudorandomness describes a series of words or numbers that looks statistically random but is not actually random, because it is produced by a deterministic process.

ChatGPT watermarking is not currently in use. However, Scott Aaronson at OpenAI is on record stating that it is planned.

Right now ChatGPT is in previews, which allows OpenAI to discover “misalignment” through real-world use.

Presumably watermarking will be introduced in a final version of ChatGPT or sooner than that.

Scott Aaronson wrote about how watermarking works:

“My main project so far has been a tool for statistically watermarking the outputs of a text model like GPT.

Basically, whenever GPT generates some long text, we want there to be an otherwise unnoticeable secret signal in its choices of words, which you can use to prove later that, yes, this came from GPT.”

Aaronson explained further how ChatGPT watermarking works. But first, it’s important to understand the concept of tokenization.

Tokenization is a step in natural language processing where the machine takes the words in a document and breaks them down into smaller units, such as words, parts of words and punctuation marks.

Tokenization changes text into a structured form that can be used in machine learning.
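
To make that concrete, here is a minimal sketch in Python. It is a toy stand-in and not the tokenizer GPT uses: the function names are made up for illustration, and it only splits on words and punctuation, whereas a real tokenizer also breaks words into smaller pieces.

```python
import re

def toy_tokenize(text):
    """Split text into word and punctuation tokens (toy example, not GPT's tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)

def build_vocab(tokens):
    """Give each distinct token an integer ID, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

tokens = toy_tokenize("Watermarks hide in word choices, even in punctuation.")
vocab = build_vocab(tokens)

# The "structured form" a model works with: a sequence of integer IDs.
ids = [vocab[t] for t in tokens]
print(tokens)
print(ids)
```

The point is only that text becomes a sequence of discrete units, and the watermark described below lives in which units get chosen.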

The process of text generation is the machine guessing which token comes next based on the preceding tokens.

This is done with a mathematical function that determines the probability of what the next token will be, which is called a probability distribution.

Which word comes next is predicted, but the final pick is random.

The watermarking itself is what Aaronson describes as pseudorandom, in that there’s a mathematical reason for a particular word or punctuation mark to be there, yet it still looks statistically random.

Here is the technical explanation of GPT watermarking:

“For GPT, every input and output is a string of tokens, which could be words but also punctuation marks, parts of words, or more – there are about 100,000 tokens in total.

At its core, GPT is constantly generating a probability distribution over the next token to generate, conditional on the string of previous tokens.

After the neural net generates the distribution, the OpenAI server then actually samples a token according to that distribution – or some modified version of the distribution, depending on a parameter called ‘temperature.’

As long as the temperature is nonzero, though, there will usually be some randomness in the choice of the next token: you could run over and over with the same prompt, and get a different completion (i.e., string of output tokens) each time.

So then to watermark, instead of selecting the next token randomly, the idea will be to select it pseudorandomly, using a cryptographic pseudorandom function, whose key is known only to OpenAI.”
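
The two ideas in that passage, temperature and a keyed pseudorandom choice, can be sketched in a few lines of Python. This is an illustration under stated assumptions and not OpenAI’s implementation: the probabilities, the key and the function names are all made up, and HMAC-SHA256 is simply one widely available keyed pseudorandom function.

```python
import hashlib
import hmac
import math

def apply_temperature(probs, temperature):
    """Reshape a distribution: low temperature sharpens it, high temperature flattens it."""
    logits = [math.log(p) / temperature for p in probs]
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

def keyed_uniform(key, context):
    """Map (secret key, preceding tokens) to a number in [0, 1) with HMAC-SHA256."""
    message = "\x1f".join(context).encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:6], "big") / 2**48

def pick_token(candidates, probs, u):
    """Inverse-CDF sampling: walk the cumulative distribution until it passes u."""
    cumulative = 0.0
    for token, p in zip(candidates, probs):
        cumulative += p
        if u < cumulative:
            return token
    return candidates[-1]  # guard against floating-point rounding

# Hypothetical next-token candidates and probabilities, for illustration only.
candidates = ["cat", "dog", "bird"]
probs = apply_temperature([0.5, 0.3, 0.2], temperature=0.8)

key = b"hypothetical-secret-key"  # assumption: stands in for a key only the provider knows
context = ["the", "quick"]        # the tokens generated so far

# An ordinary sampler would draw u from a random number generator.
# Here u is derived from the key and the context, so the pick is repeatable
# for anyone holding the key, yet looks random to everyone else.
token = pick_token(candidates, probs, keyed_uniform(key, context))
print(token)
```

Because the number fed into the sampler still behaves like a uniform random draw, the chosen words follow the same distribution as before. That is why the text reads normally.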

The watermark looks completely natural to those reading the text because the choice of words mimics the randomness of all the other words.

But that randomness contains a bias that can only be detected by someone with the key to decode it.

This is the technical explanation:

“To illustrate, in the special case that GPT had a bunch of possible tokens that it judged equally probable, you could simply choose whichever token maximized g. The choice would look uniformly random to someone who didn’t know the key, but someone who did know the key could later sum g over all n-grams and see that it was anomalously large.”
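
That special case is simple enough to simulate. The Python sketch below assumes every step offers four equally likely candidate tokens and uses HMAC-SHA256 as a stand-in for g. The vocabulary, the key, the n-gram length of 3 and all function names are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
import random

def g(secret, ngram):
    """Keyed pseudorandom score in [0, 1) for one n-gram (a list of tokens)."""
    message = "\x1f".join(ngram).encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:6], "big") / 2**48

def generate(choices_per_step, n=3, secret=None, rng=None):
    """Pick one token per step from candidates assumed to be equally likely.

    With a secret, pick the candidate that maximizes g over the n-gram it completes.
    Without one, pick uniformly at random using rng (no watermark).
    """
    tokens = []
    for candidates in choices_per_step:
        prefix = tokens[max(0, len(tokens) - (n - 1)):]
        if secret is not None:
            choice = max(candidates, key=lambda c: g(secret, prefix + [c]))
        else:
            choice = rng.choice(candidates)
        tokens.append(choice)
    return tokens

def average_g(tokens, secret, n=3):
    """Sum g over every n-gram in the text, then divide by the number of n-grams."""
    scores = [g(secret, tokens[max(0, i - (n - 1)): i + 1]) for i in range(len(tokens))]
    return sum(scores) / len(scores)

rng = random.Random(0)
vocab = [f"w{i}" for i in range(50)]
steps = [rng.sample(vocab, 4) for _ in range(400)]
secret = b"hypothetical-secret-key"

marked = generate(steps, secret=secret)  # watermarked text
plain = generate(steps, rng=rng)         # ordinary random choices

print(average_g(marked, secret))         # well above one half
print(average_g(plain, secret))          # close to one half
print(average_g(marked, b"wrong-key"))   # close to one half without the right key
```

With four equally likely candidates, the highest of four uniform scores averages 0.8, while unmarked text averages about 0.5. That gap is the “anomalously large” sum in the quote, and it disappears for anyone scoring the text with the wrong key.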

Watermarking is a Privacy-first Solution

I’ve seen discussions on social media where some people suggested that OpenAI could keep a record of every output it generates and use that for detection.

Scott Aaronson confirms that OpenAI could do that but says that doing so poses a privacy issue. The possible exception is for law enforcement situations, which he didn’t elaborate on.

How to Detect ChatGPT or GPT Watermarking

Something interesting that doesn’t seem to be well known yet is that Scott Aaronson noted that there is a way to defeat the watermarking.

He didn’t say that it might be possible to defeat the watermarking; he said that it can be defeated.

“Now, this can all be defeated with enough effort.

For example, if you used another AI to paraphrase GPT’s output – well okay, we’re not going to be able to detect that.”

It seems that the watermarking can be defeated, at least as of November, when the above statements were made.

There is no indication that the watermarking is currently in use. But when it does come into use, it may not be known whether this loophole has been closed.


Read Scott Aaronson’s article here.
