How Accurate Are AI Detectors At Identifying AI-Generated Content?


This article is for general information purposes only and isn’t intended to be financial product advice. You should always obtain your own independent advice before making any financial decisions. The Chainsaw and its contributors aren’t liable for any decisions based on this content.



In the ever-evolving landscape of AI, a new arms race is emerging: the battle between AI content generators and AI detectors. But how accurate are AI detectors really?

As tools like ChatGPT and Google Gemini become increasingly adept at producing human-like text, the demand for reliable AI detection methods has skyrocketed. 

But are these tools legit, or are they just snake oil? Let’s dive in…

AI content generators vs AI content detectors

In one corner, we have the AI content generators, churning out eerily human-like articles, essays and stories in the blink of an eye (well, maybe a really slow-blinking eye). In the other corner, the AI detectors, touted as a safeguard against the rise of the machines. But can they really deliver on that promise?

How does AI detection work?

So, how do these AI detectors actually detect AI? It all comes down to analysing patterns and quirks in the text. Here are a few of the key factors they consider:


- Perplexity is a measure of how “surprised” a language model is by a given piece of text. The idea is that AI-generated content will have lower perplexity, as it follows more predictable patterns.

- Burstiness looks at variations in sentence structure and complexity. The theory goes that human writing has more natural ebbs and flows, while AI-generated text might be more uniform.
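To make these two signals concrete, here’s a minimal Python sketch of both measures, using a toy word-probability table in place of a real language model (the function names and the unigram model are purely illustrative, not how any actual detector is implemented):

```python
import math
import statistics

def perplexity(text, model_probs, floor=1e-6):
    """Perplexity of `text` under a toy unigram 'language model'.

    model_probs maps words to probabilities; real detectors use a full
    neural LM, but the maths is the same idea: lower perplexity means
    the text looked more predictable to the model.
    """
    words = text.lower().split()
    log_prob = sum(math.log(model_probs.get(w, floor)) for w in words)
    return math.exp(-log_prob / len(words))

def burstiness(text):
    """Spread of sentence lengths, as a crude proxy for 'burstiness'.

    Higher values mean sentence lengths swing around more, which the
    theory above associates with human writing.
    """
    for mark in "!?":
        text = text.replace(mark, ".")
    lengths = [len(s.split()) for s in text.split(".") if s.strip()]
    return statistics.pstdev(lengths)
```

A toy model that assigns probability 0.5 to every word yields a perplexity of exactly 2, and identical-length sentences yield a burstiness of 0; real detectors compute analogous scores with a full language model and tuned thresholds.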

Are AI detectors accurate and trustworthy?

AI detectors make some pretty bold claims about their accuracy, but do they live up to the hype?

Several well-known AI detection tools, such as Turnitin and GPTZero, claim impressive accuracy rates in identifying AI-generated text. Turnitin, for example, says it successfully flagged millions of papers containing substantial amounts of AI-generated content between April and October 2023. These tools often cite advanced algorithms and machine learning techniques as the keys to their purported success.

Reality of AI detectors

But here’s the thing: despite all the confident claims, the real-world performance of AI detectors is just plain bad. Studies have shown that these tools often get it wrong, either by flagging human-written text as AI-generated (false positives) or by failing to spot actual AI content (false negatives).

A recent study in the International Journal for Educational Integrity shed some light on the limitations of AI detectors. It found that these tools had a harder time identifying content from newer, more advanced AI models. The detectors did okay with older stuff like GPT-3.5, but struggled with the more sophisticated systems.

Plus, AI technology is evolving so fast that detectors are constantly playing catch-up. As AI models get smarter and better at mimicking human writing, flailing AI detectors will continue to be left in the dust, unless they evolve to actually work.

So if not with AI detectors, how can you tell if something is written by AI?

If AI detectors aren’t foolproof, what’s a guy or gal to do? Here are a few tips for spotting AI-generated text with the naked eye:

Tips for spotting AI-generated content

There are a few red flags you can look out for when trying to spot AI-generated content. One telltale sign is repetitive phrases or unusual word choices that just don’t sound quite right. AI-generated text might also lack original ideas or personal anecdotes, since it’s based on patterns and data rather than real-life experiences. 

Another thing to watch for is inconsistencies in style or tone. If the writing seems to switch gears abruptly or doesn’t flow naturally, that could be a hint that an AI model is behind the wheel. And of course, if you spot factual errors or statements that just don’t make sense, that’s a pretty big red flag that AI is being used (or the writer is just dumb).
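That first red flag, repetitive phrasing, is simple enough to check mechanically. Here’s a rough Python sketch (the function name and default thresholds are made up for illustration; repetition alone proves nothing, it’s just one signal):

```python
from collections import Counter

def repeated_phrases(text, n=3, min_count=2):
    """Return word n-grams that appear at least `min_count` times.

    Heavy repeats of the same multi-word phrase are one of the
    telltale signs described above.
    """
    words = text.lower().split()
    ngrams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return {p: c for p, c in Counter(ngrams).items() if c >= min_count}
```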

Combining tools and human insight

If you simply must employ an AI detector, don’t take its word as gospel; treat it as one signal and apply your own human judgement on top.

Use of AI detection in schools

One of the most high-stakes arenas for AI detection is in education, where the rise of AI-powered cheating has become a major concern. Many schools are turning to tools like Turnitin’s AI detector to flag suspicious papers. But as we’ve seen, these tools suck.

False accusations of cheating can have serious consequences for students, as in the case of a Hong Kong Baptist University student who was wrongly flagged by Grammarly’s AI detector. On the flip side, if AI detectors fail to catch actual cheaters, it undermines the integrity of the education system. 

The future of AI detectors

So, what does the future hold for AI detection? Unfortunately, I left my crystal ball in my other bag, but we can make one claim pretty confidently: AI detectors simply do not work reliably in their current state. That means they’ll need to start actually doing what they claim to do, otherwise eventually even the normies will realise they suck.

Emerging technologies and innovation

Researchers are exploring new techniques like stylometric analysis (think: AI fingerprinting) and more sophisticated versions of watermarking to improve detection accuracy. As AI models themselves become more transparent and interpretable, it may also become easier to spot telltale signs of AI generation.
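To give a flavour of the watermarking idea, here’s a toy Python sketch of a “green list” style check, loosely inspired by published watermarking research: a hash pseudo-randomly marks about half of all word pairs as “green”, a watermarking generator would steer towards green words while writing, and a detector looks for a green fraction well above 50%. Everything here (the hashing scheme, the word-level granularity, the function names) is invented for illustration; real schemes operate on model tokens, not words.

```python
import hashlib

def is_green(prev_word, word):
    """Pseudo-randomly classify roughly half of word pairs as 'green' via a hash."""
    digest = hashlib.sha256(f"{prev_word}|{word}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(text):
    """Fraction of consecutive word pairs that land on the green list.

    Unwatermarked text should hover near 0.5; text generated while
    steering towards green words would score well above that.
    """
    words = text.lower().split()
    if len(words) < 2:
        return 0.5  # too short to tell either way
    greens = sum(is_green(a, b) for a, b in zip(words, words[1:]))
    return greens / (len(words) - 1)
```

The appeal of this approach is that detection needs only the secret hashing key, not a guess about which model wrote the text, which is why watermarking is often seen as more promising than perplexity-style guesswork.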

Is there potential for improvement?

Ongoing research and development can help make the tools more reliable. Time will tell.

The importance of human review

As disappointing as this will be to those who’ve sipped the AI detector Kool-Aid, there is simply no substitute for human discernment (yet). 

While these tools can be valuable for flagging potential issues and obvious cases, the final verdict should always come from a human reviewer who can consider the full context and nuance of the content.

So, next time you see an article touting an AI detector with near-perfect accuracy, take it with a whole stacked jar of salt.

Main image: Getty