June 22, 2026 · 6 min read

How do AI detectors work: a plain English guide

If you have ever wondered what AI detectors actually measure, this plain English guide walks through perplexity, burstiness, classifiers, and the tech that powers detection tools.

You paste a paragraph into an AI detector. Three seconds later it says "98% AI-generated." Then you paste something you wrote yourself, and it says "72% AI."

What is actually happening under the hood?

Most people treat AI detectors as magic black boxes. But the core ideas are surprisingly simple. Once you understand how they work, you can make smarter decisions about which tools to trust and when to ignore the score entirely.

What AI detectors actually do

AI detectors do not read text the way you do. They do not understand meaning, check facts, or decide if something "sounds smart." They ask one question: how statistically predictable is this writing?

When a language model like GPT-4 writes text, it repeatedly predicts which words are likely to come next and picks from the top candidates, one word after another. The result is writing that is fluid, grammatically correct, and unusually consistent. Human writing, by contrast, is messier. We jump between short sentences and long ones. We pick weird words on purpose. We break patterns without noticing.

Detectors exploit this difference. They measure how closely a text matches the statistical patterns of machine writing versus human writing. The closer the match to one side, the more confident the result.

Perplexity: the surprise meter

Perplexity is the most important concept in AI detection. It measures how "surprised" a language model is by a given text.

Here is how it works. A detector runs your text through a reference language model, often a smaller open model such as GPT-2. For each word in your text (more precisely, each token), the model asks: "Given the words before this one, how likely was this word?"

If every word is exactly what the model expected, perplexity is low. Detectors treat that as an AI signal. AI-generated text tends to score low because a language model mostly writes the high-probability words that another language model would also have predicted.

Human writing spikes higher. Humans choose words for reasons a model does not optimize for: memory, rhythm, specificity, dry humor, an inside joke only three people will catch. Those choices show up as perplexity spikes and flag as human signals.

A practical example. "The sky is blue" has low perplexity. It is predictable. "The sky is remembering the rain we never had" has higher perplexity. It is surprising. A detector relying on perplexity alone would score the first as more AI-like and the second as more human-like.
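To make the calculation concrete, here is a minimal sketch of perplexity in Python. It assumes a toy bigram model with add-one smoothing, trained on three invented sentences, as a stand-in for a real reference model such as GPT-2. The function names and the tiny corpus are made up for illustration, and the actual numbers mean nothing outside this toy setup.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count how often each word follows each context word."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence          # <s> marks the sentence start
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
        vocab.update(sentence)
    return contexts, bigrams, len(vocab) + 1  # +1 slot for any unseen word

def perplexity(tokens, contexts, bigrams, vocab_size):
    """exp of the average negative log-probability per word."""
    tokens = ["<s>"] + tokens
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing: unseen pairs get a small nonzero probability.
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))

# Invented toy corpus standing in for the reference model's training data.
corpus = [
    "the sky is blue".split(),
    "the sea is blue".split(),
    "the sky is clear".split(),
]
contexts, bigrams, vocab_size = train_bigram(corpus)

predictable = perplexity("the sky is blue".split(), contexts, bigrams, vocab_size)
surprising = perplexity("the sky is remembering the rain".split(), contexts, bigrams, vocab_size)
print(predictable < surprising)
```

The sentence the toy model has effectively seen before scores lower than the one with unexpected words. A real detector does the same thing with a far larger model and vocabulary.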

Burstiness: the rhythm pattern

If perplexity is about individual word choices, burstiness is about the overall rhythm of a piece of writing. It measures how much the sentence structure varies across a passage.

Human writing is bursty. We write a three-word sentence. Then a long, winding sentence with multiple clauses that picks up momentum. Then another short one. This back-and-forth rhythm is what makes writing feel natural and alive.

AI writing tends to be smooth and uniform. Even when an AI produces a short sentence, the surrounding sentences stay similar in length and structure. The rhythm is flat. Low burstiness is a common AI signal.

This is also why simple synonym-swapping does not fool detectors. Swapping individual words barely changes burstiness. The rhythm stays flat. You have to vary sentence structure, not just vocabulary.
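Here is a rough sketch of a burstiness score, assuming it is defined as the variation in sentence length (standard deviation divided by the mean). That definition is a simplification chosen for this example. Real detectors define burstiness in their own ways, and the sample texts are invented.

```python
import re
import statistics

def sentence_lengths(text):
    """Split on sentence-ending punctuation and count words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Coefficient of variation of sentence length (0.0 means perfectly even)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)

uniform = "The model writes clean text. The output stays very even. The rhythm remains quite flat."
# Same sentences with synonyms swapped in: the vocabulary changes, the rhythm does not.
swapped = "The system produces tidy prose. The result stays very steady. The cadence remains rather flat."
varied = "It stalls. Then a long, winding sentence picks up momentum and keeps going for a while. Short again."

print(burstiness(uniform), burstiness(swapped), burstiness(varied))
```

The uniform text and its synonym-swapped copy get the same score because every sentence still has five words. Only the version with restructured sentences moves the number.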

Classifiers and embeddings: the heavy lifters

Perplexity and burstiness are the signals everyone talks about. But modern detectors use more than just those two metrics. Most tools now combine multiple detection methods into a single system.

Classifiers are supervised machine learning models trained on massive datasets of human and AI text. They learn to spot patterns that separate the two categories: overuse of certain transition words, repeated sentence templates, hedging language like "it is important to note that." The classifier does not need to know why these patterns exist. It just learns which ones correlate with AI writing.
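As a toy illustration of the supervised approach, the sketch below trains a one-feature logistic regression in plain Python. Everything in it is an assumption made for the example: the stock-phrase list, the four invented training sentences, and the single feature. Production classifiers are typically neural networks trained on very large labeled datasets, not hand-picked phrase counts.

```python
import math

# Hypothetical stock phrases; a real classifier learns its own features.
STOCK_PHRASES = ["it is important to note that", "furthermore", "moreover", "in conclusion"]

def stock_phrase_rate(text):
    """Stock-phrase hits per 100 words."""
    lowered = text.lower()
    hits = sum(lowered.count(phrase) for phrase in STOCK_PHRASES)
    return 100 * hits / max(len(lowered.split()), 1)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train(examples, epochs=500, learning_rate=0.1):
    """Fit weight and bias with per-example gradient steps on the log loss."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = stock_phrase_rate(text)
            error = sigmoid(weight * x + bias) - label
            weight -= learning_rate * error * x
            bias -= learning_rate * error
    return weight, bias

def ai_probability(text, weight, bias):
    return sigmoid(weight * stock_phrase_rate(text) + bias)

# Invented training examples: label 1 = AI-like, 0 = human-like.
examples = [
    ("It is important to note that results may vary. Furthermore, more research is needed.", 1),
    ("Moreover, the data is clear. In conclusion, the approach works well.", 1),
    ("My cat knocked the mug off my desk again.", 0),
    ("Honestly? I rewrote that intro nine times and still hate it.", 0),
]
weight, bias = train(examples)

formal = ai_probability("Furthermore, it is important to note that costs rise.", weight, bias)
casual = ai_probability("We got lost twice on the way to the lake.", weight, bias)
print(round(formal, 2), round(casual, 2))
```

The model never learns why stock phrases matter. It only learns that, in its training examples, they show up on the AI side of the labels.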

Embeddings take a different approach. They convert words and phrases into vectors, essentially a map of language where similar meanings sit close together. AI-generated text often shows tighter semantic clustering around the prompt. Human writing wanders more. Embedding-based detectors measure this.
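A minimal sketch of the clustering idea, assuming bag-of-words count vectors as a crude stand-in for real learned embeddings. Word counts only capture shared vocabulary, not meaning, so treat this as an illustration of the measurement and not of any actual detector. The `cohesion` function and the sample sentences are invented for the example.

```python
import math
from collections import Counter

def embed(sentence):
    """Stand-in embedding: a bag-of-words count vector."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cohesion(sentences):
    """Mean pairwise similarity: higher means the sentences cluster tightly."""
    vectors = [embed(s) for s in sentences]
    pairs = [(i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))]
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

tight = [
    "solar panels reduce energy costs",
    "solar panels reduce carbon emissions",
    "solar panels reduce energy bills",
]
wandering = [
    "solar panels reduce energy costs",
    "my uncle once fell off a roof",
    "anyway the invoice arrived in march",
]
print(round(cohesion(tight), 3), round(cohesion(wandering), 3))
```

With real embeddings the wandering passage would not score exactly zero, since related ideas sit close together even when no words are shared. The gap between the two scores is the point.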

GPTZero, one of the most widely used detectors, uses a multilayered system with seven components, not just perplexity and burstiness. Other tools like Originality.ai and Copyleaks use similar ensemble approaches. The best detection happens when multiple weak signals combine into a strong one.

This is also why blindly averaging scores from multiple free detectors does not give you a better answer. Each tool weights its signals differently. Some lean heavily on perplexity, others on classifiers. And when several tools rely on the same underlying approach, they tend to make the same mistakes, so combining them does not make detection stronger.

Watermarking: the hidden signal

There is another approach that works differently from everything we have covered so far: watermarking. Instead of trying to detect AI text after it is written, watermarking embeds a hidden signal during generation.

The idea works like this. When a model generates text, it normally samples each next token from its predicted probabilities. With watermarking, the model slightly biases those choices toward a "green list" of tokens that is selected using a secret key. Later, a detector that holds the key can check whether the text contains more green-list tokens than chance would predict. If it does, the text was probably generated by a model using that watermark.
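The detection side can be sketched in a few lines. This is a simplified illustration under stated assumptions, not any vendor's scheme: the green list is derived by hashing the secret key with the previous token, half the vocabulary is green at each step (`GAMMA`), and the toy generator always picks a green word when one exists, whereas a real model would only nudge its probabilities. The vocabulary, key, and function names are all invented.

```python
import hashlib
import math

GAMMA = 0.5  # assumed share of tokens on the green list at each step
VOCAB = "the a cat dog sky rain river stone walks sings falls quiet bright old new and".split()

def is_green(prev_token, token, key):
    """Hash (key, previous token, token) into a repeatable coin flip."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GAMMA

def generate_watermarked(length, key, start="the"):
    """Toy generator: always choose a green word if any candidate is green."""
    tokens = [start]
    for step in range(length):
        shift = step % len(VOCAB)  # rotate candidates so the output varies
        candidates = VOCAB[shift:] + VOCAB[:shift]
        green = [w for w in candidates if is_green(tokens[-1], w, key)]
        tokens.append(green[0] if green else candidates[0])
    return tokens

def watermark_z_score(tokens, key):
    """Standard deviations by which the green count exceeds chance."""
    scored = len(tokens) - 1
    green = sum(is_green(prev, tok, key) for prev, tok in zip(tokens, tokens[1:]))
    return (green - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))

marked = generate_watermarked(40, key="secret")
print(round(watermark_z_score(marked, key="secret"), 2))     # checked with the right key
print(round(watermark_z_score(marked, key="wrong-key"), 2))  # checked without it
```

A large positive z-score under the right key is strong evidence of the watermark. Without the key, the same text looks like ordinary chance, which is why only someone holding the key can run the check.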

Watermarking sounds like the clean solution. But it has real problems. Heavy editing, paraphrasing, or translation can erase the signal. And watermarking only works when the model owner chooses to embed it. OpenAI has built watermarking capabilities but has not released them, partly over concerns that users would find ways to strip them.

For now, watermarking is not something third-party detectors can count on. Most of the detectors you encounter in the real world use the perplexity and classifier methods described above.

Why detectors flag human writing

The most frustrating experience with AI detectors is getting flagged for something you wrote yourself. This happens more often than detector companies admit.

The reason goes back to how these tools work. If your writing is naturally formal, consistent, and grammatically clean, it scores lower on both perplexity and burstiness. Technical writers, academics, ESL writers who have been taught to prioritize clarity, and anyone else who writes in a structured style are at higher risk of false positives.

In a 2023 study, Stanford researchers found that a group of AI detectors flagged, on average, over 60% of essays written by non-native English speakers as AI-generated. The tools were not detecting AI. They were detecting clean, predictable prose, which correlates with AI writing but also correlates with people who learned English through formal instruction.

Short texts are another blind spot. AI detectors become unreliable on passages under roughly 200 words because there are not enough data points for statistical analysis. A single paragraph gives you very little signal. A tool that confidently labels a two-sentence bio as "100% AI" is making a claim the statistics cannot support.
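A small simulation shows why length matters. It assumes each token contributes a random "surprise" value drawn from an exponential distribution, and that a detector's score is the average of those values. Both are simplifications picked for this sketch, and `score_spread` is a made-up helper. The point is only how much the average jumps around from one text to the next at different lengths.

```python
import random
import statistics

def score_spread(num_tokens, trials=500, seed=0):
    """Spread (standard deviation) of the average score across many texts."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    scores = []
    for _ in range(trials):
        # Assumed per-token surprise values for one simulated text.
        surprisals = [rng.expovariate(1.0) for _ in range(num_tokens)]
        scores.append(statistics.mean(surprisals))
    return statistics.stdev(scores)

short_spread = score_spread(25)    # roughly a two-sentence bio
long_spread = score_spread(400)    # a few paragraphs
print(short_spread > long_spread)
```

Every simulated text here comes from the same source, yet the short ones produce widely different scores. That noise is what a detector is reading when it scores a couple of sentences.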

This is the most important thing to remember: AI detectors output probabilities, not verdicts. A score of 80% does not mean that 80% of the text was written by AI. It is the detector's estimate of how likely the text is to be AI-generated, based on patterns it learned from its training data, and that estimate can be wrong. Those are different things.

The responsible way to use AI detectors is as one signal among many. Cross-check with writing history, drafts, and the writer's known style. You can compare results from multiple detectors when the stakes are high, but understand that the tools share the same fundamental weaknesses. None of them is ground truth.

AI detectors are not magic. They are statistical tools that measure predictability and rhythm. Understanding what they actually do makes you less likely to overtrust them and more likely to use them well.

Frequently asked questions

What is perplexity in AI detection?

Perplexity measures how predictable a text is to a language model. Low perplexity means the model easily predicted each word, which suggests AI generation. High perplexity means the model was surprised, which suggests human writing. It is a signal, not proof.

Can AI detectors be fooled?

Yes. Simple methods like synonym swapping barely work, but rewriting at the sentence structure level, varying sentence length, and adding human-like quirks can reduce detection scores. Heavy human editing of AI drafts also makes detection harder because the final text carries both human and machine patterns.

Why do AI detectors flag non-native English as AI?

Non-native English writing often has lower burstiness, more consistent sentence structures, and fewer colloquial quirks. These are the same patterns detectors associate with AI text. The tools confuse formal, grammatically clean writing with machine generation.

Are paid AI detectors better than free ones?

Generally yes, but not always. Paid tools like GPTZero and Originality.ai use ensemble methods combining multiple detection signals. Free tools often rely on a single metric like perplexity. But no detector is perfect, and paid tools also produce false positives, especially on short or formal text.

Does Google use AI detection to rank content?

Google has stated it does not penalize AI-generated content automatically. Its quality guidelines focus on helpfulness and expertise, not how the text was produced. But AI text that reads like filler or lacks original insight performs poorly regardless of detection scores.