How Do AI Detectors Actually Work? Perplexity, Burstiness, and the Science Explained

June 21, 2026
9 min read

Quick Answer: AI detectors work by analyzing two statistical signals in text: perplexity (how surprising each word is to a language model) and burstiness (how much sentence complexity varies across the text). Human writing tends to score higher on both, while AI-generated text is more predictable and more uniform. Tools like AI Busted combine these signals with machine learning classifiers trained on millions of human and AI text samples to estimate whether content was likely machine-generated.

You pasted your essay into a detector, held your breath, and watched the percentage tick up. 87% AI-generated. But you wrote every word yourself. What just happened?

This is the reality for students, writers, and professionals in 2026. AI detection tools have become ubiquitous, but almost nobody understands what they are actually measuring. They are not reading your prose the way a professor would. They are counting statistical fingerprints you cannot see.

Understanding how these tools work is not just interesting; it is practical. If you know what they measure, you know why they fail, what triggers false positives, and how to protect yourself when your genuine writing gets flagged.

What Is AI Detection?

AI detection is the process of analyzing a piece of text to determine whether it was written by a human or generated by a large language model like ChatGPT, Claude, or Gemini. Detectors do not look for plagiarism. They are not checking whether you copied from a source. They are checking whether the statistical shape of your sentences matches what a machine typically produces.

It is a fundamentally different problem from plagiarism detection. Plagiarism detection compares your text against a database of existing content. AI detection compares your text against a model of how language models behave. One checks for copying; the other checks for authorship.

Modern detectors like AI Busted, GPTZero, and Originality.ai use multiple overlapping techniques to make this call. The two that matter most are perplexity and burstiness.

Perplexity: The Predictability Score

Perplexity is the backbone of almost every AI detector on the market. It measures how surprised a language model is by each word in your text.

Think of it this way. If I write "The cat sat on the," a language model assigns a very high probability to the next word being "mat" or "floor." It is not surprised. Low perplexity. But if I write "The cat sat on the ceiling," the model assigns that a much lower probability. It is surprised. High perplexity.
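To make the idea concrete, here is a minimal sketch in Python. It assumes the standard definition of perplexity as the exponential of the average negative log-probability per word. The probabilities are invented for illustration; a real detector would read them from a language model, and nothing here reflects how any specific product computes its score.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical probabilities a model might assign to each word, in order.
# These numbers are made up for illustration, not measured from any model.
expected_ending = [0.20, 0.10, 0.30, 0.40, 0.50, 0.30]     # "... on the mat"
surprising_ending = [0.20, 0.10, 0.30, 0.40, 0.50, 0.001]  # "... on the ceiling"

print(f"predictable ending: {perplexity(expected_ending):.1f}")
print(f"surprising ending:  {perplexity(surprising_ending):.1f}")
```

The only difference between the two lists is the last value, yet that single unlikely word is enough to raise the score for the whole sentence.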

AI models like ChatGPT are trained to produce text that is highly probable word after word. They tend to favor smooth, statistically likely continuations over surprising ones. This makes AI text consistently low in perplexity, like a pianist who almost always plays the most obvious next note.

Human writers break this pattern constantly. We throw in unexpected words, odd metaphors, abrupt shifts in tone, fragments. Our writing spikes in perplexity in ways that are hard to fake. If that were the whole story, a detector that measured only perplexity would be straightforward and fairly reliable.

But there is a catch. Some human writing is naturally low-perplexity. Technical documentation, legal contracts, simple how-to guides. These read like AI output even when a person wrote them. This is where burstiness comes in.

Burstiness: The Rhythm of Human Thought

Burstiness measures how sentence complexity varies across a piece of writing. Humans tend to write in bursts: a long, tangled sentence, then a short one. Three simple sentences, then a winding paragraph with a semicolon and a parenthetical. Our complexity spikes and dips unpredictably.

AI text, in contrast, is remarkably uniform. Most language models produce sentences of similar length and structure throughout a piece. The complexity graph is nearly flat. No bursts, no valleys. Just a steady hum.
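There is no single agreed formula for burstiness, so the sketch below uses one simple proxy: the coefficient of variation of sentence length (standard deviation divided by mean). That choice, the naive sentence splitter, and the two sample texts are all assumptions made for illustration. Real detectors likely measure variation in richer ways, such as per-sentence perplexity.

```python
import re
import statistics

def sentence_lengths(text):
    """Split on ., ! and ? and return the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Coefficient of variation of sentence length (std dev / mean)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # variation is undefined for a single sentence
    return statistics.pstdev(lengths) / statistics.mean(lengths)

# Invented sample texts: one with uneven rhythm, one with a flat rhythm.
varied = ("I tried. It failed, badly, in ways I had not expected when I "
          "started the project last spring. So I quit.")
uniform = ("The tool checks the text. The model scores each word. "
           "The result shows a number.")

print(round(burstiness(varied), 2))
print(round(burstiness(uniform), 2))
```

A score near zero means every sentence is about the same length, which is the flat profile described above. Larger values indicate the uneven rhythm more typical of human writing.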

Researchers at Stanford and other institutions have shown that burstiness alone can separate human from AI writing with surprising accuracy, even when the AI text has been lightly edited. The rhythm of human writing is one of the hardest things for a machine to convincingly replicate.

Combined, perplexity and burstiness give detectors a two-dimensional signal. Low perplexity plus low burstiness equals a high likelihood of AI generation. But neither signal is bulletproof on its own, and plenty of edge cases break the model.
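The two-dimensional idea can be sketched as a simple quadrant rule. The cutoff values below are hypothetical and chosen only to make the example run; real tools do not publish fixed thresholds like these, and the "ambiguous" region is where most of the edge cases live.

```python
def quadrant(perplexity, burstiness, ppl_cutoff=20.0, burst_cutoff=0.4):
    """Place a text in a region of the perplexity/burstiness plane.

    Both cutoffs are hypothetical values used for illustration only.
    """
    low_ppl = perplexity < ppl_cutoff
    low_burst = burstiness < burst_cutoff
    if low_ppl and low_burst:
        return "likely AI"
    if not low_ppl and not low_burst:
        return "likely human"
    return "ambiguous"  # the signals disagree, so neither label is safe

print(quadrant(12.0, 0.1))  # predictable and uniform
print(quadrant(60.0, 0.9))  # surprising and uneven
print(quadrant(12.0, 0.9))  # predictable words, uneven rhythm
```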

How AI Busted and Other Detectors Combine These Signals

Modern detectors do not just run perplexity and burstiness in isolation. They feed thousands of features into a machine learning classifier trained on millions of labeled human and AI text samples. The classifier learns to weigh each signal based on context, language, text type, and even the specific AI model likely involved.
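As a rough illustration of what "learning to weigh each signal" means, the sketch below trains a tiny logistic regression on two features using plain gradient descent. This is a stand-in, not any vendor's method: production classifiers use far more features and far more data, and the six training points here are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on log loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            score = bias + sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(score) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias

def predict_ai_probability(x, weights, bias):
    """Return the model's estimated probability that the text is AI-generated."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Toy, invented data: (scaled perplexity, burstiness). Label 1 = AI, 0 = human.
samples = [(0.2, 0.1), (0.3, 0.2), (0.25, 0.15),
           (0.8, 0.9), (0.7, 0.8), (0.9, 0.7)]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(samples, labels)
print(predict_ai_probability((0.2, 0.2), weights, bias))
print(predict_ai_probability((0.85, 0.8), weights, bias))
```

The output is a probability rather than a yes or no, which is why detector results show up as percentages.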

Here is a simplified view of how the major detectors compare in their approaches:

Detector	Core Method	Free Tier	False Positive Rate
AI Busted	Multi-model ensemble + perplexity + burstiness	Yes, 5,000 words	~3-5%
GPTZero	Perplexity + burstiness scoring	Yes, 5,000 chars	~5-10%
Originality.ai	Perplexity + neural classifier + NLP heuristics	No (paid only)	~2-4%
Turnitin AI Detection	Proprietary model-level comparison	Institutional only	~4% (claimed less than 1%)
Copyleaks	Behavioral pattern analysis + cross-lingual	Yes, 10 pages/month	~3-6%

Why AI Detectors Get It Wrong

No detector is perfect, and the reasons are baked into how they work. False positives occur when human writing happens to look statistically like AI output. This is most common for:

Non-native English writers. People writing in a second language tend to use simpler vocabulary and more predictable sentence structures. A 2023 Stanford study found that AI detectors falsely flagged over 60% of essays written by non-native English speakers as AI-generated. This is not a small edge case. It is a systematic failure.

Highly structured or technical writing. Lab reports, legal summaries, and technical documentation are naturally low in burstiness and perplexity. They are designed to be predictable and uniform. Detectors often misread this clarity as artificiality.

Heavily edited text. When you polish a piece of writing to remove quirks, fragments, and stylistic spikes, you inadvertently make it more AI-like from a statistical standpoint. Grammarly and other editing tools can actually increase your AI detection score by smoothing out the very patterns that made your writing look human.

Text that was AI-assisted but human-directed. If you use AI for brainstorming or outlining but write the final text yourself, residual patterns from the AI's influence can trip detectors. The line between "AI-assisted" and "AI-generated" is one detectors are bad at drawing.

Can You Trust an AI Detector's Score?

The short answer: treat it as a strong signal, not a verdict. Even the best detectors carry a false positive rate of a few percent. In a cohort of 200 students who each submit one genuine essay, a 3% rate means roughly six of them get flagged.
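The arithmetic behind the classroom example is worth spelling out. The sketch below assumes every essay is genuinely human-written and that detector errors are independent from one essay to the next. Both are simplifications, and the three rates are illustrative values rather than measured figures for any tool.

```python
def expected_false_flags(num_students, false_positive_rate):
    """Expected number of genuine essays flagged, if every essay is human-written."""
    return num_students * false_positive_rate

def chance_of_any_false_flag(num_students, false_positive_rate):
    """Probability that at least one genuine essay is flagged (independent errors)."""
    return 1 - (1 - false_positive_rate) ** num_students

for rate in (0.01, 0.03, 0.05):
    flagged = expected_false_flags(200, rate)
    any_flag = chance_of_any_false_flag(200, rate)
    print(f"FPR {rate:.0%}: about {flagged:.0f} essays flagged, "
          f"P(at least one) = {any_flag:.2f}")
```

Even at a rate as low as 1%, a class of 200 is more likely than not to contain at least one falsely flagged essay.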

Most responsible detector providers, including AI Busted and GPTZero, explicitly state that their results should not be used as the sole basis for academic or professional decisions. A high detection score is a reason to look closer, not a reason to make accusations.

If your own writing gets flagged, knowing how detectors work gives you the language to push back. Point out that non-native English patterns, technical subject matter, and heavy editing can all produce low-perplexity, low-burstiness text that reads as AI-generated to a statistical model. Ask for a second review. Show your drafts and edit history. The statistical fingerprint is not the same thing as proof.

The Future of AI Detection

Detection technology is improving, but it is running a race it may not win. As language models get better at mimicking human burstiness and varying their sentence structures, the statistical gap between human and AI text continues to shrink.

Watermarking, where AI models embed an imperceptible pattern into generated text, is one proposed solution. But watermarking only works if every major model adopts it, and the pattern can often be weakened or removed by paraphrasing.
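For a sense of what an "imperceptible pattern" could look like, the sketch below imitates one family of proposals, often described as green-list watermarking. In that scheme the previous word pseudo-randomly marks part of the vocabulary as "green", the generator prefers green words, and a detector checks whether green words appear more often than chance. The hash rule, the 50% green fraction, the toy vocabulary, and the toy generator are all assumptions made for illustration, not the scheme of any particular model.

```python
import hashlib
import math

GREEN_FRACTION = 0.5  # assumed share of the vocabulary marked green at each step

def is_green(prev_token, token):
    """Pseudo-randomly mark a token green, seeded by the previous token."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode("utf-8")).digest()
    return digest[0] < 256 * GREEN_FRACTION

def watermark_z_score(tokens):
    """Z-score of the green-token count versus what chance alone would produce."""
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    green = sum(is_green(prev, tok) for prev, tok in pairs)
    expected = GREEN_FRACTION * len(pairs)
    std_dev = math.sqrt(len(pairs) * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (green - expected) / std_dev

def generate_watermarked(vocab, length, start="the"):
    """Toy generator that picks a green token whenever one is available."""
    tokens = [start]
    for step in range(length):
        prev = tokens[-1]
        green_options = [t for t in vocab if is_green(prev, t)]
        if green_options:
            tokens.append(green_options[step % len(green_options)])
        else:
            tokens.append(vocab[0])  # fall back if no green token exists
    return tokens

# Hypothetical toy vocabulary, used only to exercise the functions.
vocab = ["cat", "dog", "sat", "ran", "mat", "floor", "on", "under", "a", "the",
         "quick", "slow", "red", "blue", "and", "then", "jumped", "slept", "far", "near"]
marked = generate_watermarked(vocab, 60)
print(round(watermark_z_score(marked), 2))
```

Because the score depends on exact word pairs, rewording the text changes the pairs being hashed. That is the intuition behind paraphrasing as a weakness of this approach.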

The more likely future is one where detection is one layer in a broader authorship verification process: writing process documentation, draft history, in-class writing samples, and style consistency checks all working together. Nobody should be putting their academic or professional reputation in the hands of a single perplexity score.

Common Questions

What is perplexity in AI detection?

Perplexity measures how predictable each word in a text is to a language model. AI-generated text tends to have low perplexity because models favor highly probable next words. Human writing has higher perplexity because we make less predictable word choices. Most AI detectors use perplexity as their primary signal.

Can a human write text that an AI detector flags as AI?

Yes, this happens frequently. Non-native English writers, technical documentation writers, and anyone who writes in a highly structured or simple style can produce text with low perplexity and low burstiness, the two signals that detectors key on. A Stanford study found detectors falsely flagged over 60% of non-native English essays. If your original writing gets flagged, use your draft history and writing process as evidence.

What is the difference between AI detection and plagiarism detection?

Plagiarism detection compares your text against a database of existing content to find copied passages. AI detection analyzes the statistical patterns of your writing to guess whether a language model produced it. They are completely different technologies that answer different questions. A piece of writing can be original and still get flagged as AI-generated, or plagiarized and not flagged as AI at all.

Do AI detectors work on non-English text?

Most detectors perform significantly worse on non-English languages. The training data and statistical baselines are predominantly built on English text, and the perplexity patterns of AI-generated text differ across languages. If you are writing in a language other than English, take any detection score with extra caution.

How can I prove I wrote something if an AI detector flags it?

Keep your edit history, drafts, and timestamps. Use tools like Google Docs version history or Word track changes to show your writing process. Write a short in-person sample on the same topic so your instructor can compare styles. If you used AI for brainstorming or research, document that openly. Transparency about your process is stronger evidence than any detector score.