Quick Answer: AI detectors work by measuring two main signals in text: perplexity (how predictable the word choices are) and burstiness (how much sentence length and structure vary). Since AI models favor statistically likely next words, their output tends to be more uniform and predictable than human writing. AI Busted gives you a free AI Detector score and a free AI Humanizer so you can check your text and rewrite flagged sections in one place.
Most people see an AI detector score and just accept it. 87% probability. Likely AI. But nobody bothers to ask what the tool is actually measuring. It is not reading your text for "robot vibes." It is running math on word probabilities. Once you understand what perplexity and burstiness actually are, those scores make a lot more sense, and you can make smarter decisions about what to do when you get flagged.
What is AI detection?
AI detection is software that tries to guess whether a piece of text was written by a human or generated by a large language model like ChatGPT, Claude, or Gemini. It does not look for watermarks or hidden signals. It analyzes statistical patterns: how words are chosen, how sentences are built, and how predictable the text is at a word-by-word level.
These tools became mainstream in 2023, after ChatGPT's launch in late 2022 sent schools, publishers, and platforms scrambling for ways to spot AI-written content. Today the biggest names are Originality.ai, GPTZero, Turnitin, Copyleaks, and Sapling. They all use variations of the same core approach, even if each one tweaks the details.
How AI detectors analyze text
Think of an AI detector as a prediction engine running in reverse. A language model like GPT-4 writes text by predicting the next word over and over. An AI detector does something similar: it asks "how predictable would this text look to a language model?" If the text is highly predictable at every step, the detector flags it as AI. If it is full of surprising word choices and uneven rhythms, the detector leans human.
This is not the same as checking grammar or style. An AI detector does not care about split infinitives or passive voice. It cares about whether the word "the" was the most obvious choice in position 47, or whether the writer chose "unexpectedly" instead.

Behind the scenes, detectors use two main measurements: perplexity and burstiness. Nearly every commercial detector today builds on these two ideas, even if they wrap them in proprietary models and training data.
Perplexity: the core metric
Perplexity measures how surprised a language model would be by each word in a sentence. Low perplexity means the model found the text predictable and easy. High perplexity means it kept getting thrown off.
AI models are trained to minimize perplexity on their training data. In practice, that pushes them toward likely next words, even though they sample from a range of options rather than always taking the single top choice. So when you read ChatGPT output, the word choices feel smooth, safe, and rarely surprising. That consistency is what detectors pick up on.
Human writers, by contrast, are full of weird word choices. We mix formal and casual language in the same paragraph. We use words that are not the most obvious option. We switch tone mid-sentence. All of this raises perplexity, and that makes the text look more human to a detector.
Here is a simple example. If you write "The cat sat on the," a language model assigns a very high probability to "mat." It is the expected word. Choosing "windowsill" instead makes the sentence less predictable. An AI detector sees that lower predictability and scores the text as more likely human.
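To put numbers on that example, here is a minimal sketch of the standard perplexity calculation: the exponential of the average negative log-probability per word. The probabilities are made-up illustrative values, not output from any real model, and the function name `perplexity` is our own. Commercial detectors layer proprietary models on top of this idea, so treat it as a sketch of the concept rather than how any specific tool works.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical probabilities a model might assign to each word of
# "The cat sat on the ___" (illustrative numbers only).
context_probs = [0.20, 0.05, 0.30, 0.60, 0.70]
expected_ending = context_probs + [0.50]    # "mat": the predictable choice
surprising_ending = context_probs + [0.01]  # "windowsill": a less likely choice

# The surprising ending yields the higher perplexity.
print(perplexity(expected_ending) < perplexity(surprising_ending))  # True
```

One unlikely word raises the average for the whole sentence, which is why a few unusual word choices can shift a score.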
Burstiness: why human writing looks different
Burstiness measures how much sentence structure varies across a piece of text. Humans write in bursts: one sentence is five words, the next is thirty, then back to four. We ask a question. Then we answer it in a fragment. Then we write a long, winding sentence with three clauses and a parenthetical. The rhythm is uneven.
AI models, especially earlier versions like GPT-3.5, tend toward uniform sentence length and structure. Each sentence is around the same length. Each paragraph has a similar shape. There are fewer fragments, fewer sharp turns. Detectors flag this lack of burstiness as a sign of machine writing.

Newer models like Claude 4 and GPT-5 have gotten better at mimicking burstiness. They can produce more varied rhythm on command. But they still drift toward structural consistency over longer passages, especially when the prompt does not explicitly ask for style variation.
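There is no single agreed formula for burstiness, and each detector's version is proprietary. As a rough stand-in, the sketch below uses the coefficient of variation of sentence length (standard deviation divided by mean). The helper names `sentence_lengths` and `burstiness` are our own, and the sentence splitter is deliberately naive.

```python
import re
import statistics

def sentence_lengths(text):
    """Split on ., !, ? and return the word count of each non-empty sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Coefficient of variation of sentence length (std dev / mean)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return statistics.pstdev(lengths) / statistics.mean(lengths)

uniform = ("The tool checks the text. The score shows the result. "
           "The user reads the report.")
varied = ("Why? Because rhythm matters. We ask a question, answer it in a "
          "fragment, and then wander through a long sentence with several clauses.")

# Three five-word sentences give zero variation; the mixed passage does not.
print(burstiness(uniform) < burstiness(varied))  # True
```

A real detector would likely look at more than word counts, such as clause structure or punctuation patterns, but the principle is the same: more variation reads as more human.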
Perplexity vs. burstiness: what each one catches
| Metric | What it measures | What high values mean | What low values mean |
|---|---|---|---|
| Perplexity | Word-level predictability | Surprising word choices, likely human | Predictable words, likely AI |
| Burstiness | Sentence structure variation | Uneven rhythm, likely human | Uniform rhythm, likely AI |
| Combined | Both signals together | Strong signal for human writing | Strong signal for AI writing |
Why false positives happen
Even the best detectors get it wrong sometimes, and understanding why comes back to perplexity. If you are a non-native English speaker, your writing may use more predictable word choices because you learned from textbooks and standardized material. A detector sees low perplexity and flags you as AI, even though you wrote every word yourself.
The same thing happens with formal writing. Legal documents, scientific papers, and technical manuals are designed to be predictable and consistent. That is the point. But to an AI detector, that consistency looks suspicious. Students submitting formal essays and professionals writing reports are among the most likely to be hit by false positives.
There is also the problem of copy-editing tools. If you run your text through Grammarly or a rewrite tool, those tools smooth out your natural burstiness. The result reads cleaner to a human but looks more uniform to a detector. You end up with a worse detection score after improving your writing, which is the opposite of what most people expect.
How different detectors compare
Not all detectors use perplexity and burstiness the same way. GPTZero was built specifically for education and puts more weight on burstiness patterns. Originality.ai was trained on a wider range of AI models, including Claude and earlier GPT versions, and uses a proprietary scoring model on top of standard perplexity measurements. Turnitin integrates detection into its existing plagiarism workflow, which means it also checks against student paper databases.
Copyleaks takes a different approach: it cross-references text against its own database of AI-generated content, adding a matching layer on top of the statistical analysis. Sapling focuses more on sentence-level perplexity and tends to flag individual sentences rather than whole documents.
This is why the same text can get different scores across tools. One detector may weight perplexity more heavily while another relies on burstiness. There is no single agreed-upon standard for what counts as AI writing, which is why AI Busted recommends checking against more than one detector before making a final call.
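A toy calculation shows how weighting alone can flip a verdict. In this sketch, both signals are assumed to be already scaled to a 0-1 range where higher means more human-like. The function `human_likeness`, the weights, and the input values are all hypothetical and do not reflect how any named detector actually scores text.

```python
def human_likeness(perplexity_signal, burstiness_signal, perplexity_weight):
    """Weighted blend of two signals, each already scaled to the 0-1 range."""
    if not 0.0 <= perplexity_weight <= 1.0:
        raise ValueError("perplexity_weight must be between 0 and 1")
    return (perplexity_weight * perplexity_signal
            + (1.0 - perplexity_weight) * burstiness_signal)

# Hypothetical normalized signals for one text: predictable words (low
# perplexity signal) but an uneven rhythm (high burstiness signal).
ppl_signal, burst_signal = 0.2, 0.8

# Two imaginary detectors that differ only in how they weight the signals.
detector_a = human_likeness(ppl_signal, burst_signal, perplexity_weight=0.75)
detector_b = human_likeness(ppl_signal, burst_signal, perplexity_weight=0.25)

# Same text, opposite sides of a 0.5 cutoff.
print(detector_a < 0.5 < detector_b)  # True
```

The text itself never changed. Only the weighting did, which is one reason a single score from a single tool should not be treated as final.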
Common Questions
Can AI detectors be fooled by adding typos?
Adding random typos does not reliably lower detection scores and can make your writing look sloppy. Detectors are trained on clean text, so a typo might raise perplexity slightly, but the effect is inconsistent. A better approach is to vary sentence length and word choice intentionally, which addresses both perplexity and burstiness at the source.
Do AI detectors check for watermarks?
Generally, no. Current commercial detectors rely mainly on statistical pattern analysis rather than watermarks. Text watermarking does exist, and Google has described a watermark called SynthID for Gemini output, but there is no widely shared watermark that third-party detectors can check across ChatGPT, Claude, Gemini, and other models. Until that changes, perplexity and burstiness remain the signals that matter.
Why did my original writing get flagged as AI?
This usually happens when your writing is highly formal, uses predictable sentence structures, or was polished with editing tools. Non-native speakers and people who write in technical fields are disproportionately affected. If this happens to you, run your text through a humanizer like AI Busted to restore natural burstiness without changing your meaning.
Are newer AI models harder to detect?
Yes. GPT-5 and Claude 4 produce text with more varied sentence structures and less repetitive word choices than earlier models. The gap between AI and human writing is narrowing, which means detectors need to keep updating their models. This is an ongoing arms race, not a solved problem.
What should I do if my essay gets flagged?
First, do not panic. False positives are common, especially for formal academic writing. Second, run your text through a second detector to see if the results are consistent. Third, if you used AI for research or drafting but wrote the final version yourself, keep your revision history or notes as evidence. Finally, use AI Busted's free AI Humanizer to rewrite flagged passages and re-check the score.