AI detectors measure how predictable each word choice is to a language model, a signal called perplexity. AI-written text tends to score low on perplexity because models favor high-probability word combinations; human writing scores higher because people make word choices a language model would not rank highly. AI Busted is a free AI detector and humanizer. Paste any text to get a detection score, then rewrite it with tone and vocabulary controls in one place, and treat the result as one signal rather than a verdict.
What is AI detection?
AI detection is the process of estimating whether a piece of text was written by a human or produced by a language model like GPT-4, Claude, or Gemini. Detectors do not read for meaning. They analyze statistical patterns: how closely the text follows the probability curves of a reference language model. Every word in a sentence has a probability score when placed after the words that came before it. Language models assign those scores as part of how they work. AI detectors borrow that same probability-scoring engine to ask: does this text look like something a model would write? A text can score "AI-written" without a single factual error. It can score "human" even when a model wrote 80% of it. The score reflects statistical patterns in word choice, not the source or intent behind the writing. That gap between what the score measures and what people think it proves is where most institutional misuse begins.
What signals do AI detectors actually look at?
Most detectors measure two overlapping signals: perplexity and burstiness. Perplexity measures how expected each word is to a language model given the words before it. Low perplexity means the text follows the model's internal probability curve closely: the writer kept choosing words the model would have ranked highly. High perplexity means the writer made choices the model would not have prioritized. Burstiness captures how much perplexity varies across sentences. Human writing swings between high- and low-perplexity sentences. A short punchy line followed by a long explanatory one, then a fragment, then a longer arc. AI-written text stays flatter across sentences, producing a narrower variance pattern that detectors can measure directly. Some detectors add stylometric signals like sentence length, punctuation frequency, or vocabulary spread. These fingerprints are compared with labeled human and AI samples, then folded into a vendor-set confidence threshold.
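To make the perplexity half of this concrete, here is a minimal sketch with invented per-word probabilities rather than output from any real detector: perplexity is the exponential of the negative mean log-probability, so the more predictable every word is, the lower the number.

```python
import math

def perplexity(word_probs):
    """Perplexity = exp of the negative mean log-probability of each word."""
    log_probs = [math.log(p) for p in word_probs]
    return math.exp(-sum(log_probs) / len(log_probs))

# Hypothetical per-word probabilities assigned by a reference model.
ai_like = [0.42, 0.35, 0.51, 0.38, 0.47]      # every word was a top-ranked choice
human_like = [0.40, 0.04, 0.33, 0.008, 0.12]  # several choices the model ranked low

print(f"AI-like perplexity:    {perplexity(ai_like):.1f}")     # ~2.4, low
print(f"human-like perplexity: {perplexity(human_like):.1f}")  # ~11.5, higher
```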
How does the detector score your text?
When you paste text into a detector, the tool feeds that text through a reference language model the vendor controls internally, then collects a log-probability score for each word given the words that come before it. This step is commonly called "tokenization" in popular writing about AI detectors. That framing is technically off. Tokenization is preprocessing: splitting text into sub-word units before any probability math happens. The detection-relevant step is the log-probability scoring that follows tokenization. Getting that distinction right matters when you evaluate claims about how a detector works, or why it misfired on a specific piece of text. The choice of reference model sets a ceiling on the whole pipeline. A detector calibrated against GPT-3 output will miss patterns specific to Claude 3.5 or Gemini 1.5. Newer models produce text with different stylistic fingerprints: smoother syntax, less repetition in phrasing. Older reference models may not flag those patterns reliably. The text samples used to calibrate the reference model determine how well the detector handles AI output that post-dates those samples.
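The scoring step itself can be sketched in a few lines. The snippet below uses GPT-2 from the Hugging Face transformers library as a stand-in reference model, since no vendor publishes the model it actually uses; the mechanics of assigning each token a log-probability given its left context are the same.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 stands in for the vendor's private reference model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def per_token_logprobs(text):
    """Return (token, log-probability) pairs, each token scored given its left context."""
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(input_ids).logits              # [1, seq_len, vocab]
    log_probs = torch.log_softmax(logits, dim=-1)
    scores = []
    for i in range(1, input_ids.shape[1]):
        token_id = input_ids[0, i]
        # The token at position i is scored by the distribution predicted at i - 1.
        scores.append((tokenizer.decode(token_id), log_probs[0, i - 1, token_id].item()))
    return scores

for token, lp in per_token_logprobs("The results of the study were consistent with expectations."):
    print(f"{token!r:>20}  {lp:8.3f}")
```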
How does burstiness scoring work?
Once per-word probability scores exist, the detector groups them by sentence and calculates how much those scores vary. That variance measurement is burstiness. AI-written text stays low and flat across a passage. Human writing varies: a simple sentence, then a complex one, then a fragment, then a longer explanatory arc. Models consistently favor the smoothest, most probable continuation of the text. That makes the flat-burstiness pattern hard to avoid at scale. Detectors weight burstiness differently. Some treat it as the primary indicator. Others fold it in as a correction factor when perplexity alone produces a weak signal. The exact weighting is rarely published, which makes it difficult to interpret what a borderline score actually means for your specific text.
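Continuing the earlier sketches, burstiness can be read as the variance of per-sentence perplexity. The per-word log-probability values below are invented for illustration, and real detectors apply their own unpublished weighting on top of this raw variance.

```python
import math
from statistics import pvariance

def sentence_perplexity(log_probs):
    return math.exp(-sum(log_probs) / len(log_probs))

# Hypothetical per-word log-probabilities, already grouped by sentence.
sentences = [
    [-0.9, -1.1, -0.8, -1.0],        # short, predictable sentence
    [-2.6, -0.7, -3.1, -1.2, -0.9],  # sentence with surprising word choices
    [-1.0, -0.8],                    # fragment
]

per_sentence = [sentence_perplexity(s) for s in sentences]
burstiness = pvariance(per_sentence)  # variance across sentences

print("per-sentence perplexity:", [round(p, 1) for p in per_sentence])
print("burstiness (variance):  ", round(burstiness, 2))
# Flat, low-variance readings push toward "AI-like"; wide swings push toward "human".
```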
How does the detector calibrate its score?
With perplexity and burstiness scores in hand, the detector checks where the text lands relative to a reference distribution of known AI-written and known human-written passages. This step maps closely to the DetectGPT method published by Mitchell et al. (2023, arXiv:2301.11305). Their central finding: AI-written text tends to sit near a local maximum of the model's probability surface, while human-edited text lands in regions of lower curvature. The detector applies small perturbations by swapping words and adjusting phrasing, then checks whether the probability score rises or falls. If the text stays near the top of the probability curve after those changes, that supports an AI-authorship signal.
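A rough sketch of that perturbation test is below. DetectGPT itself generates perturbations with a mask-filling model such as T5; the word-swap function here is a crude stand-in, and the score_fn parameter is assumed to be a mean per-token log-probability like the one sketched earlier.

```python
import random
import statistics

# Crude stand-in for DetectGPT's mask-and-refill perturbations.
SWAPS = {"results": "findings", "consistent": "aligned", "study": "analysis",
         "shows": "demonstrates", "important": "significant"}

def perturb(text, rate=0.3, seed=0):
    rng = random.Random(seed)
    words = text.split()
    return " ".join(SWAPS.get(w.lower(), w) if rng.random() < rate else w for w in words)

def curvature_gap(text, score_fn, n_perturbations=20):
    """score_fn(text) -> mean per-token log-probability under a reference model.

    A positive gap means the original sits above its perturbed neighbours on the
    probability surface, which DetectGPT reads as evidence of AI authorship."""
    original = score_fn(text)
    perturbed = [score_fn(perturb(text, seed=i)) for i in range(n_perturbations)]
    return original - statistics.mean(perturbed)
```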
What does the final detection score actually mean?
The final step turns the math into something visible: a percentage, a label like "likely AI-written," or a sentence-level view where the tool marks specific passages as high-confidence AI. Some detectors flag individual sentences they scored as AI-likely. That helps manual review, but sentence-level labels carry the same uncertainty as the full score: they show where the probability pattern looks model-like, not who wrote it. The output is not a fact. It is a probability estimate with an unpublished confidence interval, so you rarely know the false positive rate for your topic, genre, or writing style.
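How the raw signals become that percentage is vendor-specific and unpublished, so the blend below is purely illustrative: made-up weights, a logistic squash, and an arbitrary 70% cutoff for the label.

```python
import math

def detection_score(perplexity, burstiness,
                    w_perplexity=-0.9, w_burstiness=-0.5, bias=4.0):
    """Map the two signals to a 0-100% 'AI confidence'.
    Weights, bias, and threshold are invented; real vendors calibrate their own."""
    z = bias + w_perplexity * perplexity + w_burstiness * burstiness
    return 100 / (1 + math.exp(-z))

score = detection_score(perplexity=2.4, burstiness=0.3)
label = "likely AI-written" if score >= 70 else "likely human-written"
print(f"{score:.0f}% AI confidence -> {label}")  # low perplexity + flat rhythm -> high score
```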
What does the full detection pipeline look like?
| Pipeline step | What the detector does | Signal direction |
| --- | --- | --- |
| Log-probability scoring | Feeds text through reference model, scores each word in context | Higher per-word probability = more AI-like |
| Perplexity | Averages per-word probability scores across the passage | Lower = more AI-like |
| Burstiness | Measures variance in per-sentence perplexity | Lower variance = more AI-like |
| Probability curvature check | Perturbation test against reference distribution (DetectGPT method) | Near local probability max = more AI-like |
| Score output | Blends signals into a percentage with a vendor-set confidence threshold | 0-100% AI confidence |
Why do two detectors disagree on the same text?
Run the same paragraph through GPTZero and Winston AI and you will often get different results, sometimes wildly different. See our 2026 AI detector test results for concrete examples. Three things drive the divergence. First, each detector uses a different reference model. The probability scores they assign to the same word sequence are not identical, so the perplexity reading diverges before any other calculation happens. Second, calibration differs. Each vendor sets thresholds against its own labeled samples, so the mix of models, writing styles, and topics in those samples determines where the cutoff lands. Third, burstiness weighting differs. A detector that weights burstiness heavily will flag flat-rhythm text quickly. A detector that treats perplexity as the primary signal may pass the same paragraph with a comfortable margin.
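The toy comparison below shows how that plays out: two hypothetical detectors read the exact same perplexity and burstiness values but weight and threshold them differently, so one flags the text and the other passes it. All names and numbers are invented for illustration.

```python
def verdict(perplexity, burstiness, w_perp, w_burst, threshold):
    # Lower perplexity and lower burstiness both push toward "AI"; the weights decide
    # how much each signal counts, and the threshold decides where the label flips.
    ai_score = w_perp * (1 / perplexity) + w_burst * (1 / (burstiness + 1))
    return round(ai_score, 2), "AI" if ai_score >= threshold else "human"

signals = {"perplexity": 3.1, "burstiness": 0.8}

# "Detector A" leans on burstiness, "Detector B" leans on perplexity.
print(verdict(**signals, w_perp=0.3, w_burst=0.7, threshold=0.45))  # (0.49, 'AI')
print(verdict(**signals, w_perp=0.8, w_burst=0.2, threshold=0.45))  # (0.37, 'human')
```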
Where does the detection pipeline break down?
The pipeline has consistent failure modes. Knowing them matters whether you are the one running the detector or the one whose writing is being scored.
ESL and neurodivergent writers
Research by Liang et al. (2023) found false positive rates above 61% for non-native English writers across several commercial detectors. Short sentences, limited vocabulary range, and constrained syntax can push perplexity down in the same direction AI-written text moves. According to the NIST AI Risk Management Framework, high-stakes AI-assisted systems need human oversight and appeal routes. Treat AI Busted as a second signal in that review, not a sole arbiter.
Post-edited AI text
If a writer starts with GPT-4 output and then edits heavily, burstiness rises and perplexity increases. The text may score "human" because detectors only see the final word choices, not the revision history.
Domain-specific writing
Technical writing, legal text, and scientific abstracts often have low perplexity by nature. Formal style rules push toward standard phrasing, so detectors calibrated on general-web text can over-flag these genres.
The base-rate problem
Suppose only 5% of submissions are AI-written. Even a strong detector can produce nearly as many false positives as true positives, since the human-written pool is much larger. That base-rate problem applies to any binary classifier when the target event is rare. For thorough reliability numbers across tools, see how reliable are AI detectors.
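The arithmetic behind that claim is short. The sensitivity and false positive rate below are assumed values for illustration, not measurements of any specific tool.

```python
submissions = 1000
ai_rate = 0.05              # 5% of submissions are AI-written
sensitivity = 0.95          # assumed: detector catches 95% of AI text
false_positive_rate = 0.05  # assumed: detector flags 5% of human text

ai_texts = submissions * ai_rate              # 50
human_texts = submissions - ai_texts          # 950

true_positives = ai_texts * sensitivity               # 47.5
false_positives = human_texts * false_positive_rate   # 47.5

precision = true_positives / (true_positives + false_positives)
print(f"flagged AI texts:         {true_positives:.0f}")
print(f"flagged human texts:      {false_positives:.0f}")
print(f"chance a flag is correct: {precision:.0%}")  # 50%
```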
How do watermark detection and multi-model scoring work?
Two parallel approaches work differently from the perplexity pipeline.
Watermarking
Kirchenbauer et al. (2023) showed that a model can embed a statistical watermark during generation by nudging word choices into a pseudo-random pattern. A matching detector checks for that pattern without perplexity scoring, but this works only when the original model was configured to add the watermark.
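A heavily simplified version of that check looks like the sketch below. The real scheme operates on model tokens and logits at generation time; this version works on whole words and only shows the detection side, counting how many words fall on the pseudo-random "green list" and testing whether that count is higher than chance.

```python
import hashlib
import math

GREEN_FRACTION = 0.5  # share of the vocabulary marked "green" at each step

def is_green(prev_word, word):
    """Pseudo-randomly assign `word` to the green list, seeded by the previous word.
    A watermarking model would have nudged its own choices toward green words."""
    digest = hashlib.sha256(f"{prev_word}|{word}".encode()).digest()
    return digest[0] < 256 * GREEN_FRACTION

def watermark_z_score(text):
    """Compare the green-word count against the ~50% expected by chance.
    A large positive z-score suggests the text carries the watermark."""
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    greens = sum(is_green(prev, cur) for prev, cur in pairs)
    n = len(pairs)
    expected = n * GREEN_FRACTION
    variance = n * GREEN_FRACTION * (1 - GREEN_FRACTION)
    return (greens - expected) / math.sqrt(variance)
```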
Multi-model scoring
The Binoculars approach (Hans et al., 2024) runs the same text through two models: a scorer and an observer. It measures how much their probability assignments diverge, which can reduce false positives for non-native English writers compared with single-model perplexity scoring.
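A rough sketch of that two-model idea is below. It loosely follows the paper's perplexity-over-cross-perplexity ratio, but it is not the authors' implementation: two small public models, GPT-2 and DistilGPT-2 (which share a tokenizer), stand in for the model pair the paper actually uses.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Two small public models stand in for the paper's model pair.
observer = AutoModelForCausalLM.from_pretrained("gpt2").eval()
scorer = AutoModelForCausalLM.from_pretrained("distilgpt2").eval()
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # shared by both models

@torch.no_grad()
def binoculars_style_score(text):
    ids = tokenizer(text, return_tensors="pt")["input_ids"]
    obs_logits = observer(ids).logits[0, :-1]
    scr_logits = scorer(ids).logits[0, :-1]
    targets = ids[0, 1:]

    # Observer's log-perplexity of the actual text.
    obs_logprobs = torch.log_softmax(obs_logits, dim=-1)
    log_ppl = -obs_logprobs[torch.arange(targets.numel()), targets].mean()

    # Cross term: how surprising the scorer's predicted distribution is to the observer.
    scr_probs = torch.softmax(scr_logits, dim=-1)
    x_log_ppl = -(scr_probs * obs_logprobs).sum(dim=-1).mean()

    # Lower ratios lean toward "AI-written", higher ratios toward "human".
    return (log_ppl / x_log_ppl).item()
```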
Common Questions
Does varying sentence length make text pass AI detectors?
Varying sentence length raises burstiness, which can move a detector toward a more human reading. It does not reliably fool detectors, since perplexity scoring still runs separately and most tools blend both signals. AI Busted lets you test the same passage before and after edits to see whether the score actually changed.
Do all AI detectors score text the same way?
No. GPTZero, Winston AI, Originality.ai, and similar tools use different reference models, calibration samples, thresholds, and burstiness weighting. That is why the same passage can score 20% on one tool and 80% on another. The score belongs to that vendor's pipeline, not to a universal standard.
Why do AI detectors flag human-written text?
Detectors flag human writing when phrasing has low perplexity and closely follows patterns a reference model ranks highly. Formal writing, constrained vocabulary, and academic abstracts can all look this way. Research published in PMC (Liang et al., 2023) found non-native English writers face high false positive rates, above 61% in their sample.