When AI Makes Your Thesis Sound More AI-Generated: CCTV Explains the Paradox of AIGC Detection

Imagine you’re a graduating university student who has just run your thesis through an AIGC (AI-generated content) detection tool. The result: a 62% “AI rate,” a staggering 47 percentage points above your school’s 15% threshold. Panicked, you turn to a large language model and ask it to “rewrite this paper to sound like a human wrote it.” You run the revised version through the same detector. The result: 94%.

This Kafkaesque scenario isn’t hypothetical. As China’s graduation season peaks, a growing number of students are discovering that AI detection tools can be not just unreliable but outright counterproductive. CCTV recently investigated the phenomenon, speaking with experts about why these systems produce such baffling results.

The Core Problem: Using AI to Detect AI

Cai Hailong, Vice Dean of the School of Education at Capital Normal University, explained the fundamental difference between traditional plagiarism detection and AIGC detection. Plagiarism checkers compare text against a corpus to identify matching passages, a deterministic process that can point to the exact source of each match. AI detection, by contrast, uses AI systems to assess whether a text resembles AI-generated writing in its semantics and linguistic style. It is probabilistic classification, not evidence-based judgment.
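
To make the "deterministic" half of that contrast concrete, here is a minimal Python sketch of corpus matching by shared word sequences. It is an illustration only, not how any real plagiarism checker is implemented: the function names, the 4-word window, and the tiny corpus are all assumptions made for the example.

```python
def ngrams(tokens, n):
    """Return the set of all n-word sequences in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def matching_passages(submission, corpus_docs, n=4):
    """Map each corpus document id to the n-grams it shares with the submission.

    Hypothetical helper: whitespace tokenization and exact matching only.
    """
    submitted = ngrams(submission.lower().split(), n)
    hits = {}
    for doc_id, doc_text in corpus_docs.items():
        shared = submitted & ngrams(doc_text.lower().split(), n)
        if shared:
            # Each hit is evidence a reviewer can inspect and trace to a source.
            hits[doc_id] = sorted(" ".join(gram) for gram in shared)
    return hits


# Made-up corpus and submission, for illustration only.
corpus = {"paper_a": "the quick brown fox jumps over the lazy dog"}
submission = "we saw the quick brown fox jumps today"
print(matching_passages(submission, corpus))
```

The same input always produces the same matches, and every match names the document it came from. A style-based classifier offers neither property: it returns a score, not a source.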

“The core technical bottleneck,” Cai said, “is that we’re using AI to catch AI. This means we cannot definitively determine whether a passage was written by a human or an AI, nor can we provide a clear explanation for the judgment. That is the most critical limitation.”

Why Chinese Makes It Harder

The challenge is compounded for Chinese-language texts. The report describes Chinese as having exceptionally rich semantics and highly varied modes of expression compared with English. That flexibility leaves detection systems facing far more ambiguity, which raises both the difficulty of the task and the error rate. False positives are not edge cases; they are a structural feature of the current approach.

How the Detection Actually Works

Most Chinese universities rely on AIGC detection modules built into established academic platforms such as CNKI (知网), Weipu (维普), and Wanfang (万方). CCTV asked multiple large language models to explain how these systems work, and the answers converged on two key metrics:

  • Perplexity: How “predictable” the text is to a language model. Human writing is filled with unexpected, unconventional expressions; AI output tends toward the statistically probable.
  • Burstiness: How much the rhythm of the text varies, for example in sentence length. Human writing rises and falls like an electrocardiogram; AI output runs nearly flat.

In theory, AI-generated text should score low on both measures because it is too smooth and too predictable. But experts told CCTV that even these metrics cannot deliver 100% accuracy. AI text generation works by repeatedly predicting the most probable next word from a probability distribution. Trying to detect that process with a mirror image of the same statistical analysis creates an inherently circular and error-prone system. Misjudgments are not just possible; they are routine.
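
The two metrics can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the method used by CNKI, Weipu, Wanfang, or any other detector: a smoothed unigram word-frequency model stands in for a large language model, burstiness is approximated as the coefficient of variation of sentence lengths, the tokenizer handles only English words, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase English word tokens; a real detector would use a model tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_perplexity(text, reference_counts):
    """Perplexity of `text` under an add-one-smoothed unigram model.

    Assumption: word frequencies from a reference text stand in for a
    language model. Lower values mean more predictable text.
    """
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no tokens")
    total = sum(reference_counts.values())
    vocab_size = len(reference_counts) + 1  # one extra slot for unseen words
    log_prob = sum(
        math.log((reference_counts[tok] + 1) / (total + vocab_size))
        for tok in tokens
    )
    return math.exp(-log_prob / len(tokens))


def burstiness(text):
    """Coefficient of variation of sentence lengths, counted in words.

    Assumption: one simple proxy for rhythm. 0.0 means every sentence
    has the same length.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(tokenize(s)) for s in sentences]
    if not lengths or mean(lengths) == 0:
        raise ValueError("text contains no countable sentences")
    return pstdev(lengths) / mean(lengths)


# Made-up reference text and samples, for illustration only.
reference = Counter(tokenize("the cat sat on the mat"))
print(unigram_perplexity("the cat sat on the mat", reference))
print(unigram_perplexity("zebras juggle quantum marmalade", reference))
print(burstiness("One two three. Four five six. Seven eight nine."))
print(burstiness("No. This sentence is considerably longer than the first one. Short again."))
```

Even in this toy version the weakness is visible: a human who writes plainly, in even sentences, scores just as "low" as a machine, and any threshold on these numbers is a probability cut-off, not evidence of authorship.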

Toward a Better Approach

Given these limitations, educators are calling for a more nuanced framework. Rather than drawing a hard “AI rate” red line, such as the 15% threshold currently common at many institutions, Cai advocates a transparent, traceable system in which students disclose their AI usage. The judgment model should be “human-machine co-adjudication”: human expert review as the primary mechanism, with AI detection playing a supporting role.

For now, students caught between unreliable detectors and high-stakes graduation requirements are left navigating a system that can punish them for the very act of trying to comply. The paradox of using AI to sound less like AI is likely to persist until the underlying detection technology catches up — if it ever can.