Fudan University Tried Something Different for Its AI Exam — Students Had to Make the AI Score Zero

For this semester’s “Data Mining Techniques” final exam at Fudan University, nobody sat at a desk filling out an answer sheet. Instead, every student became the examiner, with the job of writing questions specifically designed to make AI models fail.

The premise was simple: each student submitted 10 original calculation problems in data mining, each with a single correct answer and a complete derivation. Then those questions were fed to three AI models at different capability levels. The more questions an AI got wrong, and the stronger the model it stumped, the higher the student’s score.

Professor Xiao Yanghua, who teaches the course at Fudan’s School of Computing and Intelligent Innovation, said the old exam format stopped making sense once AI could outperform students in timed tests.

“If a teacher writes a standard algorithmic problem, AI solves it faster and more accurately than any student,” Xiao said. “Continuing to test that way means competing on AI’s home turf. It’s pointless.”

So he flipped the format. The final assignment became: write 10 computational problems that require real understanding, then run them against three AI models with different difficulty tiers. DeepSeek V4-Flash was the easiest target — getting a question wrong earned +1.5 points. MiniMax M2.7 was harder, worth +2 per miss. Claude Sonnet 4.6 was the boss level at +3 points per wrong answer. The base score was 60 points for submitting 10 valid questions, with a hard cap at 100.
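The arithmetic behind that scheme can be sketched in a few lines of Python. This is only an illustration of the rule as reported here, not the course's actual grading script: it assumes the per-miss bonuses from all three models simply add up, that the 100-point cap applies to the total, and that every submitted question is valid. The function name `exam_score` and the constant names are hypothetical.

```python
# Per-miss bonuses as reported in the article. The names below are
# illustrative and not taken from any official grading tool.
BONUS_PER_MISS = {
    "DeepSeek V4-Flash": 1.5,
    "MiniMax M2.7": 2.0,
    "Claude Sonnet 4.6": 3.0,
}
BASE_SCORE = 60.0      # awarded for submitting 10 valid questions
MAX_SCORE = 100.0      # hard cap
NUM_QUESTIONS = 10


def exam_score(misses):
    """Return a student's score.

    `misses` maps a model name to how many of the student's 10 questions
    that model answered incorrectly. Assumption: bonuses are additive
    across models and the cap is applied once, to the total.
    """
    bonus = 0.0
    for model, wrong in misses.items():
        if not 0 <= wrong <= NUM_QUESTIONS:
            raise ValueError(f"{model}: miss count must be between 0 and 10")
        bonus += BONUS_PER_MISS[model] * wrong
    return min(MAX_SCORE, BASE_SCORE + bonus)


# A student whose questions beat DeepSeek 4 times, MiniMax 3 times,
# and Claude twice: 60 + 6 + 6 + 6 = 78.
example = exam_score({
    "DeepSeek V4-Flash": 4,
    "MiniMax M2.7": 3,
    "Claude Sonnet 4.6": 2,
})
print(example)  # 78.0
```

Under these assumptions, stumping every model on every question would be worth 60 + 15 + 20 + 30 = 125 points before the cap, so a student could reach 100 without a perfect sweep.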

The results show where AI stands today. Of the 51 final submissions, 50 students managed to stump at least one AI model on at least one question. Only one student failed to trip up any model at all. Getting a model to score zero across the entire 10-question set was rare: only four students pulled it off, and none of them did it against Claude Sonnet 4.6, the strongest of the three models. The class average landed at 85.7, with a median of 88.

Xiao said the point wasn’t to humiliate AI, but to force students to think differently. “The core idea is this: I want students to believe that if you truly understand the material deeply enough, you can find AI’s blind spots,” he said. “That’s not luck. That’s skill.”

It’s a smart response to a real problem. As AI models get embedded into every corner of education and work, the ability to probe the boundaries of what these systems can and can’t do becomes its own kind of literacy. Fudan’s experiment suggests that the best way to measure understanding in an AI age may not be to ask students what they know, but to ask them to find what AI doesn’t.