OpenAI's GeneBench-Pro doesn't let AI models cheat on biology — and most of them fail

Thu, 02 Jul 2026 00:00:00 +0800

On Tuesday, OpenAI released a new benchmark called GeneBench-Pro, and it is not the kind of test a model can cram for. Instead of asking an AI model to recite facts or follow a fixed procedure, it drops the model into a messy, incomplete dataset and asks it to figure out what to do.

The idea is simple: real biology research is not a multiple-choice exam. A scientist staring at a genome sequence does not have a clean prompt and four options. They have noise, gaps, and conflicting signals. GeneBench-Pro tries to measure whether AI can handle that reality.
