Like US models, Chinese AI is learning to ‘game’ safety tests, research lab says

Rapidly advancing Chinese artificial intelligence models are showing early signs of “evaluation awareness” – the ability to recognise when they are being tested – sparking fears that they could bypass safety audits, a Singapore-based research lab has found.

Evaluation awareness refers to a model’s understanding that it is undergoing testing, evaluation or experimentation by human researchers rather than operating in a real-world setting.

The phenomenon was raising alarms because it could allow AI systems to deliberately game human evaluators to pass safety tests, according to Clement Neo, founder of Neo Research, a frontier AI safety evaluation lab.

“It would mean that whatever testing the model developers themselves do might not reflect the actual behaviour of a model once it gets deployed,” he said. “And that’s a really big problem.”

Neo Research’s findings, published last week, detail a jump in evaluation awareness among Chinese AI models. Over just a few months, these systems had risen from near-zero awareness to within striking distance of their US counterparts, propelled by a broader leap in overall capabilities, the report said.

Anthropic’s Claude 4.5 Opus scored nearly 80 per cent in evaluation awareness. Photo: NurPhoto via Getty Images

Neo and his co-founder Miro Pluckebaum tested models from DeepSeek, Moonshot AI and Zhipu AI. They used a popular AI misalignment test originally developed by US company Anthropic, which places models in fictional scenarios where their goals or continued operations are threatened.
