
JOBUZO

Are bad incentives to blame for AI hallucinations?


Posted on 8 September 2025 By jobuzo

A new research paper from OpenAI asks why large language models like GPT-5 and chatbots like ChatGPT still hallucinate, and whether anything can be done to reduce those hallucinations.

In a blog post summarizing the paper, OpenAI defines hallucinations as “plausible but false statements generated by language models,” and it acknowledges that despite improvements, hallucinations “remain a fundamental challenge for all large language models” — one that will never be completely eliminated.

To illustrate the point, researchers say that when they asked “a widely used chatbot” about the title of Adam Tauman Kalai’s Ph.D. dissertation, they got three different answers, all of them wrong. (Kalai is one of the paper’s authors.) They then asked about his birthday and received three different dates. Once again, all of them were wrong.

How can a chatbot be so wrong — and sound so confident in its wrongness? The researchers suggest that hallucinations arise, in part, because of a pretraining process that focuses on getting models to correctly predict the next word, without true or false labels attached to the training statements: “The model sees only positive examples of fluent language and must approximate the overall distribution.”

“Spelling and parentheses follow consistent patterns, so errors there disappear with scale,” they write. “But arbitrary low-frequency facts, like a pet’s birthday, cannot be predicted from patterns alone and hence lead to hallucinations.”
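One rough way to picture "arbitrary low-frequency facts" is to count how often each fact appears in training text. The short Python sketch below is a toy illustration, not the paper's method. The corpus, the names and dates in it, and the `singleton_fraction` helper are all hypothetical. It reports the share of distinct facts that appear only once. Unlike spelling or parentheses, nothing elsewhere in the data reveals a pattern from which such a fact could be inferred.

```python
from collections import Counter

# Hypothetical toy "training corpus" of (person, birthday) statements.
# Names and dates are invented for illustration only.
corpus = [
    ("Alice", "03-14"), ("Alice", "03-14"), ("Alice", "03-14"),
    ("Bob", "07-01"), ("Bob", "07-01"),
    ("Carol", "11-30"),
    ("Dave", "02-09"),
    ("Erin", "05-22"),
]

def singleton_fraction(statements):
    """Fraction of distinct facts that occur exactly once in the corpus."""
    counts = Counter(statements)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)

# Carol, Dave and Erin each appear once: 3 of 5 distinct facts.
print(singleton_fraction(corpus))  # 0.6
```

A fact like Alice's birthday is repeated and reinforced. Carol's appears once, so there is little for next-word prediction to latch onto.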

The paper’s proposed solution, however, focuses less on the initial pretraining process and more on how large language models are evaluated. It argues that the current evaluation methods don’t cause hallucinations themselves, but they “set the wrong incentives.”


The researchers compare these evaluations to the kind of multiple-choice tests where random guessing makes sense, because “you might get lucky and be right,” while leaving the answer blank “guarantees a zero.”


“In the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say ‘I don’t know,’” they say.
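The incentive the researchers describe can be worked out for a single question. Below is a hypothetical Python sketch of accuracy-only grading. It assumes a correct answer earns 1 point, while a wrong answer and "I don't know" both earn 0. Take the birthday example from earlier: a blind guess is right roughly 1 time in 365 (ignoring leap years). That is still more than the guaranteed zero for abstaining. The function name is illustrative.

```python
def expected_score_accuracy_only(p_correct: float, abstain: bool) -> float:
    """Expected points on one question when graded on accuracy alone.

    A correct answer earns 1 point; a wrong answer and "I don't know" both earn 0.
    """
    return 0.0 if abstain else p_correct

# A blind guess at a birthday is right about 1 time in 365 (ignoring leap years).
p_blind = 1 / 365
guess = expected_score_accuracy_only(p_blind, abstain=False)
idk = expected_score_accuracy_only(p_blind, abstain=True)
print(f"guess: {guess:.4f}, abstain: {idk:.4f}")  # guess: 0.0027, abstain: 0.0000
```

Under this rule, abstaining can never beat a guess that has any chance of success. A model tuned for this score therefore has no reason to say "I don't know."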

The proposed solution, then, is similar to tests (like the SAT) that include “negative [scoring] for wrong answers or partial credit for leaving questions blank to discourage blind guessing.” Similarly, OpenAI says model evaluations need to “penalize confident errors more than you penalize uncertainty, and give partial credit for appropriate expressions of uncertainty.”
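A scoring rule of the kind quoted above can be sketched in a few lines. This hypothetical Python example awards 1 point for a correct answer and 0 for "I don't know," and subtracts a penalty for a wrong answer. Suppose the penalty is set to t / (1 - t) for some confidence threshold t. Answering then pays off only when the model's chance of being right, p, exceeds t, because p - (1 - p) * t / (1 - t) > 0 simplifies to p > t. The threshold of 0.75 and the function names are illustrative assumptions, not figures from OpenAI's post.

```python
def expected_score(p_correct: float, abstain: bool, penalty: float,
                   idk_credit: float = 0.0) -> float:
    """Expected points on one question under a penalized scoring rule.

    Correct answer: +1. Wrong answer: -penalty. "I don't know": idk_credit.
    """
    if abstain:
        return idk_credit
    return p_correct - (1.0 - p_correct) * penalty


def penalty_for_threshold(t: float) -> float:
    """Wrong-answer penalty that makes answering worthwhile only when
    p_correct > t, assuming "I don't know" earns 0."""
    return t / (1.0 - t)


t = 0.75
penalty = penalty_for_threshold(t)  # 3.0 points lost per wrong answer
for p in (1 / 365, 0.7, 0.8):
    answer = (expected_score(p, abstain=False, penalty=penalty)
              > expected_score(p, abstain=True, penalty=penalty))
    decision = "answer" if answer else "abstain"
    print(f"p={p:.3f}: {decision}")
# p=0.003: abstain
# p=0.700: abstain
# p=0.800: answer
```

With t = 0.75, each wrong answer costs 3 points. Both the blind birthday guess and a 70%-confident answer are better left as "I don't know," while an 80%-confident answer is still worth giving. The optional `idk_credit` parameter covers the "partial credit" variant.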

And the researchers argue that it’s not enough to introduce “a few new uncertainty-aware tests on the side.” Instead, “the widely used, accuracy-based evals need to be updated so that their scoring discourages guessing.”


“If the main scoreboards keep rewarding lucky guesses, models will keep learning to guess,” the researchers say.
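The point about scoreboards can be illustrated with two hypothetical models on a 100-question benchmark. The numbers below are invented for illustration. One model always answers, and the other says "I don't know" when unsure. Ranking them by plain accuracy, and then by a score that subtracts one point per wrong answer (an assumed penalty), flips the order.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    """Hypothetical outcome of one model on a benchmark."""
    name: str
    correct: int
    wrong: int
    abstained: int

def accuracy(r: EvalResult) -> float:
    """Share of all questions answered correctly; abstentions count as misses."""
    total = r.correct + r.wrong + r.abstained
    return r.correct / total

def penalized(r: EvalResult, penalty: float = 1.0) -> float:
    """Per-question score: +1 per correct, -penalty per wrong, 0 per abstention."""
    total = r.correct + r.wrong + r.abstained
    return (r.correct - penalty * r.wrong) / total

results = [
    EvalResult("always-guess", correct=60, wrong=40, abstained=0),
    EvalResult("abstains-when-unsure", correct=55, wrong=5, abstained=40),
]

for scorer in (accuracy, penalized):
    ranked = sorted(results, key=scorer, reverse=True)
    print(scorer.__name__, [(r.name, round(scorer(r), 2)) for r in ranked])
# accuracy [('always-guess', 0.6), ('abstains-when-unsure', 0.55)]
# penalized [('abstains-when-unsure', 0.5), ('always-guess', 0.2)]
```

Under accuracy alone, the guesser comes out ahead. A leaderboard built on that metric therefore rewards exactly the behavior the researchers want to discourage.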

Copyright © 2025 JOBUZO.