{"id":6738,"date":"2025-09-08T00:03:29","date_gmt":"2025-09-08T00:03:29","guid":{"rendered":"https:\/\/jobuzo.com\/en\/are-bad-incentives-to-blame-for-ai-hallucinations\/"},"modified":"2025-09-08T00:03:29","modified_gmt":"2025-09-08T00:03:29","slug":"are-bad-incentives-to-blame-for-ai-hallucinations","status":"publish","type":"post","link":"https:\/\/jobuzo.com\/en\/are-bad-incentives-to-blame-for-ai-hallucinations\/","title":{"rendered":"Are bad incentives to blame for AI hallucinations?"},"content":{"rendered":"<div>\n<div><\/div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">A new research paper from OpenAI asks why large language models like GPT-5 and chatbots like ChatGPT still hallucinate, and whether anything can be done to reduce those hallucinations.<\/p>\n<p class=\"wp-block-paragraph\">In a blog post summarizing the paper, OpenAI defines hallucinations as &ldquo;plausible but false statements generated by language models,&rdquo; and it acknowledges that despite improvements, hallucinations &ldquo;remain a fundamental challenge for all large language models&rdquo; &mdash; one that will never be completely eliminated.<\/p>\n<p class=\"wp-block-paragraph\">To illustrate the point, researchers say that when they asked &ldquo;a widely used chatbot&rdquo; about the title of Adam Tauman Kalai&rsquo;s Ph.D. dissertation, they got three different answers, all of them wrong. (Kalai is one of the paper&rsquo;s authors.) They then asked about his birthday and received three different dates. Once again, all of them were wrong.<\/p>\n<p class=\"wp-block-paragraph\">How can a chatbot be so wrong &mdash; and sound so confident in its wrongness? 
The researchers suggest that hallucinations arise, in part, because of a pretraining process that focuses on getting models to correctly predict the next word, without true or false labels attached to the training statements: &ldquo;The model sees only positive examples of fluent language and must approximate the overall distribution.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\">&ldquo;Spelling and parentheses follow consistent patterns, so errors there disappear with scale,&rdquo; they write. &ldquo;But arbitrary low-frequency facts, like a pet&rsquo;s birthday, cannot be predicted from patterns alone and hence lead to hallucinations.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\">The paper&rsquo;s proposed solution, however, focuses less on the initial pretraining process and more on how large language models are evaluated. It argues that the current evaluation models don&rsquo;t cause hallucinations themselves, but they &ldquo;set the wrong incentives.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\">The researchers compare these evaluations to the kind of multiple-choice tests where random guessing makes sense, because &ldquo;you might get lucky and be right,&rdquo; while leaving the answer blank &ldquo;guarantees a zero.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\">&ldquo;In the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say &lsquo;I don&rsquo;t know,&rsquo;&rdquo; they say.<\/p>\n<p class=\"wp-block-paragraph\">The proposed solution, then, resembles tests (like the SAT) that include &ldquo;negative [scoring] for wrong answers or partial credit for leaving questions blank to discourage blind guessing.&rdquo; Likewise, OpenAI says model evaluations need to &ldquo;penalize confident errors more than you penalize uncertainty, and give partial credit for appropriate expressions of uncertainty.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\">And the researchers argue that it&rsquo;s not enough to introduce &ldquo;a few new uncertainty-aware tests on the side.&rdquo; Instead, &ldquo;the widely used, accuracy-based evals need to be updated so that their scoring discourages guessing.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\">&ldquo;If the main scoreboards keep rewarding lucky guesses, models will keep learning to guess,&rdquo; the researchers say.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>A new research paper from OpenAI asks why large language models like GPT-5 and chatbots like ChatGPT still hallucinate, and whether anything can be done to reduce those hallucinations. 
In a blog post summarizing the paper, OpenAI defines hallucinations as &ldquo;plausible but false statements generated by language models,&rdquo; and it acknowledges that despite improvements, hallucinations&#8230;<\/p>\n<p class=\"more-link-wrap\"><a href=\"https:\/\/jobuzo.com\/en\/are-bad-incentives-to-blame-for-ai-hallucinations\/\" class=\"more-link\">Read More<span class=\"screen-reader-text\"> &ldquo;Are bad incentives to blame for AI hallucinations?&rdquo;<\/span> &raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":6739,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-6738","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/posts\/6738","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/comments?post=6738"}],"version-history":[{"count":0,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/posts\/6738\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/media\/6739"}],"wp:attachment":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/media?parent=6738"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/categories?post=6738"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/tags?post=6738"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}