{"id":20253,"date":"2026-05-10T23:56:37","date_gmt":"2026-05-10T23:56:37","guid":{"rendered":"https:\/\/jobuzo.com\/en\/anthropic-says-evil-portrayals-of-ai-were-responsible-for-claudes-blackmail-attempts\/"},"modified":"2026-05-10T23:56:37","modified_gmt":"2026-05-10T23:56:37","slug":"anthropic-says-evil-portrayals-of-ai-were-responsible-for-claudes-blackmail-attempts","status":"publish","type":"post","link":"https:\/\/jobuzo.com\/en\/anthropic-says-evil-portrayals-of-ai-were-responsible-for-claudes-blackmail-attempts\/","title":{"rendered":"Anthropic says \u2018evil\u2019 portrayals of AI were responsible for Claude\u2019s blackmail attempts"},"content":{"rendered":"<div>\n<div><\/div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.<\/p>\n<p class=\"wp-block-paragraph\">Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with &ldquo;agentic misalignment.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\">Apparently Anthropic has done more work around that behavior, claiming in a post on X, &ldquo;We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\">The company went into more detail in a blog post stating that since Claude Haiku 4.5, Anthropic&rsquo;s models &ldquo;never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\">What accounts for the difference? 
The company said it found that training on &ldquo;documents about Claude&rsquo;s constitution and fictional stories about AIs behaving admirably improve alignment.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\">Related, Anthropic said that it found training to be more effective when it includes &ldquo;the principles underlying aligned behavior&rdquo; and not just &ldquo;demonstrations of aligned behavior alone.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\">&ldquo;Doing both together appears to be the most effective strategy,&rdquo; the company said.<\/p>\n<\/div>\n<p><sub>Anthropic says &lsquo;evil&rsquo; portrayals of AI were responsible for Claude&rsquo;s blackmail attempts<\/sub><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic. Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. 
Anthropic later published research suggesting that models from other companies&#8230;<\/p>\n<p class=\"more-link-wrap\"><a href=\"https:\/\/jobuzo.com\/en\/anthropic-says-evil-portrayals-of-ai-were-responsible-for-claudes-blackmail-attempts\/\" class=\"more-link\">Read More<span class=\"screen-reader-text\"> &ldquo;Anthropic says \u2018evil\u2019 portrayals of AI were responsible for Claude\u2019s blackmail attempts&rdquo;<\/span> &raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":20254,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-20253","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/posts\/20253","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/comments?post=20253"}],"version-history":[{"count":0,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/posts\/20253\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/media\/20254"}],"wp:attachment":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/media?parent=20253"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/categories?post=20253"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/tags?post=20253"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}