{"id":19941,"date":"2026-05-03T23:55:32","date_gmt":"2026-05-03T23:55:32","guid":{"rendered":"https:\/\/jobuzo.com\/en\/in-harvard-study-ai-offered-more-accurate-emergency-room-diagnoses-than-two-human-doctors\/"},"modified":"2026-05-03T23:55:32","modified_gmt":"2026-05-03T23:55:32","slug":"in-harvard-study-ai-offered-more-accurate-emergency-room-diagnoses-than-two-human-doctors","status":"publish","type":"post","link":"https:\/\/jobuzo.com\/en\/in-harvard-study-ai-offered-more-accurate-emergency-room-diagnoses-than-two-human-doctors\/","title":{"rendered":"In Harvard study, AI offered more accurate emergency room diagnoses than two human doctors"},"content":{"rendered":"<div>\n<div><\/div>\n<div readability=\"120.0174286436\">\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">A new study examines how large language models perform in a variety of medical contexts, including real emergency room cases &mdash; where at least one model seemed to be more accurate than human doctors.<\/p>\n<p class=\"wp-block-paragraph\">The study was published this week in Science and comes from a research team led by physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. The researchers said they conducted a variety of experiments to measure how OpenAI&rsquo;s models compared to human physicians.<\/p>\n<p class=\"wp-block-paragraph\">In one experiment, researchers focused on 76 patients who came into the Beth Israel emergency room, comparing the diagnoses offered by two internal medicine attending physicians to those generated by OpenAI&rsquo;s o1 and 4o models. 
These diagnoses were assessed by two other attending physicians, who did not know which ones came from humans and which came from AI.<\/p>\n<p class=\"wp-block-paragraph\">&ldquo;At each diagnostic touchpoint, o1 either performed nominally better than or on par with the two attending physicians and 4o,&rdquo; the study said, adding that the differences &ldquo;were especially pronounced at the first diagnostic touchpoint (initial ER triage), where there is the least information available about the patient and the most urgency to make the correct decision.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\">In Harvard Medical School&rsquo;s press release about the study, the researchers emphasized that they did not &ldquo;pre-process the data at all&rdquo; &mdash; the AI models were presented with the same information that was available in the electronic medical records at the time of each diagnosis.<\/p>\n<p class=\"wp-block-paragraph\">With that information, the o1 model managed to offer &ldquo;the exact or very close diagnosis&rdquo; in 67% of triage cases, compared to one physician who had the exact or close diagnosis 55% of the time, and to the other who hit the mark 50% of the time.<\/p>\n<p class=\"wp-block-paragraph\">&ldquo;We tested the AI model against virtually every benchmark, and it eclipsed both prior models and our physician baselines,&rdquo; said Arjun Manrai, who heads an AI lab at Harvard Medical School and is one of the study&rsquo;s lead authors, in the press release.<\/p>\n<p class=\"wp-block-paragraph\">To be clear, the study didn&rsquo;t claim that AI is ready to make real life-or-death decisions in the emergency room. Instead, it said the findings show an &ldquo;urgent need for prospective trials to evaluate these technologies in real-world patient care settings.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\">The researchers also noted that they only studied how models performed when provided with text-based information, and that &ldquo;existing studies suggest that current foundation models are more limited in reasoning over nontext inputs.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\">Adam Rodman, a Beth Israel doctor who&rsquo;s also one of the study&rsquo;s lead authors, warned the Guardian that there&rsquo;s &ldquo;no formal framework right now for accountability&rdquo; around AI diagnoses, and that patients still &ldquo;want humans to guide them through life or death decisions [and] to guide them through challenging treatment decisions.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\">In a post about the study, Kristen Panthagani, an emergency physician, said this is &ldquo;an interesting AI study that has led to some very overhyped headlines,&rdquo; especially since it was comparing AI 
diagnoses to those from internal medicine physicians, not ER physicians.<\/p>\n<p class=\"wp-block-paragraph\">&ldquo;If we&rsquo;re going to compare AI tools to physicians&rsquo; clinical ability, we should start by comparing to physicians who actually practice that specialty,&rdquo; Panthagani said. &ldquo;I would not be surprised if a LLM could beat a dermatologist at an neurosurgery board exam, [but] that&rsquo;s not a particularly helpful thing to know.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\">She also argued, &ldquo;As an ER doctor seeing a patient for a first time, my primary goal is <em>not<\/em> to guess your ultimate diagnosis. My primary goal is to determine if you have a condition that could kill you.&rdquo;<\/p>\n<p class=\"wp-block-paragraph\"><em>This post and headline have been updated to reflect the fact that the diagnoses in the study came from internal medicine attending physicians, and to include commentary from Kristen Panthagani.<\/em><\/p>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>A new study examines how large language models perform in a variety of medical contexts, including real emergency room cases &mdash; where at least one model seemed to be more accurate than human doctors. 
The study was published this week in Science and comes from a research team led by physicians and computer scientists at&#8230;<\/p>\n<p class=\"more-link-wrap\"><a href=\"https:\/\/jobuzo.com\/en\/in-harvard-study-ai-offered-more-accurate-emergency-room-diagnoses-than-two-human-doctors\/\" class=\"more-link\">Read More<span class=\"screen-reader-text\"> &ldquo;In Harvard study, AI offered more accurate emergency room diagnoses than two human doctors&rdquo;<\/span> &raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":19942,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-19941","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/posts\/19941","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/comments?post=19941"}],"version-history":[{"count":0,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/posts\/19941\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/media\/19942"}],"wp:attachment":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/media?parent=19941"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/categories?post=19941"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/tags?post=19941"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}