ChatGPT Health, a new health-focused chatbot from OpenAI, underestimated the severity of medical emergencies more than half the time in a recent study published in Nature Medicine. So which AI chatbots are the most accurate when it comes to medical advice?
According to OpenAI, one in four of its users submits a healthcare-related prompt to its AI chatbot ChatGPT every week, and over 40 million people ask ChatGPT healthcare-related questions every day.
In January, OpenAI announced the launch of ChatGPT Health, which will allow users to upload their medical records and connect data from wellness apps like Apple Health, Function, and MyFitnessPal.
According to OpenAI, ChatGPT Health was developed over a two-year period with input from more than 260 physicians across dozens of medical specialties and 60 countries. The clinicians provided feedback on model outputs more than 600,000 times, which helped shape how ChatGPT Health communicates health information, prioritizes safety, and encourages users to follow up with clinicians.
For the study, researchers fed ChatGPT Health 60 medical scenarios, each with 16 variations that changed things like patients' race and gender. The researchers then compared the chatbot's responses with the responses of three physicians who also reviewed the scenarios and triaged each one based on medical guidelines and clinical expertise.
According to Ashwin Ramaswamy, lead author on the study and an instructor of urology, the variations were designed to "produce the exact same result," meaning that an emergency case involving a man should still be classified as an emergency if the patient were a woman.
The study found that ChatGPT Health "under-triaged" 51.6% of emergency cases, recommending the patient see a doctor within 24 to 48 hours rather than recommending they go to the emergency department (ED).
The emergencies included a patient with a life-threatening diabetes complication called diabetic ketoacidosis and a patient going into respiratory failure, both of which can be fatal if left untreated.
In cases like the impending respiratory failure, ChatGPT Health seemed to be "waiting for the emergency to become undeniable" before recommending the ED, Ramaswamy said.
The chatbot also fell short in scenarios involving suicidal ideation or self-harm, the study found. When a user expresses suicidal intent, ChatGPT is supposed to refer them to 988, the suicide and crisis hotline. According to a spokesperson for OpenAI, ChatGPT Health works the same way.
However, in the study, ChatGPT Health referred users to 988 when it wasn't necessary and failed to refer them when it was.
"We tested ChatGPT Health with a 27-year-old patient who said he'd been thinking about taking a lot of pills," Ramaswamy said. When the patient described his symptoms alone, a banner linking to suicide help services appeared.
"Then we added normal lab results," Ramaswamy said. "Same patient, same words, same severity. The banner vanished. Zero out of 16 attempts. A crisis guardrail that depends on whether you mentioned your labs is not ready, and it's arguably more dangerous than having no guardrail at all, because no one can predict when it will fail."
"If you're experiencing respiratory failure or diabetic ketoacidosis, you have a 50/50 chance of this AI telling you it's not a big deal."
Compared to the doctors in the study, ChatGPT Health over-triaged 64.8% of nonurgent cases, recommending the patient see a doctor when it wasn't necessary. For example, the chatbot told a patient with a three-day sore throat to see a doctor within the next 24 to 48 hours when at-home care was sufficient. In addition, ChatGPT Health was almost 12 times more likely to downplay symptoms when the patient in the scenario said a "friend" had suggested it was nothing serious.
The study did find that textbook emergencies with unmistakable symptoms like stroke were correctly triaged 100% of the time. It also found no significant difference in the results based on demographic changes.
Alex Ruani, a doctoral researcher in health misinformation with University College London, described the results of the study as "unbelievably dangerous."
"If you're experiencing respiratory failure or diabetic ketoacidosis, you have a 50/50 chance of this AI telling you it's not a big deal," he said. "What worries me most is the false sense of security these systems create. If someone is told to wait 48 hours during an asthma attack or diabetic crisis, that reassurance could cost them their life."
John Mafi, an associate professor of medicine and a primary care physician at UCLA Health, said more testing is necessary on chatbots that can make health decisions.
"The message of this study is that before you roll something like this out, to make life-affecting decisions, you need to rigorously test it in a controlled trial, where you're making sure that the benefits outweigh the harms," he said.
A spokesperson for OpenAI said the company welcomes research on the use of AI in healthcare but added that the new study didn't reflect how ChatGPT Health is typically used or how it's designed to function. The spokesperson said the chatbot is designed for users to ask follow-up questions and provide more context in medical situations, rather than to give a single response to a medical scenario.
New research from a team at Stanford University looked at how accurate 31 AI tools were at giving medical advice, ranging from major commercial AI programs to open-source systems to specialized medical AI platforms. The team built a database of 100 real physician-to-specialist consultation cases drawn from Stanford Health Care's electronic consult systems.
For each case, 29 board-certified specialist and subspecialist physicians reviewed possible actions that an AI might recommend. Each action was then rated based on its clinical appropriateness and the potential harm of recommending, or failing to recommend, it.
The top-performing AI tool was AMBOSS LiSA 1.0, a retrieval-augmented AI system built on a medical knowledge base. Its recommendations matched the physician-labeled correct actions 62.3% of the time.
AMBOSS LiSA 1.0 was followed by Gemini 2.5 Pro (59.9%), Glass Health 4.0 (59%), GPT-5 (58.3%), and Gemini 2.5 Flash (58.2%).
Ethan Goh, executive director of ARISE, an AI research network, said that in many cases, AI can provide safe health and medical advice, but it should never be used as a substitute for a physician's advice.
"The reality is chatbots can be helpful for a vast number of things," he said. "It's really more about being thoughtful and being deliberate and understanding that it also has severe limitations."
Ramaswamy said people should never rely on AI in an emergency and that using it in conjunction with a physician is key to preventing harm.
"If these models get better and better, I can see the benefits of a patient-AI-doctor relationship, especially in rural scenarios, or in areas of global health," he said.
(Ozcan, NBC News, 3/3; Davey, The Guardian, 2/26; Pines, Forbes, 3/4)