As healthcare grows costlier, more people are using AI chatbots for health advice. However, new research suggests that these chatbots can vary significantly in accuracy, leading health experts to say that "AI just isn't ready to take on the role of the physician."
According to data from OpenAI, more than 40 million people ask the company's AI chatbot ChatGPT health-related questions every day. In the United States, three in five adults reported having used AI tools for their health or healthcare in the past three months.
As more people rely on AI chatbots for health advice, there is growing concern about AI's accuracy and how its use could impact patients' health.
In a new study published in Nature Medicine, researchers tested how well AI chatbots could help participants identify underlying health conditions and choose an appropriate course of action in 10 medical scenarios. The researchers randomly assigned 1,298 participants to receive assistance either from an AI chatbot or from a source of their choice (control group). The AI chatbots evaluated in the study were OpenAI's ChatGPT-4o, Meta's Llama 3, and Cohere's Command R+.
When tested on their own, the chatbots identified the correct conditions in 94.9% of cases and chose the correct course of action in an average of 56.3% of cases. However, when used by participants in the study, they identified the correct conditions in less than 34.5% of cases and the right course of action in less than 44.2% of cases.
Overall, the researchers found that participants using AI chatbots performed no better than participants in the control group, who were told they could use any research method and mostly relied on online searches.
"Despite all the hype, AI just isn't ready to take on the role of the physician."
After analyzing around 30 of the participants' AI interactions in detail, the researchers determined that user errors contributed to mistakes around half of the time. In these cases, participants either didn't enter enough information or didn't provide the most relevant symptoms.
However, even when the researchers typed in medical scenarios directly, the chatbots had difficulty determining whether symptoms required immediate medical attention or just nonurgent care. There were also several occasions when the chatbots fabricated information, such as the phone numbers of emergency hotlines, or changed their answers significantly based on slight variations in a question.
For example, a participant saying they had a "terrible headache," along with a stiff neck, led to the chatbot treating the problem as a minor health issue. However, when a participant said they had "the worst headache ever" and a stiff neck, the chatbot recommended they go to the ED or call emergency services.
"Very, very small words make very big differences," said Andrew Bean, a graduate student at the University of Oxford and the study's lead author.
According to the researchers, the study's findings show that none of the AI chatbots evaluated were "ready for deployment in direct patient care."
"[D]espite all the hype, AI just isn't ready to take on the role of the physician," said Rebecca Payne, a physician and one of the study's authors from the University of Oxford. "Patients need to be aware that asking a large language model about their symptoms can be dangerous, giving wrong diagnoses and failing to recognize when urgent help is needed."
Adam Mahdi, a professor at the Oxford Internet Institute and the study's senior author, said the study highlighted the "huge gap" between AI's potential and the pitfalls when it's used by humans. "The knowledge may be in those bots; however, this knowledge doesn't always translate when interacting with humans," he said.
According to Robert Wachter, chair of the department of medicine at the University of California, San Francisco, "[t]here's a lot of cognitive magic and experience that goes into figuring out what elements of the case are important that you feed into the bot." While doctors are trained to recognize which details are relevant and which can be ignored, the general public is not, which makes it more difficult for them to use AI chatbots for health advice effectively.
However, instead of relying on people to craft the perfect question for chatbots, Bean said chatbots should be programmed to ask follow-up questions, similar to the way doctors gather information from patients when making a diagnosis.
"Is it really the user's responsibility to know which symptoms to highlight, or is it partly the model's responsibility to know what to ask?" Bean asked.
Currently, the researchers are planning to conduct similar studies in different countries and languages, and they want to see whether AI's performance improves over time. "We hope this work will contribute to the development of safer and more useful AI systems," Bean said.
(Rosenbluth, New York Times, 2/9; Gudge, BBC, 2/9; Rigby, Reuters, 2/9; Bean et al., Nature Medicine, 2/9).
By Carol Chouinard and Ben Isenhour
The growing prevalence of AI chatbots in healthcare
Over the last few years, some patients have lost trust in the health system. Because of this, more people are turning to alternative sources for healthcare guidance, including AI chatbots like ChatGPT and their own online research.
As technology improves, patients are gaining access to more methods for gathering medical advice, which is changing the way they engage with the health system. According to several provider surveys, patients are now coming to their providers with more specific questions and expectations.
AI chatbots will continue to change how patients interact with their own healthcare, providing them with a more interactive and personalized experience. Although most AI chatbots rely on publicly available data, there are ongoing efforts to integrate patients' medical records and other personal information to better inform the chatbots' responses. For example, OpenAI recently released a new health-focused feature called ChatGPT Health, which allows users to upload their medical records and connect data from wellness apps.
At the same time, there have been growing policy efforts to give patients more access to and control over their health data, both in the United States and abroad. Currently, several technology companies, including Epic, Google, OpenAI, Anthropic, and Microsoft, are expanding their healthcare capabilities with the help of AI.
All of these changes theoretically allow consumers to gain more control over their health data and use it to improve their health and other aspects of their lives. Going forward, providers will need to address this shift by finding new ways or channels to influence healthcare behaviors and support care management.
The shortcomings of AI chatbots in healthcare
Despite widespread interest in AI chatbots and other tools, it's important not to overlook their risks. Many AI tools are prone to producing faulty or misleading information, which can negatively impact patients' health.
Organizations will need to determine how to minimize the risks associated with patients using third-party AI tools and help patients use them responsibly. The most successful organizations will likely be those that deliver a clinically governed digital front door: one grounded in validated content, tuned to local care pathways, and designed to route people safely to the right level of care (self-care vs. primary care vs. urgent care/ED), with clear criteria for escalating to clinicians.
The best use of AI in healthcare
While today's results highlight real limitations, they also reflect how early we are in the evolution of these tools. AI chatbots are improving rapidly and will likely look very different in the future. For organizations with the resources to do so, experimenting with multiple tools — and continuing to engage with them as they develop — may offer more value than depending on any one platform today.
From Optum's perspective, the greatest value of generative AI appears when it serves as a copilot. It should help members and patients ask better questions, understand their choices, and take appropriate next steps. It should also help clinicians reduce cognitive and administrative burden rather than serve as an autonomous clinical decision-maker.
*Advisory Board is a subsidiary of Optum. All Advisory Board research, expert perspectives, and recommendations remain independent.