Two artificial intelligence (AI) language models passed the U.S. Medical Licensing Exam (USMLE), which medical students often spend hundreds of hours studying for. But even with this development, experts say the technology is still unlikely to replace humans in medical care soon.
ChatGPT is a chatbot developed by OpenAI and an example of "generative AI," a class of programs that can produce text, images, audio, and video based on prompts. There is growing interest in its potential use in the healthcare industry.
For example, Ansible Health began to experiment with ChatGPT in its day-to-day tasks, finding that the program could help draft payment notices, simplify radiology reports, and come up with potential answers for "diagnostically challenging tasks."
"There was so much excitement in the tech world when ChatGPT came out, so we wanted to see if it was just hype or useful," said Jack Po, Ansible's CEO and a former product manager at Google.
To test ChatGPT's clinical reasoning abilities, researchers at Ansible had the program take the USMLE, which consists of three separate exams. Typically, Step 1 is taken by second-year medical students, Step 2 by fourth-year students, and Step 3 by physicians a year after graduation.
For the test, researchers gave ChatGPT questions from previous USMLE exams, including those that required open-ended responses. ChatGPT's answers were then independently scored by two physician adjudicators.
Overall, ChatGPT scored more than 50% across all three exams and even surpassed the typical USMLE passing threshold of 60% in most of the researchers' analyses.
"ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement," the researchers wrote. "These results suggest that large language models may have the potential to assist with medical education, and potentially, clinical decision-making."
In a separate study, Vivek Natarajan, an AI researcher, and his colleagues also tested how well Flan-PaLM, another large AI language model, performed on the USMLE. Unlike ChatGPT, Flan-PaLM was prepped for the exam using MultiMedQA, a collection of medical question-answering databases.
Flan-PaLM scored 67.6% on the USMLE questions, which was roughly 17 percentage points higher than the previous best performance by an AI program, PubMed GPT.
Large language models, like ChatGPT or Flan-PaLM, "present a significant opportunity to rethink the development of medical AI and make it easier, safer, and more equitable to use," wrote Natarajan and his colleagues.
Currently, AI programs are still in the early development stages, and they are unlikely to replace humans when it comes to diagnosing or caring for patients anytime soon. According to Axios, one significant hurdle to using AI for diagnoses is its potential to assert false results, which could be dangerous to patients if not checked by an actual provider.
"I think we're in the middle of a 20-year arc, kind of like what we already saw with finance," said Vijay Pande, a healthcare investor with Andreessen Horowitz and adjunct professor of bioengineering at Stanford University. "In 2000, it was insane to think that a computer could beat a master trader on Wall Street. Today, it's insane to think that master trader could beat a computer."
Over time, AI programs could advance far enough to be used for wellness checks and other general practitioner tasks. They could also eventually include other data like vocal tone, body language, and facial expressions to determine a patient's condition.
Although researchers have acknowledged the current limitations of AI, Natarajan and his colleagues wrote that they hoped their findings would "spark further conversations and collaborations between patients, consumers, AI researchers, clinicians, social scientists, ethicists, policymakers and other interested people in order to responsibly translate these early research findings to improve healthcare." (DePeau-Wilson, MedPage Today, 1/19; Primack, Axios, 1/18; Purtill, ABC Australia, 1/11)