IBM's Watson recommended 'unsafe and incorrect' treatments for cancer patients, investigation reveals

IBM's machine-learning system Watson for Oncology recommended "unsafe and incorrect" treatments for cancer patients, according to internal IBM documents obtained by STAT News.

About Watson for Oncology

Watson for Oncology is software that uses artificial intelligence (AI) algorithms to recommend cancer treatments for individual patients. IBM said the AI technology is trained to provide treatment recommendations for 13 cancers. According to IBM, Watson for Oncology's treatment recommendations are based on data from real patients. IBM said 230 hospitals around the world use Watson for Oncology.

However, in September 2017 STAT News published an investigation that highlighted potential problems with Watson for Oncology. STAT News reported that physicians around the world had complained that the technology frequently recommended cancer treatments that were not suitable for their patients. According to STAT News, the AI technology "was not living up [to] the company's expectations."

Internal documents show Watson for Oncology often recommended 'inaccurate' treatments

STAT News obtained slides from two internal presentations that Andrew Norden, an oncologist and IBM Watson Health's former deputy chief health officer, presented in June 2017 and July 2017. According to STAT News, the slides included criticisms of Watson for Oncology that IBM's Watson Health division had received from customers.

Specifically, the slides indicated that Watson for Oncology generated inaccurate treatment recommendations that were inconsistent with national treatment guidelines, including those from the National Comprehensive Cancer Network. The slides also stated that physicians at hospitals helping to promote the product had privately told IBM executives that Watson for Oncology was not useful for treating patients. For example, according to the slides, one physician at Jupiter Hospital in Florida had essentially told IBM executives that Watson for Oncology was worthless. The physician said Jupiter Hospital "bought [Watson for Oncology] for marketing … with hopes that [it] would achieve the vision," but the hospital "can't use it for most cases," STAT News reports.

In addition, the slides stated that IBM had designed studies to generate favorable findings on Watson for Oncology's effectiveness, and they identified several purported flaws in how researchers trained the AI technology. In particular, the slides stated that the training of Watson for Oncology relied on a small number of cancer cases, which had been "determined without statistical input."

STAT News reports that a July 27 document it obtained stated that the "inadequacy of the training cases" for Watson for Oncology "undermined" the technology's effectiveness. According to the document, one or two physicians had trained the system to recommend cancer treatments using cancer cases that were compiled by engineers rather than derived from real patients. "That meant that Watson's recommendations were driven by the doctors' own treatment preferences—not a machine learning analysis of actual patient cases," STAT News reports.

According to STAT News, these issues led Watson for Oncology to provide treatment recommendations that reflected an "unconventional interpretation of evidence." In one example cited in Norden's presentations, Watson for Oncology recommended that a 65-year-old man with lung cancer and evidence of severe bleeding receive a drug that carries a "black box" warning against administering it to patients who are experiencing serious bleeding.

Overall, STAT News reports that the documents showed Watson for Oncology's "often inaccurate" treatment recommendations raised "serious questions about the process for building content and the underlying technology."

Reaction

Norden declined to comment on the matter, STAT News reports.

IBM in a statement to STAT News defended its software, saying the company has "learned and improved Watson Health based on continuous feedback from clients, new scientific evidence, and new cancers and treatment alternatives." IBM said, "This includes 11 software releases for even better functionality during the past year, including national guidelines for cancers ranging from colon to liver cancer."

Memorial Sloan Kettering Cancer Center (MSK), which had helped to train Watson for Oncology's AI technology, said in a statement that it believes the instances of unsafe and inaccurate treatment recommendations cited in IBM's internal documents were part of IBM's system testing of Watson for Oncology and do not represent recommendations given to actual patients. MSK said, "This is an important distinction and underscores the importance of testing and the fact that the tool is intended to supplement—not replace—the clinical judgement of the treating physician."

MSK noted that it had originally used data from real patients to train Watson but that IBM later determined that synthetic cases representing cohorts of MSK patients would better train the technology. MSK said, "[T]he speed at which standards of care have changed require a more dynamic approach than historical data can provide because historical cases do not necessarily reflect the newest standards of care. Synthetic cases also allow for diverse treatment options to be included in Watson for Oncology's recommendations, rather than a more narrow focus of how individual patients were treated at MSK."

According to STAT News, IBM continues to claim it used data from real patients to train Watson for Oncology (Ross/Swetlitz, STAT News, 7/25 [subscription required]).

Advisory Board's take

Allyson Vicars

Greg Kuhnen, Senior Research Director, and Andrew Rebhan, Consultant, Health Care IT Advisor

While we can't speak directly to the details of IBM Watson's capabilities, this story touches on a challenge artificial intelligence (AI) has faced for decades: the over-marketing and premature deployment of solutions before they're ready for mainstream adoption.

AI and machine learning have made tremendous progress in the past few years. Through the judicious application of these technologies, many leading health systems are now achieving quantifiable front-line improvements in the quality and efficiency of care delivery. In our research we've profiled organizations that are using AI to:

  • Deliver better diagnostic quality and clinical outcomes;
  • Better target support to high-risk populations;
  • Enhance operational efficiency; and
  • Match job candidates to open positions more effectively.

Successful deployment of these powerful technologies demands maturity from both the vendors involved and the health system's governance processes. No decision support or prediction system is perfect; the key to responsible use is a solid understanding of the capabilities, weaknesses, and appropriate uses of the AI models. An AI system trained at one facility may not perform as expected on patients from another facility where documentation practices differ or where the population or care environment is substantially different.

In the scramble of AI vendors entering the market, concerns about the quality of answers, limitations of models, or expectations for ongoing management and supervision of these tools are often swept under the rug.

Organizations considering a foray into artificial intelligence technologies should ask several questions:

  • What is the specific outcome we are trying to improve? Do we measure and closely monitor that outcome today?
  • Does the vendor solution have a track record with a similar population in a similar environment?
  • Can the vendor demonstrate the solution's efficacy on our actual data under realistic conditions before we make a full commitment?
  • Who will manage the training, re-tuning, and ongoing monitoring of the models in production?
  • How will AI-provided insights be incorporated into existing workflows and applications? How do workflows need to change?
  • What are the risks and fallback processes if the technology produces misleading or incorrect answers?

Inevitably, the adoption of AI in health care will result in failures, even patient deaths, but the benefits will be enormous as we learn to take advantage of its novel, powerful capabilities. Some organizations are deliberately beginning their AI journey with applications in lower-risk administrative or operational areas to build experience and process maturity before tackling higher-risk clinical processes.

To learn more about how AI is changing health care, and how your organization can build a strong AI program, register today for our upcoming webconferences on August 15 and September 6.
