Prominent hospital rating systems are highly influential, but they also can generate contradictory results that do not always align with clinicians' own assessments, a group of authors write in NEJM Catalyst, highlighting the shortcomings of several major hospital rating systems.
The Catalyst article was authored by eight health care experts, including six individuals whom the authors describe as "physician scientists with methodological expertise in health care quality measurement." These six served as the group's hospital rating system "evaluators."
The authors note that in recent years, hospital rating systems have grown in both number and influence, but it remains "unclear whether current rating systems are meeting stakeholders' need." Namely, the authors write that different rating systems give conflicting ratings, and there's often a "disconnect" between the institutions that ratings recognize as leaders and those that clinicians recognize as major referral centers.
'Rating the raters'
To provide a clearer picture of hospital rating systems, the authors launched their "rating the raters" initiative. They focused on four major hospital rating systems:
- CMS' Hospital Compare Overall Star Ratings;
- Healthgrades Top Hospitals;
- Leapfrog Safety Grade and Top Hospitals; and
- U.S. News & World Report's Best Hospitals.
The authors established six major categories to assess each rating system, including:
- Iterative improvement;
- Potential for misclassification of hospital performance;
- Scientific acceptability; and
- Transparency.
For their project, the authors relied on both "objective and subjective criteria" to develop a "point-by-point analysis of the strengths and weaknesses of each rating system." The authors gave each rating system the opportunity to review the fact sheets and provide input as well as correct any errors.
Evaluators were also asked to assign each rating system a letter grade, ranging from A to F, based on the analysis, and those grades were averaged. Evaluators had in-person interviews with "leaders and/or methodologists from each of the rating systems" to clarify any issues and learn more about their systems.
The averaged grades were as follows:
- U.S. News received a B;
- CMS Star Ratings received a C;
- Leapfrog received a C-; and
- Healthgrades received a D+.
The shortcomings of the rating systems
In the NEJM Catalyst article, the authors also provided a summary of common problems among the rating systems. They noted five problems that were present in each of the rating systems examined.
1) Reliance on limited data. According to the authors, most of the rating systems use administrative data collected for billing instead of clinical purposes. Typically, the data are limited to those 65 and older who are part of the Medicare Fee-for-Service program. These data "lack adequate granularity to produce valid risk adjustment," the authors write.
2) Lack of robust data audits. "[W]hen the rating systems generate their own data through surveys, these data are not always made available publicly for analysis to allow for independent assessment of validity and reliability," the authors write. They also note that many of the rating systems rely on self-reported hospital data that are not subject to audits.
3) Varying methods for compiling and weighting composite measures. The authors note that each rating system used a different method for developing composite measures, resulting in hospitals' overall scores or grades varying greatly. In addition, the authors note "there is often limited rationale for the selection and weighting of different elements in the composite," and in some cases the chosen weights differ from how a stakeholder would view them.
4) Difficulty managing outcomes measurement at small hospitals. The authors note that small hospitals typically have less reliable performance estimates because of their lower volumes. To account for that, the authors write that most system methodologies "smooth or shrink rates essentially pushing lower-volume hospitals toward the mean." That makes it extremely hard for small hospitals to be recognized as top or bottom performers, according to the authors.
5) No formal peer review. While each of the examined rating systems used expert panels to some degree, the authors note that these panels typically "provide input intermittently and without detailed methodological review."
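To make the fourth problem concrete, the sketch below shows one simple way a rate can be "shrunk" toward the mean. It is an illustration only and is not drawn from the article or from any rating system's published methodology: the function name `shrink_rate`, the `prior_strength` parameter, and the example hospitals are all hypothetical, and the weighting formula is an assumed, simplified version of the general idea.

```python
def shrink_rate(events, cases, overall_rate, prior_strength=100.0):
    """Pull a hospital's observed rate toward the overall rate.

    Hypothetical illustration: the weight on the hospital's own data is
    cases / (cases + prior_strength), so low-volume hospitals are pulled
    harder toward the overall rate than high-volume ones.
    """
    if cases <= 0:
        # With no cases there is nothing to observe; fall back to the mean.
        return overall_rate
    observed = events / cases
    weight = cases / (cases + prior_strength)
    return weight * observed + (1 - weight) * overall_rate


# Two made-up hospitals with the same observed rate (5%) but different volumes,
# compared against an assumed overall rate of 10%.
small = shrink_rate(events=1, cases=20, overall_rate=0.10)
large = shrink_rate(events=100, cases=2000, overall_rate=0.10)

print(f"small hospital, shrunk rate: {small:.3f}")
print(f"large hospital, shrunk rate: {large:.3f}")
```

Under these assumptions, the small hospital's estimate lands much closer to the overall 10% rate than the large hospital's does, even though both observed the same 5% rate. That is the pattern the authors describe: a low-volume hospital's results have to be far more extreme before it can stand out as a top or bottom performer.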
The authors also discussed potential financial conflicts, such as hospitals paying rating systems to display their performance. The authors write that such practices could "create unfortunate incentives" and raise the "concern that the business of selling these ratings leads to a model that encourages multiple rating systems to intentionally identify different 'best hospitals.'"
How hospital rating systems can do better
The authors identified four ways the rating systems could improve:
- Better data: Instead of using administrative or self-reported data, systems should use all-payer data, which "would present a more accurate and complete representation of quality," the authors write.
- Better measures: The authors write that current data measures "fall far short in many domains and suffer from inadequate risk adjustment, questionable relationship to outcomes, and unacceptable lag times."
- Meaningful audits: All data in the rating systems should be "subject to a strong audit program," and data sources and methods should be transparent and publicly available, the authors write.
- Peer review: The authors "encourage all rating systems to submit studies of their analytical approach, decisions, and periodic modifications for real peer review and publication, preferably well ahead of employing them in public ratings."
Rating systems' reactions
Ben Harder, chief of health analysis at U.S. News, said that looking at rating systems is important. "The systems themselves deserve to have their tires kicked and people really scrutinizing them," he said.
However, some raters felt the authors' methods were flawed. Leah Binder, group president and CEO for Leapfrog, said, "Th[is] piece conflates two of Leapfrog's programs in a way that vastly misrepresents both, and makes demonstrably false statements about the intensive audit process Leapfrog conducts for over 2000 hospitals every year."
Similarly, Mallorie Hatch, director of data science for Healthgrades, said the authors "misrepresented" the methodology for Healthgrades' overall hospital award. Hatch also said that Healthgrades' "feedback was not incorporated" in the article (Bilimoria et al., NEJM Catalyst, 8/14; Goldberg, Crain's Chicago Business, 8/14).