It's a truism that "we manage what we measure." But for hospital leaders who want to "manage" their hospital's CMS star rating, it's been hard to know exactly what's being "measured" in the first place. The rating process, first established in 2016, has been largely opaque to providers. And for a system that's naturally going to attract skepticism, as any system picking "winners" and "losers" among hospitals would, that opacity has been a source of major controversy.
Get the cheat sheet on how CMS calculates their star ratings
Now, a new methodological analysis is adding fuel to the fire. Researchers at Rush University Medical Center dug deep into the most recent February 2019 ratings and found what they consider to be major flaws in the calculation methodology. (Rush is far from alone in raising such concerns: Similar pushback has come in the past from U.S. senators, the American Hospital Association, Modern Healthcare, and others.)
Rush says the underlying model prioritizes certain metrics in a way that "inadvertently penalizes large hospitals, academic medical centers" and those providing "heroic care." And while it's perhaps not surprising that an AMC is criticizing a rating system they say disadvantages them, the detail and nuance of Rush's analysis are particularly noteworthy. I believe it will further inflame the controversy over the accuracy of the overall star ratings system.
Read on to find out more about what Rush found and my key takeaways for hospitals. Then, join us for a webconference on March 19th at 1 pm EST when we'll dig into the nuts and bolts of the star rating program with experts from Rush, telling you what you need to know before CMS' public comment window closes.
Why Rush looked at the program—and what they found
Ever since CMS created the Hospital Compare website in 2002, hospitals have debated whether the website was accurately and effectively presenting quality metrics to consumers. But the debate took on new ferocity in 2016 when CMS introduced its five-star scale—which was intended to give patients an easy-to-understand summary of numerous quality measures, but which also created a single high-stakes measurement for hospitals to obsess over.
Rush first raised their concerns about the ratings methodology in May 2018, after learning their rating would fall from five stars to three stars in an upcoming update. To understand why their rating had fallen so sharply, Rush undertook an internal analysis comparing their data from the July 2018 release with their data from the previous December 2017 release. Their main finding: The statistical model CMS used was, essentially, dynamically re-weighting certain measures in every release, meaning that one specific performance measure could play an outsized role in determining a hospital's final rating.
With the most recent update, Rush has looked into the methodology once again and released a detailed analysis outlining what they claim are four major problems with the ratings.
What hospitals should know about Rush's findings
After reading Rush's analysis, what jumped out at me most was the importance of the statistical model underlying the ratings, called the "latent variable model." The team did an excellent job demystifying the model and explaining how each measure contributes to a hospital's final rating.
What's interesting—and frankly confusing—about the model is that it doesn't rank all measures equally. Instead, it assigns different values to measures according to how much hospitals vary in performance or how much measures correlate with each other. This has two major implications:
- Certain measures have an outsized impact on how a hospital is rated; and
- Hospital leaders don't know until after the fact which measures will have this outsized impact.
Rush was able to pinpoint specifically which measures had outsized impact on the star ratings in the February 2019 release. They found:
- The "Hospital-Wide All-Cause Unplanned Readmission" measure accounted for 22% of the star rating. (As background: The readmissions domain accounts for 22% of the total score. However, out of nine potential readmissions measures, the latent variable model selected only one, the Hospital-Wide All-Cause Unplanned Readmission measure, to actually determine the rating); and
- Similarly, the PSI-90 measure appears to account for 22% of the star rating. (The background here: The safety of care domain accounts for 22% of the total score. Based on Rush's analysis, out of eight potential measures, the latent variable model focused on just the PSI-90 measure.)
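To see how a latent variable model can end up leaning on a single measure, here's a minimal sketch. This is my own toy simulation, not CMS's actual model or data: it simulates nine correlated "readmission" measures where one tracks the underlying quality signal much more closely than the rest, then derives first-factor loadings from the leading eigenvector of the correlation matrix.

```python
# Toy illustration (NOT CMS's actual methodology): how a latent
# variable model can assign very unequal weights to measures in one
# domain. We simulate 9 "readmission" measures for 500 hospitals,
# where measure 0 is strongly tied to the underlying quality signal
# and the rest are mostly noise, then extract first-factor loadings
# from the leading eigenvector of the correlation matrix.
import numpy as np

rng = np.random.default_rng(0)
n_hospitals, n_measures = 500, 9

quality = rng.normal(size=n_hospitals)        # latent quality signal
X = np.empty((n_hospitals, n_measures))
X[:, 0] = quality + 0.1 * rng.normal(size=n_hospitals)
for j in range(1, n_measures):                # weakly related measures
    X[:, j] = 0.3 * quality + rng.normal(size=n_hospitals)

corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)       # eigenvalues ascending
loadings = np.abs(eigvecs[:, -1])             # leading eigenvector
weights = loadings / loadings.sum()           # normalize to weights

print(weights.round(2))  # measure 0 carries an outsized weight
```

The point of the sketch: nobody chose to make measure 0 dominant; the data-driven loadings did, which is exactly the "you don't know until after the fact" problem Rush describes.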
Rush argues that CMS' decision to focus so disproportionately on individual measures has several important impacts.
First, they note that tertiary care centers are more likely to care for high-acuity outliers, which negatively impacts those centers' scores. Second, they say the methodology doesn't account for socioeconomic status. It's not surprising that Rush raises these concerns; they're criticisms AMCs have long made.
But for the data nerds among us, Rush also raises an interesting third point. They say that readmission scores are risk-adjusted in a way that can negatively impact the ratings of large hospitals. In essence, they argue that by shrinking low-volume hospitals' scores toward the mean, you necessarily leave high-volume hospitals displaced away from the mean, at both the high and low extremes. This means that hospital size alone can substantially bias star results, and they share the following chart as one piece of supporting evidence.
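Rush's shrinkage argument can be illustrated with a toy simulation. The numbers and the shrinkage formula below are hypothetical stand-ins of my own, not the actual CMS risk-adjustment model; the sketch just shows the mechanism: pulling low-volume hospitals toward the overall mean leaves mostly high-volume hospitals in the tails of the adjusted distribution.

```python
# Toy illustration (hypothetical numbers, NOT the CMS model):
# shrinkage pulls low-volume hospitals' rates toward the overall
# mean, so the hospitals left at the extremes of the adjusted
# distribution are disproportionately high-volume.
import numpy as np

rng = np.random.default_rng(1)
overall_mean = 0.15          # assumed national readmission rate
k = 500                      # assumed shrinkage prior strength

# Simulate hospitals whose true rates vary around the mean.
n = 1000
true_rate = rng.normal(overall_mean, 0.03, size=n).clip(0.01, 0.5)
volume = rng.integers(25, 5000, size=n)      # annual eligible cases
observed = rng.binomial(volume, true_rate) / volume

# Shrinkage: low-volume hospitals are pulled strongly to the mean.
weight = volume / (volume + k)
adjusted = weight * observed + (1 - weight) * overall_mean

# Hospitals in the extreme tails of the adjusted distribution
# skew toward high volume.
lo, hi = np.quantile(adjusted, [0.05, 0.95])
extreme = (adjusted < lo) | (adjusted > hi)
print(volume[extreme].mean(), volume.mean())
```

Note the asymmetry: a 50-case hospital's score barely moves off the mean no matter how it performs, while a 5,000-case hospital keeps nearly its full observed deviation. That is the size-driven displacement Rush is describing.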
My 3 key takeaways
I think this analysis leads us to three key takeaways for providers:
- Don't chase metrics. This development further reinforces my guidance not to chase performance on individual metrics for star ratings (or any of the CMS safety and quality programs). Instead, we recommend having an overarching safety and quality strategy—and trusting that strong performance will be reflected in metrics.
This is doubly true for star ratings, as the methodology behind star ratings is likely to change. CMS is considering moving away from the latent variable model, and is inviting public comment on the change. (If you want to weigh in, you have until March 29th to submit a comment).
- Historical performance will likely matter more in the future. It's looking likely that CMS will move to a system that more equally weights measures within each domain. This means star ratings will be based on seven domains and up to 60 measures (see here for a quick recap of how star ratings are currently calculated). But many measures are collected on different time frames: some on a rolling three-year basis starting in 2014, others for a single year starting in 2017. The range of time frames means a single bad year can cast a long shadow, and current performance may not be reflected for several years.
- Comment on the program soon (if you want to). You have a window in which to shape the future of the star rating program! CMS is accepting comments on the methodology until March 29, 2019. Rush's leaders have their own recommendations—be sure to share yours with CMS.
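As a rough sketch of what the equal-within-domain weighting described above could look like, here's a toy calculation. The measure scores are hypothetical; the 22/22/22/22/4/4/4 domain split matches the 22% readmissions and safety weights cited earlier, with the remaining domain weights assumed for illustration.

```python
# Toy calculation of a summary score with equal weighting WITHIN
# each domain and fixed weights ACROSS domains. Measure scores are
# hypothetical; domain weights are partly assumed (only the 22%
# readmission and safety figures come from the article).

domain_weights = {
    "mortality": 0.22,
    "safety": 0.22,
    "readmission": 0.22,
    "patient_experience": 0.22,
    "effectiveness": 0.04,
    "timeliness": 0.04,
    "imaging": 0.04,
}

# Hypothetical standardized measure scores (higher = better).
measure_scores = {
    "mortality": [0.1, -0.2, 0.3],
    "safety": [0.5, 0.1, -0.1, 0.2],
    "readmission": [-0.3, 0.0, 0.1],
    "patient_experience": [0.2, 0.2],
    "effectiveness": [0.0, -0.1],
    "timeliness": [0.4],
    "imaging": [-0.2, 0.1],
}

def summary_score(weights, scores):
    """Simple mean per domain, then fixed weights across domains."""
    total = 0.0
    for domain, w in weights.items():
        vals = scores[domain]
        total += w * (sum(vals) / len(vals))
    return total

print(round(summary_score(domain_weights, measure_scores), 4))
```

Under a scheme like this, every measure in a domain has a known, fixed share of the domain score, which is exactly what would make the ratings predictable enough for hospitals to understand in advance.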
Please join us for a webconference on March 19th at 1 pm EST to dig into the nuts and bolts of the star rating program. I'll be joined by experts from Rush, including Bala Hota, MD, MPH, Chief Analytics Officer and Associate CMO of Rush University Medical Center.
We'll dig into how CMS calculated the February 2019 star ratings, the Rush team's analysis of the methodology's strengths and drawbacks, and their recommendations for improving the program in the future.
Register for the Webconference