Hospital quality data in the National Surgical Quality Improvement Program's (NSQIP) clinical registry are unreliable measures of performance, according to a new study in JAMA Surgery.
Created in 2004 by the American College of Surgeons (ACS), the NSQIP provides risk-adjusted outcomes data to participating hospitals based on reviews of post-operative information in patient charts. Specifically, it focuses on assessing facilities' 30-day mortality rates and the frequency of complications like surgical-site infections and urinary-tract infections.
ACS claims that hospitals that adopt the program can save up to three dozen lives, prevent 250 to 500 complications, and save millions of dollars in surgical costs annually.
For the study, a team of researchers led by surgeons at the University of Michigan Health System examined 2009 NSQIP complication and mortality data for 55,466 patients treated at 199 hospitals. The study looked exclusively at outcomes for six common surgical procedures:
- Abdominal aortic aneurysm repair;
- Colon resection;
- Laparoscopic gastric bypass;
- Lower extremity bypass;
- Pancreatic resection; and
- Ventral hernia repair.
The study found that few hospitals met "reliability" criteria, defined as quantifying "the proportion of provider performance variation explained by true quality differences."
Low reliability "can mask both poor and outstanding performance relative to benchmarks," the authors wrote, adding that the data may lead some underperforming providers to assume they are doing well, while average or high-performing hospitals are "spuriously labeled as poor performers."
To achieve benchmarking reliability, the researchers recommended against sampling in clinical registries. Rather, they said, data should be collected on 100% of patients so that hospitals accrue the largest possible caseloads.
Separately, the authors praised NSQIP for being "among the leaders in implementing best practices to increase the reliability of outcomes measures."
A 'cautionary tale'
In an accompanying commentary, surgeons from the Stanford University School of Medicine praised the study's "elegant" assessment. But they noted that only 5% of hospitals were participating in the NSQIP program in 2009—a sample that may be too small to accurately assess quality.
"The findings are cautionary to ranking systems that use observed to expected ratios as a surrogate for surgical quality," the Stanford surgeons wrote, adding, "Until the hospital cohort reflects the well-documented variation that occurs across the country, quality as determined by ACS-NSQIP should be interpreted with healthy skepticism."
Clifford Ko, director of the ACS's NSQIP, argued that the study's authors identified weaknesses that may occur in some programs but did not pinpoint any specific program. "While they use ACS NSQIP data to build these demonstration models, their conclusions do not apply to ACS NSQIP, though this is not made explicit to the reader until the last page of the paper," he said.
More broadly, he remarked, "It must be recognized that all of us in this arena are continuously working to improve the way we use data to improve surgical quality." While small models may not provide data with very high reliability, Ko argued that "limited information is often times better than no information" and still may advance quality-improvement efforts (Robeznieks, Modern Healthcare, 3/12 [subscription required]; McKinney, Modern Healthcare, 3/24/2012 [subscription required]).