Data from a single hospital added more than 22,000 non-fatal gun injuries to CDC's 2015 national estimate, according to an analysis by FiveThirtyEight and The Trace, shining light on sensitivities in CDC's statistical model that can result in unreliable data.
How CDC's 'increasingly unreliable' database works
CDC's gun injury estimate is considered the most authoritative estimate of non-fatal gunshot injuries in the United States, but its data have "grown increasingly unreliable" over the years, Sean Campbell and Daniel Nass write for FiveThirtyEight.
Last year, an analysis of multiple national gun injury databases conducted by FiveThirtyEight and The Trace revealed that CDC's dataset was the only one that showed a consistent increase in gunshot injuries each year, "an indicator that its estimates are out of step with other reliable data sources," Campbell and Nass write.
To calculate the estimate, CDC relies on hospital data from the Consumer Product Safety Commission's (CPSC) general injury database. CDC takes the number of gun injuries treated in each hospital in the sample and puts it through its statistical model to create a national estimate.
But a closer inspection of CDC's model for calculating its national estimate revealed a key flaw: The agency relies on a small sample of about 60 hospitals, which leaves the estimate vulnerable to large swings when a hospital that treats many non-fatal gunshot injuries replaces one with a smaller caseload.
"The smaller the number of hospitals in the pool, the larger the effect each one has on the estimate," Campbell and Nass write. That means "[w]hen one hospital is replaced by another in the database … the changeover can cause the injury estimate to swing drastically," they add.
And, according to Guohua Li, editor-in-chief of Injury Epidemiology and founding director of Columbia University's Center for Injury Science and Prevention, there's another flaw: CDC's model does not take a hospital's volume of gunshot-related injuries into account, making the estimate susceptible to significant jumps.
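The sensitivity described above can be illustrated with a toy calculation. The sketch below is not CDC's or CPSC's actual model: it assumes every sampled hospital carries the same weight, and the nationwide hospital count and the other hospitals' caseloads are made-up placeholders. Only the pool size of 60 and the replacement hospital's 793 cases come from the reporting, so the resulting swing is not meant to reproduce the published figures.

```python
# Toy illustration only. The equal-weight design, HOSPITALS_NATIONWIDE, and
# every caseload except 793 are assumptions, not CDC or CPSC data.

def national_estimate(sample_counts, hospitals_nationwide):
    """Scale sampled per-hospital injury counts up to a national total,
    giving every sampled hospital the same weight."""
    weight = hospitals_nationwide / len(sample_counts)
    return weight * sum(sample_counts)

HOSPITALS_NATIONWIDE = 5000   # hypothetical
TYPICAL_CASELOAD = 20         # hypothetical cases per year at the other 59 hospitals

# A pool of 60 hospitals in which one low-volume hospital (4 cases, hypothetical)
# is swapped for a high-volume one (793 cases, the figure reported in the article).
before = [4] + [TYPICAL_CASELOAD] * 59
after = [793] + [TYPICAL_CASELOAD] * 59

est_before = national_estimate(before, HOSPITALS_NATIONWIDE)
est_after = national_estimate(after, HOSPITALS_NATIONWIDE)
swing = est_after - est_before

print(f"estimate before swap: {est_before:,.0f}")
print(f"estimate after swap:  {est_after:,.0f}")
print(f"swing from one swap:  {swing:,.0f}")
```

Because each sampled hospital stands in for many unsampled ones, the difference of 789 cases at a single site is multiplied by the full weight, which is the mechanism Campbell and Nass describe.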
The hospital that transformed the national estimate
The latest analysis of CDC and CPSC data by FiveThirtyEight and The Trace revealed how significantly one hospital's gunshot-related injury volume can affect CDC's non-fatal gun injury estimate.
The data revealed that in 2010, a hospital labeled Primary Sampling Unit 41 dropped out of CPSC's database. According to the data, that hospital treated fewer than 10 gunshot-related injuries per year between 2005 and 2010, with 20 total cases during that timeframe.
In 2012, that hospital was replaced by one that treated 793 gunshot-related injuries during its first full year in the dataset.
FiveThirtyEight and The Trace estimated that the replacement hospital added over 22,000 non-fatal gun injuries to the 2015 national estimate. As a result, that one replacement hospital alone accounted for over 25% of CDC's total non-fatal gunshot-related injuries estimate for 2015, which is the most recent year available.
How CDC plans to fix its 'unreliable' dataset
After years of CDC defending its model for calculating national gunshot injuries, CDC Director Robert Redfield in May acknowledged that the estimates were unreliable and said the agency would work to "improve the precision and accuracy of [its] non-fatal firearm injury estimates."
The fix will likely involve "expanding the roster of participating hospitals," according to Redfield. "The influence of any one hospital should be reduced and more stable estimates should be attainable," Redfield added. CDC and CPSC currently are evaluating how much it would cost to add more hospitals to the dataset, according to Campbell and Nass.
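Redfield's reasoning can be sketched with a small calculation. The example below again assumes a simplified equal-weight design, which is not CDC's actual method, and uses a hypothetical typical caseload of 20 cases per hospital. It shows how the share of an estimate attributable to one high-volume hospital shrinks as the pool grows.

```python
# Toy illustration only. Equal weighting and the typical caseload are assumptions.

def single_hospital_share(high_count, typical_count, pool_size):
    """Fraction of an equally weighted estimate contributed by one
    high-volume hospital in a pool of otherwise typical hospitals."""
    total = high_count + typical_count * (pool_size - 1)
    return high_count / total

# 793 is the replacement hospital's reported caseload; 20 is hypothetical.
for pool_size in (60, 120, 240):
    share = single_hospital_share(793, 20, pool_size)
    print(f"pool of {pool_size:>3} hospitals: {share:.1%} of the estimate")
```

Under these assumptions, each doubling of the pool cuts the single hospital's share substantially, although the share never reaches zero while its caseload remains far above the others.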
In the meantime, the agency is working to limit the spread of its unreliable estimates. It hid its 2016 and 2017 gun injury estimates and labeled them "unstable" on its public data portal, according to Campbell and Nass (Campbell/Nass, FiveThirtyEight, 8/13).