Cheat Sheet

Disparity in Data

Data in health care is not as objective as you think. Uncover the social implications of biased data and learn how to address disparities.

Key Takeaways

  • Health care data reflects underlying disparities. Populations that face socioeconomic barriers to accessing care are underrepresented or misrepresented in our systems.
  • The “digital divide” in health care contributes to disparities in data. Digital technologies provide a new source of patient-generated health data, but only for patients who can afford and access them.
  • Data enables analytics and artificial intelligence (AI), but we must understand the shortcomings of our data and the social implications of using biased data to drive care decisions. Marginalized populations that are left out of data will not reap the benefits of AI.

To identify the social implications of using analytics and AI, health care leaders must first understand the quality and completeness of their data. Only then can the data be useful for the entire patient population.

What is it?

Data is often considered an objective source of truth. But there are underlying issues in health care data that can lead to skewed inferences and decisions.

1. Incomplete data: Gaps in data prevent us from having a holistic view of a patient or population. Marginalized groups may experience more fractured care and less documentation of conditions and outcomes. Demographics and social determinants of health (SDOH) influence outcomes but aren’t routinely captured in systems. Only one-third of commercial plans reported having complete or partially complete data on race, a pattern that is likely reflected in electronic health records (EHRs).

2. Small sample size: Marginalized populations are not adequately represented in health care data. More data is available for people who are able to access care and treatments, and “data deserts” exist for groups that experience systemic barriers to accessing care. Underrepresentation in data can lead to less informed care decisions or flawed inferences about a population.

3. Historical inequities embedded in data: Health care outcomes are not the same across populations. For example, black women are 42% more likely than white women to die from breast cancer. This can be at least partially attributed to factors such as a higher burden of comorbidities and barriers to accessing care that stem from the enduring legacies of structural racism and intergenerational poverty. Black women are also more likely to be diagnosed at later stages of the disease and to experience treatment delays of two months or more. These inequitable outcomes are baked into health care data.

Why does it matter?

Data is viewed by many leaders as a critical asset that will lead to better, more efficient health care. Advanced analytics and artificial intelligence can turn complex EHR data into actionable insights that improve decision-making and care delivery. But if EHRs are to become the new textbooks for health care, we must consider the disparities that live in their data.

The IT adage “garbage in, garbage out” suggests that flawed inputs will produce poor outputs. The same is true for bias: bias in, bias out. Algorithms learn from historical patterns to make predictions and decisions, but if they learn from biased data, they will produce biased outputs. When these outputs inform care decisions, health systems may unintentionally create or perpetuate inequities.

For example, one emerging application of AI is predicting intensive care unit (ICU) demand. Algorithms can identify which inpatients are at risk of clinical deterioration and likely to require a transfer to an ICU. A model could be built using the historical health records of patients who were transferred to ICUs. But if the training data contains far more white patients than black patients, the model is likely to make more accurate predictions for white patients. Deterioration might then be underestimated for black patients, leading to fewer transfers and worse outcomes.
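To make the ICU example concrete, here is a minimal sketch of one way a team might check whether a deterioration model misses more patients in one group than another: compute sensitivity (the share of patients who actually deteriorated that the model flagged) separately for each group. The function name `sensitivity_by_group` and the records are hypothetical illustrations, not real patient data or output from any actual model.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Return {group: sensitivity} for records of (group, deteriorated, flagged).

    Sensitivity = deteriorating patients the model flagged / all deteriorating
    patients. A lower value for one group means the model misses more of that
    group's patients who actually needed ICU-level care.
    """
    caught = defaultdict(int)  # deteriorations the model flagged, per group
    actual = defaultdict(int)  # all patients who actually deteriorated, per group
    for group, deteriorated, flagged in records:
        if deteriorated:
            actual[group] += 1
            if flagged:
                caught[group] += 1
    return {group: caught[group] / actual[group] for group in actual}


# Hypothetical toy records: (group, actually deteriorated?, model flagged?)
records = [
    ("white", True, True), ("white", True, True),
    ("white", True, True), ("white", True, False),
    ("black", True, True), ("black", True, False),
    ("black", True, False), ("black", True, True),
    ("white", False, False), ("black", False, False),
]

for group, rate in sorted(sensitivity_by_group(records).items()):
    print(f"{group}: {rate:.2f}")
```

In this toy data the model catches 75% of deteriorating white patients but only 50% of deteriorating black patients. A gap like that is the kind of signal worth investigating before a model is used to guide transfers.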

Structural inequities in health care prevent marginalized groups from accessing timely and quality care. These inequities are embedded in health care data. If we aren’t careful, analytics and AI can automate and scale health disparities.

“It’s not ideal to make broad-sweeping suggestions in communities where we have data deserts.”
- Robert Winn, physician and director, Virginia Commonwealth University (VCU) Massey Cancer Center

How does it happen?

Data disparities, while unintentional, can be introduced by a number of factors.

  • Data gaps or lags occur when disadvantaged groups experience barriers to accessing care. Data collection is skewed toward those who can afford and access care. For example, a diagnosis documented in the EHR signifies when a patient accessed care and a provider made a diagnosis, but not when a patient first got sick. Populations with fragmented or limited care are underrepresented in health care systems. When data is available for these groups, it often includes the more severe cases and can misrepresent the overall health of the population.
  • Socioeconomic status influences where patients access care, and data collection varies across care sites. Health care facilities don’t all collect the same information in the same way. Patients of low socioeconomic status are more likely to seek care in teaching clinics or community hospitals, where data input may be less consistent due to constrained time and resources.
  • Providers’ subjectivity can influence what data is captured during a visit. Unstructured data like chart notes can reflect the implicit biases of providers. When it comes to demographic data, staff may be hesitant to have potentially sensitive conversations and may make assumptions about gender or race.
  • Health record systems and processes are not built to collect holistic patient data. EHRs were built to be billing tools first and clinical tools second. They are often missing data and are not set up to easily capture demographic and SDOH data. Although ICD-10-CM Z codes exist for documenting SDOH, providers have few incentives to track this information because it’s not reimbursable.
  • Remote patient monitoring (RPM) devices produce additional data, but they also exacerbate the “digital divide.” RPM devices provide detailed data about a patient’s health condition when they are not in a facility, but data is collected only for patients who can access and use the devices. Populations who are less tech-savvy or don’t have consistent internet access will be left out of these data sets.

Conversations to have

Understand how your data may reflect health disparities. 

  • What populations are missing or underrepresented in our data? Are our sample sizes representative of our community?
  • What types of information are not captured in our data? Consider clinical, demographic, community, and SDOH data.
  • How do health outcomes differ across population subgroups such as race, ethnicity, gender, and socioeconomic class?
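For the first question above, a simple starting point is to compare each group's share of your records with its share of the community you serve. This is a minimal sketch under stated assumptions: the function name `representation_ratios`, the group labels, counts, community shares, and the 0.8 cutoff are all hypothetical, and in practice community shares would come from a source such as census estimates.

```python
def representation_ratios(record_counts, community_shares):
    """Compare each group's share of our records with its share of the community.

    A ratio of 1.0 means proportional representation; values well below
    1.0 point to groups that are underrepresented in our data.
    """
    total = sum(record_counts.values())
    if total == 0:
        raise ValueError("no records to compare")
    return {
        group: (record_counts.get(group, 0) / total) / share
        for group, share in community_shares.items()
    }


# Hypothetical inputs: patient records per group, and each group's share of
# the community the organization serves (e.g., from census estimates).
record_counts = {"group_a": 7_000, "group_b": 1_800, "group_c": 1_200}
community_shares = {"group_a": 0.60, "group_b": 0.30, "group_c": 0.10}

THRESHOLD = 0.8  # hypothetical cutoff below which a group is flagged

ratios = representation_ratios(record_counts, community_shares)
for group, ratio in ratios.items():
    status = "underrepresented" if ratio < THRESHOLD else "ok"
    print(f"{group}: {ratio:.2f} ({status})")
```

Here `group_b` makes up 30% of the community but only 18% of the records, so its ratio comes out at 0.60 and it is flagged. A ratio alone doesn't explain why a group is missing; it only tells you where to start the conversation.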

Discuss ways to account for gaps in data.

  • How can we capture more holistic data on our patients? Do our providers screen for SDOH when interacting with patients? Have we considered partnering with third parties?
  • How can we regularly engage community leaders to better inform our understanding of the patients we serve?

Determine what factors might be causing disparities in data and how you can address these issues.

  • Do implicit biases of providers influence data collection and data entry? Do we provide system-wide training on bias?
  • What barriers to accessing care exist in our communities? How can we ease access to care for marginalized groups?




1. You will understand the social implications of health care data.

2. You will recognize why health care data is not as objective as you might think.

3. You will know how digital technologies contribute to disparities in data.


Mackenzie Rech

Senior analyst


Don't miss out on the latest Advisory Board insights
