Blog Post

How—and why—health organizations are using synthetic health care data

November 19, 2019

    Having access to comprehensive patient data is necessary for the advancement of health care in general, and digital health in particular. However, health care organizations have struggled to acquire and manage the volume and variety of data that is available today. "Synthetic" health data is helping to fill the gaps to promote health research and app development.

    What is synthetic health data?

    Synthetic health data, sometimes referred to as synthetic health records, are data sets that contain the health records of realistic—but not real—patients. Developers can control how comprehensive they make the records, which may include complete medical histories, allergies, social factors, genetic information, images, and more. 

    Synthea, the original synthetic health data software created by MITRE, uses generic models to independently simulate the diseases, medical care, and health outcomes of each artificial patient from birth until death. The open source records Synthea produces are HL7 and FHIR compatible, and free for anyone to access online.     
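
    To make the birth-to-death idea concrete, here is a minimal Python sketch of a year-by-year patient simulation. It is not Synthea's code or its module format; the SyntheticPatient class, the ONSET_RISK table, and the mortality curve are all hypothetical and use invented numbers purely for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SyntheticPatient:
    """A made-up patient record; every field here is illustrative."""
    patient_id: int
    birth_year: int
    death_year: Optional[int] = None
    # Maps a condition name to the year it first appeared.
    conditions: Dict[str, int] = field(default_factory=dict)


# Hypothetical yearly onset probabilities, invented for this sketch.
ONSET_RISK = {"hypertension": 0.01, "type 2 diabetes": 0.005}


def yearly_death_risk(age: int) -> float:
    """Toy mortality curve that rises with age (not real actuarial data)."""
    return min(1.0, 0.0005 * 1.09 ** age)


def simulate_patient(patient_id: int, birth_year: int,
                     rng: random.Random) -> SyntheticPatient:
    """Walk one artificial patient from birth until death, one year at a time."""
    patient = SyntheticPatient(patient_id, birth_year)
    for age in range(121):
        year = birth_year + age
        # Each condition can begin at most once in a lifetime.
        for condition, risk in ONSET_RISK.items():
            if condition not in patient.conditions and rng.random() < risk:
                patient.conditions[condition] = year
        if rng.random() < yearly_death_risk(age):
            patient.death_year = year
            break
    if patient.death_year is None:
        # Safety net so every record ends with a death year.
        patient.death_year = birth_year + 120
    return patient


# Each patient is simulated independently; a fixed seed makes runs repeatable.
rng = random.Random(2019)
population = [simulate_patient(i, 1950, rng) for i in range(5)]
```

    Because each patient draws from the same random source but is otherwise simulated on its own, the population can grow to any size without touching a real record.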

    Why is synthetic health data useful?

    Barriers to accessing real patient data can be immense, yet that data is vital for conducting research, planning public health measures, and developing effective health IT applications. Patient data can be costly to access, and researchers often need a lot of it to build accurate models. There are also significant privacy concerns surrounding patient data: waiting for approval from an institutional review board slows down research and development, and there have been cases of anonymized patient records being re-identified.

    Enter synthetic data. It is typically free, immediately accessible in large quantities, and no one worries about the privacy of fake patients. Moreover, there are some things that can be done with synthetic data that just aren't possible with actual health records. For example, the data can be modified to fit certain demographic populations or include specific pieces of health information.  
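
    As a rough illustration of tailoring synthetic data to a population, the Python sketch below builds a cohort restricted to a chosen age range, sex mix, and condition. The generate_cohort function and its parameters are hypothetical, not part of Synthea or any other tool named here.

```python
import random
from typing import Dict, List


def generate_cohort(size: int, min_age: int, max_age: int,
                    female_share: float, required_condition: str,
                    rng: random.Random) -> List[Dict[str, object]]:
    """Build fake records that all fall inside the requested demographic."""
    cohort = []
    for patient_id in range(size):
        cohort.append({
            "id": patient_id,
            "age": rng.randint(min_age, max_age),  # inclusive on both ends
            "sex": "F" if rng.random() < female_share else "M",
            "conditions": [required_condition],
        })
    return cohort


# Example: 100 fake patients aged 65 to 85, roughly 60% female, all with diabetes.
cohort = generate_cohort(100, 65, 85, 0.6, "type 2 diabetes", random.Random(42))
```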

    That said, synthetic data is not without its limitations. Jason Walonoski, co-creator of Synthea, has said that synthetic data should not be used for clinical discovery, and that researchers should always go back to real data to verify their results. A recent study also concluded that while Synthea does a good job of modeling "average" health care encounters, it has trouble accounting for variations in care.

    Who is using synthetic health data?

    Many different types of organizations are employing synthetic data to improve health care:

    • The Office of the National Coordinator for Health Information Technology (ONC) is currently working on a project that brings together experts to apply synthetic data to the treatment of opioid addiction, pediatrics, and complex care patients.

    • The Department of Veterans Affairs used Synthea to create veteran-specific synthetic patient records; that data is available through its Lighthouse API for developers to create new health applications for VA patients.

    • Boston Children's Hospital created a similar platform, called SMART, to create and implement apps to improve clinical care, hospital administration, and research.

    • Google has even partnered with MITRE to make MITRE's SyntheticMass data set available through the Google Cloud Healthcare API and Apigee Edge.

    What does synthetic data mean for hospitals?

    Synthetic health data will help to overcome barriers to health care innovation and bring new applications and research models to life. Patients, researchers, and providers will be the downstream beneficiaries of these breakthroughs. Training machine learning and artificial intelligence algorithms requires vast pools of data, some of which can be provided through synthetic means. This data can also aid software development, enhance interoperability, and model health policies and interventions.

    Consider how your health system can benefit from synthetic health records. Possibilities include:

    • Arming your clinicians with a safe and robust new source of data for early to middle stages of research;

    • Giving developers a new resource for creating systems to improve hospital functions; and

    • Deploying it as a secure tool for teaching medical students and health IT professionals using realistic patient records.

    As MITRE's Jason Walonoski told us, "It's impossible to predict all of the use cases for it. That's the beauty of having it out in the public. People are going to use it in all sorts of ways to solve all sorts of problems."

    What problems will you solve?

