by Eunice Jeong and Ty Aderhold
The proliferation of mobile health (mHealth) apps can be both a blessing and a burden to health industry leaders.
These apps can complement and improve a health system's digital health strategy and patients' experience. For instance, mHealth can be integrated into a variety of service lines and care contexts, with apps targeted for wellness, symptom relief, treatment management, and other purposes. According to Becker's Health IT, mHealth also allows patients to better engage in their care and reduces over-utilization of care facilities, ultimately producing cost savings for health systems.
Health systems looking to add mHealth apps to their digital health portfolios must carefully select apps that align with their needs. But that's often easier said than done: the saturation and constant expansion of the mHealth market make it difficult to determine which apps are the best fit. With over 300,000 apps available, there is no shortage of options to choose from.
In addition to the sheer volume of apps, there are concerns about a lack of empirical evidence for app quality and efficacy. A 2019 study published in Nature Digital Medicine found that a majority of behavioral health apps do not provide credible evidence to support their products' claims.
How mHealth apps are evaluated today—and where the process falls short
To address efficacy concerns, researchers have developed a few dozen mHealth evaluation platforms, including the Mobile App Rating Scale (MARS), Xcertia, RankedHealth, ORCHA, Psyberguide, and Mindtools.io. These platforms use research-based, expert-reviewed methodologies to assess mHealth app performance on a variety of quality metrics.
Most evaluation platforms take a multi-step approach to the review process, but each has its own methodology. Apps that collect large amounts of user data or provide integrated care are generally reviewed more thoroughly compared to more basic apps that focus on general well-being.
Platforms assess each app's compliance with several relevant quality metrics. Assessment categories are relatively standardized across different evaluation platforms; general metrics include usability, privacy, security, appropriateness and suitability, transparency and content, safety, technical support and updates, and compatibility with mobile devices. The platforms use this data to generate a final comprehensive evaluation or score reflecting the overall quality of an app.
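To make the scoring step concrete, the sketch below shows one plausible way per-metric ratings could be rolled up into a single composite score. The metric names, weights, and 0-5 rating scale are illustrative assumptions, not any platform's actual rubric.

```python
# Hypothetical sketch: combining per-metric ratings into one composite
# score via a weighted average. Metrics, weights, and the 0-5 scale are
# invented for illustration; real platforms use their own methodologies.

METRIC_WEIGHTS = {
    "usability": 0.20,
    "privacy": 0.20,
    "security": 0.15,
    "transparency_and_content": 0.15,
    "safety": 0.10,
    "technical_support": 0.10,
    "compatibility": 0.10,
}

def composite_score(ratings: dict) -> float:
    """Weighted average of per-metric ratings (each assumed on a 0-5 scale)."""
    total = sum(ratings[metric] * weight for metric, weight in METRIC_WEIGHTS.items())
    return total / sum(METRIC_WEIGHTS.values())

# Example ratings for a hypothetical app.
example_app = {
    "usability": 4.5, "privacy": 3.0, "security": 3.5,
    "transparency_and_content": 4.0, "safety": 4.0,
    "technical_support": 2.5, "compatibility": 5.0,
}
print(f"Composite score: {composite_score(example_app):.2f}")
```

Because platforms weigh metrics differently, the same ratings fed through a different `METRIC_WEIGHTS` table can yield a noticeably different final score, which is exactly the inconsistency described below.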
Although assessment categories are roughly similar across most evaluation platforms, there are often inconsistencies in how different platforms weigh those metrics, which can result in significantly different judgments. A 2019 study assessed how three evaluation platforms (ORCHA, Psyberguide, and Mindtools.io) judged 25 of the top behavioral health apps along the metrics of "user experience," "credibility," and "data privacy." The researchers found two main points of concern:
- Irregularity between the platforms' databases of apps. Each of the evaluation platforms rated its own limited set of apps that were all slightly different from one another. Some of the most frequently downloaded apps were not covered at all by the platforms, and none of the platforms covered all 25 of the top-downloaded behavioral health apps.
- Discrepancy between the platforms' ratings across the three dimensions of review. Researchers found only slight to low levels of agreement among the three platforms on each metric (user experience, credibility, and data privacy). For example, an app might receive a top-tier data privacy rating from one platform but only a middle-tier rating from another. The researchers noted that even the most popular apps do not receive uniformly favorable scores across the three platforms.
For health systems to invest successfully in mHealth, apps' clinical efficacy and benefits must be unambiguously demonstrated. This requires better-standardized app evaluation methodology. Ultimately, there should be standardized criteria for judging mHealth quality across all evaluation platforms. For example, the FDA developed a standardized set of criteria to evaluate mobile medical applications in 2015. Although this guidance does not apply to most mHealth apps, it serves as a good example of what standardized guidelines could ideally look like.
What health systems can do now to vet and select apps
Without better methods for app assessment, vetting the best mHealth options will remain difficult and time-consuming. For health systems, this can lead to additional costs, administrative burden, and patient dissatisfaction. Until better evaluation standards are developed, health systems will have to rely on other options for app assessment.
One option would be a meta-evaluation tool that aggregates the results of multiple individual evaluation platforms, like this one created by researchers. The meta-evaluation was built through a multi-step process:
- All questions from 45 individual evaluation platforms were combined
- All redundant questions were removed
- The remaining questions were grouped into five categories in order of priority: access/functionality, privacy, evidence, usability, interoperability
- The questions were edited to be as data-driven and objective as possible, using numerical or binary ratings instead of relying on subjective judgment
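The steps above can be sketched in code. Everything in this example, from the platform question sets to the keyword-based category assignment, is an invented toy illustration of the combine-deduplicate-categorize workflow, not the researchers' actual tool.

```python
# Toy sketch of the meta-evaluation construction steps described above.
# Platform names, questions, and category keywords are all invented.

platform_questions = {
    "PlatformA": ["Does the app encrypt data?", "Is the app easy to navigate?"],
    "PlatformB": ["Does the app encrypt data?", "Is there peer-reviewed evidence?"],
    "PlatformC": ["Is the app easy to navigate?", "Does it export to an EHR?"],
}

# Steps 1-2: combine all questions and drop redundant (duplicate) ones.
combined = {q for questions in platform_questions.values() for q in questions}

# Step 3: the five categories in priority order, with illustrative keywords
# used to assign each question to its first matching category.
CATEGORIES = [
    ("access/functionality", ["navigate", "access", "function"]),
    ("privacy", ["encrypt", "privacy", "consent"]),
    ("evidence", ["evidence", "peer-reviewed"]),
    ("usability", ["easy to use", "usable"]),
    ("interoperability", ["ehr", "export", "interoper"]),
]

def categorize(question: str) -> str:
    """Assign a question to the first category whose keywords it matches."""
    for name, keywords in CATEGORIES:
        if any(keyword in question.lower() for keyword in keywords):
            return name
    return "uncategorized"

# Step 4: each remaining question gets a binary (yes/no) rating per app,
# keeping the tool data-driven rather than reliant on subjective judgment.
for question in sorted(combined):
    print(f"[{categorize(question)}] {question}  (rating: yes/no)")
```

The design choice worth noting is the priority ordering: because categories are checked access/functionality first and interoperability last, a question touching two areas is counted once, under the higher-priority category.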
A meta-evaluation that aggregates the findings of multiple sources can smooth out the errors of any single platform while consolidating as much useful information as possible in one place. Individual evaluation platforms differ significantly and may carry biases, outliers, or skewed ratings; aggregation dampens these distortions. When thousands of apps exist for a single purpose, a successful meta-evaluation tool can help health systems sort through the options and pick the app best suited to their unique needs.