Engineering Scalable Data Quality Assessments for Frontline Health with Medic Mobile


  • Comprehensively explore community health data from Siaya County, Kenya to identify, characterize, and categorize data quality issues it contains
  • Create a framework or toolset for systematically spotting these problems at scale
  • Prototype a set of reports, visualizations, or dashboards that community health supervisors and application administrators can use to resolve the issues identified by our tools


  • Discovered a wide range of inconsistent or problematic data points in Siaya County’s data, including many that concrete detection and prevention tools can address
  • Created end-to-end testing pipelines combining two leading open-source data quality frameworks running 160+ individual tests to identify many of these problems in a sustainable way
  • Wrote a “Community Health IOP Cookbook” with examples of how to build on this foundation, for use both by Medic Mobile and by other mobile health software providers
  • Prepared example reports and visuals for sharing inconsistent or problematic data results with audiences who can help fix them


Community health workers form a critical component of health systems in most countries, and they are increasingly using open-source technologies, such as the Community Health Toolkit (CHT), to support their day-to-day activities. Every month, CHT applications log over a million patient interactions, and the resulting dataset has the potential to drive more effective national health policy design, differentiated health service delivery, and improved program management and evaluation. However, community health worker-collected data is often considered low quality and unreliable for data-driven decision making, regardless of whether it is recorded on paper forms or generated by digital tools. Without trust in community-collected health data, how can leaders confidently invest in frontline health systems, make informed decisions about public health policy, and grasp opportunities to optimize health service delivery and patient outcomes?

Mistrust in the quality of this data restricts its potential impact and undercuts its usefulness for delivering appropriate and timely care. Leaders in the sector frequently acknowledge that these problems exist, yet descriptions of data quality issues tend to be anecdotal and broad. Little has been done to quantify specific quality issues in community health data or to provide mechanisms for identifying, categorizing, and resolving them.

This was the challenge that Medic Mobile, DataKind’s flagship partner in its Impact Practice on Frontline Health Systems, was seeking to address. As technical steward for the CHT, Medic Mobile had established strategies for ensuring data integrity when information is first captured by community health workers, like required fields and logic validation on data entry forms. Yet these safeguards only addressed a handful of simple data quality problems, and many types of inconsistent or problematic data still found their way into mission-critical datasets. Could data science and machine learning help automatically find errors in the data? In a pilot exploration with Siaya County, Kenya, Medic Mobile and DataKind sought to build integrity in the data collected by 2,126 Ministry of Health-sponsored community health volunteers who serve 204,855 unique households. A successful outcome could strengthen Siaya’s health systems, open up pathways for aggregating the rich community health data for deeper analysis, and contribute to the nascent ecosystem of data quality practices that let decisions rest on trusted data. Furthermore, this solution, built for the CHT, could also expand to Medic Mobile’s deployments in over 20 countries.


Data quality problems have a way of perpetuating themselves by undermining the trust that decision-makers have in the data. When that trust is lacking, people start to view the integrity problems as intractable, systematic solutions don’t get created, and new problematic data points continue to pile up.

What Happened

Led by Data Ambassador Nick Hamlin, the DataKind team of data scientists included Project Manager Kellie Chan and Data Experts Sebastien Ouellet, Melinda Gomez Tellez, Nat Steinsultz, Yunli Tang, and Brandon Sollins. The team learned that many of the quality problems appearing in the Siaya County data can be automatically detected and systematically summarized. In the same way that automated test suites can catch bugs in complex software products each time a change is made, similar tests can be created to proactively flag inconsistent or problematic data across the datasets produced by the CHT. This approach can track a wider range of data quality challenges than the usual form validations already in place, including everything from simple inconsistencies (like a child’s birth date falling before their mother’s) to more involved statistical issues (like spotting miscalibrated thermometers that give unrealistically high or low readings).
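To make the two kinds of checks mentioned above more concrete, here is a minimal sketch in plain Python. It is not the team's pipeline, which was built on two open-source data quality frameworks that this article does not name. The field names (`chv_id`, `child_dob`, `mother_dob`, `temp_c`), the sample records, the 35 to 42 °C plausibility window, and the 50% threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical visit records; every field name and value is illustrative.
VISITS = [
    {"visit_id": 1, "chv_id": "A", "child_dob": date(2019, 3, 1),
     "mother_dob": date(1990, 5, 20), "temp_c": 36.8},
    {"visit_id": 2, "chv_id": "A", "child_dob": date(2018, 7, 15),
     "mother_dob": date(1992, 2, 2), "temp_c": 37.1},
    {"visit_id": 3, "chv_id": "A", "child_dob": date(1985, 1, 1),
     "mother_dob": date(1988, 9, 9), "temp_c": 38.2},
    {"visit_id": 4, "chv_id": "B", "child_dob": date(2019, 11, 30),
     "mother_dob": date(1995, 4, 4), "temp_c": 43.5},
    {"visit_id": 5, "chv_id": "B", "child_dob": date(2020, 1, 10),
     "mother_dob": date(1991, 12, 12), "temp_c": 44.0},
]


def birth_order_violations(visits):
    """Return IDs of visits where the child is recorded as born on or before the mother."""
    return [v["visit_id"] for v in visits if v["child_dob"] <= v["mother_dob"]]


def suspect_thermometers(visits, low=35.0, high=42.0, max_share=0.5):
    """Return health workers whose share of implausible readings exceeds max_share.

    The plausibility window (low, high) and max_share are assumed values.
    """
    counts = defaultdict(lambda: [0, 0])  # chv_id -> [implausible, total]
    for v in visits:
        counts[v["chv_id"]][1] += 1
        if not low <= v["temp_c"] <= high:
            counts[v["chv_id"]][0] += 1
    return sorted(chv for chv, (bad, total) in counts.items()
                  if bad / total > max_share)


print(birth_order_violations(VISITS))  # visit 3 has the child born before the mother
print(suspect_thermometers(VISITS))    # worker B reports only implausible readings
```

The first check is a row-level rule that needs no context beyond the record itself. The second groups readings by health worker, because a single odd temperature may be a real fever while a consistent pattern from one device points to a calibration problem.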


A reliable testing framework is key to disrupting this vicious cycle because it makes it possible to start identifying problematic data consistently. People start to trust the validated portions of the dataset, and momentum and buy-in for additional testing and quality improvements grow!

After a broad exploration phase focused on identifying as many potential problems in the Siaya data as possible, the team turned its attention to creating an automated data testing pipeline for the CHT to ensure these issues could be automatically spotted and rigorously tracked in production. The data flows they delivered to Medic Mobile currently scan for more than 160 potential anomalies and data integrity issues within Siaya County’s maternal and child health-focused case management deployment (representative of 80% of deployments of the CHT). This toolset also provides opportunities for CHT administrators to explore non-compliant data points and track the frequency of data quality issues over time. These features start to meet the frontline health sector’s demand for precise and reliable data to drive more effective design and delivery of the right healthcare services to the right people at the right time.
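As a rough illustration of what tracking the frequency of data quality issues over time could look like, the sketch below aggregates a log of test runs by month. The log format, the test names, and the counts are hypothetical and are not taken from the delivered pipeline.

```python
from collections import Counter

# Hypothetical test-run log: (ISO date of run, test name, number of failing rows).
RESULTS = [
    ("2020-01-15", "child_born_before_mother", 4),
    ("2020-01-15", "temperature_in_range", 12),
    ("2020-02-15", "child_born_before_mother", 2),
    ("2020-02-15", "temperature_in_range", 15),
    ("2020-03-15", "child_born_before_mother", 0),
    ("2020-03-15", "temperature_in_range", 9),
]


def failures_by_month(results):
    """Sum failing rows per (month, test name) pair."""
    totals = Counter()
    for run_date, test_name, failing_rows in results:
        month = run_date[:7]  # "YYYY-MM" prefix of the ISO date
        totals[(month, test_name)] += failing_rows
    return dict(totals)


def trend(results, test_name):
    """Return a chronological list of (month, failing rows) for one test."""
    monthly = failures_by_month(results)
    return sorted((month, n) for (month, name), n in monthly.items()
                  if name == test_name)


print(trend(RESULTS, "child_born_before_mother"))
```

A per-test series like this is the kind of output an administrator could chart to see whether a given issue is shrinking after a fix or still accumulating.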


This project provided a core stack of data pipelines supplemented by extensive documentation of testing approaches and analysis learnings. From there, implementing organizations like Medic Mobile will be able to decide the best way to connect their own data sources, schedule workflows, and integrate the outputs into their product. Note to reader: IoP refers to “inconsistent or problematic” data.

What’s Next

Prior to this intervention, Medic Mobile generally, and Siaya County specifically, could not monitor data streams without labor- and resource-intensive manual intervention. The project has also resulted in a framework for facilitating conversations about data quality across stakeholders and functions within a CHT deployment.

With this project completed, DataKind and Medic Mobile anticipate moving to the next phase of deployment within the CHT: a closed beta with three Medic Mobile-solicited partner organizations that support a range of additional health services, such as family planning, non-communicable diseases, and mental health. Adapting this framework for a wider range of use cases will help to address one of the CHT’s core principles: designing for integrated care. Presuming a successful beta deployment, Medic Mobile intends to include this module and workflow with the CHT Core Framework’s next major release for all partner organizations (approximately 100 partners across 20 countries representing nearly 30,000 community health workers).


Source: © 2020 Medic Mobile photo archives

Support for this project was provided by the Johnson & Johnson Foundation

