Strengthening Frontline Health Systems with Data Science & AI: Updates From Our First Cohort of Projects

By Mitali Ayyangar, Portfolio Manager, Frontline Health Systems, DataKind

Frontline health workers (FHWs), including the subset known as community health workers (CHWs), have made tremendous strides across the world in expanding access to basic and life-saving health services at the last mile. For instance, through programs such as integrated community case management of childhood illnesses, FHWs have helped reduce the number of deaths among children under five from over 9 million a year in 2007 to 5.3 million in 2018. Evidence also indicates that effectively trained, well-distributed, and well-managed FHWs help reduce the spread of infectious diseases, something we’ve all come to appreciate in these “COVID-times.”

However, the scale of the problem is still huge. Half the world’s population lacks access to healthcare, including more than a billion people in rural and remote communities. Women, newborns, and children are the most vulnerable: the World Health Organization estimates that 15,000 children under five die every day, and that a pregnant woman or newborn dies every 11 seconds from preventable causes. In a vicious cycle, low- and middle-income countries (LMICs) lose trillions of dollars in economic welfare each year as a result of these deaths, which hinders critical investment in the health systems and health workforce needed to address them.

The vast majority of deaths at the last mile are potentially preventable, however, given effective and timely care. For instance, 70% of deaths among children under five are due to conditions that could be prevented or treated with access to simple interventions. 

There are three critical delays that systemically impede the provision of timely and appropriate care:¹

  • Delay 1: Delay in the decision to seek care 
  • Delay 2: Delay in identifying, accessing, or reaching care 
  • Delay 3: Delay in the provision of appropriate care

While FHWs have helped reduce many of these delays, it’s evident that they’re outmatched by the scale of the problem. They lack visibility into patients’ overall health information that could help match patients to more appropriate, differentiated care. They struggle with cumbersome handwritten and analog systems that prevent them from using digital data to inform their decisions. Their training is often insufficient and of unknown quality. And they depend on large numbers of people to carry out laborious processes, such as data oversight, that simply don’t scale.

More recently, COVID-19 has also highlighted the vulnerability of health systems and the need for timely and reliable data to monitor the intensity of the pandemic, identify risk factors, forecast its spread, and make rapid, effective decisions to prepare healthcare capacity and essential supplies. Robust data and analysis are also critically needed to reveal the systemic inequities that put certain populations at disproportionate risk of contracting the virus, dying from it, and/or being more severely affected by it economically.

If humanity is to confront these issues, new thinking and cutting-edge technologies will be needed to keep communities from being underserved, both by strengthening health systems, including health service delivery, and by maximizing the potential of the existing health workforce. Data science is a proven tool for making the invisible visible, predicting future outcomes so people can act more quickly, and automating laborious processes. The pandemic, too, has surfaced powerful examples of how machine learning could assist health workers in this fight, including analysis of supply chain data, improvements to telehealth, and automatic translation, along with its potential to become a critical tool for reducing delays and saving lives.

For these reasons, DataKind has undertaken a concerted effort to identify the highest-impact opportunities for data science, machine learning, and AI to improve the efficiency of Frontline Health Systems through our first Impact Practice, together with our expert partners: Jacaranda Health, Medic Mobile, and Riders for Health.


As the Portfolio Manager of this Impact Practice, I’m excited to share updates from this first cohort of projects, which we launched in the fall of 2019 and which are now entering their final phases thanks to the excellent contributions of an all-star team of 22 pro bono data science experts (you can read more about them in our Volunteer Spotlight blog series).

Jacaranda Health: Assessing Frontline Health Worker Training & Healthcare Facility Outcomes to Improve the Quality of Prenatal & Postnatal Care (Kenya) 


(Credit: Nurse Mentorship Program, Jacaranda Health, Kenya 2020)

According to recent research, more than 90% of all births in high-income countries benefit from the presence of a trained midwife or nurse, whereas in LMICs fewer than half of all births are assisted by such skilled health personnel. The shortage of health care providers in LMICs is compounded by a lack of workers specifically trained to guide a successful birth. Training programs can equip FHWs to provide antenatal care, follow standardized and timely referral processes for in-facility births, and support deliveries in a variety of settings, including maternal and neonatal emergencies. However, training programs for FHWs vary considerably, and traditional modes of learning and outcome evaluation struggle to produce proven curriculum designs with clear links to positive health outcomes. While randomized controlled trials are the most rigorous way to test a curriculum, they’re also the most challenging to implement in last-mile health care.

Through an initial consultation with more than 100 frontline health organizations, we realized that data science and machine learning could enable continuous, automated evaluation of training program data to, first, determine which components are most effective and, second, promote standardization of those components across learning platforms. To explore this opportunity to streamline FHW training programs, we partnered with Jacaranda Health.

Jacaranda Health partners with public hospitals in Kenya to improve maternal and newborn health outcomes by increasing the quality of antenatal, delivery, and postnatal care provided to mothers and their babies. Its innovative Nurse Mentorship program provides FHWs with intensive hands-on training that incorporates real-time coaching and simulation drills. Each trained frontline nurse serves thousands of mothers a year. Jacaranda Health partnered with us to unearth machine learning and AI opportunities for continuous and automated evaluation of its training program. 

Our volunteer team is assisting Jacaranda Health to streamline its training data for analysis to identify which components are most associated with positive health outcomes. We’re also building a data pipeline and automated monitoring system to track performance of trainees and ensure 100% completion rates of key modules. 
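
To make the monitoring idea concrete, here is a minimal sketch of one check such a system might run: computing each facility’s completion rate for a set of key modules and listing which modules each trainee still lacks. It is an illustration under assumptions, not Jacaranda Health’s actual pipeline; the record layout, the module names, and the `completion_report` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical key modules; the real curriculum's module names will differ.
KEY_MODULES = frozenset({"antenatal_care", "emergency_obstetric_drill", "newborn_resuscitation"})


def completion_report(roster, completions, key_modules=KEY_MODULES):
    """Summarize key-module completion for each facility.

    roster:      dict mapping facility -> list of enrolled trainees.
    completions: iterable of (facility, trainee, module) tuples, one per
                 completed module (a hypothetical record layout).
    Returns facility -> {"completion_rate": ..., "missing": ...}, where
    completion_rate is the share of enrolled trainees who finished every
    key module (None if nobody is enrolled) and missing maps each trainee
    with gaps to the sorted list of modules they have not completed.
    """
    done = defaultdict(set)  # (facility, trainee) -> set of completed modules
    for facility, trainee, module in completions:
        done[(facility, trainee)].add(module)

    report = {}
    for facility, trainees in roster.items():
        missing = {
            trainee: sorted(key_modules - done[(facility, trainee)])
            for trainee in trainees
            if not key_modules <= done[(facility, trainee)]
        }
        rate = (len(trainees) - len(missing)) / len(trainees) if trainees else None
        report[facility] = {"completion_rate": rate, "missing": missing}
    return report


if __name__ == "__main__":
    roster = {"Facility A": ["nurse_1", "nurse_2"]}
    completions = [
        ("Facility A", "nurse_1", "antenatal_care"),
        ("Facility A", "nurse_1", "emergency_obstetric_drill"),
        ("Facility A", "nurse_1", "newborn_resuscitation"),
        ("Facility A", "nurse_2", "antenatal_care"),
    ]
    print(completion_report(roster, completions))
```

Driving the report from an enrollment roster, rather than from completion records alone, means a trainee who has completed nothing still shows up as a gap instead of silently disappearing from the numbers.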

DataKind’s solution will save Jacaranda Health staff significant time previously spent manually pulling and combining datasets to monitor program performance and trainee activity across up to 150 public health facilities in the country. With the tool, Jacaranda Health gains a comprehensive view of its programs and can make data-driven decisions. Program leaders can combine local knowledge with empirical training outcomes to keep iterating on the training program and drive sustained improvements in skills that can save the lives of mothers and babies. Furthermore, due in part to this work, Jacaranda Health obtained funding to bring two data scientists onto its team to learn from DataKind and take over maintenance of the solution.

Medic Mobile: Developing Data Quality & Assurance Algorithms to Build Integrity in CHW-Generated Data for Better Patient Tracking & Care (Kenya)


(Credit: Medic Mobile)

Digital tools are making it easier to collect digital health data about patients closer to where they live to better understand their health needs and treat them faster, thereby saving lives. CHWs are increasingly collecting digital data and such routinely generated data provides critical information for tracking patients, providing appropriate and timely care, developing advanced algorithms to assess population health, and targeting specific health interventions based on predictive analytics. However, while digital health tools are being widely adopted, CHW-collected data is still considered low quality and not reliable for data-driven decision making. At a systemic level, mistrust in the quality of the data restricts its potential impact. Without trust in community-collected health data, how can health system managers, leaders, and policy makers confidently invest in frontline health systems, make informed decisions about public health policy, and grasp opportunities to optimize health service delivery and patient outcomes?

Following consultations with leading digital case management platforms in the frontline health sector, DataKind realized that, just as automated test suites catch bugs in large-scale software products, similar tests can be created for the data collected by digital tools used in the most remote and fragile settings. To explore how data science and machine learning could build data integrity, DataKind partnered with Medic Mobile, its flagship partner on this project, to create a toolkit for identifying inconsistent or problematic (IOP) data for remediation.

Medic Mobile is an organization with a mission to improve health in the hardest-to-reach communities by building a world-class, open-source digital case management platform, the Community Health Toolkit (CHT). Each year, the CHT logs millions of patient interactions, and Medic Mobile has implemented several strategies for ensuring data integrity at the point of capture, such as required fields and logic validation on forms. However, many types of IOP data still find their way into mission-critical datasets (e.g., patients with negative ages, or pregnant women with negative pregnancy tests), undercutting the data’s usefulness for providing appropriate and timely treatment and hampering broader data aggregation for deeper analysis.
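
The test-suite analogy can be sketched in a few lines: each integrity rule is a small, named predicate over a record, and running the suite returns every record that fails a rule. This is a minimal illustration, not the actual pipeline built for Medic Mobile; the record layout, the field names (`type`, `age_years`, `pregnancy_test`), and the `Check`/`run_checks` helpers are hypothetical and do not reflect the CHT’s real schema.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """One data-integrity rule, playing the role of a single automated test."""
    name: str
    record_type: str                      # hypothetical form/record type the rule applies to
    is_violation: Callable[[dict], bool]  # returns True when the record breaks the rule


# Two illustrative rules mirroring the IOP examples in the text.
CHECKS = [
    Check("negative_age", "patient",
          lambda r: r.get("age_years") is not None and r["age_years"] < 0),
    Check("pregnant_with_negative_test", "pregnancy_registration",
          lambda r: r.get("pregnancy_test") == "negative"),
]


def run_checks(records, checks=CHECKS):
    """Return a (record_id, check_name) pair for every rule each record violates."""
    failures = []
    for record in records:
        for check in checks:
            if record.get("type") == check.record_type and check.is_violation(record):
                failures.append((record["id"], check.name))
    return failures


if __name__ == "__main__":
    sample = [
        {"id": "p1", "type": "patient", "age_years": -3},
        {"id": "r1", "type": "pregnancy_registration", "pregnancy_test": "negative"},
    ]
    print(run_checks(sample))
```

Keeping each rule small and named, like a unit test, makes it straightforward to add checks as new kinds of IOP data are discovered and to report exactly which rule a record failed.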

In a pilot with Medic Mobile’s deployment in a county in Kenya, where 2,126 CHWs serve over 204,000 households, our teams set out to create software and workflows that build data confidence practices into everyday use, both within this deployment and beyond it via Medic Mobile’s CHT. Our volunteers have built an automated data testing pipeline that currently scans for 160 potential anomalies and data integrity issues within this maternal and child health-focused deployment (representative of 80% of Medic Mobile’s deployments). The toolset also lets CHT administrators explore non-compliant data points and track the frequency of data quality issues over time. These features begin to meet the frontline health sector’s demand for precise information about data quality issues, which will, in turn, drive more effective national health policy design, differentiated health service delivery, and improved program management and evaluation.
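
Tracking how often each issue appears over time can start as a simple aggregation of flagged records. The sketch below assumes that each flagged issue carries the date of the underlying record and the name of the rule it failed; that input format and the `issue_trend` function are hypothetical illustrations, not part of the CHT.

```python
from collections import Counter
from datetime import date


def issue_trend(failures):
    """Count data-quality failures per (year-month, check name).

    failures: iterable of (reported_on, check_name) pairs, where reported_on
    is a datetime.date (a hypothetical input format). Returns a dict whose
    keys are ("YYYY-MM", check_name) tuples, ordered by month, then name.
    """
    counts = Counter()
    for reported_on, check_name in failures:
        counts[(reported_on.strftime("%Y-%m"), check_name)] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    failures = [
        (date(2020, 5, 3), "negative_age"),
        (date(2020, 5, 20), "negative_age"),
        (date(2020, 6, 1), "negative_age"),
        (date(2020, 6, 2), "pregnant_with_negative_test"),
    ]
    for (month, check), n in issue_trend(failures).items():
        print(month, check, n)
```

Raw counts rise and fall with reporting volume, so a real dashboard would likely divide each count by the number of records submitted that month to compare rates rather than totals.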


Dashboards (left) and interventions (right) created as part of the Medic Mobile Data Integrity project

Medic Mobile: Improving Predictive Models for Better Maternal & Child Health Outcomes (Kenya)


(Credit: Medic Mobile)

Two of the primary delays that lead to maternal deaths are the delay in seeking treatment and the delay in being matched to appropriate care. DataKind and our partners know that a single FHW or CHW may be tasked with serving over 100 households and must divide their time among patients. Every frontline health system relies on matching health care workers to those most in need, and yet, without knowing more accurately who needs assistance and what type, lives may be lost.

Predicting which patients need care most urgently could dramatically reduce delays in receiving appropriate care. In 2018, Medic Mobile worked with Living Goods to develop and deploy its first risk-profiling algorithms in Kenya, enabling CHWs to provide more proactive, differentiated care. Medic Mobile wanted to test how robust these models would be if deployed in more typical low-resource, data-scarce conditions, in support of its goals for maternal and child health. For this first pilot of model transfer, Medic Mobile partnered with DataKind to create similarly performing models on datasets from a different but comparable county in Kenya, anticipating that a lightweight model would be more likely to transfer between deployments.

Our volunteers tested multiple algorithmic approaches to predict which households are most at risk of (1) a home birth, (2) a newborn experiencing danger signs, or (3) delayed access to care for a sick child under the age of five, so that CHWs can deliver proactive, differentiated care to the people most likely to face barriers to accessing it. So far, this work has produced models of varying accuracy, highlighting the difficulty of predicting outcomes from sometimes sparse or inconsistent data. Nonetheless, the effort has generated important insights into which information is linked to these outcomes and which data holds the potential to improve the models’ performance dramatically. The team concluded that while the models’ predictive performance doesn’t support immediate deployment or scaling at this time, performance across all three seems to be highly dependent on the integrity (completeness and accuracy) of certain types of data. These learnings were passed to Medic Mobile’s data quality and assurance team and have been incorporated into the toolset described above.
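
The post does not say how the team measured accuracy. As an assumption for illustration, the sketch below uses ROC AUC, a common metric for comparing risk models like these: the probability that a randomly chosen household that had the outcome (say, a home birth) was scored as riskier than a randomly chosen household that did not. The `roc_auc` function is a hypothetical helper written directly from that definition, and the scores in the example are made up.

```python
def roc_auc(labels, scores):
    """ROC AUC computed from its pairwise (Mann-Whitney) definition.

    labels: 1 if the household had the outcome (e.g. a home birth), else 0.
    scores: the model's predicted risk for each household, same order.
    Returns the probability that a random positive household outranks a
    random negative one; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up held-out outcomes and scores from two candidate models.
    outcomes = [1, 0, 1, 0, 0, 1]
    model_a = [0.8, 0.3, 0.6, 0.4, 0.2, 0.7]
    model_b = [0.5, 0.5, 0.4, 0.6, 0.3, 0.5]
    print("model A:", roc_auc(outcomes, model_a))
    print("model B:", roc_auc(outcomes, model_b))
```

An AUC of 0.5 means a model ranks households no better than chance, which gives one concrete way to express "varying accuracy" across candidate models. The pairwise loop is quadratic in the number of households; a sort-based implementation is preferable for large datasets.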

Riders for Health: Building Character Recognition Systems to Quickly Digitize Written Records to Save Time & Lives (Nigeria)


(Credit: Riders for Health, photography by Tom Oldham)

An essential and typical task for most FHWs and CHWs is the collection of field data. Recent estimates suggest that globally only about 20% of CHWs generate some form of digital data, and that the majority of frontline health organizations rely on paper-based record-keeping (and will continue to do so for years to come, given that paper is a trusted and low-cost medium). This has major disadvantages, and delays in aggregating and digitizing this data are a sector-wide pain point. Many organizations readily articulate the challenges of working with paper data and its limited utility for making timely, informed decisions or agile system improvements.

There are, however, unique opportunities afforded by advances in machine learning and AI, particularly in image detection and optical character recognition, that make it possible for a data science intervention to quickly digitize written records. To begin with, many standardized forms are in use in international, national, and subnational health systems, for instance for health campaigns (e.g., tuberculosis, vaccinations) and sample transport (e.g., biohazardous or health sample collection), and these could potentially be digitized rapidly after they are completed by hand. To explore this opportunity to enhance healthcare delivery and reach patients faster and more efficiently, DataKind partnered with Riders for Health (Riders).

Riders is an organization that exists to create access to healthcare services for millions of people in the most underserved communities across Africa. It does this through partnerships with Ministries of Health (MOH) in five African countries, where it provides end-to-end transport services that reach nearly 47 million people with essential health services. A critical Riders operation is transporting medical samples for infectious diseases, with a focus on HIV and tuberculosis (and, more recently, COVID-19), from rural health centers to labs where they’re tested and diagnosed. To maintain the quality and integrity of samples through collection, transportation, storage, and analysis, Riders couriers maintain a logbook. They generate more than 1,500 paper-based records every day as they copy information from the MOH-generated forms into this logbook. This laborious process severely limits the volume and frequency of visits couriers can make to health facilities. It also causes sample degradation and delays in diagnosis and treatment. In addition, delays in aggregating and digitizing the data (currently between 30 and 60 days) impede Riders’ ability to make information-driven decisions.

Riders and DataKind explored new and powerful ways to produce digital data from handwritten forms in near real time, specifically using optical character recognition and intelligent character recognition, technologies that have been largely inaccessible to frontline health organizations. Our volunteers used computer vision techniques, existing character recognition APIs, and machine learning to develop a low-cost, open-access, open-source tool that automates the capture of key handwritten entries on common MOH sample transport forms. Through our design process, which included working directly with the Abuja-based Riders team, we’ve been able to design this prototype for the realities of data coverage in LMICs. The prototype has already shown that it could speed up Riders’ data digitization process to under a day. That means couriers could spend less time on data entry and more time transporting patient samples to labs, so more patients could be diagnosed earlier and begin treatment sooner.
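
One design question for a tool like this is what to do with entries the recognizer is unsure about. A common pattern, sketched below under stated assumptions, is to accept a field automatically only when the recognizer’s confidence clears a threshold and the text matches the field’s expected format, and to route everything else to a person for review. The field names, formats, 0.9 threshold, and `triage_fields` helper are hypothetical; the post does not describe the ROCR tool’s internals or the layout of the MOH forms.

```python
import re

# Hypothetical field formats for a sample transport form.
FIELD_PATTERNS = {
    "sample_id": re.compile(r"[A-Z]{2}\d{6}"),
    "collection_date": re.compile(r"\d{2}/\d{2}/\d{4}"),
}


def triage_fields(recognized, min_confidence=0.9):
    """Split OCR output into auto-accepted fields and fields for human review.

    recognized: dict mapping field name -> (text, confidence), where
    confidence is the recognizer's score in [0, 1] (an assumed output shape).
    A field is accepted only if its confidence meets the threshold and its
    text fully matches the expected pattern, when one is defined.
    """
    accepted, review = {}, {}
    for field, (text, confidence) in recognized.items():
        pattern = FIELD_PATTERNS.get(field)
        well_formed = pattern is None or pattern.fullmatch(text) is not None
        if confidence >= min_confidence and well_formed:
            accepted[field] = text
        else:
            review[field] = text
    return accepted, review


if __name__ == "__main__":
    ocr_output = {
        "sample_id": ("AB123456", 0.97),
        "collection_date": ("12/O3/2020", 0.95),  # letter O misread for zero
        "facility": ("Facility 12", 0.60),
    }
    print(triage_fields(ocr_output))
```

Routing uncertain fields to a reviewer, rather than discarding them or accepting them blindly, keeps most of the speed gain while leaving a person in the loop for exactly the entries most likely to be wrong.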


An overview of the Riders Optical Character Recognition (ROCR) tool 

Our first cohort focused on four widely needed capabilities for improving the speed, quality, and appropriateness of care in frontline health systems: building confidence in frontline health data, making healthcare delivery more proactive with predictive models, digitizing frontline health data, and improving FHW training. Each project is producing solutions to systemic challenges. These solutions will not only advance the missions of our individual project partners but also open a pathway to amplified impact for other organizations in the space that can benefit from these outputs. Additional cohorts will allow DataKind to produce even greater impact for frontline health systems as successful solutions are transferred to more frontline health organizations, in more geographies, across additional health priorities. We’re partnering with more organizations in the coming months and will soon share insights and learnings from our work.

We’re dedicated to applying data science and AI to improve global health outcomes. For anyone ready to commit their resources, including time, talent, funds, and expertise, please contact us at . If you haven’t already, sign up to stay in touch at, and stay tuned for volunteer opportunities, virtual events, special requests to our community, and more!

A Note About DataKind’s Support to Our Partners’ COVID-19 Response 

About halfway through this first cohort of projects, COVID-19 erupted across the globe. FHWs became all the more critical to the worldwide response as they worked to interrupt the spread and impact of the virus, maintain essential health services, and protect the most vulnerable. However, as all our partners have experienced, FHWs in LMICs have had their ways of working entirely disrupted. How can you battle disease when your social intervention, a person going door-to-door, is now a vector of transmission?

At DataKind, we recognize that many of the critical interventions needed don’t involve technology. However, the three delays have been amplified across the system, and data science and AI can help health organizations with some of their real-time decision-making and resource allocation challenges. The trust we’ve built with our partners through the Impact Practice has led them to proactively share their shifting priorities, changing needs, and evolving ways we can provide project support. DataKind teams are scoping specific projects where data science and AI not only help in this fight but also extend the reach of frontline health systems for at-scale health interventions in the future. Stay tuned for more updates about these projects!


Mitali Ayyangar leads DataKind’s portfolio of work in strengthening frontline health systems. She helps our partners and pro bono experts successfully develop and execute projects that not only catalyze each organization’s reach and impact, but also produce solutions that are useful and replicable across the global health sector.

¹ The “three delays” framework was initially developed by Thaddeus and Maine to examine factors contributing to maternal deaths, and it has since been widely used and adapted to examine factors contributing to the broad spectrum of preventable deaths at the last mile.

Header image above courtesy of Riders for Health, photography by Tom Oldham
