Microfinance Data Scraping from MIX
May 24, 2012


When DataKind held our first DataDive last October in NYC, we had the pleasure of working with MIX as one of our non-profit partners.  Volunteer data scientists and MIX staff worked together to dig into financial service data in Africa using web scrapers and found a wealth of helpful information. We're thrilled to share a guest post from our partners at MIX about the entire process.  Enjoy!

The poor have complex and sophisticated financial lives, but we lack data on many of the services that they use to meet basic needs. Even very basic data can yield insights on how to expand services or whether providers are targeting the right needs.

A wealth of data on financial institutions is publicly available, but is ‘hidden’ in different places and ‘locked up’ in hard-to-use formats. If we want to build a realistic picture of the financial options available to the poor, we have to look for ways to find and unlock this data.

There are two basic ways that have been used to measure the landscape of financial services in the past:

    • Send out a questionnaire: this is easy for the surveyor, but a burden on the respondents. It’s hard to get much breadth or depth without strong incentives (or a mandate) to report.

    • Hire surveyors on the ground: this may be easier for the respondents, but is costly and time consuming and also hard to repeat.

We took a third path: web scraping. A web scraper is a computer program that visits a website, finds the particular data of interest, and saves the data in a more structured format. Web scrapers can extract data from existing websites without placing any burden on respondents or surveyors. Additionally, the scraper scripts can be shared, rerun, and improved by others.
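As a rough illustration of that definition, here is a minimal sketch in Python using only the standard library. The HTML snippet, the `branch`, `name`, and `town` class names, and the `BranchParser` class are all hypothetical; they are not taken from the scrapers built for this project, and a real scraper would first download the page rather than read it from a string.

```python
from html.parser import HTMLParser

# Hypothetical branch listing, standing in for a page a scraper would download.
SAMPLE_HTML = """
<ul>
  <li class="branch"><span class="name">Main Street Branch</span><span class="town">Nairobi</span></li>
  <li class="branch"><span class="name">Lake Road Branch</span><span class="town">Kisumu</span></li>
</ul>
"""


class BranchParser(HTMLParser):
    """Collect one dict per <li class="branch"> element."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None  # the row currently being built, if any
        self._field = None    # the field whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "branch":
            self._current = {}
        elif tag == "span" and self._current is not None and css_class in ("name", "town"):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a field we care about.
        if self._field is not None:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.rows.append(self._current)
            self._current = None


def scrape_branches(html):
    """Turn the raw HTML into a list of structured records."""
    parser = BranchParser()
    parser.feed(html)
    return parser.rows


branches = scrape_branches(SAMPLE_HTML)
```

Here `branches` ends up as a list of dictionaries, one per branch, which is the "more structured format" that later steps can combine and map.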

Financial institutions need to make good information available for their customers, such as through online branch listings, but to date no one has used these readily available, public sources of data for mapping. Could web scraping help unlock some of this data?

MIX participated in the first DataDive organized by DataKind in order to find out. A fantastic team of data scientists set up several web scrapers over the course of a weekend, work that would have taken the MIX team many more hours to do manually.

To take this further, MIX and Thomas Levine then collaborated to map the complete financial sector in three countries in Africa, with maps and visualization by Development Seed. The end result is data on over 60,000 points of service in South Africa, Kenya and Rwanda (the latter two coming soon), geo-coded to individual towns, then consolidated and mapped for easy access. You can read more about the first round of data on South Africa here or here.

We started with a list of public data sets for each country, generally either lists of branches or mobile banking agents or databases from regulators or networks. Tom then created scrapers for each and housed them publicly on ScraperWiki, with instructions for use and maintenance.

Each scraper had a standard output so that data from different scrapers could be combined; these data are accessible for each country as a Fusion Table or CSV from the website. Once the data were consolidated, the results were geo-coded (with help from others) so they could be plotted on a map for easy navigation and access to data.
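The idea of a standard output can be sketched as follows. This is only an assumed schema: the post does not list the actual column names, so `FIELDS`, `to_standard`, and the example providers are hypothetical stand-ins.

```python
import csv
import io

# Hypothetical shared schema; the project's real column names are not given here.
FIELDS = ["country", "provider", "service_type", "name", "town"]


def to_standard(record, country, provider, service_type):
    """Map one scraper-specific record onto the shared schema."""
    return {
        "country": country,
        "provider": provider,
        "service_type": service_type,
        "name": record.get("name", ""),
        "town": record.get("town", ""),
    }


def combine(*batches):
    """Concatenate the standardized output of several scrapers."""
    rows = []
    for batch in batches:
        rows.extend(batch)
    return rows


def write_csv(rows):
    """Serialize the combined rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Output of two made-up scrapers, each already converted to the shared schema.
bank_rows = [
    to_standard({"name": "Main Street Branch", "town": "Nairobi"},
                "Kenya", "Example Bank", "bank branch"),
]
agent_rows = [
    to_standard({"name": "Corner Kiosk", "town": "Kisumu"},
                "Kenya", "Example Mobile Money", "mobile banking agent"),
]

combined_csv = write_csv(combine(bank_rows, agent_rows))
```

Because every scraper emits the same columns, combining sources is a simple concatenation, and the geo-coding step only has to understand one format.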

Scrapers have some good properties relative to surveys and questionnaires. The data on the websites are subject to review by customers of the various financial institutions, who need to know where their local branch or office is. The scrapers themselves can be subject to peer review; by using and publishing the computer scripts, we show our work. The scrapers can also be rerun (as long as the site’s layout doesn’t change), meaning that we can regenerate the data on demand and track changes in the data over time.
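Tracking changes between two runs of the same scraper could look something like the sketch below. It assumes each run yields a list of records and that a service point can be identified by its provider, name, and town; that key, the `diff_runs` function, and the sample records are illustrative assumptions rather than the project's actual method.

```python
def diff_runs(previous, current, key=("provider", "name", "town")):
    """Compare two scraper runs and return (added, removed) service points."""

    def index(rows):
        # Identify each record by the chosen key fields.
        return {tuple(row[k] for k in key): row for row in rows}

    before, after = index(previous), index(current)
    added = [after[k] for k in after if k not in before]
    removed = [before[k] for k in before if k not in after]
    return added, removed


# Two made-up snapshots of the same listing, taken at different times.
first_run = [
    {"provider": "Example Bank", "name": "Main Street Branch", "town": "Nairobi"},
    {"provider": "Example Bank", "name": "Lake Road Branch", "town": "Kisumu"},
]
second_run = [
    {"provider": "Example Bank", "name": "Main Street Branch", "town": "Nairobi"},
    {"provider": "Example Bank", "name": "Harbour Branch", "town": "Mombasa"},
]

added, removed = diff_runs(first_run, second_run)
```

In this example `added` holds the Mombasa branch and `removed` holds the Kisumu branch, the kind of opening and closing that repeated runs could surface over time.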

Scrapers are not a panacea for all hard-to-reach data. We ran into challenges when institutions had poor or messy location data on their websites. Some more ‘grassroots’ providers don’t have a strong online presence, and MIX had to find data offline. However, we always had recourse to more traditional data collection methods when we encountered such issues. Using the existing public data now can also reinforce the need for improvements in the long run.

What do these data tell us? The biggest (somewhat unexpected) result was data on the ‘invisible market’ of financial services providers beyond banks and MFIs. The National Credit Regulator in South Africa covers not just banks and MFIs, but also payday loans, vehicle finance, and retail credit: anyone that provides credit in the country. Not only did their database yield more than 10 times the coverage of other major public databases, it also helped to shed light outside the streetlight. In South Africa, we can now see that cash loan shops (akin to payday lenders) are most prevalent in high-poverty areas, and that developmental providers like MFIs are just a drop in the bucket for the sector overall. In Kenya, we’re learning that there are more than 17 times as many M-Pesa mobile banking agents as there are bank branches in the entire country. In Rwanda, cooperatives seem to be the most prevalent providers outside the main urban centers.

These data would have taken months or years to collect by hand, but web scraping unlocked them within a matter of weeks and gives us a path to keeping them up to date in the future. Take some time to explore the data and scrapers to see what else we can learn.

