- Build replicable tools that easily scrape and assemble data.
- Identify key sources of data for MIX that will help them enhance microfinance across Africa.
- With the right squad of data scientists, a cumbersome process could be made pretty darn easy, relatively speaking.
Microfinance Information Exchange, aka MIX, is an organization that helps make microfinance possible by hunting down and disseminating reliable, trustworthy financial information to the key players in that world. In other words, data is its mission.
Microfinance institutions, funders, networks, and service providers depend on a source of transparent data on which to base their decisions. MIX is that source, but for some time they scraped data in a way that was less than efficient. And they knew it.
So MIX wondered: Which websites had the best data for the performance information they needed to provide? And how could MIX scrape those sites in a fast, effective manner that would make their information more exhaustive and responsive?
MIX’s Scott Gaul came to the DataDive in NYC in October 2011 because MIX didn’t have the resources to nail down a better way to scrape the wide range of sources of African microfinance data.
In a 24-hour blitz, a team of data rock stars, led by Max Shron, automated the grabbing of data from a slew of websites across Africa, from Kenya to Rwanda to Mozambique. The team built a scraper for each site, automated collection using the ScraperWiki tool, and, of course, did all this with open-source code so others could benefit, too.
Scott Gaul was mightily impressed: "Our Africa team spent the past nine months finding and scrubbing these datasets to get them into a standardized format. In less than a day, a small team of developers at the DataDive was able to knock off a big portion of the same work, as well as some that we couldn’t tackle at all by hand."
MIX’s knowledge base has improved drastically. They’re now in a position to move forward on their Africa project at a speed that simply wasn’t possible before.