The Microfinance Information Exchange (MIX) participated in our inaugural Datadive in NYC. The mission of MIX is to promote microfinance transparency through integrated performance information on microfinance institutions (MFIs), investors, networks, and service providers associated with the industry. As part of providing this transparency, MIX collects and disseminates a considerable amount of data on microfinance loans around the world.
To do this, MIX gathers as much publicly available data on microfinance as it can. Unfortunately, as an organization it does not have the resources to collect and store this data systematically. Enter the data scientists.
Unlike some of the projects presented by the other organizations participating in the Datadive, which focused more heavily on visualization and analysis, MIX's work focused on data acquisition. To collect microfinance data, the group first had to identify the websites that would serve as data sources, and then figure out how to scrape each one. To get a sense of the variety among the data sources the group tackled, here is a list of all of the websites the team built scrapers for.
Once each scraper was built, the team used the ScraperWiki tool to automate the collection, and then handed the code over to MIX. All of this code is open source and available through the team's GitHub account.
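To give a flavor of what one of these scrapers does at its core, here is a minimal sketch in Python. It is not the team's code (that lives in their GitHub account), and it leaves out the ScraperWiki side entirely. The `TableParser` class, the `table_to_records` function, and the sample HTML with its column names and figures are all made up for illustration. It assumes the simplest case: a source page that publishes its data as a plain HTML table with a header row.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_records(html):
    """Turn an HTML table into a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample page; real sources vary widely in layout.
SAMPLE_HTML = """
<table>
  <tr><th>institution</th><th>country</th><th>active_borrowers</th></tr>
  <tr><td>Example MFI A</td><td>Kenya</td><td>1200</td></tr>
  <tr><td>Example MFI B</td><td>Ghana</td><td>845</td></tr>
</table>
"""

records = table_to_records(SAMPLE_HTML)
for record in records:
    print(record)
```

Every value comes back as a string, so a real scraper would still need a cleaning step to convert numbers and reconcile column names across sources before the data lands in a standardized format.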
The work done over the weekend was hugely helpful in providing MIX with more access to more detailed data about microfinance institutions. Scott Gaul, the representative from MIX who worked on the team at our Datadive, had this to say about the results:
Our Africa team spent the past nine months finding and scrubbing these datasets to get them into a standardized format. In less than a day, a small team of developers at the Datadive was able to knock off a big portion of the same work, as well as some that we couldn’t tackle at all by hand.
With the caveat that no one on MIX's team was working around the clock on this problem, as we did over the weekend, the point remains: working with data scientists for a weekend helped MIX complete an order of magnitude more work than it had previously.