Report Back from DataKind DC’s Sixth DataDive

June 26, 2017

DataKind DC hosted its sixth DataDive this past March, partnering with four organizations: Kiva USA, Global Financial Integrity (GFI), Freedom House, and Catholic Charities USA. The goal was to leverage data science to advance their missions. Over the course of the weekend, teams of volunteers identified valuable insights from the data. Volunteers have even continued working with Freedom House and Catholic Charities USA beyond the DataDive to develop tools for each organization. Check out highlights from the weekend and learn about how you can get involved with DataKind DC below!

 

Kiva USA 

Kiva works to alleviate poverty by allowing donors to lend money through interest-free loans to low-income small business owners. The team, led by Data Ambassadors Jonathan Joa, Rachel Wells, Rajit Kavindran and Kyle Ogilvie, analyzed Kiva’s data to explore what types of loans and borrowers are prone to partial loan repayments. They found delinquency rates are highest in the two to four payment period, and that the first 90 days are sufficient to identify defaulting accounts (as shown below).

The team also sought to predict which loans are at risk of low repayment and to optimize the loan amount offered in order to maximize repayment likelihood and social impact. For example, they found that businesses with eBay accounts were more likely to default.

All insights were shared in a final report with the Kiva USA partner representative. The findings reinforced many of the trends Kiva USA was already seeing and helped them think differently about how they might intervene with borrowers and ultimately support even more small businesses.

 

Global Financial Integrity

Global Financial Integrity (GFI) is working to curtail illicit financial flows by providing better advice to and advocacy for affected countries.  They wanted to collaborate with DataKind DC volunteers in analyzing price anomalies in trade transactions (the most common way to siphon illicit capital out of an economy). The challenge during the DataDive was to identify when goods entering and leaving developing countries were mis-invoiced. While GFI had developed tools for customs officials to determine if a given shipment was an outlier, they had not yet mined the data to look for specific trends. Led by Data Ambassadors Andrew Brooks, Margaret Furr, Minh Mai, and William Ratcliff, the DataKind DC divers assembled a list of the 20 most anomalous commodity transactions. In order for GFI to help customs departments collect more revenue and alleviate poverty, the future goal is to use more sophisticated techniques for outlier detection and expand the scope of countries.  Below, we show some examples of the outliers we found.

Finding #1: Identifying Outliers in Ethiopian Coffee Exports

 

In this figure, we show the monthly variation in the price of Ethiopian coffee. In red, we show exports from Ethiopia and in blue, we show imports. In green, we highlight outliers.  
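One simple way to flag price outliers like those highlighted in the figure is a robust score based on the median absolute deviation (MAD). The sketch below is illustrative only and is not the method the DataDive team used; the function name, the sample prices, and the 3.5 cutoff (a common convention for modified z-scores) are all assumptions.

```python
from statistics import median


def flag_price_outliers(unit_prices, threshold=3.5):
    """Return indices of prices whose modified z-score exceeds the threshold.

    Uses the median and the median absolute deviation (MAD), which are less
    distorted by extreme values than the mean and standard deviation.
    """
    med = median(unit_prices)
    abs_dev = [abs(p - med) for p in unit_prices]
    mad = median(abs_dev)
    if mad == 0:
        # Prices are (nearly) constant, so the score is undefined.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical monthly unit prices (USD per kg), invented for illustration.
monthly_prices = [4.1, 4.3, 3.9, 4.0, 4.2, 9.8, 4.1, 4.4, 0.6, 4.0]
outlier_months = flag_price_outliers(monthly_prices)
print(outlier_months)
```

A median-based score is a reasonable starting point here because a handful of mis-invoiced shipments would otherwise inflate the mean and standard deviation and hide themselves.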

Top Outliers

 

  • Polyether Alcohols
  • Article of Vulcanised Rubber
  • Uncoated Paper and Paperboard
  • Pallets and Pallet Collars
  • Paper and Paperboard Used for Writing
  • Heat Pumps
  • Agricultural or Horticultural Watering Appliances
  • Glazed Flags and Paving, Hearth or Wall Tiles and Mosaic Cubes
  • Beer made from Malt
  • Phosphates of Calcium
  • Rafts, Tanks, Coffer-Dams, Landing Stages, Buoys, Beacons and Other Floating Structures
  • Float Glass and Surface Ground and Polished Glass
  • Electric Conductors for a Voltage
  • Flat Rolled Products of Stainless Steel
  • "Sheets of Iron or Non-Alloy Steel, Cold-Formed or Cold Finished, Profiled, 'Ribbed'"
  • Lamp Holders

 

  • Building Bricks
  • Fresh or Chilled Boneless Cuts of Fowls of the Species Gallus Domesticus
  • Tower Cranes
  • "Digital Versatile Discs 'DVD'"
  • Test Benches for Motors, Generator, Pumps etc
  • Waste and Scrap of Stainless Steel
  • Salt, Denatured or for other Industrial Uses, incl. Refining
  • Zinc Waste and Scrap
  • Groundnut, Cotton-Seed, Soya-Bean or Sunflower-Seed Oil and their fractions
  • Parts and Accessories for Apparatus and Equipment for Photographic or Cinematographic
  • Waste and Scrap of Copper Alloys
  • Soya-based Beverages
  • Molluscs, fit for Human Consumption
  • Lactose in Solid Form and Lactose Syrup
  • Portland Cement
  • Slaked Lime   


Finding #2: High Variance in Unit Price of Coffee By Importing Country

In the top panel, we show how the unit price of coffee exported from Ethiopia varies depending on the importing country. We present similar results in the middle and bottom panels for exports of coffee from Ghana and Kenya respectively.
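A comparison like this can be approximated by computing the unit price of each shipment (value divided by quantity) and summarizing it per importing country. The following is a minimal sketch under assumed inputs: the record layout, the placeholder country labels, and the figures are hypothetical and do not come from GFI's data.

```python
from collections import defaultdict
from statistics import mean, pstdev


def unit_price_by_importer(shipments):
    """Summarize unit price (value / quantity) for each importing country.

    `shipments` is an iterable of (importer, value_usd, quantity_kg) tuples.
    """
    prices = defaultdict(list)
    for importer, value_usd, quantity_kg in shipments:
        if quantity_kg > 0:  # skip records with no usable quantity
            prices[importer].append(value_usd / quantity_kg)

    summary = {}
    for importer, values in prices.items():
        avg = mean(values)
        summary[importer] = {
            "mean": avg,
            # Coefficient of variation: spread relative to the mean price.
            "spread": pstdev(values) / avg if avg > 0 else 0.0,
            "n": len(values),
        }
    return summary


# Hypothetical shipment records, invented for illustration.
shipments = [
    ("Country A", 400.0, 100.0),
    ("Country A", 800.0, 200.0),
    ("Country B", 200.0, 100.0),
    ("Country B", 600.0, 100.0),
]
summary = unit_price_by_importer(shipments)
print(summary)
```

In this toy data both importers pay the same average price, but only the second shows a wide spread, which is the kind of pattern that might merit a closer look.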

 

Freedom House

An interactive tool built by DataKind DC volunteers that enables Freedom House to select multiple variables to be displayed simultaneously, allowing better understanding of data on civil liberties and political rights around the world.

Freedom House (FH) produces the annual Freedom in the World (FIW) index, a yearly survey and report that measures the degree of civil liberties and political rights around the world. This work is a valuable source for activists and organizations that defend human rights and promote democratic change. Each country’s overall score is available on their website, but the 30 sub-indicators underlying each country’s total score in the FIW index are not as easily accessible or usable. At the DataDive, FH worked with DataKind to help organize and visualize the FIW index and sub-indicator data to make the historical data easier to access, review, and compare, in order to inform strategic planning for their programs and to help human rights organizations make better use of this data. Led by Alex Spancake, Chloe Gordon, and Arati Krishnamoorthy, volunteers used Bokeh, a Python interactive visualization library, to produce the interactive visualization shown above, which incorporates the sub-indicator data provided by Freedom House as well as five additional sources, such as the World Bank. Users can select from an array of economic, demographic, and freedom variables to analyze relationships across countries, regions, and over time. The tool will be used to compare countries’ scores and indicators in order to put pressure on governments to improve governance, rule of law, and human rights.

Since the DataDive, the team has continued to work with Freedom House to post the tool on Freedom House's website, increase interactivity of their current D3 map, assess the predictive power of different variables on freedom scores, and automate the data preparation and collection process.

 

Catholic Charities USA

Catholic Charities USA (CCUSA) provides disaster assistance to individuals and families before, during, and after a tragedy hits. They provide the holistic, compassionate care that helps individuals recover and move forward. CCUSA wanted to create a map to better target mitigation, preparedness, relief, and recovery projects in order to best serve communities that are at greatest risk for disasters and are most overlooked by, or outright excluded from, federal assistance during disasters. The team, led by Rich Carder and Jake Snyder, used the CDC's Social Vulnerability Index and a proprietary natural disaster dataset (generously donated by ATTOM) to develop this map using Mapbox, R, and D3.js. Volunteers created the alpha version of the Disaster Operations map during the DataDive, and it continues to develop and improve as an open source tool.

 

Working version of the CCUSA Disaster Operations Map.

During the DataDive, we were lucky to have a small but incredibly talented team of volunteers, some of whom have remained on board in the months since. In addition, the project benefitted from volunteers at Code for DC hack nights, making this a truly collaborative project with input from our amazing DC volunteer community!

 
CCUSA volunteers working at the DataDive: Lukas Martinelli, Adil Yalcin, Rich Carder, Jake Snyder, Alex Wasson 

The team is using GitHub to collaborate on further developments. If you are interested in contributing or learning more, the source code is located at this GitHub page.

 

Thank Yous 

Thank you to our DataDive host, Social Tables, for continuing to host us in their beautiful space. A huge thank you to our Data Ambassadors and volunteer teams that donated their time and talent to help four project partners use data to improve the world. 

Join Us!

We would love to see you at the next DataDive or Meetup! Join our Meetup group or follow us on Twitter for the latest news of how you can get involved.
