DataKind DC DataDive Projects

March 11, 2013

We’re buzzing with excitement over at DataKind about the six excellent projects we’re scoping out for our DataDive with the World Bank, UNDP, QCRI, and others next weekend!  Each project focuses on either using creative data techniques to alleviate poverty or combating fraud and corruption in international development projects.  Granted, we don’t expect to fully solve these problems in a single weekend, but we’ve identified six projects that could make a huge difference.

We’ve listed each project’s goal, the data available, and the skills each could use.  Note that anyone with any skill set will be useful on any project (seriously!) so the skills section merely lists specific abilities that would be useful for each project.

Please note that these are rough project descriptions and that the DataDive is an experimental pilot.  The project descriptions do not necessarily correspond to World Bank priorities or reflect the current work underway in the relevant themes.

Measuring Socioeconomic Indicators in Arabic Tweets (Poverty)

Goal:
Determine whether socioeconomic indicators can be identified by observing conversations in Arabic on Twitter.  Examples include listening for poverty terms or human development phrases such as “no medicine”, “bankrupt”, or “bad education”.

Datasets Available:
• 10 GB of Arabic tweets from February 2012 onward.
• An English-to-Arabic translation of key socioeconomic terms.

Skills:
• Arabic fluency (translators may be available)
• Natural language processing
• Time series analysis
• Data processing skills (up to 10GB)
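
To make the goal above concrete, here is a minimal sketch of "listening" for poverty terms: it counts, per day, how many tweets contain each term. Everything in it is an assumption for illustration. The function name `daily_term_counts`, the `(timestamp, text)` input shape, and the two Arabic phrases are hypothetical stand-ins, not the actual dataset format or the translated term list. Plain substring matching also ignores Arabic spelling variants and diacritics, which a real pass would need to normalize.

```python
from collections import Counter
from datetime import datetime

# Illustrative placeholder terms only; the real list would come from the
# English-to-Arabic translation of key socioeconomic terms.
TERMS = {
    "no medicine": "لا يوجد دواء",
    "bankrupt": "مفلس",
}


def daily_term_counts(tweets, terms):
    """Count tweets per (day, English term) that contain the Arabic phrase.

    `tweets` is assumed to be an iterable of (ISO timestamp, text) pairs.
    """
    counts = Counter()
    for timestamp, text in tweets:
        day = datetime.fromisoformat(timestamp).date()
        for english, arabic in terms.items():
            # Naive substring match; no normalization of Arabic variants.
            if arabic in text:
                counts[(day, english)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("2012-02-03T10:15:00", "الصيدلية فارغة، لا يوجد دواء"),
        ("2012-02-03T18:00:00", "أنا مفلس"),
        ("2012-02-04T09:00:00", "صباح الخير"),
    ]
    for (day, term), n in sorted(daily_term_counts(sample, TERMS).items()):
        print(day, term, n)
```

The resulting daily counts are the kind of series that the time series analysis skill listed above would then work with. Because the function consumes tweets one at a time, it could be fed from a file reader without loading all 10 GB into memory.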

Combining and Analyzing the World Bank’s Project Data for 'Signals' (Fraud and Corruption)

Goal:
To combine all open information about a single World Bank project into one source in order to identify signals in the data. Questions might include, for example, whether contractors cluster in informative ways. For a related example, see http://europeandcis.undp.org/blog/2013/01/31/big-data-and-development-organizations-what-happens-when-you-move-from-theory-to-practice/

Datasets Available:
• The open data on data.worldbank.org.
• An initial combination of some of the data provided by our Data Ambassador Taimur, who worked on the project during Open Data Day.

Skills:
• Data wrangling
• Exploratory analysis skills

Analyzing World Bank Supplier Profiles (Fraud and Corruption)

Goal:
To analyze detailed profiles of World Bank suppliers to better understand their relationships and identify potential for fraud in contracts.  Automated methods could be developed to identify, for example, companies whose phone numbers map to uninhabited regions, companies that share a phone number or address with entities known to be high risk, or companies that bid together on multiple projects.
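
Two of the checks mentioned above, shared contact details and repeated co-bidding, can be sketched in a few lines. This is only an illustration under assumed inputs: the record fields (`id`, `phone`, `address`), the function names, and the `min_projects` threshold are all hypothetical, not the actual structure of the supplier data. Real data would also need phone numbers and addresses cleaned into a common format before exact matching is meaningful.

```python
from collections import defaultdict
from itertools import combinations


def shared_contact_flags(suppliers, high_risk_ids):
    """Return ids of suppliers sharing a phone or address with a high-risk id.

    `suppliers` is assumed to be a list of dicts with 'id', 'phone', 'address'.
    """
    by_contact = defaultdict(set)
    for supplier in suppliers:
        for field in ("phone", "address"):
            value = supplier.get(field)
            if value:
                by_contact[(field, value)].add(supplier["id"])

    flagged = set()
    for ids in by_contact.values():
        # Flag everyone in a group that contains a known high-risk entity.
        if ids & high_risk_ids:
            flagged |= ids - high_risk_ids
    return flagged


def frequent_co_bidders(bids, min_projects=2):
    """Return supplier pairs that bid together on at least `min_projects`.

    `bids` is assumed to map a project id to the set of bidding supplier ids.
    """
    pair_counts = defaultdict(int)
    for bidders in bids.values():
        for pair in combinations(sorted(bidders), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_projects}


if __name__ == "__main__":
    suppliers = [
        {"id": "A", "phone": "111", "address": "1 Main St"},
        {"id": "B", "phone": "111", "address": "2 Oak Ave"},
        {"id": "C", "phone": "222", "address": "3 Elm Rd"},
    ]
    print(shared_contact_flags(suppliers, {"A"}))
    print(frequent_co_bidders({"p1": {"A", "B", "C"}, "p2": {"A", "B"}}))
```

A flag from either function is a prompt for a closer look rather than evidence of fraud; legitimate suppliers can share an office building or bid on the same projects for ordinary reasons.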