One weekend, 72 data enthusiasts, three outstanding nonprofits, three interesting data challenges and plenty of lip-smacking good food - these were the key ingredients of the awesome weekend we had at DataKind Bangalore's first ever DataDive on March 21-22, 2015.
We had an overwhelming response from data enthusiasts across the city and even outside Bangalore, including data scientists, developers, project managers, UI developers, economics graduates and even college students.
The three partner organizations were no strangers to DataKind Bangalore, having previously participated in a Project Accelerator Night back in January. Each organization came into the weekend ready to build on its previous work and made such significant progress that all three are now considering long-term DataCorps projects.
Meet the organizations and see the great work now set in motion from this weekend marathon of work!
Founded in 2008, Digital Green is an international nonprofit development organization that builds and deploys information and communication technology to amplify the effectiveness of development efforts and effect sustained social change. They have a series of educational videos on agricultural best practices to help farmers in villages succeed.
Digital Green's huge library of 4000+ videos of practices in agriculture, health and hygiene, and nutrition is currently being screened in around 9000 villages with the help of their local partners. But which videos are best suited to each village? Right now it's up to their local partners to decide, but this process could be automated with a recommendation engine that uses open data on local agricultural conditions to suggest the most relevant content.
Digital Green gave the team a selection of videos and descriptions, each focused on a specific crop. However, because each description was in a different regional language, the team had to parse and interpret this information in order to use it as a descriptive feature for the video. To add another challenge, they needed geodata with the geographical boundaries of different regions to map the videos to a region with specific soil types and environmental conditions, but the data didn’t exist.
The volunteers got to work preparing this dataset, ultimately publishing boundaries of 103,344 Indian villages and geocoding 1062 Digital Green villages in Madhya Pradesh (MP) to 22 soil polygons. They then clustered MP districts into 5 agro-climatic clusters based on 179 feature vectors and mapped the villages that Digital Green works with into these clusters. Finally - one of the highlights from the weekend - the team developed a Hinglish parser that parses the Hindi titles of available videos and translates them to English, helping the recommender system understand which crop each video relates to.
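To give a flavor of the title-parsing idea, here is a minimal sketch of how a crop could be picked out of a romanized Hindi title. It is not the team's parser: it assumes the titles are written in Latin script, and the `CROP_LEXICON` table, the `crops_in_title` function and the example titles are all made up for illustration.

```python
import re

# Hypothetical lookup from romanized Hindi crop words to English names.
# The entries are illustrative only; a real table would be far larger.
CROP_LEXICON = {
    "gehun": "wheat",
    "gehu": "wheat",
    "dhan": "rice (paddy)",
    "chana": "chickpea",
    "makka": "maize",
    "pyaz": "onion",
    "tamatar": "tomato",
}


def crops_in_title(title):
    """Return the English crop names mentioned in a romanized Hindi title."""
    tokens = re.findall(r"[a-z]+", title.lower())
    found = []
    for token in tokens:
        crop = CROP_LEXICON.get(token)
        # Keep first-seen order and skip repeats of the same crop.
        if crop is not None and crop not in found:
            found.append(crop)
    return found


print(crops_in_title("Gehun ki buwai ka sahi samay"))  # ['wheat']
print(crops_in_title("Tamatar aur pyaz ki nursery"))   # ['tomato', 'onion']
```

A plain keyword lookup like this ignores spelling variants and word forms, which is exactly where a fuller parser would earn its keep.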
Digital Green is now exploring a long-term DataCorps project to finish the recommendation engine and incorporate it into their existing system.
Janaagraha was established in 2001 as a nonprofit that aims to combine the efforts of the government and citizens to ensure better quality of life in cities by improving urban infrastructure, services and civic engagement. Their civic portal, IChangeMyCity, promotes civic action at a neighborhood level by enabling citizens to report a complaint that then gets upvoted by the community and flagged for government officials to take action. The site began in Bangalore and will expand to Mumbai, Chennai, Delhi, and Bhubaneswar this year.
Citizens can report on a variety of issues including garbage collection, poor street lighting, potholes, security or even allocation of city funds, but duplicate complaints can clog the system. Janaagraha also wanted to understand which factors were keeping open issues from being closed out.
Every complaint on IChangeMyCity has certain metadata associated with it, including Category, Sub-Category, Title, Description, Geocodes, Address and Complainant details. During the DataDive, the DataKind team came up with scripts that compare two complaints and predict the probability of duplication with 90% accuracy.
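For readers curious what comparing two complaints might look like, here is a rough sketch. It is not the team's script and does not reproduce the accuracy figure: it simply combines word overlap with geographic closeness into a heuristic score between 0 and 1. The field names, the 0.5 km radius, the 0.6/0.4 weights and the sample complaints are all assumptions made for illustration.

```python
import math
import re


def token_set(text):
    """Lowercase a text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def duplicate_score(c1, c2, max_km=0.5):
    """Heuristic score in [0, 1]; higher means more likely a duplicate."""
    # Assumption: complaints in different categories are never duplicates.
    if c1["category"] != c2["category"]:
        return 0.0
    text_sim = jaccard(
        token_set(c1["title"] + " " + c1["description"]),
        token_set(c2["title"] + " " + c2["description"]),
    )
    dist = haversine_km(c1["lat"], c1["lon"], c2["lat"], c2["lon"])
    closeness = max(0.0, 1.0 - dist / max_km)  # 1 at the same spot, 0 beyond max_km
    return 0.6 * text_sim + 0.4 * closeness    # illustrative weights


# Made-up complaints, for illustration only.
a = {"category": "Garbage", "title": "Garbage not collected",
     "description": "Garbage piling up near the park gate",
     "lat": 12.9716, "lon": 77.5946}
b = {"category": "Garbage", "title": "Garbage piling up",
     "description": "Garbage not collected near park gate for a week",
     "lat": 12.9718, "lon": 77.5947}
c = {"category": "Streetlights", "title": "Street light not working",
     "description": "Light near the park gate is broken",
     "lat": 12.9716, "lon": 77.5946}

print(round(duplicate_score(a, b), 2))
print(duplicate_score(a, c))
```

A trained model would learn the weights from labeled duplicate pairs rather than fixing them by hand as this sketch does.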
To figure out which factors were preventing issues from being closed, the team used two approaches. The first used decision trees built on attributes such as comments, vote-ups, agency ID and subcategory. The second used logistic regression to predict closure probability, modeled as a function of complaint subcategory, ward, comment velocity, vote-ups and other similar factors.
Janaagraha is now exploring a long-term DataCorps project to continue this work. They are hoping to develop a real-time solution that can be plugged into the IChangeMyCity platform to identify duplicates on the fly and combine them so officials see them as one issue affecting multiple citizens. The team will also identify patterns in the complaints in terms of their level of severity and the potential impact that would result from their resolution.
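As a rough illustration of the second approach, here is a small logistic regression fitted by gradient descent in plain Python. It is a sketch under simplifying assumptions, not the team's model: it uses only two numeric features (comment velocity and scaled vote-ups), the data is synthetic, and categorical inputs such as subcategory or ward would first need to be encoded as numbers.

```python
import math


def sigmoid(z):
    """Map a real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """Predicted closure probability; weights[0] is the bias term."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return sigmoid(z)


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict_proba(weights, x) - y
            grad[0] += error
            for j, value in enumerate(x):
                grad[j + 1] += error * value
        # Step against the average gradient over all rows.
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Synthetic rows: [comments per day, vote-ups / 10]; label 1 means "closed".
rows = [[0.1, 0.0], [0.2, 0.1], [0.3, 0.2], [1.5, 0.8], [2.0, 1.2], [1.8, 0.9]]
labels = [0, 0, 0, 1, 1, 1]

weights = fit_logistic(rows, labels)
print(predict_proba(weights, [2.0, 1.0]))  # busy complaint
print(predict_proba(weights, [0.1, 0.0]))  # quiet complaint
```

The appeal of this kind of model is that each fitted weight indicates how a factor pushes the closure probability up or down.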
Teach for India started out in 2009 as a nationwide movement of outstanding college graduates and young professionals committing two years to teach full-time at under-resourced schools and working towards equity in education. Teach for India is active in seven cities, with a network of 910 fellows and 660 alumni.
Even though the number of staff leaving might seem small, it is often tough to get qualified and committed staff members on board for a long period. Teach for India wanted to understand the attrition rate of their operational staff and build early indicators to prevent staff turnover.
The team had access to the exit survey results of people who had left Teach For India as well as other HR data on existing and former staff. The team first performed exploratory analysis of the data to get a better understanding of which attributes to look into and which model would be best suited to the problem. They then clustered the data on various related features that might be contributing to staff attrition and created a decision tree model to predict the probability of someone leaving the organization given their survey responses and other attributes like age, educational background, position or city.
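For a sense of what sits at the heart of a decision tree, the sketch below finds the single best split in a handful of staff records using Gini impurity; a full tree repeats this step on each resulting group. It is not the team's model. The records are synthetic, and the field names and values (`position`, `city`, `left`) are placeholders chosen for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_split(records, features, target):
    """Return (feature, value, weighted_gini) for the best equality split."""
    best = None
    for feature in features:
        # Sorted so that ties are always broken the same way.
        for value in sorted({r[feature] for r in records}):
            left = [r[target] for r in records if r[feature] == value]
            right = [r[target] for r in records if r[feature] != value]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(records)
            if best is None or score < best[2]:
                best = (feature, value, score)
    return best


# Synthetic records, for illustration only.
records = [
    {"position": "ops", "city": "city_a", "left": True},
    {"position": "ops", "city": "city_a", "left": True},
    {"position": "program", "city": "city_a", "left": True},
    {"position": "program", "city": "city_b", "left": False},
    {"position": "ops", "city": "city_b", "left": False},
    {"position": "program", "city": "city_b", "left": False},
]

print(best_split(records, ["position", "city"], "left"))
```

In this toy data the city separates leavers from stayers perfectly, so it wins the split; real HR data would be far messier.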
Teach for India is now exploring a long-term DataCorps project with DataKind Bangalore on a new topic - seeing how data science can inform their international fundraising strategy.
At the end of two days, the teams presented results in front of all the participants, got some amazing feedback from everyone and ended the day on a high note with some delicious tea, coffee and bondas.
These initial findings are hugely promising, especially when you consider that the DataDive was just one important step in a larger journey for these organizations as they use data science to transform their work.
See where their journeys head next and stay tuned for more updates as the next phase of work begins.
Finally, if you're local - please join us! We'd love to see you at our next event.