DataKind Bangalore's Second DataDive

December 11, 2015

Guest post by DataKind Bangalore volunteer Sandeep Mertia 

DataKind Bangalore organised its second DataDive on 5-6 December, at ThoughtWorks Technologies. The event was centred on solving data problems for the four partner organisations – Centre for Budget and Governance Accountability (CBGA), Daksh, eGovernments Foundation and Rang De.

As a newbie DataKind volunteer from Delhi, I was pleasantly surprised to see data enthusiasts turn up in large numbers on a weekend for a community-driven, two-day sprint to solve data problems for social sector organisations and NGOs.

After a round of introductions to the organisations and their data problems, we divided ourselves into four teams, each dedicated to one organisation, based on the required skill sets. The teams included people who had worked on the organisations’ data problems in previous data sprints. We were also joined by representatives from the four partner organisations, who took an active part in explaining the nature of the problems, the expected solutions, and the pros and cons of different approaches.

Briefly, the four data problems were:

  • CBGA – to extract budget data, including tables, from PDF budget documents, convert it to CSV format, and then visualise the last ten years of union and state budget data to make the budget information accessible to non-experts
  • Daksh – to organise and visualise legal data from all the courts, as released by the Supreme Court, High Courts and Government of India, to communicate a better understanding of the judicial system
  • eGovernments Foundation – to use the last four years of data from the Chennai municipal corporation’s public grievance portal to predict trends and generate ward-level alerts for better urban governance
  • Rang De – to create a recommendation engine for Rang De’s social investors, based on the user and web analytics data collected over the years, for better allocation of microcredit among their borrowers

All teams started by breaking their data problem down into sub-problems and tasks, which individual members or sub-teams then took charge of. As we proceeded, new aspects of each problem emerged, which led to a two-pronged approach – a) establish a tangible, working solution during the DataDive, and b) create a roadmap for longer-term solutions.

For instance, on the eGovernments Foundation team, we discovered that a system which raises an alert only when complaints spike cannot capture the importance of complaints that hold at a steady number throughout the year. We also realised that relying on a previous year’s ‘mean’ won’t work well, since the number of users is expected to rise significantly in the future. This led us to create two different types of alerts – a) prediction of a spike in certain complaints, based on comparisons against thresholds, and b) trend analysis based on the cumulative sum of complaints. While we presented our basic prediction model at the end of the second day, we now have a long list of features to add to the solution.
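To make the two alert types concrete, here is a minimal Python sketch. It is not the model the team presented; the function names, the rolling window, the multiplier `k` and the complaint categories are all assumptions chosen for illustration. The spike check compares each period against a threshold built from the most recent periods rather than a previous year’s mean, so that it can follow a growing user base, while the cumulative check flags categories whose running total is large even when no single period spikes.

```python
from itertools import accumulate
from statistics import mean, pstdev


def spike_alerts(counts, window=4, k=2.0):
    """Return indices of periods whose count exceeds a rolling threshold.

    The threshold is mean + k * standard deviation of the preceding
    `window` periods. Both `window` and `k` are illustrative assumptions.
    """
    alerts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        threshold = mean(history) + k * pstdev(history)
        if counts[i] > threshold:
            alerts.append(i)
    return alerts


def cumulative_trend(counts):
    """Return the running (cumulative) sum of complaints per period."""
    return list(accumulate(counts))


def steady_load_alerts(counts_by_category, total_threshold):
    """Return categories whose cumulative total reaches `total_threshold`.

    This catches complaint types that never spike but add up over time.
    """
    return sorted(
        category
        for category, counts in counts_by_category.items()
        if cumulative_trend(counts)[-1] >= total_threshold
    )


if __name__ == "__main__":
    # Hypothetical weekly complaint counts for one ward.
    weekly = {
        "garbage": [20, 20, 20, 20, 20],      # steady, never spikes
        "streetlight": [1, 2, 1, 0, 9],       # low volume, one spike
    }
    for category, counts in weekly.items():
        print(category, "spikes at", spike_alerts(counts))
    print("steady load:", steady_load_alerts(weekly, total_threshold=50))
```

In this toy data the steady ‘garbage’ complaints never trigger the spike check but do cross the cumulative threshold, which is the gap the second alert type is meant to cover.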

Towards the end, as we were preparing for our presentation, the representative from eGovernments Foundation exclaimed that he had never thought so many things could be done with the data they had collected! Perhaps this was a good indicator of the progress we had made over the two days.

It was interesting to observe how a group of people who were largely strangers to one another could systematically brainstorm over a problem and divide time and resources to come up with solutions and ideas with potential social impact. In fact, our unfamiliarity with the problem and with each other led to a certain open-endedness that allowed for better learning and contextual adaptation.

In the midst of all this work, we were joined by the DataKind UK team via video conference to exchange notes on a similar DataDive being organised there. We were also addressed by Viral Shah, co-creator of the Julia programming language, on better possibilities in analytics work using Julia, and by S. Anand, Chief Data Scientist at, on the dynamics of data-driven storytelling and visualisations.

Amidst all the code crunching and perspective sharing, some of us made sure we didn’t miss out on the fun! One cool thing I learned is that it takes N + 1 data scientists to change a lightbulb: N to change it and 1 to predict ‘N’. And, in case you haven’t already guessed, the final presentation from team Rang De was titled ‘Basanti’! Puns are a key part of any DataKind event.

Events like these mark the emergence of a new civics of data, which is simultaneously a global technology trend and a re-invention of many local, non-technical civil society movements in India. With generative events like DataDives for playful, community-driven socio-technical engagements, the ‘Data for Good’ movement in India is moving into a new zone where ‘data’ seems to potentiate interfaces between experts, governments and civil society like never before.

Sandeep is an ICT engineer, ethnographer, and a Research Associate at The Sarai Programme, Centre for the Study of Developing Societies.