The energy was palpable as over 70 data do-gooders poured in after work or classes on the Friday of our inaugural DataDive, held April 24-26. (Special thanks to Fuji Xerox Singapore for being our venue sponsor and donating the perfect space for the weekend!)
Of all the findings uncovered during the weekend, we're especially happy to report that, contrary to stereotypes, geeks are indeed highly social creatures. Everyone had a good time getting to know new people while contributing to the Data-for-Good movement and helping two incredible organizations, Earth Hour and the Humanitarian Organisation for Migration Economics (HOME), use data science to advance their missions.
R or Python?
We kicked off the weekend with an introduction to DataKind and a description of what a DataDive is (hint: not just a hackathon!).
Next, Richard Brock from Earth Hour took to the stage to explain their mission to raise awareness about climate change and inspire people across the world to take action for the planet. The eager crowd of volunteers was presented with the challenge of identifying key influencers in the Earth Hour supporter base, as well as understanding what their supporters care about. From a data perspective, the big draw of the Earth Hour project was the opportunity to work with large amounts of social media data and delve into network analysis.
We then heard from Jolovan Wham, the executive director of the Humanitarian Organisation for Migration Economics (HOME). He described how his organisation serves low-wage migrant workers in Singapore by providing social services such as shelter and legal aid to workers in need. His appeal for specific insight into the issues faced by migrant workers who approach HOME for help, ranging from abuse to financial disputes, touched many of those present.
Motivated by (more than) the promise of breakfast, volunteers streamed in bright and early on Saturday morning. Project leads rallied their troops with calls of “R or Python?” echoing across both rooms. Non-coders were not left out either, with plenty of opportunities to help out in other ways. After a more in-depth explanation of the data and some initial ideas to get the creative juices flowing, it was finally time to whip out those laptops and start diving in! Interestingly, the mood in each room seemed to reflect the nature of the data being explored: the group working with social media data kept up a constant buzz of conversation, while the atmosphere was more serious in the group working to uncover trends in the mistreatment of migrant workers.
As the world’s largest grassroots movement for the environment, Earth Hour is about people and their potential to change climate change. The insights we gained from the DataDive highlight the greater impact we can have by becoming a data-driven organization.
- Sid Das, Executive Director, Earth Hour Global
Even before the DataDive started, around a dozen core volunteers were heavily involved in collecting and preparing Earth Hour’s data, which included Twitter data and ActiveCampaign data.
The Twitter data included:
- thousands of tweets from Earth Hour chapters in the past year,
- hundreds of thousands of tweets with relevant hashtags in the lead up to this year's Earth Hour on March 28,
- Twitter profiles of over 140,000 direct followers of Earth Hour Global, and
- Twitter profiles of over 40 million second-degree followers of Earth Hour Global's Twitter account.
ActiveCampaign is Earth Hour’s email marketing platform, and data pulled from their API included the demographic and activity data of hundreds of thousands of subscribers.
With these rich sets of data, DataKind volunteers were able to unlock valuable insights about Earth Hour’s supporters. A small sample of these insights includes:
- After wrangling the country of each subscriber from a free-text field, the top supporters from each country could be identified so that Earth Hour can reach out to these supporters with targeted messaging for further engagement.
- The tweets and emails that resonated most with supporters were identified and characterized to help shape messaging for future campaigns and maximize their impact.
- A sentiment analysis of tweets containing key hashtags was done on a global level. In the follow-up to the DataDive, this will be repeated for Twitter followers in a list of countries in order to enable country-specific messaging in future Earth Hour campaigns.
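Pulling a country out of a free-text field mostly comes down to mapping the many spellings people type onto one canonical name. The sketch below is not the volunteers' actual code: the alias table, the subscriber fields (`email`, `country`, `engagement`) and the engagement score are all hypothetical. It normalizes each entry and then keeps the most engaged subscribers in each country.

```python
from collections import defaultdict

# Hypothetical alias table: cleaned free-text spellings -> canonical country name.
# A real table would be built by inspecting the values subscribers actually typed.
COUNTRY_ALIASES = {
    "singapore": "Singapore",
    "sg": "Singapore",
    "spore": "Singapore",
    "united kingdom": "United Kingdom",
    "uk": "United Kingdom",
    "england": "United Kingdom",
    "united states": "United States",
    "usa": "United States",
    "us": "United States",
}


def normalize_country(raw):
    """Map a free-text country entry to a canonical name, or None if unrecognized."""
    if not raw:
        return None
    # Lower-case, trim, and drop dots and apostrophes ("U.S.A." -> "usa", "S'pore" -> "spore").
    key = raw.strip().lower().replace(".", "").replace("'", "")
    return COUNTRY_ALIASES.get(key)


def top_supporters_by_country(subscribers, n=3):
    """Return {country: the n subscribers with the highest engagement score}.

    Each subscriber is a dict with hypothetical keys "email", "country" (free text)
    and "engagement" (a numeric activity score). Unrecognized countries are skipped.
    """
    by_country = defaultdict(list)
    for sub in subscribers:
        country = normalize_country(sub.get("country"))
        if country is not None:
            by_country[country].append(sub)
    return {
        country: sorted(subs, key=lambda s: s["engagement"], reverse=True)[:n]
        for country, subs in by_country.items()
    }
```

Entries that fall through to None are worth logging, since they show which spellings the alias table still needs to cover.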
Finally, an interactive visualization of the geographic distribution of Twitter followers was built using WebGL Globe. We’ll be working with Earth Hour to refine it and make it available on their website as a visual representation of Earth Hour’s reach as a global movement.
Play with this at http://datakind-sg.github.io/
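WebGL Globe reads its data as a JSON array of series, each a name followed by a flat list of latitude, longitude, magnitude triples. Assuming follower locations have already been geocoded to (latitude, longitude) pairs (the post does not say how this was done), a minimal sketch of producing one such series might bin the points into a grid and scale each cell's count so the busiest cell has magnitude 1.0. The grid size and the 0-to-1 scaling are assumptions, not details of the published globe.

```python
import json
import math
from collections import Counter


def globe_series(name, coords, cell_deg=1.0):
    """Bin (lat, lon) points into a grid and emit one WebGL Globe data series.

    Returns [name, [lat, lon, magnitude, lat, lon, magnitude, ...]], where each
    magnitude is the cell's point count divided by the largest cell count.
    """
    counts = Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg)) for lat, lon in coords
    )
    if not counts:
        return [name, []]
    peak = max(counts.values())
    flat = []
    for (i, j), n in sorted(counts.items()):
        # Plot each cell at its centre.
        flat.extend([(i + 0.5) * cell_deg, (j + 0.5) * cell_deg, n / peak])
    return [name, flat]


if __name__ == "__main__":
    sample = [(1.35, 103.82), (1.30, 103.85), (51.5, -0.12)]  # made-up points
    print(json.dumps([globe_series("followers", sample)]))
```

Binning keeps the payload small: tens of thousands of followers collapse into at most one triple per grid cell, which matters for a page that loads the data in the browser.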
The analysis of the data provided tangible findings that we can take to the Ministry of Manpower to work with them to ensure that migrant workers in Singapore are treated fairly.
- Jolovan Wham, Executive Director, HOME
HOME has accumulated a rich database covering all the migrant workers who have come to it for help in the last five years. Spanning over 10,000 cases, the data included demographic details about each worker, the types of abuse (if any) they suffered, their employment agency, their employer, whether they had stayed in HOME’s shelter, and their working and living conditions. In total, each case record had over 150 possible fields.
With so much sensitive and confidential data, the Data Ambassadors had to do extensive work before the data could be released to the DataDive participants. Many of the fields were also free text, but our enthusiastic army of volunteers was up for the challenge. Rates of pay entered per day, per week or per month were normalized to an hourly rate. Employment agency names were standardized by screening out stop words, comparing names by Jaro-Winkler distance, and having HOME volunteers check the matches by hand, among many other cleaning steps.
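Two of these cleaning steps can be sketched in a few dozen lines. This is an illustration, not the volunteers' code: the stop-word list, the 0.9 similarity threshold, and the working-time factors used to convert daily, weekly and monthly pay into an hourly rate are all hypothetical assumptions. Names that merge here would still go to HOME volunteers for a manual check.

```python
import re

# Hypothetical stop words that say nothing about which agency is meant.
STOP_WORDS = {"pte", "ltd", "employment", "agency", "services", "the", "and"}

# Hypothetical working-time assumptions used to convert pay to an hourly rate.
HOURS_PER_UNIT = {
    "hour": 1,
    "day": 8,                  # assumed 8-hour working day
    "week": 8 * 6,             # assumed 6-day working week
    "month": 8 * 6 * 52 / 12,  # assumed 52 weeks spread over 12 months
}


def hourly_rate(amount, unit):
    """Convert a pay amount quoted per hour/day/week/month into a per-hour rate."""
    return amount / HOURS_PER_UNIT[unit]


def normalize_name(name):
    """Lower-case, strip punctuation, and drop stop words from an agency name."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(w for w in cleaned.split() if w not in STOP_WORDS)


def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        # A character only matches an unused equal character within the window.
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    s1_matched = [c for c, u in zip(s1, used1) if u]
    s2_matched = [c for c, u in zip(s2, used2) if u]
    transpositions = sum(a != b for a, b in zip(s1_matched, s2_matched)) / 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def cluster_names(raw_names, threshold=0.9):
    """Greedily map each raw name to the first-seen name it closely matches."""
    reps = []  # (normalized key, representative raw name)
    mapping = {}
    for raw in raw_names:
        key = normalize_name(raw)
        for rep_key, rep_raw in reps:
            if jaro_winkler(key, rep_key) >= threshold:
                mapping[raw] = rep_raw
                break
        else:
            reps.append((key, raw))
            mapping[raw] = raw
    return mapping
```

Greedy clustering against the first-seen representative keeps the sketch short, but it makes the result depend on input order, which is one more reason the manual review step remains useful.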
A heroic effort was also undertaken to map addresses to postal code areas. Many of the addresses in the database were missing postal codes, and the standard Google Maps API can be inaccurate for the far-flung areas of Singapore (a whole 20km from the city center!). A small team combined the results from three different map APIs to better determine the postal code area of these ambiguous addresses, in order to visualise the geographic distribution of cases within Singapore.
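The post does not describe how the three results were combined. One plausible approach, sketched here under that assumption, is a simple majority vote. Singapore postal codes have six digits, the first two of which identify the postal sector; treating that sector as the "postal code area" is also an assumption of this sketch. Addresses without enough agreement return None so they can be checked by hand rather than guessed.

```python
from collections import Counter


def postal_sector(postal_code):
    """Return the first two digits of a six-digit Singapore postal code, or None."""
    if postal_code and len(postal_code) == 6 and postal_code.isdigit():
        return postal_code[:2]
    return None


def vote_postal_area(candidates, min_votes=2):
    """Pick the postal sector that at least min_votes geocoders agree on.

    candidates holds one postal code (or None) per geocoding service.
    Returns None when no sector reaches min_votes.
    """
    sectors = [s for s in (postal_sector(c) for c in candidates) if s is not None]
    if not sectors:
        return None
    sector, votes = Counter(sectors).most_common(1)[0]
    return sector if votes >= min_votes else None
```

Voting on the sector rather than the full code lets two services agree even when they resolve an address to slightly different buildings in the same area.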
HOME’s specific questions ranged from simple breakdowns of workers’ demographics, salary ranges and nationalities to identifying correlations between variables, such as the types of issues a given migrant worker faced and his or her nationality. Not only were most of the questions addressed during the DataDive, but a proof-of-concept, browser-based dashboard was also developed. The dashboard is built on dc.js, which is in turn built on D3.js and crossfilter.js, and lets the user filter different variables interactively while the visualizations respond immediately. To wrap up the effort from the DataDive, a small group of volunteers from DataKind SG is working to deploy the dashboard so that HOME can use it on their own. As the saying goes, “Give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime.”
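dc.js charts get their linked behaviour from crossfilter: a group attached to one dimension counts only the records that pass the filters on every other dimension, ignoring its own. The toy Python class below models that rule to show why filtering one chart updates the others. It is a conceptual sketch, not crossfilter's API, and the `nationality` and `issue` field names are hypothetical.

```python
from collections import Counter


class MiniCrossfilter:
    """Toy model of the filter/group behaviour dc.js charts get from crossfilter.

    Each field can hold one filter (a predicate on that field's value). Group
    counts for a field respect the filters on every other field but not its own,
    so a chart keeps showing the alternatives the user could switch to.
    """

    def __init__(self, records):
        self.records = records
        self.filters = {}  # field name -> predicate

    def filter(self, field, predicate):
        self.filters[field] = predicate

    def clear(self, field):
        self.filters.pop(field, None)

    def _passes(self, record, skip_field=None):
        return all(
            pred(record[field])
            for field, pred in self.filters.items()
            if field != skip_field
        )

    def group_counts(self, field):
        """Count records per value of field, ignoring that field's own filter."""
        return Counter(
            r[field] for r in self.records if self._passes(r, skip_field=field)
        )

    def filtered(self):
        """All records that satisfy every active filter."""
        return [r for r in self.records if self._passes(r)]
```

With a nationality filter active, the issue chart shows only that nationality's cases, while the nationality chart still shows every nationality so the user can switch selections.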
Last Ones Standing
I was really impressed by the atmosphere that was created and also the number of people who stuck it out for the whole weekend - fantastic!
- Emily Perkin, founder of Just Cause
What an amazing weekend! From the introductions on Friday evening, through a full day of work on Saturday, to the Sunday-morning scramble before the final presentations at noon, it was exhausting but fulfilling. Talented and motivated data do-gooders came together from all over Singapore (and even from Malaysia!) to join forces and answer the call of DataKind: to tackle humanity’s biggest problems through data science.
Join us! Sign up to get involved with DataKind Singapore and we'll see you at the next event!