Illustration above by iStock/drogatnev
By Benjamin Kinsella, Technical Project Manager, DataKind
Written text is everywhere, from documents and notes to surveys and forms. Every organization has access to these rich sources of information, but their sheer volume can be overwhelming. For example, how can a resource-constrained, mission-driven nonprofit derive value from the mountain of text data living in websites, financial reports, or news reports? Or consider an international NGO whose fieldworkers record observations in which managers notice similar themes or problems. How can the organization tell whether these findings arose by chance or reflect more structural patterns across the documents? This is where machine learning and AI can help.
At DataKind, we’re committed to using data science and AI ethically and capably in the service of humanity. We derisk a space through our projects and help social change organizations appreciate that data is much more than just a resource for reporting and measuring. By demystifying “data science” through our projects and publications, we hope to inspire more social actors to consider embracing new technologies. We’ve seen firsthand how analytics can empower mission-driven organizations using techniques like predictive modeling, geospatial analysis, or Natural Language Processing (NLP). To amplify these learnings, we recently published an article, “How the Social Sector Can Use Natural Language Processing,” in the Stanford Social Innovation Review (SSIR). In it, we present potential use cases of NLP, along with six mature and accessible techniques that any social sector leader could take up to address real challenges.
For context, the piece is based on a DataCorps project carried out by a talented DataKind team of pro bono data scientists: Sarah Eltinge, Matthew Harris, Jared McDonald, and John Winter. The team applied NLP to a dataset of 28,000 bills signed into law in five US states over the past ten years. These are the kinds of texts that might interest a social sector organization, such as an advocacy group or think tank. They’re publicly available, and large and varied enough to challenge any human analyst attempting to read millions of words manually.
Figure 1: Latent Dirichlet Allocation Model on State Policy Dataset with Five Topics
Figure 2: Example Demo Slide on Use Cases of Part-of-Speech Tagging
Furthermore, the team developed illustrative online demos. In these demos, the team explains these NLP techniques by walking through technical examples and providing sample code and access to the project data. For instance, they answer questions like: How and why would one conduct NLP in the social sector? What’s the purpose of text pre-processing? And how do techniques like part-of-speech tagging, entity recognition, and topic modeling, among others, work?
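To give a flavor of what text pre-processing means in practice, here is a minimal sketch in Python. It is not the team's demo code: the stopword list, the function names `preprocess` and `top_terms`, and the three bill snippets are all hypothetical, chosen only for illustration. The idea is that lowercasing, tokenizing, and dropping very common words leaves the terms that carry a document's subject matter, which later techniques such as topic modeling can then build on.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real projects use fuller lists).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "for", "in", "is", "shall", "be", "this"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


def top_terms(documents, n=3):
    """Count the cleaned tokens across all documents and return the n most frequent."""
    counts = Counter()
    for document in documents:
        counts.update(preprocess(document))
    return counts.most_common(n)


# Hypothetical bill snippets, invented for this example (not from the project data).
bills = [
    "An act to fund public schools and school libraries.",
    "This act shall fund rural health clinics.",
    "The health of public schools is a priority.",
]

print(top_terms(bills, n=5))
```

Even on this toy input, the cleaned counts surface subject words such as "schools" and "health" rather than "the" or "of". A real pipeline would usually add further steps, such as lemmatization, so that "school" and "schools" are counted together.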
With step-by-step instructions and accompanying code, the team’s online demos are meant to be much more than a primer on NLP. Rather, they’re a toolbox that can inspire and mobilize a social actor to think about how their organization might start using text analytics. With this article and the online demos as a point of departure, any organization can begin putting NLP techniques to use. They should start with a well-posed problem statement, the right data and people, and a careful anticipation of possible unintended consequences.
This project and the work done by the team of highly skilled volunteers demonstrate how NLP techniques, even simple ones, can be used to unlock hidden insights and solve real-world challenges in the social sector. It’s our hope that social sector leaders and funders will find the SSIR article valuable and experiment with NLP in their own work.
Finally, we’d welcome your comments on the article. Did you find it informative? Did you share it with anyone? If so, with whom? And if you would be interested in more articles that focus on the social applications of specific data science or AI techniques, please do let us know which ones!
Benjamin Kinsella, PhD, is a project manager at DataKind, assisting in the design and execution of pro bono data science projects. He’s also a former DataKind volunteer; in that role, he applied NLP techniques to answer socially impactful questions using text data. Benjamin holds a doctorate from Rutgers University - New Brunswick.
Support for this project was provided by the Robert Wood Johnson Foundation. The views expressed here do not necessarily reflect the views of the Foundation.