Post by Miriam Young, Communications Specialist
Originally appeared on the NTEN blog on September 18, 2014
As is customary in important articles on data, let us begin with a review of the Beverly Hillbillies theme song:
Come and listen to a story about a man named Jed,
A poor mountaineer, barely kept his family fed,
Then one day he was shootin at some food,
And up through the ground came a bubblin crude.
Data that is, black gold, Texas tea.
Ah data, that “black gold” that is now everywhere. In fact, many have pronounced that “data is the new oil,” yet we’re still struggling with how to harness its potential. Worse still, many people are struggling just to understand what it really means.
Before smartphones and the Internet, data came in specifically collected, neatly controlled bundles, usually from scientists hunting down information for an experiment, governments collecting information for forecasting, or academics looking to understand our world. Moreover, data was often thought of as forensic, a dull medium for reporting to your funders or constituents.
Over the last few years, however, as our day-to-day interactions have moved increasingly from the physical world to the digital, we generate tons of data with every action we take. Data pours off our cell phones and websites. It bubbles into geysers with every item we buy online or every credit card transaction we make. It pours onto servers with greater volume and at greater speeds than ever before, giving rise to that so-called ‘big data’ you’ve heard about. All this seemingly ancillary information could be used to make more informed decisions, react more quickly to changes, and better understand our environments. However, while we are surrounded by more data than ever before, we need a way to extract meaning from all this bubblin crude. Enter data science!
Never heard the term “data science” before? You actually interact with it every day. Whether it’s Netflix poring over mounds of movie ratings to recommend House of Cards for your weekend binge, Gmail using historical data to flag that email from your girlfriend as high priority, or LinkedIn analyzing heaps of relationship data to send you the perfect job opportunity, data science is routinely used to provide new services that make our lives easier. Data science is the art of wrangling data to do things like predict our future behavior, uncover patterns to help us prioritize, or provide actionable information with the swipe of a screen.
Many companies are using data science to make money and improve their products and services. But without a data scientist on staff or a LinkedIn-sized budget, how can your organization take advantage of this data revolution to transform your work?
DataKind was founded in 2011 to bring mission-driven organizations together with pro bono data scientists who can help them extract valuable information from untapped sources of data. While there’s still debate about how to define this profession, we think of a data scientist as both a statistician and a computer scientist. This combination of skills puts data scientists in a sweet spot: they know not only how to obtain and manipulate the necessary data for your organization, but also how to understand what the numbers are, and are not, saying.
We’re especially excited about how data science could be used for more than just helping you find a good show to watch. The same algorithms and techniques that companies use to boost profits can be leveraged by nonprofits to further their missions, from battling hunger to advocating for child well-being and more.
For example, our volunteers worked with Amnesty International to see how 25 years of data from their Urgent Action Network could be used to predict future human rights violations. The Urgent Action Network is a powerful email alert system that mobilizes supporters to respond to human rights threats unfolding across the globe. As a result, Amnesty International had gigabytes of raw text documents they had digitized that were just waiting to be explored. We matched them with a team of over 20 DataKind volunteers who, over the course of a weekend DataDive, organized, visualized, and explored their massive dataset. The volunteers manually identified the "critical" Urgent Actions that had ultimately escalated, then used data mining techniques to uncover patterns that might predict others. The volunteers were able to develop a preliminary predictive model for Amnesty International to identify high-risk situations and prioritize potential threats, enabling them to better coordinate efforts and do what they do best — save lives.
“But wait!” you may say. “MY organization doesn’t have that much data, so clearly data science doesn’t apply to me.” There may be more data available to you than you think. If you have a website or a mobile app, you could be sitting on a mountain of data showing how your constituents are using your services. If your staff is doing tedious tasks like copying and pasting data from a city website, new data collection techniques could automatically gather it for you. Even if you don’t collect a drop of data yourself, publicly available government data like the U.S. Census or social media data from Twitter and Facebook could inform your work or help you better advocate for your stakeholders. We are now so awash in data that it’s difficult to find an organization that wouldn’t benefit from data science in some way.
While DataKind specializes in helping organizations think through data challenges like these, there are a number of ways to get started on your own:
If you already have a data project in mind, remember that there are data science professionals who would love to volunteer to help! At DataKind, we are always looking for new partner organizations to collaborate with. You can also check whether a university program exists near you, like the Data Science for Social Good Fellowship in Chicago.
How you begin your data science adventure doesn’t matter as much as just getting started. Whether you work with DataKind or get rolling on your own, the best thing to do is to jump in and start learning. Only then will you be able to turn all that “crude oil” of data into a true engine for change.