By Jake Porway, Founder & Executive Director of DataKind
A few years back, I got a call from my mother, apoplectic over a headline: “Target knows you’re pregnant.” Many of you probably remember this oft-shared story about how Target sent ads to a 16-year-old girl hawking new baby gear, enraging her father until he learned that she was, in fact, pregnant. Target’s models didn’t “know” the girl was pregnant - they simply recommended products to her based on her past purchasing behavior - but the takeaway hit the public straight in the most fearful part of their gut.
The lesson was clear: evil companies can use evil data to learn private things about you in evil ways that they will use to do evil things. Data is evil! Algorithms are evil! Robots are going to tell your friends you’re pregnant before you do and that isn’t just evil, that’s just plain rude! Stupid evil, rude robots.
People still come up to me after I talk about all the social benefits of data science and the ways that algorithms can be used to help people get access to clean drinking water or improve public health outcomes and ask, with a self-satisfied smile, “yeah, but what about Target?” That distrust stymies our ability to use data and science where it could help people most.
Given the surprise ending to our recent election, I’m worried that we have a new “Target story” on our hands. With the ripple effects of the Target story looming so large even to this day, we need to be very frank and very clear about what this polling miss means for data, predictive modeling, and our own sense of certainty in numbers. Moreover, we need to talk about how data and community operate together for a healthy country.
Here are three things I think we in the DataKind community can do to help on that front:
1) Prove that Data Isn’t Dead
There has been a lot of soul-searching and finger-pointing amidst the flaming remains of the many polling prognostications. I will admit to obsessively refreshing fivethirtyeight.com every minute for the last 90 days straight, searching to wrap myself in the succor that only a Nate-Silver-knit quilt of polling data could bring. Like many, I was lulled into a false sense of security by the forecasts. I let myself believe that the data and the models were the ultimate authority and now stand with the rest of the country wondering how we could have gotten it so wrong and wishing I could get all that time constantly refreshing polls back for truly worthwhile pursuits, like constantly refreshing Twitter.
In pure apolitical defense of modeling, let’s just take a second to acknowledge that Hillary Clinton did win the popular vote by about the margin folks predicted and that Trump’s win was due to some razor-thin margins in a few sparsely polled states. But that’s not the point. The best rebuttals to the death of polling have already been written by others, most notably this WIRED article on the subject. The point is that we may face a moment where we need to repair the public’s faith in data. As statisticians, or even just data supporters, we must confront the fact that this polling miss threatens public faith in basic numeracy, and it is our responsibility to restore trust in data and science at the public level.
There are very real changes that need to be made to polling and very valid privacy concerns about the Target example above, but the important thing to remember in both cases is that using data to learn about our world is a messy, imprecise, and uncertain act. Data is never truth and no model can represent reality 100% of the time. As the statistician George Box famously said, “all models are wrong, but some are useful.” Statisticians and scientists know this, which is why they can be so annoyingly circumspect about even the most rock-solid findings.
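To make that uncertainty concrete, here is a minimal sketch of how a probabilistic forecast should be read. The 70% win probability and trial count below are entirely hypothetical numbers chosen for illustration, not any actual 2016 forecast figures:

```python
import random

# Illustrative assumption: a hypothetical forecast giving the
# favorite a 70% chance of winning. Not an actual poll number.
WIN_PROB = 0.7
TRIALS = 100_000

random.seed(42)  # fixed seed so the simulation is repeatable

# Simulate many elections; count how often the underdog wins anyway.
upsets = sum(random.random() > WIN_PROB for _ in range(TRIALS))

print(f"Underdog wins in {upsets / TRIALS:.1%} of simulated elections")
```

Even with a 70% forecast, the “surprising” outcome still occurs roughly three times in ten. A forecast is a probability, not a promise, and a single upset does not by itself mean the model was broken.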
However, the most important thing is that we do NOT turn our back on data and science. Science is imprecise and it gets things wrong a lot of the time, but that does not mean that all data is “faked,” or that models are “rigged,” or that science is just fancy mythology with a bunch of smart-sounding words thrown in (although admittedly, “heteroskedasticity” does sound pretty smart). Far from it. Science is by far the best tool humans have in terms of finding repeatable patterns in our world in a fairly unbiased way. It’s just not perfect.
It is our duty as data scientists, analysts, statisticians, and technologists to help the public understand where our certainties do and do not lie as we do our work. It is on us to be extremely transparent about how we work, to explain our assumptions as we make them, and to be open and critical of our own work when it falls short. As we seek to bring data and algorithms into the world for social benefit, it is our duty to make sure our partners understand the limitations of this work while still maintaining faith in the practice itself. For the next few years, I’m sure we will be treated to the same smug retort “yeah, but what about the 2016 election forecasts?” If we do our jobs well now, that line will be the beginning of a conversation, not the end of one.
2) Break Down Boundaries, Build the Ecosystem
Those of us working to use technology and data for social change have long acknowledged the power of collective action and involving disparate communities to do so. Collective efforts are not just a nice-to-have in our work, they’re an absolute necessity. They’re the most critical component for creating respectful data solutions that get adopted, sustained, and are used ethically and capably. The advent of the internet, which has allowed many of us to cross the walls of our homes or our workplaces to collaborate, has shown us over the past decade that the models for creating a better world are moving away from monolithic, single-institution changemakers to networked ecosystems of socially aligned actors. It is an exciting development in our country and across the globe that allows all of us to play a part, even if just in our nights and weekends, in creating social change together.
I believe it is therefore all the more critical that those of us who work within community tech organizations continue to build connections between the public, the private sector, local government, and long-standing social institutions. We cannot make change with technology or data alone; we must come to the table with folks from very different backgrounds and very different viewpoints to work openly, ethically, and with care together to craft new solutions to age-old problems.
3) Think About People, First and Foremost
At DataKind, we think about people first, and data second. We will work even harder to create spaces within which all who believe predictive technologies have a role in the fight against climate change, have a role in the fight for better education, and have a role in the fight for social justice and equality can bring their big hearts and big brains to the table and get to work together. We’ll continue to embed the six values emblazoned on our walls - humility, thoughtfulness, world-class expertise, transparency, approachability, and a commitment to diversity and equity - in everything that we do. We don’t know what the world will look like come January 2017 or beyond, but I do know that these principles will remain true no matter what.
Whether you are a programmer in a bustling tech-heavy US city, a social activist abroad, a foundation program officer, a university professor, a hedge fund manager, or just a concerned citizen, you have a role to play in using data for a better world. Join up with the paragons in this space like Code for America, Civic Hall, The New America Foundation, Data & Society, @DJ44, The Engine Room and many others who shout about this much more eloquently and work on these issues much more fervently than I ever could.
Come join us, join your local communities, join the many other vaunted organizations in this article, and let’s get to it. We know the world we want to see. Let’s get building.