Meetup Recap: Untangling Ethical Questions in Data Science

July 25, 2014

Our recent Meetup on the Ethical Responsibilities in Data Science, kindly hosted by Pivotal, was a whirlwind of great examples from expert panelists and thoughtful questions from audience members on the ethical dilemmas we face when doing data science projects for good.  

While it's hard to summarize all the great questions that surfaced, here are our top three:

  • How can we create ethical data products?
  • How can we dismantle the data science "ivory tower" and better engage stakeholders?
  • Is it time to start crafting the "Hippocratic Oath" for data science?

Panel discussions like these are only as good as the conversation they inspire, and let's just say the nearly 70 attendees and panelists were an inspiring bunch.

First, the panelists – from left to right:

  • Jake Porway, DataKind Founder and Executive Director: Moderator

  • Tim Rich, Data Scientist at 1stDibs.com: Tim served as a Data Ambassador on a DataDive project with the NYC Mayor's Office of Data Analytics (MODA) to revamp MODA's geocoding application so that every New Yorker gets counted in vital analyses.

  • Samir Goswami, Director of Government Professional Solutions at LexisNexis: Samir manages a team at LexisNexis that provides data to the federal government for various big data applications. He was also Amnesty International USA's representative on its DataDive project with us last fall, which explored whether analytics could be used to predict human rights violations.

  • E.V. Wright, Research Associate and Project Manager: As a volunteer on DataKind's Amnesty International project, E.V. brings her perspective as an artist and human-centered research practitioner to remind us that, first and foremost, data are people.

  • Bob Filbin, Chief Data Scientist at Crisis Text Line: Crisis Text Line (CTL) serves young people in any type of crisis, providing free, 24/7 emotional support and information via the medium they already use and trust: text. Bob works with the incredibly sensitive data collected during these interactions to help CTL counselors better serve their community and to provide insights into teen mental health care and counseling.

Now, the attendees – from left to right:

Just kidding!  Fortunately, we don’t have personally identifiable information on everyone in the room, but we were pleased to see so many familiar faces! 


How can we create ethical data products?

What counts as a data product? Think Facebook's News Feed or LinkedIn's job suggestions.

Tim suggested that data products include, say, a sunset clause requiring all collected data to be wiped after two years. However, Samir felt that simply deleting the data after two years wouldn't go far enough, because it's almost impossible to separate the data product from the process around it. For example, during the Bahraini uprising, the government was using Facebook to find protesters while the protesters were simultaneously using Facebook to organize. Is the ethical issue with the product, Facebook, or with the algorithm?
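
To make Tim's idea concrete, here's a minimal sketch of what an automated sunset clause might look like, assuming a hypothetical store of records that each carry a timezone-aware collected_at timestamp (the names and structure here are ours, purely for illustration, not something the panel specified):

    from datetime import datetime, timedelta, timezone

    # Hypothetical two-year "sunset clause": anything older gets wiped.
    RETENTION = timedelta(days=2 * 365)

    def purge_expired(records, now=None):
        """Return only the records still inside the retention window."""
        now = now or datetime.now(timezone.utc)
        return [r for r in records if now - r["collected_at"] < RETENTION]

In practice this would run as a scheduled job against the production datastore, and, as Samir pointed out, deletion alone doesn't resolve how the data was used before it expired.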

Jake added that perhaps the key to an ethical data product is ensuring a clear information economy: users know exactly what their data could be used for when they give it up. Bob phrased this even better with one simple question that guides the team at Crisis Text Line: "Would our users be surprised by how we're using their data?" In a world of Facebook experiments and Target predictions eroding users' trust, this question rang true as one of the most important perspectives to carry forward as we build new data products.


Dismantling the Data Science Ivory Tower

One audience member asked, "Can data scientists simply hand over a tool that works, or is it just as important, if not required, to explain how the tool works?" E.V. brought up the importance of keeping the end user in mind, especially when end users have a different level of technical and data literacy. Another audience member added that data scientists must recognize they hold an elite set of knowledge and tools that can have huge ramifications for society as a whole. This means they must be careful not to make decisions for others without their consent, especially when creating something new. When decisions could affect other people at scale, data scientists must communicate what they are doing.

The solution? Involve stakeholders at every stage of the data science process and clearly explain what you're doing. The biggest mistake a data scientist can make is failing to involve the people who will ultimately be affected by their work.


The Hippocratic Oath for Data Science

Tim Rich closed the discussion beautifully with a call to action for us all: “My big beef with discussions on ethics is it ends at the door.  Because we are all engaged in data science as our trade, writing down ethical statements is what needs to happen next.  We can talk all day long but we need to start codifying and we need to work together by writing it down.”


While we almost certainly ended the evening with more questions than we started with, everyone at least left with more insight into the ethical challenges at play in our work. For further reading, Jake recommends the book "Raw Data" Is an Oxymoron, and we also love the Responsible Data Forum series, Data & Society, and the Data Science for Social Good fellows' blog series for more stories from the forefront of data-for-good!

What questions do you have? Where do you agree or disagree? Help us continue this discussion beyond the doorway by leaving your comments below or giving us a shout @DataKind!