By Rachel Wells, Senior Manager, Center of Excellence, DataKind
At DataKind, we define data ethics and responsible data science expansively, as broad terms that cover the appropriate handling of data, the use and performance of models, the inclusion of stakeholders, the staffing of teams, and more. AI ethics researchers Luciano Floridi and Mariarosaria Taddeo define data ethics as:
“A new branch of ethics that studies and evaluates moral problems related to data (including generation, recording, curation, processing, dissemination, sharing and use), algorithms (including artificial intelligence, artificial agents, machine learning and robots) and corresponding practices (including responsible innovation, programming, hacking and professional codes), in order to formulate and support morally good solutions (e.g. right conducts or right values).”
This definition is a helpful guide for us because it includes not just data and algorithms but also the human practices that happen around and alongside using data and building models. Ethical data science approaches include designing with equity, data inclusion and exclusion, data provenance, data privacy and security, responsible algorithm development, diverse teams, quality coding, risk assessment, inclusive collaboration practices, transparency, training, evaluating bias, and much more.
Incorporating ethical review and responsible data science practices has been part of DataKind’s approach from the beginning. Still, DataKind is committed to continuous improvement, and we've expanded our data ethics practices over the last decade, as we’ve learned and grown.
Thanks to the important advocacy work of organizations like the Algorithmic Justice League, the need to use data responsibly and evaluate projects through an ethical lens has gained more attention across sectors and communities in the last five years. In the last two years, there’s been an incredible influx of tools and resources in the data science ethics space. Over the last year, we reviewed more than 100 such resources while refining our ethical practices at DataKind, and we’re grateful to keep learning as the Data Science and AI for Good ethics space matures.
Some resources for the social sector that have inspired our learning include We All Count’s data equity workshops, NetHope’s AI Ethics for Nonprofits Toolkit, DataKind UK’s own Practitioner’s Guide to Data Ethics Toolkits, Data Feminism by Catherine D'Ignazio and Lauren F. Klein, and classics like Datasheets for Datasets by Timnit Gebru et al.
In our research and experience, we’ve recognized that there’s no single action one can take, no single toolkit that’ll address every ethical issue you’ll encounter, and no solution that’ll work in all situations. Developing a comprehensive ethical playbook isn’t DataKind’s intent. Instead, we strive to intentionally learn, use, and share tools that strengthen DataKind’s ethical practices and help us proactively adhere to our values. So that’s what we do.
In the DataKind Playbook, you’ll find our guiding principles for how we think about and incorporate ethics into our work. This blog post includes many references to the DataKind Playbook, so make sure to create an account in order to view them. These ethical principles were first created by DataKind UK in 2018 and have been adopted by the wider global DataKind community in the last year.
Doing Data Science and AI for Good well requires careful and inclusive reflection and evaluation at every stage of the project process. Ethics in data science is an ongoing process that requires more than a review at the beginning, middle, and end. A responsible data science project depends on intentional thought from everyone involved, at every stage of the project. By embedding an ethical review process into each stage, from discovering a partnership to evaluating project impact, we aim to ensure that DataKind makes a positive impact on humanity. Here’s a summary of our best practices and lessons learned in responsible data science, using the framework of our six-stage project process.
Many social sector organizations choose to partner with data science experts to take advantage of the benefits AI has to offer, but the decision to form a partnership should weigh criteria beyond the technical problem at hand. With ethics in mind, the Discover Stage of the project process at DataKind includes evaluating a potential partnership and goals of the project for alignment with our core values and mission. Before committing to designing and scoping a project, we critique the project idea by evaluating its ethical implications and assessing other existing solutions. The Discover Stage also offers the potential partner organization the opportunity to evaluate DataKind as a partner and collaboratively assess the ethical implications of the potential project. Using a critical lens to reflect on what might not work from the start, and what the risks are if everything does work, is essential in embedding responsible data science practices throughout our project process alongside our partner organization, before even touching the data.
An ethical review of the data and the potential solution needs to be completed in the Design Stage, before we execute on any project work. This begins with checking data security requirements and evaluating the data provenance, before any data is shared. We then complete a data audit to evaluate project feasibility, data quality, inclusion or exclusion concerns, and bias before committing to a project. DataKind ensures accountability and high-quality project design by involving subject matter experts in the design process and articulating our desired accountability to individuals and/or communities that could be impacted. While involving stakeholders in a participatory design process is essential, it’s also important to apply a responsible data science lens and understand what’s reasonable based on the existing data. Because of this, DataKind’s Design Stage includes a project risk assessment, which involves creating mitigation strategies and identifying a clear pathway to success to ensure sustainability.
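To make the idea of a data audit a little more concrete, here is a minimal sketch of one small check such an audit might include: how well each group is represented in a dataset, and how much of its data is missing. This is an illustration under stated assumptions, not DataKind’s actual tooling. The function name `audit_group_coverage`, the `min_share` threshold, and the sample records are all hypothetical.

```python
from collections import Counter


def audit_group_coverage(records, group_key, required_fields, min_share=0.05):
    """Summarize representation and missing data per group.

    Hypothetical helper for illustration only. `min_share` is an
    assumed threshold below which a group is flagged as underrepresented.
    """
    def group_of(record):
        # Treat a missing or empty group value as its own "unknown" group.
        return record.get(group_key) or "unknown"

    total = len(records)
    counts = Counter(group_of(r) for r in records)
    report = {}
    for group, n in counts.items():
        rows = [r for r in records if group_of(r) == group]
        # Count required fields that are absent or empty within this group.
        missing = sum(
            1 for r in rows for f in required_fields if r.get(f) in (None, "")
        )
        cells = n * len(required_fields)
        report[group] = {
            "share": n / total,
            "missing_rate": missing / cells if cells else 0.0,
            "underrepresented": n / total < min_share,
        }
    return report


# Made-up sample data, purely for demonstration.
records = [
    {"region": "north", "age": 34, "income": 52000},
    {"region": "north", "age": None, "income": 48000},
    {"region": "north", "age": 41, "income": 61000},
    {"region": "south", "age": 29, "income": None},
]
report = audit_group_coverage(records, "region", ["age", "income"], min_share=0.3)
for group, stats in sorted(report.items()):
    print(group, stats)
```

A check like this only surfaces questions. Whether a gap in representation or data quality is acceptable is a judgment to make with subject matter experts and the communities that could be affected.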
An important part of responsible data science is having a diverse team of experts who embody DataKind’s values, and this is what the Prepare Stage is all about. At DataKind, we know that building a diverse team with anti-racist practices embedded in the recruitment and selection processes enables us to produce the highest-quality projects possible. We value teams that are diverse in both identity (e.g. race, gender, ethnicity, nationality, sexual orientation, [dis]ability, socioeconomic status, age, religion, etc.) and professional background, skills, and experience. Part of preparing the team to complete an ethical AI project is also onboarding project volunteers to ensure collaborative, inclusive teamwork.
In the Execute Stage, we build responsible Data Science and AI for Good products using participatory methods: prototyping, ethical review, and high-quality coding. Executing on a project is a key part of knowledge sharing at DataKind: the DataKind team and social sector partners collaborate closely so that those who will ultimately use the tools are confident in the decisions made throughout the tool’s development and in its implementation over the long term. DataKind creates a prototype for full team review and feedback from end users and impacted communities before committing to a final product deliverable. Feedback on the prototype can help pivot a project to maximize its positive impact and minimize negative, unintended consequences. Throughout this stage, we evaluate bias and the ethics of end products by following through on, and adjusting, the mitigation strategies outlined during the Design Stage. We also focus on the quality of coding because it enables the creation of quality and responsible products. For example, readable code enables greater transparency, more complete code reviews, and clear points at which to evaluate technical decisions through an ethical lens.
It’s easy to speed through sharing insights and findings about a project, but the Share Stage is one of the most important steps in responsible Data Science and AI for Good work. This stage enables us to ensure that: (1) the project is set up for successful implementation and potential scale; (2) others can learn from the team’s mistakes and successes; and (3) we live out our value of transparency and build trust when sharing results. We discuss lessons learned on ethics in DataKind projects at conferences and events, but we also want to continue to learn out loud by publishing our experiences on our blog. So we’re excited to do so in the coming months, highlighting projects in which ethical challenges arose and what we did about them.
The final stage in a responsible and high-quality Data Science and AI for Good project is evaluating whether the solution is functioning as planned and addressing our partner’s needs. In consultation with our partners, we determine the right time frame for this review to take place after the solution is implemented. In the Evaluate Stage, we ask ourselves and our partners: How is the tool being used? Does its use raise additional ethical concerns? If so, what actions would mitigate the concerns and risks? Sometimes, the answer might be that further improvements to the solution are needed. Alternatively, this review might cause the team to decide to discontinue the use of the product, and that’s okay! It’s essential to take the time to reflect again on the project outputs and outcomes with an ethical lens.
The above practices are essential elements of our project process, but we’re continuously improving our processes and seeking out insights into what more we can do. Recently, we’ve been unpacking how to incorporate anti-racism more directly into our organizational practices and project work. Over the last two years, DataKind has committed to becoming an anti-racist organization, which we see as an essential complement to our commitment to data ethics. As part of this, every team is incorporating anti-racist practices into at least two business processes, which has resulted in updates to some of our responsible data science practices, with a commitment to continued evolution and growth.
At DataKind, two of our core values are humility and transparency. We know that there are still ways that we can learn and improve, and we’re committed to sharing openly when we make mistakes. With that, we kindly ask that you help keep us accountable! If you’re ever working with DataKind and have a question or concern about ethics, we want to hear it. You can email us at firstname.lastname@example.org.
To our global community of volunteers, project partners, donors, and more, thanks for being part of making sure we do Data Science and AI for Good well!
As the leader of the Center of Excellence, Rachel codifies processes, ensures all projects are executed with the highest quality standards, and creates structures to facilitate experimentation and sharing of learnings at DataKind.
Citation: Floridi L, Taddeo M. 2016. What is data ethics? Phil. Trans. R. Soc. A 374: 20160360. http://dx.doi.org/10.1098/rsta.2016.0360
Header image courtesy of iStock/nadia_bormotova.