Meet Matt Harris! He’s a superstar who's held many volunteer roles here at DataKind.
Most recently, he supported Jacaranda Health, a long-term partner based in Kenya, under our Frontline Health Systems Impact Practice. On this project, he worked on an AI-driven software application that lets help desk agents quickly triage medically urgent messages and respond to users faster and more efficiently, helping new mothers and mothers-to-be access critical advice and support. Matt has also been involved in many other initiatives, including analyzing official eHealth policies from 39 African countries, developing a web app that explores health text using Natural Language Processing (NLP) approaches, and serving as an active member of the Scoping Squad.
Matt’s day job is with a FinTech company in New York where he leads the data science and application development team. He loves to roll up his sleeves and explore datasets to see if he can find new insights and solutions to problems that help people. Check out more about Matt and his story below!
Tell us a little bit about yourself and your background.
My background is in astrophysics, studying the atmospheres of Earth and other planets, so my interest in data goes back a long way and far out into the galaxy. I currently lead data science and application development teams for a FinTech company in New York, where we try to find ways to save time and money so that people have more tools and space to do great things. I’m also lucky enough to be a volunteer for DataKind, which brings my worlds together - data and tackling social issues to try and make our planet a better place for everybody. When I'm not nerding out, I can be found stomping around the hills of Vermont, looking after foster doggies and playing hideously bad guitar.
Can you briefly describe the project that you’re currently working on?
I'm currently working with Jacaranda Health looking at ways to help pregnant mothers get the help they need. Jacaranda Health already has great software in place that automatically helps to understand the questions thousands of mothers in Kenya have about their pregnancies, medical advice, nutrition, and more while simultaneously triaging their questions to flag potential emergencies. In this project, I’m looking at the ways in which the technology might be extended to automatically extract entities from interactions between mothers and the AI-driven software application. More specifically, I’m looking at the different types of foods mentioned in mothers’ questions about nutrition and cases where Jacaranda needs to identify the names of medical facilities for reporting and monitoring. It's an interesting challenge because chats can be in English, Swahili, or the Sheng dialect spoken by many in Kenya.
What surprised you most about the project?
A lot! But I did notice that there are quite a few questions about avocados. Kenya is among the world's top 10 avocado producers.
What is the highlight of the project so far?
It’s been inspiring to see the fantastic support triage automation that Jacaranda Health has developed.
What were the most challenging moments from your project (and how did you resolve them)?
The project originally aimed to investigate how the performance of the AI-driven model might be improved. Coincidentally, the model classes (i.e., chatbot intents) were in the process of being updated and a new training set was being developed. The new model is now running in production, but there isn't yet enough telemetry captured to analyze how it might be improved. Because of this, we pivoted to focus on entity extraction.
What data science skills have been most useful for this project?
To analyze the chats, I needed to use a number of Natural Language Processing (NLP) techniques, such as LDA topic analysis, spaCy POS tagging, and evaluation of chat classification. For exploring model performance, I tried several approaches, such as fastText and BERT. I generated named entity recognition (NER) training data using fuzzy matching and clustering. Lastly, the integration with Jacaranda Health’s processes required me to learn a bit about Google AutoML, as well as using Azure Cognitive Services LUIS to prototype workflows for ongoing maintenance and retraining of models.
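As a rough illustration of how fuzzy matching can help generate NER training data, here is a minimal sketch. It is not the project's actual code: the `FOOD_TERMS` list, the `label_food_spans` helper, the `FOOD` label, and the 0.8 similarity cutoff are all hypothetical, and it uses Python's standard-library `difflib` rather than whatever tooling the project relied on. The idea is to compare each word in a chat message against a small list of known terms and record the character span of any close match, so misspellings still get labeled.

```python
import re
from difflib import get_close_matches

# Hypothetical gazetteer of food terms; a real list would come from domain experts.
FOOD_TERMS = ["avocado", "spinach", "ugali", "beans", "mango"]


def label_food_spans(text, terms=FOOD_TERMS, cutoff=0.8):
    """Return (start, end, label, canonical_term) for words that fuzzily match a term."""
    spans = []
    for match in re.finditer(r"[A-Za-z]+", text):
        token = match.group().lower()
        # get_close_matches returns the best candidates whose similarity
        # ratio is at least `cutoff`; we keep only the single best one.
        close = get_close_matches(token, terms, n=1, cutoff=cutoff)
        if close:
            spans.append((match.start(), match.end(), "FOOD", close[0]))
    return spans


if __name__ == "__main__":
    message = "Can I eat avacado and spinash while pregnant?"
    for span in label_food_spans(message):
        print(span)
```

The resulting character offsets and labels are in the general shape that many NER training tools expect, though the exact format depends on the tool. A lower cutoff catches more misspellings at the cost of more false matches, so the labeled spans would still need human review.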
What professional skills (non-data science) have been most useful for this project?
I’ve been able to apply some of the concepts I use in my day job, such as process design and triage automation. It's been great to exchange ideas on that, for example on how best to generate accurate training data as issues progress through the Jacaranda support team.
What tips do you have about communicating data science findings to nonprofits most effectively?
Data science can get very technical and an obvious insight to a data scientist can sometimes look like gobbledygook to non-technical, social sector experts, so it's important to always translate clearly into real-world context and goals in the simplest way possible. Stating the obvious, I know, and not always easy! But this is such a key step towards the “stickiness” of any technical solution proposed to a human being, no matter how amazing that solution may be. It has to be distilled into clear understandable terms.
What's your experience been collaborating with other volunteers on the project and/or working with the DataKind Global staff?
Always great - a nice group of people. Good positive energy, caring and engaged folks. I'm always learning on these projects.
What was the most interesting observation you made about challenges that nonprofits face?
An obvious challenge is the lack of resources and skills a nonprofit might have around data science. That said, I also think there’s a more subtle challenge - there’s so much hype around data science and AI being able to magically solve problems that expectations can be a little high sometimes. I think clearly defining the problem and goals is key, as is precursor analysis to answer the initial question - do we have sufficient data to actually do this?
What advice would you like to share with volunteers who are new to DataKind or the Data for Good movement?
Listen closely to your teammates. You’ll always learn something new.
How did this project introduce you to new connections/friends?
I met Benjamin Kinsella, DataKind’s technical project manager, as well as Mitali Ayyangar, DataKind’s portfolio manager. Both are enthusiastic, positive and super nice - I’ve learned a lot from them and really enjoy working with them and others at DataKind.
What did you discover about yourself while working on the project?
That I'm interested in eHealth policy!
What does “living with a sense of purpose” mean to you?
To try and do things that help improve the world, no matter how small.
Who inspires you?
Former President Barack Obama (a class act), Isaac Newton (even though he was a bit grumpy!), and Freddie King.
If you could be any animal, which would you be?
One of those cool octopuses that can change color to match the background.
What’s your favorite movie?
What’s the best concert you've ever attended?
AC/DC at Giants Stadium.
What’s the last book you read?
The Warmth of Other Suns: The Epic Story of America's Great Migration by Isabel Wilkerson.
If you wrote a letter of gratitude to your future self, what would it include?
Very little, so that I wouldn't introduce any time travel paradoxes.
What’s one piece of advice you’d give to your younger self?
Learn to hold the guitar pick correctly!
About Volunteer Spotlights
Our volunteers are the lifeblood of our mission. They’ve inspired people to use their skills in ways they never dreamed of. They’ve slayed misconceptions. They’ve shown organizations trying to make the world a more humane place how data science and AI can change the game. We’re honored (and thrilled) to feature their stories in DataKind’s Volunteer Spotlight series. Follow this series to learn about their impeccable skill sets, their work with our brilliant project partners, and what inspires them to give their time, resources, and energy to causes that matter.