Tell us about what you do professionally.
I am a research engineer at Cloudera Fast Forward Labs where I help identify technological breakthroughs within the fields of computer science, data science, and machine learning that will find applications in industry within the next couple of years. Or, as I like to say, I like to think about SciFi and keep it real. I read, write code, and build prototypes to demonstrate the technical capabilities “of tomorrow”.
Currently, I am working on multitask learning for text classification with “soft parameter sharing”, that is, an algorithm that can identify the relatedness of tasks and make use of this relatedness to perform at a higher accuracy.
I also work with external partners and clients to develop a coherent data strategy for companies looking to put their data to good use, and I advise their technical teams as they move towards implementation. I write the occasional blog post and a newsletter on news and updates in data science and machine learning. It’s (a little) all over the place, and (a lot of) fun.
How did you hear about DataKind?
I read about DataKind in The Economist four years ago when I was a postdoctoral researcher at New York University. I studied human decision making under risk and uncertainty (in less formal terms, “gambling”) and I enjoyed my research, but I was looking to have a more direct impact on the world. I had just decided to try out data science when I read about DataKind. I was excited. “Doing good with data” was exactly what I had been looking for, and volunteering was a perfect way to try out a new profession without making a full commitment yet.
Tell us about your work with DataKind.
I stack chairs. Well, I used to stack chairs.
When I was looking to become a DataKind volunteer, I had no data science experience. I had coding, data analysis, and data modeling experience, but, until then, the data I had worked with was small and clean, the result of carefully designed experiments. Also, I had never worked for a nonprofit or business (other than NYU).
I knew DataKind had more volunteers than projects. I figured I would have to get to know them, and they would have to get to know me, to be considered for a volunteer position. So, I stacked chairs at their meetups (and, admittedly, on a low postdoc salary, I enjoyed the free food).
Eventually, DataKind offered me the chance to join a DataCorps project alongside a team of more experienced individuals. We worked with DonorsChoose, a nonprofit that provides a crowdsourcing platform for teachers. I analyzed what books teachers request funding for to understand the themes and topics teachers are interested in. “Billy the Bully Goat” was one of the most frequently requested books; telling, isn’t it?
Fast forward four years, I am still grateful for the opportunity I was given. The people I worked with on the DonorsChoose project taught me much about data science; they helped me get started. And, I stayed involved with DataKind.
I was a Data Ambassador for the March 2017 DataDive in NYC and worked, alongside my partner ambassador Susan Sun, on analyzing nonprofit tax forms (also known as the 990 data, after the name of the tax form). The 990 data allow insight into the revenue and expenditure of nonprofits as well as their mission and programming; it is a fascinating and, to date, underutilized data set.
In August 2017, I was given the opportunity to support my collaborator Rita Ko (Director of The Hive, a digital innovation team that is part of USA for UNHCR) at a DataKind DataDive in Seattle. The Hive uses data, data science, and machine learning in support of refugees. We applied data science to donations data to help fundraising efforts, and we got started on an exciting project using satellite imagery data to track the development of refugee camps across the globe.
Most recently, I helped out during the November 2017 DataDive back in NYC on democratic freedom. Alongside Data Ambassadors Susan Sun and Ismael Jaime Cruz, and a team of DataDive volunteers, I worked with the Southern Poverty Law Center on identifying hate speech on websites, on social media platforms, and in search results.
I enjoy using my skills to help contribute to something more important than myself, my career, my life, and I receive much in return. The DataKind community is kind and welcoming; it is no coincidence that many of the people I’ve met over the years, working with and for DataKind, have become my friends.
What is one of the most surprising things you’ve learned or seen in working with data?
One analysis of the 990 nonprofit tax data by DataDive volunteer Rachel Wagner-Kaiser, guided by subject matter expert Jiehae Choi-Blackman, showed that organizations in areas such as international development, where the donor is not also the beneficiary, rely and continue to rely on grants and charitable giving while organizations in areas such as health care, where the donor may also be or become the beneficiary, find more stable sources of revenue over time. A clever analysis, the fruit of collaboration, that can guide the fundraising efforts of nonprofits; deep insight without deep learning (the use of a shiny new tool is, often, not related to impact).
What’s the most interesting data project you’ve seen recently?
I enjoy generative models for text, like @dictionarish, a neural network that generates dictionary entries, or Benjamin, the neural network that wrote the script for the movie Sunspring. Benjamin was trained on SciFi movie scripts; the generated script, and movie, is strangely moving and tells us perhaps as much about today’s generative models, and their quality, as it does about the amazing human capacity to make sense of human language.
Closer to Earth, the Pew Research Center published a report on the analysis of comments submitted to the FCC about net neutrality. 57% of comments seem to include false or misleading personal information, there is evidence of organized campaigns to flood the comments with repeated messages, and on nine occasions, 75,000 comments were submitted at exactly the same moment, often with very similar content. Democracy relies on our ability to partake in debate; the Pew report highlights some of the challenges we face in today’s society.
What blogs or articles do you love reading to stay up to date on all the data news?
To stay up to date on developments in data science and machine learning, I read Jack Clark’s Import AI newsletter, Denny Britz’s Wild Week in AI, and the entertainingly snarky newsletter from the NYU Center for Data Science.
To understand the business side of data, the backdrop to our work, I read Ben Thompson’s Stratechery, which offers excellent daily updates on all things tech from a business, political, and societal perspective (it is well worth the $10 a month). The Information and the Axios newsletters round out my news diet. Generally, I am excited about the return of news subscriptions, especially when they allow content creators to serve high-quality content about issues I deeply care about.
Finally, The Guardian has had excellent pieces on “content moderators”, a 21st century job created by new technology, and I like to follow the thoughtful work of Data & Society on how technology is affecting and changing the world around us.
When you’re not busy using data science to change the world, what do you like to do in your free time?
I like to read. Exit West, for example, a coming-of-age story during times of war, depicts the struggle of searching for a new home when yours has been destroyed. Fiction reminds us of the need to understand individuals, not just trends. Patterns we recognize in data are incredibly useful, but as people, we are all unique. We must not forget to recognize and acknowledge individuality and humanity when we work with numbers that are, as they so often are, about people.
Also, I enjoy lying on my couch.
Finally, I recently mastered the blind hem. Learning to sew is a fun hobby (I find it quite difficult, too).