This week, our founder and executive director Jake Porway did his first-ever Reddit AMA on the data science subreddit. There were tons of great questions, ranging from ethics in data science to how DataKind measures impact, but one common theme was clear: how to get started in a career in data science. Jake wrote the reply below to help folks start their journeys toward becoming data scientists. If you too are wondering how to get started, read through below and feel free to post any resources you've found useful in the comments! Read through all the questions in the full AMA thread here.
Hey everyone! I’m seeing many questions from budding or new data scientists in the thread trying to figure out the best path ahead - How do I get started in a career in data science? What skills do I need? What should I major in?
As we all know, data science is becoming increasingly popular, yet the term is still hotly debated.
So to start us off, my view is that a data scientist is basically a statistician who can program. Data science is the art of using the latest computer science and statistical techniques to collect, analyze, visualize, and otherwise draw conclusions from data. Most of the thorny topics being discussed these days - bias, quality of data, modeling, learning, and data cleaning - come from the healthy body of statistics we've built over the last 100 years.
The novelty of data science comes from a technical need to be able to handle the volume of data now available and to wrangle it from many disparate forms into a clean, usable format. Beyond that, all the other skills attributed to a data scientist - visual communication skills, good written skills, subject matter expertise - hold true for anyone doing science, from biology to anthropology.
What’s interesting to note is that the skills needed by a true “data scientist” are exceedingly rare. Using Drew Conway’s data science Venn diagram (yes, I still reference this one), one needs to have hacking skills, math and statistics knowledge, and substantive expertise.
The good news? With this diversity of skills needed, there are lots of pathways you can follow and no one way. For example, Drew himself was a computer science undergrad who went on to get a Ph.D. in political science. His graduate work drew him into the world of statistics and data, including machine learning concepts that inspired him to study the social networks of terrorists and do predictive analytics on voting behaviors. He also has great communication skills, picked up some basic visualization, and has strong business and management savvy from his time in government and intelligence.
Other people I know have come from mathematics backgrounds and then picked up programming and computer science to be able to build more advanced models. No matter how you get there, you’ll need to build up your programming and stats skills and not lose sight of your soft skills of communication and creativity.
To learn more about the paths of data scientists, I also recommend Data Scientists at Work, a great book put together by Sebastian Gutierrez, one of the moderators of /r/datascience.
The bad news? There are lots of pathways you can follow, so it can feel overwhelming to figure out how to get started.
The Internet is now littered with online courses to teach you data science. Check out Coursera first and foremost. There are also fellowships through Insight and the Data Incubator that will round out your data science training over about 12 weeks. If you’re more of the self-learning type, I’m also a huge fan of John Foreman’s Data Smart for a good intro to data science algorithms and thinking. Of course, the best way to learn is to do: take part in machine learning competitions through Kaggle or DrivenData. Start small and look at questions you’re genuinely interested in.
Lastly, don’t underestimate the power of meeting people in person. Immerse yourself in the data science community as best you can. Attend local Meetups, check out webinars or local conferences, and of course keep posting questions on /r/datascience, and you’ll soon be well on your own data science path. When you’re ready to start the job search, don’t forget that we do a monthly jobs roundup over at DataKind to help you use your powers for good - check out our list for January!
No matter how you get there, enjoy the journey. Data science is a thrilling field, and knowing linear algebra backwards and forwards is less important than rolling up your sleeves and having fun digging in wherever you are. Good luck!