How do you become a data scientist?
I think that depends on where you are now and what you want to do as a data scientist.
Nevertheless, DataCamp posted an infographic recently that described 8 easy steps to becoming a data scientist. In this post I want to highlight and review DataCamp’s infographic.
What is a Data Scientist?
Before laying out the steps to becoming a data scientist, the graphic uses three key resources to define what a data scientist is:
- Drew Conway’s data science Venn diagram that combines hacking skills, math and statistics knowledge, and substantive expertise.
- A graph showing the survey results on the question of education level, not unlike the graph in O’Reilly’s Analyzing the Analyzers.
- Josh Wills’ quote on what a data scientist is.
Become a Data Scientist
From the infographic, the 8 steps to becoming a data scientist are:
- Get good at stats, math and machine learning. Take online courses.
- Learn to code. Computer science, development and a language.
- Understand databases. Data types, technologies to store them, and methods to retrieve data.
- Master data munging, visualization and reporting. Tools.
- Level up with big data. Bigger tools like Hadoop, MapReduce and Spark.
- Get experience, practice and meet fellow data scientists. Competitions, pet projects and developing an intuition.
- Internship, bootcamp or get a job
- Follow and engage with the community
At first glance, the graphic suggests the standard mantra of becoming a math and programming genius before even looking at data or algorithms, an approach I think is wrong.
On closer examination, the graphic is suggesting a path of familiarization in steps 1-5. It suggests taking courses and getting up to speed with the language of data science and data.
Then steps 6-7 are about actually working problems and developing skills before topping out and following the community in step 8.
From this more nuanced perspective, it’s a great graphic and I like it.
I would go further.
I would suggest that steps 1-5 be reduced to a single step that provides a crash course in the terms and themes across these areas. I would suggest getting to the point of working a data set using a tool as soon as possible. Working through this process, problem after problem, will highlight the need and provide the context for those foundational topics, which can then be learned and woven in just-in-time.
A segmented linear decomposition is great for course design and infographics, but not best for learning and getting results. I think the modules or steps should be integrated.
Studying computer science can make you a good computer scientist (for whoever needs whatever that is) and a more rounded engineer, but to be a great programmer, you need to practice programming.
I think the same applies to working data problems. To get great at working problems end-to-end, you need to focus on and practice this process and learn relevant theory in the context of this process. It will act like a great knife, cutting scope to what is required and relevant, rather than all that happens to be in the courses and textbooks.
How do you become a data scientist? Work data problems. A lot.