What is the difference between a Data Analyst and a Data Scientist? The third instalment of The Data Analytics Handbook series considers this question from the perspective of researchers and academics.
The first book contained 7 interviews with working analysts and data scientists. The second book contained 9 interviews with CEOs and managers. This third book in the series contains 8 interviews with academics and researchers and is called The Data Analytics Handbook: Researchers and Academics.
I note that the authors are using these free ebooks as a lead generator for their start-up, Leada.
Top 5 Findings
- There are wrong questions to ask about the data (asking questions to which the data does not have answers)
- Data science is a strategic initiative
- Data professionals must be humble (humble to the data, skeptical of results, data is the main source)
- Analytics is a basis for competition (basis of competition in business)
- For data science, learn how to learn (education is a continuing process)
The handbook provides 8 interviews with academics and researchers from 8 institutions.
- Michael Chui from McKinsey
- Prasanna Tambe from NYU Stern School of Business
- Hal Varian from Google
- Jimmy Retzlaff from UC Berkeley
- David Smith from Revolution Analytics
- Gregory Piatetsky from KDnuggets
- Tim Piatenko from Comr.se
- Tom Davenport from Babson College
I enjoyed Hal Varian’s interview. True to form, he stressed that teasing out causality from data is critical for decision making. He also commented on the next steps for data systems, specifically:
We’re going to see a lot more “self optimizing” or “learning systems” that run experiments and improve their performance without any human intervention.
I agree with this insightful comment and have even built mini-versions of such systems. I think any programmer who spends time in this world is likely to do the same, and as such we are going to start seeing this capability built on top of APIs and modeling libraries.
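To make the idea of a system that runs its own experiments concrete, here is a minimal sketch using an epsilon-greedy bandit. This is my own illustration, not something taken from the interview, and it is only one of many ways such a system could be built. The function name `run_epsilon_greedy`, the conversion rates and the parameter values are all hypothetical.

```python
import random


def run_epsilon_greedy(true_rates, steps=5000, epsilon=0.1, seed=0):
    """Simulate a system choosing between variants with no human in the loop.

    true_rates: hypothetical success probability of each variant (unknown to
    the system; used here only to simulate feedback).
    """
    rng = random.Random(seed)
    counts = [0] * len(true_rates)   # how often each variant was tried
    means = [0.0] * len(true_rates)  # running estimate of each variant's rate

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: run an experiment on a randomly chosen variant.
            arm = rng.randrange(len(true_rates))
        else:
            # Exploit: use the variant that currently looks best.
            arm = max(range(len(true_rates)), key=lambda i: means[i])

        # Simulated feedback: 1.0 for a success, 0.0 otherwise.
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0

        # Incrementally update the running mean for the chosen variant.
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]

    return counts, means


# Assumed rates for three made-up variants.
counts, means = run_epsilon_greedy([0.2, 0.5, 0.8])
print(counts, means)
```

The small `epsilon` is the "experiment" part: a fraction of the time the system deliberately tries something other than its current best guess, so its estimates keep improving as it runs.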
I also really liked Hal’s comments on Type III error (as opposed to Type I and Type II errors from statistics). As summarized in the top-5 findings, he sees a large problem in wasting time asking the wrong questions, i.e. trying to answer questions that the data cannot inform.
Another great interview is the one with David Smith, who comments on the critical importance of knowing R, calling it the lingua franca of data science, like English is for business. David commented that statistics used to be a back-office job and, with the big data movement, has become a front-office job as businesses look for a return on their investment in collecting all of their data.
Finally, I felt an affinity with Gregory Piatetsky, who commented in his interview on the field's transitions from Data Mining to Knowledge Discovery to Predictive Analytics to Big Data and now to Data Science. Although I have only been reading in the field for 15 years (as opposed to Gregory’s 30+), I have seen the same shifts in focus and naming. This again stresses the importance of the fundamentals of statistics and of framing your problem.
You can grab your free copy of this handbook of interviews here.