In a recent presentation at MLConf, Xavier Amatriain described 10 lessons that he has learned about building machine learning systems as the Research/Engineering Manager at Netflix. In this post you will discover these 10 lessons in a summary from his talk and slides. 10 Lessons Learned The 10 lessons that Xavier presents can be summarized as […]
Get Your Dream Job in Machine Learning by Delivering Results
You can rise up and take on your desire to become a machine learning practitioner and data scientist. You have to work hard, learn the skills and demonstrate that you can deliver results, but you don’t need a fancy degree or a fancy background. In this post I want to demonstrate that this is […]
Assessing and Comparing Classifier Performance with ROC Curves
The most commonly reported measure of classifier performance is accuracy: the percent of correct classifications obtained. This metric has the advantage of being easy to understand and makes comparison of the performance of different classifiers trivial, but it ignores many of the factors which should be taken into account when honestly assessing the performance of […]
Evaluate Yourself As a Data Scientist
What skills do you need to be a data scientist? I read an interesting data-driven approach to answering this question in the book Doing Data Science: Straight Talk from the Frontline. In this post I summarize this self-assessment approach that you can use to evaluate your strengths as a data scientist and where you might […]
Understand Your Problem and Get Better Results Using Exploratory Data Analysis
You often jump from problem to problem in applied machine learning and you need to get up to speed on a new dataset, fast. A classical and under-utilised approach that you can use to quickly build a relationship with a new data problem is Exploratory Data Analysis. In this post you will discover Exploratory Data Analysis (EDA), […]
Data Management Matters And Why You Need To Take It Seriously
We live in a world drowning in data. Internet tracking, stock market movement, genome sequencing technologies and their ilk all produce enormous amounts of data. Most of this data is someone else’s responsibility, generated by someone else, stored in someone else’s database, which is maintained and made available by… you guessed it… someone else. But. […]
How to Become a Data Scientist
How do you become a data scientist? I think that really depends on where you are now and what you really want to do as a data scientist. Nevertheless, DataCamp posted an infographic recently that described 8 easy steps to becoming a data scientist. In this post I want to highlight and review DataCamp’s infographic. […]
Crash Course in Statistics for Machine Learning
You do not need to know statistics before you can start learning and applying machine learning. You can start today. Nevertheless, knowing some statistics can be very helpful to understand the language used in machine learning. Knowing some statistics will eventually be required when you want to start making strong claims about your results. In […]
Why Aren’t My Results As Good As I Thought? You’re Probably Overfitting
We all know the satisfaction of running an analysis and seeing the results come back the way we want them to: 80% accuracy; 85%; 90%? The temptation is strong just to turn to the Results section of the report we’re writing, and put the numbers in. But wait: as always, it’s not that straightforward. Succumbing […]
Machine Learning Q&A: Concept Drift, Better Results and Learning Faster
I get a lot of questions about machine learning via email and I love answering them. I get to see what real people are doing and help to make a difference. (Do you have a question about machine learning? Contact me). In this post I highlight a few of the interesting questions I have received […]