The most commonly reported measure of classifier performance is accuracy: the percentage of correct classifications obtained. This metric has the advantage of being easy to understand and makes comparing the performance of different classifiers trivial, but it ignores many of the factors that should be taken into account when honestly assessing the performance of […]
Evaluate Yourself As a Data Scientist
What skills do you need to be a data scientist? I read an interesting data-driven approach to answering this question in the book Doing Data Science: Straight Talk from the Frontline. In this post I summarize this self-assessment approach that you can use to evaluate your strengths as a data scientist and where you might […]
Understand Your Problem and Get Better Results Using Exploratory Data Analysis
You often jump from problem to problem in applied machine learning and you need to get up to speed on a new dataset, fast. A classical and under-utilised approach that you can use to quickly build a relationship with a new data problem is Exploratory Data Analysis. In this post you will discover Exploratory Data Analysis (EDA), […]
Data Management Matters And Why You Need To Take It Seriously
We live in a world drowning in data. Internet tracking, stock market movement, genome sequencing technologies and their ilk all produce enormous amounts of data. Most of this data is someone else’s responsibility, generated by someone else, stored in someone else’s database, which is maintained and made available by… you guessed it… someone else. But. […]
How to Become a Data Scientist
How do you become a data scientist? I think that really depends on where you are now and what you really want to do as a data scientist. Nevertheless, DataCamp posted an infographic recently that described 8 easy steps to becoming a data scientist. In this post I want to highlight and review DataCamp’s infographic. […]
Crash Course in Statistics for Machine Learning
You do not need to know statistics before you can start learning and applying machine learning. You can start today. Nevertheless, knowing some statistics can be very helpful to understand the language used in machine learning. Knowing some statistics will eventually be required when you want to start making strong claims about your results. In […]
Why Aren’t My Results As Good As I Thought? You’re Probably Overfitting
We all know the satisfaction of running an analysis and seeing the results come back the way we want them to: 80% accuracy; 85%; 90%? The temptation is strong just to turn to the Results section of the report we’re writing, and put the numbers in. But wait: as always, it’s not that straightforward. Succumbing […]
Machine Learning Q&A: Concept Drift, Better Results and Learning Faster
I get a lot of questions about machine learning via email and I love answering them. I get to see what real people are doing and help to make a difference. (Do you have a question about machine learning? Contact me). In this post I highlight a few of the interesting questions I have received […]
Hello World of Applied Machine Learning
It is easy to feel overwhelmed by the large number of machine learning algorithms. There are so many to choose from that it is hard to know where to start and what to try. The choice can be paralyzing. You need to get over this fear and start. There is no magic book or course that […]
How To Get Baseline Results And Why They Matter
In my courses and guides, I teach the preparation of a baseline result before diving into spot checking algorithms. A student of mine recently asked: If a baseline is not calculated for a problem, will it make the results of other algorithms questionable? He went on to ask: If other algorithms do not give better accuracy […]