Phil Brierley won the Heritage Health Prize Kaggle machine learning competition. Phil was trained as a mechanical engineer and has a background in data mining with his company Tiberius Data Mining. He is heavily into R these days and keeps a blog at Another Data Mining Blog. In October 2013 he presented to the Melbourne Users of R special interest […]
Archive | Machine Learning Process
Classification Accuracy is Not Enough: More Performance Measures You Can Use
When you build a model for a classification problem you almost always want to know the accuracy of that model: the ratio of correct predictions to all predictions made. This is the classification accuracy. In a previous post, we looked at evaluating the robustness of a model for making predictions on unseen […]
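The excerpt defines accuracy as correct predictions over all predictions. A minimal pure-Python sketch (the labels below are made up for illustration) shows how accuracy can look respectable while recall exposes a weakness on an imbalanced problem:

```python
# Made-up labels for an imbalanced two-class problem (3 positives, 7 negatives).
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Tally the four confusion-matrix cells.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

accuracy  = (tp + tn) / len(actual)   # 0.8 -- looks good on its own
precision = tp / (tp + fp)            # 1.0
recall    = tp / (tp + fn)            # ~0.33 -- two of three positives missed
```

Here accuracy alone hides the fact that most positive cases were missed, which is exactly why additional measures matter.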
A Simple Intuition for Overfitting, or Why Testing on Training Data is a Bad Idea
When you first start out with machine learning you load a dataset and try models. You might think to yourself, why can’t I just build a model with all of the data and evaluate it on the same dataset? It seems reasonable. More data to train the model is better, right? Evaluating the model and […]
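The intuition in the excerpt can be demonstrated with a deliberately overfit "model": a lookup table that memorises the training set. Everything below (data, names) is an illustrative sketch, not code from the post:

```python
import random
random.seed(7)  # fixed seed so the sketch is repeatable

# Toy data: label is 1 when x > 0.5, with 20% label noise.
data = []
for _ in range(200):
    x = random.random()
    label = int(x > 0.5)
    if random.random() < 0.2:
        label = 1 - label
    data.append((x, label))

train, test = data[:100], data[100:]

# A "model" that simply memorises the training examples.
memory = {x: y for x, y in train}
def memoriser(x):
    return memory.get(x, 0)  # unseen inputs fall back to a default guess

train_acc = sum(memoriser(x) == y for x, y in train) / len(train)
test_acc  = sum(memoriser(x) == y for x, y in test) / len(test)
# Perfect on the data it was "trained" on, near chance on unseen data.
```

Evaluated on its own training data the memoriser looks flawless; the held-out set reveals it learned nothing general.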
How To Choose The Right Test Options When Evaluating Machine Learning Algorithms
The test options you use when evaluating machine learning algorithms can mean the difference between over-learning, a mediocre result, and a usable, state-of-the-art result that you can confidently shout from the rooftops (you really do feel like doing that sometimes). In this post you will discover the standard test options you can use in […]
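One standard test option is k-fold cross-validation. The index bookkeeping behind it can be sketched in plain Python (contiguous folds, no shuffling; real libraries typically shuffle first, so treat this as illustrative only):

```python
def k_fold_indices(n, k):
    """Split the indices 0..n-1 into k near-equal contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = k_fold_indices(10, 3)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Each fold is held out once as the test set while the model trains on the rest, so every example is used for both training and testing exactly once.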
Applied Machine Learning Process
The Systematic Process For Working Through Predictive Modeling Problems That Delivers Above-Average Results

Over time, working on applied machine learning problems, you develop a pattern or process for quickly getting to good, robust results. Once developed, you can use this process again and again on project after project. The more robust and developed your […]
Why you should be Spot-Checking Algorithms on your Machine Learning Problems
Spot-checking algorithms is about getting a quick assessment of a bunch of different algorithms on your machine learning problem so that you know what algorithms to focus on and what to discard. In this post you will discover the 3 benefits of spot-checking algorithms, 5 tips for spot-checking on your next problem and the top […]
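The idea in the excerpt, scoring a handful of candidates quickly to see what deserves attention, can be sketched in plain Python. The three "algorithms" here are toy stand-ins for real library models, and the dataset is made up:

```python
import random
random.seed(3)

# Illustrative dataset: label is 1 when x > 0.5.
data = [(x, int(x > 0.5)) for x in (random.random() for _ in range(100))]

# Candidate "algorithms" -- stand-ins for real models you would spot-check.
candidates = {
    "always_zero": lambda x: 0,
    "coin_flip":   lambda x: random.randint(0, 1),
    "threshold":   lambda x: int(x > 0.5),
}

# Score each candidate the same way, then rank them.
scores = {}
for name, model in candidates.items():
    scores[name] = sum(model(x) == y for x, y in data) / len(data)

best = max(scores, key=scores.get)
```

The point is the loop, not the models: the same quick, uniform evaluation applied to every candidate tells you where to spend your tuning effort.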
Reproducible Machine Learning Results By Default
It is good practice to have reproducible outcomes in software projects. It might even be standard practice by now; I hope it is. You can take any developer off the street, and they should be able to follow your process to check out the code base from revision control and make a build of the […]
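A small concrete habit that supports reproducibility in machine learning experiments is fixing the random seed, so any run can be repeated exactly. A minimal illustration (the function name and experiment are made up):

```python
import random

def sample_run(seed):
    """Stand-in for an experiment with a random component."""
    random.seed(seed)  # fixing the seed makes the run repeatable
    return [random.randint(0, 9) for _ in range(5)]

# The same seed reproduces the same "results" on every run.
repeatable = sample_run(42) == sample_run(42)
```

Record the seed alongside the code and data versions and anyone can regenerate your numbers.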
What is Data Mining and KDD
I am very interested in processes. I want to know good ways to do things, even the best ways to do things if possible. Even if you don't have skill or deep understanding, a process can get you a long way. It can lead the way, and skill and deep understanding can follow. At least, I […]
How to Use Machine Learning Results
Once you have found and tuned a viable model of your problem, it is time to make use of that model. You may need to revisit your "why" and remind yourself of the form in which you need a solution to the problem you are solving. The problem is not addressed until you do something with the results. […]
How to Improve Machine Learning Results
Having one or two algorithms that perform reasonably well on a problem is a good start, but sometimes you may be incentivised to get the best result you can given the time and resources you have available. In this post, you will review methods you can use to squeeze out extra performance and improve the […]