Last Updated on June 7, 2016
How do you get good at Kaggle competitions?
It is a common question I get asked. The best advice for getting started and getting good is to consistently participate in competitions. You cannot help but get better at machine learning.
A recent post by Triskelion titled “Reflecting back on one year of Kaggle contests” bears this out. He started out as a machine learning beginner and finished up as a “master” level Kaggle competitor (achieving a top 10% finish and a top 10 finish).
In this post we will review Triskelion’s lesson of consistent participation as an approach to begin and master Kaggle.
Key To a Good Start
I think the key to Triskelion starting well and having the confidence to continue is twofold:
- Reproduced Results: He reproduced results described in the forums and blog posts.
- Used Tools: Through reproducing results, he discovered and started to use tools like Vowpal Wabbit and scikit-learn.
This is an obvious but extremely underrated approach.
There is a lack of good machine learning tutorials. The best surrogates (and better than tutorials on toy datasets) are the “how to beat the benchmark” posts on forums and the “how I did it” posts at the end of a competition.
The reason for this is that these quasi-tutorials give you insight into how a world-class analyst thinks about and solves a problem. For example: the tools they use, how they set up their pipeline, the parameters they use, the process, everything.
Mimicking these elements is a clever way to bootstrap your machine learning skills.
Use Good Tools
A beginner mistake is reimplementing algorithms from scratch.
There is a vast array of powerful tools available and you must take advantage of them. You will get better results, faster. This will motivate you to push further.
Triskelion discovered Vowpal Wabbit early on and was not afraid to play with it. VW is a very powerful tool that even professionals have a hard time with.
In fact, a problem I see in “experts” trained in machine learning is that they ignore or even scoff at modern or different tools. They learned machine learning in R or Weka and therefore believe every problem can only be addressed with their weapon of choice.
The more tools you know and can use, the more ways you have to think about and tackle your problem.
Key To Getting Good
Competing consistently is the key to getting good.
Good is relative, but Triskelion is demonstrably much better now than one year ago (better than nearly 200,000 other competitors), due largely to his aggressive participation schedule.
He lists off 7 specific competitions, but his profile indicates a total of 15 competitions in which he has participated.
If you want to get good at machine learning competitions, follow his lead and participate in a lot of competitions. Even if you just meet the benchmark in the first few, you will learn a lot about data preparation and tools.
If you reproduce the results you see posted on blogs and forums for those competitions, then the gains will be non-linear.
Triskelion finishes with a number of tips:
- Practice a lot: Do as many challenges as you can, incremental improvements.
- Study evaluation metrics: Really understand AUC, etc. (see a list of metrics)
- Study the domain: Business cases, papers, state of the art, feature engineering
- Team up: A top 10 finish is hard, and he needed to team up to achieve it.
- Read the forums: Post to competition threads, understand winning solutions.
- Share on forums: Lots of angles on a given problem, don’t share too much.
- Use ensembles: They always improve results, can give you a top 10 with simple models.
- Experiment: Try out ideas rather than living in thought
- Creativity: Think outside of the box
- Tools: Find and use good algorithms.
- Tuning: Use cross-validation, tune all model parameters.
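Three of the tips above (evaluation metrics, ensembles and tuning) fit in a few lines of scikit-learn. What follows is only a minimal sketch, not Triskelion's own pipeline: the synthetic dataset, the two models and the small parameter grid are all assumptions chosen for illustration. It tunes one parameter with cross-validation, scores everything by AUC on the same folds, and compares two single models against a simple averaged ensemble.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for a competition training set (an assumption for illustration).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

# Fixed folds so that every model is scored on exactly the same splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Tuning: search a small, illustrative grid and pick the value with the best AUC.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=cv,
)
search.fit(X, y)
best_logreg = search.best_estimator_

forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Ensemble: average the predicted probabilities of two different models.
ensemble = VotingClassifier(
    estimators=[("logreg", best_logreg), ("forest", forest)],
    voting="soft",
)

# Evaluation metric: mean cross-validated AUC for each candidate.
candidates = [("logreg", best_logreg), ("forest", forest), ("ensemble", ensemble)]
scores = {
    name: cross_val_score(model, X, y, scoring="roc_auc", cv=cv).mean()
    for name, model in candidates
}

for name, score in scores.items():
    print(f"{name}: mean AUC = {score:.3f}")
```

Whether the ensemble actually comes out ahead depends on the data and the models, so treat the printed numbers as something to inspect rather than a guaranteed ranking. Note also that tuning and scoring on the same folds makes the tuned model's score slightly optimistic; a held-out set or nested cross-validation gives a fairer estimate.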
His final tip is to have fun.
This might very well be the most important point. Competitive machine learning is amazingly fun. Find the fun in it. Some perseverance is needed to get over the knowledge hump when starting out. The very act of doing “OK” (beating the benchmark) might be that fun part in the beginning.