Last Updated on June 7, 2016
Jeremy Howard, formerly of Kaggle, gave a presentation at the University of San Francisco in mid-2013. In that presentation he touched on some of the broader benefits of machine learning competitions like those held on Kaggle.
In this post you will discover 5 points I extracted from this talk that will motivate you to start participating in machine learning competitions.
Big Data at USF
The talk presented by Howard was titled “Jeremy Howard of Kaggle speaks about Big Data at the University of San Francisco”. The title is a misnomer. The talk focused on Howard’s background and how he came to machine learning, and touched only briefly on Kaggle.
Howard has a background in start-ups and this talk gives a good summary of that background and the lessons he has to pass on from that journey.
Toward the end of the talk Howard touches on Kaggle and its mission, which is what inspired these 5 points. They are:
- Meritocracy: Status is based solely on ability.
- Role Models: Best performers and their origin stories become role models.
- Push Limits: Leaderboards push the capabilities of you and the group.
- Innovation: Competitions result in technological innovation.
- Communities: Like minds find each other and share ideas.
1. Meritocracy
Data science and machine learning competitions are a meritocracy. This means that rank is determined solely by merit.
The analogy given is that of sports where the only thing that matters is the result achieved by the athlete. It does not matter where you come from, your gender or where you went to school. All that matters is what results you can achieve.
Such systems are fair: biases like those that exist in the workplace do not influence the result. The system is also transparent: everyone has access to the same source material (training data) and the same evaluation of performance (leaderboard).
We have touched on this in a previous post, “Applied Machine Learning is a Meritocracy”.
2. Role Models
Competitions create role models.
The results of the competitions show that it is generally not the academics who do well, but people with an adaptive engineering mindset who use what works to get the best result. People with diverse and interesting backgrounds are ranking in the top 10 or top 100 of all data scientists on the platform.
This has the effect of creating role models. Their stories differ from what you might expect, such as having first encountered machine learning only one year earlier in a free Coursera course. These interesting stories draw you in: “if he can do it, I can do it”.
You also see that when a “known” data scientist joins a competition, like a star from the Netflix Prize, it draws a lot more attention: “I want to beat the person who did well in the Netflix Prize”.
3. Push Limits
As in sports, a leaderboard can push the limits of what you and the group are capable of.
Just knowing that someone else knows something that you do not, even after you have given it your all, can push you to search for that one piece of additional information.
The real-time feedback of the leaderboard has a psychological effect on the results that can be achieved. This may cut both ways, as it did with the four-minute mile, which stood as a barrier until Roger Bannister broke it, proving that it could be done.
4. Innovation
Competitions result in technological innovation.
The state-of-the-art benchmarks are broken every time. This most likely occurs because the problems are well specified for machine learning and because the participants are not limited to the methods used in a given domain or field of study. Anything goes.
This opens up different ways of tackling problems, which can be leveraged both in the field and in similar future competitions, accelerating advancements across the board.
5. Communities
Communities spring up around competitions.
There is a balance between sharing information and not sharing so much that you lose ground in the competition. Sharing benefits you and the group and seems to happen automatically around each competition.
Like minds find each other and team up, exploiting the best parts of each other’s ideas and pushing beyond what they are capable of independently.
Community and information flow are crucial ingredients in good competitions. They help beginners get started, intermediates advance and innovation occur.
Summary
In this post you have discovered five benefits of competitive machine learning. They were: meritocracy, role models, pushing the limits, innovation and communities.
This is not new in machine learning: competitions have existed in collaboration with academic conferences for nearly 20 years. What is new is the scale of participation and the low barrier to entry. It’s an exciting and opportune time to get into applied machine learning, regardless of your background.