What should I do if I want to get ‘better’ at machine learning, but I don’t know what I want to learn?
In this post you will discover a summary of Colorado’s recommendations and a breakdown of his roadmap.
Strategy To Do Better at Machine Learning
Colorado is a PhD student at Berkeley and founder of Metacademy. Metacademy is an open source platform where experts collaborate to construct wiki articles. At the moment, the articles focus on machine learning and artificial intelligence. It’s a great site.
Colorado’s suggestion for getting better at machine learning is to consistently work your way through textbooks. He comments that the process of reading through a textbook is the process of becoming one with the textbook.
This strategy is unsurprising coming from a PhD candidate in the thick of it, and I may have recommended the same thing back in the day. It is OK advice, but I don’t think it is the right advice for everyone. If you are a programmer who lets ideas seep in by implementing them, then the list of textbooks might be more useful as references for when you are looking to take the next step on a given method.
Machine Learning Roadmap
His roadmap into machine learning is in turn broken down into 5 levels, each pointing to a specific textbook to master. The five levels are:
- Level 0 (Neophyte): Read Data Smart: Using Data Science to Transform Information into Insight. Assumes you know your way around Excel. You will finish up knowing about the existence, and maybe the high-level data flow, of a few algorithms.
- Level 1 (Apprentice): Read Machine Learning with R. Learn when to apply different machine learning algorithms, and use R to do so. Assumes maybe a little programming, algebra, calculus and probability, but only a little.
- Level 2 (Journeyman): Read Pattern Recognition and Machine Learning. Discover why machine learning algorithms work from a maths perspective. Interpret and debug the output of machine learning methods and have knowledge of deeper machine learning concepts. Assumes working knowledge of algorithms, good linear algebra, some vector calculus, some algorithm implementation experience.
- Level 3 (Master): Read Probabilistic Graphical Models: Principles and Techniques. Go deep into advanced topics like convex optimization, combinatorial optimization, probability theory, differential geometry, and other maths. Get good at probabilistic graphical models, when to use them and how to interpret their results.
- Level 4 (Grandmaster): Take on whatever you like. Give back to the community.
It’s a nice breakdown, and Colorado provides specific chapter suggestions for each level as well as a suggested capstone project.
Colorado reposted this roadmap as a blog post with some minor modifications. He dropped the last level and changed the names to: Curious, Neophyte, Apprentice, Journeyman and Master. He also comments that the Level 0 Curious machine learner should not read a textbook, but instead should browse and review some top machine learning videos.
Topics Neglected in Machine Learning
Scott comments that the suggestions demonstrate Colorado’s preferences and do not give a fuller picture of the field of machine learning. Scott also comments that few if any books give a good overview of the field, although he does like the book Machine Learning: The Art and Science of Algorithms that Make Sense of Data by Peter Flach because it also touches on some obscure techniques.
Scott then goes on to list some areas that are “egregiously neglected in books”. In summary, the areas are:
- Online learning: Critical for streaming data and big data, with a nod to Vowpal Wabbit.
- Reinforcement learning: Discussed in the context of robotics, but rarely linked back to common ML.
- “Compression” sequence prediction techniques: Compression while discovering learning patterns. A nod to CompLearn.
- Time series oriented techniques in general
- Conformal prediction: Model accuracy estimation for online learning.
- ML in the presence of lots of noise: NLP and CV are general case examples.
- Feature engineering: Absolutely vital to successful machine learning.
- Unsupervised and semi-supervised learning in general
It’s a great list, pointing out some areas that indeed do not get much or enough attention.
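To make the first item on that list a little more concrete, here is a minimal sketch of online learning in plain Python. It is not taken from Scott's comments or from Vowpal Wabbit; it is just an illustration of the idea that a model can predict on each example as it arrives and then update itself from that single example, without ever holding the full dataset in memory. The class name, the synthetic stream, the learning rate and the "last 1,000 examples" accuracy window are all assumptions made up for this example.

```python
import math
import random


def sigmoid(z):
    # Clamp the score so math.exp cannot overflow on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class OnlineLogisticRegression:
    """Logistic regression updated one example at a time (plain SGD)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return sigmoid(z)

    def update(self, x, y):
        """Predict on (x, y) first, then take one gradient step on it."""
        p = self.predict_proba(x)
        error = p - y  # gradient of the log loss w.r.t. the linear score
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error
        return p  # the prediction made *before* learning from this example


def synthetic_stream(n, seed=0):
    """Hypothetical data stream: the label is 1 when x0 + x1 > 0."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        yield x, (1 if x[0] + x[1] > 0 else 0)


def run_stream(n=5000, seed=0):
    """Feed the stream through the model; score the last 1,000 predictions."""
    model = OnlineLogisticRegression(n_features=2)
    scored = 0
    correct = 0
    for t, (x, y) in enumerate(synthetic_stream(n, seed)):
        p = model.update(x, y)
        if t >= n - 1000:
            scored += 1
            correct += int((p >= 0.5) == (y == 1))
    return model, correct / scored


if __name__ == "__main__":
    model, late_accuracy = run_stream()
    print("weights:", model.weights, "bias:", model.bias)
    print("accuracy on the last 1,000 examples:", late_accuracy)
```

Because each prediction is made before the model sees the label, the running accuracy is an honest estimate of how the model performs on unseen data. Real tools add a lot on top of this (feature hashing, adaptive learning rates and so on), but the predict-then-update loop is the core of the approach.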
I’ll note that I have my own roadmap for getting started in and mastering machine learning. Like Colorado’s, my roadmap is constrained to classification/regression type supervised machine learning, but it builds in processes that promote the investigation and adoption of any and all topics of interest. Rather than a “read these textbooks” approach, it is a “follow these processes” approach.