Implementing a machine learning algorithm in code can teach you a lot about the algorithm and how it works.
In this post you will learn how to be effective at implementing machine learning algorithms and how to maximize your learning from these projects.
This is an excerpt from my popular new guide Small Projects Methodology: Learn and Practice Applied Machine Learning.
Benefits of Implementing Machine Learning Algorithms
You can use the implementation of machine learning algorithms as a strategy for learning about applied machine learning. You can also carve out a niche and develop skills in algorithm implementation.
Implementing a machine learning algorithm will give you a deep and practical appreciation for how the algorithm works. This knowledge can also help you to internalize the mathematical description of the algorithm, because you start to think of the vectors and matrices as arrays and build computational intuition for the transformations applied to those structures.
There are numerous micro-decisions required when implementing a machine learning algorithm, and these decisions are often missing from the formal algorithm descriptions. Learning and parameterizing these decisions can quickly catapult you to an intermediate or advanced level of understanding of a given method, as relatively few people make the time to implement some of the more complex algorithms as a learning exercise.
You are developing valuable skills when you implement machine learning algorithms by hand: mastery of the algorithm, skills that can help in the development of production systems, and skills that can be used for classical research in the field.
Three examples of skills you can develop include:
- Mastery: Implementation of an algorithm is the first step towards mastering the algorithm. You are forced to understand the algorithm intimately when you implement it. You are also creating your own laboratory for tinkering to help you internalize the computation it performs over time, such as by debugging and adding measures for assessing the running process.
- Production Systems: Custom implementations of algorithms are typically required for production systems because of the changes that need to be made to the algorithm for efficiency and efficacy reasons. Better, faster, less resource-intensive results can ultimately lead to lower costs and greater revenue in business, and implementing algorithms by hand helps you develop the skills to deliver these solutions.
- Literature Review: When implementing an algorithm you are performing research. You are forced to locate and read multiple canonical and formal descriptions of the algorithm. You are also likely to locate and code review other implementations of the algorithm to confirm your understanding. You are performing targeted research, and learning how to read and make practical use of research publications.
There is a process you can follow to accelerate your ability to learn and implement a machine learning algorithm by hand from scratch. The more algorithms you implement, the faster and more efficient you get at it and the more you will develop and customize your own process.
You can use the process outlined below.
- Select programming language: Select the programming language you want to use for the implementation. This decision may influence the APIs and standard libraries you can use in your implementation.
- Select Algorithm: Select the algorithm that you want to implement from scratch. Be as specific as possible. This means selecting not only the class and type of algorithm, but also a specific description or implementation that you want to reproduce.
- Select Problem: Select a canonical problem or set of problems you can use to test and validate your implementation of the algorithm. Machine learning algorithms do not exist in isolation.
- Research Algorithm: Locate papers, books, websites, libraries and any other descriptions of the algorithm you can read and learn from. Although you ideally want one keystone description of the algorithm from which to work, you will also want multiple perspectives on the algorithm. The multiple perspectives will help you to internalize the algorithm description faster and overcome roadblocks from any ambiguities or assumptions made in the description (there are always ambiguities in algorithm descriptions).
- Unit Test: Write unit tests for each function. Consider test-driven development from the beginning of the project, so that you are forced to understand the purpose and expectations of each unit of code before you implement it.
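To make the unit testing step concrete, here is a minimal sketch of what a tested unit might look like. It assumes you chose simple linear regression and its closed-form least squares solution. The function name `fit_line` and the two test functions are hypothetical names chosen for illustration, not part of any library.

```python
def fit_line(xs, ys):
    """Closed-form simple linear regression. Returns (intercept, slope).

    Assumption for this sketch: one input variable, plain Python lists.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of squared deviations of x, and of x-y co-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def test_recovers_exact_line():
    # Points generated from y = 1 + 2x, so the fit should be exact.
    intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    assert abs(intercept - 1.0) < 1e-9
    assert abs(slope - 2.0) < 1e-9


def test_rejects_constant_x():
    # A vertical cloud of points has no defined slope.
    try:
        fit_line([2, 2, 2], [1, 2, 3])
    except ValueError:
        return
    raise AssertionError("expected ValueError for constant x")


if __name__ == "__main__":
    test_recovers_exact_line()
    test_rejects_constant_x()
    print("all tests passed")
```

Writing the two tests first forces you to decide what the function returns and how it should behave on degenerate input, which is exactly the kind of detail a formal description tends to leave out.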
I strongly suggest porting algorithms from one language to another as a way of making rapid progress along this path. You can find plenty of open source implementations of algorithms that you can code review, diagram, internalize and reimplement in another language.
Consider open sourcing your code while you are developing it and after you have developed it. Comment it well and ensure it provides instructions on how to build and use it. The project will provide marketing for the skills you are developing and may just provide inspiration and help for someone else looking to make their start in machine learning. You may even be lucky enough to find a fellow programmer sufficiently interested to perform an audit or code review for you. Any feedback you get will be invaluable (even as motivation), so actively seek it.
Once you have implemented an algorithm you can explore making improvements to the implementation. Some examples of improvements you could explore include:
- Experimentation: You can expose many of the micro-decisions you made in the algorithm's implementation as parameters and perform studies on variations of those parameters. This can lead to new insights and disambiguation of algorithm implementations that you can share and promote.
- Optimization: You can explore opportunities to make the implementation more efficient by using tools, libraries, different languages, different data structures, patterns and internal algorithms. Knowledge you have of algorithms and data structures for classical computer science can be very beneficial in this type of work.
- Specialization: You may explore ways of making the algorithm more specific to a problem. This can be required when creating production systems and is a valuable skill. Making an algorithm more problem specific can also lead to increases in efficiency (such as running time) and efficacy (such as accuracy or other performance measures).
- Generalization: Opportunities can be created by making a specific algorithm more general. Programmers (like mathematicians) are uniquely skilled in abstraction and you may be able to see how the algorithm could be applied to more general cases of a class of problem or other problems entirely.
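As an illustration of the experimentation idea, the sketch below exposes two micro-decisions of a k-nearest neighbor classifier as parameters (the neighborhood size `k` and the `distance` function) and runs a tiny study over them. Everything here is an assumption made for the example: the function names, the toy data set, the tie-breaking rule and the use of leave-one-out accuracy as the measure.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def knn_predict(train, query, k=3, distance=euclidean):
    """Predict a label for `query` from (point, label) pairs in `train`.

    Exposed micro-decisions: `k` and `distance`. Vote ties go to the label
    of the nearest tied neighbor, which is one arbitrary choice among many.
    """
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    best = max(votes.values())
    # Walk neighbors nearest-first so ties resolve to the closest label.
    for _, label in neighbors:
        if votes[label] == best:
            return label


def leave_one_out_accuracy(data, **params):
    """Hold out each example in turn and predict it from the rest."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, point, **params) == label:
            hits += 1
    return hits / len(data)


# Toy two-class data set, invented for this example.
DATA = [((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((2.0, 1.0), "a"),
        ((6.0, 6.0), "b"), ((6.5, 7.0), "b"), ((7.0, 6.0), "b")]

if __name__ == "__main__":
    for k in (1, 3, 5):
        for name, fn in (("euclidean", euclidean), ("manhattan", manhattan)):
            acc = leave_one_out_accuracy(DATA, k=k, distance=fn)
            print(f"k={k} distance={name} accuracy={acc:.2f}")
```

On a data set this small, a large `k` pulls in more neighbors from the wrong class than from the right one, so the study shows how much a single micro-decision can change the result.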
You can learn a lot by implementing machine learning algorithms by hand, but there are also some downsides to keep in mind.
- Redundancy: Many algorithms already have implementations, some very robust implementations that have been used by hundreds or thousands of researchers and practitioners around the world. Your implementation may be considered redundant, a duplication of effort already invested by the community.
- Bugs: New code that has few users is more likely to have bugs, even with a skilled programmer and unit tests. Using a standard library can reduce the likelihood of having bugs in the algorithm implementation.
- Non-intuitive Leaps: Some algorithms rely on non-intuitive jumps in reasoning or logic because of the sophisticated mathematics involved. It is feasible that an implementation that does not appreciate these leaps will be limited or even incorrect.
It is easy to comment on open source implementations of machine learning algorithms and raise many issues in a code review. It is much harder to appreciate the non-intuitive efficiencies that have been encoded in the implementation. This can be a trap in thinking.
You may find it beneficial to start with a slower intuitive implementation of a complex algorithm before considering how to change it to be programmatically less elegant, but computationally more efficient.
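Here is a small sketch of that progression, using the neighbor search at the heart of k-nearest neighbor as the example. The function names are hypothetical. The first version says exactly what it means: sort everything, keep the first k. The second uses `heapq.nsmallest` from the Python standard library, which is documented as equivalent to `sorted(iterable, key=key)[:n]` but does not need to fully sort every point.

```python
import heapq
import random


def squared_distance(a, b):
    # Squared Euclidean distance ranks points in the same order as the
    # true distance, so the square root is skipped when only order matters.
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_intuitive(points, query, k):
    """Readable version: sort every point by distance, keep the first k."""
    ranked = sorted(points, key=lambda p: squared_distance(p, query))
    return ranked[:k]


def nearest_efficient(points, query, k):
    """Same result without fully sorting all of the points."""
    return heapq.nsmallest(k, points, key=lambda p: squared_distance(p, query))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the check is repeatable
    points = [(rng.random(), rng.random()) for _ in range(1000)]
    query = (0.5, 0.5)
    # Keep the slow version around as a reference to check the fast one.
    assert nearest_intuitive(points, query, 5) == nearest_efficient(points, query, 5)
    print("both versions agree")
```

Keeping the intuitive version around as a reference implementation gives you something to test each optimization against.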
Some algorithms are easier to understand than others. In this post I want to make some suggestions for intuitive algorithms from which you might like to select your first machine learning algorithm to implement from scratch.
- Ordinary Least Squares Linear Regression: Use two-dimensional data sets and model y from x. Print out the error for each iteration of the algorithm. Consider plotting the line of best fit and predictions for each iteration of the algorithm to see how the updates affect the model.
- k-Nearest Neighbor: Consider using two-dimensional data sets with 2 classes, even ones that you create with graph paper, so that you can plot them. Once you can plot and make predictions, you can plot the relationships created for each prediction decision the model makes.
- Perceptron: Considered the simplest artificial neural network model and very similar to a regression model. You can track and graph the performance of the model as it learns a dataset.
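To give a feel for the size of such a first project, below is a minimal sketch of the perceptron that records the number of mistakes made in each pass over the data, so you have something to track and graph. The function names, the default parameter values and the choice of the logical AND problem as a test case are all assumptions made for this example.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is non-negative."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation >= 0 else 0


def train_perceptron(data, learning_rate=1.0, max_epochs=20):
    """Train on (features, label) pairs with labels 0 or 1.

    Returns (weights, bias, history), where history[i] is the number of
    misclassified examples seen during epoch i.
    """
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    history = []
    for _ in range(max_epochs):
        errors = 0
        for x, label in data:
            delta = label - predict(weights, bias, x)  # -1, 0 or +1
            if delta != 0:
                errors += 1
                # Nudge the decision boundary toward the missed example.
                bias += learning_rate * delta
                weights = [w + learning_rate * delta * xi
                           for w, xi in zip(weights, x)]
        history.append(errors)
        if errors == 0:  # a full pass with no mistakes, so stop early
            break
    return weights, bias, history


# Logical AND: a tiny, linearly separable two-class problem.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    weights, bias, history = train_perceptron(AND_DATA)
    print("errors per epoch:", history)
    print("weights:", weights, "bias:", bias)
```

The `history` list is the part worth plotting: watching the error count fall to zero is a simple way to see the model learn the data set.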
In this post you learned the benefits of implementing machine learning algorithms by hand. You learned that you can understand an algorithm, make improvements and develop valuable skills by following this path.
You learned a simple process that you can follow and customize as you implement multiple algorithms from scratch, and you learned about three algorithms that you could choose as your first algorithm to implement from scratch.
If you like this self-study strategy, I have created a 32-page PDF guide you can use to learn and practice applied machine learning. Check it out:
I have also created a list of 90 project ideas (yeah, I went overboard) and provided it as a bonus with the guide.