Implementing a machine learning algorithm in code can teach you a lot about the algorithm and how it works.

In this post you will learn how to be effective at implementing machine learning algorithms and how to maximize your learning from these projects.

## Benefits of Implementing Machine Learning Algorithms

You can use the implementation of machine learning algorithms as a strategy for learning about applied machine learning. You can also carve out a niche and skills in algorithm implementation.

### Algorithm Understanding

Implementing a machine learning algorithm will give you a deep and practical appreciation for how the algorithm works. It can also help you internalize the mathematical description of the algorithm, because you start to think of vectors and matrices as arrays and build a computational intuition for the transformations applied to those structures.
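As a small illustration of reading mathematical notation as array operations, the sketch below writes an inner product and a matrix-vector product as explicit loops over Python lists. The function names `dot` and `mat_vec` are chosen for this example and do not come from any particular library.

```python
def dot(w, x):
    """Inner product sum_i w[i] * x[i], written as an explicit loop."""
    if len(w) != len(x):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for wi, xi in zip(w, x):
        total += wi * xi
    return total


def mat_vec(A, x):
    """Matrix-vector product: each row of A is dotted with x."""
    return [dot(row, x) for row in A]


# A 2x2 matrix stored as a list of rows, applied to a 2-element vector.
result = mat_vec([[1, 0], [0, 2]], [3, 4])
```

Writing the loops out once makes it easier to see what a library call is doing when you later swap it in.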

There are numerous micro-decisions required when implementing a machine learning algorithm, and these decisions are often missing from the formal algorithm descriptions. Learning and parameterizing these decisions can quickly catapult you to an intermediate or advanced level of understanding of a given method, as relatively few people make the time to implement some of the more complex algorithms as a learning exercise.

### Practical Skills

You develop valuable skills when you implement machine learning algorithms by hand: mastery of the algorithm, skills that help in the development of production systems, and skills that can be used for classical research in the field.

Three examples of skills you can develop include:

- **Mastery**: Implementation of an algorithm is the first step towards mastering the algorithm. You are forced to understand the algorithm intimately when you implement it. You are also creating your own laboratory for tinkering to help you internalize the computation it performs over time, such as by debugging and adding measures for assessing the running process.
- **Production Systems**: Custom implementations of algorithms are typically required for production systems because of the changes that need to be made to the algorithm for efficiency and efficacy reasons. Better, faster, less resource-intensive results can ultimately lead to lower costs and greater revenue in business, and implementing algorithms by hand helps you develop the skills to deliver these solutions.
- **Literature Review**: When implementing an algorithm you are performing research. You are forced to locate and read multiple canonical and formal descriptions of the algorithm. You are also likely to locate and code review other implementations of the algorithm to confirm your understanding. You are performing targeted research, and learning how to read and make practical use of research publications.

## Process

There is a process you can follow to accelerate your ability to learn and implement a machine learning algorithm by hand from scratch. The more algorithms you implement, the faster and more efficient you get at it and the more you will develop and customize your own process.

You can use the process outlined below.

1. **Select Programming Language**: Select the programming language you want to use for the implementation. This decision may influence the APIs and standard libraries you can use in your implementation.
2. **Select Algorithm**: Select the algorithm that you want to implement from scratch. Be as specific as possible. This means choosing not only the class and type of algorithm, but also a specific description or implementation that you want to implement.
3. **Select Problem**: Select a canonical problem or set of problems you can use to test and validate your implementation of the algorithm. Machine learning algorithms do not exist in isolation.
4. **Research Algorithm**: Locate papers, books, websites, libraries and any other descriptions of the algorithm you can read and learn from. Although you ideally want one keystone description of the algorithm from which to work, you will also want multiple perspectives on the algorithm. These help you to internalize the algorithm description faster and overcome roadblocks from any ambiguities or assumptions made in the description (there are always ambiguities in algorithm descriptions).
5. **Unit Test**: Write unit tests for each function, and consider test-driven development from the beginning of the project so that you are forced to understand the purpose and expectations of each unit of code before you implement it.
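To make the unit testing step concrete, here is a minimal sketch for one small building block, a Euclidean distance function of the kind a k-nearest neighbor implementation might need. The function and test names are illustrative. The tests are plain functions using `assert`, on the assumption that a test runner such as pytest collects them.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def test_distance_to_self_is_zero():
    assert euclidean_distance([1.0, 2.0], [1.0, 2.0]) == 0.0


def test_known_triangle():
    # The 3-4-5 right triangle gives a value that is easy to check by hand.
    assert math.isclose(euclidean_distance([0, 0], [3, 4]), 5.0)


def test_symmetry():
    a, b = [1, 2, 3], [4, 6, 8]
    assert math.isclose(euclidean_distance(a, b), euclidean_distance(b, a))


def test_mismatched_dimensions_raise():
    try:
        euclidean_distance([1, 2], [1, 2, 3])
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

Writing the tests first forces a decision on details the formal description may leave open, such as what should happen when the two points have different dimensions.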

I strongly suggest porting algorithms from one language to another as a way of making rapid progress along this path. You can find plenty of open source implementations of algorithms that you can code review, diagram, internalize and reimplement in another language.

Consider open sourcing your code while you are developing it and after you have developed it. Comment it well and ensure it provides instructions on how to build and use it. The project will provide marketing for the skills you are developing and may just provide inspiration and help for someone else looking to make their start in machine learning. You may even be lucky enough to find a fellow programmer sufficiently interested to perform an audit or code review for you. Any feedback you get will be invaluable (even as motivation), so actively seek it.

## Extensions

Once you have implemented an algorithm you can explore making improvements to the implementation. Some examples of improvements you could explore include:

- **Experimentation**: You can expose many of the micro-decisions you made in the algorithm's implementation as parameters and perform studies on variations of those parameters. This can lead to new insights and disambiguation of algorithm implementations that you can share and promote.
- **Optimization**: You can explore opportunities to make the implementation more efficient by using tools, libraries, different languages, different data structures, patterns and internal algorithms. Knowledge of algorithms and data structures from classical computer science can be very beneficial in this type of work.
- **Specialization**: You may explore ways of making the algorithm more specific to a problem. This can be required when creating production systems and is a valuable skill. Making an algorithm more problem specific can also lead to increases in efficiency (such as running time) and efficacy (such as accuracy or other performance measures).
- **Generalization**: Opportunities can be created by making a specific algorithm more general. Programmers (like mathematicians) are uniquely skilled in abstraction and you may be able to see how the algorithm could be applied to more general cases of a class of problem or to other problems entirely.
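As a sketch of the experimentation idea, the k-nearest neighbor classifier below exposes two micro-decisions as parameters: the number of neighbors `k` and the `distance` function. The tie-breaking rule (the tied label with the closest neighbor wins) is one possible choice made for this example, not a standard. All names and the toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train, query, k=3, distance=euclidean):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (point, label) pairs.
    """
    neighbours = sorted(train, key=lambda pair: distance(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    best = max(votes.values())
    tied = {label for label, count in votes.items() if count == best}
    # neighbours are ordered by distance, so the closest tied label wins
    for _, label in neighbours:
        if label in tied:
            return label


# A tiny parameter study over both exposed decisions.
train = [([0, 0], "a"), ([0, 1], "a"), ([1, 0], "a"),
         ([5, 5], "b"), ([5, 6], "b"), ([6, 5], "b")]
for k in (1, 3, 5):
    for dist in (euclidean, manhattan):
        print(k, dist.__name__, knn_predict(train, [1, 1], k=k, distance=dist))
```

Once the decisions are parameters, comparing variations becomes a loop rather than a rewrite.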

## Limitations

You can learn a lot by implementing machine learning algorithms by hand, but there are also some downsides to keep in mind.

- **Redundancy**: Many algorithms already have implementations, some of them very robust and used by hundreds or thousands of researchers and practitioners around the world. Your implementation may be considered redundant, a duplication of effort already invested by the community.
- **Bugs**: New code that has few users is more likely to have bugs, even with a skilled programmer and unit tests. Using a standard library can reduce the likelihood of having bugs in the algorithm implementation.
- **Non-intuitive Leaps**: Some algorithms rely on non-intuitive jumps in reasoning or logic because of the sophisticated mathematics involved. An implementation that does not appreciate these leaps may be limited or even incorrect.

It is easy to comment on open source implementations of machine learning algorithms and raise many issues in a code review. It is much harder to appreciate the non-intuitive efficiencies that have been encoded in the implementation. This can be a trap in thinking.

You may find it beneficial to start with a slower intuitive implementation of a complex algorithm before considering how to change it to be programmatically less elegant, but computationally more efficient.
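A small sketch of this progression, using neighbor selection as the example: the first function sorts every point by distance, which is easy to read, and the second uses `heapq.nsmallest` from the Python standard library to avoid sorting the whole list. The function names are illustrative, and squared distance is used because it preserves the ordering without needing a square root.

```python
import heapq


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_nearest_simple(points, query, k):
    """Intuitive version: sort every point by distance, keep the first k."""
    return sorted(points, key=lambda p: squared_distance(p, query))[:k]


def k_nearest_heap(points, query, k):
    """Same selection without a full sort of the candidate list."""
    return heapq.nsmallest(k, points, key=lambda p: squared_distance(p, query))


points = [(0, 0), (5, 5), (1, 2), (9, 1), (2, 2), (7, 8)]
# The slow version doubles as a reference for checking the faster one.
assert k_nearest_simple(points, (1, 1), 3) == k_nearest_heap(points, (1, 1), 3)
```

Keeping the simple version around gives you a reference implementation to test each optimization against.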

## Example Projects

Some algorithms are easier to understand than others. In this post I want to make some suggestions for intuitive algorithms from which you might like to select your first machine learning algorithm to implement from scratch.

- **Ordinary Least Squares Linear Regression**: Use two-dimensional data sets and model y from x. Print out the error for each iteration of the algorithm. Consider plotting the line of best fit and predictions for each iteration of the algorithm to see how the updates affect the model.
- **k-Nearest Neighbor**: Consider using two-dimensional data sets with 2 classes, even ones that you create with graph paper, so that you can plot them. Once you can plot and make predictions, you can plot the relationships created for each prediction decision the model makes.
- **Perceptron**: Considered the simplest artificial neural network model and very similar to a regression model. You can track and graph the performance of the model as it learns a dataset.
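As a starting point for the first suggestion, here is a minimal sketch that fits a line to two-dimensional data and records the error at each iteration. It assumes an iterative fit by batch gradient descent on mean squared error, which is one way to get per-iteration errors; ordinary least squares can also be solved in closed form. The function name, learning rate and iteration count are illustrative choices.

```python
def fit_line(xs, ys, learning_rate=0.05, iterations=1000, verbose=False):
    """Fit y = slope * x + intercept by batch gradient descent on MSE.

    Returns (slope, intercept, history), where history holds the mean
    squared error measured at the start of each iteration.
    """
    slope, intercept = 0.0, 0.0
    n = len(xs)
    history = []
    for i in range(iterations):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        mse = sum(e * e for e in errors) / n
        history.append(mse)
        if verbose:
            print(f"iteration {i}: mse={mse:.6f}")
        # Gradients of the mean squared error with respect to each parameter.
        grad_slope = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_intercept = 2.0 * sum(errors) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept, history


# Noise-free data from y = 2x + 1, so the expected answer is known.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept, history = fit_line(xs, ys)
```

The `history` list can be passed to a plotting tool of your choice to see how the error falls as the line is updated. A learning rate that is too large for the data will make the error grow instead, which is itself a useful thing to observe.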

## Summary

In this post you learned the benefits of implementing machine learning algorithms by hand. You learned that you can understand an algorithm, make improvements and develop valuable skills by following this path.

You learned a simple process that you can follow and customize as you implement multiple algorithms from scratch and you learned three algorithms that you could choose as your first algorithm to implement from scratch.

Thanks once again.

Keep up.

Thanks Surajit!

Awesome post – great ideas for moving through the process. Thank you for keeping us encouraged and for helping the community with your ideas and approaches. VERY HELPFUL.

Thanks Joshua.

I am already subscribed but I stumbled back onto your site when googling for “writing test cases for machine learning code”. You mentioned above that writing unit tests is a key part of implementing a machine learning algorithm. So my question is whether you think there are any special considerations for writing unit tests that go beyond those that apply to programming in general.

Absolutely.

As a starting point, you must confirm the correctness of the implementation. Lots of small functions in the implementation will mean it’s easier to write specific functional tests. For example, I remember implementing a lot of linear algebra in Fortran with LAPACK, preparing test inputs and outputs in Octave and reproducing the results with unit tests against my Fortran code.

Broader integration tests may require random number seeding for reproducibility and probabilistic confirmation of the output. For example, check that my Gaussian random number generators really are Gaussian by looking at the means and standard deviations of 1000 samples.
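A minimal sketch of both ideas, using only Python's standard `random` and `statistics` modules. The helper `sample_gaussian` and the tolerance of 0.15 are assumptions made for this example. The tolerance is deliberately loose: for 1000 samples the standard error of the mean is roughly 0.03.

```python
import random
import statistics


def sample_gaussian(n, mu=0.0, sigma=1.0, seed=None):
    """Draw n Gaussian samples from a generator with its own seed."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def test_seed_gives_reproducible_samples():
    # The same seed must give the same sequence of samples.
    assert sample_gaussian(10, seed=42) == sample_gaussian(10, seed=42)


def test_samples_look_gaussian():
    samples = sample_gaussian(1000, mu=0.0, sigma=1.0, seed=1)
    # Probabilistic check: summary statistics within a loose tolerance.
    assert abs(statistics.mean(samples)) < 0.15
    assert abs(statistics.stdev(samples) - 1.0) < 0.15
```

Using a dedicated `random.Random` instance rather than the module-level generator keeps the seed local to the code under test.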

For production systems, I think you must also have automatic system tests to confirm skill. This may mean retraining on a well understood training set, evaluating on a test set and confirming an expected result (probabilistically). It may also mean some kind of performance ratchet, e.g. a test that performance does not drop below x across validation test sets, where x continues to increase as the model is refined.
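One way such a ratchet test could look is sketched below. It assumes a toy nearest-centroid classifier and small hand-made data sets. The names `MIN_ACCURACY`, `TRAIN` and `TEST` and the floor of 0.9 are hypothetical; in practice the model and data would be your own.

```python
def train_centroids(rows):
    """rows: list of (features, label). Returns {label: mean feature vector}."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to `features`."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, rows):
    correct = sum(predict(centroids, f) == label for f, label in rows)
    return correct / len(rows)


# Ratchet: raise this floor whenever a refined model reliably beats it.
MIN_ACCURACY = 0.9

TRAIN = [([1.0, 1.0], 0), ([1.5, 2.0], 0), ([2.0, 1.0], 0),
         ([6.0, 5.0], 1), ([7.0, 6.5], 1), ([6.5, 7.0], 1)]
TEST = [([1.2, 1.4], 0), ([2.2, 1.8], 0), ([6.8, 6.0], 1), ([5.9, 6.2], 1)]


def test_model_skill_does_not_regress():
    model = train_centroids(TRAIN)
    assert accuracy(model, TEST) >= MIN_ACCURACY
```

The test retrains from scratch each time, so a change that quietly damages the training code shows up as a failed build rather than a worse model in production.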

I hope that helps as a start.

Thank you for this article.

Do you know about SVMs (support vector machines)?

I think this algorithm could have a place in the map.

Yes Ping, I left it off because it did not fit in neatly. I need to update the map to include it.

Hello everyone I am a student of NIT Raipur and currently implementing a project which would showcase a virtual tour of my college… So is there any way in which I could involve machine learning algorithms to implement virtual tour..?

Perhaps a chat bot that gives contextual-based commentary?

Great article. Learning an algorithm from scratch is the equivalent of tinkering with or taking apart an electronic device to see how it works internally. It is fun and very instructive! It also dissipates the halo of mystery that surrounds these algorithms when we use an off-the-shelf implementation that comes in a library. Then, we can make better, more informed decisions!

Regarding unit testing in machine learning: What should we test? Given that the output of a model is non-deterministic due to many random factors (initialization, the order in the data, etc), what should we check? That the error is within some bounds? That the structure of the network (in case we are in deep learning territory) is the one we expect? I’d love to know your take on this.

Thanks in advance for your time and attention!

Good question.

Try and test functions or modules first.

For the algorithm, perhaps use one or more small, well-defined linear problems that can be solved exactly, and test for the known solution or for a solution within a tolerance.
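A minimal sketch of that kind of test: noise-free data is generated from a known line, a closed-form simple linear regression is fitted, and the recovered parameters are checked within a tolerance. The function `fit_ols` and the tolerance of 1e-9 are assumptions for this example.

```python
import math


def fit_ols(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def test_recovers_known_line():
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [3.0 * x - 2.0 for x in xs]  # noise-free data from y = 3x - 2
    slope, intercept = fit_ols(xs, ys)
    # Compare within a tolerance instead of demanding exact float equality.
    assert math.isclose(slope, 3.0, abs_tol=1e-9)
    assert math.isclose(intercept, -2.0, abs_tol=1e-9)
```

For a stochastic learner, the same pattern applies with a fixed seed and a wider tolerance.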