Last Updated on August 12, 2019
Implementing a machine learning algorithm in code can teach you a lot about the algorithm and how it works.
In this post you will learn how to be effective at implementing machine learning algorithms and how to maximize your learning from these projects.
Let’s get started.
Benefits of Implementing Machine Learning Algorithms
You can use the implementation of machine learning algorithms as a strategy for learning about applied machine learning. You can also carve out a niche and skills in algorithm implementation.
Implementing a machine learning algorithm will give you a deep and practical appreciation for how the algorithm works. This knowledge can also help you to internalize the mathematical description of the algorithm by thinking of vectors and matrices as arrays and building a computational intuition for the transformations applied to those structures.
There are numerous micro-decisions required when implementing a machine learning algorithm and these decisions are often missing from the formal algorithm descriptions. Learning and parameterizing these decisions can quickly take you to an intermediate or advanced level of understanding of a given method, as relatively few people make the time to implement some of the more complex algorithms as a learning exercise.
You are developing valuable skills when you implement machine learning algorithms by hand: mastery of the algorithm, skills that can help in the development of production systems, and skills that can be used for classical research in the field.
Three examples of skills you can develop include:
- Mastery: Implementation of an algorithm is the first step towards mastering the algorithm. You are forced to understand the algorithm intimately when you implement it. You are also creating your own laboratory for tinkering to help you internalize the computation it performs over time, such as by debugging and adding measures for assessing the running process.
- Production Systems: Custom implementations of algorithms are typically required for production systems because of the changes that need to be made to the algorithm for efficiency and efficacy reasons. Better, faster, less resource-intensive results can ultimately lead to lower costs and greater revenue in business, and implementing algorithms by hand helps you develop the skills to deliver these solutions.
- Literature Review: When implementing an algorithm you are performing research. You are forced to locate and read multiple canonical and formal descriptions of the algorithm. You are also likely to locate and code review other implementations of the algorithm to confirm your understanding. You are performing targeted research, and learning how to read and make practical use of research publications.
There is a process you can follow to accelerate your ability to learn and implement a machine learning algorithm by hand from scratch. The more algorithms you implement, the faster and more efficient you get at it and the more you will develop and customize your own process.
You can use the process outlined below.
- Select programming language: Select the programming language you want to use for the implementation. This decision may influence the APIs and standard libraries you can use in your implementation.
- Select Algorithm: Select the algorithm that you want to implement from scratch. Be as specific as possible. This means choosing not only the class and type of algorithm, but also a specific description or implementation that you want to reproduce.
- Select Problem: Select a canonical problem or set of problems you can use to test and validate your implementation of the algorithm. Machine learning algorithms do not exist in isolation.
- Research Algorithm: Locate papers, books, websites, libraries and any other descriptions of the algorithm you can read and learn from. Although you ideally want to have one keystone description of the algorithm from which to work, you will want to have multiple perspectives on the algorithm. This is useful because the multiple perspectives will help you to internalize the algorithm description faster and overcome roadblocks from any ambiguities or assumptions made in the description (there are always ambiguities in algorithm descriptions).
- Unit Test: Write unit tests for each function. Consider test-driven development from the beginning of the project, so that you are forced to understand the purpose and expectations of each unit of code before you implement it.
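As a minimal sketch of the unit test step, the example below tests one small helper of the kind a k-Nearest Neighbor implementation might need. The function name euclidean_distance and the test class are hypothetical, chosen only for illustration, and the tests use Python's standard unittest module.

```python
import math
import unittest


def euclidean_distance(a, b):
    """Return the straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TestEuclideanDistance(unittest.TestCase):
    def test_identical_points_have_zero_distance(self):
        self.assertEqual(euclidean_distance([1.0, 2.0], [1.0, 2.0]), 0.0)

    def test_known_triangle(self):
        # A 3-4-5 right triangle gives a distance of 5.
        self.assertAlmostEqual(euclidean_distance([0, 0], [3, 4]), 5.0)

    def test_mismatched_lengths_are_rejected(self):
        with self.assertRaises(ValueError):
            euclidean_distance([1, 2], [1, 2, 3])
```

Saved in a file such as test_distance.py (a hypothetical name), the tests can be run with python -m unittest test_distance. Writing the three test methods first is one way to pin down what the function is expected to do before implementing it.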
I strongly suggest porting algorithms from one language to another as a way of making rapid progress along this path. You can find plenty of open source implementations of algorithms that you can code review, diagram, internalize and reimplement in another language.
Consider open sourcing your code while you are developing it and after you have developed it. Comment it well and ensure it provides instructions on how to build and use it. The project will provide marketing for the skills you are developing and may just provide inspiration and help for someone else looking to make their start in machine learning. You may even be lucky enough to find a fellow programmer sufficiently interested to perform an audit or code review for you. Any feedback you get will be invaluable (even as motivation), so actively seek it.
Once you have implemented an algorithm you can explore making improvements to the implementation. Some examples of improvements you could explore include:
- Experimentation: You can expose many of the micro-decisions you made in the algorithm's implementation as parameters and perform studies on variations of those parameters. This can lead to new insights and disambiguation of algorithm implementations that you can share and promote.
- Optimization: You can explore opportunities to make the implementation more efficient by using tools, libraries, different languages, different data structures, patterns and internal algorithms. Knowledge you have of algorithms and data structures for classical computer science can be very beneficial in this type of work.
- Specialization: You may explore ways of making the algorithm more specific to a problem. This can be required when creating production systems and is a valuable skill. Making an algorithm more problem specific can also lead to increases in efficiency (such as running time) and efficacy (such as accuracy or other performance measures).
- Generalization: Opportunities can be created by making a specific algorithm more general. Programmers (like mathematicians) are uniquely skilled in abstraction and you may be able to see how the algorithm could be applied to more general cases of a class of problem or other problems entirely.
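The experimentation idea above can be sketched as follows. This is an illustrative example only: it assumes a simple linear regression trained with stochastic gradient descent, and the function names, parameter names and tiny data set are all made up for the sketch. The point is that the learning rate, the number of epochs and the starting coefficients are exposed as parameters so they can be varied in a small study.

```python
def fit_line_sgd(xs, ys, learning_rate=0.01, epochs=100, init_b0=0.0, init_b1=0.0):
    """Fit y = b0 + b1 * x with stochastic gradient descent.

    The learning rate, number of epochs and starting coefficients are
    micro-decisions exposed here as parameters.
    """
    b0, b1 = init_b0, init_b1
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = (b0 + b1 * x) - y
            b0 -= learning_rate * error
            b1 -= learning_rate * error * x
    return b0, b1


def mean_squared_error(xs, ys, b0, b1):
    """Average squared difference between predictions and targets."""
    return sum(((b0 + b1 * x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 2.0, 3.0, 5.0]

# A small study: vary one decision and hold the others fixed.
for rate in (0.001, 0.01, 0.05):
    b0, b1 = fit_line_sgd(xs, ys, learning_rate=rate, epochs=50)
    print(f"learning_rate={rate}: mse={mean_squared_error(xs, ys, b0, b1):.4f}")
```

The same pattern applies to other decisions, such as the order in which training rows are visited or how ties are broken.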
You can learn a lot by implementing machine learning algorithms by hand, but there are also some downsides to keep in mind.
- Redundancy: Many algorithms already have implementations, some very robust implementations that have been used by hundreds or thousands of researchers and practitioners around the world. Your implementation may be considered redundant, a duplication of effort already invested by the community.
- Bugs: New code that has few users is more likely to have bugs, even with a skilled programmer and unit tests. Using a standard library can reduce the likelihood of having bugs in the algorithm implementation.
- Non-intuitive Leaps: Some algorithms rely on non-intuitive jumps in reasoning or logic because of the sophisticated mathematics involved. An implementation that does not appreciate these leaps may be limited or even incorrect.
It is easy to comment on open source implementations of machine learning algorithms and raise many issues in a code review. It is much harder to appreciate the non-intuitive efficiencies that have been encoded in the implementation. This can be a trap in thinking.
You may find it beneficial to start with a slower intuitive implementation of a complex algorithm before considering how to change it to be programmatically less elegant, but computationally more efficient.
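As a small sketch of this idea, assume the step being implemented is the neighbor search in k-Nearest Neighbor. The first function below is the intuitive version: measure every distance, sort everything, and keep the first k. The second uses heapq.nsmallest from the Python standard library, which avoids sorting the full list. The function names and data are hypothetical.

```python
import heapq
import math


def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors_simple(train, query, k):
    """Intuitive version: rank every training row, then keep the first k."""
    ranked = sorted(train, key=lambda row: distance(row, query))
    return ranked[:k]


def nearest_neighbors_heap(train, query, k):
    """Same result, but only the k smallest distances are tracked."""
    return heapq.nsmallest(k, train, key=lambda row: distance(row, query))


# Made-up training points and query for illustration.
train = [(1.0, 1.0), (2.0, 2.0), (5.0, 5.0), (6.0, 5.0), (0.0, 1.0)]
query = (1.2, 1.0)

# The slower version doubles as a reference for checking the faster one.
assert nearest_neighbors_simple(train, query, 2) == nearest_neighbors_heap(train, query, 2)
```

Keeping the simple version around gives you a reference implementation to test any later optimization against.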
Some algorithms are easier to understand than others. In this post I want to make some suggestions for intuitive algorithms from which you might like to select your first machine learning algorithm to implement from scratch.
- Ordinary Least Squares Linear Regression: Use two-dimensional data sets and model y from x. Print out the error for each iteration of the algorithm. Consider plotting the line of best fit and predictions for each iteration of the algorithm to see how the updates affect the model.
- k-Nearest Neighbor: Consider using two-dimensional data sets with two classes, even ones that you create with graph paper, so that you can plot them. Once you can plot and make predictions, you can plot the relationships created for each prediction decision the model makes.
- Perceptron: The perceptron is considered the simplest artificial neural network model and is very similar to a regression model. You can track and graph the performance of the model as it learns a dataset.
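To give a feel for the size of such a first project, here is a minimal sketch of the perceptron suggestion, assuming a step activation and the classic perceptron update rule. The function names, the learning rate of 1.0 and the logical AND data set are choices made for this illustration. The number of mistakes in each epoch is recorded so that performance can be tracked or graphed as the model learns.

```python
def predict(weights, row):
    """Step activation: weights[0] is the bias, the rest pair with the inputs."""
    activation = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
    return 1 if activation >= 0.0 else 0


def train_perceptron(rows, labels, learning_rate=1.0, epochs=10):
    """Train with the perceptron update rule and record mistakes per epoch."""
    weights = [0.0] * (len(rows[0]) + 1)
    errors_per_epoch = []
    for _ in range(epochs):
        mistakes = 0
        for row, label in zip(rows, labels):
            error = label - predict(weights, row)
            if error != 0:
                mistakes += 1
            # When the prediction is correct, error is 0 and nothing changes.
            weights[0] += learning_rate * error
            for i, x in enumerate(row):
                weights[i + 1] += learning_rate * error * x
        errors_per_epoch.append(mistakes)
    return weights, errors_per_epoch


# Logical AND: a tiny, linearly separable two-class problem.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, history = train_perceptron(rows, labels)
print("mistakes per epoch:", history)
print("predictions:", [predict(weights, row) for row in rows])
```

The history list is the natural thing to plot: on a separable problem like this one the mistake count should fall to zero and stay there.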
In this post you learned the benefits of implementing machine learning algorithms by hand. You learned that you can understand an algorithm, make improvements and develop valuable skills by following this path.
You learned a simple process that you can follow and customize as you implement multiple algorithms from scratch and you learned three algorithms that you could choose as your first algorithm to implement from scratch.
Thanks once again.
Awesome post – great ideas for moving through the process. Thank you for keeping us encouraged and for helping the community with your ideas and approaches. VERY HELPFUL.
I am already subscribed but I stumbled back onto your site when googling for “writing test cases for machine learning code”. You mentioned above that writing unit tests is a key part of implementing a machine learning algorithm. So my question is whether you think there are any special considerations for writing unit tests that go beyond those that apply for programming in general.
As a starting point, you must confirm the correctness of the implementation. Lots of small functions in the implementation will mean it’s easier to write specific functional tests. For example, I remember implementing a lot of linear algebra in Fortran with LAPACK, preparing test I/O in Octave and reproducing the results with unit tests and my code in Fortran.
Broader integration tests may require random number seeding for reproducibility and probabilistic output confirmation. E.g. are my Gaussian random number generators really Gaussian, judging by the means and standard deviations of 1000 samples, etc.
For production systems, I think you must also have automatic system tests to confirm skill. This may mean retraining on a well understood training set and evaluating on a test set and confirming an expected result (probabilistically). It may also mean some kind of ratchet of performance, e.g. test that performance does not drop below x across validation test sets where x continues to increase as the model is refined.
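A rough sketch of the seeding idea, using only Python's standard random and statistics modules. The helper name sample_gaussian, the seed value and the tolerance of 0.2 are assumptions made for this illustration, not recommended settings.

```python
import random
import statistics


def sample_gaussian(n, mu, sigma, seed):
    """Draw n samples from a seeded generator so the run is repeatable."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


samples = sample_gaussian(1000, 0.0, 1.0, seed=1)

# Reproducibility: the same seed gives exactly the same samples.
assert samples == sample_gaussian(1000, 0.0, 1.0, seed=1)

# Probabilistic confirmation: the sample statistics should sit close to the
# requested mean and standard deviation. The tolerance is deliberately loose.
assert abs(statistics.mean(samples) - 0.0) < 0.2
assert abs(statistics.stdev(samples) - 1.0) < 0.2
```

The tolerance has to be chosen with the sample size in mind: too tight and the test fails by chance, too loose and it stops catching real errors.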
I hope that helps as a start.
Thank you for this article.
Do you know about SVMs(Support vector machine)?
I think this algorithm could have some space in the Map.
Yes Ping, I left it off because it did not fit in neatly. I need to update the map to include it.
Hello everyone I am a student of NIT Raipur and currently implementing a project which would showcase a virtual tour of my college… So is there any way in which I could involve machine learning algorithms to implement virtual tour..?
Perhaps a chat bot that gives contextual-based commentary?
Great article. Learning an algorithm from scratch is the equivalent of tinkering with or disassembling an electronic device to see how it works internally. It is fun and very instructive! It also dissipates the mystery halo that surrounds them when we use an off-the-shelf implementation that comes in a library. Then, we can make better, more informed decisions!
Regarding unit testing in machine learning: What should we test? Given that the output of a model is non-deterministic due to many random factors (initialization, the order in the data, etc), what should we check? That the error is within some bounds? That the structure of the network (in case we are in deep learning territory) is the one we expect? I’d love to know your take on this.
Thanks in advance for your time and attention!
Try and test functions or modules first.
For the algorithm, perhaps use one or more small, well-defined linear problems that can be solved, and test for the solution, or for a solution within tolerance.
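As a sketch of what that could look like, assume the algorithm under test is a simple least squares line fit. The data are generated without noise from y = 1 + 2x, so the correct coefficients are known in advance and the test only has to allow for floating point error. The function name and the tolerance are hypothetical.

```python
import math


def fit_simple_linear_regression(xs, ys):
    """Closed-form least squares fit of y = b0 + b1 * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    b1 = covariance / variance
    b0 = mean_y - b1 * mean_x
    return b0, b1


# A small, well-defined problem with a known answer: y = 1 + 2x, no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]

b0, b1 = fit_simple_linear_regression(xs, ys)

# Test for the known solution within a tolerance.
assert math.isclose(b0, 1.0, abs_tol=1e-9)
assert math.isclose(b1, 2.0, abs_tol=1e-9)
```

For a stochastic algorithm the same test shape applies, with a fixed seed and a wider tolerance.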
Nice article for a beginner like me…
I’m glad it helped.
I am a student and I was asked to detect fraud in cooperative societies using machine learning. How can I carry out this project? I know nothing as far as machine learning is concerned.
I would recommend starting by clearly defining the problem:
If a person is working as a technical support engineer, is learning machine learning and data analytics, and has the potential to learn all this, is there any chance that he can make a career in data science?
Excellent article. ML algorithms must be well thought out in order to solve the problem effectively.
Thanks for sharing your knowledge. 🙂
Best from Brazil
I’m happy it helped.
Amazing article. I was actually in process of implementing an algorithm when I read this. I love implementing and mastering ml algorithms but my problem is it consumes a lot more time, sometimes 2-3 days if I get stuck around a concept. My doubt was how to deal with this delay in mastering the algorithm when you know it’s taking a lot of time. Also is it okay to spend a day or two in mastering the algorithm.
Thanks, I’m glad it helped!
Awesome article! Even though my core subject in PG is Data Science, I am struggling with how to learn and implement ML or deep learning algorithms in Python. I can use this in my learning process. Thanks a lot.
Thanks, I’m glad it helps.
Thank you, Dr., for helping us with the practical and some theoretical aspects of the algorithm.
I want to ask about production systems what do they mean? and what do you mean by “Better, faster, less resource-intensive results ultimately can lead to lower costs and greater revenue in business, and implementing algorithms by hand help you develop the skills to deliver these solutions” ?
Production means the systems used operationally within a business to solve a problem. E.g. not a toy problem or a kaggle competition.
Hi Jason. Great article. It’s interesting to see you argue both sides of the same issue with equal efficacy. I’ve built a gradient-based regression and classification package and I’m in the process of writing a tutorial blog that walks the reader through the implementation, experimentation, and application aspects. I’ve read ‘Machine Learning Algorithms from Scratch’, which is a great resource. Can you point to any other ‘tutorial’ blogs that might serve as a model? Ideally, the blog would
– present the theory in an innovative and captivating way
– integrate code and text in a way that keeps reader’s attention
– presents intriguing experiment results
– applies the algorithm to a real-world dataset or problem
Thanks in advance
Good question. If you mean pedagogical theory, then I can’t help you.
I developed my own style over the last 6 years.
You can learn from examples, perhaps start with some recent examples in computer vision or GANs:
I am already subscribed but I posted my questions three times but no response. Dear Sir, I like the way you explain the algorithms. I need to implement NSGA-III having crowding distance and tournament concept and Pareto front in it.
Sorry, I don’t have tutorials on this topic.
Hi, can you please help me implement an algorithm? I am having difficulty understanding the algorithm from the article.
Sorry, I don’t have the capacity to implement an algorithm for you.
Good one, today I could learn the basics of Machine Learning. Thanks
I want to make a recommender system for my PG research. Which of the machine learning techniques should I use?
A kNN model is a great place to start: