Why would you ever implement machine learning algorithms from scratch when there are so many provided in existing APIs?
This is a great question, and one you should consider before you write that first line of code.
In this post you will discover a variety of interesting and even thought-provoking answers to this question.
The answers in this post are summarized from the Quora question titled: “Why is there a need to manually implement machine learning algorithms when there are many advanced APIs like tensorflow available?”.
Two Major Reasons To (re)Implement an Algorithm
I think that all of the answers can be broken down into two camps:
- Self-Study, where an algorithm is implemented as a learning exercise.
- Operational Requirements, where an algorithm is implemented to meet the needs of a production system.
Implement Algorithms for Self-Study
Charles Gee gives a great answer from the self-study perspective. He comments:
… suppose that instead of talking about machine learning algorithms, we were talking about sorting algorithms. Sure, many data structures have a sort function that requires little to no coding, but would you really hire a programmer who couldn’t do a bubblesort? selectionsort? insertionsort? mergesort? quicksort? binary search tree?
Charles describes 4 different use cases where it may be highly desirable to implement a machine learning algorithm from scratch:
- As a beginner in the machine learning realm.
- As a researcher in the machine learning realm.
- As a teacher in the machine learning realm.
- As a user of these machine learning algorithms.
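To make the self-study case concrete, here is a minimal sketch of what a from-scratch implementation can look like. It fits simple linear regression (one input, one output) with the closed-form least-squares estimates in plain Python. The function names and the toy data are illustrative assumptions, not code from either Quora answer; the aim is only to show that writing the algorithm out forces you to handle each step yourself.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def fit_simple_linear_regression(xs, ys):
    """Estimate intercept and slope for y = intercept + slope * x.

    Uses the closed-form least-squares solution:
    slope = covariance(x, y) / variance(x).
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and the same length")

    x_mean, y_mean = mean(xs), mean(ys)

    # Sums of centered products; the 1/n factors cancel in the ratio.
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    if variance == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    slope = covariance / variance
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict(intercept, slope, x):
    """Predict y for a single x using the fitted line."""
    return intercept + slope * x


# Toy data (made up for illustration) that lies exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
intercept, slope = fit_simple_linear_regression(xs, ys)
prediction = predict(intercept, slope, 5)
```

Even in this small example, you have to decide what to do when every input value is the same, a detail a library call would normally hide from you.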
Implement Algorithms for Operational Requirements
Xavier Amatriain focuses on this topic in his answer. He comments:
Let me start by saying that I do believe that any team should default to re-using existing implementations. … However, there are also many reasons why a company may decide to implement their own version of an ML algorithm.
Xavier lists 5 reasons to implement a machine learning algorithm, as follows:
- Performance. Open source implementations may be too general and not efficient enough for specific use cases.
- Correctness. There may be bugs or limitations in the open source implementations for specific use cases (such as larger scale datasets).
- Programming Language. Implementations may be limited to specific programming languages.
- Integration. There may be a need to integrate an algorithm implementation into the infrastructure of existing production systems.
- Licensing. There may be limitations imposed by the choice of open source license.
In this post, you discovered that there are two major reasons why you might want to implement an algorithm from scratch.
- To learn more about how the algorithm works for self-study.
- To customize the implementation of the algorithm for a production system.
I have posted a number of times on the benefits of implementing machine learning algorithms from scratch.
Some further reading on this topic includes:
- Benefits of Implementing Machine Learning Algorithms From Scratch
- Understand Machine Learning Algorithms By Implementing Them From Scratch (and tactics to get around bad code)
- Don’t Start with Open-Source Code When Implementing Machine Learning Algorithms
- How to Implement a Machine Learning Algorithm