Where does theory fit into a top-down approach to studying machine learning?

In the traditional approach to teaching machine learning, theory comes first, and understanding it requires an extensive background in mathematics. In my approach to teaching machine learning, I start by teaching you how to work problems end-to-end and deliver results.

So where does the theory fit?

In this post you will discover what we really mean when we talk about “theory” in machine learning. Hint: it’s all about the algorithms.

You will discover that once you get skilled at working through problems and delivering results, you will develop a compulsion to dive deeper in order to achieve better understanding and better results. **Nobody will be able to hold you back**.

Finally, you will discover 5 techniques that you can use when you are practicing machine learning on standard datasets to incrementally build up your understanding of machine learning algorithms.

## Learn Theory Last, Not First

The way machine learning is taught to developers is crap.

It is taught bottom-up. This is crap if you are a developer who is primarily interested in using machine learning as a tool to solve problems rather than being a researcher in the field.

The traditional approach requires that you learn all of the prerequisite mathematics like linear algebra, probability and statistics before learning the theory of algorithms. You’re lucky if you ever go near a working implementation of an algorithm or discuss how to work a problem end-to-end and deliver a working, reliable and accurate predictive model.

I teach a top-down approach to learning machine learning. In this approach we start by 1) learning a systematic process for working through problems end-to-end, 2) mapping the process onto “best of breed” machine learning tools and platforms, then 3) completing targeted practice on test datasets.

You can learn more about my approach to teaching top-down machine learning in the post “Machine Learning for Programmers: Leap from developer to machine learning practitioner”.

So where does theory fit into this process?

If the model is flipped, then theory is taught later. But what theory are we talking about and how exactly do you learn that theory when you are practicing on test datasets?

## The Theory is Really All About Algorithms

The field of machine learning is theory-dense.

It’s dense because there is a tradition to describe and explain concepts mathematically.

This is useful because mathematical descriptions can be very concise, cutting down on the ambiguity. They also lend themselves to analysis by leveraging the techniques from the context in which they are described (e.g. a probabilistic understanding of a process).

A lot of these tangential mathematical techniques are often bundled in with the description of machine learning algorithms. For someone who just wants to build a superficial understanding of a method to be able to configure and apply it, this feels overwhelming. Frustratingly so.

It is frustrating if you do not have the grounding to parse and understand the description of an algorithm. It’s especially frustrating if you come from a field like computer science, where algorithms are described all the time, but where the descriptions are intended for fast comprehension (e.g. for desk checking) and implementation.

We know, for example, that when learning what a hash table is and how to use it, we almost never need to know the specifics of the hashing function in our day-to-day work. But we also know what a hashing function is, and where to go to learn more about its specifics and how to write our own. Why can’t machine learning work like that?

The bulk of the “theory” one encounters in machine learning is related to machine learning algorithms. If you ask any beginner about why they are frustrated with the theory, you will learn that it is in relation to learning how to understand or use a specific machine learning algorithm.

Here, “algorithms” is broader than a process for creating a predictive model. It also refers to algorithms for selecting features, engineering new features, transforming data and estimating the accuracy of a model on unseen data (e.g. cross validation).
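To make this broader sense of “algorithm” concrete, here is a minimal from-scratch sketch of k-fold cross-validation, one of the procedures for estimating accuracy on unseen data. The function names and the toy majority-class model are hypothetical, chosen only for illustration; in practice you would more likely reach for a library’s implementation.

```python
import random
from collections import Counter

# Hypothetical helper names: a sketch of the idea, not any library's API.
def k_fold_split(n_rows, k, seed=1):
    """Shuffle row indices and deal them into k folds of (nearly) equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Position i of the shuffled list goes to fold i % k.
    return [indices[i::k] for i in range(k)]

def cross_validate(X, y, fit, predict, k=5):
    """Return one accuracy score per fold, each measured on rows unseen during fitting."""
    scores = []
    for test_idx in k_fold_split(len(X), k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores

# A deliberately naive model: always predict the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def predict_majority(model, row):
    return model

X = [[float(i)] for i in range(20)]
y = [0] * 15 + [1] * 5
scores = cross_validate(X, y, fit_majority, predict_majority, k=5)
print(scores, sum(scores) / len(scores))
```

Swapping `fit_majority` and `predict_majority` for a real model is all it takes to reuse the same harness, which is the point: the evaluation procedure is an algorithm in its own right.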

So, learning theory last really means learning about machine learning algorithms.

## A Compulsion To Dive Into Theory

I generally advise targeted practice on well-known machine learning datasets.

This is because well-known machine learning datasets, like those on the UCI Machine Learning Repository, are easy to work with. They are small, so they fit into memory and can be processed on your workstation. They are also well studied and understood, so you have a baseline for comparison.

You can learn more about targeted practice of machine learning datasets in the post “Practice Machine Learning with Small In-Memory Datasets from the UCI Machine Learning Repository”.

Understanding machine learning algorithms fits into this process. The reason is that, in the pursuit of results on standard machine learning datasets, you are going to run into limitations. You are going to want to know how to get more out of a given algorithm, how best to configure it, or how it actually works.

This need to know more, and your curiosity, will drive you into studying the theory of machine learning algorithms. You will be compelled to piece together an understanding of the algorithms in order to achieve better results.

We see this same effect in young developers from varied backgrounds who eventually end up studying the code of open source projects, textbooks and even research papers in order to hone their craft. The need to become a better, more capable programmer drives them to it.

If you are curious and motivated to succeed, you cannot resist studying the theory.

## 5 Techniques To Understand Machine Learning Algorithms

The time will come to dive into machine learning algorithms as part of your targeted practice.

When that time comes, there are a number of techniques and templates that you can use to shortcut the process.

In this section you will discover 5 techniques that you can use to understand the theory of machine learning algorithms, fast.

### 1) Create Lists of Machine Learning Algorithms

When you are just starting out, you may feel overwhelmed by the large number of algorithms available.

Even when spot testing algorithms, you may be unsure of which algorithms to include in your mix (hint: be diverse).

An excellent trick you can use when starting out is to keep track of the algorithms you read about. These lists can be as simple as the name of the algorithm, and can increase in complexity as your interest and curiosity build.

Capture details like the problem type to which they are suited (classification or regression), related algorithms, and taxonomic class (decision tree, kernel, etc.). When you see the name of an algorithm that is new to you, add it to your list. When you start a new problem, try some algorithms you have never used before. Mark a check next to algorithms you have used before. And so on.
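If a spreadsheet feels too loose, the same list can live in a few lines of code. The sketch below is one possible layout, assuming the columns suggested above (problem types, taxonomic class, related algorithms and a “used” check mark); the class, field and function names are hypothetical, and the four entries are only examples.

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical record layout for one row of an algorithm list.
@dataclass
class AlgorithmEntry:
    name: str
    problem_types: tuple        # e.g. ("classification", "regression")
    category: str               # taxonomic class, e.g. "decision tree", "kernel"
    related: list = field(default_factory=list)
    used: bool = False          # the check mark: have I applied it yet?

algorithms = [
    AlgorithmEntry("CART", ("classification", "regression"), "decision tree", ["C4.5", "Random Forest"], used=True),
    AlgorithmEntry("Support Vector Machine", ("classification", "regression"), "kernel"),
    AlgorithmEntry("Linear Regression", ("regression",), "linear", ["Ridge Regression"], used=True),
    AlgorithmEntry("Random Forest", ("classification", "regression"), "decision tree", ["CART", "Bagging"]),
]

def untried(entries, problem_type):
    """Algorithms suited to this problem type that I have never used."""
    return [e.name for e in entries if problem_type in e.problem_types and not e.used]

def by_category(entries):
    """Group algorithm names by taxonomic class."""
    groups = {}
    for e in entries:
        groups.setdefault(e.category, []).append(e.name)
    return groups

def to_csv(entries):
    """Render the list as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "problem_types", "category", "related", "used"])
    for e in entries:
        writer.writerow([e.name, "/".join(e.problem_types), e.category,
                         "; ".join(e.related), "yes" if e.used else ""])
    return buffer.getvalue()

print(untried(algorithms, "classification"))
print(by_category(algorithms))
```

The `untried` query is the “try some algorithms you have never used before” habit in code form, and `by_category` gives the view by algorithm type.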

Controlling the names of algorithms in lists gives you power. This ridiculously simple tactic can help you get on top of the overwhelm. Examples of where your simple algorithm lists can save you a lot of time and frustration are:

- Ideas of algorithms to try on new and different problem types (time series, rating systems, etc.)
- Algorithms that you can investigate to learn more about how to apply them.
- Get a handle on algorithm types by category (trees, kernels, etc.).
- Avoid the problem of fixating on a favorite algorithm.

Start by creating lists of algorithms: open a spreadsheet and get started.

See the post “Take Control By Creating Targeted Lists of Machine Learning Algorithms” for more information on this tactic.

### 2) Research Machine Learning Algorithms

When you want to know more about a machine learning algorithm you need to research it.

The main reasons you will want to research an algorithm are to learn how to configure it and to learn how it works.

Research is not just for academics. A few simple tips can take you a long way in gathering information on a given machine learning algorithm.

The key is diversity of information sources. The following is a short list of the types of sources you can consult for information on an algorithm you are researching.

- Authoritative sources like textbooks, lecture notes, slides and overview papers.
- Seminal sources like the papers and articles in which the algorithm was first described.
- Leading-edge sources that describe state-of-the-art extensions and experiments on the algorithm.
- Heuristic sources like those that come out of machine learning competitions, posts on Q&A websites and conference papers.
- Implementation sources such as open source code for tools and libraries, blog posts and technical reports.

You do not need to be a PhD researcher nor a machine learning algorithm expert.

Take your time and pick over many sources collecting facts on a machine learning algorithm you are trying to figure out. Focus on the practical details you can apply or understand and leave the rest.

For more information on researching machine learning algorithms see the post “How to Research a Machine Learning Algorithm”.

### 3) Create Your Own Algorithm Descriptions

Machine learning algorithm descriptions you will discover in your research will be incomplete and inconsistent.

An approach that you can use is to put together your own mini algorithm descriptions. This is another very simple and very powerful tactic.

You can design a standard algorithm description template with only those details that are useful to you in getting the most from algorithms, like algorithm usage heuristics, pseudo-code listings, parameter ranges and resource lists.

You can then use the same algorithm description template across a number of key algorithms and start to build up your own little algorithm encyclopedia that you can refer to on future projects.

Some questions you might like to use in your own algorithm description template include:

- What are the standard abbreviations used for the algorithm?
- What is the objective or goal for the algorithm?
- What is the pseudo-code or flowchart description of the algorithm?
- What are the heuristics or rules of thumb for using the algorithm?
- What are useful resources for learning more about the algorithm?
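A template like this can be as lightweight as a small data class whose fields mirror the questions above. The sketch below is a hypothetical layout filled in for k-nearest neighbors as an example; the resources field is deliberately left empty rather than padded with made-up references.

```python
from dataclasses import dataclass, field

# Hypothetical template: one field per question in the list above.
@dataclass
class AlgorithmDescription:
    name: str
    abbreviations: list
    objective: str
    pseudocode: list            # one step per entry
    heuristics: list
    resources: list = field(default_factory=list)

    def render(self):
        """Format the description as plain text for printing or pasting into notes."""
        lines = [f"{self.name} ({', '.join(self.abbreviations)})", "",
                 f"Objective: {self.objective}", "", "Pseudo-code:"]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.pseudocode, start=1)]
        lines += ["", "Heuristics:"] + [f"  - {h}" for h in self.heuristics]
        lines += ["", "Resources:"] + [f"  - {r}" for r in self.resources]
        return "\n".join(lines)

knn = AlgorithmDescription(
    name="k-Nearest Neighbors",
    abbreviations=["kNN", "k-NN"],
    objective="Predict a new instance from the k most similar training instances.",
    pseudocode=[
        "Compute the distance from the new instance to every training instance.",
        "Select the k training instances with the smallest distances.",
        "Return the majority class (classification) or mean value (regression).",
    ],
    heuristics=[
        "Rescale inputs to the same range so no feature dominates the distance.",
        "Try odd values of k for binary classification to avoid tied votes.",
    ],
)
print(knn.render())
```

Because every entry shares the same fields, descriptions written months apart stay comparable, which is what turns them into a usable personal encyclopedia.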

You will be surprised at how useful and practical these descriptions can be. For example, I used this approach to write a book of nature-inspired algorithm descriptions that I still refer back to years later.

For more on how to create effective algorithm description templates, see the post “How to Learn a Machine Learning Algorithm”.

For more information on my book of algorithms described using a standard algorithm description template, see “Clever Algorithms: Nature-Inspired Programming Recipes”.

### 4) Investigate Algorithm Behavior

Machine learning algorithms are complex systems that are sometimes best understood by their behaviors on actual datasets.

By designing small experiments on machine learning algorithms using small datasets, you can learn a lot about how an algorithm works, its limitations and how to configure it in ways that may transfer to exceptional results on other problems.

A simple procedure that you can use to investigate a machine learning algorithm is as follows:

- Select an algorithm that you would like to know more about (e.g. random forests).
- Identify a question about that algorithm you would like answered (e.g. the effect of the number of trees).
- Design an experiment to find an answer to that question (e.g. try different numbers of trees on a few binary classification problems and chart the relationship with classification accuracy).
- Execute the experiment and write up your results so that you can make use of them in the future.
- Repeat the process.
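Here is a rough sketch of that procedure using only the Python standard library. To keep it short, it swaps the random forest example for k-nearest neighbors and asks a related question: how does the number of neighbors k affect accuracy? The two-class dataset is synthetic (two overlapping Gaussian clouds generated with fixed seeds), so the numbers it prints say nothing about real problems; the point is the shape of the experiment.

```python
import math
import random
from collections import Counter

def make_blobs(n_per_class, seed):
    """Synthetic 2D data: two overlapping Gaussian clouds labelled 0 and 1."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (1.5, 1.5)]):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(data)
    return data

def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point` (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

# Question: how does k affect accuracy? Average over several seeds to smooth out noise.
results = {}
for k in [1, 3, 5, 9, 15, 25]:
    scores = []
    for seed in range(5):
        data = make_blobs(100, seed)
        train, test = data[:150], data[150:]
        scores.append(accuracy(train, test, k))
    results[k] = sum(scores) / len(scores)

# Write up the results: here just a small table you could paste into your notes.
for k, score in results.items():
    print(f"k={k:>2}  mean accuracy={score:.3f}")
```

Only odd values of k are tried so that a two-class vote cannot tie; changing the question (say, the effect of rescaling inputs) usually means changing just the loop at the bottom.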

This is one of the truly exciting aspects of applied machine learning: through your own simple investigations you can achieve surprising and state-of-the-art results.

For more information on how to study algorithms from their behavior, see the post “How To Investigate Machine Learning Algorithm Behavior”.

### 5) Implement Machine Learning Algorithms

You cannot get more intimate with a machine learning algorithm than by implementing it.

In implementing a machine learning algorithm from scratch, you will be confronted with the myriad micro-decisions that go into a given implementation. You may decide to cover some up with rules of thumb or expose them all as parameters to the user.
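To make “micro-decisions” concrete, consider tie-breaking in a k-nearest neighbors vote: with an even k, two classes can receive the same number of votes. In the hypothetical sketch below, the `tie_break` parameter exposes that decision to the user, while its default quietly applies a rule of thumb (fall back to the closest tied neighbor). Neither option is *the* correct one; they are just two ways an implementer might settle it.

```python
import math
from collections import Counter

def knn_vote(train, point, k=3, tie_break="nearest"):
    """Classify `point` by majority vote of its k nearest neighbours.

    tie_break is a hypothetical parameter exposing a micro-decision:
      "nearest"  -> among tied classes, use the label of the closest tied neighbour
      "smallest" -> among tied classes, pick the smallest label value
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    counts = Counter(label for _, label in nearest)
    top = max(counts.values())
    tied = {label for label, count in counts.items() if count == top}
    if len(tied) == 1:
        return tied.pop()
    if tie_break == "nearest":
        # `nearest` is sorted by distance, so the first tied label found is the closest.
        return next(label for _, label in nearest if label in tied)
    if tie_break == "smallest":
        return min(tied)
    raise ValueError(f"unknown tie_break: {tie_break!r}")

train = [((0.0, 0.0), "b"), ((1.0, 0.0), "a"), ((0.0, 2.0), "b"), ((3.0, 0.0), "a")]
print(knn_vote(train, (0.1, 0.0), k=2))                        # one vote each for "a" and "b"
print(knn_vote(train, (0.1, 0.0), k=2, tie_break="smallest"))
```

The same query returns different labels depending on a choice that many algorithm descriptions never mention, which is exactly the kind of detail implementation forces you to confront.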

Below is a repeatable process that you can use to implement machine learning algorithms from scratch.

- Select a programming language; the one you are most familiar with is probably best.
- Select an algorithm to implement; start with something easy (see below for a list).
- Select a problem to test your implementation on as you develop; 2D data is good for visualizing (even in Excel).
- Research the algorithm and leverage many and diverse sources of information (e.g. read tutorials, papers, other implementations, and so on).
- Unit test the algorithm to confirm your understanding and validate the implementation.

Start small and build confidence.

For example, three algorithms that you could select for your first machine learning algorithm implementation from scratch are:

- Linear Regression using Gradient Descent
- k-Nearest Neighbor (see my tutorial in Python)
- Naive Bayes (see my tutorial in Python)
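As a taste of the first suggestion, here is one minimal way to fit simple linear regression (one input, y = b0 + b1 * x) with batch gradient descent on mean squared error. The learning rate, epoch count and toy data are arbitrary choices for illustration, and the final assert doubles as the kind of unit test suggested in the process above: on noise-free data generated from a known line, the fitted coefficients should recover that line.

```python
def fit_linear_gd(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = b0 + b1 * x by batch gradient descent on mean squared error."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors for the current coefficients, over the whole dataset.
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to b0 and b1.
        grad_b0 = (2.0 / n) * sum(errors)
        grad_b1 = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        # Step against the gradient.
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0 + 2.0 * x for x in xs]   # generated from b0 = 1, b1 = 2 with no noise
b0, b1 = fit_linear_gd(xs, ys)
assert abs(b0 - 1.0) < 1e-3 and abs(b1 - 2.0) < 1e-3
print(round(b0, 3), round(b1, 3))
```

If the learning rate is set too high for the scale of the inputs, the updates overshoot and the coefficients diverge instead of converging, which is one of the first things implementing this by hand teaches you.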

For more information on how to implement machine learning algorithms, see the post “How to Implement a Machine Learning Algorithm”.

Also see the posts:

- “Benefits of Implementing Machine Learning Algorithms From Scratch”
- “Don’t Start with Open-Source Code When Implementing Machine Learning Algorithms”

## Theory is Not Just For the Mathematicians

Machine learning is not just for the mathematical elite. You can learn how machine learning algorithms work and how to get the most from them without diving deep into multivariate statistics.

**You do not need to be good at math.**

As we saw in the techniques section, you can start with algorithm lists and transition deeper into algorithm research, descriptions and algorithm behavior.

You can go very far with these methods without diving much at all into the math.

**You do not need to be an academic researcher.**

Research is not just for academics. Anyone can read books and papers and compile their own understanding of a topic like a specific machine learning algorithm.

Your biggest breakthroughs will come when you take on the persona of “*the scientist*” and start experimenting on machine learning algorithms as though they were complex systems in need of study. You will discover all kinds of interesting quirks in behavior that may not even be documented.

## Take Action

Pick one of the techniques listed above and get started.

I mean today, now.

**Unsure where to start?**

Here are 5 great ideas of where you could start:

- Make a list of 10 machine learning algorithms for classification (take a look at my tour of algorithms to get some ideas).
- Find five books that give detailed descriptions of Random Forests.
- Create a five-slide presentation on Naive Bayes using your own algorithm description template.
- Open Weka and see how the “k” parameter affects accuracy of k-nearest neighbor on the iris flowers data set.
- Implement linear regression using stochastic gradient descent.
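For the last idea, stochastic gradient descent differs from the batch version in that it updates the coefficients after every training row rather than once per pass over the data. Below is a minimal sketch, assuming a single input variable, a hand-picked learning rate and epoch count, and a toy noise-free dataset generated from a known line so the result is easy to check; none of these choices are prescriptive.

```python
import random

def fit_linear_sgd(xs, ys, learning_rate=0.02, epochs=2000, seed=1):
    """Fit y = b0 + b1 * x with stochastic gradient descent (one update per row)."""
    rng = random.Random(seed)
    b0, b1 = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)                 # visit rows in a new random order each epoch
        for i in order:
            error = (b0 + b1 * xs[i]) - ys[i]
            # Gradient of this single row's squared error; the constant
            # factor 2 is absorbed into the learning rate.
            b0 -= learning_rate * error
            b1 -= learning_rate * error * xs[i]
    return b0, b1

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0 + 2.0 * x for x in xs]   # known line: b0 = 1, b1 = 2, no noise
b0, b1 = fit_linear_sgd(xs, ys)
print(round(b0, 3), round(b1, 3))
```

On noisy data the coefficients would keep jittering around the best fit with a constant learning rate, which is why many implementations shrink the learning rate over time.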

Did you take action? Enjoy this post? Leave a comment below.

Fantastic Post. Enjoyed it.

Thanks Tamim.

Great post.

I would, however, object that theory is important if you want to be able to interpret results and have a clear understanding of the assumptions made and their implications. Hence a solid background in statistics and probabilities is always necessary.

Just relying on algorithms as black boxes to solve certain kinds of problems without any understanding of their principles, their strengths and weaknesses is somewhat dangerous.

A great book for starters : Data Science from Scratch (O’Reilly).

Greetings,

Alexis

Good point and good book recommendation Alexis.

Jason:

Spot on for what I need to know, thanks!

Alexis:

From the perspective of a solutions oriented practitioner, I would need many lifetimes to understand the full theory of each black box I use.

I instead have to be satisfied with a more functional approach, and learn just enough about the theory and behavior of each black box to use it correctly. Detailed theory comes after that, time and interest permitting. Usually, it’s time.

Cheers,

Bob

I could not agree more, Bob. This is exactly the top-down approach that I teach.

Hi Jason,

When I hear about papers that describe algorithms that broke records (say, this one that won first place in the ILSVRC competition in 2015: http://arxiv.org/pdf/1512.03385v1.pdf), I get excited and try to read it.

Then I remember that I can’t really read the math notation. Same story for, say, this Stanford ML tutorial on softmax regression: http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/

No clue what some of these symbols mean. Looks maybe like set theory notation sometimes?

Anyways, my real question is: how do I go about figuring the notation out? Is there a reference I could use to learn more about the symbol and when/why it’s used?

Hi Jay.

Hmmm. I agree mate, it’s hard.

So, with softmax, you could go and find one of the many implementations in Python or whatever, rip it out and study it in isolation. You could also dump it into a spreadsheet and plot the function’s mapping of inputs to outputs.

The general point of the post is to look for alternative descriptions of algorithms that are easier to grok, and to tinker with them to understand them better.

Back to the papers, especially the latter one. It takes time. You can step through each symbol; they are all defined, and yes, you are right, there is some general set notation for numbers in the real number set, etc. Perhaps start with a smaller example and practice. If you must grok notation, then you must practice working through a lot of notation. The softmax regression does look pretty straightforward, but it does require a little linear algebra to get the matrix notation. Not an undergrad course’s worth, but perhaps a few videos on Khan Academy.
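For example, the softmax function is short enough to study in isolation. The sketch below is a minimal standard-library Python version written for this purpose (not taken from the Stanford tutorial): it maps a list of scores to probabilities using exp(score) divided by the sum of exp over all scores. Subtracting the largest score first is a common trick to avoid overflow and does not change the result.

```python
import math

def softmax(scores):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    largest = max(scores)
    # Shifting every score by the same constant leaves the output unchanged
    # but keeps math.exp from overflowing on large inputs.
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Tinker with the mapping: raise one score and watch its probability grow.
for scores in ([1.0, 2.0, 3.0], [1.0, 2.0, 6.0], [1000.0, 1000.0]):
    print(scores, [round(p, 3) for p in softmax(scores)])
```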

Great post. This has really built up my confidence, coming from a software engineering background.

I was taught the math and theory piecemeal and immediately before putting it into practice.

That’s how Andrew Ng teaches and it’s also how it was covered in my post-graduate computer science study.

Any time I have attempted to rush ahead and use something like WEKA before properly understanding the algorithms, I have found myself wasting a lot of time trying to comprehend the results and tune the parameters.

Advanced knowledge of the math might be necessary in a university course — or you risk getting lost in the lectures — but that concern doesn’t exist for online lectures that you can pause or re-watch.

In my view, the best way for developers to learn machine learning is to start with online courses. The moment a concept appears that you don’t understand, simply pause it and find a Khan Academy video to fill in the blanks.

Great post! Thanks a lot.

You’re welcome Rajesh.

Jason thank you for such nice post.

Right now I am in the same phase of thinking, and your article shed some light on it.

I’m glad to hear it Ganesh.

I agree that the teaching in these subjects is shit. I have a background in Engineering Physics and have absolutely no problem with graduate statistics, but still the theory taught in Machine Learning (I did courses in Neural Networks and Artificial Intelligence) was impossible to follow during the lectures and a pain in the ass. In the end, I barely passed, even though I often get praise for my math skills from my friends at university.

Absolutely terrible.

Thanks Hassan, it’s great to have you here.

I’m here to help if I can.

Great article 🙂 That’s why we coders start with a hello world program – to get something positive out to spur us on.

I could not agree more Vinod!

I made a podcast episode on the math you need for machine learning, and the resources for learning – http://ocdevel.com/podcasts/machine-learning/8. There I agree entirely with your top-down learning approach. Build first, theory later!

Thanks for sharing Tyler.