Hello World of Applied Machine Learning

By Jason Brownlee on September 5, 2016 in Start Machine Learning

It is easy to feel overwhelmed by the large number of machine learning algorithms. There are so many to choose from that it is hard to know where to start and what to try.

The choice can be paralyzing.

You need to get over this fear and start.

There is no magic book or course that is going to tell you what algorithm to use and when. In fact, in practice you cannot know this beforehand. You have to discover it empirically through trial and error. That means making some mistakes.

In this post you will discover a simple trick that will get you started in applied machine learning.

It will provoke questions that motivate you to dive deeper, give you a defense against picking a favorite algorithm or tool, and accelerate your machine learning journey.

It’s a simple tactic and even experienced practitioners ignore or forget it because of its simplicity.

Hello World of Machine Learning
Photo by Faris Algosaibi, some rights reserved

Familiarize Yourself With Machine Learning Algorithms

You need to build up confidence with a variety of different algorithms.

Much of your skill in addressing machine learning problems will be in the tools that you have available to you and your ability to use them confidently.

The first step on this road is to open a tool or a library and start applying algorithms. Like learning programming, you need to start with hello world. The hello world in applied machine learning is loading a dataset and running an algorithm.

Just running algorithms is how you build that confidence.

You should run lots of algorithms. Run all the algorithms provided by a given tool or library. Then try another library or tool.

You do not want to select favorites. There is no one best machine learning algorithm, and if you use one algorithm or one class of algorithms on all problems, you will severely limit the results that you can achieve.

Run Your First Algorithm

You would be surprised at the number of people who are interested in applied machine learning but have never run an algorithm on a problem.

You will also be surprised how trivial it is after you have done it and how much more you have ahead of you.

The procedure is simple:

Select a tool. If you are not a programmer, I recommend Weka because it provides a graphical user interface. If you are a programmer, I recommend scikit-learn in Python, or R.
Select a standard dataset. I recommend one from the UCI Machine Learning Repository. The iris classification problem is the classic hello world of classification problems.
Find or devise a recipe. Determine how you are going to use the tool to load the dataset, split it into train and test datasets and run one algorithm to make predictions on that dataset. If you are using Weka, you can follow this recipe. I also have recipes in R and scikit-learn that you can use.
Run the recipe.
Review the results. Consider the accuracy you got and what it means. If the tool reported information about the algorithm, consider what that might mean.
Repeat. Try a different algorithm, a different algorithm configuration or a different dataset. Run lots of algorithms.
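
For programmers, the procedure above might look something like the following in scikit-learn. Treat it as a minimal sketch rather than the recipe linked above: the choice of k-nearest neighbors, the one-third test split and the random seed are all arbitrary assumptions made for illustration, and the iris data comes from the copy bundled with scikit-learn rather than a download from the UCI repository.

```python
# Hello world recipe: load a dataset, split it, run one algorithm, review the accuracy.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the iris dataset (scikit-learn ships a copy of it).
X, y = load_iris(return_X_y=True)

# Hold out a third of the rows for testing.
# The split size and the seed are arbitrary choices for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=7
)

# Run one algorithm with its default configuration.
model = KNeighborsClassifier()
model.fit(X_train, y_train)

# Make predictions on the held-out rows and review the result.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.3f}")
```

To repeat the exercise, swap KNeighborsClassifier for any other scikit-learn classifier, or change its arguments, and run the script again.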

It is easy. So easy, in fact, that you should repeat this procedure until you are comfortable working simple problems in all the great machine learning tools and libraries out there.

Another problem, in addition to selecting a favorite algorithm, is selecting a favorite tool or library.

To truly be effective, you need to work problems and use any and all tools that give you better results. Learn how to use each tool well, but be prepared to jump tools at the drop of a hat.

Building your Motivation with Curiosity

You do not need to understand the problem, the tool or the algorithms. Not yet. You are building confidence and familiarity with the tool and what it provides.

You should start having questions like:

How does this algorithm work?
Why is this algorithm giving better results than that algorithm?
What do all of these algorithm parameters mean?

There is a lot to learn in applied machine learning. That is why it's an exciting and thrilling field.

Hopefully this exercise will motivate you to dive just a little bit deeper and start looking into a given algorithm or algorithm parameters to answer some of those questions.

More than overcoming the paralysis of choice and building confidence and familiarity with the tool, hopefully this exercise piques your curiosity. That need to know more can take you a long way and help you push through material that you previously thought was impenetrable.

The beauty of knowing you can now run a given algorithm on a demonstration dataset any time you want is that you can flip from books, blogs and other materials back to your tool and try out what you read and learn. This ability to put ideas into action will expand your motivation and accelerate your learning.

Going Further

There are a lot of tools out there, almost as many as there are machine learning algorithms.

I like to try most of them, just to see what they offer and what they can do.

A trick that you can use is to create your own little recipes or execution plans in a text file, word document or program code. This allows you to get started fast on a problem or algorithm if you come back to the tool at a later date.
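
If you keep your recipes as program code, one possible shape is a small function that you can point at any algorithm. The sketch below assumes scikit-learn. The helper name run_recipe, the three candidate algorithms, the split size and the seed are all hypothetical choices for illustration, not part of any library.

```python
# A reusable recipe kept as code: the same steps applied to several algorithms.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def run_recipe(model, X, y, test_size=0.33, seed=7):
    """Hypothetical helper: split the data, fit the model, return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


X, y = load_iris(return_X_y=True)

# An arbitrary handful of algorithms to try; add or remove entries freely.
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(random_state=7),
}

# Every model sees the same split because the seed is fixed inside run_recipe.
results = {name: run_recipe(model, X, y) for name, model in candidates.items()}
for name, score in results.items():
    print(f"{name}: {score:.3f}")
```

Coming back to this file months later, you only need to change the dataset loader or the entries in candidates to be running algorithms again within minutes.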

Your curiosity will take you further and you may want to start building up a list of machine learning algorithms, describing algorithms and even investigating them in mini research projects.

Action Steps

In this post you discovered a simple trick that you can use to overcome algorithm overwhelm. The trick is to jump in and start applying algorithms to small in-memory problems using off-the-shelf tools and libraries.

The beauty of this trick is that it familiarizes you with algorithms and tools and, more importantly, tickles your curiosity about each algorithm, its behavior and its parameters. This curiosity can motivate you to dive deeper in the pursuit of knowing more.

This newfound familiarity will also give you a foundation for trying out ideas and putting them into action as you encounter them in your machine learning journey, which can accelerate your learning.

Pick a tool and run your first algorithm.

If you are still stuck, follow this step-by-step tutorial to run your first machine learning algorithm in Weka. Then run a whole lot more algorithms.

Share your experience. Which tool did you choose, which algorithm did you run, and what questions did it provoke?
