Machine learning algorithms are complex systems that require study to understand.
Static descriptions of machine learning algorithms are a good starting point, but are insufficient to get a feeling for how the algorithm behaves. You need to see the algorithm in action.
Experimenting on a running machine learning algorithm will allow you to build an intuition for the cause-and-effect relationship between the algorithm's parameters and the results you can achieve on different classes of problem.
In this post you will discover how to investigate a machine learning algorithm. You will learn about a simple 5-step process that you can use today to design and complete your first machine learning algorithm experiment.
You will discover that machine learning experiments are not just for academics, that you can do them too, and that experimentation is required on the path to mastery, because the empirical cause-and-effect knowledge that you will gain is simply not available anywhere else.
What is Investigating Machine Learning Algorithms?
Your objective when investigating a machine learning algorithm is to find behaviors that lead to good results that are generalizable across problems and classes of problems.
You investigate machine learning algorithms by performing systematic research into the algorithm's behavior. This is done by designing and executing controlled experiments.
Once you have completed an experiment, you can interpret and present the results. The results give you glimpses into the cause and effect between changes to the algorithm, its behaviors and the results you can achieve.
How to Investigate Machine Learning Algorithms
In this section we will look at a simple 5-step procedure that you can use to investigate a machine learning algorithm.
1. Select an Algorithm
Select an algorithm that you have questions about.
This may be an algorithm that you are using in earnest on a problem, or an algorithm that you see doing well in other contexts that you may want to use in the future.
For the purposes of experimentation, it is useful to take an off-the-shelf implementation of the algorithm. This gives you a baseline that most likely has few if any bugs.
Implementing the algorithm yourself can be a great way to learn about the algorithm procedure, but can also introduce additional variables into the experiment such as bugs and the myriad of micro-decisions that must be made for each algorithm implementation.
2. Identify a Question
You must have a research question that you are seeking to answer. The more specific the question, the more useful the answer.
Some example questions include:
- What is the effect of increasing k in kNN as a fraction of the training dataset size?
- What is the effect of selecting different kernels in SVM on binary classification problems?
- What are the effects of different attribute scaling on logistic regression on binary classification problems?
- What is the effect of adding random attributes to the training dataset on classification accuracy in random forest?
Design the question that you want answered about your algorithm. Consider listing five variations of the question and home in on the one that is the most specific.
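To make the first example question concrete, here is a minimal sketch of how it could be turned into a script. It assumes scikit-learn is installed and uses its bundled breast cancer dataset as a stand-in problem. The function name `knn_accuracy_by_fraction`, the list of fractions, the 5-fold cross-validation and the standardization step are all illustrative choices, not part of the process itself.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def knn_accuracy_by_fraction(X, y, fractions, n_folds=5):
    """Return {fraction: mean cross-validated accuracy} for kNN.

    k is set to a fraction of the approximate number of training rows
    available in each cross-validation fold.
    """
    # Each fold trains on roughly (n_folds - 1) / n_folds of the rows.
    n_train = int(len(X) * (n_folds - 1) / n_folds)
    results = {}
    for fraction in fractions:
        k = max(1, int(fraction * n_train))  # never let k drop below 1
        # Standardize inside the pipeline so scaling is fit on training folds only.
        model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
        scores = cross_val_score(model, X, y, cv=n_folds, scoring="accuracy")
        results[fraction] = scores.mean()
    return results


# Stand-in dataset (an assumption); swap in the problems you care about.
X, y = load_breast_cancer(return_X_y=True)
fractions = [0.01, 0.05, 0.10, 0.25, 0.50]
accuracy_by_fraction = knn_accuracy_by_fraction(X, y, fractions)
for fraction, accuracy in accuracy_by_fraction.items():
    print(f"k = {fraction:.0%} of training rows -> accuracy {accuracy:.3f}")
```

Expressing k as a fraction rather than a raw count is what lets you compare the trend across datasets of different sizes.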
3. Design the Experiment
Pick the elements out of the question that will make up your experiment.
For example, take the following question from above: “What are the effects of different attribute scaling on logistic regression on binary classification problems?”
The elements you can pick out of this question for the design of your experiment are:
- Attribute Scaling Methods. You could include methods like normalization, standardization, raising an attribute to a power, taking the logarithm, etc.
- Logistic Regression. Which implementation of logistic regression you want to use.
- Binary Classification Problems. Different standard binary classification problems that have numeric attributes. Multiple problems will be required, some with attributes that are all on the same scale (like ionosphere) and others that have attributes with a variety of scales (like diabetes).
- Performance. A model performance score is required such as classification accuracy.
Take the time to carefully select the elements of your experiment so that they best answer your question.
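The elements above map fairly directly onto a script. The following is a minimal sketch, assuming scikit-learn and NumPy are installed. The dataset is scikit-learn's bundled breast cancer data, used here only as a stand-in because ionosphere and diabetes are not shipped with the library. The function name `run_scaling_experiment`, the choice of `log1p` for the logarithm (which assumes non-negative attributes) and the 5-fold cross-validation are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler, StandardScaler

# Element 1: attribute scaling methods (None means leave the attributes as-is).
scalers = {
    "none": None,
    "normalize": MinMaxScaler(),
    "standardize": StandardScaler(),
    "log": FunctionTransformer(np.log1p),  # assumes non-negative attributes
}

# Element 3: binary classification problems. Only one stand-in dataset here;
# a real experiment would add several with differing attribute scales.
datasets = {"breast_cancer": load_breast_cancer(return_X_y=True)}


def run_scaling_experiment(scalers, datasets, n_folds=5):
    """Return {(dataset, scaler): (mean accuracy, std of accuracy)}."""
    results = {}
    for data_name, (X, y) in datasets.items():
        for scaler_name, scaler in scalers.items():
            steps = [] if scaler is None else [scaler]
            # Element 2: one fixed off-the-shelf logistic regression.
            model = make_pipeline(*steps, LogisticRegression(max_iter=5000))
            # Element 4: classification accuracy as the performance score.
            scores = cross_val_score(model, X, y, cv=n_folds, scoring="accuracy")
            results[(data_name, scaler_name)] = (scores.mean(), scores.std())
    return results


results = run_scaling_experiment(scalers, datasets)
for (data_name, scaler_name), (mean_acc, std_acc) in results.items():
    print(f"{data_name:15s} {scaler_name:12s} {mean_acc:.3f} (+/- {std_acc:.3f})")
```

Everything except the scaling method is held constant, so any difference in accuracy can be attributed to the scaling.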
4. Execute the Experiment and Report Results
Complete your experiment.
If the algorithm is stochastic, you may need to repeat experimental runs multiple times and take a mean and standard deviation.
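As a rough sketch of what repeating runs can look like, the example below trains a random forest with ten different random seeds on one fixed train/test split and reports the mean and standard deviation of accuracy. It assumes scikit-learn is installed. The dataset, the split size, the number of trees and the helper name `repeated_runs` are all illustrative assumptions.

```python
from statistics import mean, stdev

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
# Fix the split so the only thing that varies between runs is the algorithm's seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=1, stratify=y
)


def repeated_runs(n_runs=10):
    """Train the same configuration n_runs times, changing only the seed."""
    scores = []
    for seed in range(n_runs):
        model = RandomForestClassifier(n_estimators=50, random_state=seed)
        model.fit(X_train, y_train)
        scores.append(model.score(X_test, y_test))  # accuracy on the held-out rows
    return scores


scores = repeated_runs()
print(f"accuracy: mean {mean(scores):.3f}, std {stdev(scores):.3f} over {len(scores)} runs")
```

Reporting the standard deviation alongside the mean shows how much of any difference you later observe could be down to chance alone.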
If you are looking for differences in results between experimental runs (such as different parameters), you may want to use a statistical tool to indicate whether the differences are statistically significant (such as Student's t-test).
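One way to run such a check is sketched below, assuming SciPy and scikit-learn are installed. It compares two random forest configurations (5 trees versus 100 trees) over ten seeds each and applies Welch's variant of the t-test, which does not assume equal variances. The configurations, the dataset, the 0.05 threshold and the helper name `accuracy_over_seeds` are illustrative assumptions.

```python
from scipy.stats import ttest_ind
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=1, stratify=y
)


def accuracy_over_seeds(n_trees, seeds):
    """Accuracy of one configuration, repeated once per random seed."""
    scores = []
    for seed in seeds:
        model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        model.fit(X_train, y_train)
        scores.append(model.score(X_test, y_test))
    return scores


# Use different seeds for each configuration so the two samples are independent.
scores_small = accuracy_over_seeds(n_trees=5, seeds=range(10))
scores_large = accuracy_over_seeds(n_trees=100, seeds=range(10, 20))

# Welch's t-test: equal_var=False drops the equal-variance assumption.
t_stat, p_value = ttest_ind(scores_small, scores_large, equal_var=False)

alpha = 0.05  # conventional threshold, chosen here as an assumption
verdict = "significant" if p_value < alpha else "not significant"
print(f"t = {t_stat:.3f}, p = {p_value:.4f} -> difference is {verdict} at alpha = {alpha}")
```

A small p-value only says the difference is unlikely to be noise on this one split. It says nothing about whether the difference is large enough to matter or whether it holds on other problems.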
Environments like R and scikit-learn/SciPy provide the pieces needed to complete these types of experiments, but you will need to bring them together and script the experiment yourself. Others, like Weka, build them into a graphical user interface (see this tutorial on running your first experiment in Weka). The tools you use matter less than the question and the rigor of your experimental design.
Summarize the results of your experiment. You may want to use tables and graphs. Presenting results alone is insufficient. They are just numbers. You must tie the numbers back to your question and filter their meaning through the design of your experiment.
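A plain-text table is often enough for a first summary. The sketch below uses only the Python standard library. The function name `format_results_table` is hypothetical, and the scores passed to it are placeholder values made up purely to show the layout, not results from any real experiment.

```python
from statistics import mean, stdev


def format_results_table(results):
    """Build a plain-text table of mean and standard deviation per configuration.

    `results` maps a configuration name to its list of scores
    (at least two scores each, so a standard deviation can be computed).
    """
    name_width = max([len("configuration")] + [len(name) for name in results])
    lines = [f"{'configuration':<{name_width}}  {'mean':<5}  {'std':<5}  runs"]
    # Sort so that the configuration with the best mean score comes first.
    for name, scores in sorted(results.items(), key=lambda item: -mean(item[1])):
        lines.append(
            f"{name:<{name_width}}  {mean(scores):.3f}  {stdev(scores):.3f}  {len(scores)}"
        )
    return "\n".join(lines)


# Placeholder scores for illustration only; replace with your own measurements.
placeholder_scores = {
    "config_a": [0.70, 0.80, 0.90],
    "config_b": [0.60, 0.65, 0.70],
}
table = format_results_table(placeholder_scores)
print(table)
```

The table is only the starting point. The sentences you write underneath it, relating each row back to your question, are the actual result.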
What do the results indicate about your research question?
Put on your skeptical hat. What holes can you find in the results, and what limitations apply to them? Do not shy away from this part. Knowing the limitations is just as important as knowing the outcomes of an experiment.
5. Repeat
Repeat the process.
Continue to investigate your selected algorithm. You may even want to repeat the same experiment with different parameters or different test datasets. You may want to address the limitations in your experiment.
Don’t stop with one experiment. Start building up a knowledge base and an intuition for the algorithm.
With some simple tools, some good questions and a good splash of rigor and skepticism, you can very quickly start developing world-class insights into the behavior of an algorithm.
Investigating Algorithms is Not Just for Academics
You can investigate the behaviors of machine learning algorithms.
You do not need a higher degree, you do not need to be trained in research methods, you do not need to be an academic.
Careful systematic investigation of machine learning algorithms is open to anyone with a computer and a deep interest. In fact, if you want to master machine learning, you must get comfortable with systematic investigations of machine learning algorithms. The knowledge is simply not out there, you must go out and collect it yourself, empirically.
You do need to be skeptical and to be careful when talking about the applicability of your findings.
You do not need to have unique questions. You will gain a lot by investigating the standard questions, such as the effect of one parameter generalized across a few standard datasets. You may very well find limitations or counterpoints to common best-practice heuristics.
In this post you discovered the importance of investigating the behaviors of machine learning algorithms through controlled experimentation. You discovered a simple 5-step process that you can use to design and execute your first experiment on a machine learning algorithm.
Take action. Use the process you learned in this blog post and complete your first machine learning experiment. Once you have completed one, even a very small one, you will have the confidence, tools, and ability to complete a second and many more.
I would love to hear about your first experiment. Leave a comment and share your results or what you learned.