Weka makes a large number of classification algorithms available.

The large number of machine learning algorithms available is one of the benefits of using the Weka platform to work through your machine learning problems.

In this post you will discover how to use 5 top machine learning algorithms in Weka.

After reading this post you will know:

- About 5 top machine learning algorithms that you can use on your classification problems.
- How to use 5 top classification algorithms in Weka.
- The key configuration parameters for 5 top classification algorithms.

Let’s get started.

## Classification Algorithm Tour Overview

We are going to take a tour of 5 top classification algorithms in Weka.

Each algorithm that we cover will be briefly described in terms of how it works, key algorithm parameters will be highlighted and the algorithm will be demonstrated in the Weka Explorer interface.

The 5 algorithms that we will review are:

- Logistic Regression
- Naive Bayes
- Decision Tree
- k-Nearest Neighbors
- Support Vector Machines

These are 5 algorithms that you can try on your classification problem as a starting point.

A standard machine learning classification problem will be used to demonstrate each algorithm. Specifically, the Ionosphere binary classification problem. This is a good dataset to demonstrate classification algorithms because the input variables are numeric and all have the same scale the problem only has two classes to discriminate.

Each instance describes the properties of radar returns from the atmosphere and the task is to predict whether or not there is structure in the ionosphere or not. There are 34 numerical input variables of generally the same scale. You can learn more about this dataset on the UCI Machine Learning Repository. Top results are in the order of 98% accuracy.

Start the Weka Explorer:

- Open the Weka GUI Chooser.
- Click the “Explorer” button to open the Weka Explorer.
- Load the Ionosphere dataset from the
*data/ionosphere.arff*file. - Click “Classify” to open the Classify tab.

## Start and Practice Machine Learning With Weka

*...without writing a single line of code*

Weka is the best platform for beginners getting started in applied machine learning.

The graphical user interface and state-of-the-art algorithms makes machine learning fun!

#### Take a FREE 14-Day Mini Course in

Applied Machine Learning With Weka

Download your PDF containing all 14 lessons.

Get your daily lesson via email with tips and tricks.

## Logistic Regression

Logistic regression is a binary classification algorithm.

It assumes the input variables are numeric and have a Gaussian (bell curve) distribution. This last point does not have to be true, as logistic regression can still achieve good results if your data is not Gaussian. In the case of the Ionosphere dataset, some input attributes have a Gaussian-like distribution, but many do not.

The algorithm learns a coefficient for each input value, which are linearly combined into a regression function and transformed using a logistic (s-shaped) function. Logistic regression is a fast and simple technique, but can be very effective on some problems.

The logistic regression only supports binary classification problems, although the Weka implementation has been adapted to support multi-class classification problems.

Choose the logistic regression algorithm:

- Click the “Choose” button and select “Logistic” under the “functions” group.
- Click on the name of the algorithm to review the algorithm configuration.

The algorithm can run for a fixed number of iterations (maxIts), but by default will run until it is estimated that the algorithm has converged.

The implementation uses a ridge estimator which is a type of regularization. This method seeks to simplify the model during training by minimizing the coefficients learned by the model. The ridge parameter defines how much pressure to put on the algorithm to reduce the size of the coefficients. Setting this to 0 will turn off this regularization.

- Click “OK” to close the algorithm configuration.
- Click the “Start” button to run the algorithm on the Ionosphere dataset.

You can see that with the default configuration that logistic regression achieves an accuracy of 88%.

## Naive Bayes

Naive Bayes is a classification algorithm. Traditionally it assumes that the input values are nominal, although it numerical inputs are supported by assuming a distribution.

Naive Bayes uses a simple implementation of Bayes Theorem (hence naive) where the prior probability for each class is calculated from the training data and assumed to be independent of each other (technically called conditionally independent).

This is an unrealistic assumption because we expect the variables to interact and be dependent, although this assumption makes the probabilities fast and easy to calculate. Even under this unrealistic assumption, Naive Bayes has been shown to be a very effective classification algorithm.

Naive Bayes calculates the posterior probability for each class and makes a prediction for the class with the highest probability. As such, it supports both binary classification and multi-class classification problems.

Choose the Naive Bayes algorithm:

- Click the “Choose” button and select “NaiveBayes” under the “bayes” group.
- Click on the name of the algorithm to review the algorithm configuration.

By default a Gaussian distribution is assumed for each numerical attributes.

You can change the algorithm to use a kernel estimator with the useKernelEstimator argument that may better match the actual distribution of the attributes in your dataset. Alternately, you can automatically convert numerical attributes to nominal attributes with the useSupervisedDiscretization parameter.

- Click “OK” to close the algorithm configuration.
- Click the “Start” button to run the algorithm on the Ionosphere dataset.

You can see that with the default configuration that Naive Bayes achieves an accuracy of 82%.

There are a number of other flavors of naive bayes algorithms that you could work with.

## Decision Tree

Decision trees can support classification and regression problems.

Decision trees are more recently referred to as Classification And Regression Trees (CART). They work by creating a tree to evaluate an instance of data, start at the root of the tree and moving town to the leaves (roots) until a prediction can be made. The process of creating a decision tree works by greedily selecting the best split point in order to make predictions and repeating the process until the tree is a fixed depth.

After the tree is constructed, it is pruned in order to improve the model’s ability to generalize to new data.

Choose the decision tree algorithm:

- Click the “Choose” button and select “REPTree” under the “trees” group.
- Click on the name of the algorithm to review the algorithm configuration.

The depth of the tree is defined automatically, but a depth can be specified in the maxDepth attribute.

You can also choose to turn of pruning by setting the noPruning parameter to True, although this may result in worse performance.

The minNum parameter defines the minimum number of instances supported by the tree in a leaf node when constructing the tree from the training data.

- Click “OK” to close the algorithm configuration.
- Click the “Start” button to run the algorithm on the Ionosphere dataset.

You can see that with the default configuration that the decision tree algorithm achieves an accuracy of 89%.

Another more advanced decision tree algorithm that you can use is the C4.5 algorithm, called J48 in Weka.

You can review a visualization of a decision tree prepared on the entire training data set by right clicking on the “Result list” and clicking “Visualize Tree”.

## k-Nearest Neighbors

The k-nearest neighbors algorithm supports both classification and regression. It is also called kNN for short.

It works by storing the entire training dataset and querying it to locate the k most similar training patterns when making a prediction. As such, there is no model other than the raw training dataset and the only computation performed is the querying of the training dataset when a prediction is requested.

It is a simple algorithm, but one that does not assume very much about the problem other than that the distance between data instances is meaningful in making predictions. As such, it often achieves very good performance.

When making predictions on classification problems, KNN will take the mode (most common class) of the k most similar instances in the training dataset.

Choose the k-Nearest Neighbors algorithm:

- Click the “Choose” button and select “IBk” under the “lazy” group.
- Click on the name of the algorithm to review the algorithm configuration.

The size of the neighborhood is controlled by the k parameter.

For example, if k is set to 1, then predictions are made using the single most similar training instance to a given new pattern for which a prediction is requested. Common values for k are 3, 7, 11 and 21, larger for larger dataset sizes. Weka can automatically discover a good value for k using cross validation inside the algorithm by setting the crossValidate parameter to True.

Another important parameter is the distance measure used. This is configured in the nearestNeighbourSearchAlgorithm which controls the way in which the training data is stored and searched.

The default is a LinearNNSearch. Clicking the name of this search algorithm will provide another configuration window where you can choose a distanceFunction parameter. By default, Euclidean distance is used to calculate the distance between instances, which is good for numerical data with the same scale. Manhattan distance is good to use if your attributes differ in measures or type.

It is a good idea to try a suite of different k values and distance measures on your problem and see what works best.

- Click “OK” to close the algorithm configuration.
- Click the “Start” button to run the algorithm on the Ionosphere dataset.

You can see that with the default configuration that the kNN algorithm achieves an accuracy of 86%.

## Support Vector Machines

Support Vector Machines were developed for binary classification problems, although extensions to the technique have been made to support multi-class classification and regression problems. The algorithm is often referred to as SVM for short.

SVM was developed for numerical input variables, although will automatically convert nominal values to numerical values. Input data is also normalized before being used.

SVM work by finding a line that best separates the data into the two groups. This is done using an optimization process that only considers those data instances in the training dataset that are closest to the line that best separates the classes. The instances are called support vectors, hence the name of the technique.

In almost all problems of interest, a line cannot be drawn to neatly separate the classes, therefore a margin is added around the line to relax the constraint, allowing some instances to be misclassified but allowing a better result overall.

Finally, few datasets can be separated with just a straight line. Sometimes a line with curves or even polygonal regions need to be marked out. This is achieved with SVM by projecting the data into a higher dimensional space in order to draw the lines and make predictions. Different kernels can be used to control the projection and the amount of flexibility in separating the classes.

Choose the SVM algorithm:

- Click the “Choose” button and select “SMO” under the “function” group.
- Click on the name of the algorithm to review the algorithm configuration.

SMO refers to the specific efficient optimization algorithm used inside the SVM implementation, which stands for Sequential Minimal Optimization.

The C parameter, called the complexity parameter in Weka controls how flexible the process for drawing the line to separate the classes can be. A value of 0 allows no violations of the margin, whereas the default is 1.

A key parameter in SVM is the type of Kernel to use. The simplest kernel is a Linear kernel that separates data with a straight line or hyperplane. The default in Weka is a Polynomial Kernel that will separate the classes using a curved or wiggly line, the higher the polynomial, the more wiggly (the exponent value).

A popular and powerful kernel is the RBF Kernel or Radial Basis Function Kernel that is capable of learning closed polygons and complex shapes to separate the classes.

It is a good idea to try a suite of different kernels and C (complexity) values on your problem and see what works best.

- Click “OK” to close the algorithm configuration.
- Click the “Start” button to run the algorithm on the Ionosphere dataset.

You can see that with the default configuration that the SVM algorithm achieves an accuracy of 88%.

## Summary

In this post you discovered how to use top classification machine learning algorithms in Weka.

Specifically, you learned:

- 5 top classification algorithms you can try on your own problems.
- The key configuration parameters to tune for each algorithm.
- How to use each algorithm in Weka.

Do you have any questions about classification algorithms in Weka or about this post? Ask your questions in the comments and I will do my best to answer.

## Want Machine Learning Without The Code?

#### Develop Your Own Models and Predictions in Minutes

*...with just a few a few clicks*

Discover how in my new Ebook: Machine Learning Mastery With Weka

It covers **self-study tutorials** and **end-to-end projects** on topics like:*Loading data*, *visualization*, *build models*, *algorithm tuning*, and much more...

#### Finally Bring The Machine Learning To

Your Own Projects

Skip the Academics. Just Results.

hi thanks for the blog. i just want to know if i apply all these algorithms and get result, how can i compare them to see the best one for my data set ?

thanks !

Great question malik.

Model selection often comes down to 2 things:

1. Performance. What is the measure that best captures what you want from a model, find the model or models that achieve the best scores on that measure.

2. Complexity. We want the simplest model that performs well. This is good for things like maintainability, explainability, updatability, etc.

You want to bake-off all of the algorithms and all of the data representations you can possibly think of, and pick the “best” using these two criteria. At least as a starting point.

Does that help?

very helpful!!!

I’m glad to hear that.

hi. 🙂

i would like to ask what is this “0.02” which split the tree? i mean it’s not the median, nor arithmetic avarage.

Thanks 🙂

Generally, specific values are used as split points in decision trees.

Can I have a question, on what basis are this “specific values” selected?

Great question.

You can learn more about how CART works here:

http://machinelearningmastery.com/classification-and-regression-trees-for-machine-learning/

Very helpful, thank you…..

I’m glad you found it useful Ann!