# How To Use Classification Machine Learning Algorithms in Weka


Weka makes a large number of classification algorithms available.

This breadth of algorithms is one of the benefits of using the Weka platform to work through your machine learning problems.

In this post you will discover how to use 5 top machine learning algorithms in Weka.

After reading this post you will know:

• About 5 top machine learning algorithms that you can use on your classification problems.
• How to use 5 top classification algorithms in Weka.
• The key configuration parameters for 5 top classification algorithms.

Discover how to prepare data, fit models, and evaluate their predictions, all without writing a line of code, in my new book with 18 step-by-step tutorials and 3 projects with Weka.

Let’s get started.

How To Use Classification Machine Learning Algorithms in Weka
Photo by Don Graham, some rights reserved.

## Classification Algorithm Tour Overview

We are going to take a tour of 5 top classification algorithms in Weka.

Each algorithm that we cover will be briefly described in terms of how it works, key algorithm parameters will be highlighted and the algorithm will be demonstrated in the Weka Explorer interface.

The 5 algorithms that we will review are:

1. Logistic Regression
2. Naive Bayes
3. Decision Tree
4. k-Nearest Neighbors
5. Support Vector Machines

These are 5 algorithms that you can try on your classification problem as a starting point.

A standard machine learning classification problem will be used to demonstrate each algorithm: specifically, the Ionosphere binary classification problem. This is a good dataset to demonstrate classification algorithms because the input variables are numeric and all have the same scale, and the problem has only two classes to discriminate.

Each instance describes the properties of radar returns from the atmosphere, and the task is to predict whether or not there is structure in the ionosphere. There are 34 numerical input variables of generally the same scale. You can learn more about this dataset on the UCI Machine Learning Repository. Top results are in the order of 98% accuracy.

Start the Weka Explorer:

1. Open the Weka GUI Chooser.
2. Click the “Explorer” button to open the Weka Explorer.
3. Load the Ionosphere dataset from the data/ionosphere.arff file.
4. Click “Classify” to open the Classify tab.

### Need more help with Weka for Machine Learning?

Take my free 14-day email course and discover how to use the platform step-by-step.

Click to sign-up and also get a free PDF Ebook version of the course.

## Logistic Regression

Logistic regression is a binary classification algorithm.

It assumes the input variables are numeric and have a Gaussian (bell curve) distribution. This last point does not have to be true, as logistic regression can still achieve good results if your data is not Gaussian. In the case of the Ionosphere dataset, some input attributes have a Gaussian-like distribution, but many do not.

The algorithm learns a coefficient for each input value, which are linearly combined into a regression function and transformed using a logistic (s-shaped) function. Logistic regression is a fast and simple technique, but can be very effective on some problems.

Logistic regression only supports binary classification problems, although the Weka implementation has been adapted to support multi-class classification problems.
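The prediction step described above can be sketched in a few lines of Python. The coefficients and inputs here are made up for illustration, not learned from the Ionosphere data:

```python
import math

def predict_logistic(coefficients, bias, inputs):
    # Linearly combine the inputs using the learned coefficients...
    z = bias + sum(c * x for c, x in zip(coefficients, inputs))
    # ...then transform through the logistic (s-shaped) function
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for illustration only
probability = predict_logistic([0.8, -1.5], bias=0.2, inputs=[1.0, 0.3])
print(round(probability, 3))  # a probability in (0, 1); >= 0.5 maps to the positive class
```

Training is the hard part (finding good coefficients); once they are found, making a prediction is just this one weighted sum and squash.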

Choose the logistic regression algorithm:

1. Click the “Choose” button and select “Logistic” under the “functions” group.
2. Click on the name of the algorithm to review the algorithm configuration.

Weka Configuration for the Logistic Regression Algorithm

The algorithm can run for a fixed number of iterations (maxIts), but by default will run until it is estimated that the algorithm has converged.

The implementation uses a ridge estimator which is a type of regularization. This method seeks to simplify the model during training by minimizing the coefficients learned by the model. The ridge parameter defines how much pressure to put on the algorithm to reduce the size of the coefficients. Setting this to 0 will turn off this regularization.
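To make the ridge idea concrete, here is a sketch of the penalty term that gets added to the training objective. The parameter name `ridge` mirrors Weka's configuration option, but the function itself is an illustrative simplification:

```python
def ridge_penalty(coefficients, ridge=1e-8):
    # The penalty grows with the squared size of the coefficients,
    # pressuring the optimizer to keep them small during training
    return ridge * sum(c * c for c in coefficients)

print(ridge_penalty([2.0, -3.0], ridge=0.5))  # 0.5 * (4 + 9) = 6.5
print(ridge_penalty([2.0, -3.0], ridge=0.0))  # 0.0: regularization turned off
```

A larger `ridge` value makes big coefficients more expensive, so the fitted model ends up simpler.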

1. Click “OK” to close the algorithm configuration.
2. Click the “Start” button to run the algorithm on the Ionosphere dataset.

You can see that with the default configuration logistic regression achieves an accuracy of 88%.

Weka Classification Results for the Logistic Regression Algorithm

## Naive Bayes

Naive Bayes is a classification algorithm. Traditionally it assumes that the input values are nominal, although numerical inputs are supported by assuming a distribution.

Naive Bayes uses a simple implementation of Bayes Theorem (hence “naive”) where the prior probability for each class is calculated from the training data and the input values are assumed to be independent of each other given the class (technically called conditionally independent).

This is an unrealistic assumption because we expect the variables to interact and be dependent, although this assumption makes the probabilities fast and easy to calculate. Even under this unrealistic assumption, Naive Bayes has been shown to be a very effective classification algorithm.

Naive Bayes calculates the posterior probability for each class and makes a prediction for the class with the highest probability. As such, it supports both binary classification and multi-class classification problems.
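The calculation above can be sketched in Python, assuming the default Gaussian model for numeric attributes. The class summaries and priors below are made up for a toy two-class, two-attribute problem, not estimated from Ionosphere:

```python
import math

def gaussian_pdf(x, mean, stdev):
    # Probability density of x under a Gaussian distribution
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return exponent / (math.sqrt(2 * math.pi) * stdev)

def naive_bayes_predict(summaries, priors, inputs):
    # summaries: class label -> list of (mean, stdev), one per attribute.
    # Multiply the class prior by each attribute's likelihood (the "naive"
    # conditional-independence step), then pick the class with the largest score.
    scores = {}
    for label, stats in summaries.items():
        score = priors[label]
        for x, (mean, stdev) in zip(inputs, stats):
            score *= gaussian_pdf(x, mean, stdev)
        scores[label] = score
    return max(scores, key=scores.get)

# Illustrative class summaries and priors (not from any real dataset)
summaries = {"g": [(1.0, 0.5), (20.0, 5.0)], "b": [(3.0, 0.5), (35.0, 5.0)]}
priors = {"g": 0.64, "b": 0.36}
print(naive_bayes_predict(summaries, priors, [1.1, 22.0]))  # "g"
```

The query instance sits close to the "g" class means on both attributes, so the "g" posterior score dominates.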

Choose the Naive Bayes algorithm:

1. Click the “Choose” button and select “NaiveBayes” under the “bayes” group.
2. Click on the name of the algorithm to review the algorithm configuration.

Weka Configuration for the Naive Bayes Algorithm

By default a Gaussian distribution is assumed for each numerical attribute.

You can change the algorithm to use a kernel estimator with the useKernelEstimator argument that may better match the actual distribution of the attributes in your dataset. Alternately, you can automatically convert numerical attributes to nominal attributes with the useSupervisedDiscretization parameter.

1. Click “OK” to close the algorithm configuration.
2. Click the “Start” button to run the algorithm on the Ionosphere dataset.

You can see that with the default configuration Naive Bayes achieves an accuracy of 82%.

Weka Classification Results for the Naive Bayes Algorithm

There are a number of other flavors of Naive Bayes algorithms that you could work with.

## Decision Tree

Decision trees can support classification and regression problems.

Decision trees are more recently referred to as Classification And Regression Trees (CART). They work by creating a tree to evaluate an instance of data, starting at the root of the tree and moving down to the leaves until a prediction can be made. The process of creating a decision tree works by greedily selecting the best split point in order to make predictions and repeating the process until the tree reaches a fixed depth.

After the tree is constructed, it is pruned in order to improve the model’s ability to generalize to new data.
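The greedy split selection can be sketched in Python. This is an illustrative version that scores candidate splits with Gini impurity for brevity; Weka's REPTree actually uses information gain (variance reduction for regression) and reduced-error pruning:

```python
def gini(groups, classes):
    # Weighted Gini impurity of a candidate split: 0 means pure groups
    total = sum(len(g) for g in groups)
    score = 0.0
    for g in groups:
        if not g:
            continue
        props = [[row[-1] for row in g].count(c) / len(g) for c in classes]
        score += (1.0 - sum(p * p for p in props)) * (len(g) / total)
    return score

def best_split(rows, attr):
    # Greedily try every observed value of one attribute as a split
    # point and keep the one with the lowest impurity
    classes = list({row[-1] for row in rows})
    best_value, best_score = None, float("inf")
    for row in rows:
        value = row[attr]
        left = [r for r in rows if r[attr] < value]
        right = [r for r in rows if r[attr] >= value]
        score = gini([left, right], classes)
        if score < best_score:
            best_value, best_score = value, score
    return best_value

# Toy data: one numeric attribute plus a class label
rows = [[0.1, "b"], [0.2, "b"], [0.8, "g"], [0.9, "g"]]
print(best_split(rows, 0))  # 0.8 cleanly separates the two classes
```

Building a full tree just repeats this search on each resulting group until a stopping condition (such as depth or minimum instances per leaf) is met.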

Choose the decision tree algorithm:

1. Click the “Choose” button and select “REPTree” under the “trees” group.
2. Click on the name of the algorithm to review the algorithm configuration.

Weka Configuration for the Decision Tree Algorithm

The depth of the tree is defined automatically, but a depth can be specified in the maxDepth attribute.

You can also choose to turn off pruning by setting the noPruning parameter to True, although this may result in worse performance.

The minNum parameter defines the minimum number of instances supported by the tree in a leaf node when constructing the tree from the training data.

1. Click “OK” to close the algorithm configuration.
2. Click the “Start” button to run the algorithm on the Ionosphere dataset.

You can see that with the default configuration the decision tree algorithm achieves an accuracy of 89%.

Weka Classification Results for the Decision Tree Algorithm

Another more advanced decision tree algorithm that you can use is the C4.5 algorithm, called J48 in Weka.

You can review a visualization of a decision tree prepared on the entire training dataset by right-clicking on the “Result list” and clicking “Visualize Tree”.

Weka Visualization of a Decision Tree

## k-Nearest Neighbors

The k-nearest neighbors algorithm supports both classification and regression. It is also called kNN for short.

It works by storing the entire training dataset and querying it to locate the k most similar training patterns when making a prediction. As such, there is no model other than the raw training dataset and the only computation performed is the querying of the training dataset when a prediction is requested.

It is a simple algorithm, but one that does not assume very much about the problem other than that the distance between data instances is meaningful in making predictions. As such, it often achieves very good performance.

When making predictions on classification problems, kNN will take the mode (most common class) of the k most similar instances in the training dataset.
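The whole prediction procedure, rank the training data by distance and take the mode of the k nearest labels, fits in a few lines of Python (toy data for illustration only):

```python
import math
from collections import Counter

def knn_predict(train, labels, query, k):
    # Rank training rows by Euclidean distance to the query...
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    # ...then take the mode (most common) of the k nearest labels
    nearest = [labels[i] for i in order[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Toy data for illustration only
train = [[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]]
labels = ["b", "b", "g", "g"]
print(knn_predict(train, labels, [0.95, 0.9], k=3))  # "g": two of the three nearest are "g"
```

Note there is no training step at all; all of the work happens at prediction time, which is why Weka files this algorithm under the “lazy” group.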

Choose the k-Nearest Neighbors algorithm:

1. Click the “Choose” button and select “IBk” under the “lazy” group.
2. Click on the name of the algorithm to review the algorithm configuration.

Weka Configuration for the k-Nearest Neighbors Algorithm

The size of the neighborhood is controlled by the k parameter.

For example, if k is set to 1, then predictions are made using the single most similar training instance to a given new pattern for which a prediction is requested. Common values for k are 3, 7, 11 and 21, larger for larger dataset sizes. Weka can automatically discover a good value for k using cross validation inside the algorithm by setting the crossValidate parameter to True.

Another important parameter is the distance measure used. This is configured in the nearestNeighbourSearchAlgorithm parameter, which controls the way in which the training data is stored and searched.

The default is a LinearNNSearch. Clicking the name of this search algorithm will provide another configuration window where you can choose a distanceFunction parameter. By default, Euclidean distance is used to calculate the distance between instances, which is good for numerical data with the same scale. Manhattan distance is good to use if your attributes differ in measure or type.

Weka Configuration for the Search Algorithm in the k-Nearest Neighbors Algorithm
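The two distance measures are simple to state directly. A small sketch of the difference:

```python
import math

def euclidean(a, b):
    # Straight-line distance; suits numeric attributes on the same scale
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute differences; each attribute contributes independently,
    # so one attribute on a larger scale distorts the total less dramatically
    return sum(abs(x - y) for x, y in zip(a, b))

a, b = [0.0, 0.0], [3.0, 4.0]
print(euclidean(a, b))  # 5.0 (the straight line)
print(manhattan(a, b))  # 7.0 (the grid path: 3 + 4)
```

Squaring in the Euclidean distance means large per-attribute differences dominate, which is one intuition for why Manhattan distance can behave better when attributes differ in measure.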

It is a good idea to try a suite of different k values and distance measures on your problem and see what works best.

1. Click “OK” to close the algorithm configuration.
2. Click the “Start” button to run the algorithm on the Ionosphere dataset.

You can see that with the default configuration the kNN algorithm achieves an accuracy of 86%.

Weka Classification Results for k-Nearest Neighbors

## Support Vector Machines

Support Vector Machines were developed for binary classification problems, although extensions to the technique have been made to support multi-class classification and regression problems. The algorithm is often referred to as SVM for short.

SVM was developed for numerical input variables, although it will automatically convert nominal values to numerical values. Input data is also normalized before being used.

SVM works by finding a line that best separates the data into the two groups. This is done using an optimization process that only considers those data instances in the training dataset that are closest to the line that best separates the classes. These instances are called support vectors, hence the name of the technique.

In almost all problems of interest, a line cannot be drawn to neatly separate the classes, therefore a margin is added around the line to relax the constraint, allowing some instances to be misclassified but allowing a better result overall.

Finally, few datasets can be separated with just a straight line. Sometimes a line with curves or even polygonal regions needs to be marked out. This is achieved with SVM by projecting the data into a higher-dimensional space in order to draw the lines and make predictions. Different kernels can be used to control the projection and the amount of flexibility in separating the classes.

Choose the SVM algorithm:

1. Click the “Choose” button and select “SMO” under the “functions” group.
2. Click on the name of the algorithm to review the algorithm configuration.

SMO stands for Sequential Minimal Optimization, the efficient optimization algorithm used inside Weka’s SVM implementation.

Weka Configuration for the Support Vector Machines Algorithm

The C parameter, called the complexity parameter in Weka, controls how flexible the process for drawing the line to separate the classes can be. A value of 0 allows no violations of the margin, whereas the default is 1.

A key parameter in SVM is the type of kernel to use. The simplest kernel is a Linear kernel that separates data with a straight line or hyperplane. The default in Weka is a Polynomial kernel that will separate the classes using a curved or wiggly line; the higher the exponent, the more wiggly the line.

A popular and powerful kernel is the RBF Kernel or Radial Basis Function Kernel that is capable of learning closed polygons and complex shapes to separate the classes.
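A kernel is just a similarity function between two instances, so the three kernels mentioned above can be sketched directly. The parameter defaults here are for illustration only, not Weka's:

```python
import math

def linear_kernel(a, b):
    # Plain dot product: corresponds to a straight line / hyperplane boundary
    return sum(x * y for x, y in zip(a, b))

def polynomial_kernel(a, b, exponent=2):
    # Higher exponents give a more flexible ("wiggly") decision boundary
    return (1 + linear_kernel(a, b)) ** exponent

def rbf_kernel(a, b, gamma=1.0):
    # Radial basis function: similarity decays with distance,
    # which is what enables closed regions around a class
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

a, b = [1.0, 2.0], [2.0, 1.0]
print(linear_kernel(a, b))      # 4.0
print(polynomial_kernel(a, b))  # (1 + 4)^2 = 25.0
print(rbf_kernel(a, b))         # about 0.135
```

Swapping the kernel changes how "similar" two instances are considered to be, which in turn changes the shape of the boundary the optimizer can draw.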

It is a good idea to try a suite of different kernels and C (complexity) values on your problem and see what works best.

1. Click “OK” to close the algorithm configuration.
2. Click the “Start” button to run the algorithm on the Ionosphere dataset.

You can see that with the default configuration the SVM algorithm achieves an accuracy of 88%.

Weka Classification Results for the Support Vector Machine Algorithm

## Summary

In this post you discovered how to use top classification machine learning algorithms in Weka.

Specifically, you learned:

• 5 top classification algorithms you can try on your own problems.
• The key configuration parameters to tune for each algorithm.
• How to use each algorithm in Weka.

## Discover Machine Learning Without The Code!

#### Develop Your Own Models in Minutes

...with just a few clicks

Discover how in my new Ebook:
Machine Learning Mastery With Weka

Covers self-study tutorials and end-to-end projects like:

### 55 Responses to How To Use Classification Machine Learning Algorithms in Weka

1. malik December 16, 2016 at 6:17 am #

hi thanks for the blog. i just want to know if i apply all these algorithms and get result, how can i compare them to see the best one for my data set ?
thanks !

• Jason Brownlee December 17, 2016 at 11:05 am #

Great question malik.

Model selection often comes down to 2 things:

1. Performance. What is the measure that best captures what you want from a model, find the model or models that achieve the best scores on that measure.
2. Complexity. We want the simplest model that performs well. This is good for things like maintainability, explainability, updatability, etc.

You want to bake-off all of the algorithms and all of the data representations you can possibly think of, and pick the “best” using these two criteria. At least as a starting point.

Does that help?

2. Hamelmal kiros February 7, 2017 at 10:10 pm #

• Jason Brownlee February 8, 2017 at 9:35 am #

3. Alex May 17, 2017 at 7:09 am #

hi. 🙂
i would like to ask what is this “0.02” which split the tree? i mean it’s not the median, nor the arithmetic average.
Thanks 🙂

• Jason Brownlee May 17, 2017 at 8:43 am #

Generally, specific values are used as split points in decision trees.

• Alex May 17, 2017 at 9:14 am #

Can I have a question, on what basis are this “specific values” selected?

4. Ann June 6, 2017 at 3:42 pm #

• Jason Brownlee June 7, 2017 at 7:09 am #

I’m glad you found it useful Ann!

5. Tejasree July 12, 2017 at 12:52 am #

Very helpful. I have a question: how do we know that we have to select only these 5 algorithms? Why did you choose only these 5 algorithms out of many?

• Jason Brownlee July 12, 2017 at 9:47 am #

Choose any algorithms you wish.

6. Noor August 8, 2017 at 1:10 am #

Very helpful. I have a question: how can we apply Naive Bayes and SVM on a mixed dataset such as the Credit Approval Data Set from UCI, because both of them cannot handle numerical and nominal data directly?

• Jason Brownlee August 8, 2017 at 7:51 am #

Both naive bayes and SVM can handle real valued inputs, you could transform the data into this standard form.

7. Dhamayanthi N. August 28, 2017 at 6:05 am #

hi Jason

how do i add a new classifier algorithm in Weka?

• Jason Brownlee August 28, 2017 at 6:52 am #

Sorry, I do not have an example of this.

8. Zoya September 27, 2017 at 9:57 pm #

Hi. This was very informative. I have a question here. Just like we visualize the tree structure in J48 algorithm, or any tree based algorithm, to visualize the model, how can we visualize models like logistic regression, k-nearest neighbors, etc. – basically, a non – tree based algorithm? I wish to visualize the pattern for training and testing sets.

• Jason Brownlee September 28, 2017 at 5:26 am #

Generally, we don’t. Instead, we use proxies for visualization such as skill scores like accuracy or RMSE.

• Zoya September 28, 2017 at 2:10 pm #

Thanks a lot

9. azahari October 11, 2017 at 11:07 am #

Hi. How to use own algorithm in svm using Weka? Help me. Thank you in advance.

• Jason Brownlee October 11, 2017 at 4:39 pm #

Sorry, I don’t have an example of implementing a custom algorithm in Weka.

10. celestin November 28, 2017 at 4:21 pm #

• Jason Brownlee November 29, 2017 at 8:19 am #

What is the problem exactly?

11. soeysoo April 30, 2018 at 3:29 am #

I had a question. May i know for the below result, which is the best model? Which parameter did i need to consider?

J48 66.0651%(correctly classifier) 0.4584(RMSE) 0.3217 (kappa) 0.707(ROC) 0.12(time taken)

SMO 66.0967%(correctly classifier) 0.5823(RMSE) 0.3222 (kappa) 0.661(ROC) 100.08(time taken)

• Jason Brownlee April 30, 2018 at 5:39 am #

Model selection involves not only skill, but also the broader concerns of the problem.

The difference between model skills may or may not be significant. Statistical significance tests may be used to help tease out whether the difference in skill is real.

For example, we often prefer simpler models over more complex models.

Stakeholders may also prefer models with a specific form.

• nzeid September 20, 2018 at 7:12 am #

Hi Jason,

Thanks for this useful article. I have a question about the confusion matrix: how can I see in Weka which instances the algorithm confused? For example, the classifier predicts that 500 instances belong to class a and another 50 instances belong to class b, although they should actually belong to class a.

12. Marco May 3, 2018 at 7:22 pm #

Hello Jason. Thanks for this. I have a question. How can I compare the classifications (e.g. SVM vs MLP)? I thought to measure their agreement using e.g. the Adjusted Rand Index, but I do not understand how to extract the classification of each case (instance) in the various classification processes. Can you help me with that? Thanks in advance.

13. KK May 11, 2018 at 12:25 am #

Hi Jason, Your blog is really useful and the way of explaining the topics is easy to understand, Could you please also write a article on AIC and BIC.
1. On which algorithms it can be used ?
2. Whether it is compared across different algorithms or different models of same algorithms ?
3. Which one to consider AIC/BIC or RMSE/Accuracy ?
Looking forward for the post.

• Jason Brownlee May 11, 2018 at 6:38 am #

Thanks for the suggestions!

14. Izzat May 15, 2018 at 3:29 pm #

Hello Jason!

I have a question on how to used the four kernels in SVM. I didn’t found option for Linear and Sigmoid kernels. How to configure the setting for different kernel?

• Jason Brownlee May 16, 2018 at 5:57 am #

In the configuration for SVM you can choose the kernel, then click on the kernel to change its configuration.

15. deinasore July 16, 2018 at 10:25 pm #

hello, what classifier can you suggest that works best with 2 attributes having real data (numeric)?

16. Da orestes October 21, 2018 at 11:54 am #

Hey Hello,

Thanks for the basic tutorial.

I did this on a dataset and I found the accuracy and percentage of Best classifier are same for the SVM and decision technique as well as same confusion matrix for Decision Tree and SVM. So, in this situation, how you choose the best classifier ?

• Jason Brownlee October 22, 2018 at 6:17 am #

Good question. I recommend choosing the simplest/least complex method.

Perhaps the decision tree is less complex than SVM.

17. Abo Emad December 15, 2018 at 3:02 am #

How can we use weka classify sensitive data?

• Jason Brownlee December 15, 2018 at 6:14 am #

What is “sensitive data”?

18. Opridai January 14, 2019 at 7:23 pm #

Dear Sir, well work. my Question is that how we can say that these are top algorithms?
secondly, which one show best result out of the above 5 ?

• Jason Brownlee January 15, 2019 at 5:50 am #

I recommend testing a suite of algorithms on your problem and discover what works best on your specific dataset.

19. George January 30, 2019 at 12:39 am #

Jason – What is the best way to create a graphs of the individual algorithms over time in Weka? Like a learning curve.

• Jason Brownlee January 30, 2019 at 8:12 am #

Sorry, I don’t have an example of learning curves in Weka.

20. Eva August 14, 2019 at 2:47 am #

Thanks for the tutorial.
For the decision tree example, can you explain the values that are printed out on the leaves?
Thanks

21. mehrnaz October 7, 2019 at 1:38 am #

Hello. Thanks to the training, it was very informative. How can we compare all these algorithms together? What part do we need to zoom in to compare answers? And how do I save results?

22. mehrnaz October 7, 2019 at 3:17 am #

In fact, I saw your posts that were about uploading and saving, but when we save it when we want to reload, the shape and input changes and doesn’t show the percentage.

• Jason Brownlee October 7, 2019 at 8:31 am #

Sorry, I don’t follow, can you elaborate please?

23. mehrnaz October 8, 2019 at 1:28 am #

How do I copy the output of the algorithm into Weka software, so how do I save the algorithm and find the percentage of each algorithm in the output?

• Jason Brownlee October 8, 2019 at 8:07 am #

Sorry, I don’t follow, can you please elaborate?

24. mehrnaz October 9, 2019 at 12:36 am #

How do I store the outputs of the algorithm? I mean, the classifier output part, how do I store the answer that shows?

• Jason Brownlee October 9, 2019 at 8:15 am #

Perhaps select all text, copy to clipboard, paste in a text editor?

25. mehrnaz October 29, 2019 at 5:33 pm #

I used these 5 algorithms in Weka and got results, but I want to compare all the algorithms and see which algorithms work correctly and best across all of my results. What should I do?