How to Perform Feature Selection With Machine Learning Data in Weka

Raw machine learning data contains a mixture of attributes, only some of which are relevant to making predictions.

How do you know which features to use and which to remove? The process of selecting features in your data to model your problem is called feature selection.

In this post you will discover how to perform feature selection with your machine learning data in Weka.

After reading this post you will know:

  • About the importance of feature selection when working through a machine learning problem.
  • How feature selection is supported on the Weka platform.
  • How to use several different feature selection techniques in Weka on your dataset.

Let’s get started.

  • Update March/2018: Added alternate link to download the dataset as the original appears to have been taken down.
How to Perform Feature Selection With Machine Learning Data in Weka
Photo by Peter Gronemann, some rights reserved.

Predict the Onset of Diabetes

The dataset used for this example is the Pima Indians onset of diabetes dataset.

It is a classification problem where each instance represents medical details for one patient and the task is to predict whether the patient will have an onset of diabetes within the next five years.

You can learn more about this dataset on the UCI Machine Learning Repository page for the Pima Indians dataset. You can download the dataset directly from this page (update: download from here). You can also access this dataset in your Weka installation, under the data/ directory in the file called diabetes.arff.

Feature Selection in Weka

Many feature selection techniques are supported in Weka.

A good place to get started exploring feature selection in Weka is in the Weka Explorer.

  1. Open the Weka GUI Chooser.
  2. Click the “Explorer” button to launch the Explorer.
  3. Open the Pima Indians dataset.
  4. Click the “Select attributes” tab to access the feature selection methods.

Weka Feature Selection

Feature selection is divided into two parts:

  • Attribute Evaluator
  • Search Method

Each section has multiple techniques from which to choose.

The attribute evaluator is the technique by which each attribute in your dataset (also called a column or feature) is evaluated in the context of the output variable (e.g. the class). The search method is the technique by which different combinations of attributes in the dataset are tried or navigated in order to arrive at a short list of chosen features.

Some Attribute Evaluator techniques require the use of specific Search Methods. For example, the CorrelationAttributeEval technique used in the next section can only be used with a Ranker Search Method, which evaluates each attribute and lists the results in rank order. When selecting different Attribute Evaluators, the interface may ask you to change the Search Method to something compatible with the chosen technique.

Weka Feature Selection Alert

Both the Attribute Evaluator and Search Method techniques can be configured. Once chosen, click on the name of the technique to get access to its configuration details.

Weka Feature Selection Configuration

Click the “More” button to get more documentation on the feature selection technique and configuration parameters. Hover your mouse cursor over a configuration parameter to get a tooltip containing more details.

Weka Feature Selection More Information

Now that we know how to access feature selection techniques in Weka, let’s take a look at how to use some popular methods on our chosen standard dataset.

Correlation Based Feature Selection

A popular technique for selecting the most relevant attributes in your dataset is to use correlation.

Correlation is more formally referred to as Pearson’s correlation coefficient in statistics.

You can calculate the correlation between each attribute and the output variable and select only those attributes that have a moderate-to-high positive or negative correlation (close to -1 or 1) and drop those attributes with a low correlation (value close to zero).

Weka supports correlation-based feature selection with the CorrelationAttributeEval technique, which requires the use of the Ranker search method.

Running this on our Pima Indians dataset suggests that one attribute (plas) has the highest correlation with the output class. It also suggests a host of attributes with some modest correlation (mass, age, preg). If we use 0.2 as our cut-off for relevant attributes, then the remaining attributes could possibly be removed (pedi, insu, skin and pres).

Weka Correlation-Based Feature Selection Method
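If you prefer to script this step, the same evaluator and search method pairing is available through Weka's Java API. The snippet below is a minimal sketch, assuming diabetes.arff is in the working directory and that the class is the last attribute:

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.CorrelationAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CorrelationRanking {
    public static void main(String[] args) throws Exception {
        // Load the Pima Indians dataset (file name is an assumption)
        Instances data = new DataSource("diabetes.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1); // class is the last attribute

        // Pair the CorrelationAttributeEval evaluator with the Ranker search,
        // as configured in the Explorer
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new CorrelationAttributeEval());
        selector.setSearch(new Ranker());
        selector.SelectAttributes(data);

        // Each row holds {attribute index, correlation with the class}
        for (double[] ranked : selector.rankedAttributes()) {
            System.out.printf("%-8s %.4f%n",
                    data.attribute((int) ranked[0]).name(), ranked[1]);
        }
    }
}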

Information Gain Based Feature Selection

Another popular feature selection technique is to calculate the information gain.

You can calculate the information gain (a measure based on entropy) for each attribute with respect to the output variable. Values range from 0 (no information) to 1 (maximum information). Those attributes that contribute more information will have a higher information gain value and can be selected, whereas those that do not add much information will have a lower score and can be removed.

Weka supports feature selection via information gain using the InfoGainAttributeEval Attribute Evaluator. Like the correlation technique above, the Ranker Search Method must be used.

Running this technique on our Pima Indians dataset, we can see that one attribute contributes more information than all of the others (plas). If we use an arbitrary cutoff of 0.05, then we would also select the mass, age and insu attributes and drop the rest from our dataset.

Weka Information Gain-Based Feature Selection Method
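The corresponding API sketch only differs in the evaluator; everything else (file name, class index) carries the same assumptions as the correlation example above:

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class InfoGainRanking {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("diabetes.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new InfoGainAttributeEval()); // information gain of each attribute
        selector.setSearch(new Ranker());                   // list attributes in rank order
        selector.SelectAttributes(data);

        // Each row holds {attribute index, information gain score}
        for (double[] ranked : selector.rankedAttributes()) {
            System.out.printf("%-8s %.4f%n",
                    data.attribute((int) ranked[0]).name(), ranked[1]);
        }
    }
}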

Learner Based Feature Selection

A popular feature selection technique is to use a generic but powerful learning algorithm and evaluate the performance of the algorithm on the dataset with different subsets of attributes selected.

The subset that results in the best performance is taken as the selected subset. The algorithm used to evaluate the subsets does not have to be the algorithm that you intend to use to model your problem, but it should be generally quick to train and powerful, like a decision tree method.

In Weka this type of feature selection is supported by the WrapperSubsetEval technique and must use a GreedyStepwise or BestFirst Search Method. The latter, BestFirst, is preferred if you can spare the compute time.

1. First select the “WrapperSubsetEval” technique.

2. Click on the name “WrapperSubsetEval” to open the configuration for the method.

3. Click the “Choose” button for the “classifier” and change it to J48 under “trees”.

Weka Wrapper Feature Selection Configuration

4. Click “OK” to accept the configuration.

5. Change the “Search Method” to “BestFirst”.

6. Click the “Start” button to evaluate the features.

Running this feature selection technique on the Pima Indians dataset selects 4 of the 8 input variables: plas, pres, mass and age.

Weka Wrapper Feature Selection Method
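For completeness, here is a minimal sketch of the same wrapper configuration in the Java API (J48 inside WrapperSubsetEval, searched with BestFirst; the file name and class index are again assumptions):

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WrapperSelection {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("diabetes.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Wrapper evaluator: score attribute subsets by the accuracy of a J48 tree
        WrapperSubsetEval wrapper = new WrapperSubsetEval();
        wrapper.setClassifier(new J48());

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(wrapper);
        selector.setSearch(new BestFirst()); // greedy search with backtracking
        selector.SelectAttributes(data);

        // Print the names of the selected attributes (the class attribute is included)
        for (int index : selector.selectedAttributes()) {
            System.out.println(data.attribute(index).name());
        }
    }
}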

Select Attributes in Weka

Looking back over the three techniques, we can see some overlap in the selected features (e.g. plas), but also differences.

It is a good idea to evaluate a number of different “views” of your machine learning dataset. A view of your dataset is nothing more than a subset of features selected by a given feature selection technique. It is a copy of your dataset that you can easily make in Weka.

For example, taking the results from the last feature selection technique, let’s say we wanted to create a view of the Pima Indians dataset with only the following attributes: plas, pres, mass and age:

1. Click the “Preprocess” tab.

2. In the “Attributes” selection, tick all attributes except plas, pres, mass, age and class.

Weka Select Attributes To Remove From Dataset

3. Click the “Remove” button.

4. Click the “Save” button and enter a filename.

You now have a new view of your dataset to explore.

Weka Attributes Removed From Dataset
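The same view can be created programmatically with the Remove filter. The sketch below assumes the standard Pima attribute order in diabetes.arff (preg, plas, pres, skin, insu, mass, pedi, age, class), so the 1-based indices of the attributes to keep are 2, 3, 6, 8 and 9; the output file name is a placeholder:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSink;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class CreateView {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("diabetes.arff").getDataSet();

        // Keep plas, pres, mass, age and class by inverting the selection
        Remove remove = new Remove();
        remove.setAttributeIndices("2,3,6,8,9"); // attributes to keep (1-based)
        remove.setInvertSelection(true);         // remove everything else
        remove.setInputFormat(data);

        Instances view = Filter.useFilter(data, remove);
        view.setClassIndex(view.numAttributes() - 1);

        // Save the new view of the dataset to a new file
        DataSink.write("diabetes-view.arff", view);
        System.out.println(view.numAttributes() + " attributes retained");
    }
}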

What Feature Selection Techniques To Use

You cannot know which views of your data will produce the most accurate models.

Therefore, it is a good idea to try a number of different feature selection techniques on your data and in turn create many different views of your data.

Select a good generic technique, like a decision tree, and build a model for each view of your data.

Compare the results to get an idea of which view of your data results in the best performance. This will give you an idea of the view, or more specifically the features, that best expose the structure of your problem to learning algorithms in general.
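As a sketch of that comparison, you could run the same cross-validated decision tree over each candidate view and compare accuracy; the file names below are placeholders for the views you saved earlier:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareViews {
    public static void main(String[] args) throws Exception {
        // Placeholder file names: the full dataset and a reduced view of it
        String[] views = {"diabetes.arff", "diabetes-view.arff"};

        for (String file : views) {
            Instances data = new DataSource(file).getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            // 10-fold cross-validation of the same generic learner on each view
            Evaluation evaluation = new Evaluation(data);
            evaluation.crossValidateModel(new J48(), data, 10, new Random(1));

            System.out.printf("%-22s accuracy: %.2f%%%n", file, evaluation.pctCorrect());
        }
    }
}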

Summary

In this post you discovered the importance of feature selection and how to use feature selection on your data with Weka.

Specifically, you learned:

  • How to perform feature selection using correlation.
  • How to perform feature selection using information gain.
  • How to perform feature selection by training a model on different subsets of features.

Do you have any questions about feature selection in Weka or about this post? Ask your questions in the comments and I will do my best to answer.


50 Responses to How to Perform Feature Selection With Machine Learning Data in Weka

  1. Rajesh October 7, 2016 at 4:33 am #

    Sir, what is the difference between ClassifierAttributeEval and WrapperAttributeEval in Weka?

  2. Mark October 13, 2016 at 8:55 pm #

    So, the accuracy I receive without applying anything (J48 on all instances and features of the diabetes.arff dataset) is 84.11% and the error is 15.88%.

    After applying any of the CorrelationAttributeEval, the InfoGain and the WrapperSubsetEval, I receive lower accuracy. Of course this is obvious because I end up removing some features, but how is this good exactly? I am losing information. This is not good, am I wrong?

    • Jason Brownlee October 14, 2016 at 9:02 am #

      Great question Mark.

      We only want to perform feature selection that ultimately benefits the performance of our models.

      I use feature selection as a guide; each method gives different hints about what features might be important. Each set/subset can be used as input to train a new model to be compared to a baseline, or ensembled together to compete with the baseline.

      Worse performance after feature selection still teaches you something. Don’t discard those features, or build a model based on this “new” view of the problem and combine it with models trained on other intelligently selected views of the problem.

      I hope this gives more insight on a very important topic that you’ve raised.

  3. Shailesh November 4, 2016 at 3:32 am #

    Hi Jason,

    Thanks for this informative article!

    So, you have used an arbitrary cut-off value for correlation and informationGain in order to select a subset of features.
    Is there any method to select a cut-off value?

    I have another doubt regarding the feature selection.

    In order to select the best subset of features from the output of the “InformationGain + Ranker” method, I removed low-ranked features one by one and checked the accuracy of my classifier for each subset, and chose the subset that gives maximum accuracy.

    However, for some datasets, I got the same (maximum) accuracy value for 2 subsets of features.
    For example, I have a set of 21 features, and a subset of 10 features and 6 features give the same maximum accuracy out of all possible subsets.

    So I am confused for which subset to choose?
    Can you help me?

    Thanks!

    • Jason Brownlee November 4, 2016 at 11:14 am #

      Thanks Shailesh.

      I would suggest trying to create a model with the features using each value as a cut-off, and letting model skill dictate the features to adopt.

      Yes, I like the approach you outline. Fewer features are better (lower complexity, easier to understand). Also, compare these results to a new ensemble model that averages the performance of the models with different numbers of features.

  4. Poornima November 9, 2016 at 5:19 pm #

    If I use say IG for feature selection and then SVM for classification using cross validation… then the feature selection will apply to the entire dataset and not just the training set… which is not correct I guess…

  5. Sadiq January 11, 2017 at 5:03 am #

    Dear Jason Brownlee
    I would like to ask you how I can perform PSO as a feature selection algorithm within Weka? Is there any way to add PSO to the Weka program?
    Thank you in advance.

    • Jason Brownlee January 11, 2017 at 9:30 am #

      Sorry Sadiq, I have not used PSO for feature selection within Weka. I cannot give you good advice.

  6. Poornima February 19, 2017 at 12:57 am #

    If I use Info Gain to select the attributes of the training dataset and take the output in another .arff file using the command line, I then have the training dataset with selected attributes. Is it possible to create the testing data with these selected attributes only? It is very difficult to remove the attributes manually as my data is of very large dimension. – thnx

    • Jason Brownlee February 19, 2017 at 8:53 am #

      I believe there may be a data filter to apply feature selection and remove unselected features.

      Perhaps take a look through some of the data filters for such a filter.

      You can then save the filtered features to a new file and work with it directly.

  7. Miks May 1, 2017 at 1:22 am #

    Jason, great post.

    I got a confusing situation. I tried CorrelationAttributeEval with my own dataset and specified outputDetailedInfo:true in the evaluator’s configuration window. Weka gave me a list of correlations for each individual value for each feature. This is great, but there is a single feature with only two possible values and both have similar correlation. As I understand, this means that this feature can’t influence the prediction in any way, since the correlation is similar with any possible value… even if the total correlation of the feature is one of the best compared with other features. Am I right?

  8. Agung June 29, 2017 at 3:08 pm #

    Hi Jason,
    It is a very good explanation. But I wonder, basically what is feature selection? If I sum all the attribute values, why is the total not 1 or 100%?

    • Jason Brownlee June 30, 2017 at 8:07 am #

      Feature selection is a way of cutting down the number of input variables to your model to hopefully get simpler models or better predictions, or both.

  9. lena November 5, 2017 at 7:17 am #

    When performing feature selection, should we perform it on the entire dataset (training and testing) and then split the data? Or should we perform it just on the training portion?

    • Jason Brownlee November 6, 2017 at 4:46 am #

      It is a good idea to perform data prep operations on training data only then apply the operations using coefficients/etc. from training data on the test data.

  10. Dharmaraj November 24, 2017 at 1:29 am #

    Hi Jason,
    I have two questions,

    1. I am facing the same problem with feature selection and without feature selection. Without any feature selection method I got 99.10% accuracy for J48, but using CFS, Chi square and IG with different subsets I got lower accuracy, like 98.70%, 97%, etc. Where am I wrong?

    2. It is related to the Weka GUI and API: why am I getting different results for the same algorithm using the GUI and the API? I searched a lot, but found nothing useful.
    Thanks

  11. Tadele Debisa November 24, 2017 at 7:09 pm #

    Dear Jason, I am using three ML algorithms: GA for feature selection, and ANN and SVM for classification of the dataset. I want to use the wrapper method; can you advise me how to apply crossover and mutation operation concepts for pre-processing?

    • Jason Brownlee November 25, 2017 at 10:15 am #

      Sorry, I do not have a worked example of GAs for feature selection.

  12. Dharmaraj November 24, 2017 at 7:44 pm #

    Thanks Jason.
    But why am I getting different results for the same algorithm using the Weka GUI and API on the same dataset?

  13. Sarah December 13, 2017 at 5:46 am #

    Dear Jason,
    I want to make a new feature selection algorithm. Can I make this using WEKA?

    • Jason Brownlee December 13, 2017 at 5:46 am #

      Yes, you can implement it yourself for use in Weka.

      • sarah December 13, 2017 at 5:57 am #

        Thanks for your reply
        Could you please advise me or give me a link to an illustrative example?

  14. Vijay March 3, 2018 at 7:45 am #

    Great post Jason.

    I have struggled hard but could not find straight answers to a few questions which apply to classification problems; your views would be very helpful.

    1) While Information Gain and Gini seem reasonable, do Pearson Correlation & Chi-Squared filters apply for binary variables / for a classification problem typical of the Diabetes dataset?

    2) Statistics professors & several online media strongly advocate doing attribute selection as part of the cross-validation inner loop. Not doing so is a way of cheating, since the training data has already been used for attribute selection and biases the estimates to produce smaller errors. Ian Witten in his MOOC recommends using the AttributeSelectedClassifier. Should these filter methods be run on a test / validation set when using the Attribute selection tab in Weka?

  15. Irem March 6, 2018 at 8:32 pm #

    Dear Jason,
    What is the order of execution of the attribute evaluator and the search method? I am trying to use ant search (with default evaluator fuzzy rough subset) and CfsSubsetEval for the attribute evaluator. In this situation, does the CfsSubsetEval function first evaluate the attributes and give the informative subsets (with merits), and then the ant search is done on all these subsets by evaluating with fuzzyRoughSubsetEval; is that right?
    Thanks

  16. Muhammad Irfan March 16, 2018 at 5:14 am #

    This post is regarding feature selection from a ready-made CSV or ARFF file which could be made from raw data using Excel or some Python code etc. Can we generate features from any CSV, ARFF or Excel file using windowing (1 second or more) with overlap (50% or so) in WEKA?

  17. Maha Alarifi March 24, 2018 at 2:22 am #

    Hello Dr. Jason,
    Thank you for the very informative articles.
    I am using attribute selection in Weka for my graduation research about abnormal behavior in video scenes.
    I was wondering if I have to set the parameters for each search method of the attribute selection?
    I also need some explanation of the results of the attribute evaluators.

  18. LenaM April 1, 2018 at 3:54 pm #

    Hi Jason,

    My assignment states that I should use attribute selection and do testing to see the best results.

    Should I use the AttributeSelectedClassifier or the AttributeSelection?

    Is there any link to show me how to compare results to come up with the best set of attributes?

    • Jason Brownlee April 2, 2018 at 5:20 am #

      You must experiment to see what subset of features work best for your predictive modeling problem.

  19. Lucky . April 10, 2018 at 12:34 am #

    Can I use my algorithm for feature selection in weka?

  20. SURAJ KUMAR April 10, 2018 at 8:25 pm #

    In the Weka Explorer, when we use the correlation attribute evaluator tab after importing our data, it assigns a correlation coefficient to each of the features with respect to the deciding variable.

    I want to know how Weka knows in which column I have placed my deciding variable?

  21. Arman May 4, 2018 at 4:12 am #

    Thank you, you’ve just opened a new world to me!

  22. Don May 13, 2018 at 5:34 pm #

    Hi

    When using InfoGainAttributeEval, the output is entropy. What units is this in and how is it calculated? (Entropy is usually measured in bits, from my understanding.)

    Thank you kindly
    Don

  23. Yesh May 21, 2018 at 1:57 am #

    How can I use Weka for deep learning-based feature selection for a network intrusion detection system?

    thank you

  24. Raj June 1, 2018 at 4:53 pm #

    Hi Jason, really a good post and informative explanation.

    One simple query
    1) You have mentioned WrapperSubsetEval for selecting a subset of features. Can we use CfsSubsetEval for selection of features?

    2) Can we use the elbow method (a graph) for selecting an optimal set of features using the correlation value of each feature with the class?

  25. obsa gilo July 13, 2018 at 10:51 pm #

    Dear Jason, I am working on information extraction for news text using machine learning classification. How can I apply this in WEKA?

    • Jason Brownlee July 14, 2018 at 6:18 am #

      Sorry, I don’t have examples of working with text data in Weka.

  26. Abdrahman July 17, 2018 at 8:38 pm #

    Does anyone know where to find PSO in Weka?

    Thank you
