How to Save Your Machine Learning Model and Make Predictions in Weka

After you have found a well performing machine learning model and tuned it, you must finalize your model so that you can make predictions on new data.

In this post you will discover how to finalize your machine learning model, save it to file and load it later in order to make predictions on new data.

After reading this post you will know:

  • How to train a final version of your machine learning model in Weka.
  • How to save your finalized model to file.
  • How to load your finalized model later and use it to make predictions on new data.

Let’s get started.

Tutorial Overview

This tutorial is broken down into 4 parts:

  1. Finalize Model, where you will discover how to train a finalized version of your model.
  2. Save Model, where you will discover how to save a model to file.
  3. Load Model, where you will discover how to load a model from file.
  4. Make Predictions, where you will discover how to make predictions for new data.

The tutorial provides a template that you can use to finalize your own machine learning algorithms on your data problems.

We are going to use the Pima Indians Onset of Diabetes dataset. Each instance represents medical details for one patient, and the task is to predict whether the patient will have an onset of diabetes within the next five years. There are 8 numerical input variables, all with varying scales. You can learn more about this dataset on the UCI Machine Learning Repository. Top results are on the order of 77% accuracy.

We are going to finalize a logistic regression model on this dataset, both because it is a simple algorithm that is well understood and because it does very well on this problem.

1. Finalize a Machine Learning Model

Perhaps the most neglected task in a machine learning project is how to finalize your model.

Once you have gone through all of the effort to prepare your data, compare algorithms and tune them on your problem, you actually need to create the final model that you intend to use to make new predictions.

Finalizing a model involves training the model on the entire training dataset that you have available.

1. Open the Weka GUI Chooser.

2. Click the “Explorer” button to open the Weka Explorer interface.

3. Load the Pima Indians onset of diabetes dataset from the data/diabetes.arff file.

Weka Load Pima Indians Onset of Diabetes Dataset

4. Click the “Classify” tab to open up the classifiers.

5. Click the “Choose” button and choose “Logistic” under the “functions” group.

6. Select “Use training set” under “Test options”.

7. Click the “Start” button.

Weka Train Logistic Regression Model

This will train the chosen Logistic regression algorithm on the entire loaded dataset. It will also evaluate the model on the entire dataset, but we are not interested in this evaluation.

It is assumed that you have already estimated the performance of the model on unseen data using cross validation as a part of selecting the algorithm you wish to finalize. It is this estimate you prepared previously that you can report when you need to inform others about the skill of your model.

Now that we have finalized the model, we need to save it to file.

2. Save Finalized Model To File

Continuing on from the previous section, we need to save the finalized model to a file on your disk.

This is so that we can load it up at a later time, or even on a different computer in the future and use it to make predictions. We won’t need the training data in the future, just the model of that data.

You can easily save a trained model to file in the Weka Explorer interface.

1. Right click on the result item for your model in the “Result list” on the “Classify” tab.

2. Click “Save model” from the right click menu.

Weka Save Model to File

3. Select a location and enter a filename such as “logistic”, then click the “Save” button.

Your model is now saved to the file “logistic.model”.

It is in a binary format (not text) that can be read again by the Weka platform. As such, it is a good idea to note down the version of Weka you used to create the model file, just in case you need the same version of Weka in the future to load the model and make predictions. Generally, this will not be a problem, but it is a good safety precaution.
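To see why the Weka version can matter, it helps to know that Weka is written in Java and that, as far as we are aware, a saved model file is a Java-serialized object. The sketch below is not Weka code. It uses only the JDK and a hypothetical `ToyModel` class (an assumption made purely for illustration) to show the general save-and-load round trip that this kind of binary format relies on.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

class SerializationSketch {

    /** Hypothetical stand-in for a trained model; this is NOT a Weka class. */
    static final class ToyModel implements Serializable {
        // Identifies the class version; a mismatch at load time makes Java reject the data.
        private static final long serialVersionUID = 1L;
        final double[] weights;

        ToyModel(double[] weights) {
            this.weights = weights.clone();
        }
    }

    /** Serializes an object to bytes, the same idea as writing a binary model file. */
    static byte[] save(Serializable model) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Restores an object from bytes; fails if the class is missing or incompatible. */
    static Object load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class of saved object not found", e);
        }
    }

    public static void main(String[] args) {
        byte[] saved = save(new ToyModel(new double[] {0.5, -1.25}));
        ToyModel restored = (ToyModel) load(saved);
        System.out.println("Restored weights: " + Arrays.toString(restored.weights));
    }
}
```

If the class definition changes between saving and loading, for example across library versions, Java may refuse to deserialize the object. That is the situation that noting down the Weka version guards against.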

You can close the Weka Explorer now. The next step is to discover how to load up the saved model.

3. Load a Finalized Model

You can load saved Weka models from file.

The Weka Explorer interface makes this easy.

1. Open the Weka GUI Chooser.

2. Click the “Explorer” button to open the Weka Explorer interface.

3. Load any dataset; it does not matter which one. We will not be using it, we just need a loaded dataset to get access to the “Classify” tab. If you are unsure, load the data/diabetes.arff file again.

4. Click the “Classify” tab to open up the classifiers.

5. Right click on the “Result list”, click “Load model”, and select the model saved in the previous section, “logistic.model”.

Weka Load Model From File

The model will now be loaded into the explorer.

We can now use the loaded model to make predictions for new data.

Weka Model Loaded From File Ready For Use

4. Make Predictions on New Data

We can now make predictions on new data.

First, let’s create some pretend new data. Make a copy of the file “data/diabetes.arff” and save it as “data/diabetes-new-data.arff“.

Open the file in a text editor.

Find the start of the actual data in the file, marked by the @data line (line 95 in this file).

We only want to keep 5 records. Move down 5 lines, then delete all the remaining lines of the file.

The class value (output variable) that we want to predict is on the end of each line. Delete each of the 5 output variables and replace them with question mark symbols (?).

Weka Dataset For Making New Predictions

We now have “unseen” data with no known output for which we would like to make predictions.
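The manual editing steps above can also be scripted. The following is a minimal sketch, not part of Weka, that copies the ARFF header, keeps the first few data rows and replaces the final value on each row with a question mark. It assumes the class is the last comma-separated value on each row and that no value contains a quoted comma. The class name `UnlabeledArff` and the shortened sample rows are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UnlabeledArff {

    /**
     * Copies everything up to and including the @data line, keeps the first
     * `keep` data rows, and replaces the last value of each kept row
     * (assumed to be the class) with "?".
     */
    static List<String> makeUnlabeled(List<String> arffLines, int keep) {
        List<String> out = new ArrayList<>();
        boolean inData = false;
        int kept = 0;
        for (String line : arffLines) {
            if (!inData) {
                out.add(line); // header lines are copied unchanged
                if (line.trim().equalsIgnoreCase("@data")) {
                    inData = true;
                }
                continue;
            }
            String row = line.trim();
            if (row.isEmpty() || row.startsWith("%")) {
                continue; // skip blank lines and ARFF comments in the data section
            }
            if (kept == keep) {
                break; // drop all remaining records
            }
            int lastComma = row.lastIndexOf(',');
            // Assumption: the class is the last comma-separated value on the row.
            out.add(lastComma < 0 ? "?" : row.substring(0, lastComma + 1) + "?");
            kept++;
        }
        return out;
    }

    public static void main(String[] args) {
        // Shortened, made-up sample; the real file has 8 input attributes.
        List<String> sample = Arrays.asList(
            "@relation pima_diabetes",
            "@attribute 'preg' numeric",
            "@attribute 'plas' numeric",
            "@attribute 'class' {tested_negative,tested_positive}",
            "@data",
            "6,148,tested_positive",
            "1,85,tested_negative",
            "8,183,tested_positive");
        makeUnlabeled(sample, 2).forEach(System.out::println);
    }
}
```

To work on real files, the lines could be read with `java.nio.file.Files.readAllLines` and the result written with `java.nio.file.Files.write`, calling `makeUnlabeled(lines, 5)` to keep 5 records as in this tutorial.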

Continue on from the previous part of the tutorial where we already have the model loaded.

1. On the “Classify” tab, select the “Supplied test set” option in the “Test options” pane.

Weka Select New Dataset On Which To Make New Predictions

2. Click the “Set” button, click the “Open file” button on the options window and select the mock new dataset we just created with the name “diabetes-new-data.arff”. Click “Close” on the window.

3. Click the “More options…” button to bring up options for evaluating the classifier.

4. Uncheck the information we are not interested in, specifically:

  • “Output model”
  • “Output per-class stats”
  • “Output confusion matrix”
  • “Store predictions for visualization”
Weka Customized Test Options For Making Predictions

5. For the “Output predictions” option click the “Choose” button and select “PlainText”.

Weka Output Predictions in Plain Text Format

6. Click the “OK” button to confirm the Classifier evaluation options.

7. Right click on the list item for your loaded model in the “Result list” pane.

8. Select “Re-evaluate model on current test set”.

Weka Re-evaluate Loaded Model On Test Data And Make Predictions

The predictions for each test instance are then listed in the “Classifier Output” pane. Specifically, look at the middle column of the results (headed “predicted”), which contains predictions like “tested_positive” and “tested_negative”.
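If you copy the plain text predictions out of the output pane, a few lines of code can turn them into values you can work with. The sketch below is a hypothetical helper, not part of Weka. It assumes whitespace-separated rows of the form `1 1:? 2:tested_positive 0.722`, where the third field is the predicted class index and label, and the last field is the probability the model assigns to that prediction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PredictionParser {

    /** One parsed row of the prediction listing. */
    static final class Prediction {
        final int instance;       // value from the inst# column
        final String label;       // predicted class label, e.g. tested_positive
        final double probability; // value from the final "prediction" column

        Prediction(int instance, String label, double probability) {
            this.instance = instance;
            this.label = label;
            this.probability = probability;
        }
    }

    /** Parses data rows and skips headers, separators and blank lines. */
    static List<Prediction> parse(List<String> lines) {
        List<Prediction> result = new ArrayList<>();
        for (String line : lines) {
            String row = line.trim();
            if (row.isEmpty() || !Character.isDigit(row.charAt(0))) {
                continue; // not a data row
            }
            String[] fields = row.split("\\s+");
            if (fields.length < 4) {
                continue; // assumption: at least inst#, actual, predicted, prediction
            }
            String predicted = fields[2]; // e.g. "2:tested_positive"
            String label = predicted.substring(predicted.indexOf(':') + 1);
            double probability = Double.parseDouble(fields[fields.length - 1]);
            result.add(new Prediction(Integer.parseInt(fields[0]), label, probability));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> output = Arrays.asList(
            "=== Predictions on user test set ===",
            "",
            "inst# actual predicted error prediction",
            "1 1:? 2:tested_positive 0.722",
            "2 1:? 1:tested_negative 0.951");
        for (Prediction p : parse(output)) {
            System.out.println(p.instance + " -> " + p.label + " (" + p.probability + ")");
        }
    }
}
```

The rows come back in the same order as the listing, so each prediction can be matched to the corresponding record in the input file by position.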

You could choose another output format for the predictions, such as CSV, that you could later load into a spreadsheet like Excel. For example, below is an example of the same predictions in CSV format.

Weka Predictions Made on New Data By a Loaded Model

More Information

The Weka Wiki has more information about saving and loading models, as well as making predictions, that you may find useful.


In this post you discovered how to finalize your model and make predictions for new unseen data. You can see how you can use this process to make predictions on new data yourself.

Specifically, you learned:

  • How to train a final instance of your machine learning model.
  • How to save a finalized model to file for later use.
  • How to load a model from file and use it to make predictions on new data.

Do you have any questions about how to finalize your model in Weka or about this post? Ask your questions in the comments below and I will do my best to answer them.

140 Responses to How to Save Your Machine Learning Model and Make Predictions in Weka

  1. Kanan October 26, 2016 at 2:34 am #

    Can I open the saved model by any other program?
    I want to use the saved model as a web service, but not using weka for predication.
    Is there any way


  2. c October 28, 2016 at 8:49 am #

    How do I predict continuous output in Weka? I get “Problem evaluating classifier: Class index is negative (not set)!” error when I try to run model on test set with dependent variable.

  3. c October 28, 2016 at 10:10 am #

    Is the M5P not capable of regression predication with categorical and continuous variables? I thought regression trees could do that?

  4. Kayode November 10, 2016 at 2:04 pm #

    Thank you so much for this tutorial. It is really straightforward. Really enjoy it. Thanks.

  5. Ametun December 9, 2016 at 1:57 pm #

    Thank you so much for this tutorial. It is very useful for me.Thanks.

  6. Diane December 21, 2016 at 2:51 am #

    Hey, thank you very much for your help!
    Just a sidenote for those who have problems with doing the exact same thing as you described using .csv input files: The above description is perfect for .arff but in my case (with .csv) it made predictions for the first 112 lines only and stopped for no reason. Transforming the input (training and test data) solved that problem.
    I am looking forward to more tutorials from you 🙂

  7. bayo January 3, 2017 at 10:12 pm #

    Good day,

    Thanks for your good work. Please I need your assistance, i am working on crime and i am new in using weka.I have used weka to divide my data set intoo both test and training data set both in CSV format. but the system is complaining whenever I put classfier (such as Bayes, KNN) and i loaded the tested data set on it.

    Please help me on this

    • Jason Brownlee January 4, 2017 at 8:52 am #

      What is the error exactly? What is the complaint that Weka is making?

  8. Esan January 3, 2017 at 10:54 pm #

    Please should train dataset and test dataset be of the same format. If yes why then is my weka complain of incompatible test data set. Also is it the test data that we are converting back to plain test?

    • Jason Brownlee January 4, 2017 at 8:54 am #

      Yes, the train and test must be the same format, with the same number of columns.

      You may not know the predicted outcome, in which case you can use a ‘?’ value.

  9. bayo January 7, 2017 at 5:36 pm #

    Thanks. I really appreciate your efforts, you teaching was superb

  10. Lujain January 7, 2017 at 7:26 pm #

    Thanks for the tutorial. I have a question, why the number of instances is unknown? and how can I evaluate the accuracy of the prediction? I mean I need to see the number of correctly classified instances and so on…

    • Jason Brownlee January 8, 2017 at 5:21 am #

      This tutorial was about making predictions on new data.

      If you have data for which you already know the expected output, you can make predictions on it by selecting it as an external test dataset in Weka.

  11. Iqra Ameer January 8, 2017 at 12:05 pm #

    I need to train model on separate genre(blogs data) and test on another genre(hotel reviews). I trained a model by 1. appling StringToWordVector filter(change some settings of filter) 2. attribute selection 3. applied classify Logistic with option “use training set” 4. saved the model. Now I am confused about testing file, should I need to apply all these steps till 3 on test file also? by doing this my train and test file attributes are different but the same format.
    Should my training file attribute and test file attribute exactly the same(same to same)? If yes then can I copy the attributes from training file(top to @data) and paste in my test file, is it correct?
    If train and test file attributes can be different then there is an error “Data used to train model and test set are not compatible. Would you like to automatically wrap the classifier in “InputMappedClassifier”, what does it mean? if choose Yes what will it do.
    Sorry sir, I have many questions. I explored a lot still confused. It will be great help.
    Thank you

  12. Mike March 4, 2017 at 4:15 am #


    I have built a logistic regression model in Weka and want to be able to identify what the predictions were for each specific data point. The output I currently have does not allow me to match the predictions to the individual instances.


    • Jason Brownlee March 6, 2017 at 10:52 am #

      Hi Mike,

      The order of the predictions should match the order of the data in your input file used to make predictions.

  13. Bellz May 2, 2017 at 2:03 am #

    Awesome article! Very simple and right to the point!

  14. Kanika Sood May 11, 2017 at 5:47 pm #

    Hi Jason
    Great article. I followed the steps you suggested and I am applying Random Forest classifier. I have the same set of attributes for the training and test set. However in the stage where I predict for unknown data, it ignores all the instances. Below is the message I get in the classifier output:
    === Summary ===

    Total Number of Instances 0
    Ignored Class Unknown Instances 72

    === Confusion Matrix ===

    a b <– classified as
    0 0 | a = good
    0 0 | b = bad

    Can you please suggest what am I doing wrong?

    • Jason Brownlee May 12, 2017 at 7:36 am #

      Perhaps the test set data is corrupt or not loading correctly?

    • ifrah raoof February 25, 2018 at 3:48 pm #

      kanika sood …can you help me …i m stuck with the same error ???

  15. Kanika Sood May 11, 2017 at 6:54 pm #

    I got the answer to my earlier question. Here is the questions I have now: Random Forest, BayesNet always predicts only one class for all instances.

  16. Billy Rogers May 20, 2017 at 12:38 am #

    Elegant, simple, exercise.

  17. Kittikorn June 27, 2017 at 1:33 am #

    My apology if there was someone asked this question already but i couldn’t find here.

    When I used my model to predict new data, the result in output file showed only 101 items/instances. May I ask how to make the model to predict all records (about 5000 records)?

    • Jason Brownlee June 27, 2017 at 8:32 am #

      There is no limit, pass all inputs to model.predict(X) to get predictions.

  18. Angel Gallardo August 2, 2017 at 2:03 am #

    Hi Jason thanks for the post! Is there a way to get the top 3 predictions?

    • Jason Brownlee August 2, 2017 at 7:55 am #

      What do you mean exactly “top 3 predictions”?

  19. Denis August 22, 2017 at 2:32 am #

    First of all, thank you for making me discovering this Weka. I am one of the many that after a tutorial, after a confusion matrix, I was saying”great!, now what?” 🙂

    I just ended a very long course on Data Science and Python on Udemy… is it too daring from me thinking that Weka can substitute python? (at least for simple tasks?)

  20. sam August 22, 2017 at 9:07 pm #

    this blog is really helpful, can you please suggest me how can I make UI application on the top of the model using Python where users can put the data manually and it will give the result like positive or negative

    • Jason Brownlee August 23, 2017 at 6:49 am #

      Sorry, I don’t know about UI applications in Python. Perhaps a web interface?

  21. Ilan August 26, 2017 at 5:46 am #

    Thanks Jason, this is super helpful. Do you know if there is a way to save particular multilayer percepton configurations? I’m running the percepton classifier and set GUI to true in order to tinker with it, but I can’t for the life of me figure out how to save the tinkered configuration so that I can reuse it. I’ve looked everywhere.

    • Jason Brownlee August 26, 2017 at 6:49 am #

      After you fit the model you can save it.

      When you run the model, the Explorer window should give you the command line parameters needed to re-create the model configuration at the top.

  22. Ilan August 26, 2017 at 7:44 am #

    Hmm, that correctly saved the usual parameters like Num Epochs, Learning Rate, etc., but it didn’t save the particular percepton GUI tweaks — say, ones where I connect and disconnect certain nodes to other certain nodes by hand using the percepton GUI.

    Did I miss a step, or is there something else I’m supposed to do that’s unique to allowing it to save changes made in the GUI?

    Thank you!

  23. Zoya September 25, 2017 at 8:25 pm #

    Thanks for the tutorial. I am new to Weka and machine learning. The tutorial helped a lot. Just wanted to know how to judge the predicted value for a particular instance? Is the prediction done in order?

    === Predictions on user test set ===

    inst# actual predicted error prediction
    1 1:? 2:tested_positive 0.722
    2 1:? 1:tested_negative 0.951
    3 1:? 2:tested_positive 0.797
    4 1:? 1:tested_negative 0.958
    5 1:? 2:tested_positive 0.902

    Also, what does 2 in 2:tested_positive mean?

    • Jason Brownlee September 26, 2017 at 5:36 am #

      Great question. Yes, the order of the predictions will match the order of the observations in the input file.

      The prediction is probably a class number (1 or 2) and the associated label in the problem (positive or negative).

  24. Zoya September 26, 2017 at 2:28 pm #

    Thank you

  25. Yaw Antwi-Adjei October 29, 2017 at 12:35 am #

    Hi Jason. Thank you for the good tutorial. Is that all there is to making predictions using WEKA? I mean,
    a) Choose the appropriate Model (i.e Classifier)
    b) Run it on the Supplied Test Set
    c) Save the Model
    d) Load and dataset in WEKA Explorer just to have access to the Classifier tab
    e) Load your Model
    f) Open the new file, and finally
    g) Re-evaluate the model on the new file for your predictions.

    • Jason Brownlee October 29, 2017 at 5:54 am #

      Yes! It is perfect for beginners.

      You can go deeper on various aspects and I recommend using the Experiment for being systematic in your exploration.

  26. Rubel November 2, 2017 at 3:41 am #

    how can I calculate each predictor odd ratio, 95% CI, and P-value.

  27. Rubel November 2, 2017 at 3:42 am #

    When I am going to installed new packages, it is showing an error message. How can I solve this problem

  28. Haya November 8, 2017 at 10:16 pm #

    Hello Jason,

    How can I make predictions and produce Actual value by using R program ?

  29. Pedro November 18, 2017 at 9:40 am #

    Hello Jason,

    Say I am trying to further tune and test the algorithms, and I have separate test and training sets, which contain different distribution of the instances so that I can choose to mimic real world distribution or keep it 50/50 and see which option gives me better accuracy with the test set (that will have real-world-like distribution). I would not like to save many models, naturally. Could I then re-evaluate without saving it, skipping to step four as soon as I finish cross-validation with the training set?

  30. oksana December 3, 2017 at 1:21 am #

    thank you very much for great tutorials, Dr. Brownlee. They help me a lot in my final project at school.

    I would like to perform this kind of predictive modeling techniques at work, but we work with very large data sets (millions of tuples) so my question is – would Weka be able to handle very large data sets?
    Weka seems very easy and user friendly tool.

    • Jason Brownlee December 3, 2017 at 5:25 am #

      I would recommend taking a sample of your data to model, small enough to fit into memory with Weka.

  31. kanishka January 9, 2018 at 2:40 pm #

    how to carry out weka result to androide phone

  32. Lina February 15, 2018 at 5:24 am #

    Hi jason,
    Im lina and i read each tutorial step on top …. But still confuse, if we use totally new data as a test set, can it run properly? Example on top show you use 5 same data to predict the class….

    • Jason Brownlee February 15, 2018 at 8:51 am #

      Sorry, I don’t follow. Perhaps you could restate your question or give more context?

  33. Jac May 18, 2018 at 11:30 pm #


    How can I make website with PMML model implemented to be available for a public use?
    For example user input 10 parametrs and receive a result calculated by PMML?

    • Jason Brownlee May 19, 2018 at 7:40 am #

      Sorry, I don’t have an example of creating a website from a model.

  34. Ben G. June 16, 2018 at 3:23 pm #

    Hi Jason, many thanks for this great work you are involved in. Please, is there any provision for deploying Ripple Down Rule (RIDOR) Learner in WEKA? If it is possible, how can I go about it?

  35. Guylaine Bourque July 10, 2018 at 10:52 am #

    Hello! Many thanks for this tutorial.
    I am wondering how come that it does not save the results in a file ? Do I have to cut%paste the output in a csv file ?

    • Jason Brownlee July 10, 2018 at 2:26 pm #

      Yes. You will have to save it manually. Weka was built more for exploring models than for using models.

  36. Rodrigo Nava September 18, 2018 at 8:20 am #

    Hello. Thanks for the tutorial.
    My question is:
    Is it possible to perform Cross-validation or Split-percentage in data loaded from a model?
    Or if I want to perform any of those two, I have necessarily to load the corresponding training dataset and build a new model for them?

    • Jason Brownlee September 18, 2018 at 2:18 pm #

      What do you mean exactly?

      CV and split are methods for using a training data to evaluate a model. How could it “come” from the model?

      • Rodrigo Nava September 18, 2018 at 4:08 pm #

        Thanks for the answer.
        I have the following situation:
        I use a dataset “training.arff” and a classifier, say RandomForest, to generate a model “model1.model”; then I save it.
        If I want to evaluate a testing set with it, I load “model1.model” and use the option “reevaluate model on current test set”. Everything is ok until that point.

        But if I want to validate my model, I find that there’s no direct way to use CV or split directly over the data used from model1.model. I have necessarily to reload “training.arff”, use CV, and see how it says “building model for training data”, meaning that it is generating another model.

        I was wondering if it was possible to validate generated models.
        Again, thank you for your feedback

        • Jason Brownlee September 19, 2018 at 6:15 am #

          I recommend validating the model prior to saving and making predictions.

  37. Shabbar Imran October 11, 2018 at 6:03 pm #

    hi how can i predict between two different data sets

  38. Las Hsu October 28, 2018 at 6:23 pm #

    HI thanks for the hard work, it really helped me a lot.
    Here’s my question
    === Predictions on user test set ===

    inst# actual predicted error prediction
    1 1:? 2:tested_positive 0.722
    2 1:? 1:tested_negative 0.951
    3 1:? 2:tested_positive 0.797
    4 1:? 1:tested_negative 0.958
    5 1:? 2:tested_positive 0.902
    what does the number below prediction means?
    the 0.722 ,0.951,0.797
    does it mean the probability of the prediction being correct?

    • Jason Brownlee October 29, 2018 at 5:55 am #

      Yes, the probability of the prediction for the class.

  39. Saubhik Paladhi November 9, 2018 at 6:24 pm #

    Hi, thanks for your informative article.
    I have a query about the indexes of test data instances choosen by weka at the time of cross validation. How to get the index of the test data that is being tested ?

    I have choosen:

    Dataset : iris.arff
    Total instances : 150
    Classifier : J48
    cross validation: 10 fold

    I have also made output prediction as “PlainText”

    In the output window I can see like this :-

    inst# actual predicted error prediction
    1 3:Iris-virginica 3:Iris-virginica 0.976
    2 3:Iris-virginica 3:Iris-virginica 0.976
    3 3:Iris-virginica 3:Iris-virginica 0.976
    4 3:Iris-virginica 3:Iris-virginica 0.976
    5 3:Iris-virginica 3:Iris-virginica 0.976
    6 1:Iris-setosa 1:Iris-setosa 1
    7 1:Iris-setosa 1:Iris-setosa 1


    Total 10 test data set .(15 instances in each).


    As WEKA uses startified cross validation, instances in the test data sets are randomly choosen.

    So, How to know the index of the test data instance whose prediction evaluation is being shown in above lines?


    inst# actual predicted error prediction
    1 3:Iris-virginica 3:Iris-virginica 0.976

    This result is for which instance (among total 50 Iris-virginica) ?


    in the main data file first few instances are :

    So the main data file starts with Iris-setosa.

    • Jason Brownlee November 10, 2018 at 6:00 am #

      The index should be the row number in the file.

  40. Otaku san November 27, 2018 at 8:31 am #

    Hi, thank you for the wonderful tutorial.
    I am using csv instead of arff.
    When I supply test set with 145 true and 70 false instances (in that order), the result is shown only for 145 instances. It doesn’t calculate the result for the 70 instances.
    If the set is randomly ordered, the result is shown only for the first few instances with same true/false value. For e.g., if the first ten instances are false, and 11th is true, the result (and confusion matrix) is only calculated for the first ten instances.
    Please help.

  41. Nikhil December 5, 2018 at 1:09 am #

    How do we get the output prediction in the original form?

  42. Laurence Foz December 16, 2018 at 10:58 pm #

    Hello there Jason. Have been following some of your tutorials on here for some time. Glad to see you still answer questions. Mine is regarding the test set. I made all the instances of class in the @data region as “?” like in the example but why is the result of my model’s classification like this?

    “Total Number of Instances: 0
    Ignored Class Unknown Instances: 7401”

    Did I do something wrong? Also the model I used was made with LibSVM.

    • Jason Brownlee December 17, 2018 at 6:21 am #

      No, you can ignore that note. We are forcing Weka to do something that it does not want to do – make predictions in the Explorer.

  43. Raymond January 12, 2019 at 2:46 am #

    If I have a model build with J48. and StringToWordVector
    How do I feed in the data for the model classifier to classify in java ?

    Classifier cls = (Classifier)"c:\\identify.model");
    ArrayList classes = new ArrayList(3);

    //Class attribute
    Attribute classAttribute = new Attribute("class", classes);
    ArrayList attributes = new ArrayList(2);
    Attribute text=new Attribute("text", true);

    // Create the empty dataset "sample" with above attributes
    Instances sample = new Instances("sample", attributes, 0);
    // Make position the class attribute
    // Create empty instance with five attribute values
    Instance inst = new DenseInstance(2);
    // Set instance values
    inst.setValue(text, "What is this are you kidding me 1 2 3 4");
    // Set instance's dataset to be the dataset "race"
    // Set class as missing so we can predict
    inst.setClassValue(0); // When I set class as missing, the filter not working at all.


    StringToWordVector filter = createFilter(sample);
    sample=filter.useFilter(sample, filter);

    • Jason Brownlee January 12, 2019 at 5:44 am #

      I’m sorry, I don’t have examples of Java programming with the Weka API, so I cannot give you advice.

  44. Alex January 12, 2019 at 4:46 pm #

    I want to add one more column to the .arff file which I do not want to be used by classifier, but which I want to be present on the prediction output, it is just kind of name for each instance which I need to have in the output – how would I go about it in Explorer?
    Thanks a lot.

    • Jason Brownlee January 13, 2019 at 5:40 am #

      I’m not sure off hand, sorry. Perhaps try posting to the weka users group.

  45. Vassilis January 18, 2019 at 12:01 am #

    Hello Jason! Thank you for your helpful tutorials!

    I used MPRegressor and got the following model:

    MPRegressor with ridge value 0.01 and 2 hidden units (useCGD=true)

    Output unit 0 weight for hidden unit 0: 2.9058790401172043

    Hidden unit 0 weights:

    -0.2878670472862872 A
    0.4012790926803488 B
    0.6114550533482614 C
    0.09745324473246314 D
    -0.26600053341756486 E

    Hidden unit 0 bias: -0.1802615484156791

    Output unit 0 weight for hidden unit 1: -0.239991138003868

    Hidden unit 1 weights:

    -6.4472828043452175 A
    -4.770061719076585 B
    -4.318804805199649 C
    -2.077452814137676 D
    1.0959052105040001 E

    Hidden unit 1 bias: 0.21400772835364332

    Output unit 0 bias: -1.2579970537867124

    Is there a way I can use this model in excel?

    • Jason Brownlee January 18, 2019 at 5:41 am #

      Well done!

      I don’t know about using Weka models in excel, sorry.

  46. Harry January 31, 2019 at 11:11 am #

    Hello sir good day!

    can you help me how to setup logistic regression model so that its prediction is more than 1 or 2.

    === Predictions on user test set ===

    inst# actual predicted error prediction predicted error prediction
    1 1:? 4:BSBA 0.328 6:BEED 0.618

    is this possible?

    • Jason Brownlee January 31, 2019 at 2:24 pm #

      Generally, logistic regression is for binary classification problems.

      • Harry January 31, 2019 at 2:49 pm #

        Thanks you sir. So can you tell me sir what function to be use to have a multiple prediction?
        It will be very much appreciated.

  47. jack February 4, 2019 at 7:19 am #

    I trained a very big dataset with 1gb size but the model file is only 300kb. is this normal?

    • Jason Brownlee February 4, 2019 at 7:25 am #

      The size of the model is proportional to the complexity of the model, which can be unrelated to the number of examples in the dataset.

  48. jack February 5, 2019 at 11:30 pm #

    I also have this problem. I created the model but my test dataset have some extra attributes and weka uses InputMappedClassifier, but it retrains the model instead of just testing it. why is this happening?

    • Jason Brownlee February 6, 2019 at 7:46 am #

      Perhaps try preparing the new data separately to have identical structure to the training dataset, e.g. in Excel or a text editor?
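The same preparation can be scripted instead of done by hand. This is a minimal sketch that assumes both datasets are CSV text with a header row; it keeps only the training columns, in training order, and drops any extras. The function name and the example column names are hypothetical. Missing columns are filled with "?", the marker Weka uses for missing values.

```python
import csv
import io


def align_to_training(test_csv_text, train_header, missing="?"):
    """Rewrite test CSV so its columns match train_header exactly."""
    reader = csv.DictReader(io.StringIO(test_csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(train_header)
    for row in reader:
        # Extra columns are dropped; absent columns become the missing marker.
        writer.writerow([row.get(name, missing) for name in train_header])
    return out.getvalue()


if __name__ == "__main__":
    test_csv = "id,preg,plas,class\n1,6,148,?\n2,1,85,?\n"
    print(align_to_training(test_csv, ["preg", "plas", "class"]))
```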

  49. Dirlene February 8, 2019 at 8:55 am #

    Hello Jason,

    I really need your help.
    I followed the tutorial steps to make predictions in Weka. However, an error occurs:
    problem evaluating classifier: class index is negative (not set)!

    What should I do to correct the error?

    • Jason Brownlee February 8, 2019 at 2:06 pm #

      I’m not sure, some ideas:

      Perhaps the dataset was not loaded correctly?
      Perhaps the class variable was not specified?

  50. Dirlene February 9, 2019 at 7:28 am #

    Thank you.
    I was able to make predictions with the model.
    One more question.
    I’m working on predicting evasion in distance education.
    A model created with RandomForest, during training, obtained accuracy of 90.01% and F-measure of 0.906. But when I use the model to make predictions, it classifies all instances as evasion (YES). The database used to test the model is similar to the one used in the training. I have already reviewed the databases and repeated the entire process, but there was no change in the results. Do you have any idea how I could solve it?

    === Confusion Matrix ===

      a   b   <-- classified as
    323   0 | a = SIM
    109   0 | b = NAO


    • Jason Brownlee February 10, 2019 at 9:36 am #

      Sorry, I don’t follow. What is the problem you are having exactly?

  51. mr February 12, 2019 at 8:17 pm #

    Is the order of rows important in a dataset? Can I put all the rows with class “A” first and all the rows with class “B” next?

    • Jason Brownlee February 13, 2019 at 7:57 am #

      It can be for LSTMs and for SGD. It really depends on the context.

      Often, when fitting a model we want to shuffle the rows each epoch to avoid learning any ordering.
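If the rows are sorted by class, shuffling them once before training is a cheap safeguard. A minimal sketch, assuming the dataset fits in memory as a list of rows; the function name and the fixed seed are illustrative choices, and the seed only makes the shuffle repeatable.

```python
import random


def shuffle_rows(rows, seed=42):
    """Return a shuffled copy of rows, leaving the original list untouched."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    # Rows grouped by class: all "A" first, then all "B".
    rows = [(i, "A") for i in range(5)] + [(i, "B") for i in range(5, 10)]
    for row in shuffle_rows(rows):
        print(row)
```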

  52. Nouf February 12, 2019 at 9:35 pm #

    Hello Jason,
    Thank you for sharing this knowledge with us.
    I have a question. I am a beginner in Weka and I have gone through the steps above. I have read that the test set should have the same number of attributes as the training set, but how is that possible when I am working with tweets and the word vector would be different?

    Your kind assistance is highly appreciated.

    • Jason Brownlee February 13, 2019 at 7:58 am #

      You would encode the text to the same length vector, e.g. a bag of words with the same sized vocab.
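The idea can be sketched as follows: build the vocabulary once from the training text, then encode every document, training or new, against that same vocabulary, so all vectors have the same length. Words unseen in training are simply ignored. The function names are hypothetical and the whitespace tokenizer is a simplification. In Weka itself, wrapping the StringToWordVector filter in a FilteredClassifier is a common way to get the same effect.

```python
def build_vocab(train_docs):
    """Sorted list of every distinct lowercase token in the training text."""
    return sorted({token for doc in train_docs for token in doc.lower().split()})


def encode(doc, vocab):
    """Word-count vector with one slot per vocabulary word."""
    index = {word: i for i, word in enumerate(vocab)}
    vector = [0] * len(vocab)
    for token in doc.lower().split():
        if token in index:  # words outside the training vocabulary are skipped
            vector[index[token]] += 1
    return vector


if __name__ == "__main__":
    vocab = build_vocab(["good movie", "bad movie"])
    print(vocab)
    print(encode("good good film", vocab))
```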

  53. sam February 19, 2019 at 2:12 pm #

    I want to predict mental health using Weka. I understood these steps, but I don’t know how to create a file with some attributes. Can you please guide me on that?

  54. nuwan February 21, 2019 at 4:46 pm #

    Can I use that model in R?

  55. tensor February 22, 2019 at 8:50 pm #

    Are you familiar with TensorFlow? Can I train and save a model in TensorFlow and make predictions in Weka?

  56. Tharanga March 4, 2019 at 9:04 pm #

    Hi Jason,
    Can I load that saved model in R?

    Thank you.

  57. Tharanga March 5, 2019 at 2:06 pm #

    Sorry, I meant load it in R.

  58. Sanjeev Das March 19, 2019 at 8:56 am #

    Hi Jason,

    I used the weights and thresholds shown by Weka for a multilayer perceptron (MLP) in my custom C code to make predictions on the same training data.
    However, Weka’s results do not match the results of my C implementation.

    In my C code, I am using a feedforward model (MLP), where the weights and thresholds are obtained from the Weka-trained model.
    The computation at each node is simple as shown below:
    Sum_node = threshold;
    for( i = 0 ; i < Num_Input ; i++)
    Sum_node += Input[i] * Weight[i][j] ;
    hidden_node = 1/(1 + exp(-Sum_node)) ;

    Could you please tell me if Weka uses a different implementation when testing on new data?

    • Jason Brownlee March 19, 2019 at 9:05 am #

      It may have differences in the implementation.

      You can check the source code, it is open source.
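One difference worth checking first is input scaling. Weka's MultilayerPerceptron has a normalizeAttributes option that is on by default, so the printed weights apply to rescaled inputs rather than raw ones. The sketch below assumes the rescaling maps each attribute to the range [-1, 1] using the training minimum and maximum; that formula is my reading of the behavior and should be confirmed against the source code. All numbers here are made up for illustration.

```python
import math


def normalize(values, mins, maxs):
    """Rescale each value to [-1, 1] using per-attribute training min/max (assumed formula)."""
    scaled = []
    for value, low, high in zip(values, mins, maxs):
        half_range = (high - low) / 2.0
        midpoint = (high + low) / 2.0
        scaled.append((value - midpoint) / half_range if half_range else 0.0)
    return scaled


def node_output(inputs, weights, threshold):
    """Same computation as the C snippet above: threshold plus weighted sum, then sigmoid."""
    total = threshold + sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-total))


if __name__ == "__main__":
    raw = [6.0, 148.0]                      # hypothetical raw inputs
    mins, maxs = [0.0, 0.0], [17.0, 199.0]  # hypothetical training ranges
    weights, threshold = [0.4, -0.2], 0.1   # hypothetical node parameters
    print("raw inputs:       ", node_output(raw, weights, threshold))
    print("normalized inputs:", node_output(normalize(raw, mins, maxs), weights, threshold))
```

If the class is numeric, the output may also be rescaled during training, so the final prediction might need mapping back to the original range as well.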

  59. Khan April 11, 2019 at 8:29 pm #

    How to use Weka in Python?

    • Jason Brownlee April 12, 2019 at 7:44 am #

      I don’t see why not.

      I don’t have an example, sorry.
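One low-tech option is to call Weka's command-line interface from a Python script. The sketch below only builds and runs the command; it assumes Java is installed and that the paths (all hypothetical here) point to a real weka.jar, a model saved from the Explorer, and a test file with the same structure as the training data. The options -l (load model), -T (test file) and -p (print predictions) are standard Weka classifier options, though newer releases may prefer a different prediction-output flag.

```python
import subprocess


def build_weka_command(weka_jar, classifier, model_file, test_file):
    """Command line that loads a saved model and prints predictions for test_file."""
    return [
        "java", "-cp", weka_jar, classifier,
        "-l", model_file,   # load a previously saved model
        "-T", test_file,    # dataset to make predictions on
        "-p", "0",          # print predictions, no extra attributes
    ]


def run_weka(command):
    """Run the command and return Weka's text output (raises if Weka fails)."""
    return subprocess.run(command, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    command = build_weka_command(
        "weka.jar",                             # hypothetical path
        "weka.classifiers.functions.Logistic",
        "logistic.model",                       # hypothetical path
        "new-data.arff",                        # hypothetical path
    )
    print(" ".join(command))
    # print(run_weka(command))  # uncomment once the paths exist
```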

  60. Leandro April 16, 2019 at 1:36 am #

    Hi Jason,

    Do you know the maximum amount of input data required per Weka classifier?

  61. Abhishek Santhanam May 7, 2019 at 4:14 pm #

    First of all, thanks for this great article, it was very helpful. Could you please help me do the same in Java (in the Eclipse IDE) using the weka.jar library?

  62. Art May 28, 2019 at 5:53 am #

    I have a dataset in .csv format and imported it into Weka in .arff format. I know the set is good; it was provided by my instructor. I’ve written a good decision tree. I want to use three of the attributes in the data to predict whether a person has a bladder infection based on the responses, i.e. is their temperature over 40 degrees and did they have nausea and lumbar pain. What’s the first step to using Weka to produce something like that? Do I load the data and use UserClassifier to make a tree? I guess I’m confused. I want Weka to cull out the people with high temps and either/or lumbar pain or nausea, and I’m not sure how to proceed.

  63. getaneh July 9, 2019 at 11:58 pm #

    Thank you so much!
    I learned a lot from you about how to save a machine learning model and make predictions in Weka. If possible, could you send me how to make predictions for my whole dataset using this result?

  64. Awal July 22, 2019 at 10:53 pm #

    Please, can I use a saved model from the Explorer in the Experimenter?

  65. Lei September 19, 2019 at 9:25 am #

    I used Weka a few years ago and I liked it a lot. Weka was the go-to machine learning software years before Python and R.

    Can you shed any light on why Weka is not as popular as Python/R today?

    • Jason Brownlee September 19, 2019 at 1:49 pm #

      I agree!

      I don’t know. Perhaps a bias against GUIs? or Java?

      It’s madness because you can be so productive in Weka without writing a line of code. AND you grok the machine learning/model-building/selection process immediately.

  66. anu October 17, 2019 at 4:54 am #

    I am supposed to use Weka on a Linux system to calculate the unweighted average recall from a confusion matrix. Is there a script to do it?

    • Jason Brownlee October 17, 2019 at 6:41 am #

      Sorry, I don’t have any scripts for Weka. I only show how to use the GUI.
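The calculation itself is small once the confusion matrix has been copied out of Weka's output. Unweighted average recall (UAR) is the mean of the per-class recalls, where each class counts equally regardless of how many instances it has. A minimal sketch, assuming rows are actual classes and columns are predicted classes, as in Weka's printout; the function name is hypothetical, and the example numbers are the 2x2 matrix quoted in an earlier comment on this page.

```python
def unweighted_average_recall(matrix):
    """Mean of per-class recall. Rows = actual class, columns = predicted class."""
    recalls = []
    for i, row in enumerate(matrix):
        total = sum(row)
        if total == 0:
            continue  # class has no instances, so its recall is undefined
        recalls.append(row[i] / total)
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    matrix = [
        [323, 0],  # actual a = SIM
        [109, 0],  # actual b = NAO
    ]
    print(unweighted_average_recall(matrix))  # 0.5: recall 1.0 for SIM, 0.0 for NAO
```

On that matrix, plain accuracy is 323/432 (about 75%), while UAR is 0.5, which shows why UAR is the more informative number when one class is never predicted.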

  67. anu October 17, 2019 at 7:38 pm #

    Oh okay, thanks so much! I have to work on output predictions. Is there a tutorial that explains output predictions?
