How to Save Your Machine Learning Model and Make Predictions in Weka

After you have found a well-performing machine learning model and tuned it, you must finalize your model so that you can make predictions on new data.

In this post you will discover how to finalize your machine learning model, save it to file and load it later in order to make predictions on new data.

After reading this post you will know:

  • How to train a final version of your machine learning model in Weka.
  • How to save your finalized model to file.
  • How to load your finalized model later and use it to make predictions on new data.

Let’s get started.

How to Save Your Machine Learning Model and Make Predictions in Weka
Photo by Nick Kenrick, some rights reserved.

Tutorial Overview

This tutorial is broken down into 4 parts:

  1. Finalize Model, where you will discover how to train a finalized version of your model.
  2. Save Model, where you will discover how to save a model to file.
  3. Load Model, where you will discover how to load a model from file.
  4. Make Predictions, where you will discover how to make predictions for new data.

The tutorial provides a template that you can use to finalize your own machine learning algorithms on your data problems.

We are going to use the Pima Indians Onset of Diabetes dataset. Each instance represents medical details for one patient, and the task is to predict whether the patient will have an onset of diabetes within the next five years. There are 8 numerical input variables, each on a different scale. You can learn more about this dataset on the UCI Machine Learning Repository. Top results are on the order of 77% accuracy.

We are going to finalize a logistic regression model on this dataset, both because it is a simple algorithm that is well understood and because it does very well on this problem.

1. Finalize a Machine Learning Model

Perhaps the most neglected task in a machine learning project is how to finalize your model.

Once you have gone through all of the effort to prepare your data, compare algorithms and tune them on your problem, you actually need to create the final model that you intend to use to make new predictions.

Finalizing a model involves training the model on the entire training dataset that you have available.

1. Open the Weka GUI Chooser.

2. Click the “Explorer” button to open the Weka Explorer interface.

3. Load the Pima Indians onset of diabetes dataset from the data/diabetes.arff file.

Weka Load Pima Indians Onset of Diabetes Dataset

4. Click the “Classify” tab to open up the classifiers.

5. Click the “Choose” button and choose “Logistic” under the “functions” group.

6. Select “Use training set” under “Test options”.

7. Click the “Start” button.

Weka Train Logistic Regression Model

This will train the chosen Logistic regression algorithm on the entire loaded dataset. It will also evaluate the model on the entire dataset, but we are not interested in this evaluation.

It is assumed that you have already estimated the performance of the model on unseen data using cross-validation as a part of selecting the algorithm you wish to finalize. It is this earlier estimate that you can report when you need to inform others about the skill of your model.

Now that we have finalized the model, we need to save it to file.

2. Save Finalized Model To File

Continuing on from the previous section, we need to save the finalized model to a file on your disk.

This is so that we can load it up at a later time, or even on a different computer in the future and use it to make predictions. We won’t need the training data in the future, just the model of that data.

You can easily save a trained model to file in the Weka Explorer interface.

1. Right click on the result item for your model in the “Result list” on the “Classify” tab.

2. Click “Save model” from the right click menu.

Weka Save Model to File

3. Select a location and enter a filename such as “logistic”, then click the “Save” button.

Your model is now saved to the file “logistic.model”.

It is in a binary format (not text) that can be read again by the Weka platform. As such, it is a good idea to note down the version of Weka you used to create the model file, just in case you need the same version of Weka in the future to load the model and make predictions. Generally, this will not be a problem, but it is a good safety precaution.
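
Weka is written in Java and, as far as I know, a saved model is simply a Java-serialized object, which is why the Weka version can matter when loading it later. The sketch below is not Weka code. It uses a made-up ToyModel class (an assumption for illustration only) to show the same save-then-load round trip with plain Java serialization.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a trained model; this is NOT a Weka class.
class ToyModel implements Serializable {
    // A fixed serialVersionUID keeps old files loadable while the fields
    // stay compatible; if the value differs at load time, loading fails.
    private static final long serialVersionUID = 1L;
    final double[] coefficients;

    ToyModel(double[] coefficients) {
        this.coefficients = coefficients;
    }
}

class SerializationSketch {

    // Write the model object to a binary file.
    static void save(ToyModel model, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(model);
        }
    }

    // Read the model object back from the binary file.
    static ToyModel load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (ToyModel) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("toy", ".model");
        save(new ToyModel(new double[] {0.5, -1.25}), file);
        ToyModel restored = load(file);
        System.out.println(restored.coefficients.length + " coefficients restored");
        Files.delete(file);
    }
}
```

If the class definition changes in an incompatible way between saving and loading, readObject throws an exception. That is the kind of problem that noting down your Weka version protects against.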

You can close the Weka Explorer now. The next step is to discover how to load up the saved model.

3. Load a Finalized Model

You can load saved Weka models from file.

The Weka Explorer interface makes this easy.

1. Open the Weka GUI Chooser.

2. Click the “Explorer” button to open the Weka Explorer interface.

3. Load any dataset; it does not matter which. We will not be using it, but we need to load a dataset to get access to the “Classify” tab. If you are unsure, load the data/diabetes.arff file again.

4. Click the “Classify” tab to open up the classifiers.

5. Right click on the “Result list”, click “Load model”, and select the model saved in the previous section, “logistic.model”.

Weka Load Model From File

The model will now be loaded into the explorer.

We can now use the loaded model to make predictions for new data.

Weka Model Loaded From File Ready For Use

4. Make Predictions on New Data

We can now make predictions on new data.

First, let’s create some pretend new data. Make a copy of the file “data/diabetes.arff” and save it as “data/diabetes-new-data.arff”.

Open the file in a text editor.

Find the start of the actual data in the file, marked by the @data line on line 95.

We only want to keep 5 records. Move down 5 lines from the @data line, then delete all the remaining lines of the file.

The class value (output variable) that we want to predict is on the end of each line. Delete each of the 5 output variables and replace them with question mark symbols (?).

Weka Dataset For Making New Predictions

We now have “unseen” data with no known output for which we would like to make predictions.
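
If you would rather not edit the file by hand, the same steps can be scripted. Below is a minimal Java sketch, assuming the file paths used above and that the class value is the last comma-separated field on each data row. The class and method names (MakeNewData, maskClass, firstRowsMasked) are made up for this example and are not part of Weka.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class MakeNewData {

    // Replace the last comma-separated field (the class value) with "?".
    static String maskClass(String row) {
        int lastComma = row.lastIndexOf(',');
        return lastComma < 0 ? row : row.substring(0, lastComma + 1) + "?";
    }

    // Keep the header up to and including the @data line, then keep only
    // the first `keep` data rows, each with its class value masked.
    static List<String> firstRowsMasked(List<String> lines, int keep) {
        List<String> out = new ArrayList<>();
        boolean inData = false;
        int kept = 0;
        for (String line : lines) {
            if (!inData) {
                out.add(line);
                if (line.trim().equalsIgnoreCase("@data")) {
                    inData = true;
                }
            } else if (kept < keep) {
                String row = line.trim();
                if (row.isEmpty() || row.startsWith("%")) {
                    continue; // skip blank lines and ARFF comments
                }
                out.add(maskClass(row));
                kept++;
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Paths assumed from the tutorial; adjust to your Weka install.
        Path source = Paths.get("data/diabetes.arff");
        Path target = Paths.get("data/diabetes-new-data.arff");
        Files.write(target, firstRowsMasked(Files.readAllLines(source), 5));
    }
}
```

Running main writes a new file and leaves the original dataset untouched, so you can repeat the exercise with a different number of records.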

Continue on from the previous part of the tutorial where we already have the model loaded.

1. On the “Classify” tab, select the “Supplied test set” option in the “Test options” pane.

Weka Select New Dataset On Which To Make New Predictions

2. Click the “Set” button, click the “Open file” button on the options window and select the mock new dataset we just created with the name “diabetes-new-data.arff”. Click “Close” on the window.

3. Click the “More options…” button to bring up options for evaluating the classifier.

4. Uncheck the information we are not interested in, specifically:

  • “Output model”
  • “Output per-class stats”
  • “Output confusion matrix”
  • “Store predictions for visualization”
Weka Customized Test Options For Making Predictions

5. For the “Output predictions” option click the “Choose” button and select “PlainText”.

Weka Output Predictions in Plain Text Format

6. Click the “OK” button to confirm the Classifier evaluation options.

7. Right click on the list item for your loaded model in the “Result list” pane.

8. Select “Re-evaluate model on current test set”.

Weka Re-evaluate Loaded Model On Test Data And Make Predictions

The predictions for each test instance are then listed in the “Classifier Output” pane. Specifically, look at the middle column of the results, which holds predictions like “tested_positive” and “tested_negative”.

You could choose another output format for the predictions, such as CSV, that you could later load into a spreadsheet like Excel. For example, below are the same predictions in CSV format.

Weka Predictions Made on New Data By a Loaded Model
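
You can also convert the plain-text predictions to CSV yourself. The sketch below assumes each prediction row looks like “1 1:? 2:tested_positive 0.722” (instance number, actual, predicted, probability), as in the output readers quote in the comments, and that class labels contain no spaces. PredictionParser and its methods are hypothetical names, not a Weka API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PredictionParser {

    // Convert one plain-text prediction row, for example
    // "1 1:? 2:tested_positive 0.722", to "1,tested_positive,0.722".
    // Returns null for headers, blank lines and anything else that
    // does not look like a prediction row.
    static String toCsvRow(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 4 || !fields[0].matches("\\d+") || !fields[2].contains(":")) {
            return null;
        }
        // The predicted field is "index:label"; keep only the label.
        String label = fields[2].substring(fields[2].indexOf(':') + 1);
        // The probability is the last field, even when an error marker
        // such as "+" appears before it.
        String probability = fields[fields.length - 1];
        return fields[0] + "," + label + "," + probability;
    }

    // Convert pasted output lines to CSV lines with a header row.
    static List<String> toCsv(List<String> outputLines) {
        List<String> csv = new ArrayList<>();
        csv.add("inst,predicted,probability");
        for (String line : outputLines) {
            String row = toCsvRow(line);
            if (row != null) {
                csv.add(row);
            }
        }
        return csv;
    }

    public static void main(String[] args) {
        List<String> pasted = Arrays.asList(
            "inst# actual predicted error prediction",
            "1 1:? 2:tested_positive 0.722",
            "2 1:? 1:tested_negative 0.951");
        for (String row : toCsv(pasted)) {
            System.out.println(row);
        }
    }
}
```

Because the rows come out in the same order as the instances in the input file, the instance number is enough to match each prediction back to its record.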

More Information

The Weka Wiki has more information about saving and loading models, as well as making predictions, that you may find useful.

Summary

In this post you discovered how to finalize your model and make predictions for new unseen data. You can use this same process to make predictions on your own new data.

Specifically, you learned:

  • How to train a final instance of your machine learning model.
  • How to save a finalized model to file for later use.
  • How to load a model from file and use it to make predictions on new data.

Do you have any questions about how to finalize your model in Weka or about this post? Ask your questions in the comments below and I will do my best to answer them.



82 Responses to How to Save Your Machine Learning Model and Make Predictions in Weka

  1. Kanan October 26, 2016 at 2:34 am #

    Hi
    Can I open the saved model by any other program?
    I want to use the saved model as a web service, but without using Weka for prediction.
    Is there any way?

    Thanks

  2. c October 28, 2016 at 8:49 am #

    How do I predict continuous output in Weka? I get “Problem evaluating classifier: Class index is negative (not set)!” error when I try to run model on test set with dependent variable.

  3. c October 28, 2016 at 10:10 am #

    Is the M5P not capable of regression predication with categorical and continuous variables? I thought regression trees could do that?

  4. Kayode November 10, 2016 at 2:04 pm #

    Thank you so much for this tutorial. It is really straightforward. Really enjoy it. Thanks.

  5. Ametun December 9, 2016 at 1:57 pm #

    Thank you so much for this tutorial. It is very useful for me.Thanks.

  6. Diane December 21, 2016 at 2:51 am #

    Hey, thank you very much for your help!
    Just a sidenote for those who have problems with doing the exact same thing as you described using .csv input files: The above description is perfect for .arff but in my case (with .csv) it made predictions for the first 112 lines only and stopped for no reason. Transforming the input (training and test data) solved that problem.
    I am looking forward to more tutorials from you 🙂

  7. bayo January 3, 2017 at 10:12 pm #

    Good day,

    Thanks for your good work. Please, I need your assistance. I am working on crime data and I am new to Weka. I have used Weka to divide my dataset into test and training sets, both in CSV format, but the system complains whenever I choose a classifier (such as Bayes or KNN) and load the test dataset.

    Please help me on this

    • Jason Brownlee January 4, 2017 at 8:52 am #

      What is the error exactly? What is the complaint that Weka is making?

  8. Esan January 3, 2017 at 10:54 pm #

    Hello,
    Please, should the train dataset and test dataset be in the same format? If yes, why does my Weka complain of an incompatible test dataset? Also, is it the test data that we are converting back to plain text?

    • Jason Brownlee January 4, 2017 at 8:54 am #

      Yes, the train and test must be the same format, with the same number of columns.

      You may not know the predicted outcome, in which case you can use a ‘?’ value.

  9. bayo January 7, 2017 at 5:36 pm #

    Thanks. I really appreciate your efforts, you teaching was superb

  10. Lujain January 7, 2017 at 7:26 pm #

    Thanks for the tutorial. I have a question, why the number of instances is unknown? and how can I evaluate the accuracy of the prediction? I mean I need to see the number of correctly classified instances and so on…

    • Jason Brownlee January 8, 2017 at 5:21 am #

      This tutorial was about making predictions on new data.

      If you have data for which you already know the expected output, you can make predictions on it by selecting it as an external test dataset in Weka.

  11. Iqra Ameer January 8, 2017 at 12:05 pm #

    Hi,
    I need to train model on separate genre(blogs data) and test on another genre(hotel reviews). I trained a model by 1. appling StringToWordVector filter(change some settings of filter) 2. attribute selection 3. applied classify Logistic with option “use training set” 4. saved the model. Now I am confused about testing file, should I need to apply all these steps till 3 on test file also? by doing this my train and test file attributes are different but the same format.
    Should my training file attribute and test file attribute exactly the same(same to same)? If yes then can I copy the attributes from training file(top to @data) and paste in my test file, is it correct?
    If train and test file attributes can be different then there is an error “Data used to train model and test set are not compatible. Would you like to automatically wrap the classifier in “InputMappedClassifier”, what does it mean? if choose Yes what will it do.
    Sorry sir, I have many questions. I explored a lot still confused. It will be great help.
    Thank you

  12. Mike March 4, 2017 at 4:15 am #

    Hi,

    I have built a logistic regression model in Weka and want to be able to identify what the predictions were for each specific data point. The output I currently have does not allow me to match the predictions to the individual instances.

    Thanks,
    Mike

    • Jason Brownlee March 6, 2017 at 10:52 am #

      Hi Mike,

      The order of the predictions should match the order of the data in your input file used to make predictions.

  13. Bellz May 2, 2017 at 2:03 am #

    Awesome article! Very simple and right to the point!

  14. Kanika Sood May 11, 2017 at 5:47 pm #

    Hi Jason
    Great article. I followed the steps you suggested and I am applying Random Forest classifier. I have the same set of attributes for the training and test set. However in the stage where I predict for unknown data, it ignores all the instances. Below is the message I get in the classifier output:
    === Summary ===

    Total Number of Instances 0
    Ignored Class Unknown Instances 72

    === Confusion Matrix ===

    a b <– classified as
    0 0 | a = good
    0 0 | b = bad

    Can you please suggest what am I doing wrong?

    • Jason Brownlee May 12, 2017 at 7:36 am #

      Perhaps the test set data is corrupt or not loading correctly?

    • ifrah raoof February 25, 2018 at 3:48 pm #

      kanika sood …can you help me …i m stuck with the same error ???

  15. Kanika Sood May 11, 2017 at 6:54 pm #

    I got the answer to my earlier question. Here is the questions I have now: Random Forest, BayesNet always predicts only one class for all instances.

  16. Billy Rogers May 20, 2017 at 12:38 am #

    Elegant, simple, exercise.

  17. Kittikorn June 27, 2017 at 1:33 am #

    My apology if there was someone asked this question already but i couldn’t find here.

    When I used my model to predict new data, the result in output file showed only 101 items/instances. May I ask how to make the model to predict all records (about 5000 records)?

    • Jason Brownlee June 27, 2017 at 8:32 am #

      There is no limit, pass all inputs to model.predict(X) to get predictions.

  18. Angel Gallardo August 2, 2017 at 2:03 am #

    Hi Jason thanks for the post! Is there a way to get the top 3 predictions?

    • Jason Brownlee August 2, 2017 at 7:55 am #

      What do you mean exactly “top 3 predictions”?

  19. Denis August 22, 2017 at 2:32 am #

    First of all, thank you for making me discovering this Weka. I am one of the many that after a tutorial, after a confusion matrix, I was saying”great!, now what?” 🙂

    I just ended a very long course on Data Science and Python on Udemy… is it too daring from me thinking that Weka can substitute python? (at least for simple tasks?)

  20. sam August 22, 2017 at 9:07 pm #

    this blog is really helpful, can you please suggest me how can I make UI application on the top of the model using Python where users can put the data manually and it will give the result like positive or negative

    • Jason Brownlee August 23, 2017 at 6:49 am #

      Sorry, I don’t know about UI applications in Python. Perhaps a web interface?

  21. Ilan August 26, 2017 at 5:46 am #

    Thanks Jason, this is super helpful. Do you know if there is a way to save particular multilayer percepton configurations? I’m running the percepton classifier and set GUI to true in order to tinker with it, but I can’t for the life of me figure out how to save the tinkered configuration so that I can reuse it. I’ve looked everywhere.

    • Jason Brownlee August 26, 2017 at 6:49 am #

      After you fit the model you can save it.

      When you run the model, the Explorer window should give you the command line parameters needed to re-create the model configuration at the top.

  22. Ilan August 26, 2017 at 7:44 am #

    Hmm, that correctly saved the usual parameters like Num Epochs, Learning Rate, etc., but it didn’t save the particular percepton GUI tweaks — say, ones where I connect and disconnect certain nodes to other certain nodes by hand using the percepton GUI.

    Did I miss a step, or is there something else I’m supposed to do that’s unique to allowing it to save changes made in the GUI?

    Thank you!

  23. Zoya September 25, 2017 at 8:25 pm #

    Thanks for the tutorial. I am new to Weka and machine learning. The tutorial helped a lot. Just wanted to know how to judge the predicted value for a particular instance? Is the prediction done in order?

    === Predictions on user test set ===

    inst# actual predicted error prediction
    1 1:? 2:tested_positive 0.722
    2 1:? 1:tested_negative 0.951
    3 1:? 2:tested_positive 0.797
    4 1:? 1:tested_negative 0.958
    5 1:? 2:tested_positive 0.902

    Also, what does 2 in 2:tested_positive mean?

    • Jason Brownlee September 26, 2017 at 5:36 am #

      Great question. Yes, the order of the predictions will match the order of the observations in the input file.

      The prediction is probably a class number (1 or 2) and the associated label in the problem (positive or negative).

  24. Zoya September 26, 2017 at 2:28 pm #

    Thank you

  25. Yaw Antwi-Adjei October 29, 2017 at 12:35 am #

    Hi Jason. Thank you for the good tutorial. Is that all there is to making predictions using WEKA? I mean,
    a) Choose the appropriate Model (i.e Classifier)
    b) Run it on the Supplied Test Set
    c) Save the Model
    d) Load and dataset in WEKA Explorer just to have access to the Classifier tab
    e) Load your Model
    f) Open the new file, and finally
    g) Re-evaluate the model on the new file for your predictions.

    • Jason Brownlee October 29, 2017 at 5:54 am #

      Yes! It is perfect for beginners.

      You can go deeper on various aspects, and I recommend using the Experimenter to be systematic in your exploration.

  26. Rubel November 2, 2017 at 3:41 am #

    how can I calculate each predictor odd ratio, 95% CI, and P-value.

  27. Rubel November 2, 2017 at 3:42 am #

    When I am going to installed new packages, it is showing an error message. How can I solve this problem

  28. Haya November 8, 2017 at 10:16 pm #

    Hello Jason,

    How can I make predictions and produce Actual value by using R program ?

  29. Pedro November 18, 2017 at 9:40 am #

    Hello Jason,

    Say I am trying to further tune and test the algorithms, and I have separate test and training sets, which contain different distribution of the instances so that I can choose to mimic real world distribution or keep it 50/50 and see which option gives me better accuracy with the test set (that will have real-world-like distribution). I would not like to save many models, naturally. Could I then re-evaluate without saving it, skipping to step four as soon as I finish cross-validation with the training set?

  30. oksana December 3, 2017 at 1:21 am #

    thank you very much for great tutorials, Dr. Brownlee. They help me a lot in my final project at school.

    I would like to perform this kind of predictive modeling techniques at work, but we work with very large data sets (millions of tuples) so my question is – would Weka be able to handle very large data sets?
    Weka seems very easy and user friendly tool.

    • Jason Brownlee December 3, 2017 at 5:25 am #

      I would recommend taking a sample of your data to model, small enough to fit into memory with Weka.

  31. kanishka January 9, 2018 at 2:40 pm #

    how to carry out weka result to androide phone

  32. Lina February 15, 2018 at 5:24 am #

    Hi jason,
    Im lina and i read each tutorial step on top …. But still confuse, if we use totally new data as a test set, can it run properly? Example on top show you use 5 same data to predict the class….

    • Jason Brownlee February 15, 2018 at 8:51 am #

      Sorry, I don’t follow. Perhaps you could restate your question or give more context?

  33. Jac May 18, 2018 at 11:30 pm #

    Hello,

    How can I make website with PMML model implemented to be available for a public use?
    For example user input 10 parametrs and receive a result calculated by PMML?

    • Jason Brownlee May 19, 2018 at 7:40 am #

      Sorry, I don’t have an example of creating a website from a model.

  34. Ben G. June 16, 2018 at 3:23 pm #

    Hi Jason, many thanks for this great work you are involved in. Please, is there any provision for deploying Ripple Down Rule (RIDOR) Learner in WEKA? If it is possible, how can I go about it?

  35. Guylaine Bourque July 10, 2018 at 10:52 am #

    Hello! Many thanks for this tutorial.
    I am wondering why it does not save the results in a file. Do I have to cut and paste the output into a CSV file?

    • Jason Brownlee July 10, 2018 at 2:26 pm #

      Yes. You will have to save it manually. Weka was built more for exploring models than for using models.

  36. Rodrigo Nava September 18, 2018 at 8:20 am #

    Hello. Thanks for the tutorial.
    My question is:
    Is it possible to perform Cross-validation or Split-percentage in data loaded from a model?
    Or if I want to perform any of those two, I have necessarily to load the corresponding training dataset and build a new model for them?

    • Jason Brownlee September 18, 2018 at 2:18 pm #

      What do you mean exactly?

      CV and split are methods for using a training data to evaluate a model. How could it “come” from the model?

      • Rodrigo Nava September 18, 2018 at 4:08 pm #

        Thanks for the answer.
        I have the following situation:
        I use a dataset “training.arff” and a classifier, say RandomForest, to generate a model “model1.model”; then I save it.
        If I want to evaluate a testing set with it, I load “model1.model” and use the option “reevaluate model on current test set”. Everything is ok until that point.

        But if I want to validate my model, I find that there’s no direct way to use CV or split directly over the data used from model1.model. I have necessarily to reload “training.arff”, use CV, and see how it says “building model for training data”, meaning that it is generating another model.

        I was wondering if it was possible to validate generated models.
        Again, thank you for your feedback

        • Jason Brownlee September 19, 2018 at 6:15 am #

          I recommend validating the model prior to saving and making predictions.

  37. Shabbar Imran October 11, 2018 at 6:03 pm #

    hi how can i predict between two different data sets

  38. Las Hsu October 28, 2018 at 6:23 pm #

    HI thanks for the hard work, it really helped me a lot.
    Here’s my question
    === Predictions on user test set ===

    inst# actual predicted error prediction
    1 1:? 2:tested_positive 0.722
    2 1:? 1:tested_negative 0.951
    3 1:? 2:tested_positive 0.797
    4 1:? 1:tested_negative 0.958
    5 1:? 2:tested_positive 0.902
    what does the number below prediction means?
    the 0.722 ,0.951,0.797
    does it mean the probability of the prediction being correct?

    • Jason Brownlee October 29, 2018 at 5:55 am #

      Yes, the probability of the prediction for the class.

  39. Saubhik Paladhi November 9, 2018 at 6:24 pm #

    Hi, thanks for your informative article.
    I have a query about the indexes of test data instances chosen by Weka at the time of cross-validation. How do I get the index of the test data that is being tested?

    =======
    I have chosen:

    Dataset : iris.arff
    Total instances : 150
    Classifier : J48
    cross validation: 10 fold

    I have also made output prediction as “PlainText”

    =============
    In the output window I can see like this :-

    inst# actual predicted error prediction
    1 3:Iris-virginica 3:Iris-virginica 0.976
    2 3:Iris-virginica 3:Iris-virginica 0.976
    3 3:Iris-virginica 3:Iris-virginica 0.976
    4 3:Iris-virginica 3:Iris-virginica 0.976
    5 3:Iris-virginica 3:Iris-virginica 0.976
    6 1:Iris-setosa 1:Iris-setosa 1
    7 1:Iris-setosa 1:Iris-setosa 1

    ….

    Total 10 test data set .(15 instances in each).

    ======================

    As WEKA uses stratified cross-validation, instances in the test data sets are randomly chosen.

    So, How to know the index of the test data instance whose prediction evaluation is being shown in above lines?

    i.e

    inst# actual predicted error prediction
    1 3:Iris-virginica 3:Iris-virginica 0.976

    This result is for which instance (among total 50 Iris-virginica) ?

    ===============

    in the main data file first few instances are :
    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3.0,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    4.6,3.1,1.5,0.2,Iris-setosa
    5.0,3.6,1.4,0.2,Iris-setosa
    5.4,3.9,1.7,0.4,Iris-setosa

    So the main data file starts with Iris-setosa.

    • Jason Brownlee November 10, 2018 at 6:00 am #

      The index should be the row number in the file.

  40. Otaku san November 27, 2018 at 8:31 am #

    Hi, thank you for the wonderful tutorial.
    I am using csv instead of arff.
    When I supply test set with 145 true and 70 false instances (in that order), the result is shown only for 145 instances. It doesn’t calculate the result for the 70 instances.
    If the set is randomly ordered, the result is shown only for the first few instances with same true/false value. For e.g., if the first ten instances are false, and 11th is true, the result (and confusion matrix) is only calculated for the first ten instances.
    Please help.

  41. Nikhil December 5, 2018 at 1:09 am #

    How do we get the output prediction in the original form?
