Finding an accurate machine learning model is not the end of the project.
In this post you will discover how to save and load your machine learning model in Python using scikit-learn.
This allows you to save your model to file and load it later in order to make predictions.
Let’s get started.
- Update Jan/2017: Updated to reflect changes to the scikit-learn API in version 0.18.
- Update March/2018: Added alternate link to download the dataset as the original appears to have been taken down.

Save and Load Machine Learning Models in Python with scikit-learn
Photo by Christine, some rights reserved.
Finalize Your Model with pickle
Pickle is the standard way of serializing objects in Python.
You can use the pickle operation to serialize your trained machine learning model and save the serialized format to a file.
Later you can load this file to deserialize your model and use it to make new predictions.
The example below demonstrates how you can train a logistic regression model on the Pima Indians onset of diabetes dataset, save the model to file and load it to make predictions on the unseen test set (update: the dataset is now loaded from the alternate URL in the code).
```python
# Save Model Using Pickle
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
import pickle
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
test_size = 0.33
seed = 7
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=test_size, random_state=seed)
# Fit the model on the training set (67% of the data)
model = LogisticRegression()
model.fit(X_train, Y_train)
# save the model to disk
filename = 'finalized_model.sav'
pickle.dump(model, open(filename, 'wb'))

# some time later...

# load the model from disk
loaded_model = pickle.load(open(filename, 'rb'))
result = loaded_model.score(X_test, Y_test)
print(result)
```
Running the example saves the model to finalized_model.sav in your local working directory. Loading the saved model and evaluating it provides an estimate of the accuracy of the model on unseen data.
```
0.755905511811
```
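Passing the result of open() straight to pickle.dump() and pickle.load(), as in the example above, leaves closing the file to Python's garbage collector. A minimal sketch of the same save/load round trip using with blocks follows. It assumes synthetic data from make_classification in place of the Pima CSV so that it runs offline, and the file location is illustrative.

```python
# Sketch: pickle round trip with files closed deterministically.
# Synthetic data stands in for the Pima dataset (an assumption for offline use).
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=7)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Hypothetical location; any writable path works
filename = os.path.join(tempfile.gettempdir(), "finalized_model.sav")

# 'with' closes the file as soon as the block exits
with open(filename, "wb") as f:
    pickle.dump(model, f)

# ... some time later, possibly in a new session ...
with open(filename, "rb") as f:
    loaded_model = pickle.load(f)

# The restored model should make the same predictions as the original
same = np.array_equal(model.predict(X_test), loaded_model.predict(X_test))
print(same)
print(loaded_model.score(X_test, y_test))
```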
Finalize Your Model with joblib
Joblib is part of the SciPy ecosystem and provides utilities for pipelining Python jobs.
It provides utilities for efficiently saving and loading Python objects that make use of NumPy data structures.
This can be useful for some machine learning algorithms that require a lot of parameters or store the entire dataset (like K-Nearest Neighbors).
The example below demonstrates how you can train a logistic regression model on the Pima Indians onset of diabetes dataset, save the model to file using joblib, and load it to make predictions on the unseen test set.
```python
# Save Model Using joblib
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
import joblib  # older scikit-learn versions exposed this as sklearn.externals.joblib
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
test_size = 0.33
seed = 7
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=test_size, random_state=seed)
# Fit the model on the training set (67% of the data)
model = LogisticRegression()
model.fit(X_train, Y_train)
# save the model to disk
filename = 'finalized_model.sav'
joblib.dump(model, filename)

# some time later...

# load the model from disk
loaded_model = joblib.load(filename)
result = loaded_model.score(X_test, Y_test)
print(result)
```
Running the example saves the model to file as finalized_model.sav. Older versions of joblib also created one additional file for each NumPy array in the model (four in this case); newer versions write everything to a single file by default. After the model is loaded, an estimate of the accuracy of the model on unseen data is reported.
```
0.755905511811
```
Tips for Finalizing Your Model
This section lists some important considerations when finalizing your machine learning models.
- Python Version. Take note of the Python version. You almost certainly require the same major (and maybe minor) version of Python used to serialize the model when you later load it and deserialize it.
- Library Versions. The versions of all major libraries used in your machine learning project almost certainly need to be the same when deserializing a saved model. This includes, but is not limited to, NumPy and scikit-learn.
- Manual Serialization. You might like to manually output the parameters of your learned model so that you can use them directly in scikit-learn or another platform in the future. Often the procedure a machine learning algorithm uses to make predictions is a lot simpler than the one used to learn its parameters, and may be easy to implement in custom code that you have control over.
Whichever approach you choose, take note of the versions so that you can re-create the environment if, for some reason, you cannot reload your model on another machine or platform at a later time.
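One way to act on the version tips above is to store the versions next to the model and compare them at load time. The sketch below is one possible approach, not a standard format: the bundle layout and the helper names (current_versions, save_bundle, load_bundle) are hypothetical. It pickles a dictionary holding the model plus the Python, NumPy and scikit-learn versions, and warns if they differ when loading.

```python
# Sketch: store environment versions next to the model (hypothetical bundle format)
import os
import pickle
import platform
import tempfile
import warnings

import numpy
import sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def current_versions():
    """Versions worth recording alongside a serialized model."""
    return {
        "python": platform.python_version(),
        "numpy": numpy.__version__,
        "sklearn": sklearn.__version__,
    }


def save_bundle(model, path):
    # The pickled dict holds the model plus the versions it was saved with
    with open(path, "wb") as f:
        pickle.dump({"model": model, "versions": current_versions()}, f)


def load_bundle(path):
    with open(path, "rb") as f:
        bundle = pickle.load(f)
    now = current_versions()
    # Warn (rather than fail) when the running environment differs
    for name, saved in bundle["versions"].items():
        if now.get(name) != saved:
            warnings.warn(f"{name} differs: saved with {saved}, running {now.get(name)}")
    return bundle["model"], bundle["versions"]


# Usage on synthetic data
X, y = make_classification(n_samples=100, n_features=4, random_state=1)
model = LogisticRegression().fit(X, y)
path = os.path.join(tempfile.gettempdir(), "model_bundle.pkl")
save_bundle(model, path)
loaded, versions = load_bundle(path)
print(versions)
```

The check only runs once pickle.load() has succeeded, so for a severe mismatch it is worth also writing the same versions to a plain text file.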
Summary
In this post you discovered how to persist your machine learning algorithms in Python with scikit-learn.
You learned two techniques that you can use:
- The pickle API for serializing standard Python objects.
- The joblib API for efficiently serializing Python objects with NumPy arrays.
Do you have any questions about saving and loading your machine learning algorithms or about this post? Ask your questions in the comments and I will do my best to answer them.
Thank you so much for this educative post.
You’re welcome Kayode.
Hi Jason,
I have two of your books and they are awesome. I took several machine learning courses before; however, as you mentioned, they are more geared towards theory than practice. I devoured your Machine Learning with Python book and improved my skills 20x compared to the courses I took.
I found this page by Googling a code snippet in chapter 17 in your book. The line:
loaded_model = pickle.load(open(filename, ‘rb’))
throws the error:
runfile(‘C:/Users/Tony/Documents/MassData_Regression_Pickle.py’, wdir=’C:/Users/Tony/Documents’)
File “C:/Users/Tony/Documents/MassData_Regression_Pickle.py”, line 55
loaded_model = pickle.load(open(filename, ‘rb’))
^
SyntaxError: invalid syntax
Thanks TonyD.
I wonder if there is a copy-paste error, like an extra space or something?
Does the code example (.py file) provided with the book for that chapter work for you?
Hello, Jason
Where can we get X_test and Y_test "some time later"? They have been garbage collected!
X_test and Y_test are not pickled. In your example you pickle only the classifier, but you keep referring to X and Y. Real applications are not a single flow. I found a workaround and get Y from the clf.classes_ object.
What is the correct solution? Should we pickle a decorator class with X and Y, or use the pickled classifier to pull the Y values? I also didn't find any official information in the documentation for KNeighborsClassifier (my example) on how to pull Y values from the classifier.
Can you advise?
Hi Konstantin,
I would not suggest saving the data. The idea is to show how to load the model and use it on new data – I use existing data just for demonstration purposes.
You can load new data from file in the future when you load your model and use that new data to make a prediction.
If you have the expected values also (y), you can compare the predictions to the expected values and see how well the model performed.
I'm new to Python, and your code works perfectly! But where is the saved file? I used Windows 10.
Thanks Guangping.
The saved file is in your current working directory when running from the command line.
If you’re using a notebook or IDE, I don’t know where the file is placed.
Hi Jason ,
I am just wondering if we can use YAML or JSON with the sklearn library. I tried many times but could not find an answer. I tried to do it as in your Keras lesson, but for some reason it is not working. Hopefully you can help me if it is possible.
Hi Mohammed, I believe the serialization of models to YAML and JSON is specific to the Keras library.
sklearn serialization is focused on binary files like pickle.
Hi, my name is Normando Zubia and I have been reading a lot of your material for my school lessons.
I'm currently working on a model to predict user behavior in a production environment. For several reasons I cannot save the model in a pickle file. Do you know any way to save the model in a JSON file?
I have been playing a little with sklearn classes and I noticed that if I save some parameters, for example n_values_, feature_indices_ and active_features_ in a OneHotEncoder model, I can reproduce the results. Could this be done with a pipeline? Or do you think I need to save each model's parameters to load each model?
PS: Sorry for my bad english and thanks for your attention.
Hi Normando,
If you are using a simple model, you could save the coefficients directly to file. You can then try and put them back in a new model later or implement the prediction part of the algorithm yourself (very easy for most methods).
Let me know how you go.
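For a linear model such as LogisticRegression, saving the coefficients can be as simple as writing coef_, intercept_ and classes_ to JSON. A sketch, assuming a binary problem and synthetic data in place of real data; the dictionary keys are illustrative. The prediction is then re-implemented with NumPy as the logistic function of the weighted sum plus the intercept, and checked against predict_proba().

```python
# Sketch: export a binary LogisticRegression to JSON and rebuild predictions by hand
import json

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=3)
model = LogisticRegression().fit(X, y)

# Plain lists and floats are JSON-serializable; NumPy arrays are not
params = {
    "coef": model.coef_[0].tolist(),
    "intercept": float(model.intercept_[0]),
    "classes": model.classes_.tolist(),
}
text = json.dumps(params)

# Later, possibly on another platform: rebuild the prediction by hand
restored = json.loads(text)
w = np.array(restored["coef"])
b = restored["intercept"]
proba = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # P(class == classes[1])

print(np.allclose(proba, model.predict_proba(X)[:, 1]))
```

This only covers the prediction step; a Pipeline with several fitted steps would need each step's parameters exported the same way.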
Hello Jason,
I am new to machine learning. I am your big fan and read a lot of your blog and books. Thank you very much for teaching us machine learning.
I tried to pickle my model but failed. My model uses VGG16 and replaces the top layer for my classification solution. I further narrowed down the problem and found that it is the VGG16 model that fails to pickle. Please find my simplified code and error log below:
It will be highly appreciated if you can give me some direction on how to fix this error.
Thank you very much
———————————————————-
# Save Model Using Pickle
from keras.applications.vgg16 import VGG16
import pickle
model = VGG16(weights='imagenet', include_top=False)
filename = 'finalized_model.sav'
pickle.dump(model, open(filename, 'wb'))
—————————————————-
/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/samueltin/Projects/bitbucket/share-card-ml/pickle_test.py
Using TensorFlow backend.
Traceback (most recent call last):
File “/Users/samueltin/Projects/bitbucket/share-card-ml/pickle_test.py”, line 8, in <module>
pickle.dump(model, open(filename, ‘wb’))
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 1376, in dump
Pickler(file, protocol).dump(obj)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 224, in dump
self.save(obj)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 331, in save
self.save_reduce(obj=obj, *rv)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 425, in save_reduce
save(state)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 286, in save
f(self, obj) # Call unbound method with explicit self
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 655, in save_dict
self._batch_setitems(obj.iteritems())
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 669, in _batch_setitems
save(v)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 286, in save
f(self, obj) # Call unbound method with explicit self
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 606, in save_list
self._batch_appends(iter(obj))
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 621, in _batch_appends
save(x)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 331, in save
self.save_reduce(obj=obj, *rv)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 425, in save_reduce
save(state)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 286, in save
f(self, obj) # Call unbound method with explicit self
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 655, in save_dict
self._batch_setitems(obj.iteritems())
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 669, in _batch_setitems
save(v)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 286, in save
f(self, obj) # Call unbound method with explicit self
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 606, in save_list
self._batch_appends(iter(obj))
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 621, in _batch_appends
save(x)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 331, in save
self.save_reduce(obj=obj, *rv)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 425, in save_reduce
save(state)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 286, in save
f(self, obj) # Call unbound method with explicit self
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 655, in save_dict
self._batch_setitems(obj.iteritems())
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 669, in _batch_setitems
save(v)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 331, in save
self.save_reduce(obj=obj, *rv)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 425, in save_reduce
save(state)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 286, in save
f(self, obj) # Call unbound method with explicit self
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 655, in save_dict
self._batch_setitems(obj.iteritems())
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 669, in _batch_setitems
save(v)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 331, in save
self.save_reduce(obj=obj, *rv)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 425, in save_reduce
save(state)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 286, in save
f(self, obj) # Call unbound method with explicit self
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 655, in save_dict
self._batch_setitems(obj.iteritems())
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 669, in _batch_setitems
save(v)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 286, in save
f(self, obj) # Call unbound method with explicit self
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 655, in save_dict
self._batch_setitems(obj.iteritems())
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 669, in _batch_setitems
save(v)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 286, in save
f(self, obj) # Call unbound method with explicit self
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 606, in save_list
self._batch_appends(iter(obj))
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 621, in _batch_appends
save(x)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 331, in save
self.save_reduce(obj=obj, *rv)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 425, in save_reduce
save(state)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 286, in save
f(self, obj) # Call unbound method with explicit self
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 655, in save_dict
self._batch_setitems(obj.iteritems())
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 669, in _batch_setitems
save(v)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 331, in save
self.save_reduce(obj=obj, *rv)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 425, in save_reduce
save(state)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 286, in save
f(self, obj) # Call unbound method with explicit self
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 655, in save_dict
self._batch_setitems(obj.iteritems())
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 669, in _batch_setitems
save(v)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 331, in save
self.save_reduce(obj=obj, *rv)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 425, in save_reduce
save(state)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 286, in save
f(self, obj) # Call unbound method with explicit self
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 655, in save_dict
self._batch_setitems(obj.iteritems())
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 669, in _batch_setitems
save(v)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 286, in save
f(self, obj) # Call unbound method with explicit self
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 606, in save_list
self._batch_appends(iter(obj))
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 621, in _batch_appends
save(x)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 286, in save
f(self, obj) # Call unbound method with explicit self
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 568, in save_tuple
save(element)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 286, in save
f(self, obj) # Call unbound method with explicit self
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 655, in save_dict
self._batch_setitems(obj.iteritems())
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 669, in _batch_setitems
save(v)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py”, line 306, in save
rv = reduce(self.proto)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy_reg.py”, line 70, in _reduce_ex
raise TypeError, “can’t pickle %s objects” % base.__name__
TypeError: can’t pickle module objects
Process finished with exit code 1
Sorry Samuel, I have not tried to save a pre-trained model before. I don’t have good advice for you.
Let me know how you go.
I have trained a model using liblinearutils. The model could not be saved using pickle, as it gives an error that ctypes objects containing pointers cannot be pickled. How can I save my model?
Sorry Amy, I don’t have any specific examples to help.
Perhaps you can save the coefficients of your model to file?
Thanks a lot, very useful
You’re welcome!
My saved models are 500MB+ big... is that normal?
Ouch, that does sound big.
If your model is large (lots of layers and neurons) then this may make sense.
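If file size is the concern and the model is a scikit-learn estimator saved with joblib, the compress argument of joblib.dump() trades save/load time for a smaller file. A sketch, assuming a random forest on synthetic data; level 3 is an arbitrary middle setting, the file names are made up, and the actual savings depend on the model.

```python
# Sketch: compare uncompressed and compressed joblib files for the same model
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

tmp = tempfile.gettempdir()
plain_path = os.path.join(tmp, "rf_plain.joblib")
compressed_path = os.path.join(tmp, "rf_compressed.joblib")

joblib.dump(model, plain_path)                   # no compression (the default)
joblib.dump(model, compressed_path, compress=3)  # zlib, level 3

print("plain bytes:     ", os.path.getsize(plain_path))
print("compressed bytes:", os.path.getsize(compressed_path))

# Compressed files load the same way; no extra argument is needed
restored = joblib.load(compressed_path)
print(np.array_equal(restored.predict(X), model.predict(X)))
```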
How do I use the model file ("finalized_model.sav") to test unknown data? For example, if the model is a tagger, how will it tag the data in a text file? Is there any example?
You can load the saved model and start making predictions (e.g. yhat = model.predict(X)).
See this post on finalizing models:
http://machinelearningmastery.com/train-final-machine-learning-model/
Dear Sir, please advise on how to extract weights from a pickle dump. Thank you
I would suggest extracting coefficients from your model directly and saving them in your preferred format.
Hi I love your website; it’s very useful!
Are there any examples showing how to save out the training of a model after, say, 100 epochs/iterations? It's not immediately clear from looking at joblib or scikit-learn.
This is esp. useful when dealing with large datasets and/or computers or clusters which may be unreliable (e.g., subject to system reboots, etc.)
I’m not sure how to do this with sklearn. You may need to write something custom. Consider posting to stackoverflow.
Hey!
Is it possible to open my saved model and make a prediction on a cloud server where sklearn is not installed?
No.
You could save the coefficients from within the model instead and write your own custom prediction code.
Hello Jason and thank you very much, it’s been very helpful.
Do you know if it's possible to save and load feature transformations together with the ML model?
I'm mostly thinking of categorical variables that we need to encode into numerical ones.
I'm using sklearn to do that, but I don't know if we can (as in Spark) integrate this transformation with the ML model into the serialized file (Pickle or Joblib).
# Encode categorical variables into numerical ones
from sklearn.preprocessing import LabelEncoder
list_var = ['country', 'city']
encoder = LabelEncoder()
for i in list_var:
    df[i] = encoder.fit_transform(df[i])
Then I fit the model on the training dataset…
And I need to save this transformation with the model. Do you know if that's possible?
Thank you!
I’m not sure I follow sorry.
You can transform your data for your model, and you can apply this same transform in the future when you load your model.
You can save the transform objects using pickle. Is that what you mean?
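Note that the loop in the question refits a single LabelEncoder for every column, so afterwards it only remembers the mapping for the last column. A sketch of one way to keep the transforms, assuming one encoder per column stored in a dictionary and pickled in the same file as the model; the data and file name are made up. scikit-learn intends LabelEncoder for target labels, so OrdinalEncoder or OneHotEncoder inside a Pipeline is a common alternative for input features.

```python
# Sketch: keep one fitted encoder per column and pickle them with the model
import os
import pickle
import tempfile

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Hypothetical training data
df = pd.DataFrame({
    "country": ["fr", "us", "fr", "de", "us", "de"],
    "city": ["paris", "nyc", "lyon", "berlin", "boston", "munich"],
    "label": [0, 1, 0, 1, 1, 0],
})

list_var = ["country", "city"]
encoders = {}
for col in list_var:
    encoders[col] = LabelEncoder()  # a separate encoder per column
    df[col] = encoders[col].fit_transform(df[col])

model = LogisticRegression().fit(df[list_var], df["label"])

path = os.path.join(tempfile.gettempdir(), "model_and_encoders.pkl")
with open(path, "wb") as f:
    pickle.dump({"model": model, "encoders": encoders}, f)

# Later: reload and apply the *same* fitted transforms to new rows
with open(path, "rb") as f:
    saved = pickle.load(f)

new = pd.DataFrame({"country": ["us"], "city": ["nyc"]})
for col in list_var:
    new[col] = saved["encoders"][col].transform(new[col])
pred = saved["model"].predict(new[list_var])
print(pred)
```

LabelEncoder.transform() raises a ValueError for a category it did not see during training, so new categories need a decision of their own.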
Hi Jason,
Kindly accept my encomiums for the illustrative lecture that you have delivered on Machine Learning using Python.
**********************************************
# save the model to disk
filename = 'finalized_model.sav'
joblib.dump(model, filename)
# sometime later…
# load the model from disk
loaded_model = joblib.load(filename)
result = loaded_model.score(X_test, Y_test)
print(result)
*******************************************************
After saving the model 'finalized_model.sav', how can I recall the saved model in a new session at a later date?
I would appreciate it if you could advise on this.
The code after “sometime later” would be in a new session.
Hello sir,
The above code saves the model and later we can check the accuracy also
but what do I have to do to predict the class of unknown data?
I mean, which function has to be called?
e.g.: 2,132,40,35,168,43.1,2.288,33
Can you suggest how to get the class of the above data through prediction?
thank you
Pass the input data to the predict() function and use the result.
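For a single new observation such as the one in the question, predict() expects a 2D array with one row. A sketch, assuming a model fitted on eight numeric features in the same column order as the Pima data; synthetic data replaces the real training set here (a stand-in for the loaded model), so the predicted class itself is meaningless.

```python
# Sketch: predict the class of one new row with a fitted 8-feature model
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in for the loaded model: fitted on 8 features like the Pima data
X, y = make_classification(n_samples=200, n_features=8, random_state=7)
loaded_model = LogisticRegression(max_iter=1000).fit(X, y)

# One new observation, features in the same order used for training
row = [2, 132, 40, 35, 168, 43.1, 2.288, 33]
X_new = np.array(row).reshape(1, -1)  # shape (1, 8): one sample, eight features

yhat = loaded_model.predict(X_new)         # predicted class label
proba = loaded_model.predict_proba(X_new)  # probability of each class
print(yhat[0], proba[0])
```

With the real model, any preprocessing applied to the training data must also be applied to the new row before calling predict().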
Can we use "pickling" to save an LSTM model, and to load or use a hard-coded pre-fit model to generate forecasts based on data passed in to initialize the model?
When I tried to use it, it gave me following error:
PicklingError: Can't pickle <class 'module'>: attribute lookup module on builtins failed
No.
See this tutorial on how to save Keras models:
http://machinelearningmastery.com/save-load-keras-deep-learning-models/
Great. It worked.
You are awesome Jason. Appreciated.
Glad to hear it.
tbh this is the best site on the web. Great!
I love your email subscriptions; as a beginner, I find them quite helpful.
Thanks, I’m glad to hear that.
Hi @Jason Brownlee, thanks for such an informative blog. Can you please guide me on a problem where I would like to retrain the .pkl model only with a new dataset containing a new class, while keeping the previous learning intact? I had thought that model.fit(dataset, label) would do that, but it forgets the previous learning. Please suggest some techniques for it.
Thanks
Sorry, I don’t follow. Can you please restate your question?
Hi Jason, I believe @vikash is looking for a way to continuously train the model with new examples after the initial training stage. This is something I am searching for as well. I know it is possible to retrain a model in tensorflow with new examples but I am not sure if it’s possible with sklearn.
To expand the question some more:
1. You train a model with sklearn.
2. You save it with pickle or joblib.
3. Then you get your hands on some new examples that were not available at the time of the initial training (step 1).
4. You load the previous model.
5. Now you try to train the model again using the new data without losing the previous knowledge.
Is step 5 possible with sklearn?
I have not updated a model in sklearn, but I would expect you can.
Here is an example of updating a model in Keras which may help in general principle:
https://machinelearningmastery.com/update-lstm-networks-training-time-series-forecasting/
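In scikit-learn, only estimators that implement partial_fit() (for example SGDClassifier, MultinomialNB or MLPClassifier) can be updated incrementally; calling fit() again retrains from scratch, which matches the behaviour described above. A sketch of steps 1 to 5, assuming an SGDClassifier on synthetic data split into "old" and "new" examples; the file name is made up. The set of classes is declared on the first partial_fit() call, so as far as I know a class that was not declared then cannot be added later.

```python
# Sketch: train, save, reload, and continue training with partial_fit()
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Simulate "old" data available at first training time and "new" data arriving later
X, y = make_classification(n_samples=600, n_features=10, random_state=1)
X_old, y_old = X[:400], y[:400]
X_new, y_new = X[400:], y[400:]

path = os.path.join(tempfile.gettempdir(), "sgd_model.pkl")

# Steps 1-2: initial training, then save
model = SGDClassifier(random_state=1)
model.partial_fit(X_old, y_old, classes=np.unique(y))  # all classes declared up front
with open(path, "wb") as f:
    pickle.dump(model, f)

# Steps 3-5: later, load and continue training on the new examples only
with open(path, "rb") as f:
    model = pickle.load(f)
model.partial_fit(X_new, y_new)  # updates the existing weights instead of resetting them
with open(path, "wb") as f:
    pickle.dump(model, f)

print(model.score(X, y))
```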
Hi Jason,
I need your guidance on updating saved pickle files with new data coming in for training.
I recall three methods. The first is online learning, which trains on every new observation as it comes in; in this case the model would always be biased towards new features, which I don't want.
The second is, whenever a set of n new observations arrives, to combine it with the previous data and retrain from scratch. I don't want to do that, as in a live environment it would take a lot of time.
The third is mini-batch learning. I know some algorithms like SGD use the partial_fit method to do this, but I also have other algorithms, like random forest, decision trees and logistic regression. I want to ask: can I update the previously trained pickle with new training data?
I am doing this for text classification. I read that if I do this, the updated pickled model will not pick up new features from the new data (made using TF-IDF or CountVectorizer), so it would be of little help.
Also, the domain is the same but the client (the project we are working for) is different. Instead of sharing old data with the new client (new project), could I take the model pickle trained for the old client and update it by training on the new client's data? Basically, I am transferring learning.
Great question.
This is a challenging problem to solve. Really, the solution must be specific to your project requirements.
A flexible approach may be to build-in capacity into your encodings to allow for new words in the future.
The simplest approach is to ignore new words.
These, and other strategies are testable. See how performance degrades under both schemes with out-of-band test data.
Thanks for sharing.
Is there any way I can make predictions on new data with only the saved model, calling this model from a new file? I have tried with the final instruction:
# load the model from disk
loaded_model = pickle.load(open(filename, 'rb'))
result = loaded_model.score(X_test, Y_test)
print(result)
but I have not managed to get it to work.
That is exactly what we do in this tutorial.
What is the problem exactly?
Hi Jason, I learn a lot reading your python books and blogs. Thank you for everything.
I'm having an issue when I work on text data with a loaded model in a different session. I fit and transform the training data with CountVectorizer and TF-IDF, then, as usual, only transform the test data with the fitted instances. But when I work with the loaded pretrained model in a different session, I have a problem with feature extraction: I can't just transform the test data, as that requires the fitted instance, which is not present in the current session. If I fit and transform on the test data only, model prediction performance drastically decreases, and I believe that is the wrong way of doing machine learning. So, how can I do feature extraction using CountVectorizer, TF-IDF or similar while working with a previously trained model?
I’m using spark ML but I think it would be the same for scikit-learn as well.
Perhaps you can pickle your data transform objects as well, and re-use them in the second session?
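In scikit-learn, one way to keep the fitted vectorizer and the model together is a Pipeline, so the saved file carries the vocabulary and IDF weights learned on the training data. A sketch with TfidfVectorizer, LogisticRegression and a tiny made-up corpus; Spark ML has its own pipeline persistence, which is not shown here.

```python
# Sketch: save a fitted TF-IDF vectorizer and classifier together as one pipeline
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus
texts = ["cheap pills now", "meeting at noon", "win cash now", "lunch meeting tomorrow"]
labels = [1, 0, 1, 0]

# The vectorizer is fitted inside the pipeline, on training data only
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipe.fit(texts, labels)

path = os.path.join(tempfile.gettempdir(), "text_pipeline.joblib")
joblib.dump(pipe, path)

# New session: load and predict on raw text; the vectorizer only transforms here
loaded = joblib.load(path)
preds = loaded.predict(["cash now", "noon meeting"])
print(preds)
```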
Hi Jason,
I trained a random forest model and saved it as a pickle file on my local desktop. I then copied that pickle file to my remote machine and tested the model with the same file, and it is giving incorrect predictions. I am using Python 3.6 locally and Python 3.4 on the remote machine; however, the scikit-learn versions are the same. Any ideas why this may be happening?
No idea, perhaps see if the experiment can be replicated on the same machine? or different machines with the same version of Python?
Hi Jason Brownlee,
I have a LogisticRegression model for binary classification. For a given test data point, I wish to find similar data points in the trained model, so that I can show which similar data points were predicted with the same class.
Could you please share your thoughts on this? I am using scikit-learn logistic regression.
Thanks
Perhaps you could find data points with a low Euclidean distance from each other?
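A sketch of that idea using scikit-learn's NearestNeighbors, assuming the training data is kept (or saved) alongside the model, since a fitted LogisticRegression does not store the training points. The data and query point are synthetic, and restricting the search to training points the model assigns to the predicted class is one possible choice.

```python
# Sketch: find the closest training points of the same predicted class
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

X_train, y_train = make_classification(n_samples=300, n_features=6, random_state=5)
model = LogisticRegression().fit(X_train, y_train)

x_query = X_train[:1] + 0.1  # a hypothetical new test point, shape (1, 6)
predicted = model.predict(x_query)[0]

# Search only among training points that the model assigns to the same class
same_class = X_train[model.predict(X_train) == predicted]
nn = NearestNeighbors(n_neighbors=3).fit(same_class)  # Euclidean distance by default
distances, indices = nn.kneighbors(x_query)

print(predicted)
print(same_class[indices[0]])
print(distances[0])
```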
Hi Jason –
If you pickle a model trained on a subset of features, is it possible to view these features after loading the pickled model in a different file? For example: original df has features a,b,c,d,e,f. You train the model on a,c,e. Is it possible to load the pickled model in a separate script and see the model was trained on a,c,e?
Thanks,
James
Yes, you can save your model, load your model, then use it to make predictions on new data.
Hi Jason,
Thanks for explaining it so nicely. I am new to this and will be needing your guidance. I have data using which I have trained the model. Now I want this model to predict an untested data set. However, my requirement is an output which will have the data and corresponding prediction by the model. For example, record 1 – type a, record 2 – type a, record 3 – type c and so on. Could you please guide me on this?
You can provide predictions one at a time or in a group to the model and the predictions will be in the same order as the inputs.
Does that help?
Hi,
I am using the chunksize option of read_csv in pandas to build the model iteratively and save it. But it always saves the model built on the last chunk, not the entire model. Can you help me with it?
clf_SGD = SGDClassifier(loss='modified_huber', penalty='l2', alpha=1e-3, max_iter=500, random_state=42)
for chunk in pd.read_csv("file_name", chunksize=1000):
    """
    data preparation and cleaning
    """
    hashing = hv.fit_transform(X_train['description'])
    clf_SGD.partial_fit(hashing, y_train, classes=y_classes)
    joblib.dump(clf_SGD, source_folder + os.path.sep + 'text_clf_sgd.pkl')
Sorry, I’m not sure I follow, could you please try reframing your question?
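One way to arrange this, assuming the goal is a single model trained over every chunk: create the HashingVectorizer and SGDClassifier once, call partial_fit() on each chunk's own columns, and save after the loop. HashingVectorizer is stateless, so transform() can be used on every chunk without refitting. The CSV content, the column names (description, label) and the file name are hypothetical; io.StringIO stands in for the real file.

```python
# Sketch: train one SGDClassifier over all CSV chunks, then save once
import io
import os
import tempfile

import joblib
import pandas as pd
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical CSV with a text column and a label column
csv_text = "description,label\n" + "\n".join(
    f"{'cheap pills offer' if i % 2 else 'team meeting notes'} {i},{i % 2}" for i in range(50)
)

hv = HashingVectorizer(n_features=2**18)  # stateless: no vocabulary to fit
clf_SGD = SGDClassifier(loss="modified_huber", penalty="l2", alpha=1e-3, random_state=42)
y_classes = [0, 1]  # every class must be known before the first chunk

for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=10):
    # (data preparation and cleaning would go here)
    hashing = hv.transform(chunk["description"])
    clf_SGD.partial_fit(hashing, chunk["label"], classes=y_classes)

# Save once, after all chunks have been seen
path = os.path.join(tempfile.gettempdir(), "text_clf_sgd.pkl")
joblib.dump(clf_SGD, path)
preds = joblib.load(path).predict(hv.transform(["cheap pills offer", "team meeting notes"]))
print(preds)
```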
Hi Jason,
This is extremely helpful and saved me quite a bit of processing time.
I was training a Random Forest Classifier on 250MB of data, which took 40 minutes to train every time, but the results were as accurate as required. The joblib method created a 4GB model file, but the time was cut down to 7 minutes to load. That was helpful, but the results became inaccurate, or at least varied quite a bit from the original results. I use an average of 2 Decision Trees and 1 Random Forest for the model. The Decision Tree models have kept their consistency between loading and training, but the RF hasn't. Any ideas?
Thank you very useful!!
You’re welcome.
Hello, if I load the model
loaded_model = joblib.load(filename)
result = loaded_model.score(X_test, Y_test)
print(result)
can I use this model to make predictions on other test sets?
Sure.
Hi Jason,
How do I generate a new X_test for prediction? This new X_test needs to have the same parameters (features) that the model was trained with.
Background: I am basically saving the model and predicting with new values from time to time. How do we check whether the new values have all the parameters and the correct data types?
Visualization and statistics.
I have many posts on the topic, try the search box.
Jason, very good article. As asked by others, in my case I am using DecisionTreeClassifier with a text-feature-to-int transformation. Even though you mentioned that the transformation map can also be pickled and read back, is there any example available? Will it be stored in the same file or in another file?
In a separate file.
Thank you so much, professor.
We gained a lot of new knowledge.
You’re welcome. Also, I’m not a professor.
Hi sir,
I would like to save the predicted output as a CSV file. After running the ML model, I would like to save the variable "y_predicted". I'm using Python 3.5.x and have the pandas, sklearn and tensorflow libraries.
You can save the NumPy array as a CSV file, for example with numpy.savetxt():
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.savetxt.html
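A minimal sketch with numpy.savetxt(), assuming y_predicted is a 1D NumPy array of integer class predictions; the values and file name are made up.

```python
# Sketch: write a 1D array of predictions to a CSV file with a header line
import os
import tempfile

import numpy as np

y_predicted = np.array([0, 1, 1, 0, 1])  # hypothetical predictions
path = os.path.join(tempfile.gettempdir(), "y_predicted.csv")

# fmt="%d" writes integer class labels; use e.g. "%.6f" for regression outputs.
# comments="" stops savetxt from prefixing the header with "# ".
np.savetxt(path, y_predicted, fmt="%d", delimiter=",", header="prediction", comments="")

with open(path) as f:
    content = f.read()
print(content)
```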
Hi Jason,
I would like to save the predicted output as a CSV file. After running the ML model, I would like to save the variable "y_predicted". How can I save the final Naive Bayes, SVM, RF and DT classification predictions for all samples as a .csv file with three columns, namely Sample, Actual value and Prediction value?
Perhaps create a dataframe with all the columns you require and save the dataframe directly via to_csv():
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
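A sketch of that approach, assuming the true labels and predictions are arrays of equal length. The classifier (a decision tree on synthetic data), column names and file name are illustrative; the same pattern works for Naive Bayes, SVM or random forest predictions.

```python
# Sketch: save sample number, actual label and prediction to one CSV file
import os
import tempfile

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, n_features=4, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2)
model = DecisionTreeClassifier(random_state=2).fit(X_train, y_train)

results = pd.DataFrame({
    "Sample": range(1, len(y_test) + 1),  # 1-based sample number
    "Actual": y_test,
    "Prediction": model.predict(X_test),
})
path = os.path.join(tempfile.gettempdir(), "predictions.csv")
results.to_csv(path, index=False)  # index=False: no extra unnamed index column
print(results.head())
```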
I have a list of regression coefficients from a paper. Is there a way to load these coefficients into the sklearn logistic regression function to try and reproduce their model?
Thanks!
Tommy
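No model is needed: multiply each input by its coefficient, sum the results and add the intercept. For logistic regression, passing that weighted sum through the logistic (sigmoid) function gives the predicted probability.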
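A minimal sketch of scoring with published logistic regression coefficients in plain Python. The intercept, coefficient values and feature order are hypothetical placeholders for the ones reported in the paper, and a 0.5 threshold is assumed for the class decision.

```python
# Sketch: logistic regression prediction from reported coefficients, standard library only
import math

# Hypothetical values standing in for the paper's reported estimates
intercept = -1.5
coefficients = [0.8, -0.4, 0.05]  # one weight per input feature, in the paper's order


def predict_proba(features):
    """Probability of the positive class for one observation."""
    z = intercept + sum(w * x for w, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function


def predict(features, threshold=0.5):
    """Class label (0 or 1) using the assumed threshold."""
    return int(predict_proba(features) >= threshold)


p = predict_proba([2.0, 1.0, 10.0])
print(p, predict([2.0, 1.0, 10.0]))
```

Inputs must be on the same scale and coding the paper used (for example, if the authors standardized variables, the new data must be standardized the same way).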
Hi all,
I am using scikit-learn 0.19.1.
I generated a training model using random forest and saved the model. This was done on Ubuntu 16.01 x86_64.
I copied the model to a Windows 10 64-bit machine and wanted to reuse the saved model, but unfortunately I get the following error:
Traceback (most recent call last):
File “C:\Users\PC\Documents\Vincent\nicholas\feverwizard.py.py”, line 19, in <module>
rfmodel=joblib.load(modelfile)
File “C:\Python27\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py”, line 578, in load
obj = _unpickle(fobj, filename, mmap_mode)
File “C:\Python27\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py”, line 508, in _unpickle
obj = unpickler.load()
File “C:\Python27\lib\pickle.py”, line 864, in load
dispatch[key](self)
File “C:\Python27\lib\pickle.py”, line 1139, in load_reduce
value = func(*args)
File “sklearn\tree\_tree.pyx”, line 601, in sklearn.tree._tree.Tree.__cinit__
ValueError: Buffer dtype mismatch, expected ‘SIZE_t’ but got ‘long long’
What could be happening? Is it because of the switch from Ubuntu to Windows? I am able to reuse the model on Ubuntu, however.
Perhaps the pickle file is not portable across platforms?
Can we load a model trained on a 64-bit system on a 32-bit operating system?
I’m skeptical that it would work. Try it and see. Let me know how you go.
Dear Jason :
Thank you for ‘le cours’ which is very comprehensive.
I have a maybe tricky but 'could be very useful' question about my newly created standard Python object.
Is it possible to integrate a call to my Python object in a Fortran program ?
Basically I have a deterministic model in which I would like to make recursive calls to my Python object at every time step.
Do I need some specific libraries ?
Thank you
Best regards
You’re welcome.
I suspect it is possible. It’s all just code at the end of the day. You might need some kind of Python-FORTRAN bridge software. I have not done this, sorry.