Deploy Your Predictive Model To Production

5 Best Practices For Operationalizing Machine Learning.

Not all predictive models are at Google-scale.

Sometimes you develop a small predictive model that you want to put in your software.

I recently received this reader question:

Actually, there is a part that is missing in my knowledge about machine learning. All tutorials give you the steps up until you build your machine learning model. How could you use this model?

In this post, we look at some best practices to ease the transition of your model into production and ensure that you get the most out of it.

How To Deploy Your Predictive Model To Production. Photo by reynermedia, some rights reserved.

I Have a Model. Now What?

So you have been through a systematic process and created a reliable and accurate model that can make predictions for your problem.

You want to use this model somehow.

  • Maybe you want to create a standalone program that can make ad hoc predictions.
  • Maybe you want to incorporate the model into your existing software.

Let’s assume that your software is modest. You are not looking for Google-sized scale deployment. Maybe it’s just for you, maybe just a client or maybe for a few workstations.

So far so good?

Now we need to look at some best practices to put your accurate and reliable model into operations.

5 Model Deployment Best Practices

Why not just slap the model into your software and release?

You could. But by adding a few additional steps you can build confidence that the model that you’re deploying is maintainable and remains accurate over the long term.

Have you put a model into production?
Please leave a comment and share your experiences.

Below are five best-practice steps that you can take when deploying your predictive model into production.

1. Specify Performance Requirements

You need to clearly spell out what constitutes good and bad performance.

This may be accuracy, false positives, or whatever metrics are important to the business.

Spell them out, and use the current model you have developed to set the baseline numbers.

These numbers may be increased over time as you improve the system.

Performance requirements are important. Without them, you will not be able to set up the tests you will need to determine whether the system is behaving as expected.

Do not proceed until you have agreed upon a minimum, a mean, or a performance range expectation.
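
For example, the agreed requirements can live in code as plain configuration that your tests read. This is only a sketch: the metric names and thresholds below are assumptions, not numbers from this post.

```python
# Illustrative performance requirements; agree on your own metrics and
# thresholds with the business, and version this alongside the model.
PERFORMANCE_REQUIREMENTS = {
    "min_accuracy": 0.90,             # hard floor: tests fail below this
    "max_false_positive_rate": 0.05,
    "baseline_accuracy": 0.93,        # the current model, used as the reference point
}

def meets_requirements(accuracy, false_positive_rate, req=PERFORMANCE_REQUIREMENTS):
    """Return True if a candidate model satisfies the agreed minimums."""
    return (accuracy >= req["min_accuracy"]
            and false_positive_rate <= req["max_false_positive_rate"])
```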

2. Separate Prediction Algorithm From Model Coefficients

You may have used a library to create your predictive model. For example, R, scikit-learn or Weka.

You can choose to deploy your model using that library or re-implement the predictive aspect of the model in your software. You may even want to set up your model as a web service.
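
If a web service fits your situation, wrapping the prediction call is only a few lines in a framework like Flask. This is a sketch, not a recommended design: the route, model file name and JSON input format are assumptions for illustration.

```python
# Minimal sketch of a prediction web service (Flask + a pickled scikit-learn
# model). File name, route and input format are illustrative assumptions.
from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)
model = joblib.load("model_v1.joblib")  # the serialized model artifact (see 2b below)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]  # e.g. [5.1, 3.5, 1.4, 0.2]
    prediction = model.predict([features])[0]
    return jsonify({"prediction": float(prediction)})  # assumes a numeric output

if __name__ == "__main__":
    app.run(port=5000)
```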

Regardless, it is good practice to separate the algorithm that makes predictions from the model internals, that is, the specific coefficients or structure learned from your training data.

2a. Select or Implement The Prediction Algorithm

Often the complexity of a machine learning algorithm is in the model training, not in making predictions.

For example, making predictions with a regression algorithm is quite straightforward and easy to implement in your language of choice. This is an example of an algorithm that is obvious to re-implement rather than depending on the library used to train the model.
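
As a sketch of how little code the prediction side can need, here is linear regression prediction re-implemented without any library; the coefficients are placeholders, not values from a real model.

```python
# Linear regression prediction re-implemented in plain Python.
# The coefficients and intercept are placeholders you would export from training.
COEFFICIENTS = [0.25, -1.10, 0.75]
INTERCEPT = 3.2

def predict(row):
    """yhat = intercept + sum(coefficient_i * input_i)"""
    return INTERCEPT + sum(c * x for c, x in zip(COEFFICIENTS, row))

print(predict([1.0, 0.5, 2.0]))  # 3.2 + 0.25 - 0.55 + 1.5 = 4.4
```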

If you decide to use the library to make predictions, get familiar with the API and with the dependencies.

The software used to make predictions is just like all the other software in your application.

Treat it like software.

Implement it well, write unit tests, make it robust.

2b. Serialize Your Model Coefficients

Let’s call the numbers or structure learned by the model: coefficients.

This data is really configuration for your application.

Treat it like software configuration.

Store it in an external file with the software project. Version it. Treat configuration like code because it can just as easily break your project.

You very likely will need to update this configuration in the future as you improve your model.
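
As a sketch with scikit-learn, you could serialize the fitted model, or export only the learned coefficients as a plain, diff-able file kept under version control; the dataset and file names below are stand-ins.

```python
# Sketch: store the learned model or just its coefficients as versioned artifacts.
import json
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in training data; in practice use your own prepared dataset.
X_train, y_train = make_classification(n_samples=200, n_features=4, random_state=1)
model = LogisticRegression().fit(X_train, y_train)

# Option 1: serialize the whole fitted model.
joblib.dump(model, "model_v1.joblib")

# Option 2: export only the coefficients as plain, diff-able configuration.
with open("coefficients_v1.json", "w") as f:
    json.dump({"coef": model.coef_.tolist(),
               "intercept": model.intercept_.tolist()}, f)
```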

3. Develop Automated Tests For Your Model

You need automated tests to prove that your model works as you expect.

In software land, we call these regression tests. They ensure the software has not regressed in its behavior as we make changes to other parts of the system.

Write regression tests for your model.

  • Collect or contrive a small sample of data on which to make predictions.
  • Use the production algorithm code and configuration to make predictions.
  • Confirm the results are expected in the test.

These tests are your early warning alarm. If they fail, your model is broken and you can’t release the software or the features that use the model.

Make the tests strictly enforce the minimum performance requirements of the model.
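
Here is a sketch of such a test, pytest style; the file paths and the accuracy floor are assumptions to be replaced with your own contrived cases and agreed requirements.

```python
# Regression test sketch for the production model (run with pytest).
# Paths and the 0.90 floor are illustrative; use your agreed requirements.
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score

MIN_ACCURACY = 0.90

def test_model_meets_minimum_accuracy():
    model = joblib.load("model_v1.joblib")              # production artifact
    cases = pd.read_csv("tests/contrived_cases.csv")    # small, well-understood sample
    X, y = cases.drop(columns=["target"]), cases["target"]
    assert accuracy_score(y, model.predict(X)) >= MIN_ACCURACY
```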

I strongly recommend contriving test cases that you understand well, in addition to any raw datasets from the domain you want to include.

I also strongly recommend gathering outlier and interesting cases from operations over time that produce unexpected results (or break the system). These should be understood and added to the regression test suite.

Run the regression tests after each code change and before each release. Run them nightly.

4. Develop Back-Testing and Now-Testing Infrastructure

The model will change, as will the software and the data on which predictions are being made.

You want to automate the evaluation of the production model with a specified configuration on a large corpus of data.

This will allow you to efficiently back-test changes to the model on historical data and determine if you have truly made an improvement or not.

This is not the small dataset that you may use for hyperparameter tuning; this is the full suite of data available, perhaps partitioned by month, year or some other important demarcation.

  • Run the current operational model to baseline performance.
  • Run new models, competing for a place to enter operations.

Once set up, run it nightly or weekly and have it spit out automatic reports.
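
One way to sketch that harness is to score the production model partition by partition over the historical data and write a report. The column names, paths and metric below are assumptions.

```python
# Back-test sketch: evaluate the production model on each monthly partition
# of the historical data. Column names, paths and metric are assumptions.
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score

model = joblib.load("model_v1.joblib")
history = pd.read_csv("history.csv", parse_dates=["date"])

rows = []
for month, chunk in history.groupby(history["date"].dt.to_period("M")):
    X, y = chunk.drop(columns=["date", "target"]), chunk["target"]
    rows.append({"month": str(month), "accuracy": accuracy_score(y, model.predict(X))})

pd.DataFrame(rows).to_csv("backtest_report.csv", index=False)  # the automatic report
```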

Next, add a Now-Test.

This is a test of the production model on the latest data.

Perhaps it’s the data from today, this week or this month. The idea is to get an early warning that the production model may be faltering.

This can be caused by concept drift, where the relationships in the data exploited by your model are subtly changing with time.

This Now-Test can also spit out reports and raise an alarm (by email) if performance drops below minimum performance requirements.
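
A minimal sketch of the alarm side, assuming the Now-Test has already computed the latest accuracy; the threshold, addresses and mail host are placeholders.

```python
# Now-Test alarm sketch: email the team if the latest accuracy falls below
# the minimum requirement. Threshold, addresses and SMTP host are placeholders.
import smtplib
from email.message import EmailMessage

MIN_ACCURACY = 0.90

def alert_if_faltering(latest_accuracy):
    if latest_accuracy >= MIN_ACCURACY:
        return
    msg = EmailMessage()
    msg["Subject"] = f"Model performance alarm: accuracy {latest_accuracy:.3f}"
    msg["From"] = "alerts@example.com"
    msg["To"] = "team@example.com"
    msg.set_content("The production model fell below the minimum performance requirement.")
    with smtplib.SMTP("localhost") as server:  # assumes a local mail relay
        server.send_message(msg)
```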

5. Challenge Then Trial Model Updates

You will need to update the model.

Maybe you devise a whole new algorithm which requires new code and new config. Revisit all of the above points.

A smaller and more manageable change would be to the model coefficients. For example, perhaps you set up a grid or random search of model hyperparameters that runs every night and spits out new candidate models.

You should do this.
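
Here is a sketch of what that nightly search could look like with scikit-learn's RandomizedSearchCV; the estimator, parameter range and data are placeholders, and every candidate it produces still has to pass the tests above before it goes anywhere near production.

```python
# Nightly candidate search sketch. Estimator, parameter range and data loading
# are placeholders; candidates must still pass the back-test, now-test and
# regression tests before replacing the production model.
import joblib
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=7)  # stand-in data

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e3)},
    n_iter=20, cv=5, scoring="accuracy", random_state=7,
)
search.fit(X, y)
joblib.dump(search.best_estimator_, "candidate_model.joblib")
print("best candidate accuracy:", search.best_score_)
```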

Test the model and be highly critical. Give a new model every chance to slip up.

Evaluate the performance of the new model using the Back-Test and Now-Test infrastructure in Point 4 above. Review the results carefully.

Evaluate the change using the regression test, as a final automated check.

Test the features of the software that make use of the model.

Perhaps roll the change out to some locations or in a beta release for feedback, again for risk mitigation.

Accept your new model once you are satisfied that it meets the minimum performance requirements and betters prior results.

Like a ratchet, consider incrementally updating performance requirements as model performance improves.

Summary

Adding a small model to operational software is very achievable.

In this post, you discovered 5 steps to make sure you cover your bases and are following good engineering practices.

In summary, these steps were:

  1. Specify Performance Requirements.
  2. Separate Prediction Algorithm From Model Coefficients.
  3. Develop Automated Tests For Your Model.
  4. Develop Back-Testing and Now-Testing Infrastructure.
  5. Challenge Then Trial Model Updates.

If you’re interested in more information on operationalizing machine learning models, check out the post:

This is more on the Google-scale machine learning model deployment. Watch the video mentioned and review the great links to both the AirBnB and Etsy production pipelines.

Do you have any questions about this post or putting your model into production?
Ask your question in the comments and I will do my best to answer.

71 Responses to Deploy Your Predictive Model To Production

  1. SalemAmeen October 1, 2016 at 12:19 pm #

    Many thanks

    • Jason Brownlee October 1, 2016 at 12:31 pm #

      I’m glad you found it useful.

      • vishnu prasad June 23, 2017 at 1:51 pm #

        Jason – I want to build ecommerce streaming-based recommendations. The key entities I am considering are clickstream events like web logs to capture page hits for products, a real-time feed of product features and categories, and orders in real-time.

        Outside of this I am also adding a few business boosters as a boost before ranking them.

        I am not clear conceptually: when doing real-time processing on large data through streaming, will these ML algorithms even scale, or should I go for a lambda architecture which works in offline batches instead of real-time?

        Again, if I have to add something like clustering algorithms/PCA for dimensionality reduction in such high-volume transactions for real-time processing – will it scale, given that each model would take time to execute?

        • Jason Brownlee June 24, 2017 at 7:58 am #

          Sorry, I do not have direct experience with streaming data, I cannot give you expert advice without doing research.

  2. Gabe October 3, 2016 at 1:48 am #

    Great write up. I think this topic is sorely underdocumented. Thanks!

  3. mirsci October 20, 2016 at 3:51 am #

    Hi Jason, thank you for all the insightful and concrete posts on ML, they are always extremely helpful!

    What would be the best approach to create test cases for ML systems in a reliable way, to be able to reveal faults and defects in the ML algorithms?

    This paper https://www.cs.upc.edu/~marias/papers/seke07.pdf captures great approaches and I am wondering if there are any new techniques which you can share on this.

    Thanks!

    • Jason Brownlee October 20, 2016 at 8:39 am #

      Once you get a fault, those cases make excellent candidates.

      Generally, it is a good idea to get a system tester involved who can dream up evil cases.

  4. Pallavi January 13, 2017 at 11:24 am #

    Hi Jason,
    Thanks for a great article. I was wondering: what if my model is a black-box model, or an ensemble of black-box models? In this case, I do not have an easy equation describing the model. In such a case, how is model implementation handled in production?
    Thanks,
    Pallavi

    • Jason Brownlee January 15, 2017 at 5:14 am #

      Hi Pallavi, does not having an easy mathematical way to describe the model prevent you from using it to add value in a production environment?

      If it is a matter of risk, can the risk be mitigated?

  5. Jorgen June 1, 2017 at 4:20 am #

    Hi Jason,

    can you recommend any literature on the subject (books, articles) or systems for deploying ?

    Kind regards,

    Jørgen

    • Jason Brownlee June 2, 2017 at 12:52 pm #

      Not really, sorry. Information is very specific to your problem/business.

  6. TomK June 20, 2017 at 5:39 am #

    Hi,

    In order to deploy predictive models in production you can try using a scoring engine – try http://scoring.one .

    Many of the mentioned features are implemented – one can deploy models from various environments.

    • Jason Brownlee June 20, 2017 at 6:42 am #

      Thanks for the suggestion. Have you tried it or do you work there?

  7. Brandon Hill October 6, 2017 at 7:52 am #

    There is definitely an emerging market of solutions to ease some of the deployment pains. TomK mentioned one. http://opendatagroup.com is another. At the moment, solutions in the space tend to focus on being model-language agnostic (R, Python, Matlab, Java, C, SAS, etc.). They package up your model into an easy-to-deploy, scalable microservice. You can then set input and output sources for your model service to read and write from. The next facet is providing tools to monitor the performance metrics of your models, and manage the upgrading of models as new models are developed. Since many companies are still developing their data science strategy and infrastructure, I think a key point is flexibility. Look for solutions that have the flexibility to continue to connect with different data and messaging sources as your IT department continues to evolve the infrastructure.

  8. Hemanth October 15, 2017 at 4:49 am #

    Hi Jason,

    In order to deploy the code, how should the script be structured?

    Should the whole code be in a function, so that we can run the function with the required arguments each time?
    or
    Is there a better way to write the code for such machine learning problems, as many people write separate chunks of code for data processing, modelling, evaluation etc.?

    But what will make the prediction object that was created work on new data?

    • Jason Brownlee October 15, 2017 at 5:22 am #

      These questions are specific to your project, I cannot give general answers.

  9. Dinesh November 23, 2017 at 11:51 pm #

    One such use case

    “How redBus uses Scikit-Learn ML models to classify customer complaints?” https://medium.com/redbus-in/how-to-deploy-scikit-learn-ml-models-d390b4b8ce7a

    • Jason Brownlee November 24, 2017 at 9:45 am #

      Thanks for sharing.

    • Narsinha February 13, 2020 at 6:44 pm #

      Really nice article, thanks for sharing!!

  10. Orges Leka December 1, 2017 at 5:05 pm #

    Maybe you want to take a look at https://github.com/orgesleka/webscikit. It is a webserver written in python which can hold multiple models at different urls. Models can be deployed later while the server is online. It is still work in progress, but I would be happy to hear your opinion about it.

    Kind regards
    Orges Leka

  11. srishti February 7, 2018 at 2:45 am #

    Hi

    Thanks for the post. I need to create an API for my model; can you please help me with how I should go about it?

    Thanks

  12. Aboubakar April 10, 2018 at 2:25 am #

    Thanks for this topic.

    I actually use Weka (GUI) to create predictive models with neural networks. I don't use the Java API and I don't write code any more. My question is: how can I deploy a model built in the Weka GUI?

    Remark: I can program in several programming languages (Java, Python etc.), but I use Weka Explorer (graphical user interface).

    Thanks.

    • Jason Brownlee April 10, 2018 at 6:23 am #

      I would guess that deploying a Weka model would require that you use the Java API.

  13. Nirosha April 12, 2018 at 6:35 am #

    Great article ! Thanks a lot.

  14. Mohammed Abdul Raoof April 27, 2018 at 10:47 pm #

    Hi Jason, can you provide model code?

    • Jason Brownlee April 28, 2018 at 5:30 am #

      The blog is full of examples of model code. You can use the search function to find it.

  15. Johnny May 15, 2018 at 2:04 am #

    Training with Keras in Python should naturally lead to a Python API server as a sensible choice for serving predictions to users. I tried Flask, with gunicorn.

    However, making the server performant and able to handle multiple simultaneous requests is awkward.

    Multiprocessing with Keras models is hard; by default they are unpicklable. If you hack them to make them picklable, they are still so large that the overhead of pickling these complex objects makes multiprocessing really slow.

    Therefore you have to use multi-threading instead.
    Even here you have to add some weird, arbitrary-seeming lines to get TensorFlow to behave:

    self.graph = tf.get_default_graph()
    after loading model in main thread,
    then
    self.graph.as_default()

    when using in the child threads?

    You might also have to call model._make_predict_function() in the main thread before spawning workers.

    I honestly have no idea how/why multithreading Keras models was being dodgy, so who knows why one or both of the above make it work \_0_o_/

    There are multiple different GitHub issues for it, with seemingly different random workarounds that work for people's specific cases.

    What you also have to note is that with multithreading you are still limited by the GIL. If you have any CPU-bound work (i.e. possibly some of the input preprocessing), even if your server can now take multiple requests at a time, the GIL will still block/limit you to one CPU core.
    To avoid this I try to have all pre-processing of inputs done externally to the API server, so all the API has to do is literally call predict and return results.

    (I'm not sure whether simultaneous predicts in different threads/requests are blocked by the GIL. Apparently some numpy operations are, some aren't… not sure what that means for TensorFlow predictions.)

    • Jason Brownlee May 15, 2018 at 7:58 am #

      When the weights are fixed, the model is read-only, assuming you’re not using an RNN.

      In that case, you can use the model in parallel. If TF/Keras is giving problems and the model is small, you can use the weights to directly calculate the forward propagation and output using numpy.

  16. Kim Miller July 13, 2018 at 11:36 am #

    Quick Practical question: Your Scaler is saved from training and used pretty much indefinitely forward? e.g.:
    >>> from sklearn.externals import joblib
    >>> joblib.dump(clf, 'filename.pkl')
    >>> clf = joblib.load('filename.pkl')

    What if one later does “added” training from saved weights? Should that added training use the old scaler, or should one create a new scaler whenever there is a larger amount of new training data used to update the weights?

    • Jason Brownlee July 14, 2018 at 6:11 am #

      Probably use the old scaler.

      If fitting the model is fast, it might be easier to re-fit from scratch.

      • Kim Miller July 18, 2018 at 7:29 am #

        So whenever a larger amount of training data is used for further training, feel free to create a new scaler. Got it. But probably not when doing “on-line” learning with daily data updates, I assume.

        • Jason Brownlee July 18, 2018 at 2:45 pm #

          Sounds good.

          As with everything, I recommend that you codify the strategy and test it. Results are worth more than plans.

  17. KK September 6, 2018 at 2:20 pm #

    Hello Jason,

    In the case of parametric model deployment, we do a lot of data preprocessing to get the most accurate model. When deploying these models to production, how can we take care of the data preprocessing steps if we only deploy the coefficients or an approximation function?

    • Jason Brownlee September 7, 2018 at 8:02 am #

      Any data fed into the model MUST be prepared in the same way.

  18. meenal deshpande October 15, 2018 at 4:00 pm #

    Thanks for sharing.
    Good article to read.
    The way you have explained the concept helps me to understand.
    keep posting..

  19. Rudra January 8, 2020 at 3:03 am #

    Thank you very much for this. I am doing my first project in this field, which is a deep learning model for image classification. I am using a Python library for the user interface. Now I want to connect them both and deploy on AWS. Any help or suggestions would be of great help. Also, I don’t have any previous experience in building software.

  20. Felix Lin March 14, 2020 at 2:21 am #

    Hello! I want to make an unsupervised model for multi-sensor predictive fault discrimination, to help me determine when the machine stops working. How do I learn this?

  21. Felix Lin March 16, 2020 at 4:25 am #

    Yes, thank you very much. This is a time-series data fusion problem. Are there any related resources on data interaction?

  22. jose carrasquel April 12, 2020 at 8:14 am #

    I am using an LSTM approach for time series forecasting. I trained a model, and I can predict the next days’ natural gas consumption with good results, but when I thought about deployment, I found that I will need to retrain the model each time there is new data. This approach involves permanent maintenance work; is there any other solution or approach for time series forecasting? Thanks in advance.

  23. Aamir Iqbal April 14, 2020 at 1:41 am #

    Hi,

    This information is very helpful. I have confusion with one point, which is “Separation of prediction algorithm from the model coefficients”. Can you please explain it in detail or recommend any other useful resource to learn about it in detail?

    thanks

    • Jason Brownlee April 14, 2020 at 6:23 am #

      Yes, there is the data of the model, like coefficients, and the algorithm that uses the model data and input data to make predictions, the “prediction algorithm”.

      Does that help? Which part is confusing?

  24. Haftom May 2, 2020 at 8:05 pm #

    I need a little information on how to deploy a neural network model using PyCharm.

  25. Dina May 5, 2020 at 1:37 pm #

    Hi Jason, let’s say I want to deploy the machine learning model into production; what about the running time of fitting the model? Let’s say the end user wants to use the system. Does the model need to be fit again every time before we open the system? I am confused about this matter.

  26. Ben July 23, 2020 at 5:28 pm #

    Hello there.

    Two things came up for me during a production release of a model; for the last project it was a FastText model from the gensim package.

    As mentioned above in the comments, a webserver (I used Flask) is useful, as the software is encapsulated and only used for my different models. And then I got some problems pickling my model; I finally used dill (https://pypi.org/project/dill/). But I am a newbie in this area, so this topic of production deployment is still somewhat experimental for me 🙂

  27. John February 8, 2021 at 10:41 pm #

    Thanks for this great article that asks really good questions about deployment to production!

    I see several comments mentioning Pickle. In my opinion it’s definitely not the best solution, both for security and interoperability reasons. Ideally it’s best to develop a quick API and serve the models through this API.
    In Python the most common choices for ML APIs are Django + Django Rest Framework, Flask, or even more recently FastAPI.

    Maybe it’s also worth noting that new cloud services are now showing up and they can serve your models for you. For example you could serve your TensorFlow models in production using Hugging Face (https://huggingface.co/) or serve your spaCy NLP models using NLP Cloud (https://nlpcloud.io).

    Developing the API is only a small part of the job. The most difficult things with machine learning in production are memory usage, cost of hosting, ensuring high availability through scalability… People should be aware of this if they want to make sure their project succeeds.

  28. Colm May 17, 2021 at 7:46 pm #

    FYI typo at “In software land, we all these regression test”
    Amazing site BTW.

  29. DWin June 24, 2022 at 5:55 am #

    Hi Jason

    Thanks for the quality content – big fan. I am busy deploying an LSTM classifier into production. During modelling, I have split the data into three sets – train (roughly 50%); validation (17%) and test (to ensure consistency and reliance, roughly 33%). This then gave me predictions for the validation and test set – of which I would only look at test predictions for performance to remove risk of data leakage.

    After hyperparameter tuning and finding the “right” coefficients, I need to deploy the model. Do I then scrap the test set (as I have ticked the box that we aren’t overfitting) and re-split the data, including the latest data point, into a train and validation set only? I would want predictions daily, so would I then retrain daily and take the latest prediction out of the validation set? What would then be an appropriate train/validation split – something like 99:1, or a similar 70:30 split?

    Appreciate it

    • James Carmichael June 24, 2022 at 8:34 am #

      Hi DWin…While I cannot speak to your project directly, one approach could be to choose 99:1 and then adjust as needed over time. Hopefully you will be able to collect data during a “testing” phase with minimum risk.

  30. DWin June 24, 2022 at 8:28 pm #

    Hi James, thanks for the help.

    A question on that then – would this not invalidate your modelling? Some background: it is a financial time series predictor. For a train–validation split setup, the weights found to be the best fit for the (middle) validation period (17%) would be expected to change dramatically against the weights found to best fit the new 1% (end) validation period. Note, I drop the test set in the deployment.

    Another question would be how to deal with time series that greatly differ statistically in the various train-validation-test sets given structural changes (like we have seen in the recent past with Covid). Is there perhaps a better split method to use to ensure that each set is similar to one another without data leakage?

    Sorry for the long questions

  31. Nate Fastar December 11, 2022 at 5:53 am #

    I am actually in the process of deploying a predictive model to Production for the company I work at and I found this post quite enlightening as to the process. As I’ve never done anything like this before this post was a great help in both understanding concepts I had previously taken for granted and revealing gaps in my procedure that would have caused problems down the road for maintainability.

    In particular I appreciate the tips brought up in sections 1 and 2. As I’m the only Data Scientist in my organization, clearly defining expected benchmarks and baselines will not only help me make sure that the model is behaving appropriately, it will also help me to explain the model’s performance in more approachable terms to parts of the company with less experience in Data Science. Secondly, divorcing the algorithm from the model it produced is something that had not occurred to me, and will likely help to reduce the overall size and complexity of the program we deploy, as well as preventing any possibility of the training data leaking out or being stolen by those who could get hold of the algorithm.

    • James Carmichael December 11, 2022 at 9:36 am #

      Thank you for your feedback Nate! Let us know if we can help with any questions regarding our content!
