Deploy Your Predictive Model To Production

5 Best Practices For Operationalizing Machine Learning.

Not all predictive models are at Google-scale.

Sometimes you develop a small predictive model that you want to put in your software.

I recently received this reader question:

Actually, there is a part that is missing in my knowledge about machine learning. All tutorials give you the steps up until you build your machine learning model. How could you use this model?

In this post, we look at some best practices to ease the transition of your model into production and ensure that you get the most out of it.

How To Deploy Your Predictive Model To Production
Photo by reynermedia, some rights reserved.

I Have a Model. Now What?

So you have been through a systematic process and created a reliable and accurate model that can make predictions for your problem.

You want to use this model somehow.

  • Maybe you want to create a standalone program that can make ad hoc predictions.
  • Maybe you want to incorporate the model into your existing software.

Let’s assume that your software is modest. You are not looking for Google-sized scale deployment. Maybe it’s just for you, maybe just a client or maybe for a few workstations.

So far so good?

Now we need to look at some best practices to put your accurate and reliable model into operations.

5 Model Deployment Best Practices

Why not just slap the model into your software and release?

You could. But by adding a few additional steps you can build confidence that the model that you’re deploying is maintainable and remains accurate over the long term.

Have you put a model into production?
Please leave a comment and share your experiences.

Below are five best-practice steps that you can take when deploying your predictive model into production.

1. Specify Performance Requirements

You need to clearly spell out what constitutes good and bad performance.

This may be expressed as accuracy, false positives, or whatever metrics are important to the business.

Spell them out, and use the current model you have developed to set the baseline numbers.

These numbers may be increased over time as you improve the system.

Performance requirements are important. Without them, you will not be able to set up the tests you will need to determine whether the system is behaving as expected.

Do not proceed until you have agreed upon a minimum, a mean, or a range of expected performance.
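As a concrete illustration, these agreed thresholds can live in a small versioned file that the automated tests in Point 3 read directly. A minimal sketch; the metric names and numbers below are hypothetical:

```python
# performance_requirements.py
# Hypothetical thresholds agreed with the business; version this file so
# tests and reports share a single source of truth.
REQUIREMENTS = {
    "min_accuracy": 0.92,             # never release a model below this
    "max_false_positive_rate": 0.05,  # business-critical ceiling
    "baseline_accuracy": 0.95,        # measured performance of the current model
}
```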

2. Separate Prediction Algorithm From Model Coefficients

You may have used a library to create your predictive model. For example, R, scikit-learn or Weka.

You can choose to deploy your model using that library or re-implement the predictive aspect of the model in your software. You may even want to set up your model as a web service.

Regardless, it is good practice to separate the algorithm that makes predictions from the model internals: that is, the specific coefficients or structure learned from your training data.

2a. Select or Implement The Prediction Algorithm

Often the complexity of a machine learning algorithm is in the model training, not in making predictions.

For example, making predictions with a regression algorithm is quite straightforward and easy to implement in your language of choice. This is an obvious case where re-implementing the prediction step makes more sense than depending on the library used to train the model.
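For instance, prediction with a linear regression model reduces to a weighted sum. A minimal sketch in Python, assuming you have already extracted the learned coefficients and intercept from your training library:

```python
def predict(row, coefficients, intercept):
    """Linear regression prediction: intercept plus coefficient * input for each feature."""
    return intercept + sum(c * x for c, x in zip(coefficients, row))

# Hypothetical values, e.g. taken from scikit-learn's model.coef_ and model.intercept_.
yhat = predict([2.5, 1.0], coefficients=[0.4, -1.2], intercept=0.3)
```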

If you decide to use the library to make predictions, get familiar with the API and with the dependencies.

The software used to make predictions is just like all the other software in your application.

Treat it like software.

Implement it well, write unit tests, make it robust.

2b. Serialize Your Model Coefficients

Let’s call the numbers or structure learned by the model: coefficients.

These data are not code in your application.

Treat them like software configuration.

Store this configuration in an external file with the software project. Version it. Treat configuration like code because it can just as easily break your project.

You very likely will need to update this configuration in the future as you improve your model.
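A minimal sketch of this idea, storing hypothetical coefficients in a JSON file that lives alongside the project's other configuration:

```python
import json

# Save the learned model internals to a versioned, external configuration file.
model_config = {"version": "1.0.0", "intercept": 0.3, "coefficients": [0.4, -1.2]}
with open("model_config.json", "w") as f:
    json.dump(model_config, f, indent=2)

# At startup, the prediction code loads the same file.
with open("model_config.json") as f:
    model_config = json.load(f)
```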

3. Develop Automated Tests For Your Model

You need automated tests to prove that your model works as you expect.

In software land, we call these regression tests. They ensure the software does not regress in its behavior as we make changes to different parts of the system.

Write regression tests for your model.

  • Collect or contrive a small sample of data on which to make predictions.
  • Use the production algorithm code and configuration to make predictions.
  • Confirm the results are expected in the test.

These tests are your early warning alarm. If they fail, your model is broken and you can’t release the software or the features that use the model.

Make the tests strictly enforce the minimum performance requirements of the model.
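Here is a minimal sketch of such a test, assuming the hypothetical predict() function and model_config.json from Point 2 and the hypothetical REQUIREMENTS file from Point 1:

```python
import json
import unittest

from model import predict                          # hypothetical module from Point 2a
from performance_requirements import REQUIREMENTS  # hypothetical file from Point 1

class TestModelRegression(unittest.TestCase):
    def test_meets_minimum_accuracy(self):
        with open("model_config.json") as f:
            config = json.load(f)
        # A small, well-understood sample: (inputs, expected label) pairs.
        cases = [([2.5, 1.0], 0), ([5.0, 1.0], 1)]
        correct = 0
        for row, label in cases:
            score = predict(row, config["coefficients"], config["intercept"])
            if (1 if score >= 0.5 else 0) == label:
                correct += 1
        accuracy = correct / len(cases)
        self.assertGreaterEqual(accuracy, REQUIREMENTS["min_accuracy"])

if __name__ == "__main__":
    unittest.main()
```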

I strongly recommend contriving test cases that you understand well, in addition to any raw datasets from the domain you want to include.

I also strongly recommend gathering outlier and interesting cases from operations over time that produce unexpected results (or break the system). These should be understood and added to the regression test suite.

Run the regression tests after each code change and before each release. Run them nightly.

4. Develop Back-Testing and Now-Testing Infrastructure

The model will change, as will the software and the data on which predictions are being made.

You want to automate the evaluation of the production model with a specified configuration on a large corpus of data.

This will allow you to efficiently back-test changes to the model on historical data and determine if you have truly made an improvement or not.

This is not the small dataset that you may use for hyperparameter tuning; it is the full suite of data available, perhaps partitioned by month, year, or some other important demarcation.

  • Run the current operational model to baseline performance.
  • Run new models, competing for a place to enter operations.

Once set up, run it nightly or weekly and have it spit out automatic reports.
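A sketch of what that job might look like, assuming historical data has been archived by partition (e.g. by month) and a hypothetical evaluate() helper that scores one model configuration on one dataset:

```python
def back_test(model_config, partitions, evaluate):
    """Evaluate one model configuration across all historical partitions.

    partitions: mapping of partition name (e.g. '2016-09') to a held-out dataset.
    evaluate:   hypothetical function(model_config, dataset) -> accuracy score.
    """
    return {name: evaluate(model_config, dataset)
            for name, dataset in sorted(partitions.items())}

# Nightly job: baseline the production model, then score any challengers.
# baseline_report   = back_test(production_config, partitions, evaluate)
# challenger_report = back_test(candidate_config, partitions, evaluate)
```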

Next, add a Now-Test.

This is a test of the production model on the latest data.

Perhaps it’s the data from today, this week or this month. The idea is to get an early warning that the production model may be faltering.

This can be caused by concept drift, where the relationships in the data exploited by your model are subtly changing over time.

This Now-Test can also spit out reports and raise an alarm (by email) if performance drops below minimum performance requirements.
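The alarm logic itself can be simple. A sketch, reusing the hypothetical evaluate() helper and the minimum threshold from Point 1; wiring the alert into email is specific to your environment:

```python
def now_test(model_config, latest_data, evaluate, min_accuracy):
    """Score the production model on the newest data and flag possible drift."""
    accuracy = evaluate(model_config, latest_data)
    if accuracy < min_accuracy:
        # Hook this into email or your monitoring system.
        raise RuntimeError(
            f"Now-Test alarm: accuracy {accuracy:.3f} is below minimum {min_accuracy:.3f}"
        )
    return accuracy
```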

5. Challenge Then Trial Model Updates

You will need to update the model.

Maybe you devise a whole new algorithm which requires new code and new config. Revisit all of the above points.

A smaller and more manageable change would be to the model coefficients. For example, perhaps you set up a grid or random search of model hyperparameters that runs every night and spits out new candidate models.

You should do this.

Test the model and be highly critical. Give a new model every chance to slip up.

Evaluate the performance of the new model using the Back-Test and Now-Test infrastructure in Point 4 above. Review the results carefully.
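One simple way to frame that review, a sketch that reuses the hypothetical back_test() reports from Point 4: accept the challenger only if it matches or beats the champion on every partition.

```python
def challenger_wins(champion_report, challenger_report):
    """Accept the challenger only if it is at least as good on every partition."""
    return all(challenger_report[name] >= champion_report[name]
               for name in champion_report)
```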

Evaluate the change using the regression test, as a final automated check.

Test the features of the software that make use of the model.

Perhaps roll the change out to some locations or in a beta release for feedback, again for risk mitigation.

Accept your new model once you are satisfied that it meets the minimum performance requirements and betters prior results.

Like a ratchet, consider incrementally updating performance requirements as model performance improves.

Summary

Adding a small model to operational software is very achievable.

In this post, you discovered 5 steps to make sure you cover your bases and are following good engineering practices.

In summary, these steps were:

  1. Specify Performance Requirements.
  2. Separate Prediction Algorithm From Model Coefficients.
  3. Develop Automated Tests For Your Model.
  4. Develop Back-Testing and Now-Testing Infrastructure.
  5. Challenge Then Trial Model Updates.

If you’re interested in more information on operationalizing machine learning models, check out the follow-up post on Google-scale machine learning model deployment. Watch the video mentioned there and review the great links to both the AirBnB and Etsy production pipelines.

Do you have any questions about this post or putting your model into production?
Ask your question in the comments and I will do my best to answer.

18 Responses to Deploy Your Predictive Model To Production

  1. SalemAmeen October 1, 2016 at 12:19 pm #

    Many thanks

    • Jason Brownlee October 1, 2016 at 12:31 pm #

      I’m glad you found it useful.

      • vishnu prasad June 23, 2017 at 1:51 pm #

        Jason – I want to build ecommerce streaming-based recommendations. The key entities I am considering are clickstream events, like web logs, to capture page hits for products, plus a real-time feed of product features and categories, and orders in real time.

        Outside of this I am also adding a few business boosters before ranking the results.

        I am not clear conceptually: when doing real-time processing on large data through streaming, will these ML algorithms even scale, or should I go for a lambda architecture which works on batches offline instead of in real time?

        Again, if I have to add something like clustering algorithms or PCA for dimensionality reduction in such high-volume transactions for real-time processing, will it scale, given that each model would take time to execute?

        • Jason Brownlee June 24, 2017 at 7:58 am #

          Sorry, I do not have direct experience with streaming data, I cannot give you expert advice without doing research.

  2. Gabe October 3, 2016 at 1:48 am #

    Great write up. I think this topic is sorely underdocumented. Thanks!

  3. mirsci October 20, 2016 at 3:51 am #

    Hi Jason, thank you for all the insightful and concrete posts on ML, they are always extremely helpful!

    What would be the best approach to creating test cases for ML systems in a reliable way, so as to reveal faults and defects in the ML algorithms?

    This paper https://www.cs.upc.edu/~marias/papers/seke07.pdf captures great approaches and I am wondering if there are any new techniques which you can share on this.

    Thanks!

    • Jason Brownlee October 20, 2016 at 8:39 am #

      Once you get a fault, those cases make excellent candidates.

      Generally, it is a good idea to get a system tester involved who can dream up evil cases.

  4. Pallavi January 13, 2017 at 11:24 am #

    Hi Jason,
    Thanks for a great article. I was wondering: what if my model is a black-box model, or an ensemble of black-box models? In that case, I do not have an easy equation that describes the model. How is model implementation handled in production in such a case?
    Thanks,
    Pallavi

    • Jason Brownlee January 15, 2017 at 5:14 am #

      Hi Pallavi, does not having an easy mathematical way to describe the model prevent you from using it to add value in a production environment?

      If it is a matter of risk, can the risk be mitigated?

  5. Jorgen June 1, 2017 at 4:20 am #

    Hi Jason,

    can you recommend any literature on the subject (books, articles) or systems for deploying ?

    Kind regards,

    Jørgen

    • Jason Brownlee June 2, 2017 at 12:52 pm #

      Not really, sorry. Information is very specific to your problem/business.

  6. TomK June 20, 2017 at 5:39 am #

    Hi,

    In order to deploy predictive models in production, you can try using a scoring engine – try http://scoring.one.

    Many of the mentioned features are implemented – one can deploy models from various environments.

    • Jason Brownlee June 20, 2017 at 6:42 am #

      Thanks for the suggestion. Have you tried it or do you work there?

  7. Brandon Hill October 6, 2017 at 7:52 am #

    There is definitely an emerging market of solutions to ease some of the deployment pains. TomK mentioned one. http://opendatagroup.com is another. At the moment, solutions in the space tend to focus on being model-language agnostic (R, Python, Matlab, Java, C, SAS, etc.). They package up your model into an easy-to-deploy, scalable microservice. You can then set input and output sources for your model service to read from and write to. The next facet is providing tools to monitor the performance metrics of your models and to manage the upgrading of models as new ones are developed. Since many companies are still developing their data science strategy and infrastructure, I think a key point is flexibility. Look for solutions that have the flexibility to keep connecting with different data and messaging sources as your IT department continues to evolve the infrastructure.

  8. Hemanth October 15, 2017 at 4:49 am #

    Hi Jason,

    In order to deploy the code, how should the script be structured?

    Should the whole code be in a function, so that we can run the function with the required arguments every time?
    Or
    Is there a better way to write the code for such machine learning problems, since many people write separate chunks of code for data processing, modelling, evaluation, etc.?

    But what will make the prediction object work on new data?

    • Jason Brownlee October 15, 2017 at 5:22 am #

      These questions are specific to your project, I cannot give general answers.
