Last Updated on June 21, 2022
After all the hard work developing a project in Python, we want to share our project with other people, perhaps friends or colleagues. They may not be interested in your code, but they want to run it and make some real use of it. For example, you create a regression model that can predict a value based on input features. Your friend wants to provide their own features and see what value your model predicts. But as your Python project gets larger, it is not as simple as sending your friend a small script. There can be many supporting files, multiple scripts, and also dependencies on a list of libraries. Getting all these right can be a challenge.
After finishing this tutorial, you will learn:
- How to harden your code for easier deployment by making it a module
- How to create a package for your module so we can rely on `pip` to manage the dependencies
- How to use the `venv` module to create reproducible running environments
Kick-start your project with my new book Python for Machine Learning, including step-by-step tutorials and the Python source code files for all examples.
Let’s get started!
A First Course on Deploying Python Projects
Photo by Kelly L. Some rights reserved.
Overview
This tutorial is divided into four parts; they are:
- From development to deployment
- Creating modules
- From module to package
- Using venv for your project
From Development to Deployment
When we finish a project in Python, occasionally we don’t want to shelve it but want to make it a routine job. We may finish training a machine learning model and actively use the trained model for prediction. We may build a time series model and use it for next-step prediction. However, new data comes in every day, so we need to retrain the model regularly to keep future predictions accurate.
Whatever the reason, we need to make sure the program will run as expected. However, this can be harder than we thought. A simple Python script is rarely a problem, but as our program gets larger with more dependencies, many things can go wrong. For example, a newer version of a library that we used can break the workflow. Or our Python script might run some external program that ceases to work after an upgrade of our OS. Another case is when the program depends on files located at a specific path, but we accidentally delete or rename one of them.
There is always a way for our program to fail to execute. But we have some techniques to make it more robust and more reliable.
Creating Modules
In a previous post, we demonstrated that we could check a code snippet’s time to finish with the following command:
```shell
python -m timeit -s 'import numpy as np' 'np.random.random()'
```
At the same time, we can also use it as part of a script and do the following:
```python
import timeit

import numpy as np

time = timeit.timeit("np.random.random()", globals=globals())
print(time)
```
The `import` statement in Python allows you to reuse functions defined in another file by considering it as a module. You may wonder how we can make a module not only provide functions but also become an executable program. This is the first step toward deploying our code: if we can make our module executable, the users would not need to understand how our code is structured in order to use it.
If our program is large enough to have multiple files, it is better to package it as a module. A module in Python is usually a folder of Python scripts with a clear entry point. Hence it is more convenient to send to other people and easier to understand the flow. Moreover, we can add a version to the module and let `pip` keep track of the version installed.
A simple, single-file program can be written as follows:
```python
import random

def main():
    n = random.random()
    print(n)

if __name__ == "__main__":
    main()
```
If we save this as `randomsample.py` in the local directory, we can either run it with:
```shell
python randomsample.py
```
or:
```shell
python -m randomsample
```
And we can reuse the functions in another script with:
```python
import randomsample

randomsample.main()
```
This works because the magic variable `__name__` will be `"__main__"` only if the script is run as the main program, but not when it is imported from another script. With this, your machine learning project can probably be packaged as the following:
```text
regressor/
    __init__.py
    data.json
    model.pickle
    predict.py
    train.py
```
Now, `regressor` is a directory with those five files in it. And `__init__.py` is an empty file, just to signal that this directory is a Python module that you can `import`. The script `train.py` is as follows:
```python
import os
import json

from sklearn.linear_model import LinearRegression

def load_data():
    current_dir = os.path.dirname(os.path.realpath(__file__))
    filepath = os.path.join(current_dir, "data.json")
    data = json.load(open(filepath))
    return data

def train():
    reg = LinearRegression()
    data = load_data()
    reg.fit(data["data"], data["target"])
    return reg
```
The script for `predict.py` is:
```python
import os
import pickle
import sys

import numpy as np

def predict(features):
    current_dir = os.path.dirname(os.path.realpath(__file__))
    filepath = os.path.join(current_dir, "model.pickle")
    with open(filepath, "rb") as fp:
        reg = pickle.load(fp)
    return reg.predict(features)

if __name__ == "__main__":
    arr = np.asarray(sys.argv[1:]).astype(float).reshape(1, -1)
    y = predict(arr)
    print(y[0])
```
Then, we can run the following under the parent directory of `regressor/` to load the data, train a linear regression model, and save the model with pickle:
```python
import pickle

from regressor.train import train

model = train()
with open("model.pickle", "wb") as fp:
    pickle.dump(model, fp)
```
If we move this pickle file into the `regressor/` directory, we can also do the following in a command line to run the model:
```shell
python -m regressor.predict 0.186 0 8.3 0 0.62 6.2 58 1.96 6 400 18.1 410 11.5
```
Here the numerical arguments are a vector of input features to the model. If we further move out the `if` block, namely, create a file `regressor/__main__.py` with the following code:
```python
import sys

import numpy as np

from .predict import predict

if __name__ == "__main__":
    arr = np.asarray(sys.argv[1:]).astype(float).reshape(1, -1)
    y = predict(arr)
    print(y[0])
```
Then we can run the model directly from the module:
```shell
python -m regressor 0.186 0 8.3 0 0.62 6.2 58 1.96 6 400 18.1 410 11.5
```
Note that the line `from .predict import predict` in the example above uses Python’s relative import syntax. It should be used inside a module to import components from other scripts of the same module.
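To make the contrast concrete, here is a minimal sketch (our own illustration, not part of the project code) of how the same function could be imported from inside `regressor/__main__.py` in both styles:

```python
# Relative import: resolved against the enclosing package, so it keeps
# working even if the package directory is renamed.
from .predict import predict

# Absolute import: equivalent here, but it hard-codes the package name
# and only works when "regressor" is importable from sys.path.
from regressor.predict import predict
```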
From Module to Package
If you want to distribute your Python project as a final product, it is convenient to be able to install your project as a package with the `pip install` command. This can be done easily. As you have already created a module from your project, what you need to supplement is some simple setup instructions. You now need to create a project directory and put your module in it, together with a `pyproject.toml` file, a `setup.cfg` file, and a `MANIFEST.in` file. The file structure would be like this:
```text
project/
    pyproject.toml
    setup.cfg
    MANIFEST.in
    regressor/
        __init__.py
        data.json
        model.pickle
        predict.py
        train.py
```
We will use `setuptools`, as it has become a standard for this task. The file `pyproject.toml` is to specify `setuptools` as the build system:
```toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
```
The key information is provided in `setup.cfg`. We need to specify the name of the module, the version, some optional description, what to include, and what to depend on, such as the following:
```ini
[metadata]
name = mlm_demo
version = 0.0.1
description = a simple linear regression model

[options]
packages = regressor
include_package_data = True
python_requires = >=3.6
install_requires =
    scikit-learn==1.0.2
    numpy>=1.22, <1.23
    h5py
```
The `MANIFEST.in` file is just to specify what extra files we need to include. In a project that ships no non-Python files, it can be omitted. But in our case, we need to include the trained model and the data file:
```text
include regressor/data.json
include regressor/model.pickle
```
Then in the project directory, we can install it as a module into our Python system with the following command:
```shell
pip install .
```
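As a side note, while the package is still under development, you may prefer an editable install: a standard `pip` feature that links the installed package back to your source tree, so code changes take effect without reinstalling (for a `setup.cfg`-only project like ours, this may require reasonably recent versions of `pip` and `setuptools`):

```shell
pip install -e .
```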
Afterward, the following code works anywhere, as `regressor` is a module accessible in our Python installation:
```python
import numpy as np

from regressor.predict import predict

X = np.asarray([[0.186, 0, 8.3, 0, 0.62, 6.2, 58, 1.96, 6, 400, 18.1, 410, 11.5]])
y = predict(X)
print(y[0])
```
There are a few details worth explaining in the `setup.cfg`. The `metadata` section is for the `pip` system. Hence we named our package `mlm_demo`, and this is the name you will see in the output of the `pip list` command. However, Python’s module system will recognize the module by the name `regressor`, as specified in the `options` section, so this is the name you should use in the `import` statement. Often these two names are the same, for the convenience of the users, and that’s why people use the terms “package” and “module” interchangeably.

Similarly, the version 0.0.1 appears in `pip` but is not known to the code. It is a convention to put it in `__init__.py` in the module directory, so you can check the version in another script that uses it:
```python
__version__ = '0.0.1'
```
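For example, once `__init__.py` carries that line, any script that imports the package can check its version at run time:

```python
import regressor

print(regressor.__version__)  # prints: 0.0.1
```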
The `install_requires` part in the `options` section is the key to making our project run. It means that when we install this module, those other modules must be installed too, at the specified versions if any are given. This may create a tree of dependencies, but `pip` will take care of it when you run the `pip install` command. As you can expect, we use Python’s comparison operator `==` for a specific version. But if we can accept multiple versions, we use a comma (`,`) to separate the conditions, as in the case of `numpy` above.
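For illustration, here are a few other version specifiers that `pip` understands; these lines are hypothetical examples rather than part of our project, shown in `requirements.txt` style with explanatory comments:

```text
scikit-learn>=1.0    # this version or any newer one
numpy~=1.22.0        # "compatible release": >=1.22.0 but <1.23
h5py!=3.0.0          # any version except 3.0.0
```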
Now you can ship the entire project directory to other people (e.g., in a ZIP file). They can install it with `pip install` in the project directory and then run your code with `python -m regressor`, given the appropriate command line arguments.
A final note: perhaps you have heard of the `requirements.txt` file in a Python project. It is just a text file, usually placed in a directory with a Python module or some Python scripts. It has a format similar to the dependency specification mentioned above. For example, it may look like this:
```text
scikit-learn==1.0.2
numpy>=1.22, <1.23
h5py
```
It is aimed at the case where you do not want to turn your project into a package but still want to give hints on the libraries and their versions that your project expects. This file can be understood by `pip`, and we can make it set up our system to prepare for the project:
```shell
pip install -r requirements.txt
```
But this is just for a project in development, and that’s all the convenience the `requirements.txt` file can provide.
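As a side note, a common way to produce such a file from an environment that already runs the project is `pip freeze`, which writes out every installed package pinned at its exact version:

```shell
pip freeze > requirements.txt
```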
Using venv for Your Project
The above is probably the most efficient way to ship and deploy a project since you include only the most essential files. It is also the recommended way because it is platform-agnostic: it still works if we change our Python version or move to a different OS (unless some specific dependency forbids us).
But there are cases where we may want to reproduce an exact environment for our project to run. For example, instead of requiring some packages to be installed, we may require that certain packages *not* be installed. Also, there are cases where, after we install a package with `pip`, a version dependency breaks when another package is installed. We can solve this problem with the `venv` module in Python.
The `venv` module is from Python’s standard library and allows us to create a virtual environment. It is not a virtual machine or the kind of virtualization that Docker provides; instead, it heavily modifies the paths under which Python operates. For example, we can install multiple versions of Python in our OS, but within a virtual environment the `python` command always means one particular version. Another example is that within one virtual environment, we can run `pip install` to set up packages in the virtual environment’s directory without interfering with the system outside.
To start with `venv`, we can simply find a good location and run the command:
```shell
$ python -m venv myproject
```
Then a directory named `myproject` will be created. A virtual environment is supposed to operate in a shell (so the environment variables can be manipulated). To activate a virtual environment, we execute the activation shell script with the following command (e.g., under bash or zsh in Linux and macOS):
```shell
$ source myproject/bin/activate
```
Afterward, you’re inside the Python virtual environment. The `python` command will be the one the virtual environment was created with (in case you have multiple Python versions installed in your OS). And the packages installed will be located under `myproject/lib/python3.9/site-packages` (assuming Python 3.9). When you run `pip install` or `pip list`, you see only the packages inside the virtual environment.
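If you ever need to confirm from within a script that it is running inside a virtual environment, a small check like the following works with `venv` on Python 3 (this snippet is our own illustration, not part of the `venv` workflow):

```python
import sys

# Inside an activated venv, sys.prefix points into the environment's
# directory, while sys.base_prefix still points at the base installation.
print(sys.prefix != sys.base_prefix)  # True inside a virtual environment
```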
To leave the virtual environment, we run `deactivate` in the shell command line:
```shell
$ deactivate
```
The `deactivate` command is a shell function defined by the activation script.
Using virtual environments can be particularly useful if you have multiple projects in development that require different versions of packages (such as different versions of TensorFlow). You can simply create a virtual environment, activate it, install the correct versions of all the libraries you need with the `pip install` command, and then put your project code inside the virtual environment. The virtual environment directory can be huge (e.g., just installing TensorFlow with its dependencies will consume almost 1GB of disk space). But afterward, shipping the entire virtual environment directory to others can guarantee the exact environment needed to execute your code. This can be an alternative to a Docker container if you prefer not to run the Docker server.
Further Reading
Indeed, some other tools exist that help us deploy our projects neatly. Docker, mentioned above, can be one. The `zipapp` module from Python’s standard library is also an interesting tool. Below are resources on the topic if you are looking to go deeper.
Articles
- Python tutorial, Chapter 6, modules
- Distributing Python Modules
- How to package your Python code
- Question about various venv-related packages on StackOverflow
APIs and software
- Setuptools
- venv from Python standard library
Summary
In this tutorial, you’ve seen how we can confidently wrap up our project and deliver it to another user to run it. Specifically, you learned:
- The minimal change to a folder of Python scripts to make them a module
- How to convert a module into a package for `pip`
- What a virtual environment in Python is, and how to use it