A Gentle Introduction to Serialization for Python

By Zhe Ming Chng on June 21, 2022 in Python for Machine Learning

Serialization refers to the process of converting a data object (e.g., Python objects, Tensorflow models) into a format that allows us to store or transmit the data and then recreate the object when needed using the reverse process of deserialization.

There are different formats for the serialization of data, such as JSON, XML, HDF5, and Python’s pickle, for different purposes. JSON, for instance, returns a human-readable string form, while Python’s pickle library can return a byte array.

In this post, you will discover how to use two common serialization libraries in Python (namely pickle and h5py) to serialize data objects such as dictionaries and Tensorflow models for storage and transmission.

After completing this tutorial, you will know:

Serialization libraries in Python such as pickle and h5py
Serializing objects such as dictionaries and Tensorflow models in Python
How to use serialization for memoization to reduce function calls

Let’s get started!

A Gentle Introduction to Serialization for Python. Photo by little plant. Some rights reserved

Overview

The tutorial is divided into four parts; they are:

What is serialization, and why do we serialize?
Using Python’s pickle library
Using HDF5 in Python
Comparison between different serialization methods

What Is Serialization, and Why Should We Care?

Think about storing an integer; how would you store that in a file or transmit it? That’s easy! We can just write the integer to a file and store or transmit that file.

But now, what if we think about storing a Python object (e.g., a Python dictionary or a Pandas DataFrame), which has a complex structure and many attributes (e.g., the columns and index of the DataFrame, and the data type of each column)? How would you store it as a file or transmit it to another computer?

This is where serialization comes in!

Serialization is the process of converting the object into a format that can be stored or transmitted. After transmitting or storing the serialized data, we are able to reconstruct the object later and obtain the exact same structure/object, which makes it really convenient for us to continue using the stored object later on instead of reconstructing the object from scratch.

Python supports many serialization formats. One common format that works across many languages is JSON, which is human-readable and can store a hash map (a Python dictionary) and recreate it with the same structure. But JSON can only store basic structures such as lists and dictionaries, and the only values it can hold are strings, numbers, booleans, and null. We cannot ask JSON to remember the data type (e.g., numpy float32 vs. float64). It also cannot distinguish between Python tuples and lists.
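
As a small illustration of the tuple/list limitation, the sketch below sends a dictionary through a JSON string and back. The dictionary contents are made up for the example; only the standard json module is used.

```python
import json

# A dictionary holding a tuple; JSON has no tuple type
original = {"point": (1, 2), "label": "origin"}

# Round trip through a JSON string
restored = json.loads(json.dumps(original))

print(type(original["point"]).__name__)  # tuple
print(type(restored["point"]).__name__)  # list
print(original == restored)              # False
```

The tuple comes back as a list, so the restored dictionary no longer compares equal to the original.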

More powerful serialization formats exist. In the following, we will explore two common serialization libraries in Python, namely pickle and h5py.

Using Python’s Pickle Library

The pickle module is part of the Python standard library and implements methods to serialize (pickling) and deserialize (unpickling) Python objects.

To get started with pickle, import it in Python:

import pickle

Afterward, to serialize a Python object such as a dictionary and store the byte stream as a file, we can use pickle’s dump() method.

test_dict = {"Hello": "World!"}
with open("test.pickle", "wb") as outfile:
    # "wb" argument opens the file in binary mode
    pickle.dump(test_dict, outfile)

The byte stream representing test_dict is now stored in the file “test.pickle”!

To recover the original object, we read the serialized byte stream from the file using pickle’s load() method.

with open("test.pickle", "rb") as infile:
    test_dict_reconstructed = pickle.load(infile)

Warning: Only unpickle data from sources you trust, as it is possible for arbitrary malicious code to be executed during the unpickling process.
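
To see why this warning matters, here is a minimal and deliberately harmless sketch. The class name Surprise is hypothetical; it uses the standard __reduce__ hook to tell pickle which callable to invoke when the data is loaded. Here the callable is just len, but it stands in for any function an attacker might choose.

```python
import pickle

class Surprise:
    # __reduce__ tells pickle how to rebuild the object: at load time,
    # pickle calls the returned callable with the returned arguments.
    def __reduce__(self):
        return (len, ("abc",))  # harmless stand-in for an arbitrary call

payload = pickle.dumps(Surprise())

# Loading runs len("abc") instead of building a Surprise object
result = pickle.loads(payload)

print(result)  # 3
```

The byte stream decides which function runs during unpickling, which is why data from an untrusted source should never be passed to pickle.load() or pickle.loads().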

Putting them together, the following code helps you to verify that pickle can recover the same object:

import pickle

# A test object
test_dict = {"Hello": "World!"}

# Serialization
with open("test.pickle", "wb") as outfile:
    pickle.dump(test_dict, outfile)
print("Written object", test_dict)

# Deserialization
with open("test.pickle", "rb") as infile:
    test_dict_reconstructed = pickle.load(infile)
print("Reconstructed object", test_dict_reconstructed)

if test_dict == test_dict_reconstructed:
    print("Reconstruction success")

Besides writing the serialized object into a pickle file, we can also obtain the serialized object as a bytes object in Python using pickle’s dumps() function:

test_dict_ba = pickle.dumps(test_dict)      # b'\x80\x04\x95\x15…

Similarly, we can use pickle’s loads() function to convert the bytes object back to the original object:

test_dict_reconstructed_ba = pickle.loads(test_dict_ba)
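
As a quick check of this in-memory round trip, the following sketch (with a made-up record dictionary) shows that pickle keeps Python-specific types such as tuples, sets, and bytes, which a JSON round trip would not preserve:

```python
import pickle

# A made-up record mixing several built-in Python types
record = {"shape": (28, 28), "tags": {"train", "test"}, "raw": b"\x00\x01"}

serialized = pickle.dumps(record)      # a bytes object
restored = pickle.loads(serialized)    # back to a dictionary

print(type(serialized).__name__)         # bytes
print(restored == record)                # True
print(type(restored["shape"]).__name__)  # tuple
print(type(restored["tags"]).__name__)   # set
```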

One useful thing about pickle is that it can serialize almost any Python object, including user-defined ones, such as the following:

import pickle

class NewClass:
    def __init__(self, data):
        print(data)
        self.data = data

# Create an object of NewClass
new_class = NewClass(1)

# Serialize and deserialize
pickled_data = pickle.dumps(new_class)
reconstructed = pickle.loads(pickled_data)

# Verify
print("Data from reconstructed object:", reconstructed.data)

The code above will print the following:

1
Data from reconstructed object: 1

Note that the print statement in the class’s constructor is not executed when pickle.loads() is invoked. This is because unpickling restores the object’s saved attributes directly rather than calling __init__() to create it again.

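Pickle can even serialize Python functions since functions are first-class objects in Python. Note that a function is pickled by reference to its name rather than by its code, so the same function must be defined or importable wherever the data is unpickled: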

import pickle

def test():
    return "Hello world!"

# Serialize and deserialize
pickled_function = pickle.dumps(test)
reconstructed_function = pickle.loads(pickled_function)

# Verify
print(reconstructed_function())  # prints "Hello world!"
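
One consequence worth knowing: pickle stores a function as a reference to its name, not as its code, so a function that cannot be looked up by name cannot be pickled. The sketch below tries this with a lambda (the variable names are only for illustration):

```python
import pickle

# A lambda has no importable name that pickle could refer to
anonymous = lambda: "Hello world!"

try:
    pickle.dumps(anonymous)
    pickled_ok = True
except (pickle.PicklingError, AttributeError) as exc:
    pickled_ok = False
    print("Could not pickle the lambda:", exc)

print(pickled_ok)  # False
```

For the same reason, a pickled named function can only be loaded in a program where that function is defined or importable under the same name.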

Therefore, we can make use of pickle to save our work. For example, a trained model from Keras or scikit-learn can be serialized by pickle and loaded later instead of re-training the model every time we use it. The following shows you how we can build a LeNet5 model to recognize the MNIST handwritten digits using Keras, then serialize the trained model using pickle. Afterward, we can reconstruct the model without training it again, and it should produce exactly the same result as the original model:

import pickle

import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, AveragePooling2D, Dropout, Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping

# Load MNIST digits
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Reshape data to (n_samples, height, width, n_channel)
X_train = np.expand_dims(X_train, axis=3).astype("float32")
X_test = np.expand_dims(X_test, axis=3).astype("float32")

# One-hot encode the output
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# LeNet5 model
model = Sequential([
    Conv2D(6, (5,5), input_shape=(28,28,1), padding="same", activation="tanh"),
    AveragePooling2D((2,2), strides=2),
    Conv2D(16, (5,5), activation="tanh"),
    AveragePooling2D((2,2), strides=2),
    Conv2D(120, (5,5), activation="tanh"),
    Flatten(),
    Dense(84, activation="tanh"),
    Dense(10, activation="softmax")
])

# Train the model
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
earlystopping = EarlyStopping(monitor="val_loss", patience=4, restore_best_weights=True)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100, batch_size=32, callbacks=[earlystopping])

# Evaluate the model
print(model.evaluate(X_test, y_test, verbose=0))

# Pickle to serialize and deserialize
pickled_model = pickle.dumps(model)
reconstructed = pickle.loads(pickled_model)

# Evaluate again
print(reconstructed.evaluate(X_test, y_test, verbose=0))

The above code will produce output like the following. Note that the evaluation scores from the original and reconstructed models match exactly in the last two lines:

Epoch 1/100
1875/1875 [==============================] - 15s 7ms/step - loss: 0.1517 - accuracy: 0.9541 - val_loss: 0.0958 - val_accuracy: 0.9661
Epoch 2/100
1875/1875 [==============================] - 15s 8ms/step - loss: 0.0616 - accuracy: 0.9814 - val_loss: 0.0597 - val_accuracy: 0.9822
Epoch 3/100
1875/1875 [==============================] - 16s 8ms/step - loss: 0.0493 - accuracy: 0.9846 - val_loss: 0.0449 - val_accuracy: 0.9853
Epoch 4/100
1875/1875 [==============================] - 17s 9ms/step - loss: 0.0394 - accuracy: 0.9876 - val_loss: 0.0496 - val_accuracy: 0.9838
Epoch 5/100
1875/1875 [==============================] - 17s 9ms/step - loss: 0.0320 - accuracy: 0.9898 - val_loss: 0.0394 - val_accuracy: 0.9870
Epoch 6/100
1875/1875 [==============================] - 16s 9ms/step - loss: 0.0294 - accuracy: 0.9908 - val_loss: 0.0373 - val_accuracy: 0.9872
Epoch 7/100
1875/1875 [==============================] - 21s 11ms/step - loss: 0.0252 - accuracy: 0.9921 - val_loss: 0.0370 - val_accuracy: 0.9879
Epoch 8/100
1875/1875 [==============================] - 18s 10ms/step - loss: 0.0223 - accuracy: 0.9931 - val_loss: 0.0386 - val_accuracy: 0.9880
Epoch 9/100
1875/1875 [==============================] - 15s 8ms/step - loss: 0.0219 - accuracy: 0.9930 - val_loss: 0.0418 - val_accuracy: 0.9871
Epoch 10/100
1875/1875 [==============================] - 15s 8ms/step - loss: 0.0162 - accuracy: 0.9950 - val_loss: 0.0531 - val_accuracy: 0.9853
Epoch 11/100
1875/1875 [==============================] - 15s 8ms/step - loss: 0.0169 - accuracy: 0.9941 - val_loss: 0.0340 - val_accuracy: 0.9895
Epoch 12/100
1875/1875 [==============================] - 15s 8ms/step - loss: 0.0165 - accuracy: 0.9944 - val_loss: 0.0457 - val_accuracy: 0.9874
Epoch 13/100
1875/1875 [==============================] - 15s 8ms/step - loss: 0.0137 - accuracy: 0.9955 - val_loss: 0.0407 - val_accuracy: 0.9879
Epoch 14/100
1875/1875 [==============================] - 16s 8ms/step - loss: 0.0159 - accuracy: 0.9945 - val_loss: 0.0442 - val_accuracy: 0.9871
Epoch 15/100
1875/1875 [==============================] - 16s 8ms/step - loss: 0.0125 - accuracy: 0.9956 - val_loss: 0.0434 - val_accuracy: 0.9882
[0.0340442918241024, 0.9894999861717224]
[0.0340442918241024, 0.9894999861717224]

While pickle is a powerful library, there are limits to what can be pickled. For example, live connections such as database connections and open file handles cannot be pickled. This is because reconstructing these objects would require pickle to re-establish the connection with the database or file, which pickle cannot do for you (it would need the appropriate credentials and is outside the scope of what pickle is intended for).
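
A minimal sketch of this limitation, using a temporary file whose name (example.txt) is chosen only for illustration:

```python
import os
import pickle
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")
    with open(path, "w") as handle:
        try:
            # An open file handle is tied to the running process
            pickle.dumps(handle)
            pickled_ok = True
        except TypeError as exc:
            pickled_ok = False
            print("Could not pickle the file handle:", exc)

print(pickled_ok)  # False
```

A common workaround is to pickle the information needed to reopen the resource (such as the file path) instead of the live object itself.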

Using HDF5 in Python

Hierarchical Data Format 5 (HDF5) is a binary data format. The h5py package is a Python library that provides an interface to the HDF5 format. From h5py docs, HDF5 “lets you store huge amounts of numerical data, and easily manipulate that data from Numpy.”

What HDF5 can do better than other serialization formats is store data in a file system-like hierarchy. You can store multiple objects or datasets in one HDF5 file, like saving multiple files in a file system. You can also read one particular dataset from the file, like reading one file from the file system without touching the others. With pickle, you need to read or write everything each time you load or create the pickle file. Hence HDF5 is advantageous for huge amounts of data that can’t fit entirely into memory.

To get started with h5py, you first need to install the h5py library, which you can do using:

pip install h5py

Or, if you are using a conda environment:

conda install h5py

We can then get started with creating our first dataset!

import h5py

with h5py.File("test.hdf5", "w") as file:
    dataset = file.create_dataset("test_dataset", (100,), dtype="i4")

This creates a new dataset named “test_dataset” in the file test.hdf5, with a shape of (100,) and a type of int32. h5py datasets follow Numpy syntax, so you can do slicing, retrieval, get the shape, etc., just as with Numpy arrays. Note that a dataset can only be accessed while its file is open.

To retrieve a specific index:

dataset[0]  #retrieves element at index 0 of dataset

To get a slice of the first 10 elements (index 0 through index 9) of a dataset:

dataset[:10]

If you initialized the h5py file object outside of a with statement, remember to close the file as well!

To read from a previously created HDF5 file, you can open the file in “r” for read mode or “r+” for read/write mode:

with h5py.File("test.hdf5", "r") as file:
    print (file.keys()) #gets names of datasets that are in the file
    dataset = file["test_dataset"]

To organize your HDF5 file, you can use groups:

with h5py.File("test.hdf5", "w") as file:
    # creates new group_1 in file
    file.create_group("group_1")
    group1 = file["group_1"]
    # creates dataset inside group1
    group1.create_dataset("dataset1", shape=(10,))
    # to access the dataset
    dataset = file["group_1"]["dataset1"]

Another way to create groups and datasets is by specifying the path to the dataset you want to create, and h5py will create the groups on that path as well (if they don’t exist):

with h5py.File("test.hdf5", "w") as file:
    # creates dataset inside group1
    file.create_dataset("group1/dataset1", shape=(10,))

Both snippets create a group (named group_1 in the first snippet and group1 in the second) and then a dataset named dataset1 within it.
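
Putting the pieces together, here is a small end-to-end sketch. It assumes h5py and numpy are installed; the file name example.hdf5 and the sample values are made up for illustration. It writes a dataset through a path, lists the hierarchy, and then reads back only a slice of the data:

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.hdf5")  # hypothetical file name

    # Write: the intermediate group "group1" is created automatically
    with h5py.File(path, "w") as file:
        file.create_dataset("group1/dataset1", data=np.arange(10, dtype="float32"))

    # Read: collect every group/dataset path, then load only a slice
    with h5py.File(path, "r") as file:
        names = []
        file.visit(names.append)
        part = file["group1/dataset1"][2:5]
        stored_dtype = file["group1/dataset1"].dtype

print(names)         # ['group1', 'group1/dataset1']
print(part)          # [2. 3. 4.]
print(stored_dtype)  # float32
```

Only the requested slice is read from disk, and the data type (float32 here) is stored with the dataset, which is something JSON cannot record.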

HDF5 in Tensorflow

To save a model in Tensorflow Keras using HDF5 format, we can use the save() function of the model with a filename having extension .h5, like the following:

from tensorflow import keras

# Create model
model = keras.models.Sequential([
    keras.layers.Input(shape=(10,)),
    keras.layers.Dense(1)
])

model.compile(optimizer="adam", loss="mse")

# using the .h5 extension in the file name specifies that the model
# should be saved in HDF5 format
model.save("my_model.h5")

To load the stored HDF5 model, we can also use the function from Keras directly:

...
model = keras.models.load_model("my_model.h5")

# to check that the model has been successfully reconstructed
model.summary()

One reason we don’t want to use pickle for a Keras model is that we need a more flexible format that is not tied to a particular version of Keras. If we upgraded our Tensorflow version, the model object might change, and pickle may fail to give us a working model. Another reason is to keep only the essential data for our model. For example, if we check the HDF5 file my_model.h5 created above, we see these are stored:

/
/model_weights
/model_weights/dense
/model_weights/dense/dense
/model_weights/dense/dense/bias:0
/model_weights/dense/dense/kernel:0
/model_weights/top_level_model_weights

Hence Keras selected only the data that is essential to reconstruct the model. A trained model will contain more datasets: in addition to /model_weights/, there will also be /optimizer_weights/. Keras will reconstruct the model and restore the weights appropriately to give us a model that functions the same.

Take the model above as an example. It is saved in my_model.h5 and consists of a single dense layer, and we can dig out the kernel of that layer as follows:

import h5py

with h5py.File("my_model.h5", "r") as infile:
    print(infile["/model_weights/dense/dense/kernel:0"][:])

As we didn’t train our network for anything, it will give us the random matrix that initialized the layer:

[[ 0.6872471 ]
 [-0.51016176]
 [-0.5604881 ]
 [ 0.3387223 ]
 [ 0.52146655]
 [-0.6960067 ]
 [ 0.38258582]
 [-0.05564564]
 [ 0.1450575 ]
 [-0.3391946 ]]

And in HDF5, the metadata is stored alongside the data. Keras stored the network’s architecture in a JSON format in the metadata. Hence we can reproduce our network architecture as follows:

import json
import h5py

with h5py.File("my_model.h5", "r") as infile:
    for key in infile.attrs.keys():
        formatted = infile.attrs[key]
        if key.endswith("_config"):
            formatted = json.dumps(json.loads(formatted), indent=4)
        print(f"{key}: {formatted}")

This produces:

backend: tensorflow
keras_version: 2.7.0
model_config: {
    "class_name": "Sequential",
    "config": {
        "name": "sequential",
        "layers": [
            {
                "class_name": "InputLayer",
                "config": {
                    "batch_input_shape": [
                        null,
                        10
                    ],
                    "dtype": "float32",
                    "sparse": false,
                    "ragged": false,
                    "name": "input_1"
                }
            },
            {
                "class_name": "Dense",
                "config": {
                    "name": "dense",
                    "trainable": true,
                    "dtype": "float32",
                    "units": 1,
                    "activation": "linear",
                    "use_bias": true,
                    "kernel_initializer": {
                        "class_name": "GlorotUniform",
                        "config": {
                            "seed": null
                        }
                    },
                    "bias_initializer": {
                        "class_name": "Zeros",
                        "config": {}
                    },
                    "kernel_regularizer": null,
                    "bias_regularizer": null,
                    "activity_regularizer": null,
                    "kernel_constraint": null,
                    "bias_constraint": null
                }
            }
        ]
    }
}
training_config: {
    "loss": "mse",
    "metrics": null,
    "weighted_metrics": null,
    "loss_weights": null,
    "optimizer_config": {
        "class_name": "Adam",
        "config": {
            "name": "Adam",
            "learning_rate": 0.001,
            "decay": 0.0,
            "beta_1": 0.9,
            "beta_2": 0.999,
            "epsilon": 1e-07,
            "amsgrad": false
        }
    }
}

The model config (i.e., the architecture of our neural network) and training config (i.e., the parameters we passed into the compile() function) are stored as a JSON string. In the code above, we use the json module to reformat it to make it easier to read. It is recommended to save your model as HDF5 rather than just your Python code because, as we can see above, it contains more detailed information than the code on how the network was constructed.

Comparing Between Different Serialization Methods

In the above, we saw how pickle and h5py can help serialize our Python data.

We can use pickle to serialize almost any Python object, including user-defined ones and functions. But pickle is not language agnostic: you cannot unpickle the data outside Python. There are also six pickle protocol versions so far (0 through 5), and an older Python may not be able to read data written with a newer protocol.
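
If the pickled data must be readable by an older Python, one option is to request an older protocol explicitly through the protocol argument of dumps() or dump(). The sketch below uses protocol 2 only as an example choice:

```python
import pickle

data = {"Hello": "World!"}

# The protocol numbers supported by the running interpreter
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# Protocol 2 is understood by every Python 3 release
legacy_bytes = pickle.dumps(data, protocol=2)

# The second byte of the stream records the protocol number
print(legacy_bytes[:2])                    # b'\x80\x02'
print(pickle.loads(legacy_bytes) == data)  # True
```

pickle.loads() detects the protocol from the data itself, so no protocol argument is needed when reading.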

In contrast, HDF5 is cross-platform and works well with other languages such as Java and C++. In Python, the h5py library implements the Numpy interface to make it easier to manipulate the data. The data can be accessed from a different language because the HDF5 format supports only simple data types, such as Numpy’s numeric types and strings. We cannot store arbitrary objects such as a Python function in HDF5.

Summary

In this post, you discovered what serialization is and how to use libraries in Python to serialize Python objects such as dictionaries and Tensorflow Keras models. You have also learned the advantages and disadvantages of two Python libraries for serialization (pickle, h5py).

Specifically, you learned:

what is serialization, and why it is useful
how to get started with pickle and h5py serialization libraries in Python
pros and cons of different serialization methods
