Overview of Some Deep Learning Libraries

By Adrian Tam on July 27, 2022 in Deep Learning

Machine learning is a broad topic. Deep learning, in particular, is a way of using neural networks for machine learning. The neural network is a concept that dates back to the 1950s and is probably older than the term machine learning itself. Unsurprisingly, many libraries have been created for it.

The following aims to give an overview of some of the famous libraries for neural networks and deep learning.

After finishing this tutorial, you will learn:

Some of the deep learning or neural network libraries
The functional difference between two common libraries, PyTorch and TensorFlow

Let’s get started.

Overview of some deep learning libraries
Photo by Francesco Ungaro. Some rights reserved.

Overview

This tutorial is in three parts; they are:

The C++ Libraries
Python Libraries
PyTorch and TensorFlow

The C++ Libraries

Deep learning has gained attention in the last decade. Before that, there was little confidence in how to train a neural network with many layers. However, the knowledge of how to build a multilayer perceptron had been around for many years.

Before we had deep learning, probably the most famous neural network library was libann. It is a library for C++, and its functionality is limited due to its age. Development of this library has since stopped. A newer library for C++ is OpenNN, which allows modern C++ syntax.

But that’s pretty much all for C++. The rigid syntax of C++ may be why we do not have too many libraries for deep learning. The training phase of a deep learning project is about experiments, so we want tools that allow us to iterate faster. Hence a dynamic programming language could be a better fit. This is where Python comes onto the scene.

Python Libraries

One of the earliest libraries for deep learning is Caffe. It was developed at U.C. Berkeley specifically for computer vision problems. While it is developed in C++, it serves as a library with a Python interface. Hence we can build our project in Python with the network defined in a JSON-like syntax (the Protocol Buffers text format).

Chainer is another library in Python. It is an influential one because the syntax makes a lot of sense. While it is less common nowadays, the APIs of Keras and PyTorch bear a resemblance to Chainer. The following is an example from Chainer’s documentation, and you may mistake it for Keras or PyTorch:

import chainer
import chainer.functions as F
import chainer.links as L
from chainer import iterators, optimizers, training, Chain
from chainer.datasets import mnist

train, test = mnist.get_mnist()
batchsize = 128
max_epoch = 10

train_iter = iterators.SerialIterator(train, batchsize)

class MLP(Chain):
    def __init__(self, n_mid_units=100, n_out=10):
        super(MLP, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_mid_units)
            self.l2 = L.Linear(None, n_mid_units)
            self.l3 = L.Linear(None, n_out)

    def forward(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        return self.l3(h2)

# create model
model = MLP()
model = L.Classifier(model)  # using softmax cross entropy

# set up optimizer
optimizer = optimizers.MomentumSGD()
optimizer.setup(model)

# connect train iterator and optimizer to an updater
updater = training.updaters.StandardUpdater(train_iter, optimizer)

# set up trainer and run
trainer = training.Trainer(updater, (max_epoch, 'epoch'), out='mnist_result')
trainer.run()

Another obsolete library is Theano. It has ceased development, but once upon a time, it was a major library for deep learning. In fact, earlier versions of the Keras library allowed you to choose between a Theano or TensorFlow backend. Strictly speaking, neither Theano nor TensorFlow is a deep learning library. Rather, they are tensor libraries that make matrix operations and differentiation handy, upon which deep learning operations can be built. Hence these two are considered interchangeable from Keras’s perspective.
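
To make the idea of "differentiation made handy" concrete, here is a minimal sketch of reverse-mode automatic differentiation on scalars in plain Python. It is a toy written for illustration only, not how Theano or TensorFlow are implemented: the Value class and its attributes are hypothetical names, and real tensor libraries work on multi-dimensional arrays with many more operations.

```python
# A toy scalar autodiff, for illustration only.
# Value is a hypothetical helper, not part of any library.

class Value:
    """A scalar that remembers how it was computed."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data                 # numeric value
        self.grad = 0.0                  # d(output)/d(self), filled in by backward()
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Build an ordering in which every node comes after its parents
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, applying the chain rule
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


# One "neuron" without activation: y = w * x + b
w, x, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b
y.backward()
print(y.data, w.grad, x.grad, b.grad)  # 7.0 2.0 3.0 1.0
```

Once gradients such as w.grad are available automatically, a layer, a loss function, and a gradient descent update can all be written as ordinary arithmetic, which is roughly the service a tensor library provides to a higher-level library like Keras.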

CNTK from Microsoft and Apache MXNet are two other libraries worth mentioning. They are large, with interfaces for multiple languages. Python, of course, is one of them. CNTK has C# and C++ interfaces, while MXNet provides interfaces for Java, Scala, R, Julia, C++, Clojure, and Perl. Microsoft has since decided to stop developing CNTK. MXNet, however, does have some momentum, and it is probably the most popular library after TensorFlow and PyTorch.

Below is an example of using MXNet via the R interface. Conceptually, you see the syntax is similar to Keras’s functional API:

require(mxnet)

train <- read.csv('data/train.csv', header=TRUE)
train <- data.matrix(train)
train.x <- train[,-1]
train.y <- train[,1]
train.x <- t(train.x/255)

data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=128)
act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")
fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=64)
act2 <- mx.symbol.Activation(fc2, name="relu2", act_type="relu")
fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=10)
softmax <- mx.symbol.SoftmaxOutput(fc3, name="sm")

devices <- mx.cpu()
mx.set.seed(0)
model <- mx.model.FeedForward.create(softmax, X=train.x, y=train.y,
                                     ctx=devices, num.round=10, array.batch.size=100,
                                     learning.rate=0.07, momentum=0.9,
                                     eval.metric=mx.metric.accuracy,
                                     initializer=mx.init.uniform(0.07),
                                     epoch.end.callback=mx.callback.log.train.metric(100))

PyTorch and TensorFlow

PyTorch and TensorFlow are the two major libraries nowadays. In the past, when TensorFlow was in version 1.x, they were vastly different. But as TensorFlow absorbed Keras as part of its library, these two libraries mostly work similarly.

PyTorch is backed by Facebook, and its syntax has been stable over the years. There are also a lot of existing models that we can borrow. The common way of defining a deep learning model in PyTorch is to create a class:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=(5,5), stride=1, padding=2)
        self.pool1 = nn.AvgPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5, stride=1, padding=0)
        self.pool2 = nn.AvgPool2d(kernel_size=2, stride=2)
        self.conv3 = nn.Conv2d(16, 120, kernel_size=5, stride=1, padding=0)
        self.flatten = nn.Flatten()
        self.linear4 = nn.Linear(120, 84)
        self.linear5 = nn.Linear(84, 10)
        self.softmax = nn.LogSoftmax(dim=1)
    
    def forward(self, x):
        x = F.tanh(self.conv1(x))
        x = self.pool1(x)
        x = F.tanh(self.conv2(x))
        x = self.pool2(x)
        x = F.tanh(self.conv3(x))
        x = self.flatten(x)
        x = F.tanh(self.linear4(x))
        x = self.linear5(x)
        return self.softmax(x)

model = Model()

But there is also a sequential syntax to make the code more concise:

import torch
import torch.nn as nn

model = nn.Sequential(
    # assume input 1x28x28
    nn.Conv2d(1, 6, kernel_size=(5,5), stride=1, padding=2),
    nn.Tanh(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5, stride=1, padding=0),
    nn.Tanh(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(16, 120, kernel_size=5, stride=1, padding=0),
    nn.Tanh(),
    nn.Flatten(),
    nn.Linear(120, 84),
    nn.Tanh(),
    nn.Linear(84, 10),
    nn.LogSoftmax(dim=1)
)
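
To see why the first linear layer takes exactly 120 inputs, the spatial size can be traced through the layers by hand. The sketch below does that arithmetic in plain Python using the standard output-size formula for convolution and pooling layers (with no dilation); conv_out is a hypothetical helper written for this illustration, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                   # input is 1x28x28
size = conv_out(size, kernel=5, padding=2)  # Conv2d(1, 6)    -> 6x28x28
size = conv_out(size, kernel=2, stride=2)   # AvgPool2d       -> 6x14x14
size = conv_out(size, kernel=5)             # Conv2d(6, 16)   -> 16x10x10
size = conv_out(size, kernel=2, stride=2)   # AvgPool2d       -> 16x5x5
size = conv_out(size, kernel=5)             # Conv2d(16, 120) -> 120x1x1
flattened = 120 * size * size               # what nn.Flatten() hands to nn.Linear
print(size, flattened)  # 1 120
```

If the input were not 28x28, the final feature map would no longer be 1x1, and the input size of the first linear layer would have to change accordingly.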

TensorFlow in version 2.x adopted Keras as part of its libraries. In the past, these two were separate projects. In TensorFlow 1.x, we needed to build a computation graph, set up a session, and derive gradients from a session for the deep learning model. Hence it was a bit too verbose. Keras was designed as a library to hide all these low-level details.

The same network as above can be produced by TensorFlow’s Keras syntax as follows:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, AveragePooling2D, Flatten

model = Sequential([
    Conv2D(6, (5,5), input_shape=(28,28,1), padding="same", activation="tanh"),
    AveragePooling2D((2,2), strides=2),
    Conv2D(16, (5,5), activation="tanh"),
    AveragePooling2D((2,2), strides=2),
    Conv2D(120, (5,5), activation="tanh"),
    Flatten(),
    Dense(84, activation="tanh"),
    Dense(10, activation="softmax")
])

One major difference between PyTorch and Keras syntax is in the training loop. In Keras, we just need to assign the loss function, the optimization algorithm, the dataset, and some other parameters to the model. Then we have a fit() function to do all the training work, as follows:

...

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100, batch_size=32)

But in PyTorch, we need to write our own training loop code:

import datetime

import torch
import torch.nn as nn
import torch.optim as optim

# self-defined training loop function
def training_loop(model, optimizer, loss_fn, train_loader, val_loader=None, n_epochs=100):
    for epoch in range(n_epochs):
        # Training: switch to training mode and update the weights batch by batch
        model.train()
        train_loss = 0
        for data, target in train_loader:
            output = model(data)
            loss = loss_fn(output, target)
            optimizer.zero_grad()   # clear gradients left over from the previous batch
            loss.backward()         # backpropagate to compute new gradients
            optimizer.step()        # apply the weight update
            train_loss += loss.item()
        # Validation: switch to evaluation mode; no gradients are needed here
        model.eval()
        status = (f"{datetime.datetime.now()} End of epoch {epoch}, "
                  f"training loss={train_loss/len(train_loader)}")
        if val_loader:
            val_loss = 0
            with torch.no_grad():
                for data, target in val_loader:
                    output = model(data)
                    loss = loss_fn(output, target)
                    val_loss += loss.item()
            status += f", validation loss={val_loss/len(val_loader)}"
        print(status)

# model, train_loader, and test_loader are assumed to be defined beforehand
optimizer = optim.Adam(model.parameters())
criterion = nn.NLLLoss()
training_loop(model, optimizer, criterion, train_loader, test_loader, n_epochs=100)

This may not be an issue if you’re experimenting with a new design of a network in which you want to have more control over how the loss is calculated and how the optimizer updates the model weights. But otherwise, you will appreciate the simpler syntax from Keras.
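
One detail in the snippets above is easy to miss: the PyTorch model ends with LogSoftmax and is trained with NLLLoss, while the Keras model ends with softmax and is trained with categorical cross-entropy. For a single sample these two pairings compute the same number. The following is a small sketch in plain Python to check this; the logits are made-up values, and softmax and log_softmax are hand-written helpers for illustration, not the library implementations.

```python
import math

def softmax(logits):
    """Probabilities from raw scores, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    """Log-probabilities computed directly, without taking log of a softmax."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]

logits = [2.0, 0.5, -1.0]  # made-up raw network outputs for one sample
target = 0                 # index of the true class

# PyTorch pairing: log-softmax output, negative log-likelihood of the target class
nll = -log_softmax(logits)[target]

# Keras pairing: softmax output, cross-entropy against a one-hot target
one_hot = [1.0 if i == target else 0.0 for i in range(len(logits))]
cce = -sum(t * math.log(p) for t, p in zip(one_hot, softmax(logits)))

print(math.isclose(nll, cce))  # True
```

The practical difference is in the target format: NLLLoss expects class indices, whereas "categorical_crossentropy" in Keras expects one-hot encoded targets.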

Note that for both PyTorch and TensorFlow, Python is only the interface to the library. Therefore, it is possible to have interfaces for other languages too. For example, there are Torch for R and TensorFlow for R.

Also, note that the libraries mentioned above are full-featured libraries that include both training and prediction. If you consider a production environment where you only make use of a trained model, there could be a wider choice. TensorFlow has a “TensorFlow Lite” counterpart that allows a trained model to be run on a mobile device or the web. Intel also has an OpenVINO library that aims to optimize the performance of prediction.

Summary

In this post, you discovered various deep learning libraries and some of their characteristics. Specifically, you learned:

What libraries are available for C++ and Python
How the Chainer library influenced the syntax for building a deep learning model
The relationship between Keras and TensorFlow 2.x
The differences between PyTorch and TensorFlow
