How to Evaluate Pixel Scaling Methods for Image Classification With CNNs

Image data must be prepared before it can be used as the basis for modeling in image classification tasks.

One aspect of preparing image data is scaling pixel values, for example by normalizing the values to the range 0-1, centering them, or standardizing them.

How do you choose a good, or even best, pixel scaling method for your image classification or computer vision modeling task?

In this tutorial, you will discover how to choose a pixel scaling method for image classification with deep learning methods.

After completing this tutorial, you will know:

  • A procedure for choosing a pixel scaling method using experimentation and empirical results on a specific dataset.
  • How to implement standard pixel scaling methods for preparing image data for modeling.
  • How to work through a case study for choosing a pixel scaling method for a standard image classification problem.

Let’s get started.

How to Evaluate Pixel Scaling Methods for Image Classification With Convolutional Neural Networks

Photo by Andres Alvarado, some rights reserved.

Tutorial Overview

This tutorial is divided into 6 parts; they are:

  1. Procedure for Choosing a Pixel Scaling Method
  2. Choose Dataset: MNIST Image Classification
  3. Choose Model: Convolutional Neural Network
  4. Choose Pixel Scaling Methods
  5. Run Experiment
  6. Analyze Results

Procedure for Choosing a Pixel Scaling Method

Given a new image classification task, what pixel scaling methods should be used?

There are many ways to answer this question; for example:

  • Use techniques reportedly used for similar problems in research papers.
  • Use heuristics from blog posts, courses, or books.
  • Use your favorite technique.
  • Use the simplest technique.

Instead, I recommend using experimentation in order to discover what works best for your specific dataset.

This can be achieved using the following process:

  • Step 1: Choose Dataset. This may be the entire training dataset or a small subset. The idea is to complete the experiments quickly and get a result.
  • Step 2: Choose Model. Design a model that is skillful, but not necessarily the best model for the problem. Some parallel prototyping of models may be required.
  • Step 3: Choose Pixel Scaling Methods. List 3-5 data preparation schemes for evaluation of your problem.
  • Step 4: Run Experiment. Run the experiments in such a way that the results are robust and representative; ideally, repeat each experiment multiple times.
  • Step 5: Analyze Results. Compare methods both in terms of the speed of learning and mean performance across repeated experiments.

The experimental approach will use a non-optimized model and perhaps a subset of training data, both of which may add noise to the decision you must make.

Therefore, you are looking for a signal that one data preparation scheme for your images is clearly better than the others; if this is not the case for your dataset, then the simplest (least computationally complex) technique should be used, such as pixel normalization.

A clear signal of a superior pixel scaling method may be seen in one of two ways:

  • Faster Learning. Learning curves clearly show that a model learns faster with a given data preparation scheme.
  • Better Accuracy. Mean model performance clearly shows better accuracy with a given data preparation scheme.

Now that we have a procedure for choosing a pixel scaling method for image data, let’s look at an example. We will use the MNIST image classification task fit with a CNN and evaluate a range of standard pixel scaling methods.

Step 1. Choose Dataset: MNIST Image Classification

The MNIST problem, or MNIST for short, is an image classification problem comprised of 70,000 images of handwritten digits.

The goal of the problem is to classify a given image of a handwritten digit as an integer from 0 to 9. As such, it is a multiclass image classification problem.

It is a standard dataset for evaluating machine learning and deep learning algorithms. Best results for the dataset are about 99.79% accurate, or an error rate of about 0.21% (i.e. less than 1%).

This dataset is provided as part of the Keras library and can be automatically downloaded (if needed) and loaded into memory by a call to the keras.datasets.mnist.load_data() function.

The function returns two tuples: one for the training inputs and outputs and one for the test inputs and outputs.

We can load the MNIST dataset and summarize it.

The complete example is listed below.
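A minimal sketch of loading and summarizing the dataset is given here. It assumes Keras is installed as part of TensorFlow (the tensorflow.keras import path); the helper names summarize() and load_and_summarize() are illustrative and do not come from a library.

```python
import numpy as np


def summarize(name, images, labels):
    """Print and return the shape and pixel statistics of one data split."""
    stats = {
        'images': images.shape,
        'labels': labels.shape,
        'min': images.min(),
        'max': images.max(),
        'mean': images.mean(),
        'std': images.std(),
    }
    print(f"{name}: images={images.shape}, labels={labels.shape}")
    print(f"{name}: min={stats['min']}, max={stats['max']}, "
          f"mean={stats['mean']:.3f}, std={stats['std']:.3f}")
    return stats


def load_and_summarize():
    """Download (if needed) and load MNIST, then summarize both splits."""
    # Assumption: Keras is available through TensorFlow. The import is kept
    # inside the function so that summarize() can be used without TensorFlow.
    from tensorflow.keras.datasets import mnist
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data()
    summarize('Train', train_images, train_labels)
    summarize('Test', test_images, test_labels)
```

Calling load_and_summarize() downloads the dataset on first use and prints the shape, minimum, maximum, mean, and standard deviation of the train and test images.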

Running the example first loads the dataset into memory. Then the shape of the training and test datasets is reported.

We can see that all images are 28 by 28 pixels with a single channel for grayscale images. There are 60,000 images for the training dataset and 10,000 for the test dataset.

We can also see that pixel values are integer values between 0 and 255 and that the mean and standard deviation of the pixel values are similar between the two datasets.

The dataset is relatively small; we will use the entire train and test dataset.

Now that we are familiar with MNIST and how to load the dataset, let’s choose a model to use in the experiment.

Step 2. Choose Model: Convolutional Neural Network

We will use a convolutional neural network model to evaluate the different pixel scaling methods.

A CNN is expected to perform very well on this problem, although the model chosen for this experiment does not have to be the best-performing model for the problem. Instead, it must be skillful (better than random) and must allow the impact of different data preparation schemes to be differentiated in terms of speed of learning and/or model performance.

As such, the model must have sufficient capacity to learn the problem.

We will demonstrate the baseline model on the MNIST problem.

First, the dataset must be loaded and the shape of the train and test dataset expanded to add a channel dimension, set to one as we only have a single grayscale channel.

Next, we will normalize the pixel values for this example and one hot encode the target values, required for multiclass classification.
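The reshaping, normalization, and one hot encoding steps can be sketched with NumPy alone. The function name prepare_data() is illustrative, the demo arrays are synthetic stand-ins for MNIST images and labels, and the identity-matrix lookup is used here in place of a Keras utility such as to_categorical().

```python
import numpy as np


def prepare_data(images, labels, num_classes=10):
    """Add a channel dimension, normalize pixels and one hot encode labels."""
    # (n, height, width) -> (n, height, width, 1): one grayscale channel, channels last
    images = images.reshape((images.shape[0], images.shape[1], images.shape[2], 1))
    # normalize integer pixels in [0, 255] to floats in [0, 1]
    images = images.astype('float32') / 255.0
    # one hot encode: row i of the identity matrix is the encoding of class i
    one_hot = np.eye(num_classes, dtype='float32')[labels]
    return images, one_hot


# synthetic stand-in for a few MNIST images and labels (assumption)
rng = np.random.default_rng(1)
demo_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
demo_labels = np.array([3, 0, 9, 3])

X, y = prepare_data(demo_images, demo_labels)
print(X.shape, X.dtype, X.min(), X.max())
print(y.shape, y[0])
```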

The model is defined as a convolutional layer followed by a max pooling layer; this combination is repeated, then the feature maps are flattened, interpreted by a fully connected layer, and followed by an output layer.

The ReLU activation function is used for hidden layers and the softmax activation function is used for the output layer. Enough filter maps and nodes are specified to provide sufficient capacity to learn the problem.

The Adam variation of stochastic gradient descent is used to find the model weights. The categorical cross entropy loss function is used, required for multi-class classification, and classification accuracy is monitored during training.

The model is fit for five training epochs and a large batch size of 128 images is used.

Once fit, the model is evaluated on the test dataset.

The complete example is listed below and will easily run on the CPU in about a minute.
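One way the model and training configuration described above might be written is sketched here. It assumes the tensorflow.keras API; the filter and node counts (32, 64, 64) are assumptions chosen for illustration, and define_model() and fit_and_evaluate() are illustrative names.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense


def define_model():
    """Two conv/pool blocks, a dense layer and a 10-class softmax output."""
    model = Sequential([
        Input(shape=(28, 28, 1)),
        # filter and node counts are assumptions, not tuned values
        Conv2D(32, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(64, activation='relu'),
        Dense(10, activation='softmax'),
    ])
    # Adam optimizer, categorical cross entropy loss, accuracy monitored
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model


def fit_and_evaluate(model, train_x, train_y, test_x, test_y):
    """Fit for five epochs with a batch size of 128 and return test accuracy."""
    model.fit(train_x, train_y, epochs=5, batch_size=128, verbose=0)
    _, accuracy = model.evaluate(test_x, test_y, verbose=0)
    return accuracy


model = define_model()
model.summary()
```

With the train and test arrays reshaped, normalized, and one hot encoded as described above, fit_and_evaluate(model, train_x, train_y, test_x, test_y) returns the classification accuracy on the test dataset.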

Running the example shows that the model is capable of learning the problem well and quickly.

In fact, the performance of the model on the test dataset on this run is 99%, or a 1% error rate. This is not state of the art (by design), but is not terribly far from state of the art either.

Step 3. Choose Pixel Scaling Methods

Neural network models often cannot be trained on raw pixel values, such as pixel values in the range of 0 to 255.

The reason is that the network uses a weighted sum of inputs, and for the network to both be stable and train effectively, weights should be kept small.
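A small illustration of the effect on the weighted sum, using synthetic pixel values and an assumed scale for the initial weights: with the same weights, the weighted sum over raw pixels is 255 times larger than over pixels normalized to 0-1.

```python
import numpy as np

rng = np.random.default_rng(1)
# one flattened 28x28 image with raw integer pixel values (synthetic)
raw = rng.integers(0, 256, size=784).astype('float32')
# small random weights; the scale of 0.05 is an assumption for illustration
weights = rng.normal(0.0, 0.05, size=784).astype('float32')

# the weighted sum of inputs for a single node, before any activation
raw_sum = float(np.dot(weights, raw))
scaled_sum = float(np.dot(weights, raw / 255.0))

print('weighted sum, raw pixels:    %.3f' % raw_sum)
print('weighted sum, scaled pixels: %.3f' % scaled_sum)
```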

Instead, the pixel values must be scaled prior to training. There are perhaps three main approaches to scaling pixel values; they are:

  • Normalization: pixel values are scaled to the range 0-1.
  • Centering: the mean pixel value is subtracted from each pixel value resulting in a distribution of pixel values centered on a mean of zero.
  • Standardization: the mean pixel value is subtracted from each pixel value and the result is divided by the standard deviation, giving pixel values with a mean of zero and a standard deviation of one.

Traditionally, sigmoid activation functions were used and inputs that sum to 0 (zero mean) were preferred. This may or may not still be the case with the wide adoption of ReLU and similar activation functions.

Further, in centering and standardization, the mean or mean and standard deviation can be calculated across a channel, an image, a mini-batch, or the entire training dataset. This may add additional variations on a chosen scaling method that may be evaluated.
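The different scopes for the statistics map onto the axes of the image array. The sketch below uses a synthetic batch of color images in channels-last layout (an assumption) and shows the mean calculated across the whole array, per channel, and per image, with centering applied via NumPy broadcasting.

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic batch of 8 color images, 32x32 pixels, channels last (assumption)
images = rng.integers(0, 256, size=(8, 32, 32, 3)).astype('float32')

# one mean for the whole array (for example, the entire training dataset)
dataset_mean = images.mean()
# one mean per channel, calculated across all images and pixel positions
channel_means = images.mean(axis=(0, 1, 2))
# one mean per image, calculated across its pixels and channels
image_means = images.mean(axis=(1, 2, 3), keepdims=True)

# centering with each statistic relies on NumPy broadcasting
centered_dataset = images - dataset_mean
centered_channel = images - channel_means
centered_image = images - image_means

print(channel_means.shape, image_means.shape)
```

The same axis arguments apply to std() when standardizing instead of centering.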

Normalization is often the default approach as we can assume pixel values in 8-bit images are always in the range 0-255, making the procedure very simple and efficient to implement.

Centering is often promoted as the preferred approach as it was used in many popular papers. However, the mean can be calculated per image (global) or per channel (local), and across a batch of images or the entire training dataset, and the procedure described in a paper often does not specify exactly which variation was used.

We will experiment with the three approaches listed above, namely normalization, centering, and standardization. The mean for centering and the mean and standard deviation for standardization will be calculated across the entire training dataset.

Other variations you could explore include:

  • Calculating statistics for each channel (for color images).
  • Calculating statistics for each image.
  • Calculating statistics for each batch.
  • Normalizing after centering or standardizing.

The example below implements the three chosen pixel scaling methods and demonstrates their effect on the MNIST dataset.
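A minimal sketch of the three schemes follows, with statistics calculated on the training data and applied to both splits. The function names are illustrative, and the demo uses synthetic arrays with the same shape and value range as MNIST images as a stand-in; passing the image arrays returned by mnist.load_data() instead would reproduce the comparison on the real dataset.

```python
import numpy as np


def prep_normalize(train, test):
    """Scale pixel values from [0, 255] to [0, 1]."""
    train = train.astype('float32') / 255.0
    test = test.astype('float32') / 255.0
    return train, test


def prep_center(train, test):
    """Subtract the mean pixel value of the training dataset."""
    train = train.astype('float32')
    test = test.astype('float32')
    mean = train.mean()
    return train - mean, test - mean


def prep_standardize(train, test):
    """Subtract the training mean and divide by the training standard deviation."""
    train = train.astype('float32')
    test = test.astype('float32')
    mean, std = train.mean(), train.std()
    return (train - mean) / std, (test - mean) / std


def summarize(name, train, test):
    """Report min, max, mean and standard deviation for both splits."""
    for split, data in (('train', train), ('test', test)):
        print('%s %s: min=%.3f, max=%.3f, mean=%.3f, std=%.3f'
              % (name, split, data.min(), data.max(), data.mean(), data.std()))


# synthetic stand-in for the MNIST train and test images (assumption)
rng = np.random.default_rng(1)
train_images = rng.integers(0, 256, size=(1000, 28, 28), dtype=np.uint8)
test_images = rng.integers(0, 256, size=(200, 28, 28), dtype=np.uint8)

schemes = (('normalize', prep_normalize),
           ('center', prep_center),
           ('standardize', prep_standardize))
for name, func in schemes:
    train, test = func(train_images, test_images)
    summarize(name, train, test)
```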

Running the example first normalizes the dataset and reports the min, max, mean, and standard deviation for the train and test dataset.

This is then repeated for the centering and standardization data preparation schemes. The results provide evidence that the scaling procedures are indeed implemented correctly.

Step 4. Run Experiment

Now that we have defined the dataset, the model, and the data preparation schemes to evaluate, we are ready to define and run the experiment.

Each model takes about one minute to run on the CPU, so we don’t want the experiment to take too long. We will evaluate each of the three data preparation schemes and each scheme will be evaluated 10 times, meaning that about 30 minutes will be required to complete the experiment on modern hardware.

We can define a function to load the dataset afresh when needed.

We can also define a function to define and compile our model ready to fit on the problem.

We already have functions for preparing the pixel data for the train and test datasets.

Finally, we can define a function called repeated_evaluation() that takes the name of the data preparation function to call to prepare the data and will load the dataset and repeatedly define the model, prepare the dataset, fit, and evaluate the model. It will return a list of accuracy scores that can be used to summarize the performance of the model under the chosen data preparation scheme.
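A sketch of such a function is given here. To keep it independent of any particular library, this version takes the dataset loader and the model fitting step as extra arguments; the parameters load_func and evaluate_func are hypothetical additions for illustration and are not part of the description above.

```python
def repeated_evaluation(datapre_func, load_func, evaluate_func, n_repeats=10):
    """Evaluate one data preparation scheme several times.

    datapre_func(train_x, test_x) returns the scaled train and test images.
    load_func() returns train_x, train_y, test_x, test_y.
    evaluate_func(train_x, train_y, test_x, test_y) defines a new model,
    fits it and returns its accuracy on the test data.
    """
    # load the unscaled dataset once
    train_x, train_y, test_x, test_y = load_func()
    scores = []
    for i in range(n_repeats):
        # prepare the data from the unscaled arrays on every repeat
        prep_train_x, prep_test_x = datapre_func(train_x, test_x)
        # define, fit and evaluate a fresh model on the prepared data
        accuracy = evaluate_func(prep_train_x, train_y, prep_test_x, test_y)
        print('> %d: %.3f' % (i + 1, accuracy * 100.0))
        scores.append(accuracy)
    return scores
```

Passing the model definition and fitting step in as a function means a new model is created on every repeat, so each accuracy score comes from an independent training run.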

The repeated_evaluation() function can then be called for each of the three data preparation schemes and the mean and standard deviation of model performance under the scheme can be reported.

We can also create a box and whisker plot to summarize and compare the distribution of accuracy scores for each scheme.
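The reporting and plotting steps might look like the sketch below, which assumes NumPy and Matplotlib are installed. The helper names and the output filename are illustrative, and the scores in the demo are placeholder numbers only, not results from the experiment.

```python
import numpy as np
from matplotlib import pyplot


def summarize_scores(name, scores):
    """Print and return the mean and standard deviation of accuracy scores."""
    mean, std = np.mean(scores), np.std(scores)
    print('%s: %.3f%% (+/- %.3f)' % (name, mean * 100.0, std * 100.0))
    return mean, std


def plot_scores(all_scores, names, filename='pixel_scaling_boxplot.png'):
    """Save a box and whisker plot with one box per scaling method."""
    figure, axis = pyplot.subplots()
    boxes = axis.boxplot(all_scores)
    # boxes are drawn at positions 1..n, so label those positions
    axis.set_xticks(range(1, len(names) + 1))
    axis.set_xticklabels(names)
    axis.set_ylabel('accuracy')
    figure.savefig(filename)
    pyplot.close(figure)
    return boxes


# placeholder scores only, NOT results from the experiment
names = ['normalization', 'centering', 'standardization']
all_scores = [[0.2, 0.4, 0.6], [0.1, 0.3, 0.5], [0.3, 0.5, 0.7]]
for name, scores in zip(names, all_scores):
    summarize_scores(name, scores)
boxes = plot_scores(all_scores, names)
```

In the experiment, each inner list would be replaced by the list of accuracy scores returned for one data preparation scheme.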

Tying all of this together, the complete example of running the experiment to compare pixel scaling methods on the MNIST dataset is listed below.

Running the example may take about 30 minutes on the CPU and your results may vary given the stochastic nature of the training algorithm.

The accuracy is reported for each repeated evaluation of the model and the mean and standard deviation of accuracy scores are reported at the end of each run.

Box and Whisker Plot of CNN Performance on MNIST With Different Pixel Scaling Methods


Step 5. Analyze Results

For brevity, we will only look at model performance in the comparison of data preparation schemes. An extension to this study would also look at the speed of learning (learning curves) under each pixel scaling method.

The results of the experiments show that there is little or no difference (at the chosen precision) between pixel normalization and standardization with the chosen model on the MNIST dataset.

From these results, I would use normalization over standardization on this dataset with this model because of the good results and because of the simplicity of normalization as compared to standardization.

These are useful results in that they show that the default heuristic to center pixel values prior to modeling would not be good advice for this dataset.

Sadly, the box and whisker plot does not make a comparison between the spread of accuracy scores easy as some terrible outlier scores for the centering scaling method squash the distributions.


Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

  • Batch-Wise Scaling. Update the study to calculate scaling statistics per batch instead of across the entire training dataset and see if that makes a difference to the choice of scaling method.
  • Learning Curves. Update the study to collect a few learning curves for each data scaling method and compare the speed of learning.
  • CIFAR. Repeat the study on the CIFAR-10 dataset and add pixel scaling methods that support global (scale across all channels) and local (scaling per channel) approaches.
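As a starting point for the batch-wise scaling idea, the sketch below standardizes each batch with its own mean and standard deviation. The generator name standardized_batches() is illustrative, the demo data is synthetic, and how the batches are fed to the model (for example, alongside matching batches of labels) is left open.

```python
import numpy as np


def standardized_batches(images, batch_size=128):
    """Yield batches standardized with each batch's own mean and std."""
    images = images.astype('float32')
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        mean, std = batch.mean(), batch.std()
        # guard against a constant batch, which has a standard deviation of zero
        yield (batch - mean) / (std if std > 0 else 1.0)


# synthetic stand-in for a set of MNIST images (assumption)
rng = np.random.default_rng(1)
images = rng.integers(0, 256, size=(300, 28, 28), dtype=np.uint8)

for batch in standardized_batches(images):
    print(batch.shape, round(float(batch.mean()), 3), round(float(batch.std()), 3))
```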

If you explore any of these extensions, I’d love to know.
Post your findings in the comments below.


Summary

In this tutorial, you discovered how to choose a pixel scaling method for image classification with deep learning methods.

Specifically, you learned:

  • A procedure for choosing a pixel scaling method using experimentation and empirical results on a specific dataset.
  • How to implement standard pixel scaling methods for preparing image data for modeling.
  • How to work through a case study for choosing a pixel scaling method for a standard image classification problem.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

4 Responses to How to Evaluate Pixel Scaling Methods for Image Classification With CNNs

  1. JG March 28, 2019 at 8:54 pm #

    Nice job Jason! I like the way you set up the program as separate modules (functions) such as load data, define model, preprocess image, evaluation for different image preprocessing, and plot results. It is very well structured, so the message you are communicating is very clear!

    I have a question. In many cases you do not have a big dataset of images to train on, or the images have a lower resolution (fewer pixels per image) than expected. Because brilliant results (better than 99%) are achieved on this MNIST dataset, have you tried to study the minimum number of images needed for training (fewer than 60,000) and the lowest resolution (less than 28x28) that still achieve good results? That is to say, a sensitivity analysis of how results depend on dataset size and image resolution.
    I am particularly interested in different image resolutions (in another case study) because of the CPU time and memory impact, in addition to the difficulty of finding high-resolution images. Thanks

    • Jason Brownlee March 29, 2019 at 8:33 am #

      Good question.

      I have not performed this analysis, but smaller datasets/sized images are often used in research because of the speed of experiments.

      Larger data/images just take so long to process.

  2. Filipe March 30, 2019 at 7:17 pm #

    Hi Jason, great post. I’m having an issue with your code.

    When I run:

    "# summarize pixel values
    print('Train', train_images.min(), train_images.max(), train_images.mean(), train_images.std())"

    The console freezes. Did this happen to you?
