How to Get Started With Deep Learning for Computer Vision (7-Day Mini-Course)

Deep Learning for Computer Vision Crash Course.
Bring Deep Learning Methods to Your Computer Vision Project in 7 Days.

We are awash in digital images from photos, videos, Instagram, YouTube, and increasingly live video streams.

Working with image data is hard, as it requires drawing upon knowledge from diverse domains such as digital signal processing, machine learning, statistical methods, and, these days, deep learning.

Deep learning methods are out-competing the classical and statistical methods on some challenging computer vision problems with singular and simpler models.

In this crash course, you will discover how you can get started and confidently develop deep learning for computer vision problems using Python in seven days.

Note: This is a big and important post. You might want to bookmark it.

Let’s get started.

  • Update Nov/2019: Updated for TensorFlow v2.0 and MTCNN v0.1.0.
How to Get Started With Deep Learning for Computer Vision (7-Day Mini-Course)
Photo by oliver.dodd, some rights reserved.

Who Is This Crash-Course For?

Before we get started, let’s make sure you are in the right place.

The list below provides some general guidelines as to who this course was designed for.

Don’t panic if you don’t match these points exactly; you might just need to brush up in one area or another to keep up.

You need to know:

  • Your way around basic Python, NumPy, and Keras for deep learning.

You do NOT need to be:

  • A math wiz!
  • A deep learning expert!
  • A computer vision researcher!

This crash course will take you from a developer who knows a little machine learning to a developer who can bring deep learning methods to your own computer vision project.

Note: This crash course assumes you have a working Python 3 SciPy environment with at least NumPy, Pandas, scikit-learn, and Keras 2 installed. If you need help with your environment, you can follow the step-by-step tutorial here:

Crash-Course Overview

This crash course is broken down into seven lessons.

You could complete one lesson per day (recommended) or complete all of the lessons in one day (hardcore). It really depends on the time you have available and your level of enthusiasm.

Below are the seven lessons that will get you started and productive with deep learning for computer vision in Python:

  • Lesson 01: Deep Learning and Computer Vision
  • Lesson 02: Preparing Image Data
  • Lesson 03: Convolutional Neural Networks
  • Lesson 04: Image Classification
  • Lesson 05: Train Image Classification Model
  • Lesson 06: Image Augmentation
  • Lesson 07: Face Detection

Each lesson could take you anywhere from 60 seconds up to 30 minutes. Take your time and complete the lessons at your own pace. Ask questions and even post results in the comments below.

The lessons might expect you to go off and find out how to do things. I will give you hints, but part of the point of each lesson is to force you to learn where to go to look for help on deep learning, computer vision, and the best-of-breed tools in Python (hint: I have all of the answers on this blog; just use the search box).

Post your results in the comments; I’ll cheer you on!

Hang in there; don’t give up.

Note: This is just a crash course. For a lot more detail and fleshed out tutorials, see my book on the topic titled “Deep Learning for Computer Vision.”

Want Results with Deep Learning for Computer Vision?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

Download Your FREE Mini-Course

Lesson 01: Deep Learning and Computer Vision

In this lesson, you will discover the promise of deep learning methods for computer vision.

Computer Vision

Computer Vision, or CV for short, is broadly defined as helping computers to “see” or extract meaning from digital images such as photographs and videos.

Researchers have been working on the problem of helping computers see for more than 50 years, and some great successes have been achieved, such as the face detection available in modern cameras and smartphones.

The problem of understanding images is not solved, and may never be. This is primarily because the world is complex and messy. There are few rules. And yet we can easily and effortlessly recognize objects, people, and context.

Deep Learning

Deep Learning is a subfield of machine learning concerned with algorithms, called artificial neural networks, that are inspired by the structure and function of the brain.

A property of deep learning is that the performance of this type of model improves by training it with more examples and by increasing its depth or representational capacity.

In addition to scalability, another often-cited benefit of deep learning models is their ability to perform automatic feature extraction from raw data, also called feature learning.

Promise of Deep Learning for Computer Vision

Deep learning methods are popular for computer vision, primarily because they are delivering on their promise.

Some of the first large demonstrations of the power of deep learning were in computer vision, specifically image classification. More recently, deep learning has delivered strong results in object detection and face recognition.

The three key promises of deep learning for computer vision are as follows:

  • The Promise of Feature Learning. That is, that deep learning methods can automatically learn the features from image data required by the model, rather than requiring that the feature detectors be handcrafted and specified by an expert.
  • The Promise of Continued Improvement. That is, that the performance of deep learning in computer vision is based on real results and that the improvements appear to be continuing and perhaps speeding up.
  • The Promise of End-to-End Models. That is, that large end-to-end deep learning models can be fit on large datasets of images or video offering a more general and better-performing approach.

Computer vision is not “solved,” but deep learning is required to reach the state of the art on many challenging problems in the field.

Your Task

For this lesson, you must research and list five impressive applications of deep learning methods in the field of computer vision. Bonus points if you can link to a research paper that demonstrates the example.

Post your answer in the comments below. I would love to see what you discover.

In the next lesson, you will discover how to prepare image data for modeling.

Lesson 02: Preparing Image Data

In this lesson, you will discover how to prepare image data for modeling.

Images are composed of matrices of pixel values.

Pixel values are often unsigned integers in the range between 0 and 255. Although these pixel values can be presented directly to neural network models in their raw format, this can result in challenges during modeling, such as slower than expected training of the model.

Instead, there can be great benefit in preparing the image pixel values prior to modeling, ranging from simply scaling pixel values to the range 0-1, to centering, and even standardizing the values.

This is called normalization and can be performed directly on a loaded image. The example below uses the Pillow library (a maintained fork of PIL, the standard image handling library in Python) to load an image and normalize its pixel values.

First, confirm that you have the Pillow library installed; it is installed with most SciPy environments, but you can learn more here:

Next, download a photograph of Bondi Beach in Sydney, Australia, taken by Isabell Schulz and released under a permissive license. Save the image in your current working directory with the filename ‘bondi_beach.jpg’.

Next, we can use the Pillow library to load the photo, confirm the min and max pixel values, normalize the values, and confirm the normalization was performed.
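A minimal sketch of these steps, using Pillow and NumPy, and assuming ‘bondi_beach.jpg’ is in your current working directory:

# load the image with Pillow and normalize pixel values to the range 0-1
from numpy import asarray
from PIL import Image

# load the image
image = Image.open('bondi_beach.jpg')
# convert the image to a NumPy array of unsigned integers
pixels = asarray(image)
# confirm the data type and the range of pixel values
print('Data Type: %s' % pixels.dtype)
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))
# convert the integers to floats and normalize to the range 0-1
pixels = pixels.astype('float32')
pixels /= 255.0
# confirm the normalization
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))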

Your Task

Your task in this lesson is to run the example code on the provided photograph and report the min and max pixel values before and after the normalization.

For bonus points, you can update the example to standardize the pixel values.

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will discover information about convolutional neural network models.

Lesson 03: Convolutional Neural Networks

In this lesson, you will discover how to construct a convolutional neural network using a convolutional layer, pooling layer, and fully connected output layer.

Convolutional Layers

A convolution is the simple application of a filter to an input that results in an activation. Repeated application of the same filter to an input results in a map of activations called a feature map, indicating the locations and strength of a detected feature in an input, such as an image.

A convolutional layer can be created by specifying both the number of filters to learn and the fixed size of each filter, often called the kernel shape.

Pooling Layers

Pooling layers provide an approach to downsampling feature maps by summarizing the presence of features in patches of the feature map.

Maximum pooling, or max pooling, is a pooling operation that calculates the maximum, or largest, value in each patch of each feature map.

Classifier Layer

Once the features have been extracted, they can be interpreted and used to make a prediction, such as classifying the type of object in a photograph.

This can be achieved by first flattening the two-dimensional feature maps, and then adding a fully connected output layer. For a binary classification problem, the output layer would have one node that would predict a value between 0 and 1 for the two classes.

Convolutional Neural Network

The example below creates a convolutional neural network that expects grayscale images with a square size of 256×256 pixels. It has one convolutional layer with 32 filters, each of size 3×3 pixels, a max pooling layer, and a binary classification output layer.
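A sketch of such a model, written against the tensorflow.keras API (in line with the Nov/2019 update for TensorFlow 2), might look like this:

# define a small CNN: one conv layer, one pooling layer, one output layer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# learn 32 filters, each 3x3 pixels, over 256x256 grayscale input images
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(256, 256, 1)))
# downsample each feature map by half with max pooling
model.add(MaxPooling2D())
# flatten the feature maps and make a binary class prediction
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
# summarize how the shape changes layer by layer
model.summary()

Calling model.summary() prints the output shape of each layer, which is exactly what the task below asks you to interpret.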

Your Task

Your task in this lesson is to run the example and describe how the shape of an input image would be changed by the convolutional and pooling layers.

For extra points, you could try adding more convolutional or pooling layers and describe the effect it has on the image as it flows through the model.

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will learn how to use a deep convolutional neural network to classify photographs of objects.

Lesson 04: Image Classification

In this lesson, you will discover how to use a pre-trained model to classify photographs of objects.

Deep convolutional neural network models may take days, or even weeks, to train on very large datasets.

A way to short-cut this process is to re-use the model weights from pre-trained models that were developed for standard computer vision benchmark datasets, such as the ImageNet image recognition tasks.

The example below uses the VGG-16 pre-trained model to classify photographs of objects into one of 1,000 known classes.

Download this photograph of a dog taken by Justin Morgan and released under a permissive license. Save it in your current working directory with the filename ‘dog.jpg’.

The example below will load the photograph and output a prediction, classifying the object in the photograph.

Note: The first time you run the example, the pre-trained model weights will have to be downloaded, which is a few hundred megabytes and may take a few minutes, depending on the speed of your internet connection.
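A sketch of the whole example, assuming the tensorflow.keras API and ‘dog.jpg’ in the current working directory:

# classify a photograph with the VGG-16 pre-trained model
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# load the image and resize it to the 224x224 input that VGG-16 expects
image = load_img('dog.jpg', target_size=(224, 224))
image = img_to_array(image)
# reshape into a single sample and prepare the pixels for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
image = preprocess_input(image)
# load the model (downloads the weights on the first run)
model = VGG16()
# predict the probability across all 1,000 output classes
yhat = model.predict(image)
# report the most likely class and its probability
label = decode_predictions(yhat)[0][0]
print('%s (%.2f%%)' % (label[1], label[2] * 100))

Here, decode_predictions() maps the 1,000-element probability vector back to human-readable class labels.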

Your Task

Your task in this lesson is to run the example and report the result.

For bonus points, try running the example on another photograph of a common object.

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will discover how to fit and evaluate a model for image classification.

Lesson 05: Train Image Classification Model

In this lesson, you will discover how to train and evaluate a convolutional neural network for image classification.

The Fashion-MNIST clothing classification problem is a new standard dataset used in computer vision and deep learning.

It is a dataset comprising 60,000 small square 28×28 pixel grayscale training images (plus 10,000 test images) of items from 10 classes of clothing, such as shoes, t-shirts, dresses, and more.

The example below loads the dataset, scales the pixel values, then fits a convolutional neural network on the training dataset and evaluates the performance of the network on the test dataset.

The example will run in just a few minutes on a modern CPU; no GPU is required.
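A sketch of the described experiment, again assuming the tensorflow.keras API, might look like the following:

# fit and evaluate a small CNN on the Fashion-MNIST dataset
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical

# load the dataset and reshape to single-channel images
(trainX, trainY), (testX, testY) = fashion_mnist.load_data()
trainX = trainX.reshape((trainX.shape[0], 28, 28, 1)).astype('float32') / 255.0
testX = testX.reshape((testX.shape[0], 28, 28, 1)).astype('float32') / 255.0
# one hot encode the 10 clothing classes
trainY, testY = to_categorical(trainY), to_categorical(testY)
# define a small convolutional neural network
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(),
    Flatten(),
    Dense(100, activation='relu'),
    Dense(10, activation='softmax')])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# fit on the training set, then evaluate on the test set
model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=2)
loss, acc = model.evaluate(testX, testY, verbose=0)
print(loss, acc)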

Your Task

Your task in this lesson is to run the example and report the performance of the model on the test dataset.

For bonus points, try varying the configuration of the model, or try saving the model and later loading it and using it to make a prediction on new grayscale photographs of clothing.

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will discover how to use image augmentation on training data.

Lesson 06: Image Augmentation

In this lesson, you will discover how to use image augmentation.

Image data augmentation is a technique that can be used to artificially expand the size of a training dataset by creating modified versions of images in the dataset.

Training deep learning neural network models on more data can result in more skillful models, and the augmentation techniques can create variations of the images that can improve the ability of the fit models to generalize what they have learned to new images.

The Keras deep learning neural network library provides the capability to fit models using image data augmentation via the ImageDataGenerator class.

Download a photograph of a bird by AndYaDontStop, released under a permissive license. Save it into your current working directory with the name ‘bird.jpg’.

The example below will load the photograph as a dataset and use image augmentation to create flipped and rotated versions of the image that can be used to train a convolutional neural network model.
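A sketch of the augmentation example, assuming the tensorflow.keras API and Matplotlib for plotting:

# create flipped and rotated versions of a photo with ImageDataGenerator
from numpy import expand_dims
from matplotlib import pyplot
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# load the image and convert it to a dataset of one sample
data = img_to_array(load_img('bird.jpg'))
samples = expand_dims(data, 0)
# configure horizontal/vertical flips and rotations of up to 90 degrees
datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True, rotation_range=90)
# prepare an iterator and plot nine augmented versions of the image
it = datagen.flow(samples, batch_size=1)
for i in range(9):
    pyplot.subplot(330 + 1 + i)
    batch = next(it)
    image = batch[0].astype('uint8')
    pyplot.imshow(image)
pyplot.show()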

Your Task

Your task in this lesson is to run the example and report the effect that the image augmentation has had on the original image.

  • For bonus points, try additional types of image augmentation supported by the ImageDataGenerator class.

Post your findings in the comments below. I would love to see what you find.

In the next lesson, you will discover how to use a deep convolutional network to detect faces in photographs.

Lesson 07: Face Detection

In this lesson, you will discover how to use a convolutional neural network for face detection.

Face detection is a trivial problem for humans to solve and has been solved reasonably well by classical feature-based techniques, such as the cascade classifier.

More recently, deep learning methods have achieved state-of-the-art results on standard face detection datasets. One example is the Multi-task Cascade Convolutional Neural Network, or MTCNN for short.

The ipazc/MTCNN project provides an open source implementation of the MTCNN that can be installed easily as follows:
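Assuming pip is available in your Python environment:

pip install mtcnn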

Download a photograph of a person on the street taken by Holland and released under a permissive license. Save it into your current working directory with the name ‘street.jpg’.

The example below will load the photograph, use the MTCNN model to detect faces, then plot the photo and draw a box around the first detected face.
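A sketch of the detection example, assuming the mtcnn package and Matplotlib for plotting:

# detect faces with MTCNN and draw a box around the first detected face
from matplotlib import pyplot
from matplotlib.patches import Rectangle
from mtcnn.mtcnn import MTCNN

# load the photograph as an array of pixels
pixels = pyplot.imread('street.jpg')
# create the detector, using default weights, and detect faces
detector = MTCNN()
faces = detector.detect_faces(pixels)
# plot the photo and draw a box around the first detected face
pyplot.imshow(pixels)
ax = pyplot.gca()
x, y, width, height = faces[0]['box']
ax.add_patch(Rectangle((x, y), width, height, fill=False, color='red'))
pyplot.show()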

Your Task

Your task in this lesson is to run the example and describe the result.

For bonus points, try the model on another photograph with multiple faces and update the code example to draw a box around each detected face.

Post your findings in the comments below. I would love to see what you discover.

The End!
(Look How Far You Have Come)

You made it. Well done!

Take a moment and look back at how far you have come.

You discovered:

  • What computer vision is and the promise and impact that deep learning is having on the field.
  • How to scale the pixel values of image data in order to make them ready for modeling.
  • How to develop a convolutional neural network model from scratch.
  • How to use a pre-trained model to classify photographs of objects.
  • How to train a model from scratch to classify photographs of clothing.
  • How to use image augmentation to create modified copies of photographs in your training dataset.
  • How to use a pre-trained deep learning model to detect people’s faces in photographs.

This is just the beginning of your journey with deep learning for computer vision. Keep practicing and developing your skills.

Take the next step and check out my book on deep learning for computer vision.

Summary

How Did You Do With The Mini-Course?
Did you enjoy this crash course?

Do you have any questions? Were there any sticking points?
Let me know. Leave a comment below.

Develop Deep Learning Models for Vision Today!

Deep Learning for Computer Vision

Develop Your Own Vision Models in Minutes

...with just a few lines of Python code

Discover how in my new Ebook:
Deep Learning for Computer Vision

It provides self-study tutorials on topics like:
classification, object detection (YOLO and R-CNN), face recognition (VGGFace and FaceNet), data preparation and much more...

Finally Bring Deep Learning to your Vision Projects

Skip the Academics. Just Results.

See What's Inside

57 Responses to How to Get Started With Deep Learning for Computer Vision (7-Day Mini-Course)

  1. Abid Rizvi April 11, 2019 at 1:53 pm #

    Lesson 02: Preparing Image Data
    ================================

    Before Normalization:

    Data Type: uint8
    Min: 0.000, Max: 255.000
    Min: 0.000, Max: 1.000

    After Normalization:
    Data Type: uint8
    Min: 0.000, Max: 255.000
    Min: 0.000, Max: 1.000

  2. Abid Rizvi April 11, 2019 at 1:57 pm #

    Lesson 02: Preparing Image Data
    ================================

    Before Normalization:
    Min: 0.000, Max: 255.000

    After Normalization:
    Min: 0.000, Max: 1.000

    Max value 255 converts to 1.000

  3. Abid Rizvi April 11, 2019 at 1:59 pm #

    Lesson 02: Preparing Image Data
    ===============================
    For bonus points, you can update the example to standardize the pixel values.

    What do you mean by standardize the pixel values? Please elaborate.

  4. Abid Rizvi April 11, 2019 at 2:18 pm #

    Lesson 03: Convolutional Neural Networks
    =========================================

    input_shape=(256, 256, 1)

    Convolutional Layer 1 (filter size 3×3)
    ————————————–
    model.add(Conv2D(32, (3,3), input_shape=(256, 256, 1)))

    Output shape: (None, 254, 254, 32)
    Max Pooling: (None, 127, 127, 32)

    Convolutional Layer 2 (filter size 3×3)
    ————————————–
    model.add(Conv2D(32, (3,3)))

    Output shape: (None, 252, 252, 32)
    Max Pooling: (None, 126, 126, 32)

    Convolutional Layer 3 (filter size 7×7)
    ————————————–
    model.add(Conv2D(32, (7,7)))

    Output shape: (None, 246, 246, 32)
    Max Pooling: (None, 125, 125, 32)

    • Bort June 14, 2019 at 10:46 pm #

      This seems pretty wrong to me as the maxPooling shape is used as input for the next layer.
      So you would go from 256 -> 254 -> 127 -> 125 -> 62 -> 56 -> 28

      Furthermore, as far as I understand it, the number of filters usually increases.
      32 -> 64 -> 128

  5. Abid Rizvi April 11, 2019 at 5:01 pm #

    Lesson 04: Image Classification
    =========================================
    Doberman (33.59%)

    some other images (I downloaded two images, one of a dog and another of a human)

    Dog result:
    German_shepherd (87.66%)

    Human result:
    swimming_trunks (15.77%)

  6. Abid Rizvi April 12, 2019 at 4:39 pm #

    Lesson 05: Train Image Classification Model
    ============================================
    (Yesterday, after running the example in the first go)
    Loss: 0.318009318998456
    Accuracy: 0.912

    (Today, after loading the model from saved model/weights)
    Loss: 0.30647955482006073
    Accuracy: 0.9113

    Why are there slight changes in the third decimal of the loss & accuracy?

  7. Turyamusiima Dismas April 13, 2019 at 2:49 am #

    before normalisation

    Min: 0.000, Max: 255.000

    after normalisation

    Min: 0.000, Max: 1.000

  8. Jamuna Prakash April 13, 2019 at 9:59 am #

    Lesson 02’s output:

    Data Type: uint8
    Min: 0.000, Max: 255.000
    Min: 0.000, Max: 1.000

  9. Nitin May 6, 2019 at 6:55 pm #

    Lesson 02: Preparing Image Data
    ================================
    Data Type: uint8
    Min: 0.000, Max: 255.000
    Min: 0.000, Max: 1.000

  10. jk (jayakumar) May 25, 2019 at 8:57 pm #

    import tensorflow as tf
    print(tf.__version__)

    2.0.0-alpha0
    In this version of tensorflow, the lesson 3 code

    from keras.models import Sequential
    model = Sequential()

    leads to the following error:
    AttributeError: module 'tensorflow' has no attribute 'get_default_graph'

    Then you need to change
    #from keras.models import Sequential

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D
    from tensorflow.keras.layers import MaxPooling2D
    from tensorflow.keras.layers import Flatten
    from tensorflow.keras.layers import Dense

    • Jason Brownlee May 26, 2019 at 6:44 am #

      I recommend using the Keras library directly, not the keras interface in tensorflow.

  11. Marcello October 15, 2019 at 4:12 am #

    Image Classification https://arxiv.org/abs/1512.03385
    Image Classification With Localization https://arxiv.org/abs/1311.2524
    Object Detection https://arxiv.org/abs/1506.02640
    Object Segmentation https://ieeexplore.ieee.org/document/7803544
    Image Style Transfer https://ieeexplore.ieee.org/document/7780634
    Image Colorization
    Image Reconstruction
    Image Super-Resolution
    Image Synthesis

  12. Saurabh December 12, 2019 at 11:11 pm #

    Hello Jason,

    Thanks for sharing mini course.

    I am trying to run MTCNN on tensorflow 2.0 and throws error: module ‘tensorflow’ has no attribute ‘get_default_graph’

    I cross verified my opencv-python version i.e. 4.1.2 and MTCNN version 0.1.0.

    Could you please guide me?

    Thanking you,
    Saurabh

    • Jason Brownlee December 13, 2019 at 6:03 am #

      You must use TF1.15 or TF1.14 with Mask RCNN.

      • Saurabh December 13, 2019 at 7:51 pm #

        Thank you! It means MTCNN is not supported by TF2.0? Right?

        • Jason Brownlee December 14, 2019 at 6:15 am #

          Yes, I recommend TF2. MTCNN uses TF2.

          • Saurabh December 16, 2019 at 8:00 pm #

            Thank you!

  13. sara ahmed January 20, 2020 at 9:40 pm #

    1- MNIST dataset.
    2- detecting Alzheimer’s disease using CNN
    3- image segmentation using semantic segmentation
    4- image classification using 3D-CNN and autoencoder

  14. Vasudha February 24, 2020 at 11:52 pm #

    Lesson 1: Deep Learning and computer vision
    ______________________________________

    1. Object Detection
    (W. Ouyang et al., “DeepID-Net: Object Detection with Deformable Part Based Convolutional Neural Networks,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 7, pp. 1320-1334, 1 July 2017.)

    2. Face detection and recognition
    (https://www.researchgate.net/publication/255653401)

    3. Action and Activity recognition (http://yann.lecun.com/exdb/publis/pdf/lecun-90c.pdf)

    4. Human Pose estimation ( 3D Human Pose Estimation Using Convolutional Neural Networks with 2D Pose Information – https://link.springer.com/chapter/10.1007/978-3-319-49409-8_15)

    5. Datasets / Images (https://www.researchgate.net/publication/275257620_Image_Classification_Using_Convolutional_Neural_Networks)

  15. Vasudha February 27, 2020 at 10:39 pm #

    Lesson 02: Preparing Image Data
    ———————————————–

    Data Type: uint8
    Min: 0.000, Max: 255.000
    Min: 0.000, Max: 1.000

  16. Vasudha February 27, 2020 at 10:47 pm #

    Lesson 3 : Convolutional Neural Networks
    ———————————————————-
    Model: “sequential_1”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d_1 (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    max_pooling2d_1 (MaxPooling2 (None, 127, 127, 32) 0
    _________________________________________________________________
    flatten_1 (Flatten) (None, 516128) 0
    _________________________________________________________________
    dense_1 (Dense) (None, 1) 516129
    =================================================================
    Total params: 516,449
    Trainable params: 516,449
    Non-trainable params: 0

  17. Vasudha February 28, 2020 at 12:17 am #

    Lesson 3: Convolutional neural networks / For extra points
    ———————————————————–
    Model: “sequential_1”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d_1 (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    conv2d_2 (Conv2D) (None, 252, 252, 32) 9248
    _________________________________________________________________
    max_pooling2d_1 (MaxPooling2 (None, 126, 126, 32) 0
    _________________________________________________________________
    max_pooling2d_2 (MaxPooling2 (None, 63, 63, 32) 0
    _________________________________________________________________
    flatten_1 (Flatten) (None, 127008) 0
    _________________________________________________________________
    dense_1 (Dense) (None, 1) 127009
    =================================================================
    Total params: 136,577
    Trainable params: 136,577
    Non-trainable params: 0

  18. Vasudha February 28, 2020 at 1:00 am #

    Lesson 3 : Image Classification
    —————————————–
    Doberman (33.59%)

    I downloaded 2 other images. One of a flower and the other one of a cat. Below are the results.

    1. vase (44.59%)

    2. Egyptian_cat (56.30%)

  19. Vasudha February 28, 2020 at 1:18 am #

    Lesson 05: Train Image Classification Model
    ———————————————————–

    Epoch 1/10
    – 27s – loss: 0.4170 – accuracy: 0.8525
    Epoch 2/10
    – 25s – loss: 0.2761 – accuracy: 0.8993
    Epoch 3/10
    – 27s – loss: 0.2326 – accuracy: 0.9144
    Epoch 4/10
    – 25s – loss: 0.1991 – accuracy: 0.9274
    Epoch 5/10
    – 24s – loss: 0.1747 – accuracy: 0.9350
    Epoch 6/10
    – 24s – loss: 0.1501 – accuracy: 0.9447
    Epoch 7/10
    – 24s – loss: 0.1308 – accuracy: 0.9520
    Epoch 8/10
    – 24s – loss: 0.1120 – accuracy: 0.9587
    Epoch 9/10
    – 24s – loss: 0.0982 – accuracy: 0.9636
    Epoch 10/10
    – 24s – loss: 0.0839 – accuracy: 0.9696
    0.3199548319131136 0.9136000275611877

  20. VICENTE CASTILLO GUILLÉN March 23, 2020 at 1:10 am #

    Classification for
    Architectural Design through the Eye of Artificial
    Intelligence: https://arxiv.org/ftp/arxiv/papers/1812/1812.01714.pdf

    Measuring human perceptions of a large-scale urban region using machine learning:
    https://www.researchgate.net/publication/327720319_Measuring_human_perceptions_of_a_large-scale_urban_region_using_machine_learning

    Classification of Mexican heritage buildings’ architectural styles: https://dl.acm.org/doi/abs/10.1145/3095713.3095730

    A deep convolutional network for fine-art paintings
    classification:
    http://www.cs-chan.com/doc/ICIP2016_Poster.pdf

    Architectural Style Classification of Building Facade Windows: https://link.springer.com/chapter/10.1007/978-3-642-24031-7_28

  21. Elifuraha Gerard March 24, 2020 at 7:43 am #

    Thanks Jason for the very clear instructions.
    For the lesson 2 quiz I used the numpy library as follows:

    import numpy as np

    I then used np.array() to convert the image into a numpy array and employed the shortcut below to standardize the image as follows:

    image = Image.open('bondi_beach.jpg')
    pixels = asarray(image)
    pixels = pixels.astype('float32')
    # Convert to numpy array data type
    pixels_np = np.array(pixels)
    print('Min: %.3f, Max: %.3f' % (pixels_np.min(), pixels_np.max()))

    >Min: 0.000, Max: 1.000000

    # standardize the image

    standardized_pixels_np = (pixels_np-pixels_np.mean())/pixels_np.std()

    # confirm the standardization
    print('Min: %.3f, Max: %.3f' % (standardized_pixels_np.min(), standardized_pixels_np.max()))
    > Min: -3.003, Max: 1.920

    Gerard

  22. VICENTE CASTILLO GUILLÉN March 28, 2020 at 10:32 pm #

    Lesson 03: This is what I get

    Model: “sequential_1”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d_1 (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    max_pooling2d_1 (MaxPooling2 (None, 127, 127, 32) 0
    _________________________________________________________________
    flatten_1 (Flatten) (None, 516128) 0
    _________________________________________________________________
    dense_1 (Dense) (None, 1) 516129
    =================================================================
    Total params: 516,449
    Trainable params: 516,449
    Non-trainable params: 0
    _________________________________________________________________

  23. VICENTE CASTILLO GUILLÉN March 28, 2020 at 11:04 pm #

    Lesson 03: Extra

    Model: “sequential_1”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d_1 (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    conv2d_2 (Conv2D) (None, 252, 252, 32) 9248
    _________________________________________________________________
    conv2d_3 (Conv2D) (None, 250, 250, 32) 9248
    _________________________________________________________________
    max_pooling2d_1 (MaxPooling2 (None, 125, 125, 32) 0
    _________________________________________________________________
    max_pooling2d_2 (MaxPooling2 (None, 62, 62, 32) 0
    _________________________________________________________________
    max_pooling2d_3 (MaxPooling2 (None, 31, 31, 32) 0
    _________________________________________________________________
    flatten_1 (Flatten) (None, 30752) 0
    _________________________________________________________________
    dense_1 (Dense) (None, 1) 30753
    =================================================================
    Total params: 49,569
    Trainable params: 49,569
    Non-trainable params: 0
    _________________________________________________________________

  24. VICENTE CASTILLO GUILLÉN March 28, 2020 at 11:11 pm #

    Lesson 03:

    Is this correct?

    In the basic code you provide the 1st convolution uses a 3×3 kernel to transform the image from 256×256 and 1 channel, to 254×254 and 32 channels.

    The 2nd convolution transforms the image to a size of 127×127 pixels.

    The 3rd one, flatten, is the sum of all the parameters of the matrix.

    The “dense” convolution is the one which classifies (0 or 1).

    I have a question: What’s the meaning of the 320 in the Param # column? Why does it become 0 and then change in the final layer to 516129?

    Thanks Jason for your help!

    This link was helpful for me to understand what Keras is and how it works:
    https://www.pyimagesearch.com/2018/12/31/keras-conv2d-and-convolutional-layers/

  25. VICENTE CASTILLO GUILLÉN March 29, 2020 at 12:04 am #

    Lesson 04: Doberman (33.59%)

  26. VICENTE CASTILLO GUILLÉN March 29, 2020 at 12:05 am #

    Lesson 05:

    Epoch 1/10
    – 23s – loss: 0.3851 – accuracy: 0.8624
    Epoch 2/10
    – 24s – loss: 0.2594 – accuracy: 0.9060
    Epoch 3/10
    – 24s – loss: 0.2172 – accuracy: 0.9191
    Epoch 4/10
    – 24s – loss: 0.1847 – accuracy: 0.9325
    Epoch 5/10
    – 24s – loss: 0.1586 – accuracy: 0.9405
    Epoch 6/10
    – 25s – loss: 0.1368 – accuracy: 0.9495
    Epoch 7/10
    – 24s – loss: 0.1171 – accuracy: 0.9567
    Epoch 8/10
    – 24s – loss: 0.1029 – accuracy: 0.9619
    Epoch 9/10
    – 24s – loss: 0.0885 – accuracy: 0.9679
    Epoch 10/10
    – 25s – loss: 0.0759 – accuracy: 0.9729
    0.3183996982872486 0.9110999703407288

  27. VICENTE CASTILLO GUILLÉN March 29, 2020 at 2:49 am #

    Lesson 5 extra:

    Following the instructions included in this tutorial:

    https://machinelearningmastery.com/how-to-develop-a-cnn-from-scratch-for-fashion-mnist-clothing-classification/

    I could run the example and I got the right class: 2.

    You need to add this line to the code provided in Lesson 5:
    # save model
    model.save('final_model.h5')

    And you’ll have to save the image in the tutorial I mentioned before as: 'sample_image.png'

    Then open and run a new file with the following code:

    # make a prediction for a new image.
    from keras.preprocessing.image import load_img
    from keras.preprocessing.image import img_to_array
    from keras.models import load_model

    # load and prepare the image
    def load_image(filename):
        # load the image
        img = load_img(filename, grayscale=True, target_size=(28, 28))
        # convert to array
        img = img_to_array(img)
        # reshape into a single sample with 1 channel
        img = img.reshape(1, 28, 28, 1)
        # prepare pixel data
        img = img.astype('float32')
        img = img / 255.0
        return img

    # load an image and predict the class
    def run_example():
        # load the image
        img = load_image('sample_image.png')
        # load model
        model = load_model('final_model.h5')
        # predict the class
        result = model.predict_classes(img)
        print(result[0])

    # entry point, run the example
    run_example()

    Thanks Jason!!

  28. VICENTE CASTILLO GUILLÉN March 29, 2020 at 3:06 am #

    Lesson 6:

    After running the code, we see 9 images similar to the original one, but with several changes:
    – Each has been rotated and/or flipped (horizontally and vertically); the background appears to have changed the direction (rotation) of the coloured areas, and some areas around the perimeter have been filled with colours similar to the adjoining parts of the original picture, with a kind of “motion blur”.

    • Jason Brownlee March 29, 2020 at 6:03 am #

      Yes, different augmentations each run of the code.
