How to Get Started With Deep Learning for Computer Vision (7-Day Mini-Course)

Deep Learning for Computer Vision Crash Course.
Bring Deep Learning Methods to Your Computer Vision Project in 7 Days.

We are awash in digital images from photos, videos, Instagram, YouTube, and increasingly live video streams.

Working with image data is hard as it requires drawing upon knowledge from diverse domains such as digital signal processing, machine learning, statistical methods, and these days, deep learning.

Deep learning methods are outperforming classical and statistical methods on some challenging computer vision problems, often with single, simpler models.

In this crash course, you will discover how you can get started and confidently develop deep learning models for computer vision problems using Python in seven days.

Note: This is a big and important post. You might want to bookmark it.

Let’s get started.

  • Update Nov/2019: Updated for TensorFlow v2.0 and MTCNN v0.1.0.

Photo by oliver.dodd, some rights reserved.

Who Is This Crash-Course For?

Before we get started, let’s make sure you are in the right place.

The list below provides some general guidelines as to who this course was designed for.

Don’t panic if you don’t match these points exactly; you might just need to brush up in one area or another to keep up.

You need to know:

  • Your way around basic Python, NumPy, and Keras for deep learning.

You do NOT need to be:

  • A math wiz!
  • A deep learning expert!
  • A computer vision researcher!

This crash course will take you from a developer who knows a little machine learning to a developer who can bring deep learning methods to their own computer vision project.

Note: This crash course assumes you have a working Python 2 or 3 SciPy environment with at least NumPy, Pandas, scikit-learn, and Keras 2 installed. If you need help with your environment, you can follow the step-by-step tutorial here:

Crash-Course Overview

This crash course is broken down into seven lessons.

You could complete one lesson per day (recommended) or complete all of the lessons in one day (hardcore). It really depends on the time you have available and your level of enthusiasm.

Below are the seven lessons that will get you started and productive with deep learning for computer vision in Python:

  • Lesson 01: Deep Learning and Computer Vision
  • Lesson 02: Preparing Image Data
  • Lesson 03: Convolutional Neural Networks
  • Lesson 04: Image Classification
  • Lesson 05: Train Image Classification Model
  • Lesson 06: Image Augmentation
  • Lesson 07: Face Detection

Each lesson could take you anywhere from 60 seconds up to 30 minutes. Take your time and complete the lessons at your own pace. Ask questions and even post results in the comments below.

The lessons might expect you to go off and find out how to do things. I will give you hints, but part of the point of each lesson is to force you to learn where to look for help on deep learning, computer vision, and the best-of-breed tools in Python (hint: I have all of the answers on this blog, just use the search box).

Post your results in the comments; I’ll cheer you on!

Hang in there; don’t give up.

Note: This is just a crash course. For a lot more detail and fleshed out tutorials, see my book on the topic titled “Deep Learning for Computer Vision.”

Lesson 01: Deep Learning and Computer Vision

In this lesson, you will discover the promise of deep learning methods for computer vision.

Computer Vision

Computer Vision, or CV for short, is broadly defined as helping computers to “see” or extract meaning from digital images such as photographs and videos.

Researchers have been working on the problem of helping computers see for more than 50 years, and some great successes have been achieved, such as the face detection available in modern cameras and smartphones.

The problem of understanding images is not solved, and may never be. This is primarily because the world is complex and messy. There are few rules. And yet we can easily and effortlessly recognize objects, people, and context.

Deep Learning

Deep Learning is a subfield of machine learning concerned with algorithms called artificial neural networks, which are inspired by the structure and function of the brain.

A property of deep learning is that the performance of this type of model improves by training it with more examples and by increasing its depth or representational capacity.

In addition to scalability, another often-cited benefit of deep learning models is their ability to perform automatic feature extraction from raw data, also called feature learning.

Promise of Deep Learning for Computer Vision

Deep learning methods are popular for computer vision, primarily because they are delivering on their promise.

Some of the first large demonstrations of the power of deep learning were in computer vision, specifically image classification. More recent demonstrations have come in object detection and face recognition.

The three key promises of deep learning for computer vision are as follows:

  • The Promise of Feature Learning. That is, that deep learning methods can automatically learn the features from image data required by the model, rather than requiring that the feature detectors be handcrafted and specified by an expert.
  • The Promise of Continued Improvement. That is, that the performance of deep learning in computer vision is based on real results and that the improvements appear to be continuing and perhaps speeding up.
  • The Promise of End-to-End Models. That is, that large end-to-end deep learning models can be fit on large datasets of images or video offering a more general and better-performing approach.

Computer vision is not “solved” but deep learning is required to get you to the state-of-the-art on many challenging problems in the field.

Your Task

For this lesson, you must research and list five impressive applications of deep learning methods in the field of computer vision. Bonus points if you can link to a research paper that demonstrates the example.

Post your answer in the comments below. I would love to see what you discover.

In the next lesson, you will discover how to prepare image data for modeling.

Lesson 02: Preparing Image Data

In this lesson, you will discover how to prepare image data for modeling.

Images are composed of matrices of pixel values.

Pixel values are often unsigned integers in the range between 0 and 255. Although these pixel values can be presented directly to neural network models in their raw format, this can result in challenges during modeling, such as slower than expected training of the model.

Instead, there can be great benefit in preparing the image pixel values prior to modeling, ranging from simply scaling pixel values to the range 0-1 to centering and even standardizing the values.

Scaling pixel values to the range 0-1 is called normalization and can be performed directly on a loaded image. The example below uses the Pillow library (the maintained fork of PIL and the standard image handling library in Python) to load an image and normalize its pixel values.

First, confirm that you have the Pillow library installed; it is installed with most SciPy environments, but you can learn more here:

Next, download a photograph of Bondi Beach in Sydney, Australia, taken by Isabell Schulz and released under a permissive license. Save the image in your current working directory with the filename ‘bondi_beach.jpg‘.

Next, we can use the Pillow library to load the photo, confirm the min and max pixel values, normalize the values, and confirm the normalization was performed.
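A minimal sketch of these steps is shown here. It assumes Pillow and NumPy are installed and that the photo was saved as ‘bondi_beach.jpg‘. The random stand-in image used when the file is missing is my own assumption, added only so the sketch still runs.

```python
# load a photo, report its pixel range, then normalize to the range 0-1
from os.path import exists
from numpy import asarray
from numpy.random import randint
from PIL import Image

filename = 'bondi_beach.jpg'
if exists(filename):
    image = Image.open(filename)
else:
    # assumption: a random 8-bit RGB image stands in for the photo if it is missing
    image = Image.fromarray(randint(0, 256, (64, 64, 3)).astype('uint8'))

# convert the image to a NumPy array of pixel values
pixels = asarray(image)
print('Data Type: %s' % pixels.dtype)
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))

# convert from integers to floats, then scale to the range 0-1
pixels = pixels.astype('float32')
pixels /= 255.0
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))
```

Converting to float32 before dividing matters: the loaded array holds unsigned 8-bit integers, which cannot store fractional values.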

Your Task

Your task in this lesson is to run the example code on the provided photograph and report the min and max pixel values before and after the normalization.

For bonus points, you can update the example to standardize the pixel values.
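If you are unsure what standardizing means, here is one possible sketch: subtract the mean pixel value and divide by the standard deviation so the values have a mean of zero and a standard deviation of one. It uses a random array as a stand-in for a loaded photo (an assumption for illustration) and computes the statistics over the whole image. Per-channel statistics are another common choice.

```python
# global standardization of pixel values (zero mean, unit standard deviation)
from numpy.random import default_rng

# assumption: random 8-bit pixel values stand in for a loaded photo
rng = default_rng(1)
pixels = rng.integers(0, 256, size=(64, 64, 3)).astype('float32')

# calculate the mean and standard deviation across all pixels and channels
mean, std = pixels.mean(), pixels.std()
print('Mean: %.3f, Std: %.3f' % (mean, std))

# standardize the pixel values
pixels = (pixels - mean) / std

# confirm the result
print('Mean: %.3f, Std: %.3f' % (pixels.mean(), pixels.std()))
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))
```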

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will discover information about convolutional neural network models.

Lesson 03: Convolutional Neural Networks

In this lesson, you will discover how to construct a convolutional neural network using a convolutional layer, pooling layer, and fully connected output layer.

Convolutional Layers

A convolution is the simple application of a filter to an input that results in an activation. Repeated application of the same filter to an input results in a map of activations called a feature map, indicating the locations and strength of a detected feature in an input, such as an image.

A convolutional layer can be created by specifying both the number of filters to learn and the fixed size of each filter, often called the kernel shape.
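To make the idea of a filter and a feature map concrete, here is a small NumPy sketch. The function name, the 5×5 image, and the hand-written vertical line filter are all made up for illustration. In a real convolutional layer the filter values are learned, and the operation is technically a cross-correlation, as it is here.

```python
# slide a 3x3 filter over a small image to build a feature map
from numpy import array, zeros

def apply_filter(image, kernel):
    # output shrinks because the filter is only applied where it fits entirely
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # one activation: element-wise multiply and sum
            feature_map[i, j] = (patch * kernel).sum()
    return feature_map

# 5x5 image with a vertical line down the middle column
image = array([[0, 0, 1, 0, 0]] * 5)
# hand-crafted vertical line detector
kernel = array([[0, 1, 0],
                [0, 1, 0],
                [0, 1, 0]])

feature_map = apply_filter(image, kernel)
print(feature_map)
```

The 5×5 input becomes a 3×3 feature map, and the strong activations sit in the middle column where the line was detected.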

Pooling Layers

Pooling layers provide an approach to downsampling feature maps by summarizing the presence of features in patches of the feature map.

Maximum pooling, or max pooling, is a pooling operation that calculates the maximum, or largest, value in each patch of each feature map.
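Max pooling can be sketched in a few lines of NumPy. The function name and the 4×4 feature map are hypothetical, and the sketch assumes 2×2 patches that do not overlap, which is a common configuration.

```python
# 2x2 max pooling over a single feature map
from numpy import array

def max_pool_2x2(feature_map):
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # drop any odd trailing row or column, then group pixels into 2x2 patches
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    patches = trimmed.reshape(h2, 2, w2, 2)
    # keep the largest value in each patch
    return patches.max(axis=(1, 3))

feature_map = array([[1, 2, 0, 1],
                     [3, 4, 1, 0],
                     [0, 1, 5, 6],
                     [2, 0, 7, 8]])

pooled = max_pool_2x2(feature_map)
print(pooled)
```

Each dimension is halved, so the 4×4 feature map is summarized as a 2×2 map.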

Classifier Layer

Once the features have been extracted, they can be interpreted and used to make a prediction, such as classifying the type of object in a photograph.

This can be achieved by first flattening the two-dimensional feature maps, and then adding a fully connected output layer. For a binary classification problem, the output layer would have one node that would predict a value between 0 and 1 for the two classes.

Convolutional Neural Network

The example below creates a convolutional neural network that expects grayscale images with the square size of 256×256 pixels, with one convolutional layer with 32 filters, each with the size of 3×3 pixels, a max pooling layer, and a binary classification output layer.
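A sketch of such a model is shown here. It assumes TensorFlow 2 is installed and uses the Keras API bundled with it. If you use the standalone Keras library, the same classes can be imported from keras.models and keras.layers. The sigmoid activation on the output is my assumption for the binary classification case.

```python
# a small CNN: one convolutional layer, one pooling layer, one output node
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense

model = Sequential()
# 32 filters of 3x3 pixels applied to 256x256 grayscale images (one channel)
model.add(Conv2D(32, (3, 3), input_shape=(256, 256, 1)))
# downsample the feature maps using the default 2x2 pooling
model.add(MaxPooling2D())
# flatten the feature maps into a vector for the classifier
model.add(Flatten())
# single output node that predicts a value between 0 and 1
model.add(Dense(1, activation='sigmoid'))
# report the output shape of each layer
model.summary()
```

The summary lists the output shape of each layer, which is what you need for the task below.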

Your Task

Your task in this lesson is to run the example and describe how the shape of an input image would be changed by the convolutional and pooling layers.

For extra points, you could try adding more convolutional or pooling layers and describe the effect they have on the image as it flows through the model.

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will learn how to use a deep convolutional neural network to classify photographs of objects.

Lesson 04: Image Classification

In this lesson, you will discover how to use a pre-trained model to classify photographs of objects.

Deep convolutional neural network models may take days, or even weeks, to train on very large datasets.

A way to short-cut this process is to re-use the model weights from pre-trained models that were developed for standard computer vision benchmark datasets, such as the ImageNet image recognition tasks.

The example below uses the VGG-16 pre-trained model to classify photographs of objects into one of 1,000 known classes.

Download this photograph of a dog taken by Justin Morgan and released under a permissive license. Save it in your current working directory with the filename ‘dog.jpg‘.

The example below will load the photograph and output a prediction, classifying the object in the photograph.

Note: The first time you run the example, the pre-trained model will have to be downloaded, which is a few hundred megabytes and may take a few minutes based on the speed of your internet connection.
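One way to write this is sketched here, assuming TensorFlow 2 with its bundled Keras API and Pillow for image loading. The helper name classify_photo is hypothetical, and the check for the file is only there so the sketch does not fail if the photo is missing.

```python
# classify a photograph with the pre-trained VGG-16 model
from os.path import exists
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.applications.vgg16 import decode_predictions

def classify_photo(filename):
    # load the image at the 224x224 size expected by VGG-16
    image = load_img(filename, target_size=(224, 224))
    image = img_to_array(image)
    # add a leading dimension so the array is a batch of one image
    image = image.reshape((1,) + image.shape)
    # apply the same pixel preparation used when the model was trained
    image = preprocess_input(image)
    # load the model (downloads the weights on first use)
    model = VGG16()
    # predict a probability for each of the 1,000 classes
    yhat = model.predict(image)
    # keep the most likely class as (name, probability in percent)
    label = decode_predictions(yhat)[0][0]
    return label[1], label[2] * 100

if exists('dog.jpg'):
    name, percent = classify_photo('dog.jpg')
    print('%s (%.2f%%)' % (name, percent))
else:
    print('Save the photograph as dog.jpg first.')
```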

Your Task

Your task in this lesson is to run the example and report the result.

For bonus points, try running the example on another photograph of a common object.

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will discover how to fit and evaluate a model for image classification.

Lesson 05: Train Image Classification Model

In this lesson, you will discover how to train and evaluate a convolutional neural network for image classification.

The Fashion-MNIST clothing classification problem is a new standard dataset used in computer vision and deep learning.

Its training set comprises 60,000 small square 28×28 pixel grayscale images of 10 types of clothing, such as shoes, t-shirts, dresses, and more, with a further 10,000 images held out as a test set.

The example below loads the dataset, scales the pixel values, then fits a convolutional neural network on the training dataset and evaluates the performance of the network on the test dataset.

The example will run in just a few minutes on a modern CPU; no GPU is required.
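A sketch along these lines is shown here, assuming TensorFlow 2 with its bundled Keras API and an internet connection for the first download of the dataset. The layer sizes, optimizer, batch size, and number of epochs are my assumptions and are a reasonable starting point rather than tuned values.

```python
# fit and evaluate a small CNN on the Fashion-MNIST dataset
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense

# load the dataset (downloaded on first use)
(trainX, trainY), (testX, testY) = fashion_mnist.load_data()
# add a single channel dimension and scale pixel values to the range 0-1
trainX = trainX.reshape((trainX.shape[0], 28, 28, 1)).astype('float32') / 255.0
testX = testX.reshape((testX.shape[0], 28, 28, 1)).astype('float32') / 255.0

# define the model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(100, activation='relu'))
# one output node per clothing class
model.add(Dense(10, activation='softmax'))
# integer class labels are used directly with the sparse loss
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# fit on the training dataset
model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=2)
# evaluate on the test dataset
loss, acc = model.evaluate(testX, testY, verbose=0)
print(loss, acc)
```

Your specific loss and accuracy will vary from run to run because the weights are initialized randomly.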

Your Task

Your task in this lesson is to run the example and report the performance of the model on the test dataset.

For bonus points, try varying the configuration of the model, or try saving the model and later loading it and using it to make a prediction on new grayscale photographs of clothing.

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will discover how to use image augmentation on training data.

Lesson 06: Image Augmentation

In this lesson, you will discover how to use image augmentation.

Image data augmentation is a technique that can be used to artificially expand the size of a training dataset by creating modified versions of images in the dataset.

Training deep learning neural network models on more data can result in more skillful models, and the augmentation techniques can create variations of the images that can improve the ability of the fit models to generalize what they have learned to new images.

The Keras deep learning neural network library provides the capability to fit models using image data augmentation via the ImageDataGenerator class.

Download a photograph of a bird by AndYaDontStop, released under a permissive license. Save it into your current working directory with the name ‘bird.jpg‘.

The example below will load the photograph as a dataset and use image augmentation to create flipped and rotated versions of the image that can be used to train a convolutional neural network model.
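Here is one way this might look, assuming a TensorFlow 2 installation where the ImageDataGenerator class is available (it is marked as deprecated in newer releases) along with Pillow, SciPy (needed for rotations), and Matplotlib. The rotation range of 90 degrees and the random stand-in image used when ‘bird.jpg‘ is missing are my assumptions.

```python
# create flipped and rotated versions of a single photograph
from os.path import exists
from numpy import expand_dims
from numpy.random import randint
from matplotlib import pyplot
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import ImageDataGenerator

if exists('bird.jpg'):
    data = img_to_array(load_img('bird.jpg'))
else:
    # assumption: a random image stands in for the photo if it is missing
    data = randint(0, 256, (64, 64, 3)).astype('float32')

# expand to a dataset that contains one sample
samples = expand_dims(data, 0)
# random horizontal and vertical flips and rotations of up to 90 degrees
datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True, rotation_range=90)
# the iterator yields a new batch of augmented images each time it is advanced
it = datagen.flow(samples, batch_size=1)

# generate nine augmented images and convert them to integers for plotting
augmented = [next(it)[0].astype('uint8') for _ in range(9)]
for i, image in enumerate(augmented):
    pyplot.subplot(3, 3, i + 1)
    pyplot.imshow(image)
pyplot.show()
```

When training a model, the same iterator can be passed to the model's fit function in place of the training arrays.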

Your Task

Your task in this lesson is to run the example and report the effect that the image augmentation has had on the original image.

For bonus points, try additional types of image augmentation, supported by the ImageDataGenerator class.

Post your findings in the comments below. I would love to see what you find.

In the next lesson, you will discover how to use a deep convolutional network to detect faces in photographs.

Lesson 07: Face Detection

In this lesson, you will discover how to use a convolutional neural network for face detection.

Face detection is a trivial problem for humans to solve and has been solved reasonably well by classical feature-based techniques, such as the cascade classifier.

More recently, deep learning methods have achieved state-of-the-art results on standard face detection datasets. One example is the Multi-task Cascaded Convolutional Neural Network, or MTCNN for short.

The ipazc/MTCNN project provides an open source implementation of the MTCNN that can be installed easily with pip; the package is published on PyPI under the name mtcnn.

Download a photograph of a person on the street taken by Holland and released under a permissive license. Save it into your current working directory with the name ‘street.jpg‘.

The example below will load the photograph and use the MTCNN model to detect faces and will plot the photo and draw a box around the first detected face.

Your Task

Your task in this lesson is to run the example and describe the result.

For bonus points, try the model on another photograph with multiple faces and update the code example to draw a box around each detected face.

Post your findings in the comments below. I would love to see what you discover.

The End!
(Look How Far You Have Come)

You made it. Well done!

Take a moment and look back at how far you have come.

You discovered:

  • What computer vision is and the promise and impact that deep learning is having on the field.
  • How to scale the pixel values of image data in order to make them ready for modeling.
  • How to develop a convolutional neural network model from scratch.
  • How to use a pre-trained model to classify photographs of objects.
  • How to train a model from scratch to classify photographs of clothing.
  • How to use image augmentation to create modified copies of photographs in your training dataset.
  • How to use a pre-trained deep learning model to detect people’s faces in photographs.

This is just the beginning of your journey with deep learning for computer vision. Keep practicing and developing your skills.

Take the next step and check out my book on deep learning for computer vision.

Summary

How Did You Do With The Mini-Course?
Did you enjoy this crash course?

Do you have any questions? Were there any sticking points?
Let me know. Leave a comment below.

299 Responses to How to Get Started With Deep Learning for Computer Vision (7-Day Mini-Course)

  1. Abid Rizvi April 11, 2019 at 1:53 pm #

    Lesson 02: Preparing Image Data
    ================================

    Before Normalization:

    Data Type: uint8
    Min: 0.000, Max: 255.000
    Min: 0.000, Max: 1.000

    After Normalization:
    Data Type: uint8
    Min: 0.000, Max: 255.000
    Min: 0.000, Max: 1.000

    • Jason Brownlee April 11, 2019 at 2:21 pm #

      Well done.

    • Claudio Lombardi September 25, 2020 at 11:41 am #

      After Normalization
      Data Type: uint8
      Min: 0.000, Max: 255.000
      Min: 0.000, Max: 1.000
      Mean: 0.610, Std: 0.203

  2. Abid Rizvi April 11, 2019 at 1:57 pm #

    Lesson 02: Preparing Image Data
    ================================

    Before Normalization:
    Min: 0.000, Max: 255.000

    After Normalization:
    Min: 0.000, Max: 1.000

    Max value 255 converts to 1.000

  3. Abid Rizvi April 11, 2019 at 1:59 pm #

    Lesson 02: Preparing Image Data
    ===============================
    For bonus points, you can update the example to standardize the pixel values.

    What do you mean by standardize the pixel values? Please elaborate.

  4. Abid Rizvi April 11, 2019 at 2:18 pm #

    Lesson 03: Convolutional Neural Networks
    =========================================

    input_shape=(256, 256, 1)

    Convolutional Layer 1 (filter size 3×3)
    ————————————–
    model.add(Conv2D(32, (3,3), input_shape=(256, 256, 1)))

    Output shape: (None, 254, 254, 32)
    Max Pooling: (None, 127, 127, 32)

    Convolutional Layer 2 (filter size 3×3)
    ————————————–
    model.add(Conv2D(32, (3,3)))

    Output shape: (None, 252, 252, 32)
    Max Pooling: (None, 126, 126, 32)

    Convolutional Layer 3 (filter size 7×7)
    ————————————–
    model.add(Conv2D(32, (7,7)))

    Output shape: (None, 246, 246, 32)
    Max Pooling: (None, 125, 125, 32)

  5. Abid Rizvi April 11, 2019 at 5:01 pm #

    Lesson 04: Image Classification
    =========================================
    Doberman (33.59%)

    Some other images (I downloaded two images, one of a dog and another of a human):

    Dog result:
    German_shepherd (87.66%)

    Human result:
    swimming_trunks (15.77%)

    • Jason Brownlee April 12, 2019 at 7:39 am #

      Well done!

    • sari June 13, 2020 at 4:01 pm #

      lesson 4:

      result Doberman (33.59%)

      I tried another dog image and got Labrador retriever as the result. I didn’t get any output when I gave it an image of a human.

  6. Abid Rizvi April 12, 2019 at 4:39 pm #

    Lesson 05: Train Image Classification Model
    ============================================
    (Yesterday, after running the example for the first time)
    Loss: 0.318009318998456
    Accuracy: 0.912

    (Today, after loading the model from the saved model/weights)
    Loss: 0.30647955482006073
    Accuracy: 0.9113

    Why are there slight changes in the third decimal place of the loss and accuracy?

  7. Turyamusiima Dismas April 13, 2019 at 2:49 am #

    before normalisation

    Min: 0.000, Max: 255.000

    after normalisation

    Min: 0.000, Max: 1.000

  8. Jamuna Prakash April 13, 2019 at 9:59 am #

    Lesson#03’s output:

    Data Type: uint8
    Min: 0.000, Max: 255.000
    Min: 0.000, Max: 1.000

  9. Nitin May 6, 2019 at 6:55 pm #

    Lesson 02: Preparing Image Data
    ================================
    Data Type: uint8
    Min: 0.000, Max: 255.000
    Min: 0.000, Max: 1.000

  10. jk (jayakumar) May 25, 2019 at 8:57 pm #

    import tensorflow as tf
    print(tf.__version__)

    2.0.0-alpha0
    In this version of TensorFlow, the Lesson 03 code

    from keras.models import Sequential
    model = Sequential()

    leads to the following error:
    AttributeError: module 'tensorflow' has no attribute 'get_default_graph'

    The imports then need to change:
    #from keras.models import Sequential

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D
    from tensorflow.keras.layers import MaxPooling2D
    from tensorflow.keras.layers import Flatten
    from tensorflow.keras.layers import Dense

    • Jason Brownlee May 26, 2019 at 6:44 am #

      I recommend using the Keras library directly, not the keras interface in tensorflow.

  11. Marcello October 15, 2019 at 4:12 am #

    Image Classification https://arxiv.org/abs/1512.03385
    Image Classification With Localization https://arxiv.org/abs/1311.2524
    Object Detection https://arxiv.org/abs/1506.02640
    Object Segmentation https://ieeexplore.ieee.org/document/7803544
    Image Style Transfer https://ieeexplore.ieee.org/document/7780634
    Image Colorization
    Image Reconstruction
    Image Super-Resolution
    Image Synthesis

  12. Saurabh December 12, 2019 at 11:11 pm #

    Hello Jason,

    Thanks for sharing mini course.

    I am trying to run MTCNN on TensorFlow 2.0 and it throws an error: module 'tensorflow' has no attribute 'get_default_graph'

    I cross verified my opencv-python version i.e. 4.1.2 and MTCNN version 0.1.0.

    Could you please guide me?

    Thanking you,
    Saurabh

    • Jason Brownlee December 13, 2019 at 6:03 am #

      You must use TF1.15 or TF1.14 with Mask RCNN.

      • Saurabh December 13, 2019 at 7:51 pm #

        Thank you! It means MTCNN is not supported by TF2.0? Right?

        • Jason Brownlee December 14, 2019 at 6:15 am #

          Yes, I recommend TF2. MTCNN uses TF2.

          • Saurabh December 16, 2019 at 8:00 pm #

            Thank you!

  13. sara ahmed January 20, 2020 at 9:40 pm #

    1- MNIST dataset.
    2- detecting Alzheimer’s disease using CNN
    3- image segmentation using semantic segmentation
    4- image classification using 3D-CNN and autoencoder

  14. Vasudha February 24, 2020 at 11:52 pm #

    Lesson 1: Deep Learning and computer vision
    ______________________________________

    1. Object Detection
    (W. Ouyang et al., “DeepID-Net: Object Detection with Deformable Part Based Convolutional Neural Networks,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 7, pp. 1320-1334, 1 July 2017.)

    2. Face detection and recognition
    (https://www.researchgate.net/publication/255653401)

    3. Action and Activity recognition (http://yann.lecun.com/exdb/publis/pdf/lecun-90c.pdf)

    4. Human Pose estimation ( 3D Human Pose Estimation Using Convolutional Neural Networks with 2D Pose Information – https://link.springer.com/chapter/10.1007/978-3-319-49409-8_15)

    5. Datasets / Images (https://www.researchgate.net/publication/275257620_Image_Classification_Using_Convolutional_Neural_Networks)

  15. Vasudha February 27, 2020 at 10:39 pm #

    Lesson 02: Preparing Image Data
    ———————————————–

    Data Type: uint8
    Min: 0.000, Max: 255.000
    Min: 0.000, Max: 1.000

  16. Vasudha February 27, 2020 at 10:47 pm #

    Lesson 3 : Convolutional Neural Networks
    ———————————————————-
    Model: “sequential_1”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d_1 (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    max_pooling2d_1 (MaxPooling2 (None, 127, 127, 32) 0
    _________________________________________________________________
    flatten_1 (Flatten) (None, 516128) 0
    _________________________________________________________________
    dense_1 (Dense) (None, 1) 516129
    =================================================================
    Total params: 516,449
    Trainable params: 516,449
    Non-trainable params: 0

  17. Vasudha February 28, 2020 at 12:17 am #

    Lesson 3: Convolutional neural networks / For extra points
    ———————————————————–
    Model: “sequential_1”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d_1 (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    conv2d_2 (Conv2D) (None, 252, 252, 32) 9248
    _________________________________________________________________
    max_pooling2d_1 (MaxPooling2 (None, 126, 126, 32) 0
    _________________________________________________________________
    max_pooling2d_2 (MaxPooling2 (None, 63, 63, 32) 0
    _________________________________________________________________
    flatten_1 (Flatten) (None, 127008) 0
    _________________________________________________________________
    dense_1 (Dense) (None, 1) 127009
    =================================================================
    Total params: 136,577
    Trainable params: 136,577
    Non-trainable params: 0

  18. Vasudha February 28, 2020 at 1:00 am #

    Lesson 3 : Image Classification
    —————————————–
    Doberman (33.59%)

    I downloaded 2 other images. One of a flower and the other one of a cat. Below are the results.

    1. vase (44.59%)

    2. Egyptian_cat (56.30%)

  19. Vasudha February 28, 2020 at 1:18 am #

    Lesson 05: Train Image Classification Model
    ———————————————————–

    Epoch 1/10
    – 27s – loss: 0.4170 – accuracy: 0.8525
    Epoch 2/10
    – 25s – loss: 0.2761 – accuracy: 0.8993
    Epoch 3/10
    – 27s – loss: 0.2326 – accuracy: 0.9144
    Epoch 4/10
    – 25s – loss: 0.1991 – accuracy: 0.9274
    Epoch 5/10
    – 24s – loss: 0.1747 – accuracy: 0.9350
    Epoch 6/10
    – 24s – loss: 0.1501 – accuracy: 0.9447
    Epoch 7/10
    – 24s – loss: 0.1308 – accuracy: 0.9520
    Epoch 8/10
    – 24s – loss: 0.1120 – accuracy: 0.9587
    Epoch 9/10
    – 24s – loss: 0.0982 – accuracy: 0.9636
    Epoch 10/10
    – 24s – loss: 0.0839 – accuracy: 0.9696
    0.3199548319131136 0.9136000275611877

  20. VICENTE CASTILLO GUILLÉN March 23, 2020 at 1:10 am #

    Classification for Architectural Design through the Eye of Artificial Intelligence: https://arxiv.org/ftp/arxiv/papers/1812/1812.01714.pdf

    Measuring human perceptions of a large-scale urban region using machine learning:
    https://www.researchgate.net/publication/327720319_Measuring_human_perceptions_of_a_large-scale_urban_region_using_machine_learning

    Classification of Mexican heritage buildings’ architectural styles: https://dl.acm.org/doi/abs/10.1145/3095713.3095730

    A deep convolutional network for fine-art paintings classification: http://www.cs-chan.com/doc/ICIP2016_Poster.pdf

    Architectural Style Classification of Building Facade Windows: https://link.springer.com/chapter/10.1007/978-3-642-24031-7_28

  21. Elifuraha Gerard March 24, 2020 at 7:43 am #

    Thanks Jason for the very clear instructions.
    For the lesson 2 quiz I used the NumPy library as follows:

    import numpy as np

    I then used np.array() to convert the image into a NumPy array and used the shortcut below to standardize the image:

    image = Image.open('bondi_beach.jpg')
    pixels = asarray(image)
    pixels = pixels.astype('float32')
    # Convert to numpy array data type
    pixels_np = np.array(pixels)
    print('Min: %.3f, Max: %.3f' % (pixels_np.min(), pixels_np.max()))

    >Min: 0.000, Max: 1.000000

    # standardize the image

    standardized_pixels_np = (pixels_np-pixels_np.mean())/pixels_np.std()

    # confirm the standardization
    print('Min: %.3f, Max: %.3f' % (standardized_pixels_np.min(), standardized_pixels_np.max()))
    > Min: -3.003, Max: 1.920

    Gerard

  22. VICENTE CASTILLO GUILLÉN March 28, 2020 at 10:32 pm #

    Lesson 03: This is what I get

    Model: “sequential_1”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d_1 (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    max_pooling2d_1 (MaxPooling2 (None, 127, 127, 32) 0
    _________________________________________________________________
    flatten_1 (Flatten) (None, 516128) 0
    _________________________________________________________________
    dense_1 (Dense) (None, 1) 516129
    =================================================================
    Total params: 516,449
    Trainable params: 516,449
    Non-trainable params: 0
    _________________________________________________________________

  23. VICENTE CASTILLO GUILLÉN March 28, 2020 at 11:04 pm #

    Lesson 03: Extra

    Model: “sequential_1”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d_1 (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    conv2d_2 (Conv2D) (None, 252, 252, 32) 9248
    _________________________________________________________________
    conv2d_3 (Conv2D) (None, 250, 250, 32) 9248
    _________________________________________________________________
    max_pooling2d_1 (MaxPooling2 (None, 125, 125, 32) 0
    _________________________________________________________________
    max_pooling2d_2 (MaxPooling2 (None, 62, 62, 32) 0
    _________________________________________________________________
    max_pooling2d_3 (MaxPooling2 (None, 31, 31, 32) 0
    _________________________________________________________________
    flatten_1 (Flatten) (None, 30752) 0
    _________________________________________________________________
    dense_1 (Dense) (None, 1) 30753
    =================================================================
    Total params: 49,569
    Trainable params: 49,569
    Non-trainable params: 0
    _________________________________________________________________

  24. Avatar
    VICENTE CASTILLO GUILLÉN March 28, 2020 at 11:11 pm #

    Lesson 03:

    Is this correct?

    In the basic code you provide, the 1st convolution uses a 3×3 kernel to transform the image from 256×256 with 1 channel to 254×254 with 32 channels.

    The 2nd convolution transforms the image to a size of 127×127 pixels.

    The 3rd one, flatten, is the sum of all the parameters of the matrix.

    The “dense” convolution is the one which classifies (0 or 1).

    I have a question: What’s the meaning of the 320 in the Param # column? Why does it become 0 and then change in the final convolution to 516129?

    Thanks Jason for your help!

    This link was helpful for me to understand what Keras Conv2D is and how it works:
    https://www.pyimagesearch.com/2018/12/31/keras-conv2d-and-convolutional-layers/
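
The parameter counts asked about above can be reproduced by hand. This is a minimal sketch, assuming the usual counting rule for these layer types; the helper names `conv2d_params` and `dense_params` are made up for illustration and are not part of Keras. A Conv2D layer learns one weight per kernel cell per input channel plus one bias for each filter. The second layer in the summary is a max pooling layer and, like flatten, it has nothing to learn, so it reports 0. A Dense layer learns one weight per input plus one bias for each output unit.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter (kernel cells x input channels) plus one bias, times filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input plus one bias, for every output unit."""
    return (inputs + 1) * units


# Values read from the Lesson 03 model summary posted above.
conv = conv2d_params(3, 3, 1, 32)   # 3x3 kernel, 1 input channel, 32 filters
flat = 127 * 127 * 32               # pooled feature maps unrolled into one vector
dense = dense_params(flat, 1)       # single output unit
total = conv + dense                # pooling and flatten contribute nothing

print(conv, flat, dense, total)
```

Under these assumptions the four printed numbers correspond to the 320, 516128, 516129 and 516,449 shown in the summary.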

  25. Avatar
    VICENTE CASTILLO GUILLÉN March 29, 2020 at 12:04 am #

    Lesson 04: Doberman (33.59%)

  26. Avatar
    VICENTE CASTILLO GUILLÉN March 29, 2020 at 12:05 am #

    Lesson 05:

    Epoch 1/10
    – 23s – loss: 0.3851 – accuracy: 0.8624
    Epoch 2/10
    – 24s – loss: 0.2594 – accuracy: 0.9060
    Epoch 3/10
    – 24s – loss: 0.2172 – accuracy: 0.9191
    Epoch 4/10
    – 24s – loss: 0.1847 – accuracy: 0.9325
    Epoch 5/10
    – 24s – loss: 0.1586 – accuracy: 0.9405
    Epoch 6/10
    – 25s – loss: 0.1368 – accuracy: 0.9495
    Epoch 7/10
    – 24s – loss: 0.1171 – accuracy: 0.9567
    Epoch 8/10
    – 24s – loss: 0.1029 – accuracy: 0.9619
    Epoch 9/10
    – 24s – loss: 0.0885 – accuracy: 0.9679
    Epoch 10/10
    – 25s – loss: 0.0759 – accuracy: 0.9729
    0.3183996982872486 0.9110999703407288

  27. Avatar
    VICENTE CASTILLO GUILLÉN March 29, 2020 at 2:49 am #

    Lesson 5 extra:

    Following the instructions included in this tutorial:

    https://machinelearningmastery.com/how-to-develop-a-cnn-from-scratch-for-fashion-mnist-clothing-classification/

    I could run the example and I got the right class: 2.

    You need to add this line to the code provided in Lesson 5:
    # save model
    model.save('final_model.h5')

    And you’ll have to save the image in the tutorial I mentioned before as: ‘sample_image.png’

    Then open and run a new file with the following code:

    # make a prediction for a new image.
    from keras.preprocessing.image import load_img
    from keras.preprocessing.image import img_to_array
    from keras.models import load_model

    # load and prepare the image
    def load_image(filename):
        # load the image
        img = load_img(filename, grayscale=True, target_size=(28, 28))
        # convert to array
        img = img_to_array(img)
        # reshape into a single sample with 1 channel
        img = img.reshape(1, 28, 28, 1)
        # prepare pixel data
        img = img.astype('float32')
        img = img / 255.0
        return img

    # load an image and predict the class
    def run_example():
        # load the image
        img = load_image('sample_image.png')
        # load model
        model = load_model('final_model.h5')
        # predict the class
        result = model.predict_classes(img)
        print(result[0])

    # entry point, run the example
    run_example()

    Thanks Jason!!
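
A note for readers on newer installations: as far as I know, `predict_classes()` was deprecated and then removed from `Sequential` models in later TensorFlow/Keras releases (around version 2.6), so the call above may fail there. The replacement suggested by the deprecation message for multi-class softmax models is to take the argmax of `predict()`. The sketch below shows that pattern; `predict_class` and `FakeModel` are hypothetical names, and `FakeModel` only stands in for a loaded Keras model so the snippet runs without TensorFlow.

```python
import numpy as np


def predict_class(model, img):
    """Return the index of the most probable class for each sample in img."""
    probabilities = model.predict(img)        # shape: (samples, classes)
    return np.argmax(probabilities, axis=-1)  # shape: (samples,)


class FakeModel:
    """Hypothetical stand-in for a trained 10-class model (made-up probabilities)."""

    def predict(self, x):
        return np.array([[0.01, 0.02, 0.80, 0.03, 0.04,
                          0.02, 0.03, 0.02, 0.02, 0.01]])


img = np.zeros((1, 28, 28, 1), dtype="float32")  # one blank 28x28 grayscale sample
result = predict_class(FakeModel(), img)
print(result[0])
```

With a real model, `FakeModel()` would be replaced by the object returned from `load_model(...)`.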

  28. Avatar
    VICENTE CASTILLO GUILLÉN March 29, 2020 at 3:06 am #

    Lesson 6:

    After running the code, we see 9 images similar to the original one, but with several changes:
    – The image has been rotated and flipped (horizontally and vertically), the coloured areas of the background appear rotated, and some areas around the perimeter have been filled with colours similar to the adjacent parts of the original picture, with a kind of “motion blur”.

    • Avatar
      Jason Brownlee March 29, 2020 at 6:03 am #

      Yes, different augmentations each run of the code.

  29. Avatar
    VICENTE CASTILLO GUILLÉN March 29, 2020 at 8:42 pm #

    Lesson 07: I had some trouble with the cv installation, but I solved it via this link:

    https://programarfacil.com/blog/vision-artificial/instalar-opencv-python-anaconda/

  30. Avatar
    VICENTE CASTILLO GUILLÉN March 29, 2020 at 8:43 pm #

    Lesson 07 extra:

    This is the code:

    # face detection with mtcnn on a photograph
    from matplotlib import pyplot
    from matplotlib.patches import Rectangle
    from mtcnn.mtcnn import MTCNN
    # load image from file
    pixels = pyplot.imread('prueba.jpg')
    # create the detector, using default weights
    detector = MTCNN()

    # detect faces in the image
    faces = detector.detect_faces(pixels)
    # plot the image
    pyplot.imshow(pixels)
    # get the context for drawing boxes
    ax = pyplot.gca()
    for i in range(len(faces)):
        # get coordinates from the i face
        x, y, width, height = faces[i]['box']
        # create the shape
        rect = Rectangle((x, y), width, height, fill=False, color='red')
        # draw the box
        ax.add_patch(rect)
    # show the plot
    pyplot.show()

  31. Avatar
    Anubrata April 15, 2020 at 5:11 am #

    Hi,
    Thanks for this course. I am a molecular biologist interested in Data Science, so all my examples are from biology!
    1) using Denoising Autoencoders to detect Breast Cancer from gene expression data
    Tan J, Ung M, Cheng C, Greene CS. Pac Symp Biocomput. 2015;20:132–143.

    2) predicting protein structure without sequence info using multilayer residual neural network
    Wang S, Sun S, Li Z, Zhang R, Xu J (2017) PLoS Comput Biol 13(1): e1005324.

    3) Predicting drug-target interactions using restricted Boltzmann machines
    Wang Y, Zeng J. 2013;29(13):i126–i134.

    4) Deep learning based tissue analysis predicts outcome in colorectal cancer
    Bychkov D, Linder N, Turkki R, et al. Sci Rep. 2018;8(1):3395.

    5) Deep learning-based cancer survival prognosis from RNA-seq data
    Huang Z, Johnson TS, Han Z, et al. BMC Med Genomics. 2020;13(Suppl 5):41.

  32. Avatar
    Saúl Alquicira April 15, 2020 at 4:49 pm #

    Lesson 1.- Deep Learning and Computer Vision

    1.-      Autonomous vehic
    https://arxiv.org/pdf/2001.10789.pdf

    2.-     Autonomous
    http://www.robots.ox.ac.uk/~mobile/Papers/ICRA19_chadwick.pdf

    3.-Affective Computing
    https://arxiv.org/pdf/1907.09929.pdf

    4.-Improve Learning
    https://www.media.mit.edu/publications/designing-neural-network-architectures-using-reinforcement-learning/

    5.-health
    https://dam-prod.media.mit.edu/x/2019/01/17/RudovicEtAl18-PML-Science.pdf

  33. Avatar
    anubrata das April 16, 2020 at 2:12 am #

    # prepare image data

    Hi Jason,

    I used your tutorial to standardize the data

    Bondi Beach.jpg

    format JPEG, mode RGB
    Data Type: uint8
    Min_pixel: 0.000, Max_pixel: 255.000
    Min_normal_pixel: 0.000, Max_normal_pixel: 1.000
    mean per channel =[0.51480323 0.60049796 0.7137792 ]
    std.dev per channel=[0.22923872 0.15852204 0.16162618]
    mean per channel stdz =[0.00101891 0.00115143 0.00204117]
    std.dev per channel stdz =[1.0000744 1.0001055 0.9998136]

    img2.jpg

    format JPEG, mode RGB
    Data Type: uint8
    Min_pixel: 0.000, Max_pixel: 255.000
    Min_normal_pixel: 0.000, Max_normal_pixel: 1.000
    mean per channel =[0.5421985 0.49465644 0.48014694]
    std.dev per channel=[0.21577911 0.21520971 0.24537063]
    mean per channel stdz =[-0.03672315 0.06999601 0.0575752 ]
    std.dev per channel stdz =[1.0012016 0.9973781 0.9994226]

    img4.jpg

    format JPEG, mode RGB
    Data Type: uint8
    Min_pixel: 0.000, Max_pixel: 255.000
    Min_normal_pixel: 0.000, Max_normal_pixel: 1.000
    mean per channel =[0.34658098 0.23875412 0.16520199]
    std.dev per channel=[0.26760063 0.21685596 0.17311402]
    mean per channel stdz =[0.03354708 0.04420278 0.03249924]
    std.dev per channel stdz =[0.9971665 0.9965377 1.0025591]

    • Avatar
      Jason Brownlee April 16, 2020 at 6:04 am #

      Well done!

    • Avatar
      samith April 23, 2020 at 5:08 am #

      How did you get the mean and std.dev?
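
The code behind the numbers above was not posted, so the following is only one common way to get per-channel statistics with NumPy, not necessarily what was used. It assumes the image is a height × width × channels array; a random array stands in for `asarray(image)` so the sketch runs without an image file.

```python
import numpy as np

# Stand-in for asarray(image): a made-up 64x64 RGB image with values 0..255.
rng = np.random.default_rng(1)
pixels = rng.integers(0, 256, size=(64, 64, 3)).astype("float32")

# Normalize to the range 0..1 first, as in Lesson 02.
pixels /= 255.0

# Reduce over height and width (axes 0 and 1), leaving one value per channel.
means = pixels.mean(axis=(0, 1))
stds = pixels.std(axis=(0, 1))
print("mean per channel =", means)
print("std.dev per channel =", stds)

# Per-channel standardization: broadcasting applies each channel's own statistics.
standardized = (pixels - means) / stds
print("mean per channel stdz =", standardized.mean(axis=(0, 1)))
print("std.dev per channel stdz =", standardized.std(axis=(0, 1)))
```

After this step each channel should have a mean close to 0 and a standard deviation close to 1.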

  34. Avatar
    Saul Alquicira April 19, 2020 at 2:30 pm #

    Lesson 2 .-Preparing Image Data

    # standardize with (x - x.mean()) / x.std() # values from ? to ?, but mean at 0
    pixels = (pixels - pixels.mean()) / pixels.std()
    print('NORMAL Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))

    BEFORE Min: 0.000, Max: 255.000
    AFTER Min: 0.000, Max: 1.000
    NORMAL Min: -3.003, Max: 1.920

  35. Avatar
    Saul Alquicira April 20, 2020 at 4:16 pm #

    Lesson 3.- Convolutional Neural Networks

    Model: “sequential_24”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d_33 (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    conv2d_34 (Conv2D) (None, 252, 252, 32) 9248
    _________________________________________________________________
    max_pooling2d_27 (MaxPooling (None, 126, 126, 32) 0
    _________________________________________________________________
    flatten_23 (Flatten) (None, 508032) 0
    _________________________________________________________________
    dense_23 (Dense) (None, 1) 508033
    =================================================================
    Total params: 517,601
    Trainable params: 517,601
    Non-trainable params: 0

    When you add an extra pooling layer, the params are reduced drastically. I believe this is because pooling shrinks the rectified feature map into a smaller pooled feature map, so in the end the arrays are smaller.

    When you add an extra convolutional layer or classifier, the total and trainable params are not reduced in the same way as with pooling.

    Model: “sequential_25”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d_35 (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    conv2d_36 (Conv2D) (None, 252, 252, 32) 9248
    _________________________________________________________________
    max_pooling2d_28 (MaxPooling (None, 126, 126, 32) 0
    _________________________________________________________________
    max_pooling2d_29 (MaxPooling (None, 63, 63, 32) 0
    _________________________________________________________________
    flatten_24 (Flatten) (None, 127008) 0
    _________________________________________________________________
    dense_24 (Dense) (None, 1) 127009
    =================================================================
    Total params: 136,577
    Trainable params: 136,577
    Non-trainable params: 0

    I am in the process of understanding more about it; I will continue with your other tutorial:

    https://machinelearningmastery.com/convolutional-layers-for-deep-learning-neural-networks/

    BR,
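
The effect described above can be checked with a little arithmetic. This is a minimal sketch, assuming 3×3 kernels with no padding and stride 1, and 2×2 pooling, which is consistent with the shapes in the summaries; the function names are made up for illustration. Each 2×2 pooling layer halves both the height and the width, so the flattened vector that feeds the Dense layer, which holds almost all of the parameters, shrinks by about a factor of four.

```python
def conv_out(size, kernel=3):
    """Side length after an unpadded convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Side length after non-overlapping max pooling (odd sizes round down)."""
    return size // pool


def flattened_size(size, layers, filters=32):
    """Walk a square input through 'conv' and 'pool' layers, then flatten it."""
    for layer in layers:
        size = conv_out(size) if layer == "conv" else pool_out(size)
    return size * size * filters


one_pool = flattened_size(256, ["conv", "conv", "pool"])
two_pools = flattened_size(256, ["conv", "conv", "pool", "pool"])

print(one_pool, two_pools)    # inputs to the Dense layer in each model
print(one_pool // two_pools)  # reduction factor from the extra pooling layer
```

The same helper reproduces the flatten size in other summaries in this thread, for example three convolutions followed by three pooling layers.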

  36. Avatar
    Saul Alquicira April 22, 2020 at 5:35 am #

    Lesson 4 .- Image Classification

    Doberman (33.59%)
    dingo (39.17%)
    Mexican_hairless (29.25%)

  37. Avatar
    Saul Alquicira April 24, 2020 at 2:55 am #

    Lesson 5.- Image Classification

    Epoch 1/10
    – 21s – loss: 0.3770 – accuracy: 0.8651
    Epoch 2/10
    – 25s – loss: 0.2539 – accuracy: 0.9082
    Epoch 3/10
    – 25s – loss: 0.2099 – accuracy: 0.9232
    Epoch 4/10
    – 24s – loss: 0.1782 – accuracy: 0.9349
    Epoch 5/10
    – 25s – loss: 0.1501 – accuracy: 0.9453
    Epoch 6/10
    – 28s – loss: 0.1257 – accuracy: 0.9540
    Epoch 7/10
    – 26s – loss: 0.1067 – accuracy: 0.9608
    Epoch 8/10
    – 26s – loss: 0.0908 – accuracy: 0.9667
    Epoch 9/10
    – 26s – loss: 0.0754 – accuracy: 0.9726
    Epoch 10/10
    – 27s – loss: 0.0660 – accuracy: 0.9765
    0.3630220188647509 0.9085000157356262

    I also adapted the code to reuse the model and predict the class

    def run_example():
        # load the image
        img = load_image('sample_image.png')
        # load model
        model = load_model('saul_modelh5')
        # predict the class
        result = model.predict_classes(img)
        print(result[0])

  38. Avatar
    Saul Aluicira April 24, 2020 at 11:48 am #

    Lesson 6.- Image augmentation

    I included zoom_range and shear_range

    datagen = ImageDataGenerator(shear_range=0.15, zoom_range=0.9, horizontal_flip=True, vertical_flip=True, rotation_range=30)
    #datagen = ImageDataGenerator()

  39. Avatar
    D Vaishnavi May 1, 2020 at 3:22 am #

    Covid-19 detection: https://arxiv.org/ftp/arxiv/papers/2003/2003.10849.pdf

    Pulmonary Image Classification: https://ieeexplore.ieee.org/abstract/document/8861312

    Smart Traffic Management: https://ieeexplore.ieee.org/document/8666539

    Image forgery recognition: https://iopscience.iop.org/article/10.1088/1742-6596/1368/3/032028

    Food and drink assessment using image recognition: https://www.mdpi.com/2072-6643/9/7/657

  40. Avatar
    e2e4 May 4, 2020 at 8:49 pm #

    Lesson 1
    In satellite imaging:
    Ship recognition with deep learning technique
    https://appsilon.com/ship-recognition-in-satellite-imagery-part-i/

    Vegetation management
    https://www.20tree.ai

    Forestry control
    https://www.efi.int/sites/default/files/files/events/2018/innovation_workshop3-Liu.pdf

    In medicine
    Skin checks for cancer
    https://www.skinvision.com

    In urban planning and smart cities:
    Deep learning for building occupancy estimation using environmental sensors
    Chen, Z, Jiang, C, Masood, MK, Soh, YC, Wu, M & Li, X 2020, Deep learning for building occupancy estimation using environmental sensors. in W Pedrycz & S-M Chen (eds), Deep learning: algorithms and applications. Studies in Computational Intelligence, vol. 865, pp. 335-357. https://doi.org/10.1007/978-3-030-31760-7_11

  41. Avatar
    e2e4 May 5, 2020 at 9:07 pm #

    Lesson 02
    Data Type: uint8
    Min: 0.000, Max: 255.000
    Min: 0.000, Max: 1.000

  42. Avatar
    e2e4 May 7, 2020 at 7:33 am #

    Lesson 03
    Model: “sequential_1”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d_1 (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    conv2d_2 (Conv2D) (None, 252, 252, 32) 9248
    _________________________________________________________________
    max_pooling2d_1 (MaxPooling2 (None, 126, 126, 32) 0
    _________________________________________________________________
    max_pooling2d_2 (MaxPooling2 (None, 63, 63, 32) 0
    _________________________________________________________________
    flatten_1 (Flatten) (None, 127008) 0
    _________________________________________________________________
    dense_1 (Dense) (None, 1) 127009
    =================================================================
    Total params: 136,577
    Trainable params: 136,577
    Non-trainable params: 0

    But I need to go deeper into understanding the process.

  43. Avatar
    e2e4 May 7, 2020 at 11:47 pm #

    Lesson 04

    Doberman (33.59%)

    English_foxhound (69.26%)

    German_shepherd (99.56%)

    black-and-tan_coonhound (54.60%)

    99.56% for German shepherd is impressive.

    For a picture of a horse the result was
    sorrel (100.00%)
    which is a plant. How would you comment on it?

  44. Avatar
    e2e4 May 9, 2020 at 1:05 am #

    Epoch 1/10
    – 57s – loss: 0.3798 – accuracy: 0.8645
    Epoch 2/10
    – 56s – loss: 0.2550 – accuracy: 0.9067
    Epoch 3/10
    – 56s – loss: 0.2077 – accuracy: 0.9227
    Epoch 4/10
    – 56s – loss: 0.1761 – accuracy: 0.9343
    Epoch 5/10
    – 56s – loss: 0.1467 – accuracy: 0.9466
    Epoch 6/10
    – 56s – loss: 0.1252 – accuracy: 0.9535
    Epoch 7/10
    – 56s – loss: 0.1049 – accuracy: 0.9617
    Epoch 8/10
    – 56s – loss: 0.0898 – accuracy: 0.9670
    Epoch 9/10
    – 55s – loss: 0.0745 – accuracy: 0.9722
    Epoch 10/10
    – 55s – loss: 0.0641 – accuracy: 0.9765
    0.3429171951398253 0.9118000268936157

  45. Avatar
    e2e4 May 9, 2020 at 1:44 am #

    Second run resulted in
    0.32463194568455217 0.9093999862670898
    3rd run
    0.312660645493865 0.9160000085830688

    Do the results of stochastic processes depend on the particular hardware?

    • Avatar
      Jason Brownlee May 9, 2020 at 6:18 am #

      Well done!

      Yes and no – but at the numerical methods level.

      Yes, as in the implementations vary across machines because of differences in underlying libraries and eventually hardware. No as we are running the same general operations and minor rounding differences don’t matter much when averaged out.
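
To illustrate the point about differences averaging out, the three test accuracies reported in this thread can be summarized with a mean and a standard deviation. This is only a sketch of a common way to report a stochastic result; three runs is a small sample, and more repeats would give a more reliable estimate.

```python
from statistics import mean, stdev

# Test accuracies from the three runs reported above in this thread.
accuracies = [0.9118000268936157, 0.9093999862670898, 0.9160000085830688]

avg = mean(accuracies)      # central estimate of model skill
spread = stdev(accuracies)  # sample standard deviation across runs

print("accuracy: %.4f (+/- %.4f) over %d runs" % (avg, spread, len(accuracies)))
```

A difference between two machines that is smaller than this run-to-run spread is hard to distinguish from ordinary randomness in training.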

  46. Avatar
    e2e4 May 10, 2020 at 9:02 pm #

    After saving the trained model and reloading it:
    Epoch 1/10
    – 56s – loss: 0.0732 – accuracy: 0.9729
    Epoch 2/10
    – 55s – loss: 0.0624 – accuracy: 0.9773
    Epoch 3/10
    – 55s – loss: 0.0565 – accuracy: 0.9792
    Epoch 4/10
    – 56s – loss: 0.0485 – accuracy: 0.9816
    Epoch 5/10
    – 55s – loss: 0.0435 – accuracy: 0.9844
    Epoch 6/10
    – 55s – loss: 0.0386 – accuracy: 0.9861
    Epoch 7/10
    – 56s – loss: 0.0357 – accuracy: 0.9877
    Epoch 8/10
    – 57s – loss: 0.0321 – accuracy: 0.9888
    Epoch 9/10
    – 57s – loss: 0.0307 – accuracy: 0.9892
    Epoch 10/10
    – 56s – loss: 0.0269 – accuracy: 0.9903
    0.5345134609982372 0.9111999869346619

    Why didn’t it improve on test data?

    • Avatar
      Jason Brownlee May 11, 2020 at 5:58 am #

      Well done.

      What do you mean exactly?

      • Avatar
        e2e4 May 14, 2020 at 2:08 am #

        1) I ran the exercise and received
        0.3429171951398253 0.9118000268936157
        2) Then I modified the code, ran it again and saved the trained model
        3) Modified the code again – reload model and run 10 times again

        Epoch 1/10
        – 56s – loss: 0.0732 – accuracy: 0.9729

        Epoch 10/10
        – 56s – loss: 0.0269 – accuracy: 0.9903

        This is the result achieved on the train data.

        My question is:
        Why, after evaluating the twice-trained model on the test data, are the loss and accuracy the same as after the first run? I would expect higher accuracy and lower loss.

        Thank you, Jason!

        • Avatar
          Jason Brownlee May 14, 2020 at 5:55 am #

          Well done.

          I don’t understand your question, can you please rephrase it or elaborate?

  47. Avatar
    e2e4 May 10, 2020 at 9:05 pm #

    Lesson 06
    I varied flip and rotation angle. Also included zoom_range and brightness_range
    #datagen = ImageDataGenerator(brightness_range=[0.2,1.0])
    datagen = ImageDataGenerator(zoom_range=[0.9,1.9])

  48. Avatar
    e2e4 May 17, 2020 at 6:59 pm #

    Lesson 07
    I modified it for multiple faces as follows:

    # get the context for drawing boxes
    ax = pyplot.gca()
    i=0
    for i in range(len(faces)):
        # get coordinates from the i face
        x, y, width, height = faces[i]['box']
        # create the shape
        rect = Rectangle((x, y), width, height, fill=False, color='red')
        # draw the box
        ax.add_patch(rect)
        i+=1
    # show the plot
    pyplot.show()

    Thanks a lot for the course!! It’s very motivating to get results under your guidance, Jason!
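
A small observation on the loop above: `range()` already supplies each value of `i`, so the `i=0` and `i+=1` lines have no effect and the loop can iterate over the detections directly. The sketch below shows that pattern in plain Python. The `faces` list is made-up data in the same shape as the results used above (a dict with a `'box'` entry of x, y, width, height), and `box_corners` is a hypothetical helper.

```python
# Made-up detections in the same format as the results used above.
faces = [
    {"box": [10, 20, 50, 60]},
    {"box": [200, 40, 45, 55]},
]


def box_corners(face):
    """Convert an (x, y, width, height) box to top-left and bottom-right corners."""
    x, y, width, height = face["box"]
    return (x, y), (x + width, y + height)


# Iterate over the detections directly; enumerate() supplies the index if needed.
for index, face in enumerate(faces):
    top_left, bottom_right = box_corners(face)
    print(index, top_left, bottom_right)
```

In the plotting code, the body of the loop would create and add the `Rectangle` exactly as before.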

  49. Avatar
    vkr May 18, 2020 at 2:45 pm #

    day4: Image classification

    Default result:
    Doberman (33.59%)

    following are different results with different images given
    Samoyed (98.46%) —when an image of a dog is given

    cocker_spaniel (25.23%)—set of 9 different dogs

    Yorkshire_terrier (10.21%)–2 different dogs

  50. Avatar
    Ronke Babatunde May 23, 2020 at 6:41 am #

    • Five impressive applications of deep learning methods in the field of computer vision

    1. Image Classification
    Classification is the process of predicting a specific class, or label, for something that is defined by a set of data points. Machine learning systems build predictive models that have enormous, yet often unseen benefits for people.

    2. Object Detection
    Object Detection is image classification with localization, but in pictures that may contain multiple objects. This is an active and important area of research because the computer vision systems that will be used in robotics and self-driving vehicles will be subjected to very complex images. Locating and identifying every object will undoubtedly be a critical part of their autonomy.

    3. Image Reconstruction
    Image Reconstruction is the task of recreating the missing or corrupt parts of an image.

    4. Object Tracking
    Object Tracking is the task of keeping track of a specific object in a sequence of images, or a video. Object tracking is important for virtually every computer vision system that processes multiple images. In self-driving cars, for example, pedestrians and other vehicles generally have to be avoided at a very high priority. Tracking objects as they move will not only help to avoid collisions through the use of split-second maneuvers, but the model can also supply relevant information to other systems that will attempt to predict their next move.

    5. Facial Recognition
    Facial recognition is a common feature in today’s smartphones and cameras. Modern facial recognition systems at large enterprises are powered by deep learning networks and algorithms. Facebook’s DeepFace identifies human faces in digital images using a nine-layer neural network. The system has 97 percent accuracy, which is famously better than the FBI’s facial recognition system. Google also developed its own highly accurate facial recognition system named FaceNet.

    An example application can be found in the article titled “Deep Learning for Computer Vision: A Brief Review”. https://doi.org/10.1155/2018/7068349

    • Avatar
      Jason Brownlee May 23, 2020 at 6:42 am #

      Well done!

      • Avatar
        Claudio Lombardi September 25, 2020 at 11:58 am #

        1. Human Pose Estimation
        The following are some of the applications of Human Pose Estimation

        Activity recognition for real-time sports analysis or surveillance system.
        For Augmented reality experiences
        In training Robots
        Animation and gaming

        2. Image Transformation Using GANs:
        Images generated using GANs have many applications. The following are some of them:

        Image to image translation in style transfer and photo inpainting
        Image super-resolution
        Text to image generation
        Image editing
        Semantic image to photo translation

        3. Computer Vision for Developing Social Distancing Tools
        Computer vision technology can play a vital role in this crucial scenario. It can be used to track people in a premise or a particular area to know whether they are following social distancing norms or not.

        4. Creating a 3D Model From 2D Images
        Now you must be thinking about the use cases of this technology. The following are its applications

        Animation and Gaming
        Robotics
        Self-driving cars
        Medical Diagnosis and surgical operations

        5. Computer Vision in Healthcare: Medical Image Analysis
        Recent developments in computer vision technologies allow doctors to understand medical images better by converting them into 3D interactive models, which makes their interpretation easier.

        Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation, by Xin Wang, Qiuyuan Huang, Asli Celikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, Lei Zhang

  51. Avatar
    Ronke Babatunde May 24, 2020 at 7:58 pm #

    I ran the code in Lesson 2. The maximum and minimum pixel values of the Bondi Beach image are 255 and 0 before normalization, and 1 and 0 after normalization. I was also able to display the image in my Python environment.

  52. Avatar
    Ronke Babatunde May 24, 2020 at 9:54 pm #

    Total params: 516,449
    Trainable params: 516,449
    Non-trainable params: 0
    The shape of the image has changed from 256, 256 to 127, 127 as output from the pooling layer
    I varied using one conv layer with 64 filters and maxpooling value 1. I got the output below
    Total params: 4,129,665
    Trainable params: 4,129,665
    Non-trainable params: 0

    I varied using one conv layer with 64 filters and maxpooling value 2, and image size 512 x512, I got the output below
    Total params: 16,267,457
    Trainable params: 16,267,457
    Non-trainable params: 0

    However, I need more explanation on how to interpret the results, please.

  53. Avatar
    Mayank goyal May 26, 2020 at 2:20 pm #

    Model: “sequential_1”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d_1 (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    max_pooling2d_1 (MaxPooling2 (None, 127, 127, 32) 0
    _________________________________________________________________
    flatten_1 (Flatten) (None, 516128) 0
    _________________________________________________________________
    dense_1 (Dense) (None, 1) 516129
    =================================================================
    Total params: 516,449
    Trainable params: 516,449
    Non-trainable params: 0

  54. Avatar
    Mayank goyal May 26, 2020 at 2:21 pm #

    I ran the code in Lesson 2. The maximum and minimum pixel values of the Bondi Beach image are 255 and 0 before normalization, and 1 and 0 after normalization. I was also able to display the image in my Python environment.

  55. Avatar
    Mayank goyal May 26, 2020 at 2:28 pm #

    Model: “sequential_6”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d_10 (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    conv2d_11 (Conv2D) (None, 252, 252, 64) 18496
    _________________________________________________________________
    max_pooling2d_5 (MaxPooling2 (None, 126, 126, 64) 0
    _________________________________________________________________
    conv2d_12 (Conv2D) (None, 124, 124, 128) 73856
    _________________________________________________________________
    max_pooling2d_6 (MaxPooling2 (None, 62, 62, 128) 0
    _________________________________________________________________
    flatten_4 (Flatten) (None, 492032) 0
    _________________________________________________________________
    dense_4 (Dense) (None, 1) 492033
    =================================================================
    Total params: 584,705
    Trainable params: 584,705
    Non-trainable params: 0
    _________________________________________________________________

  56. Avatar
    Mayank goyal May 26, 2020 at 2:59 pm #

    Lesson: 4

    Doberman (33.59%)
    Egyptian_cat (32.42%)
    Great_Dane (47.91%)

  57. Avatar
    Mayank goyal May 27, 2020 at 3:23 pm #

    lesson 6:

    datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)

  58. Avatar
    Ronke Babatunde May 28, 2020 at 8:46 am #

    Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5
    553467904/553467096 [==============================] – 1401s 3us/step

    Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json
    40960/35363 [==================================] – 1s 22us/step
    Doberman (33.59%)

  59. Avatar
    Ronke Babatunde May 29, 2020 at 11:53 pm #

    I got this – Doberman (33.59%) when I ran the code

    I got this – cowboy_hat (10.05%) when I loaded an image of a person

  60. Avatar
    Ronke Babatunde May 30, 2020 at 12:23 am #

    Day 5 task: this is the result I got, though training the CNN took a little longer

    Epoch 1/10
    – 49s – loss: 0.3850 – accuracy: 0.8631
    Epoch 2/10
    – 44s – loss: 0.2564 – accuracy: 0.9057
    Epoch 3/10
    – 43s – loss: 0.2119 – accuracy: 0.9212
    Epoch 4/10
    – 43s – loss: 0.1804 – accuracy: 0.9326
    Epoch 5/10
    – 42s – loss: 0.1546 – accuracy: 0.9432
    Epoch 6/10
    – 42s – loss: 0.1321 – accuracy: 0.9505
    Epoch 7/10
    – 41s – loss: 0.1139 – accuracy: 0.9575
    Epoch 8/10
    – 41s – loss: 0.0991 – accuracy: 0.9635
    Epoch 9/10
    – 41s – loss: 0.0872 – accuracy: 0.9676
    Epoch 10/10
    – 41s – loss: 0.0716 – accuracy: 0.9731
    0.3339003710135818 0.9138000011444092

  61. Avatar
    Ronke Babatunde June 2, 2020 at 2:02 am #

    Day 6 task. This is the result I got

    Could you kindly explain further how to interpret the above result, since we are performing data augmentation? Thanks

  62. Avatar
    Ronke Babatunde June 2, 2020 at 2:26 am #

    Day 7 task. After running the above code,

    is all I got; the detected face was not shown and nothing was displayed. Kindly guide. Thanks

  63. Avatar
    Swati June 12, 2020 at 10:44 pm #

    Day 1 Task: Applications of Deep Learning in the field of computer vision
    1. augmented reality
    2. virtual reality
    3. autonomous vehicle
    4. Navigation System for Visually impaired
    5. Optic Disc from retina images

  64. Avatar
    Priyanshi burad July 15, 2020 at 5:10 am #

    DAY 2 : PREPARING IMAGE DATASET
    Before Normalization
    Min: 0.000, Max: 255.000

    After Normalization
    Min: 0.000, Max: 1.000

  65. Avatar
    Martin July 19, 2020 at 9:48 pm #

    Day 1 – Applications of deep learning methods in the field of computer vision
    1. Stores are presently utilizing facial recognition technology to give a smoother payment experience to customers (at the cost of their security, however). Rather than utilizing credit cards or mobile payment apps, customers just need to show their face to a computer vision-equipped camera.
    2. iPhone X introduced FaceID, a validation framework that utilizes an on-device neural network to open the telephone when it sees its owner’s face. During setup, FaceID trains its AI model on the face of the owner and works modestly under various lighting conditions, facial hair, hair styles, caps, and glasses.
    3. Diabetic Foot Ulcers (DFU) that affect the lower extremities are a major complication of diabetes. Each year, more than 1 million diabetic patients undergo amputation due to failure to recognize DFU and get the proper treatment from clinicians. There is an urgent need to use a CAD system for the detection of DFU. The paper proposes using deep learning methods (EfficientDet Architectures) for the detection of DFU: “Goyal, Manu. (2020). A Refined Deep Learning Architecture for Diabetic Foot Ulcers Detection.”
    4. In deep end-to-end learning based autonomous car design, inferencing the signal by trained model is one of the critical issues, particularly, in case of embedded component. Researchers from both academia and industry have been putting their enormous efforts in making this critical autonomous driving more reliable and safer. As research on the real car is costly and poses safety issue, we have developed a small scale, low-cost, deep convolutional neural network powered self-driving car model. Its learning model adopted from NVIDIA’s DAVE-2 which is a real autonomous car and Kansas University’s small scale DeepPicar. Similar to DAVE-2, its neural architecture uses 5 convolution layer and 3 fully connected layers with 250,000 parameters. We have considered Raspberry Pi 3B+ as the processing platform with Quad-core 1.4 GHz CPU based on A53 architecture which is capable to support CNN learning model. – “Goyal, Manu & Yap, Moi Hoon & Hassanpour, Saeed. (2020). Multi-class Semantic Segmentation of Skin Lesions via Fully Convolutional Networks. 290-295. 10.5220/0009380302900295. ”
    5. Image reconstruction which involves filling in missing portions of an image or correcting corrupted parts of an image. Much like image colorization, image reconstruction can be seen as a filter that is applied to the image.

  66. Avatar
    Martin July 20, 2020 at 12:41 am #

    Day 2 : Image Preparation

    —–Before Normalisation—–
    Data Type: uint8
    Min: 0.000, Max: 255.000

    —–After Normalisation—–
    Data Type: uint8
    Min: 0.000, Max: 1.000

  67. Avatar
    Sukanya G July 25, 2020 at 1:59 am #

    Day 2:Image Preparation
    ————————————

    Data Type: uint8

    Min: 0.000, Max: 255.000
    Min: 0.000, Max: 1.000

  68. Avatar
    Sukanya G July 25, 2020 at 2:01 am #

    Day 3:Creation of CNN
    ———————————
    Using TensorFlow backend.
    Model: “sequential_1”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d_1 (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    max_pooling2d_1 (MaxPooling2 (None, 127, 127, 32) 0
    _________________________________________________________________
    flatten_1 (Flatten) (None, 516128) 0
    _________________________________________________________________
    dense_1 (Dense) (None, 1) 516129
    =================================================================
    Total params: 516,449
    Trainable params: 516,449
    Non-trainable params: 0

  69. Avatar
    Sukanya G July 25, 2020 at 2:04 am #

    Day 4 Image Classification
    ———————————–
    Doberman (33.59%)

  70. Avatar
    Sukanya G July 25, 2020 at 11:50 pm #

    Day 5 Train image Classification model
    _________________________________
    Downloading data from http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
    32768/29515 [=================================] – 0s 3us/step
    Downloading data from http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
    26427392/26421880 [==============================] – 2s 0us/step
    Downloading data from http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
    8192/5148 [===============================================] – 0s 0us/step
    Downloading data from http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
    4423680/4422102 [==============================] – 1s 0us/step
    Epoch 1/10
    – 35s – loss: 0.3756 – accuracy: 0.8656
    Epoch 2/10
    – 34s – loss: 0.2463 – accuracy: 0.9099
    Epoch 3/10
    – 34s – loss: 0.2030 – accuracy: 0.9254
    Epoch 4/10
    – 34s – loss: 0.1680 – accuracy: 0.9382
    Epoch 5/10
    – 34s – loss: 0.1433 – accuracy: 0.9462
    Epoch 6/10
    – 34s – loss: 0.1201 – accuracy: 0.9549
    Epoch 7/10
    – 34s – loss: 0.0998 – accuracy: 0.9630
    Epoch 8/10
    – 34s – loss: 0.0843 – accuracy: 0.9696
    Epoch 9/10
    – 34s – loss: 0.0685 – accuracy: 0.9744
    Epoch 10/10
    – 34s – loss: 0.0589 – accuracy: 0.9778
    0.3513805921599269 0.9124000072479248

  71. Avatar
    goona faramarzi August 13, 2020 at 5:54 pm #

    Hello, thanks for your good explanation. I have two questions.
    First: setting the rotation to 90 in the generator means the image can be rotated by any angle in [-90, 90], but I want to rotate by exactly 90. What should I do?
    Second: if we want to rotate by 90, 360 in the generator, what should I do?

    • Avatar
      Jason Brownlee August 14, 2020 at 5:59 am #

      Good question, perhaps use a custom generator to control the augmentation.
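One way to act on this suggestion, sketched under assumptions: a small custom generator can rotate each image by an exact multiple of 90 degrees with NumPy's `rot90`, instead of drawing a random angle from a range. The function name `exact_rotation_generator` and its parameters are hypothetical (they are not part of Keras), and the sketch assumes square images stored as arrays shaped (height, width, channels).

```python
import numpy as np

def exact_rotation_generator(images, labels, batch_size=2, quarter_turns=(1,), seed=None):
    """Yield (images, labels) batches forever, rotating each image by an
    exact multiple of 90 degrees.

    quarter_turns lists the allowed rotations in units of 90 degrees:
    (1,) always rotates by 90, while (1, 4) picks 90 or 360 at random.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    while True:
        # sample image indices with replacement
        idx = rng.integers(0, n, size=batch_size)
        batch = []
        for i in idx:
            k = int(rng.choice(quarter_turns))
            # np.rot90 rotates counterclockwise in the (height, width) plane
            batch.append(np.rot90(images[i], k=k, axes=(0, 1)))
        yield np.stack(batch), labels[idx]

# made-up data: two 4x4 grayscale images whose labels equal their index
images = np.arange(2 * 4 * 4, dtype='float32').reshape(2, 4, 4, 1)
labels = np.array([0, 1])
gen = exact_rotation_generator(images, labels, batch_size=2, quarter_turns=(1,), seed=0)
batch_x, batch_y = next(gen)
print(batch_x.shape, batch_y.shape)
```

A generator like this could stand in for the augmentation iterator when fitting a model, assuming a Keras version that accepts plain Python generators. Non-square images would change shape under a 90-degree turn, so they would need resizing or padding first.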

  72. Avatar
    Yuhua August 18, 2020 at 9:02 pm #

    5 applications of DL for CV: Image Classification, Object Detection, Image Reconstruction, Object Tracking, Information Retrieval

  73. Avatar
    Ryan September 17, 2020 at 2:22 am #

    Lesson 04: Image Classification

    After running this code –> model = VGG16( ), I am getting the following error:

    ResourceExhaustedError: OOM when allocating tensor with shape[3,3,64,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Mul] name: block2_conv1_5/random_uniform/mul/

    How do I fix this?

  74. Avatar
    Jason September 28, 2020 at 12:28 am #

    I tried to sign up for this free 7-day course, but when I enter my email address and hit “Download Now”, the link does not seem to work. Was it only available for a limited amount of time? Thanks

  75. Avatar
    Moushumi Biswas October 29, 2020 at 5:08 pm #

    Lesson 02

    Data Type: uint8
    Min: 0.000, Max: 255.000
    Min: 0.000, Max: 1.000

  76. Avatar
    khin san myint November 24, 2020 at 8:02 pm #

    lesson 4 Image Classification

    Doberman (33.59%)

    thank you

  77. Avatar
    khin san myint November 24, 2020 at 8:07 pm #

    Day 3 :

    Default Version
    Model: “sequential”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    conv2d_1 (Conv2D) (None, 252, 252, 32) 9248
    _________________________________________________________________
    conv2d_2 (Conv2D) (None, 250, 250, 32) 9248
    _________________________________________________________________
    max_pooling2d (MaxPooling2D) (None, 125, 125, 32) 0
    _________________________________________________________________
    conv2d_3 (Conv2D) (None, 123, 123, 32) 9248
    _________________________________________________________________
    conv2d_4 (Conv2D) (None, 121, 121, 32) 9248
    _________________________________________________________________
    flatten (Flatten) (None, 468512) 0
    _________________________________________________________________
    dense (Dense) (None, 1) 468513
    =================================================================
    Total params: 505,825
    Trainable params: 505,825
    Non-trainable params: 0

  78. Avatar
    khin san myint November 24, 2020 at 8:27 pm #

    lesson 5

    Default Version
    Epoch 1/10
    1875/1875 – 23s – loss: 0.3870 – accuracy: 0.8618
    Epoch 2/10
    1875/1875 – 27s – loss: 0.2574 – accuracy: 0.9064
    Epoch 3/10
    1875/1875 – 25s – loss: 0.2147 – accuracy: 0.9211
    Epoch 4/10
    1875/1875 – 28s – loss: 0.1842 – accuracy: 0.9325
    Epoch 5/10
    1875/1875 – 28s – loss: 0.1605 – accuracy: 0.9408
    Epoch 6/10
    1875/1875 – 29s – loss: 0.1381 – accuracy: 0.9488
    Epoch 7/10
    1875/1875 – 21s – loss: 0.1194 – accuracy: 0.9572
    Epoch 8/10
    1875/1875 – 28s – loss: 0.1019 – accuracy: 0.9616
    Epoch 9/10
    1875/1875 – 28s – loss: 0.0889 – accuracy: 0.9676
    Epoch 10/10
    1875/1875 – 29s – loss: 0.0773 – accuracy: 0.9717
    0.3053729832172394 0.9150000214576721

  79. Avatar
    khin san myint November 26, 2020 at 6:26 pm #

    Image augmentation

    Testing with additional data

    ImageDataGenerator(horizontal_flip=True, vertical_flip=True, rotation_range=45, fill_mode=’nearest’, rescale=1.5)

    ImageDataGenerator(horizontal_flip=True, vertical_flip=True, rotation_range=45, fill_mode=’nearest’, rescale=0.5)

  80. Avatar
    khin san myint November 27, 2020 at 3:17 pm #

    I have finished face detection.
    I have tested it with an image containing many faces.
    In this image, only one person is detected. What about the other faces?

  81. Avatar
    Azerul Azlan December 5, 2020 at 4:34 pm #

    Lesson1
    5 impressive applications of deep learning methods:
    1) Self-driving : Companies building these types of driver-assistance services, as well as full-blown self-driving cars like Google’s, need to teach a computer how to take over key parts (or all) of driving using digital sensor systems instead of a human’s senses. To do that companies generally start out by training algorithms using a large amount of data.

    2) Voice Search & Voice-Activated Assistants: One of the most popular application areas of deep learning is voice search and voice-activated intelligent assistants. With the big tech giants having already made significant investments in this area, voice-activated assistants can be found on nearly every smartphone. Apple’s Siri has been on the market since October 2011. Google Now, the voice-activated assistant for Android, was launched less than a year after Siri. The newest of the voice-activated intelligent assistants is Microsoft Cortana.

    3) Automatic Machine Translation: Automatic machine translation has been around for a long time, but deep learning is achieving top results in two specific areas:
    -Automatic Translation of Text
    -Automatic Translation of Images
    Text translation can be performed without any pre-processing of the sequence, allowing the algorithm to learn the dependencies between words and their mapping to a new language.

    4) Image Recognition: It aims to recognize and identify people and objects in images as well as to understand the content and context. Image recognition is already being used in several sectors like gaming, social media, retail, tourism, etc.
    This task requires the classification of objects within a photograph as one of a set of previously known objects. A more complex variation of this task called object detection involves specifically identifying one or more objects within the scene of the photograph and drawing a box around them.

    5) Automatic Image Caption Generation: Automatic image captioning is the task where given an image the system must generate a caption that describes the contents of the image.

  82. Avatar
    Azerul Azlan December 8, 2020 at 12:17 am #

    Lesson 2: Preparing Image Data

    The result for the image given
    Data Type: uint8
    Min: 0.000, Max: 255.000
    Min: 0.000, Max: 1.000

  83. Avatar
    Gautam Pradhan December 8, 2020 at 1:58 am #

    Lesson 1.

    1) Corn plant counting using deep learning and UAV images. DOI: 10.1109/LGRS.2019.2930549
    2) Deep Learning to count coconut plants.
    3) To count number of semi, car, minibus passing through an intersection.
    4) Detecting forehead temperature of moving people through airport checking.
    5) Detection and analysis of wheat spikes using convolutional neural networks. DOI
    https://doi.org/10.1186/s13007-018-0366-8

  84. Avatar
    Gautam December 8, 2020 at 2:17 am #

    Min and Max Values:

    Before Normalization: Min: 0.000, Max: 255.000

    After Normalization: Min: 0.000, Max: 1.000

  85. Avatar
    Azerul Azlan December 9, 2020 at 2:56 pm #

    Lesson 3: CNN
    Model: “sequential”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    max_pooling2d (MaxPooling2D) (None, 127, 127, 32) 0
    _________________________________________________________________
    flatten (Flatten) (None, 516128) 0
    _________________________________________________________________
    dense (Dense) (None, 1) 516129
    =================================================================
    Total params: 516,449
    Trainable params: 516,449
    Non-trainable params: 0
    _________________________________________________________________

  86. Avatar
    Azerul Azlan December 9, 2020 at 3:06 pm #

    Lesson 4: Image Classifier

    The result is: Doberman (33.59%)

    I tried a cat image with the code given and the result is: tiger_cat (30.62%)

  87. Avatar
    Moussa ABOUBAKAR December 29, 2020 at 12:07 am #

    Lesson 01: Deep Learning and Computer Vision

    List of five applications of deep learning methods in the field of computer vision.

    1- 3D Object Retrieval and Recognition (https://dl.acm.org/doi/pdf/10.1145/3042064?casa_token=dYS5kQ5Q4gQAAAAA:R-_J0uUVm7oPLZY6kp9nV-8LXcI0gkR3HaXVSFzrVLl–CBG1_Rdwvs_HGgIuY5FPlXAX7kSaRw)
    2- Plant Phenotyping with Limited Labeled Data (https://arxiv.org/pdf/2006.11391.pdf)

    3- Real time object detection (https://paperswithcode.com/paper/faster-r-cnn-towards-real-time-object)

    4- Image reconstruction (https://machinelearningmastery.com/applications-of-deep-learning-for-computer-vision/)

    5- Pedestrian detection (https://www.sciencedirect.com/science/article/pii/S092523121830290X?casa_token=X00I0OMZ898AAAAA:xPMD-oIEIJdO4lwkkaoGzobjLjH73x9KZd8D498ej-x9oNSfMZ8Qaqp8djBFmIVqyu3soqKf#sec0012)

  88. Avatar
    Moussa ABOUBAKAR December 29, 2020 at 3:32 am #

    Lesson 04: Image Classification

    I got this result after running the example: Doberman (33.59%)

    I tried the example with an image of a car and I get this result: minibus (14.77%)

  89. Avatar
    Mitchell December 31, 2020 at 2:51 am #

    Day4:

    run VGG16 model, it shows the dog is a doberman.

    lable [[(‘n02107142’, ‘Doberman’, 0.3359479), (‘n02105412’, ‘kelpie’, 0.21615942), (‘n02106550’, ‘Rottweiler’, 0.1769872), (‘n02089078’, ‘black-and-tan_coonhound’, 0.12776804), (‘n02107312’, ‘miniature_pinscher’, 0.03730356)]]
    Doberman (33.59%)

    When I changed to VGG19, the classification predicts it is a Kelpie

    lable [[(‘n02105412’, ‘kelpie’, 0.35011458), (‘n02107142’, ‘Doberman’, 0.2983739), (‘n02106550’, ‘Rottweiler’, 0.22378054), (‘n02089078’, ‘black-and-tan_coonhound’, 0.04829501), (‘n02099712’, ‘Labrador_retriever’, 0.008893081)]]

    kelpie (35.01%)

    In both models, Kelpie and Doberman top the probability lists. Are they using the same training data? Why is the output so different?

    • Avatar
      Jason Brownlee December 31, 2020 at 5:30 am #

      Nice work!

      Specifically, no idea. Generally, different models have different capabilities.

  90. Avatar
    Mitchell January 1, 2021 at 5:24 am #

    Thank you for the class!

    Day7. I modified the code to support face detection on multiple faces. The change is to add a ‘for loop’ for each face.

    Suppose you save the code as mtCNN.py, you can run

    Command line> python mtcnn.py picture.jpg

    There should be 11 faces in this picture

    https://www.google.com/search?q=people+on+street&sxsrf=ALeKk00zWelGEzoVT7Mtt3GORoIZcLke-w:1609437597048&tbm=isch&source=iu&ictx=1&fir=oD9-FR9LLlTuNM%252CM3i4Lf8ga0sePM%252C_&vet=1&usg=AI4_-kQlCK-9MOCohapSoZ_XMOAogDvHbQ&sa=X&ved=2ahUKEwj184mi5vjtAhURWqwKHTuRCEQQ9QF6BAgFEAE&biw=1212&bih=569#imgrc=oD9-FR9LLlTuNM
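The "for loop over each face" idea can be sketched as follows. This is a minimal illustration, not the commenter's actual script: it assumes the detector returns a list of dictionaries with a `box` entry of `[x, y, width, height]` and a `confidence` entry, and it uses made-up detections so it runs without MTCNN installed. The helper name `face_boxes` and the `min_confidence` parameter are hypothetical.

```python
def face_boxes(detections, min_confidence=0.0):
    """Return an (x, y, width, height) tuple for every detection,
    not just the first one in the list."""
    boxes = []
    for face in detections:
        # skip weak detections if a threshold is given
        if face.get('confidence', 1.0) >= min_confidence:
            x, y, width, height = face['box']
            boxes.append((x, y, width, height))
    return boxes

# made-up detections in the assumed shape of the detector output
sample = [
    {'box': [10, 20, 50, 60], 'confidence': 0.99},
    {'box': [120, 25, 48, 58], 'confidence': 0.97},
    {'box': [200, 30, 40, 52], 'confidence': 0.65},
]

for x, y, width, height in face_boxes(sample, min_confidence=0.9):
    print(x, y, width, height)
```

In the plotting code, each tuple would be turned into its own rectangle and added to the axes inside the loop. One caveat worth checking: a script saved as mtcnn.py in the working directory may shadow the installed mtcnn package when it is imported, so a different file name is probably safer.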

  91. Avatar
    Tarun January 4, 2021 at 5:45 pm #

    Lesson 01: Five impressive computer vision apps from my perspective are:
    1. Detecting disease in human, plants, animals etc.
    2. Face detection and recognition
    3. Face Landmarks detection
    4. Pose detection
    5. Wardrobe selection using AI/AR/VR
    6. Face Mask detection
    7. Social distancing calculation
    …and many more.

  92. Avatar
    Viraj Mehta January 6, 2021 at 9:46 pm #

    5 applications of deep learning in the field of computer vision are as follows:

    1) Image Classification
    2) Object Detection
    3) Image Reconstruction
    4) Image Classification with Localization
    5) Style Transfer

  93. Avatar
    Viraj Mehta January 7, 2021 at 5:10 pm #

    Findings of task 2:

    Min : 0.000
    Max: 1.000

  94. Avatar
    Samira January 8, 2021 at 8:06 pm #

    Lesson 01: Paper for impressive applications of deep learning methods for Computer vision

    https://www.hindawi.com/journals/cin/2018/7068349/

  95. Avatar
    Sudarshan January 19, 2021 at 3:42 pm #

    lesson 1:
    1.Fruit Quality Evaluation using Machine Learning
    2.Faulty PCB detector using Machine Learning

  96. Avatar
    Deepa January 21, 2021 at 4:42 am #

    Agriculture -Potato classification
    Remote sensing – Soil classification
    Paint Quality- Car assembly ,detection of paint issues
    Natural disaster recovery – flood risk assessment
    Sports, cricket – Umpire decision review system

  97. Avatar
    Deepa January 21, 2021 at 5:20 pm #

    Lesson2: Preparing Image data
    Before Normalization
    min – 0.000 , max – 255.000
    After Normalization
    min-0.000 max -1.000

  98. Avatar
    Deepa January 23, 2021 at 2:58 pm #

    Lesson3:Convolutional Neural Network

    used tensorflow.keras to import method
    Model: “sequential_1”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    max_pooling2d (MaxPooling2D) (None, 127, 127, 32) 0
    _________________________________________________________________
    flatten (Flatten) (None, 516128) 0
    _________________________________________________________________
    dense (Dense) (None, 1) 516129
    =================================================================
    Total params: 516,449
    Trainable params: 516,449
    Non-trainable params: 0

  99. Avatar
    Sudarshan Bhagwan Bhalkar January 25, 2021 at 3:39 pm #

    Lesson4:

    output — Doberman (33.59%)

  100. Avatar
    sudarshan January 25, 2021 at 4:24 pm #

    I am done with all tasks ,I really enjoyed this course.
    thanks Jason Brownlee for this amazing experience.

  101. Avatar
    Karthi January 26, 2021 at 12:21 am #

    %s uint8
    819840
    Before Normalization
    Min :0.000 Max:255.000 Mean :155.53 Standard Deviation 51.79
    After Normalization
    Min :0.000 Max:1.000 Mean :0.61 Standard Deviation 0.20

  102. Avatar
    Deepa January 28, 2021 at 4:28 am #

    Lesson 4 : Image Classification

    Due to a proxy issue, I downloaded imagenet_class_index.json and saved it locally.

    I want to use this file with decode_predictions.

    How can I pass this local JSON file to decode_predictions?

    • Avatar
      Jason Brownlee January 28, 2021 at 6:08 am #

      Sorry, I don’t know about the json file.

      • Avatar
        Deepa January 29, 2021 at 1:00 am #

        It is fine. I downloaded the json file and placed in folder ~/keras/model and decode_predictions worked

  103. Avatar
    Deepa January 29, 2021 at 12:58 am #

    Lesson 4: Image Classification
    Doberman (33.59%)

    • Avatar
      Deepa January 29, 2021 at 1:02 am #

      Please guide to interpret the output of image classification

      • Avatar
        Jason Brownlee January 29, 2021 at 6:06 am #

        What do you mean exactly? Can you please elaborate?

    • Avatar
      Jason Brownlee January 29, 2021 at 6:06 am #

      Well done.

  104. Avatar
    Deepa February 1, 2021 at 11:34 pm #

    Lesson 5: Train Image Classification Model
    Train on 60000 samples
    Epoch 1/10
    60000/60000 – 68s – loss: 0.3970 – accuracy: 0.8601
    Epoch 2/10
    60000/60000 – 73s – loss: 0.2660 – accuracy: 0.9028
    Epoch 3/10
    60000/60000 – 71s – loss: 0.2177 – accuracy: 0.9199
    Epoch 4/10
    60000/60000 – 70s – loss: 0.1869 – accuracy: 0.9301
    Epoch 5/10
    60000/60000 – 80s – loss: 0.1605 – accuracy: 0.9406
    Epoch 6/10
    60000/60000 – 80s – loss: 0.1376 – accuracy: 0.9493
    Epoch 7/10
    60000/60000 – 108s – loss: 0.1179 – accuracy: 0.9562
    Epoch 8/10
    60000/60000 – 88s – loss: 0.0995 – accuracy: 0.9631
    Epoch 9/10
    60000/60000 – 89s – loss: 0.0855 – accuracy: 0.9688
    Epoch 10/10
    60000/60000 – 72s – loss: 0.0726 – accuracy: 0.9737

    LOSS ACCURACY
    0.33886396311819555 0.9118

    Challenge & Learning
    ——————————–
    Downloading data from googleapi was restricted due to environment settings.

    It was fixed by adding the following lines to the file ‘__init__.py’, before the data is loaded.

    You can find this file in the folder ~keras/Datasets/fashion_mnist/

    XXXX – depends upon the user environment

    import os
    os.environ['NO_PROXY'] = 'http://XXXX'
    os.environ['PROXY'] = 'http://XXXX'
    os.environ['HTTPS_PROXY'] = 'http://XXXX'
    os.environ['ALL_PROXY'] = 'http://XXXX'
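An alternative that avoids editing files inside the installed package, sketched under assumptions: the proxy variables can be set at the top of your own script, before the dataset is loaded. The proxy address below is a placeholder, and the sketch assumes the download goes through Python's urllib, which picks up environment variables whose names end in `_proxy`.

```python
import os
import urllib.request

# hypothetical proxy address; replace with the value for your network
PROXY_URL = 'http://proxy.example.com:8080'

for name in ('HTTP_PROXY', 'HTTPS_PROXY'):
    # set both spellings, since tools differ in which one they read
    os.environ[name] = PROXY_URL
    os.environ[name.lower()] = PROXY_URL

# ask urllib which proxies it now sees in the environment
proxies = urllib.request.getproxies_environment()
print(proxies.get('http'), proxies.get('https'))
```

If the printed values match the proxy address, any later download made through urllib in the same process should be routed through it.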

  105. Avatar
    deepa February 6, 2021 at 7:27 pm #

    Lesson 6: Image augmentation

    Observation: the object in the image remains the same.
    The original image is rotated and shifted in different directions. Rotation is by 90 degrees.
    It looks as if the photographer captured the image from different angles.

  106. Avatar
    Kuldeep March 9, 2021 at 1:45 am #

    Some applications of deep learning methods in the field of computer vision:
    1. Image classification
    2. Facial recognition applications
    3. Item and logistic classification
    4. Computer Vision in Healthcare: Medical Image Analysis
    5. Creating a 3D Model From 2D Images
    6. Computer Vision for Developing Social Distancing Tools

  107. Avatar
    Kuldeep March 9, 2021 at 2:10 am #

    Day 2: Preparing Image Data:
    Result:

    (base) C:\Users\226399\Kerasprojects>python imagedata.py
    Data Type: uint8
    Pixel range before Normalization
    Min: 0.000, Max: 255.000
    Pixel range after Normalization
    Min: 0.000, Max: 1.000

  108. Avatar
    Kuldeep March 9, 2021 at 3:17 am #

    Day 2: Preparing Image Data:
    updated the example to standardize the pixel values. Result as follows

    (base) C:\Users\226399\Kerasprojects>python imagedata.py
    Data Type: uint8
    Pixel range before Normalization
    Min: 0.000, Max: 255.000
    Pixel range after Normalization
    Min: 0.000, Max: 1.000
    Pixel mean is 155.55 and Pixel Std dev is 51.437077
    Standardized Pixel mean is 0.061361298 and Standardized Pixel Std dev is 1.0245248

    I did not get mean=0 and std dev =1 after standardization.

    $$$$$$$$$$$$$$$ My code is as follows: $$$$$$$$$$$$$$$$$$$

    # example of pixel normalization
    from numpy import asarray
    from PIL import Image
    # load image
    image = Image.open('bondi_beach.jpg')
    pixels = asarray(image)
    # confirm pixel range is 0-255
    print('Data Type: %s' % pixels.dtype)
    print("Pixel range before Normalization")
    print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))
    # convert from integers to floats
    pixels = pixels.astype('float32')
    # normalize to the range 0-1
    npixels = pixels / 255.0
    # confirm the normalization
    print("Pixel range after Normalization")
    print('Min: %.3f, Max: %.3f' % (npixels.min(), npixels.max()))
    # standardize the pixels
    # calculate dataset mean and std
    pixel_mean = pixels.mean()
    pixel_std = pixels.std()
    print("Pixel mean is %s and Pixel Stdev is %s" % (pixel_mean, pixel_std))
    # calculate Z score
    for i in pixels:
        spixel = (i - pixel_mean) / pixel_std
    print("Standardized Pixel mean is %s and Standardized Pixel Std dev is %s" % (spixel.mean(), spixel.std()))
    $$$$$$$$$$$$$$$$$$$$$$$$ CODE ENDS $$$$$$$$$$$$$$$$$$$$$$$$$$
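A likely cause, offered as a reading of the code as posted: the loop overwrites `spixel` on every pass, so the final print only sees the last row of the image, scaled with the whole-image mean and standard deviation. The sketch below reproduces that effect on a small made-up array (not the bondi_beach.jpg image) and compares it with standardizing the whole array in one expression.

```python
import numpy as np

# made-up "image": 4 rows, 5 columns, 3 channels
pixels = np.arange(60, dtype='float64').reshape(4, 5, 3)
pixel_mean = pixels.mean()
pixel_std = pixels.std()

# loop version: each pass replaces the previous result,
# so only the last row survives
for row in pixels:
    last_row_only = (row - pixel_mean) / pixel_std

# vectorized version: standardizes every pixel at once
standardized = (pixels - pixel_mean) / pixel_std

print('last row only: mean=%.3f std=%.3f' % (last_row_only.mean(), last_row_only.std()))
print('whole array:   mean=%.3f std=%.3f' % (standardized.mean(), standardized.std()))
```

Only the whole-array version is expected to give a mean near 0 and a standard deviation near 1, because those statistics were computed over the whole array.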

  109. Avatar
    dingowhiz March 12, 2021 at 7:44 pm #

    Lesson 4: Image Classification
    I tried an image of a kelpie wearing goggles and it gave me ‘llama (85.72%)’

  110. Avatar
    dingowhiz March 12, 2021 at 7:49 pm #

    Day 5: Train Image Classification Model
    >print(loss, acc)

    0.3586573600769043 0.907800018787384

  111. Avatar
    Nafy Aidara March 23, 2021 at 7:10 am #

    Five applications of deep learning in Computer vision:
    1. Image classification/ recognition
    2. Object Detection
    3. Image reconstruction
    4. Object Segmentation
    5. Image colorization
    One research paper that illustrates this is “Deep Residual Learning for Image Recognition” by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun.

  112. Avatar
    Nafy Aidara March 23, 2021 at 8:30 am #

    Lesson 2: Preparing image data
    1. The first thing I did was display the properties of the loaded image, and I obtained the following:
    JPEG
    RGB
    (640, 427)
    2.confirming the image pixels
    Data Type: uint8
    Min: 0.000, Max: 255.000
    3. Normalize the data
    Data Type: uint8
    Min: 0.000, Max: 255.000
    Min: 0.000, Max: 1.000
    4. Global Standardize
    Mean: 155.544, Standard Deviation: 51.411
    Mean: 0.539, Standard Deviation: 0.377
    Min: 0.000, Max: 1.000
    5. Global Centering
    Mean: 155.544
    Min: 0.000, Max: 255.000
    Mean: -0.000
    Min: -155.544, Max: 99.456

  113. Avatar
    KC PARK March 25, 2021 at 3:40 pm #

    Lesson 06: Image Augmentation

    Image data augmentation is a useful technique.
    I made a program that shows the result in the browser using the Streamlit framework.

    Streamlit is available at the link below.
    https://streamlit.io/

    ========================================================================

  114. Avatar
    Nafy Aidara March 30, 2021 at 10:28 am #

    Lesson 7: Face Detection
    After running the code, I obtained the picture with a red rectangle on the face.
    I also ran it on another picture and the faces were detected.
    Thanks

  115. Avatar
    Rizu Hoshin April 21, 2021 at 12:22 pm #

    Data Type: uint8

    Before
    Min: 0.000, Max: 255.000

    After
    Min: 0.000, Max: 1.000

  116. Avatar
    Rizu Hoshin April 21, 2021 at 2:48 pm #

    lesson 03 – CNN

    Model: “sequential”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    max_pooling2d (MaxPooling2D) (None, 127, 127, 32) 0
    _________________________________________________________________
    flatten (Flatten) (None, 516128) 0
    _________________________________________________________________
    dense (Dense) (None, 1) 516129
    =================================================================
    Total params: 516,449
    Trainable params: 516,449
    Non-trainable params: 0
    _________________________________________________________________

    Can you give me tips for building a CNN architecture? I’m confused because I don’t know the right combination of filters and pooling for the image.

  117. Avatar
    Konstantin May 31, 2021 at 6:44 pm #

    1-5. Medicine application (a lot of research papers 🙂 )
    Classification, segmentation, diagnostic, 3D analysis, image restoration etc

  118. Avatar
    Konstantin June 2, 2021 at 7:02 am #

    At last :)))))

    Model: “sequential”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    max_pooling2d (MaxPooling2D) (None, 127, 127, 32) 0
    _________________________________________________________________
    flatten (Flatten) (None, 516128) 0
    _________________________________________________________________
    dense (Dense) (None, 1) 516129
    =================================================================
    Total params: 516,449
    Trainable params: 516,449
    Non-trainable params: 0
    _________________________________________________________________

    Process finished with exit code 0

  119. Avatar
    Keerthesh Reddy June 3, 2021 at 8:01 pm #

    Lesson 1: Applications of DL in CV

    Object detection
    Object localization
    Object segmentation
    Pose Estimation
    Object/Body measurements

  120. Avatar
    Keerthesh Reddy June 4, 2021 at 12:44 am #

    Lesson 2: Pixel Normalization and standardization

    Data Type: uint8
    Min: 0.000, Max: 255.000
    Min: 0.000000, Max:1.000000
    standardization
    Mean: 155.544, Standard Deviation: 51.411
    Mean: -0.000, Standard Deviation: 1.000

  121. Avatar
    Keerthesh Reddy June 5, 2021 at 12:41 am #

    Lesson 3: Convolutional Neural Network

    Model: “sequential_7”
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    conv2d_7 (Conv2D) (None, 254, 254, 32) 320
    _________________________________________________________________
    max_pooling2d_7 (MaxPooling2 (None, 127, 127, 32) 0
    _________________________________________________________________
    flatten_7 (Flatten) (None, 516128) 0
    _________________________________________________________________
    dense_7 (Dense) (None, 1) 516129
    =================================================================
    Total params: 516,449
    Trainable params: 516,449
    Non-trainable params: 0
    _________________________________________________________________

  122. Avatar
    Konstantin June 5, 2021 at 1:07 am #

    Nice birds! Thank you ))))))))))))))))
    I can not send you the photos

  123. Avatar
    Konstantin June 8, 2021 at 6:22 am #

    from matplotlib import pyplot
    from matplotlib.patches import Rectangle
    from mtcnn.mtcnn import MTCNN
    # load image from file
    pixels = pyplot.imread('two.jpg')
    # create the detector, using default weights
    detector = MTCNN()
    # detect faces in the image
    faces = detector.detect_faces(pixels)
    print(faces)
    # plot the image
    pyplot.imshow(pixels)
    # get the context for drawing boxes
    ax = pyplot.gca()
    # get coordinates from the first face
    x, y, width, height = faces[0]['box']
    # create the shape
    rect = Rectangle((x, y), width, height, fill=False, color='red')
    # draw the box
    ax.add_patch(rect)
    # get coordinates from the second face
    x, y, width, height = faces[1]['box']
    # create the shape
    rect = Rectangle((x, y), width, height, fill=False, color='blue')
    # draw the box
    ax.add_patch(rect)

    # show the plot
    pyplot.show()

    Thank you!!!!!!1

  124. Avatar
    Pingpony June 10, 2021 at 4:58 pm #

    task lesson 1

    Automatic Screening of Diabetic Retinopathy Images with Convolution Neural Network Based on Caffe Framework

    https://dl.acm.org/doi/abs/10.1145/3107514.3107523?casa_token=Q6Ulyrz5JVAAAAAA%3AD6FWQWcBsGr7-VPdZzz3X5Lq4HohPII2FdTqtyh5qGyQwFbMc0n7Ukb9njD8iifjHyKSL1_ZH-7bmQ

    Deep Convolution Neural Network for Malignancy Detection and Classification in Microscopic Uterine Cervix Cell Images
    ‘https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7062987/’

    A Full Stage Data Augmentation Method in Deep Convolutional Neural Network for Natural Image Classification
    ‘https://www.hindawi.com/journals/ddns/2020/4706576/’

    A multi-scale recurrent fully convolution neural network for laryngeal leukoplakia segmentation
    ‘https://www.sciencedirect.com/science/article/pii/S1746809420300690?casa_token=E0pQhtiK7cIAAAAA:OxVgfpuxshh3QYiqxWKowka2KfIxW5U0oovOuurlVc3WiT7v2v4dzlTcwilfFiYv4Ba2ctqX1OI’

    A dataset of laryngeal endoscopic images with comparative study on convolution neural network-based semantic segmentation
    ‘https://link.springer.com/article/10.1007/s11548-018-01910-0’

  125. Avatar
    Anand June 15, 2021 at 5:25 pm #

    5 interesting applications of Deep Learning for Computer Vision:

    1) Human Pose estimation: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42237.pdf

    2) Social Distancing tools: https://arxiv.org/abs/1703.06870

    3) Digitizing images by reading the text and recognizing objects (OCR for text and CNNs for objects): https://storage.googleapis.com/pub-tools-public-publication-data/pdf/33418.pdf

    4) Computer Vision for autonomous vehicles: https://www.nowpublishers.com/article/Details/CGV-079

    5) Computer Vision for Metrology:
    https://www.ipf.kit.edu/english/1577.php

  126. Avatar
    Sam Arumugam August 25, 2021 at 3:55 pm #

    Day 3: Convolutional Neural Networks
    Creates a convolutional neural network that expects grayscale images with the square size of 256×256 pixels, with one convolutional layer with 32 filters, each with the size of 3×3 pixels, a max-pooling layer, and a binary classification output layer.

    Program output:

    Model: "sequential"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    conv2d (Conv2D)              (None, 254, 254, 32)      320
    _________________________________________________________________
    max_pooling2d (MaxPooling2D) (None, 127, 127, 32)      0
    _________________________________________________________________
    flatten (Flatten)            (None, 516128)            0
    _________________________________________________________________
    dense (Dense)                (None, 1)                 516129
    =================================================================
    Total params: 516,449
    Trainable params: 516,449
    Non-trainable params: 0
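
The numbers in this summary can be checked by hand. The sketch below reproduces them in plain Python. It assumes the Keras defaults of 'valid' padding, a stride of 1 for the convolution, and a 2×2 max-pooling window. The variable names are illustrative only.

```python
# Assumed input: 256x256 grayscale image (1 channel)
height = width = 256
channels = 1

# Assumed layer settings: 32 filters of 3x3, 'valid' padding, stride 1
filters = 32
kernel = 3

# Each filter has kernel*kernel*channels weights plus one bias
conv_params = (kernel * kernel * channels + 1) * filters

# 'valid' padding trims (kernel - 1) pixels from each dimension
conv_size = height - kernel + 1

# 2x2 max pooling halves each dimension (rounding down)
pool_size = conv_size // 2

# Flatten: every pooled value from every filter becomes one feature
flat_features = pool_size * pool_size * filters

# Dense output layer with a single unit: one weight per feature plus one bias
dense_params = flat_features * 1 + 1

total_params = conv_params + dense_params

print(conv_params, conv_size, pool_size, flat_features, dense_params, total_params)
```

Almost all of the parameters sit in the final dense layer, because the flattened feature map is so large.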

    • Adrian Tam
      Adrian Tam August 27, 2021 at 4:56 am #

      Good work!

  127. Avatar
    khushboo August 25, 2021 at 9:44 pm #

    Day-2

    2.3650445e-10
    Min: 0.000, Max: 0.000

  128. Avatar
    Sam Arumugam August 26, 2021 at 6:35 pm #

    Day 4: Image Classification

    I got the output below on the first run only. The second run printed nothing.

    Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
    58892288/58889256 [==============================] – 6s 0us/step
    58900480/58889256 [==============================] – 6s 0us/step

    Note: I did not get a percentage.

  129. Avatar
    Sam Arumugam August 31, 2021 at 6:08 pm #

    Day 4: Image Classification

    Doberman (30.99%)

  130. Avatar
    Rik Aulbers October 25, 2021 at 10:48 pm #

    Lesson 01: Deep Learning and Computer Vision
    ================================

    1. Image classification
    2. Object detection
    3. Object segmentation
    4. Image colorization
    5. Image reconstruction

  131. Avatar
    Rik Aulbers October 25, 2021 at 11:14 pm #

    Lesson 02: Preparing Image Data
    ================================

    Before Normalization:
    Data Type: uint8
    Min: 0.000, Max: 255.000

    After Normalization:
    Data Type: float32
    Min: 0.000, Max: 1.000

    • Adrian Tam
      Adrian Tam October 27, 2021 at 2:20 am #

      Good work!

  132. Avatar
    Kathrin Fl October 29, 2021 at 11:14 pm #

    Lesson 3:

    2021-10-29 14:02:42.961078: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library ‘cudart64_110.dll’; dlerror: cudart64_110.dll not found
    2021-10-29 14:02:42.961531: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

    2021-10-29 14:02:46.000958: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library ‘nvcuda.dll’; dlerror: nvcuda.dll not found
    2021-10-29 14:02:46.001416: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
    2021-10-29 14:02:46.008546: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: DESKTOP-EV70MUJ
    2021-10-29 14:02:46.009171: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-EV70MUJ
    2021-10-29 14:02:46.010344: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

    Model: "sequential"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    conv2d (Conv2D)              (None, 254, 254, 32)      320
    _________________________________________________________________
    max_pooling2d (MaxPooling2D) (None, 127, 127, 32)      0
    _________________________________________________________________
    flatten (Flatten)            (None, 516128)            0
    _________________________________________________________________
    dense (Dense)                (None, 1)                 516129
    =================================================================
    Total params: 516,449
    Trainable params: 516,449
    Non-trainable params: 0
    _________________________________________________________________

    Hi, what do I do with the warnings?

    Thanks!

    • Adrian Tam
      Adrian Tam October 30, 2021 at 12:40 pm #

      You don’t need to worry about them. They only say that TensorFlow could not find the CUDA libraries, so it will run on the CPU instead of a GPU.
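
If the log messages are distracting, one common approach is to raise TensorFlow’s C++ log threshold before the library is imported. The sketch below is an assumption about what suits this setup, not a required step. It only hides the messages and does not add GPU support.

```python
import os

# Must be set before TensorFlow is imported.
# "0" shows everything, "1" hides INFO, "2" hides INFO and WARNING,
# "3" also hides ERROR messages.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

try:
    import tensorflow as tf
    print("TensorFlow version:", tf.__version__)
except ImportError:
    # Keeps the sketch runnable on machines without TensorFlow
    print("TensorFlow is not installed")
```

The model summary itself is unaffected, since it is printed by Python rather than by the C++ logger.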

  133. Avatar
    Serenina November 11, 2021 at 1:23 am #

    Lesson 01:

    – Understanding cartoon emotion (DOI: 10.1007/s00521-021-06003-9)

    – Segmentation of plant species and communities (DOI: 10.1038/s41598-019-53797-9)

    – Postnatal gestational age estimation of newborns (DOI: 10.1016/j.imavis.2018.09.003)

    – RootNav 2.0: Navigation of complex plant root architectures (DOI: 10.1093/gigascience/giz123)

    – Transfer of Learning from Vision to Touch (DOI: 10.3390/s21010113)

    • Adrian Tam
      Adrian Tam November 14, 2021 at 12:26 pm #

      That’s a great list!

  134. Avatar
    Jose Luis Ortiz Volcan December 2, 2021 at 4:27 am #

    Lesson 1

    1. Diagnosis of oil wells undergoing artificial lift operations
    2. Identification of Bottlenecks in supply chains of particular industry processes
    3. Early identification of gas or oil leaks in oil and gas fields from analysis of images captured with drones
    4. Land subsidence monitoring and evaluation in areas undergoing mining operations using satellite images
    5. Early risk identification from visual analysis of key parameters vs. time plots in high fluid pressure operations

  135. Avatar
    Jose Luis Ortiz Volcan December 2, 2021 at 4:50 am #

    Lesson 02

    Before normalization
    Data Type: uint8
    Min: 0.000, Max: 255.000

    After normalization
    Data Type: float32
    Min: 0.000, Max: 1.000

    • Adrian Tam
      Adrian Tam December 8, 2021 at 5:48 am #

      Good job, Jose.

  136. Avatar
    Jose Luis Ortiz Volcan December 3, 2021 at 2:49 pm #

    Ref. Deep Learning for Computer Vision Crash Course – Lesson 03

    Model: "sequential"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    conv2d (Conv2D)              (None, 254, 254, 32)      320
    _________________________________________________________________
    max_pooling2d (MaxPooling2D) (None, 127, 127, 32)      0
    _________________________________________________________________
    flatten (Flatten)            (None, 516128)            0
    _________________________________________________________________
    dense (Dense)                (None, 1)                 516129
    =================================================================
    Total params: 516,449
    Trainable params: 516,449
    Non-trainable params: 0

  137. Avatar
    Jose Luis Ortiz Volcan December 3, 2021 at 2:50 pm #

    Ref. Deep Learning for Computer Vision Crash Course – Lesson 04

    Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels.h5
    553467904/553467096 [==============================] – 20s 0us/step
    553476096/553467096 [==============================] – 20s 0us/step
    Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json
    40960/35363 [==================================] – 0s 0us/step
    49152/35363 [=========================================] – 0s 0us/step
    Doberman (35.42%)

    • Adrian Tam
      Adrian Tam December 8, 2021 at 6:55 am #

      That looks very good, Jose.

  138. Avatar
    Jose Luis Ortiz Volcan December 4, 2021 at 6:37 am #

    Comments:
    1. After the error “cannot import name ‘to_categorical’ from ‘keras.utils’”, I imported to_categorical from tensorflow.keras.utils instead.

    2. After running the example the performance of the model on the test database is as follows:
    Running time = 128.82744431495667 seconds
    Test loss: 0.3041397035121918
    Test accuracy: 0.9147999882698059

    3. Varying the configuration of the model by adding another convolutional layer:

    model2 = Sequential()
    model2.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model2.add(Conv2D(32, (5, 5), activation='relu'))
    model2.add(MaxPooling2D())
    model2.add(Flatten())
    model2.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model2.add(Dense(10, activation='softmax'))
    model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    model2.summary()

    Running time = 536.1874532699585 seconds
    Test loss: 0.4339408576488495
    Test accuracy: 0.9174000024795532

    Conclusion: test accuracy improved slightly (0.9148 to 0.9174), but test loss rose (0.304 to 0.434) and running time increased by about 316%.
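
The comparison can be reproduced from the reported figures. The sketch below only does the arithmetic on the numbers quoted in this comment. The variable names are illustrative.

```python
# Figures reported for the original model
base_time, base_loss, base_acc = 128.82744431495667, 0.3041397035121918, 0.9147999882698059
# Figures reported for the model with the extra convolutional layer
new_time, new_loss, new_acc = 536.1874532699585, 0.4339408576488495, 0.9174000024795532

# Relative increase in running time, in percent
time_increase_pct = (new_time - base_time) / base_time * 100

# Accuracy gain in percentage points
acc_gain_points = (new_acc - base_acc) * 100

# Change in test loss (positive means the loss got worse)
loss_change = new_loss - base_loss

print(f"Running time: +{time_increase_pct:.1f}%")
print(f"Accuracy: +{acc_gain_points:.2f} percentage points")
print(f"Test loss: {loss_change:+.3f}")
```

A gain of roughly a quarter of a percentage point from a single run may sit within run-to-run variation, so repeating each configuration several times would give a firmer comparison.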

  139. Avatar
    Jose Luis Ortiz Volcan December 9, 2021 at 6:12 am #

    Thank you

  140. Avatar
    Jose Luis Ortiz Volcan December 9, 2021 at 6:26 am #

    Ref. Deep Learning for Computer Vision Crash Course – Lesson 06

    After application of the code:

    ImageDataGenerator(horizontal_flip=True, vertical_flip=True, rotation_range=90)

    the result is a set of 9 images that are randomly flipped horizontally and vertically and rotated by up to 90 degrees. The ‘rescale’ parameter defaults to None.

    I experimented with the ‘rescale’ parameter:

    ImageDataGenerator(horizontal_flip=True, vertical_flip=True, rescale=0.5, rotation_range=90)
    ImageDataGenerator(horizontal_flip=True, vertical_flip=True, rescale=1.5, rotation_range=90)
    ImageDataGenerator(horizontal_flip=True, vertical_flip=True, rescale=3, rotation_range=90)

    Using the code above, I created 3 sets of 9 images each. With ‘rescale’ set to 0.5, the background is darker.

    With ‘rescale’ set to 1.5 and 3, the background is lighter and some noise appears.

    Image augmentation is a powerful technique for improving the training of deep neural networks such as CNNs.
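
The ‘rescale’ argument multiplies every pixel value by the given factor. The sketch below imitates that multiplication with NumPy on a tiny made-up image, to show why 0.5 darkens the result and why factors above 1 push values past the 0-255 range. Clipping is used here as one possible way to keep values displayable. How out-of-range values are actually rendered depends on the plotting code, so the link to the observed noise is an assumption.

```python
import numpy as np

# Tiny made-up grayscale image with values in the usual 0-255 range
pixels = np.array([[0, 64], [128, 255]], dtype='float32')

results = {}
for factor in (0.5, 1.5, 3.0):
    # what rescale does: multiply every pixel by the factor
    scaled = pixels * factor
    # values above 255 are outside the displayable range for 8-bit images
    clipped = np.clip(scaled, 0, 255)
    results[factor] = (scaled, clipped)
    print(factor, scaled.max(), clipped.max())
```

In practice ‘rescale’ is more often set to 1.0/255 so that pixel values land in the range 0-1.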

  141. Avatar
    Jose Luis Ortiz Volcan December 9, 2021 at 12:23 pm #

    The code provided for face detection with mtcnn worked very well on the given photograph: it drew a box around the face.

    mtcnn also worked very well on a photograph with multiple faces.

    • Adrian Tam
      Adrian Tam December 10, 2021 at 4:18 am #

      Thanks for confirming!

  142. Avatar
    Sulochana February 4, 2022 at 6:36 am #

    Computer Vision Applications
    1. Image Classification: Labelling an image based on what it contains (e.g., cat or dog).
    2. Image Classification With Localization: Identifying the location of the object in the frame and drawing a bounding box around it.
    3. Object Detection: Detecting objects when multiple objects are present in the image.
    4. Object Tracking: Tracking a specific object across a sequence of images or a video.
    5. Object Segmentation: Also called image segmentation; assigning each pixel of an image to a particular class.
    6. Style Transfer: Style transfer, or neural style transfer, is the task of learning style from one or more images and applying that style to a new image.

    • Avatar
      James Carmichael February 4, 2022 at 10:19 am #

      Thank you for the feedback! Keep up the great work!

  143. Avatar
    Sulochana February 4, 2022 at 11:37 pm #

    Lesson-2: Preparing Image Data

    Data Type: uint8
    Min: 0.000, Max: 255.000

    After normalization
    Min: -0.012, Max: 0.008

    Mean: 0.610, Standard Deviation: 0.202
    Mean: 0.000, Standard Deviation: 1.000

  144. Avatar
    sam October 10, 2022 at 8:00 pm #

    hi,

    Why do we have to convert the data to ‘float32’ before normalization?
    =======================
    # convert from integers to floats
    trainX, testX = trainX.astype('float32'), testX.astype('float32')
    # normalize to range 0-1
    trainX, testX = trainX / 255.0, testX / 255.0
    ===================================

    I think that without the conversion to float32, the data becomes float64 after normalization. The code works fine without the conversion. Is it there to save memory?

    Thanks
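
The dtype behavior described in this question can be checked directly with NumPy. The sketch below uses a small made-up uint8 array in place of trainX. It shows the resulting dtype and memory footprint for both approaches, and does not claim to state the tutorial author’s reason for the conversion.

```python
import numpy as np

# Made-up stand-in for a uint8 image array such as trainX
raw = np.array([[0, 128], [255, 64]], dtype='uint8')

# Dividing the integer array directly promotes the result to float64
direct = raw / 255.0

# Converting first keeps the result as float32
converted = raw.astype('float32') / 255.0

print(direct.dtype, direct.nbytes)        # dtype and bytes used without conversion
print(converted.dtype, converted.nbytes)  # dtype and bytes used with conversion

# The values agree to within float32 precision
print(np.allclose(direct, converted))
```

The float32 version uses half the memory of the float64 version, which becomes significant for large image datasets. Keras layers also compute in float32 by default.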
