Besides the feature descriptors generated by SIFT, SURF, and ORB, as in the previous post, the Histogram of Oriented Gradients (HOG) is another feature descriptor you can obtain using OpenCV. HOG is a robust feature descriptor widely used in computer vision and image processing for object detection and recognition tasks. It captures the distribution of gradient orientations in an image and provides a powerful representation that is invariant to changes in illumination and shadowing.
In this post, you will learn about HOG. Specifically, you will know:
- What is HOG, and how is it related to an image
- How to compute it in OpenCV
Kick-start your project with my book Machine Learning in OpenCV. It provides self-study tutorials with working code.
Let’s get started.
Overview
This post is divided into three parts; they are:
- Understanding HOG
- Computing HOG in OpenCV
- Using HOG for People Detection
Understanding HOG
The concept behind the HOG algorithm is to compute the distribution of gradient orientations in localized portions of an image. HOG operates on a window, which is a region of fixed pixel size on the image. A window is divided into small spatial regions known as blocks, and each block is further divided into multiple cells. HOG calculates the gradient magnitude and orientation within each cell and creates a histogram of gradient orientations. Then the histograms within the same block are concatenated.
The gradient measures how a pixel’s color intensity compares to that of its neighbors. The more drastically it changes, the higher the magnitude. The orientation tells the direction of the steepest change. Usually, this is computed on a single-channel image (i.e., grayscale), and each pixel has its own gradient. HOG gathers all the gradients from a cell and puts them into a histogram.
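To get a feel for what gradient magnitude and orientation look like, here is a minimal sketch that computes them with Sobel filters and cv2.cartToPolar. This is only for intuition; OpenCV's HOG implementation computes its own gradients internally (with a simple derivative kernel), so you never need to do this step yourself. The file name image.jpg is a placeholder.

```python
import cv2

# rough illustration of per-pixel gradient magnitude and orientation;
# OpenCV's HOG computes its own gradients internally, so this is only for intuition
gray = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)      # horizontal change
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)      # vertical change
mag, angle = cv2.cartToPolar(gx, gy, angleInDegrees=True)
print(mag.shape, angle.min(), angle.max())  # one magnitude and one angle per pixel
```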
The clever part of building a histogram in HOG is that the bins are determined by the gradient’s angle, but the value added is interpolated between the two closest bins. For example, if the bins are centered at 0, 20, 40, and so on, and a gradient of magnitude 10 occurs at angle 30, a value of 5 is added to each of the bins at 20 and 40. This way, HOG can effectively capture the texture and shape of objects within the image.
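The following toy sketch mimics that interpolation for a single gradient so you can see the split. It is not OpenCV's internal code; the nine bins at 0, 20, ..., 160 degrees are just the usual setup.

```python
import numpy as np

def add_to_histogram(hist, angle, magnitude, bin_width=20):
    """Split a gradient's magnitude between the two nearest orientation bins."""
    pos = angle / bin_width                # fractional bin position, e.g. 30/20 = 1.5
    lo = int(np.floor(pos)) % len(hist)    # lower bin (wraps around at 180 degrees)
    hi = (lo + 1) % len(hist)              # upper bin
    frac = pos - np.floor(pos)
    hist[lo] += magnitude * (1 - frac)
    hist[hi] += magnitude * frac
    return hist

hist = np.zeros(9)                         # bins centered at 0, 20, ..., 160 degrees
print(add_to_histogram(hist, angle=30, magnitude=10))
# magnitude 10 at 30 degrees: 5 goes into the 20-degree bin and 5 into the 40-degree bin
```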
HOG is particularly effective for detecting objects with distinguishable textures and patterns, making it a popular choice for tasks such as pedestrian detection and other forms of object recognition. With its ability to capture the distribution of gradient orientations, HOG provides a robust representation invariant to variations in lighting conditions and shadows.
Computing HOG in OpenCV
OpenCV provides a straightforward method to compute the HOG descriptor, making it easily accessible for developers and researchers. Let’s take a look at a basic example of how to compute HOG in OpenCV:
```python
import cv2

# Load the image and convert to grayscale
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# define each block as 4x4 cells of 64x64 pixels each
cell_size = (64, 64)      # h x w in pixels
block_size = (4, 4)       # h x w in cells
win_size = (8, 6)         # h x w in cells

nbins = 9                 # number of orientation bins
img_size = img.shape[:2]  # h x w in pixels

# create a HOG object
hog = cv2.HOGDescriptor(
    _winSize=(win_size[1] * cell_size[1], win_size[0] * cell_size[0]),
    _blockSize=(block_size[1] * cell_size[1], block_size[0] * cell_size[0]),
    _blockStride=(cell_size[1], cell_size[0]),
    _cellSize=(cell_size[1], cell_size[0]),
    _nbins=nbins
)
n_cells = (img_size[0] // cell_size[0], img_size[1] // cell_size[1])  # h x w in cells

# find features as a 1xN vector, then reshape into the spatial hierarchy;
# window positions are scanned row by row, while the blocks within a window
# (and the cells within a block) are laid out column-major.
# The reshape assumes the image dimensions are exact multiples of the cell size.
hog_feats = hog.compute(gray)
hog_feats = hog_feats.reshape(
    n_cells[0] - win_size[0] + 1,   # window positions along height
    n_cells[1] - win_size[1] + 1,   # window positions along width
    win_size[1] - block_size[1] + 1,
    win_size[0] - block_size[0] + 1,
    block_size[1], block_size[0],
    nbins)
print(hog_feats.shape)
```
HOG computes features for one window at a time. There are multiple blocks in a window. In a block, there are multiple “cells”. See the following illustration:
Each cell is of a fixed size. In the above, you used 64×64 pixels for a cell. Each block has an equal number of cells; in the above, you used 4×4 cells per block. Likewise, each window has an equal number of cells; you used 8×6 cells above. However, the image is not simply divided into blocks or windows when HOG is computed. Instead:
- A window slides across the image, and the stride of the sliding window is the size of one cell, i.e., it moves one cell at a time
- Each window is divided into cells of fixed size
- A second sliding window, matching the block size, scans each window, also moving one cell at a time
- Within a block, HOG computes a histogram from each cell (see the sketch after this list for how the counts add up)
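To make the hierarchy concrete, the following sketch continues from the code above (so it assumes n_cells, win_size, block_size, nbins, hog, and hog_feats are already defined, and that the image dimensions are exact multiples of the cell size) and checks that the counts implied by the sliding windows match the size of the computed feature vector:

```python
# window positions along height and width (the window slides one cell at a time)
n_win_h = n_cells[0] - win_size[0] + 1
n_win_w = n_cells[1] - win_size[1] + 1

# blocks per window (the block also slides one cell at a time) and cells per block
blocks_per_win = (win_size[0] - block_size[0] + 1) * (win_size[1] - block_size[1] + 1)
cells_per_block = block_size[0] * block_size[1]

# per-window descriptor length as reported by OpenCV vs. the count from the hierarchy
print(hog.getDescriptorSize(), blocks_per_win * cells_per_block * nbins)

# total number of features across all window positions
print(hog_feats.size, n_win_h * n_win_w * blocks_per_win * cells_per_block * nbins)
```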
The returned HOG is a single vector for the entire image. In the code above, you reshaped it to make the hierarchy of windows, blocks, cells, and histogram bins explicit. For example, hog_feats[i][j] corresponds to the window (in numpy slicing syntax):
```python
img[cell_size[0] * i : cell_size[0] * i + win_size[0] * cell_size[0],
    cell_size[1] * j : cell_size[1] * j + win_size[1] * cell_size[1]]
```
Or, equivalently, the window with the cell (i,j) at the top left corner.
A sliding window is a common technique in object detection because you cannot be sure a particular object lies exactly in a grid cell. Making smaller cells but larger windows is a better way to catch the object than just seeing a part of it. However, there’s a limitation: An object larger than the window will be missed. Also, an object too small may be dwarfed by other elements in the window.
Usually, you have some downstream task associated with HOG, such as running an SVM classifier on the HOG features for object detection. In this case, you may want to reshape the HOG output into one flat vector per window rather than into the hierarchy of cells as above.
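For example, continuing from the code above, a minimal sketch that turns the flat output into one feature vector per window position, which is the shape a classifier such as an SVM would expect:

```python
# one row per window position; hog.getDescriptorSize() is the per-window feature length
per_window = hog.compute(gray).reshape(-1, hog.getDescriptorSize())
print(per_window.shape)   # (number of window positions, features per window)
```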
Using HOG for People Detection
The feature extraction technique in the code above is useful if you want the raw feature vectors for other purposes. But for some common tasks, OpenCV comes with pre-trained machine learning models that you can use without much effort.
Let’s consider the photo from the following URL (save it as people.jpg):
This is a picture of people crossing a street. OpenCV has a “people detector” in HOG that was trained on a 64×128 pixel window size. Using it to detect people in a photo is surprisingly simple:
```python
import cv2

# Load the image
img = cv2.imread('people.jpg')

# Create the HOG descriptor and set the built-in people detector as its SVM
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Detect people in the image
locations, confidence = hog.detectMultiScale(img)

# Draw rectangles around the detected people
for (x, y, w, h) in locations:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 5)

# Display the image with detected people
cv2.imshow('People', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
In the above, you created a HOG descriptor with its default parameters and loaded the coefficients returned by cv2.HOGDescriptor_getDefaultPeopleDetector(), which initializes an SVM classifier trained to detect a particular object, in this case people.
You call the descriptor on an image and run the SVM in one pipeline using hog.detectMultiScale(img), which returns a bounding box for each object detected. While the window size is fixed, this detection function resizes the image at multiple scales to find the best detection result. Even so, the bounding boxes returned are not tight. The code above also annotates the detected people by drawing the bounding boxes on the image. You may further filter the result using the confidence score reported by the detector, as sketched below. Other filtering algorithms, such as non-maximum suppression, may also be appropriate but are not discussed here. The following is the output:
You can see such detectors can find people only if the full body is visible. The output has false positives (non-people detected) and false negatives (people not detected). Using it to count all people in a crowd scene would be challenging. But it is a good start to see how easily you can get something done using OpenCV.
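If you want to suppress the weakest detections, a minimal sketch continuing from the code above is to keep only the boxes whose confidence exceeds a threshold; the 0.5 used here is an arbitrary value you would tune for your own images:

```python
import numpy as np

threshold = 0.5  # arbitrary cut-off; tune it for your images
for (x, y, w, h), score in zip(locations, np.ravel(confidence)):
    if score > threshold:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 5)
```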
Unfortunately, the people detector is the only pretrained detector that comes with OpenCV. But you can train your own SVM or other models using HOG as the feature vectors, as sketched below. Feeding a machine learning model is the key reason for extracting feature vectors from an image.
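As a rough sketch of that idea, the following trains a linear SVM on HOG features with OpenCV's cv2.ml module. The random patches here are only stand-ins; in practice you would load real positive and negative examples cropped to the detector's window size.

```python
import cv2
import numpy as np

hog = cv2.HOGDescriptor()  # default 64x128 window, as used by the people detector

# stand-in data: random 128x64 grayscale patches; replace with real cropped examples
pos_patches = [np.random.randint(0, 256, (128, 64), dtype=np.uint8) for _ in range(20)]
neg_patches = [np.random.randint(0, 256, (128, 64), dtype=np.uint8) for _ in range(20)]

features = np.array([hog.compute(p).flatten() for p in pos_patches + neg_patches],
                    dtype=np.float32)
labels = np.array([1] * len(pos_patches) + [0] * len(neg_patches), dtype=np.int32)

# train a linear SVM on the HOG feature vectors
svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.train(features, cv2.ml.ROW_SAMPLE, labels)

# classify a new patch by computing its HOG features first
print(svm.predict(hog.compute(pos_patches[0]).reshape(1, -1))[1])
```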
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Websites
- OpenCV, https://opencv.org/
- StackOverflow: OpenCV HOG Features Explanation: https://stackoverflow.com/questions/44972099/opencv-hog-features-explanation
Summary
In this tutorial, you learned how to use HOG in OpenCV to extract feature vectors based on a sliding window. It is an effective approach to finding features that can help object detection.
Specifically, you learned:
- How to fetch HOG features from an image
- How to use the built-in HOG people detector from OpenCV
In case you have any questions, please leave a comment below.