Besides the feature descriptors generated by SIFT, SURF, and ORB, as in the previous post, the Histogram of Oriented Gradients (HOG) is another feature descriptor you can obtain using OpenCV. HOG is a robust feature descriptor widely used in computer vision and image processing for object detection and recognition tasks. It captures the distribution of gradient orientations in an image and provides a powerful representation that is invariant to changes in illumination and shadowing.
In this post, you will learn about HOG. Specifically, you will know:
- What is HOG, and how is it related to an image
- How to compute it in OpenCV
Kick-start your project with my book Machine Learning in OpenCV. It provides self-study tutorials with working code.
Let’s get started.
Overview
This post is divided into three parts; they are:
- Understanding HOG
- Computing HOG in OpenCV
- Using HOG for People Detection
Understanding HOG
The concept behind the HOG algorithm is to compute the distribution of gradient orientations in localized portions of an image. HOG operates on a window, which is a region of fixed pixel size on the image. A window is divided into small spatial regions known as blocks, and each block is further divided into multiple cells. HOG calculates the gradient magnitude and orientation within each cell and creates a histogram of gradient orientations. Then the histograms within the same block are concatenated.
The gradient measures how a pixel’s color intensity compares to that of its neighbors. The more drastically it changes, the higher the magnitude. The orientation tells the direction of the steepest change. Usually, this is computed on a single-channel image (i.e., grayscale), and each pixel has its own gradient. HOG gathers all the gradients from a cell and puts them into a histogram.
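To get a feel for what gradient magnitude and orientation look like, here is a minimal sketch that computes them with Sobel filters and cv2.cartToPolar. This is only for intuition; OpenCV's HOG implementation computes its own gradients internally (with a simple derivative kernel), so you never need to do this step yourself. The file name image.jpg is a placeholder.

```python
import cv2

# rough illustration of per-pixel gradient magnitude and orientation;
# OpenCV's HOG computes its own gradients internally, so this is only for intuition
gray = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)      # horizontal change
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)      # vertical change
mag, angle = cv2.cartToPolar(gx, gy, angleInDegrees=True)
print(mag.shape, angle.min(), angle.max())  # one magnitude and one angle per pixel
```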
The clever part of building a histogram in HOG is that the bins are determined by the gradient’s angle, but the value added is interpolated between the two closest bins. For example, if the bins are centered at 0, 20, 40, and so on, and a gradient of magnitude 10 occurs at angle 30, a value of 5 is added to each of the bins at 20 and 40. This way, HOG can effectively capture the texture and shape of objects within the image.
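The following toy sketch mimics that interpolation for a single gradient so you can see the split. It is not OpenCV's internal code; the nine bins at 0, 20, ..., 160 degrees are just the usual setup.

```python
import numpy as np

def add_to_histogram(hist, angle, magnitude, bin_width=20):
    """Split a gradient's magnitude between the two nearest orientation bins."""
    pos = angle / bin_width                # fractional bin position, e.g. 30/20 = 1.5
    lo = int(np.floor(pos)) % len(hist)    # lower bin (wraps around at 180 degrees)
    hi = (lo + 1) % len(hist)              # upper bin
    frac = pos - np.floor(pos)
    hist[lo] += magnitude * (1 - frac)
    hist[hi] += magnitude * frac
    return hist

hist = np.zeros(9)                         # bins centered at 0, 20, ..., 160 degrees
print(add_to_histogram(hist, angle=30, magnitude=10))
# magnitude 10 at 30 degrees: 5 goes into the 20-degree bin and 5 into the 40-degree bin
```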
HOG is particularly effective for detecting objects with distinguishable textures and patterns, making it a popular choice for tasks such as pedestrian detection and other forms of object recognition. With its ability to capture the distribution of gradient orientations, HOG provides a robust representation invariant to variations in lighting conditions and shadows.
Computing HOG in OpenCV
OpenCV provides a straightforward method to compute the HOG descriptor, making it easily accessible for developers and researchers. Let’s take a look at a basic example of how to compute HOG in OpenCV:
```python
import cv2

# Load the image and convert to grayscale
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# define each block as 4x4 cells of 64x64 pixels each
cell_size = (64, 64)      # h x w in pixels
block_size = (4, 4)       # h x w in cells
win_size = (8, 6)         # h x w in cells

nbins = 9                 # number of orientation bins
img_size = img.shape[:2]  # h x w in pixels

# create a HOG object
hog = cv2.HOGDescriptor(
    _winSize=(win_size[1] * cell_size[1], win_size[0] * cell_size[0]),
    _blockSize=(block_size[1] * cell_size[1], block_size[0] * cell_size[0]),
    _blockStride=(cell_size[1], cell_size[0]),
    _cellSize=(cell_size[1], cell_size[0]),
    _nbins=nbins
)
n_cells = (img_size[0] // cell_size[0], img_size[1] // cell_size[1])  # h x w in cells

# find features as a 1xN vector, then reshape into the spatial hierarchy;
# window positions are scanned row by row, while the blocks within a window
# (and the cells within a block) are laid out column-major.
# The reshape assumes the image dimensions are exact multiples of the cell size.
hog_feats = hog.compute(gray)
hog_feats = hog_feats.reshape(
    n_cells[0] - win_size[0] + 1,   # window positions along height
    n_cells[1] - win_size[1] + 1,   # window positions along width
    win_size[1] - block_size[1] + 1,
    win_size[0] - block_size[0] + 1,
    block_size[1], block_size[0],
    nbins)
print(hog_feats.shape)
```
HOG computes features for one window at a time. There are multiple blocks in a window. In a block, there are multiple “cells”. See the following illustration:
Each cell is of a fixed size. In the above, you used 64×64 pixels for a cell. Each block has an equal number of cells; in the above, you used 4×4 cells per block. Likewise, each window has an equal number of cells; you used 8×6 cells above. However, the image is not simply divided into blocks or windows when HOG is computed. Instead:
- A window slides across the image, and the stride of the sliding window is the size of one cell, i.e., it moves one cell at a time
- Each window is divided into cells of fixed size
- A second sliding window, matching the block size, scans each window, also moving one cell at a time
- Within a block, HOG computes a histogram from each cell (see the sketch after this list for how the counts add up)
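To make the hierarchy concrete, the following sketch continues from the code above (so it assumes n_cells, win_size, block_size, nbins, hog, and hog_feats are already defined, and that the image dimensions are exact multiples of the cell size) and checks that the counts implied by the sliding windows match the size of the computed feature vector:

```python
# window positions along height and width (the window slides one cell at a time)
n_win_h = n_cells[0] - win_size[0] + 1
n_win_w = n_cells[1] - win_size[1] + 1

# blocks per window (the block also slides one cell at a time) and cells per block
blocks_per_win = (win_size[0] - block_size[0] + 1) * (win_size[1] - block_size[1] + 1)
cells_per_block = block_size[0] * block_size[1]

# per-window descriptor length as reported by OpenCV vs. the count from the hierarchy
print(hog.getDescriptorSize(), blocks_per_win * cells_per_block * nbins)

# total number of features across all window positions
print(hog_feats.size, n_win_h * n_win_w * blocks_per_win * cells_per_block * nbins)
```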
The returned HOG is a single vector for the entire image. In the code above, you reshaped it to make the hierarchy of windows, blocks, cells, and histogram bins explicit. For example, hog_feats[i][j] corresponds to the window (in numpy slicing syntax):
```python
img[cell_size[0] * i : cell_size[0] * i + win_size[0] * cell_size[0],
    cell_size[1] * j : cell_size[1] * j + win_size[1] * cell_size[1]]
```
Or, equivalently, the window with the cell (i,j) at the top left corner.
A sliding window is a common technique in object detection because you cannot be sure a particular object lies exactly in a grid cell. Making smaller cells but larger windows is a better way to catch the object than just seeing a part of it. However, there’s a limitation: An object larger than the window will be missed. Also, an object too small may be dwarfed by other elements in the window.
Usually, you have some downstream task associated with HOG, such as running an SVM classifier on the HOG features for object detection. In this case, you may want to reshape the HOG output into one flat vector per window rather than into the hierarchy of cells as above.
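For example, continuing from the code above, a minimal sketch that turns the flat output into one feature vector per window position, which is the shape a classifier such as an SVM would expect:

```python
# one row per window position; hog.getDescriptorSize() is the per-window feature length
per_window = hog.compute(gray).reshape(-1, hog.getDescriptorSize())
print(per_window.shape)   # (number of window positions, features per window)
```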
Using HOG for People Detection
The feature extraction technique in the code above is useful if you want the raw feature vectors for other purposes. But for some common tasks, OpenCV comes with pre-trained machine learning models that you can use without much effort.
Let’s consider the photo from the following URL (save it as people.jpg):
This is a picture of people crossing a street. OpenCV has a “people detector” in HOG that was trained on a 64×128 pixel window size. Using it to detect people in a photo is surprisingly simple:
```python
import cv2

# Load the image
img = cv2.imread('people.jpg')

# Create the HOG descriptor and set the built-in people detector as its SVM
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Detect people in the image
locations, confidence = hog.detectMultiScale(img)

# Draw rectangles around the detected people
for (x, y, w, h) in locations:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 5)

# Display the image with detected people
cv2.imshow('People', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
In the above, you created a HOG descriptor with its default parameters and loaded the coefficients returned by cv2.HOGDescriptor_getDefaultPeopleDetector(), which initializes an SVM classifier trained to detect a particular object, in this case people.
You call the descriptor on an image and run the SVM in one pipeline using hog.detectMultiScale(img), which returns a bounding box for each object detected. While the window size is fixed, this detection function resizes the image at multiple scales to find the best detection result. Even so, the bounding boxes returned are not tight. The code above also annotates the detected people by drawing the bounding boxes on the image. You may further filter the result using the confidence score reported by the detector, as sketched below. Other filtering algorithms, such as non-maximum suppression, may also be appropriate but are not discussed here. The following is the output:
You can see such detectors can find people only if the full body is visible. The output has false positives (non-people detected) and false negatives (people not detected). Using it to count all people in a crowd scene would be challenging. But it is a good start to see how easily you can get something done using OpenCV.
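If you want to suppress the weakest detections, a minimal sketch continuing from the code above is to keep only the boxes whose confidence exceeds a threshold; the 0.5 used here is an arbitrary value you would tune for your own images:

```python
import numpy as np

threshold = 0.5  # arbitrary cut-off; tune it for your images
for (x, y, w, h), score in zip(locations, np.ravel(confidence)):
    if score > threshold:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 5)
```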
Unfortunately, the people detector is the only pretrained detector that comes with OpenCV. But you can train your own SVM or other models using HOG as the feature vectors, as sketched below. Feeding a machine learning model is the key reason for extracting feature vectors from an image.
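As a rough sketch of that idea, the following trains a linear SVM on HOG features with OpenCV's cv2.ml module. The random patches here are only stand-ins; in practice you would load real positive and negative examples cropped to the detector's window size.

```python
import cv2
import numpy as np

hog = cv2.HOGDescriptor()  # default 64x128 window, as used by the people detector

# stand-in data: random 128x64 grayscale patches; replace with real cropped examples
pos_patches = [np.random.randint(0, 256, (128, 64), dtype=np.uint8) for _ in range(20)]
neg_patches = [np.random.randint(0, 256, (128, 64), dtype=np.uint8) for _ in range(20)]

features = np.array([hog.compute(p).flatten() for p in pos_patches + neg_patches],
                    dtype=np.float32)
labels = np.array([1] * len(pos_patches) + [0] * len(neg_patches), dtype=np.int32)

# train a linear SVM on the HOG feature vectors
svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.train(features, cv2.ml.ROW_SAMPLE, labels)

# classify a new patch by computing its HOG features first
print(svm.predict(hog.compute(pos_patches[0]).reshape(1, -1))[1])
```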
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Websites
- OpenCV, https://opencv.org/
- StackOverflow: OpenCV HOG Features Explanation: https://stackoverflow.com/questions/44972099/opencv-hog-features-explanation
Summary
In this tutorial, you learned how to use HOG in OpenCV to extract feature vectors based on a sliding window. It is an effective approach to finding features that can help object detection.
Specifically, you learned:
- How to fetch HOG features from an image
- How to use the built-in HOG people detector from OpenCV
In case you have any questions, please leave a comment below.