
# K-Nearest Neighbors Classification Using OpenCV

The OpenCV library comes with a module that implements the k-Nearest Neighbors algorithm for machine learning applications.

In this tutorial, you are going to learn how to apply OpenCV’s k-Nearest Neighbors algorithm for the task of classifying handwritten digits.

After completing this tutorial, you will know:

• Several of the most important characteristics of the k-Nearest Neighbors algorithm.
• How to use the k-Nearest Neighbors algorithm for image classification in OpenCV.

Let’s get started.

## Tutorial Overview

This tutorial is divided into two parts; they are:

• Reminder of How the k-Nearest Neighbors Algorithm Works
• Using k-Nearest Neighbors for Image Classification in OpenCV

## Prerequisites

For this tutorial, we assume that you are already familiar with:

• How OpenCV's digits dataset of handwritten digit images is structured
• How to convert images into feature vector representations

## Reminder of How the k-Nearest Neighbors Algorithm Works

The k-Nearest Neighbors (kNN) algorithm has already been explained well in this tutorial by Jason Brownlee, but let’s first brush up on some of the most important points from his tutorial:

• The kNN algorithm does not actually involve any learning. It simply stores and uses the entire training dataset as its model representation. For this reason, kNN is also referred to as a lazy learning algorithm.
• Given that the entire training dataset is stored, it would make sense that this dataset is kept curated, updated often with new data, and as free as possible from outliers.
• A prediction for a new instance is made by searching through the entire training dataset for the most similar instance, based on a distance measure of choice. The choice of distance measure is typically based on the properties of the data.
• If the kNN is used to solve a regression problem, then the mean or the median of the k-most similar instances is typically used to generate a prediction.
• If the kNN is used to solve a classification problem, a prediction can be generated from the class with the highest frequency of k-most similar instances.
• A value for k can be tuned by trying out different values and seeing what works best for the problem at hand.
• The computational cost of the kNN algorithm increases with the size of the training dataset. The kNN algorithm also struggles as the dimensionality of the input data increases.
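The points above can be made concrete in a few lines of plain NumPy. The `knn_predict` helper below is hypothetical (it is not OpenCV's implementation); it is shown only to illustrate the distance-and-vote procedure on a toy two-cluster dataset:

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training sample
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]      # indices of the k closest samples
    votes = train_y[nearest]
    # The predicted class is the most frequent label among the neighbors
    return np.bincount(votes).argmax()

# Two well-separated 2-D clusters, labeled 0 and 1
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
train_y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(train_X, train_y, np.array([0.15, 0.1])))  # → 0
print(knn_predict(train_X, train_y, np.array([5.05, 5.0])))  # → 1
```

Note that all the "work" happens at prediction time: there is no fitting step, only a search through the stored samples.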

## Using k-Nearest Neighbors for Image Classification in OpenCV

In this tutorial, we will consider the application of classifying handwritten digits.

We have already seen in a previous tutorial that OpenCV provides the image, digits.png, which is composed of a ‘collage’ of 5,000 20×20-pixel sub-images, where each sub-image features a handwritten digit from 0 to 9.

We have also seen how to convert dataset images into feature vector representations before feeding them into a machine learning algorithm.

What we shall be doing here is split OpenCV’s digits dataset into training and testing sets, convert the images into feature vectors, and then use these feature vectors to train and test a kNN classifier to classify handwritten digits.

Note: We have previously mentioned that the kNN algorithm does not actually involve any training/learning, but we shall be referring to a training dataset as a way of distinguishing the images that will be used for the model representation from those that will be later used for testing.

Let’s start by loading OpenCV’s digits image, splitting it into training and testing sets of images, and converting them into feature vectors using the Histogram of Oriented Gradients (HOG) technique:

Next, we’re going to initiate a kNN classifier:

Then ‘train’ it on the training split of the dataset. For the training split, we may either use the intensity values of the image pixels themselves (type cast to 32-bit floating-point values, according to the expected input of the function):

Or use the feature vectors generated by the HOG technique. In the previous section, we mentioned that the kNN algorithm struggles with high-dimensional data. Using the HOG technique to generate a more compact representation of the image data helps alleviate this problem:

Let’s continue this tutorial by making use of the HOG feature vectors.

The trained kNN classifier can now be tested on the testing split of the dataset, following which its accuracy can be computed by working out the percentage of correct predictions that match the ground truth. For the time being, the value for k will be empirically set to 3:

However, as we have mentioned in the previous section, it is typical practice that the value of k is tuned by trying out different values and seeing what works best for the problem at hand. We can also try splitting the dataset using different ratio values, to see what their effect on the prediction accuracy is.

In order to do so, we’ll place the kNN classifier code above into a nested for loop, where the outer loop iterates over different ratio values, whereas the inner loop iterates over different values of k. Inside the inner loop, we shall also populate a dictionary with the computed accuracy values, so that we may later plot them using Matplotlib.

One last detail that we will include is a check to make sure that we are loading the correct image, and that we are correctly splitting it into sub-images. For this purpose we’ll make use of OpenCV’s imshow method to display the images, followed by a waitKey with an input of zero that will stop and wait for a keyboard event:

Plotting the computed prediction accuracy for different ratio values and different values of k gives better insight into the effect that these values have on the prediction accuracy for this particular application:

*Line plots of the prediction accuracy for different training splits of the dataset and different values of k*
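The plotting step might be sketched as follows. The accuracy values in the dictionary here are arbitrary placeholders standing in for the values computed by the nested loop:

```python
import matplotlib
matplotlib.use('Agg')               # render off-screen; no display required
import matplotlib.pyplot as plt

# Arbitrary placeholder values keyed by (training ratio, k) -- substitute
# the `accuracies` dictionary populated by the nested loop
accuracies = {(r, k): 85 + r * 10 - k for r in (0.5, 0.7, 0.9)
              for k in range(1, 6)}

# One line per training ratio, with k on the x-axis
for ratio in sorted({r for r, _ in accuracies}):
    ks = sorted(k for r, k in accuracies if r == ratio)
    plt.plot(ks, [accuracies[(ratio, k)] for k in ks],
             marker='o', label=f'train ratio = {ratio}')

plt.xlabel('k')
plt.ylabel('Prediction accuracy (%)')
plt.title('Prediction accuracy for different training splits and values of k')
plt.legend()
plt.savefig('knn_accuracy.png')
```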

Try using different image descriptors and tweaking the different parameters for the algorithms of choice before feeding the data into the kNN algorithm, and investigate the kNN’s outputs that result from your changes.

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

## Summary

In this tutorial, you learned how to apply OpenCV’s k-Nearest Neighbors algorithm for the task of classifying handwritten digits.

Specifically, you learned:

• Several of the most important characteristics of the k-Nearest Neighbors algorithm.
• How to use the k-Nearest Neighbors algorithm for image classification in OpenCV.

Do you have any questions?