K-Means Clustering in OpenCV and Application for Color Quantization

The k-means clustering algorithm is an unsupervised machine learning technique that seeks to group similar data into distinct clusters, with the aim of uncovering patterns in the data that may not be apparent to the naked eye. 

It is possibly the most widely known algorithm for data clustering, and it comes implemented in the OpenCV library.

In this tutorial, you are going to learn how to apply OpenCV’s k-means clustering algorithm for color quantization of images. 

After completing this tutorial, you will know:

  • What data clustering is within the context of machine learning. 
  • How to apply the k-means clustering algorithm in OpenCV to a simple two-dimensional dataset containing distinct data clusters.
  • How to apply the k-means clustering algorithm in OpenCV for color quantization of images. 

Let’s get started. 

K-Means Clustering for Color Quantization Using OpenCV
Photo by Billy Huynh, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

  • Clustering as an Unsupervised Machine Learning Task
  • Discovering k-Means Clustering in OpenCV
  • Color Quantization Using k-Means

Clustering as an Unsupervised Machine Learning Task

Cluster analysis is an unsupervised learning technique. 

It involves the automatic grouping of data into distinct groups (or clusters), where the data within each cluster are similar to one another but different from those in the other clusters. Its aim is to uncover patterns in the data that may not be apparent before clustering. 

There are many different types of clustering algorithms, with k-means clustering being one of the most widely known. 

The k-means clustering algorithm takes unlabelled data points and seeks to assign them to k clusters, where each data point belongs to the cluster with the nearest cluster center, and the center of each cluster is taken as the mean of the data points that belong to it. The algorithm requires the user to provide the value of k as an input; hence, this value needs to be known a priori or tuned according to the data. 
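
The two alternating steps described above (assign each point to its nearest center, then move each center to the mean of its points) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name kmeans_step and the toy points are hypothetical, and this is not how OpenCV implements the algorithm internally.

```python
import numpy as np

def kmeans_step(points, centers):
    """One assignment step followed by one update step of k-means."""
    # Squared Euclidean distance from every point to every center: shape (n, k)
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Assignment: each point takes the label of its nearest center
    labels = d2.argmin(axis=1)
    # Update: each center moves to the mean of the points assigned to it
    new_centers = centers.copy()
    for j in range(len(centers)):
        members = points[labels == j]
        if len(members) > 0:  # leave a center unchanged if it has no points
            new_centers[j] = members.mean(axis=0)
    return labels, new_centers

# Toy data (hypothetical): two obvious groups and two rough initial centers
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = np.array([[1.0, 1.0], [9.0, 9.0]])

labels, centers = kmeans_step(points, centers)
print(labels)   # [0 0 1 1]
print(centers)  # centers move to (0, 0.5) and (10, 10.5)
```

Repeating this step until the labels stop changing gives the basic k-means procedure for a fixed value of k.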

Discovering k-Means Clustering in OpenCV

Let’s first consider applying k-means clustering to a simple two-dimensional dataset containing distinct data clusters, before moving on to more complex tasks. 

For this purpose, we will generate a dataset of 100 data points (specified by n_samples), equally divided into 5 Gaussian clusters (specified by centers), each with a standard deviation of 1.5 (specified by cluster_std). To make the results reproducible, we also set random_state to 10:
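
A minimal sketch of this step follows. It assumes that the parameter names above refer to scikit-learn's make_blobs function, which accepts exactly these arguments; the variable names x and y_true are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs

# 100 two-dimensional points drawn from 5 Gaussian clusters (std 1.5),
# with a fixed seed so that the same dataset is generated on every run
x, y_true = make_blobs(n_samples=100, centers=5, cluster_std=1.5, random_state=10)

print(x.shape)              # (100, 2)
print(np.bincount(y_true))  # number of points generated per cluster
```

The two columns of x can then be passed to a scatter plot, for example with Matplotlib's scatter(x[:, 0], x[:, 1]).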

The code above should generate the following plot of data points:

Scatter Plot of Dataset Consisting of 5 Gaussian Clusters

If we have a good look at this plot, we may already be able to visually distinguish one cluster from another, which means that this should be a sufficiently straightforward task for the k-means clustering algorithm.

In OpenCV, the k-means algorithm is not part of the ml module but can be called directly. To be able to use it, we need to specify values for its input arguments as follows:

  • The input, unlabelled data.
  • The number, K, of required clusters. 
  • The termination criteria, TERM_CRITERIA_EPS and TERM_CRITERIA_MAX_ITER, which define the desired accuracy and the maximum number of iterations, respectively. The algorithm stops iterating as soon as either one is reached. 
  • The number of attempts, denoting the number of times that the algorithm will be executed with different initial labelling, in an attempt to find the best cluster compactness. 
  • The manner by which the cluster centers will be initialised, whether random, user-supplied, or through a center initialization method such as kmeans++, as specified by the parameter flags.

The k-means clustering algorithm in OpenCV returns:

  • The compactness, a single value computed as the sum of squared distances from each data point to its corresponding cluster center. A smaller compactness value indicates that the data points lie closer to their corresponding cluster centers and, hence, that the clusters are more compact.
  • The predicted cluster labels, y_pred, which associate each input data point to its corresponding cluster. 
  • The coordinates of the center of each cluster of data points. 
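
To make the compactness measure concrete, the sketch below computes it by hand with NumPy for a toy example. The data, labels, and centers are hypothetical values chosen so that the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical data: four points, two clusters, and the two cluster centers
data = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]], dtype=np.float32)
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 1.0], [10.0, 11.0]], dtype=np.float32)

# Squared distance from each point to the center of its own cluster
sq_dist = ((data - centers[labels]) ** 2).sum(axis=1)

# Compactness is the sum of these squared distances over all points
compactness = float(sq_dist.sum())
print(compactness)  # 4.0
```

Every point here lies exactly one unit from its center, so each contributes 1 to the total.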

Let’s now apply the k-means clustering algorithm to the dataset generated earlier. Note that we are type casting the input data to float32, as expected by the kmeans() function in OpenCV:

The code above generates the following plot, where each data point is now colored according to its assigned cluster, and the cluster centers are marked in red:

Scatter Plot of Dataset With Clusters Identified Using k-Means Clustering

The complete code listing is as follows:

Color Quantization Using k-Means

One of the applications for k-means clustering is that of color quantization of images. 

Color quantization refers to the process of reducing the number of distinct colors that are used in the representation of an image. 

Color quantization is critical for displaying images with many colors on devices that can only display a limited number of colors, usually due to memory limitations, and enables efficient compression of certain types of images.

Color quantization, 2023.

In this case, the data points that we will be providing to the k-means clustering algorithm are the RGB values of each of the image pixels. As we shall be seeing, we will be providing these values in the form of an $M \times 3$ array, where $M$ denotes the number of pixels in the image. 

Let’s try out the k-means clustering algorithm on this image, which I have named bricks.jpg:

The dominant colors that stand out in this image are red, orange, yellow, green, and blue. However, the many shadows and glints introduce additional shades of these colors. 

We’ll start by first reading the image using OpenCV’s imread function. 

Remember that OpenCV loads this image in BGR rather than RGB order. There is no need to convert it to RGB before feeding it to the k-means clustering algorithm, because the algorithm will group similar colors together regardless of the order in which the channel values are specified. However, since we are using Matplotlib to display the images, we’ll convert it to RGB so that the quantized result is displayed correctly later on:

As mentioned earlier, the next step is to reshape the image into an $M \times 3$ array. We can then apply k-means clustering to the resulting array, using a number of clusters equal to the number of dominant colors identified above. 

In the code snippet below, I have also included a line that prints out the number of unique RGB pixel values out of the total number of pixels in the image. We find that we have 338,742 unique RGB values out of 14,155,776 pixels, which is substantial:

At this point, we shall proceed to apply the actual RGB values of the cluster centers to the predicted pixel labels, and reshape back the resulting array to the shape of the original image before displaying it:
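
One way to sketch this mapping with NumPy indexing is shown below. The labels and centers are hypothetical stand-ins for the k-means outputs on a tiny 2 x 3 image with two clusters, and rounding the centers to 8-bit values is a choice made here for display purposes.

```python
import numpy as np

# Hypothetical k-means outputs for a 2 x 3 image clustered with K = 2
centers = np.array([[200.4, 30.2, 30.9], [20.1, 20.7, 180.5]], dtype=np.float32)
labels = np.array([[0], [1], [1], [0], [0], [1]], dtype=np.int32)
image_shape = (2, 3, 3)

# Round the centers to valid 8-bit color values
palette = np.rint(centers).astype(np.uint8)

# Replace each pixel label by its cluster color, then restore the image shape
img_quantized = palette[labels.flatten()].reshape(image_shape)

# The quantized image contains exactly as many colors as there are clusters
n_unique = len(np.unique(img_quantized.reshape(-1, 3), axis=0))
print(n_unique)  # 2
```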

Printing the number of unique RGB values again, this time for the quantized image, we find that it has dropped to the number of clusters that we specified to the k-means algorithm:

If we have a look at the color quantized image, we find that the pixels belonging to the yellow and orange bricks have been grouped into the same cluster, possibly due to the similarity of their RGB values, whereas one of the clusters aggregates pixels belonging to regions of shadow:

Color Quantized Image Using k-Means Clustering with 5 Clusters

Now try changing the value specifying the number of clusters for the k-means clustering algorithm and investigate its effect on the quantization result. 

The complete code listing is as follows:

Summary

In this tutorial, you learned how to apply OpenCV’s k-means clustering algorithm for color quantization of images.

Specifically, you learned:

  • What data clustering is within the context of machine learning. 
  • How to apply the k-means clustering algorithm in OpenCV to a simple two-dimensional dataset containing distinct data clusters.
  • How to apply the k-means clustering algorithm in OpenCV for color quantization of images. 

Do you have any questions?

Ask your questions in the comments below, and I will do my best to answer.

5 Responses to K-Means Clustering in OpenCV and Application for Color Quantization

  1. Mohammad October 29, 2023 at 12:05 am #

    Thanks for the great articles.

    Just one question:

    Does K-Means Clustering with only one center equal averaging all pixel values?

    For example if I want to calculate the main color of a car, should I average all of the car’s image’s pixel values to get the main color or should I do a K-Means clustering with a cluster of size one on all of the pixels?

    Thanks

    • James Carmichael October 29, 2023 at 9:03 am #

      Hi Mohammad…Your approach is reasonable! Proceed with your model and let us know what you find!

  2. David A. Oluyori November 4, 2023 at 7:38 am #

    Jason, you are the best. This is a wonderful article. I enjoyed it.

    • James Carmichael November 4, 2023 at 8:08 am #

      Thank you for your feedback and support David! We greatly appreciate it! Let us know if you ever have any questions regarding our content.

  3. Xhelas November 4, 2023 at 8:28 am #

    Hi Mohammad. IMHO, if you see a car in an image, it is because there is the car AND what is not the car; let’s call this the background. Necessarily, the background has a different color than the car, since otherwise the car would be invisible. So try k-means with at least 2 colors, so that one of the groups can indicate the pixels of the car. Then you can average the colors of these pixels.