Learning Vector Quantization for Machine Learning

A downside of K-Nearest Neighbors is that you need to hang on to your entire training dataset.

The Learning Vector Quantization algorithm (or LVQ for short) is an artificial neural network algorithm that lets you choose how many training instances to hang onto and learns exactly what those instances should look like.

In this post you will discover the Learning Vector Quantization algorithm. After reading this post you will know:

  • The representation used by the LVQ algorithm that you actually save to a file.
  • The procedure that you can use to make predictions with a learned LVQ model.
  • How to learn an LVQ model from training data.
  • The data preparation to use to get the best performance from the LVQ algorithm.
  • Where to look for more information on LVQ.

This post was written for developers and assumes no background in statistics or mathematics. The post focuses on how the algorithm works and how to use it for predictive modeling problems.

If you have any questions on LVQ, leave a comment and I will do my best to answer.

Kick-start your project with my new book Master Machine Learning Algorithms, including step-by-step tutorials and the Excel Spreadsheet files for all examples.

Let’s get started.

Learning Vector Quantization for Machine Learning

Learning Vector Quantization for Machine Learning
Photo by Holly Victoria Norval, some rights reserved.

LVQ Model Representation

The representation for LVQ is a collection of codebook vectors.

LVQ was developed and is best understood as a classification algorithm. It supports both binary (two-class) and multi-class classification problems.

A codebook vector is a list of numbers that have the same input and output attributes as your training data. For example, if your problem is a binary classification with classes 0 and 1, and the inputs width, length height, then a codebook vector would be comprised of all four attributes: width, length, height and class.

The model representation is a fixed pool of codebook vectors, learned from the training data. They look like training instances, but the values of each attribute have been adapted based on the learning procedure.

In the language of neural networks, each codebook vector may be called a neuron, each attribute on a codebook vector is called a weight and the collection of codebook vectors is called a network.

Get your FREE Algorithms Mind Map

Machine Learning Algorithms Mind Map

Sample of the handy machine learning algorithms mind map.

I've created a handy mind map of 60+ algorithms organized by type.

Download it, print it and use it. 


Also get exclusive access to the machine learning algorithms email mini-course.

 

 

Making Predictions with an LVQ Model

Predictions are made using the LVQ codebook vectors in the same way as K-Nearest Neighbors.

Predictions are made for a new instance (x) by searching through all codebook vectors for the K most similar instances and summarizing the output variable for those K instances. For classification this is the mode (or most common) class value.

Typically predictions are made with K=1, and the codebook vector that matches is called the Best Matching Unit (BMU).

To determine which of the K instances in the training dataset are most similar to a new input a distance measure is used. For real-valued input variables, the most popular distance measure is Euclidean distance. Euclidean distance is calculated as the square root of the sum of the squared differences between a new point (x) and an existing point (xi) for each attribute j.

EuclideanDistance(x, xi) = sqrt( sum( (xj – xij)^2 ) )

Learning an LVQ Model From Data

The LVQ algorithm learns the codebook vectors from the training data.

You must choose the number of codebook vectors to use, such as 20 or 40. You can find the best number of codebook vectors to use by testing different configurations on your training dataset.

The learning algorithm starts with a pool of random codebook vectors. These could be randomly selected instances from the training data, or randomly generated vectors with the same scale as the training data. Codebook vectors have the same number of input attributes as the training data. They also have an output class variable.

The instances in the training dataset are processed one at a time. For a given training instance, the most similar codebook vector is selected from the pool.

If the codebook vector has the same output as the training instance, the codebook vector is moved closer to the training instance. If it does not match, it is moved further away. The amount that the vector is moved is controlled by an algorithm parameter called the learning_rate.

For example, the input variable (x) of a codebook vector is moved closer to the training input value (t) by the amount in the learning_rate if the classes match as follows:

x = x + learning_rate * (t – x)

The opposite case of moving the input variables of a codebook variable away from a training instance is calculated as:

x = x – learning_rate * (t – x)

This would be repeated for each input variable.

Because one codebook vector is selected for modification for each training instance the algorithm is referred to as a winner-take-all algorithm or a type of competitive learning.

This process is repeated for each instance in the training dataset. One iteration of the training dataset is called an epoch. The process is completed for a number of epochs that you must choose (max_epoch), such as 200.

You must also choose an initial learning rate (such as alpha=0.3). The learning rate is decreased with the epoch, starting at the large value you specify at epoch 1 which makes the most change to the codebook vectors and finishing with a small value near zero on the last epoch, making very minor changes to the codebook vectors.

The learning rate for each epoch is calculated as:

learning_rate = alpha * (1 – (epoch/max_epoch))

Where learning_rate is the learning rate for the current epoch (0 to max_epoch-1), alpha is the learning rate specified to the algorithm at the start of the training run and max_epoch is the total number of epochs to run the algorithm also specified at the start of the run.

The intuition for the learning process is that the pool of codebook vectors is a compression of the training dataset to the points that best characterize the separation of the classes.

Data Preparation for LVQ

Generally, it is a good idea to prepare data for LVQ in the same way as you would prepare it for K-Nearest Neighbors.

  • Classification: LVQ is a classification algorithm that works for both binary (two-class) and multi-class classification algorithms. The technique has been adapted for regression.
  • Multiple-Passes: Good technique with LVQ involves performing multiple passes of the training dataset over the codebook vectors (e.g. multiple learning runs). The first with a higher learning rate to settle the pool codebook vectors and the second run with a small learning rate to fine tune the vectors.
  • Multiple Best Matches: Extensions of LVQ select multiple best matching units to modify during learning, such as one of the same class and one of a different class which are drawn toward and away from a training sample respectively. Other extensions use a custom learning rate for each codebook vector. These extensions can improve the learning process.
  • Normalize Inputs: Traditionally, inputs are normalized (rescaled) to values between 0 and 1. This is to avoid one attribute from dominating the distance measure. If the input data is normalized, then the initial values for the codebook vectors can be selected as random values between 0 and 1.
  • Feature Selection: Feature selection that can reduce the dimensionality of the input variables can improve the accuracy of the method. LVQ suffers from the same curse of dimensionality in making predictions as K-Nearest Neighbors.

LVQ Tutorial in Python

For a step-by-step tutorial on implementing LVQ from scratch in Python, see the post;

Further Reading

The technique was developed by Kohonen who wrote the seminal book on LVQ and the sister method Self-Organizing Maps called: Self-Organizing Maps.

Self-Organizing Maps

Self-Organizing Maps

I highly recommend this book if you are interested in LVQ.

Summary

In this post you discovered the LVQ algorithm. You learned:

  • The representation for LVQ is a small pool of codebook vectors, smaller than the training dataset.
  • The codebook vectors are used to make predictions using the same technique as K-Nearest Neighbors.
  • The codebook vectors are learned from the training dataset by moving them closer when they are good match and further away when they are a bad match.
  • The codebook vectors are a compression of the training data to best separate the classes.
  • Data preparation traditionally involves normalizing the input values to the range between 0 and 1.

Do you have any questions about this post or the LVQ algorithm? Leave a comment and ask your question and I will do my best to answer it.

Discover How Machine Learning Algorithms Work!

Mater Machine Learning Algorithms

See How Algorithms Work in Minutes

...with just arithmetic and simple examples

Discover how in my new Ebook:
Master Machine Learning Algorithms

It covers explanations and examples of 10 top algorithms, like:
Linear Regression, k-Nearest Neighbors, Support Vector Machines and much more...

Finally, Pull Back the Curtain on
Machine Learning Algorithms

Skip the Academics. Just Results.

See What's Inside

45 Responses to Learning Vector Quantization for Machine Learning

  1. Avatar
    aloo masala October 18, 2016 at 5:29 pm #

    Hi,

    nice and readable post. If I am correct, then the update rule should use ‘learning_rate’ instead of the initial value ‘alpha’. In the current form ‘learning_rate’ is decreased but never applied. Or replace ‘learning_rate’ by ‘alpha’ and introduce something like ‘alpha0’ in the update formula of the learning rate.

    Please delete or do not publish this post.

    Best

    aloo

    • Avatar
      Jason Brownlee October 19, 2016 at 9:17 am #

      Alpha is a fixed parameter and learning rate is what is used to update the weights.

      • Avatar
        aloo masala October 19, 2016 at 7:01 pm #

        The update rule of codebook vector x uses the fixed parameter alpha rather than the adaptive learning_rate parameter.

        • Avatar
          Jason Brownlee November 7, 2016 at 7:28 am #

          Thanks aloo masala, I have updated the example.

  2. Avatar
    aloo masala November 19, 2016 at 7:52 am #

    You are welcome. Now you can delete my comments.

  3. Avatar
    Tom Anderson November 29, 2016 at 6:30 pm #

    Kohonen has recently self-published a book on this topic. “MATLAB Implementations and Applications of the Self-Organizing Map”

    Although its examples are using MATLAB, I think this may be a good option for self-study, since it’s free!
    http://docs.unigrafia.fi/publications/kohonen_teuvo/

  4. Avatar
    Declan Lange January 14, 2017 at 6:27 pm #

    I’ve got a network trained with weights being initialised (this is in Java), what’s the best way to feed in a new data set/vector and find the best matching dataset to it?

    Thanks,
    Declan

    • Avatar
      Jason Brownlee January 15, 2017 at 5:29 am #

      Sorry, I don’t understand your question Declan. Perhaps you could restate it?

      • Avatar
        Declan Lange January 15, 2017 at 2:27 pm #

        I have a network with training sets that have a vector and a name (classification name) in it, i’m trying to get a new vector, that has no name, and determine what it’s closest to.

        I’m trying to make a prediction using a new data set to predict what it’s most similar to.

  5. Avatar
    xenomorph February 14, 2017 at 11:28 am #

    Do people really use LVQ and locally weighted learning in practice, in 2017?

    • Avatar
      Jason Brownlee February 15, 2017 at 11:31 am #

      I like to throw LVQ in the mix when spot checking algorithms on a problem and go with it if it products good results or results similar to a kNN.

  6. Avatar
    Ben Cryer May 18, 2018 at 9:07 pm #

    Best Machine Learning Classification Algorithms using MATLAB tutorial

    https://www.simpliv.com/developmenttool/machine-learning-classification-algorithms-using-matlab

  7. Avatar
    Geordie June 10, 2018 at 11:37 am #

    Hi there! Let’s assume that we are encoding to a collection of K codebook vectors (for example K=512). We train the system on our data and then want to use it.

    Is it correct that the total number of possible outputs of the network is K? IE if I train the network on 100,000 128x128x3 images with K=512, after training ANY 128x128x3 image will get mapped to one of 512 outputs?

    • Avatar
      Jason Brownlee June 11, 2018 at 6:05 am #

      Not sure I follow. The output of the model is a summary of the BMUs.

  8. Avatar
    Valentin June 13, 2018 at 4:50 am #

    what happens if the number of prototypes(codebooks) per class is equal to the number of data points we have ?

    • Avatar
      Jason Brownlee June 13, 2018 at 6:21 am #

      In that case only use LVQ if it outperforms kNN. It may.

  9. Avatar
    MAK June 26, 2018 at 6:40 am #

    Hii,
    Thanks for your incredible site.
    Like you say in your post KNN ” need to hang on to your entire training dataset”
    I want to use LVQ , but can I use LVQ as unsupervised algorithm (or one class classification) ?
    I have dataset with normal behavior , and I want to predict when new item is abnormal .
    Thanks,

    MAK

    • Avatar
      Jason Brownlee June 26, 2018 at 6:44 am #

      Perhaps try it.

      • Avatar
        MAK June 26, 2018 at 7:11 am #

        Hii,
        Can you give me a hint….
        If I have codebook vectors from the same class , how LVQ will work ?
        According to your explain LVQ can be only binary/multi-class .
        10x

        • Avatar
          Jason Brownlee June 26, 2018 at 2:26 pm #

          I do not have an example, sorry. Perhaps you can use a threshold of a distance measure for an example from the codebook vectors, as you would in a KNN model for one-class prediction.

  10. Avatar
    Giant July 24, 2018 at 11:14 pm #

    Thank you so much.
    I have a question:
    After the updating the codebook (i.e., x = x + learning_rate * (t – x)), some elements in the codebook
    may not belong to the training data.
    How to define the classes for those elements?

    • Avatar
      Jason Brownlee July 25, 2018 at 6:18 am #

      Classes are assigned to new instances by finding the BMU and using it’s class.

      • Avatar
        Giant July 26, 2018 at 2:53 am #

        Thank you so much for the kindly response.

        For each element in the updated codebook, if its class is determined by
        the class of its BMU, it is possible that the class of the element changes
        along the iteration, right?
        If it is right, the final codebook may be class-inbalanced.

  11. Avatar
    J July 26, 2018 at 11:26 am #

    Hi Jason, thanks a lot for your great, readable summary! I have three quick follow-up questions and would be glad about your opinion on those. Also happy about any pointers to other references if this helps answer the question.

    1. Can LVQ output a probability instead of a class? E.g. if I have two classes (coded as 0 vs. 1), could LVQ output probabilities like 0.30 or 0.60 instead of just 0s and 1s? I am asking because I tried using an LVQ implementation in R but it seems like it can just provide the class itself as output with the lvqtest function (https://www.rdocumentation.org/packages/class/versions/7.3-14/topics/lvq3). So, I am wondering if this is just because of the implementation in this R package or if LVQ cannot provide probabilities in general.

    2. Would you say that LVQ is a suitable algorithm for data sets with >= 100 predictor variables and >= 500.000 observations? I have the feeling that this is not the case as I got stuck with LVQ in R (using the before mentioned implementation) with a data set of this size. Also, it seems like LVQ somehow still has similarities to KNN which is definitely not suitable for data sets of this size.

    3. You said that the input values should be normalised to a range between 0 and 1. Is it enough to just normalise the values to get a mean of 0 and a sd of 1? (Basically, I am asking if it is enough to stop after step 3 of the description by Stephen Joy in the first answer of this forum post: https://www.researchgate.net/post/How_do_i_normalize_data_from_0_to_1_range). If the LVQ implementation that I am using does not specify that values in the range between 0 and 1 are required, this should be okay right?

    Thanks a lot in advance

    • Avatar
      Jason Brownlee July 26, 2018 at 2:24 pm #

      Not directly or natively, you may be able to use methods from SVM to convert the output into a probability.

      LVQ like kNN will suffer from the curse of dimensionality. Lots of features may be problematic.

      Standardizing is fine.

  12. Avatar
    Bruno Barre April 18, 2019 at 2:13 am #

    Hi!

    First thanks a lot for your nice article and very useful site in general 🙂 !

    I’m implementing SOM (for clustering/visualisation), and LVQ (sort of SOM classifier) in Python.
    And I wonder if there exists any method to select/choose the number of neurons at the beginning.

    For example, if we take the Iris datset, we have three classes of different flowers.
    So, if I’m understanding well (I’ve read lots of blogs and article, including the many of Kohonen), for a LVQ model, we choose a number of neurons, let’s say 10, and give them a class. And the LVQ model is a one dimensional (the grid of neurons is flat, unlike in SOM that could be in 2-d, 3-d…).

    So for example, we give the first three neurons the class setosa, then the versicolor to the let’s say 4th and 5th neurons, and finally the rest is virginica.

    Then we initialise the vectors of the neurons, by randomness, or by taking let’s say a gaussian with mean and standard-deviation of the class the neurons was attributed to.

    Then we train the model.

    AM I kind of correct? Because it seems weird to me that we need to attribute the class to the neurons BEFORE the training..

    Some articles say that clustering models, like k-means, dbscan or even SOM, could be applied before, but I didn’t really understant how?

    One person on a forum even say, that for classify with a SOM model, we could create as many SOM as we have classes, so with Iris dateset, three SOM. Then train them separately, one only on only one class. SO there would be one SOM for setose, one for virginica…
    Then to classify a new instance, we look at the BMU of all the SOM, and predict the class of the BMU. So it’s kind of a Best Matching Map and no longer a Best Matching Unit… I tried it, results were similar to random forest with Iris, but with bigger and more complicated dataset, it’s clearly not…

    I will put my “work”, implentation of SOM and LVQ in github, because I didn’t find any really good implementation yet… (but for weirdly for neural gas and growing neural gas, there is more stuff..)

    But I would have like to get some of your useful advice before if you have time 😉

    Thanks again!

  13. Avatar
    Sudarsan August 22, 2019 at 2:02 am #

    Isnt LVQ a self organizing map! Here a data point is choose random from training pts for each class and that pt is moved to center of its class. Is my observation right?

    • Avatar
      Jason Brownlee August 22, 2019 at 6:30 am #

      No, the nodes are not connected as they are in a SOM.

  14. Avatar
    Dinithi September 4, 2020 at 2:52 am #

    Hi I have a question. In the sub topic “Making Predictions with an LVQ Model” what is meant by K? And why do we usually take K as 1? Response is highly appreciated!

    • Avatar
      Jason Brownlee September 4, 2020 at 6:33 am #

      K is the number of best matching units for a new example.

  15. Avatar
    Ranji Raj January 22, 2021 at 12:21 am #

    Can LVQ be extended for clustering approaches as well?

    • Avatar
      Jason Brownlee January 22, 2021 at 7:22 am #

      You mean kohonen map/SOM? It’s a different but related method.

    • Avatar
      Supriya June 12, 2022 at 7:17 pm #

      How target values are used while training LVQ?
      How code vectors are obtained?

  16. Avatar
    bekalu October 28, 2021 at 11:34 pm #

    what is the difference between LVQ and VQ which one of them is unsupervised and which one is supervised ?

    • Adrian Tam
      Adrian Tam October 29, 2021 at 3:00 am #

      I think it is the best to consider VQ as the result of KNN while LVQ is how you get the VQ.

  17. Avatar
    MiGoreng March 19, 2022 at 3:57 pm #

    Hello.

    What would be the best way to remove weight vectors that no longer win battles?

  18. Avatar
    Aaron August 14, 2022 at 2:44 am #

    Is there a general rule for the starting point of selecting the number of codebooks? E.g. Based on the dimensionality of the data?

    And is there a generalization for the trade off between tuning the number of codebooks and the learning rate?

  19. Avatar
    Aaron August 14, 2022 at 2:46 am #

    Specifically for binary classification^^

Leave a Reply