Learning Vector Quantization for Machine Learning

A downside of K-Nearest Neighbors is that you need to hang on to your entire training dataset.

The Learning Vector Quantization algorithm (or LVQ for short) is an artificial neural network algorithm that lets you choose how many training instances to hang onto and learns exactly what those instances should look like.

In this post you will discover the Learning Vector Quantization algorithm. After reading this post you will know:

  • The representation used by the LVQ algorithm that you actually save to a file.
  • The procedure that you can use to make predictions with a learned LVQ model.
  • How to learn an LVQ model from training data.
  • The data preparation to use to get the best performance from the LVQ algorithm.
  • Where to look for more information on LVQ.

This post was written for developers and assumes no background in statistics or mathematics. The post focuses on how the algorithm works and how to use it for predictive modeling problems.

If you have any questions on LVQ, leave a comment and I will do my best to answer.

Let’s get started.

Learning Vector Quantization for Machine Learning

Learning Vector Quantization for Machine Learning
Photo by Holly Victoria Norval, some rights reserved.

LVQ Model Representation

The representation for LVQ is a collection of codebook vectors.

LVQ was developed and is best understood as a classification algorithm. It supports both binary (two-class) and multi-class classification problems.

A codebook vector is a list of numbers that have the same input and output attributes as your training data. For example, if your problem is a binary classification with classes 0 and 1, and the inputs width, length height, then a codebook vector would be comprised of all four attributes: width, length, height and class.

The model representation is a fixed pool of codebook vectors, learned from the training data. They look like training instances, but the values of each attribute have been adapted based on the learning procedure.

In the language of neural networks, each codebook vector may be called a neuron, each attribute on a codebook vector is called a weight and the collection of codebook vectors is called a network.

Get your FREE Algorithms Mind Map

Machine Learning Algorithms Mind Map

Sample of the handy machine learning algorithms mind map.

I've created a handy mind map of 60+ algorithms organized by type.

Download it, print it and use it. 

Download For Free


Also get exclusive access to the machine learning algorithms email mini-course.

 

 

Making Predictions with an LVQ Model

Predictions are made using the LVQ codebook vectors in the same way as K-Nearest Neighbors.

Predictions are made for a new instance (x) by searching through all codebook vectors for the K most similar instances and summarizing the output variable for those K instances. For classification this is the mode (or most common) class value.

Typically predictions are made with K=1, and the codebook vector that matches is called the Best Matching Unit (BMU).

To determine which of the K instances in the training dataset are most similar to a new input a distance measure is used. For real-valued input variables, the most popular distance measure is Euclidean distance. Euclidean distance is calculated as the square root of the sum of the squared differences between a new point (x) and an existing point (xi) for each attribute j.

EuclideanDistance(x, xi) = sqrt( sum( (xj – xij)^2 ) )

Learning an LVQ Model From Data

The LVQ algorithm learns the codebook vectors from the training data.

You must choose the number of codebook vectors to use, such as 20 or 40. You can find the best number of codebook vectors to use by testing different configurations on your training dataset.

The learning algorithm starts with a pool of random codebook vectors. These could be randomly selected instances from the training data, or randomly generated vectors with the same scale as the training data. Codebook vectors have the same number of input attributes as the training data. They also have an output class variable.

The instances in the training dataset are processed one at a time. For a given training instance, the most similar codebook vector is selected from the pool.

If the codebook vector has the same output as the training instance, the codebook vector is moved closer to the training instance. If it does not match, it is moved further away. The amount that the vector is moved is controlled by an algorithm parameter called the learning_rate.

For example, the input variable (x) of a codebook vector is moved closer to the training input value (t) by the amount in the learning_rate if the classes match as follows:

x = x + learning_rate * (t – x)

The opposite case of moving the input variables of a codebook variable away from a training instance is calculated as:

x = x – learning_rate * (t – x)

This would be repeated for each input variable.

Because one codebook vector is selected for modification for each training instance the algorithm is referred to as a winner-take-all algorithm or a type of competitive learning.

This process is repeated for each instance in the training dataset. One iteration of the training dataset is called an epoch. The process is completed for a number of epochs that you must choose (max_epoch), such as 200.

You must also choose an initial learning rate (such as alpha=0.3). The learning rate is decreased with the epoch, starting at the large value you specify at epoch 1 which makes the most change to the codebook vectors and finishing with a small value near zero on the last epoch, making very minor changes to the codebook vectors.

The learning rate for each epoch is calculated as:

learning_rate = alpha * (1 – (epoch/max_epoch))

Where learning_rate is the learning rate for the current epoch (0 to max_epoch-1), alpha is the learning rate specified to the algorithm at the start of the training run and max_epoch is the total number of epochs to run the algorithm also specified at the start of the run.

The intuition for the learning process is that the pool of codebook vectors is a compression of the training dataset to the points that best characterize the separation of the classes.

Data Preparation for LVQ

Generally, it is a good idea to prepare data for LVQ in the same way as you would prepare it for K-Nearest Neighbors.

  • Classification: LVQ is a classification algorithm that works for both binary (two-class) and multi-class classification algorithms. The technique has been adapted for regression.
  • Multiple-Passes: Good technique with LVQ involves performing multiple passes of the training dataset over the codebook vectors (e.g. multiple learning runs). The first with a higher learning rate to settle the pool codebook vectors and the second run with a small learning rate to fine tune the vectors.
  • Multiple Best Matches: Extensions of LVQ select multiple best matching units to modify during learning, such as one of the same class and one of a different class which are drawn toward and away from a training sample respectively. Other extensions use a custom learning rate for each codebook vector. These extensions can improve the learning process.
  • Normalize Inputs: Traditionally, inputs are normalized (rescaled) to values between 0 and 1. This is to avoid one attribute from dominating the distance measure. If the input data is normalized, then the initial values for the codebook vectors can be selected as random values between 0 and 1.
  • Feature Selection: Feature selection that can reduce the dimensionality of the input variables can improve the accuracy of the method. LVQ suffers from the same curse of dimensionality in making predictions as K-Nearest Neighbors.

Further Reading

The technique was developed by Kohonen who wrote the seminal book on LVQ and the sister method Self-Organizing Maps called: Self-Organizing Maps.

Amazon Image

I highly recommend this book if you are interested in LVQ.

Summary

In this post you discovered the LVQ algorithm. You learned:

  • The representation for LVQ is a small pool of codebook vectors, smaller than the training dataset.
  • The codebook vectors are used to make predictions using the same technique as K-Nearest Neighbors.
  • The codebook vectors are learned from the training dataset by moving them closer when they are good match and further away when they are a bad match.
  • The codebook vectors are a compression of the training data to best separate the classes.
  • Data preparation traditionally involves normalizing the input values to the range between 0 and 1.

Do you have any questions about this post or the LVQ algorithm? Leave a comment and ask your question and I will do my best to answer it.


Frustrated With Machine Learning Math?

Mater Machine Learning Algorithms

See How Algorithms Work in Minutes

…with just arithmetic and simple examples

Discover how in my new Ebook: Master Machine Learning Algorithms

It covers explanations and examples of 10 top algorithms, like:
Linear Regression, k-Nearest Neighbors, Support Vector Machines and much more…

Finally, Pull Back the Curtain on
Machine Learning Algorithms

Skip the Academics. Just Results.

Click to learn more.


30 Responses to Learning Vector Quantization for Machine Learning

  1. aloo masala October 18, 2016 at 5:29 pm #

    Hi,

    nice and readable post. If I am correct, then the update rule should use ‘learning_rate’ instead of the initial value ‘alpha’. In the current form ‘learning_rate’ is decreased but never applied. Or replace ‘learning_rate’ by ‘alpha’ and introduce something like ‘alpha0’ in the update formula of the learning rate.

    Please delete or do not publish this post.

    Best

    aloo

    • Jason Brownlee October 19, 2016 at 9:17 am #

      Alpha is a fixed parameter and learning rate is what is used to update the weights.

      • aloo masala October 19, 2016 at 7:01 pm #

        The update rule of codebook vector x uses the fixed parameter alpha rather than the adaptive learning_rate parameter.

        • Jason Brownlee November 7, 2016 at 7:28 am #

          Thanks aloo masala, I have updated the example.

  2. aloo masala November 19, 2016 at 7:52 am #

    You are welcome. Now you can delete my comments.

  3. Tom Anderson November 29, 2016 at 6:30 pm #

    Kohonen has recently self-published a book on this topic. “MATLAB Implementations and Applications of the Self-Organizing Map”

    Although its examples are using MATLAB, I think this may be a good option for self-study, since it’s free!
    http://docs.unigrafia.fi/publications/kohonen_teuvo/

  4. Declan Lange January 14, 2017 at 6:27 pm #

    I’ve got a network trained with weights being initialised (this is in Java), what’s the best way to feed in a new data set/vector and find the best matching dataset to it?

    Thanks,
    Declan

    • Jason Brownlee January 15, 2017 at 5:29 am #

      Sorry, I don’t understand your question Declan. Perhaps you could restate it?

      • Declan Lange January 15, 2017 at 2:27 pm #

        I have a network with training sets that have a vector and a name (classification name) in it, i’m trying to get a new vector, that has no name, and determine what it’s closest to.

        I’m trying to make a prediction using a new data set to predict what it’s most similar to.

  5. xenomorph February 14, 2017 at 11:28 am #

    Do people really use LVQ and locally weighted learning in practice, in 2017?

    • Jason Brownlee February 15, 2017 at 11:31 am #

      I like to throw LVQ in the mix when spot checking algorithms on a problem and go with it if it products good results or results similar to a kNN.

  6. Ben Cryer May 18, 2018 at 9:07 pm #

    Best Machine Learning Classification Algorithms using MATLAB tutorial

    https://www.simpliv.com/developmenttool/machine-learning-classification-algorithms-using-matlab

  7. Geordie June 10, 2018 at 11:37 am #

    Hi there! Let’s assume that we are encoding to a collection of K codebook vectors (for example K=512). We train the system on our data and then want to use it.

    Is it correct that the total number of possible outputs of the network is K? IE if I train the network on 100,000 128x128x3 images with K=512, after training ANY 128x128x3 image will get mapped to one of 512 outputs?

    • Jason Brownlee June 11, 2018 at 6:05 am #

      Not sure I follow. The output of the model is a summary of the BMUs.

  8. Valentin June 13, 2018 at 4:50 am #

    what happens if the number of prototypes(codebooks) per class is equal to the number of data points we have ?

    • Jason Brownlee June 13, 2018 at 6:21 am #

      In that case only use LVQ if it outperforms kNN. It may.

  9. MAK June 26, 2018 at 6:40 am #

    Hii,
    Thanks for your incredible site.
    Like you say in your post KNN ” need to hang on to your entire training dataset”
    I want to use LVQ , but can I use LVQ as unsupervised algorithm (or one class classification) ?
    I have dataset with normal behavior , and I want to predict when new item is abnormal .
    Thanks,

    MAK

    • Jason Brownlee June 26, 2018 at 6:44 am #

      Perhaps try it.

      • MAK June 26, 2018 at 7:11 am #

        Hii,
        Can you give me a hint….
        If I have codebook vectors from the same class , how LVQ will work ?
        According to your explain LVQ can be only binary/multi-class .
        10x

        • Jason Brownlee June 26, 2018 at 2:26 pm #

          I do not have an example, sorry. Perhaps you can use a threshold of a distance measure for an example from the codebook vectors, as you would in a KNN model for one-class prediction.

  10. Giant July 24, 2018 at 11:14 pm #

    Thank you so much.
    I have a question:
    After the updating the codebook (i.e., x = x + learning_rate * (t – x)), some elements in the codebook
    may not belong to the training data.
    How to define the classes for those elements?

    • Jason Brownlee July 25, 2018 at 6:18 am #

      Classes are assigned to new instances by finding the BMU and using it’s class.

      • Giant July 26, 2018 at 2:53 am #

        Thank you so much for the kindly response.

        For each element in the updated codebook, if its class is determined by
        the class of its BMU, it is possible that the class of the element changes
        along the iteration, right?
        If it is right, the final codebook may be class-inbalanced.

  11. J July 26, 2018 at 11:26 am #

    Hi Jason, thanks a lot for your great, readable summary! I have three quick follow-up questions and would be glad about your opinion on those. Also happy about any pointers to other references if this helps answer the question.

    1. Can LVQ output a probability instead of a class? E.g. if I have two classes (coded as 0 vs. 1), could LVQ output probabilities like 0.30 or 0.60 instead of just 0s and 1s? I am asking because I tried using an LVQ implementation in R but it seems like it can just provide the class itself as output with the lvqtest function (https://www.rdocumentation.org/packages/class/versions/7.3-14/topics/lvq3). So, I am wondering if this is just because of the implementation in this R package or if LVQ cannot provide probabilities in general.

    2. Would you say that LVQ is a suitable algorithm for data sets with >= 100 predictor variables and >= 500.000 observations? I have the feeling that this is not the case as I got stuck with LVQ in R (using the before mentioned implementation) with a data set of this size. Also, it seems like LVQ somehow still has similarities to KNN which is definitely not suitable for data sets of this size.

    3. You said that the input values should be normalised to a range between 0 and 1. Is it enough to just normalise the values to get a mean of 0 and a sd of 1? (Basically, I am asking if it is enough to stop after step 3 of the description by Stephen Joy in the first answer of this forum post: https://www.researchgate.net/post/How_do_i_normalize_data_from_0_to_1_range). If the LVQ implementation that I am using does not specify that values in the range between 0 and 1 are required, this should be okay right?

    Thanks a lot in advance

    • Jason Brownlee July 26, 2018 at 2:24 pm #

      Not directly or natively, you may be able to use methods from SVM to convert the output into a probability.

      LVQ like kNN will suffer from the curse of dimensionality. Lots of features may be problematic.

      Standardizing is fine.

Leave a Reply