Gentle Introduction to Vector Norms in Machine Learning

Calculating the length or magnitude of vectors is often required either directly as a regularization method in machine learning, or as part of broader vector or matrix operations.

In this tutorial, you will discover the different ways to calculate vector lengths or magnitudes, called the vector norm.

After completing this tutorial, you will know:

  • The L1 norm that is calculated as the sum of the absolute values of the vector.
  • The L2 norm that is calculated as the square root of the sum of the squared vector values.
  • The max norm that is calculated as the maximum absolute value of the vector.

Let’s get started.

  • Update Mar/2018: Fixed typo in max norm equation.
  • Update Sept/2018: Fixed typo related to the size of the vectors defined.
Photo by Cosimo, some rights reserved.

Tutorial Overview

This tutorial is divided into 4 parts; they are:

  1. Vector Norm
  2. Vector L1 Norm
  3. Vector L2 Norm
  4. Vector Max Norm


Vector Norm

Calculating the size or length of a vector is often required either directly or as part of a broader vector or vector-matrix operation.

The length of the vector is referred to as the vector norm or the vector’s magnitude.

The length of a vector is a nonnegative number that describes the extent of the vector in space, and is sometimes referred to as the vector’s magnitude or the norm.

— Page 112, No Bullshit Guide To Linear Algebra, 2017

The length of the vector is always a positive number, except for the vector of all zero values, whose length is zero. It is calculated using some measure that summarizes the distance of the vector from the origin of the vector space. For example, the origin of a vector space for a vector with 3 elements is (0, 0, 0).

Notation is used to represent the vector norm in broader calculations, and each type of vector norm calculation almost always has its own unique notation.

We will take a look at a few common vector norm calculations used in machine learning.

Vector L1 Norm

The length of a vector can be calculated using the L1 norm, where the 1 is a superscript of the L, e.g. L^1.

The notation for the L1 norm of a vector is ||v||1, where 1 is a subscript.

The L1 norm is calculated as the sum of the absolute vector values, where the absolute value of a scalar uses the notation |a1|. In effect, the norm is a calculation of the Manhattan distance from the origin of the vector space; for this reason, it is sometimes called the taxicab norm or the Manhattan norm.

The L1 norm of a vector can be calculated in NumPy using the norm() function with a parameter to specify the norm order, in this case 1.

First, a 1×3 vector is defined, then the L1 norm of the vector is calculated.
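A minimal sketch of this calculation, using [1, 2, 3] as an arbitrary example vector:

```python
# calculate the L1 norm of a vector with NumPy
from numpy import array
from numpy.linalg import norm

# define a 1x3 vector
a = array([1, 2, 3])
print(a)
# calculate the L1 norm by setting the order parameter to 1
l1 = norm(a, 1)
print(l1)
# -> 6.0, i.e. |1| + |2| + |3|
```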

Running the example first prints the defined vector and then the vector’s L1 norm.

The L1 norm is often used when fitting machine learning algorithms as a regularization method, e.g. a method to keep the coefficients of the model small, and in turn, the model less complex.

Vector L2 Norm

The length of a vector can be calculated using the L2 norm, where the 2 is a superscript of the L, e.g. L^2.

The notation for the L2 norm of a vector is ||v||2 where 2 is a subscript.

The L2 norm calculates the distance of the vector coordinates from the origin of the vector space. As such, it is also known as the Euclidean norm, as it is calculated as the Euclidean distance from the origin. The result is a positive distance value.

The L2 norm is calculated as the square root of the sum of the squared vector values.

The L2 norm of a vector can be calculated in NumPy using the norm() function with default parameters.

First, a 1×3 vector is defined, then the L2 norm of the vector is calculated.
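A minimal sketch, again using [1, 2, 3] as an arbitrary example vector; the norm() function defaults to the L2 norm when no order is given:

```python
# calculate the L2 norm of a vector with NumPy
from numpy import array
from numpy.linalg import norm

# define a 1x3 vector
a = array([1, 2, 3])
print(a)
# calculate the L2 norm (the default order)
l2 = norm(a)
print(l2)
# -> sqrt(1^2 + 2^2 + 3^2) = sqrt(14), approximately 3.742
```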

Running the example first prints the defined vector and then the vector’s L2 norm.

Like the L1 norm, the L2 norm is often used when fitting machine learning algorithms as a regularization method, e.g. a method to keep the coefficients of the model small and, in turn, the model less complex.

By far, the L2 norm is more commonly used than other vector norms in machine learning.

Vector Max Norm

The length of a vector can be calculated using the maximum norm, also called max norm.

The max norm of a vector is referred to as L^inf, where inf is a superscript and can be represented with the infinity symbol. The notation for the max norm is ||v||inf, where inf is a subscript.

The max norm is calculated as the maximum absolute value of the vector's elements, hence the name.

The max norm of a vector can be calculated in NumPy using the norm() function with the order parameter set to inf.

First, a 1×3 vector is defined, then the max norm of the vector is calculated.
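A minimal sketch using [1, 2, 3] as an arbitrary example vector; passing inf as the order parameter selects the max norm:

```python
# calculate the max norm of a vector with NumPy
from numpy import inf, array
from numpy.linalg import norm

# define a 1x3 vector
a = array([1, 2, 3])
print(a)
# calculate the max norm by setting the order parameter to inf
maxnorm = norm(a, inf)
print(maxnorm)
# -> 3.0, the largest absolute value in the vector
```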

Running the example first prints the defined vector and then the vector’s max norm.

Max norm is also used as a regularization in machine learning, such as on neural network weights, called max norm regularization.

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

  • Create 5 examples using each norm with your own data.
  • Implement each vector norm manually for vectors defined as lists of numbers.
  • Search machine learning papers and find 1 example of each norm being used.

If you explore any of these extensions, I’d love to know.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.


Summary

In this tutorial, you discovered the different ways to calculate vector lengths or magnitudes, called the vector norm.

Specifically, you learned:

  • The L1 norm that is calculated as the sum of the absolute values of the vector.
  • The L2 norm that is calculated as the square root of the sum of the squared vector values.
  • The max norm that is calculated as the maximum absolute value of the vector.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.




15 Responses to Gentle Introduction to Vector Norms in Machine Learning

  1. Hari February 13, 2018 at 11:27 pm #

    Hi Jason,

    I have a question, why are they L1 and L2. Are there any more norms like L3,L4 etc..?

    If so why are we only using L1/L2 norm in machine learning?

    Is this any way related to why we use squares of errors instead of taking absolute value of errors to minimize while optimizing?

    • Jason Brownlee February 14, 2018 at 8:22 am #

      I don’t know about the reasons for the names off the top of my head, sorry.

      Yes, there are nice mathematical properties for mse.

  2. Russell Bigley February 16, 2018 at 3:49 am #

    just a couple of suggestions for clarity.

    While writing about the L1 norm, this line doesn’t seem necessary
    “The L2 norm of a vector can be calculated in NumPy using the norm() function with a parameter to specify the norm order, in this case 1.”

    Also, even though, not something I would do while programming in the real world, the ‘l” in l1, l2, might be better represented with capital letters L1, L2 for the python programming examples.

  3. Russell Bigley February 16, 2018 at 8:56 am #

    The calculation for max norm isn’t explained.

    Is it taking the vector points [1, 0, 0], [0, 2, 0], and [0, 0, 3] and finding the largest vector of the sparse vectors?

  4. Jeza May 10, 2018 at 9:01 pm #

    Thanks for your explanation,
    My question is how to calculate quasi-norm such as L(0.5)

  5. udaya July 17, 2018 at 7:20 pm #

    Different ways of finding the vector norm – the length or magnitude of the vector – are L1, L2 and L inf. Shouldn’t the norm of the same vector be the same?

    • Jason Brownlee July 18, 2018 at 6:32 am #

      No, there are many ways of calculating the length.

      • udaya July 19, 2018 at 7:34 pm #

        So how can we find the components of a vector from its magnitude and direction? Normally we use euclidean function in that case. I am confused.

  6. udaya July 24, 2018 at 10:36 pm #

    My confusion is cleared now. Thank you

  7. Saurabh Sharma August 10, 2018 at 12:37 am #

    Just wondering! why do we need to convert vectors to unit norm in ML? what is the reason behind this? Also, I was looking at an example of preprocessing in stock movement data-set and the author used preprocessing.normalizer(norm=’l2′). Any particular reason behind this? Does it have anything to do with the sparsity of the data? Sorry for too many questions.

    • Jason Brownlee August 10, 2018 at 6:19 am #

      We do this to keep the values in the vector small when learning (optimizing) a machine learning model, which in turn reduces the complexity of the model and results in a better model (better generalization).

  8. tim September 8, 2018 at 1:55 am #

    The text says ‘a 3×3 vector is defined’ but your code is defining a 1×3 vector: [1,2,3]. Can you correct your text?
