Calculating the length or magnitude of vectors is often required either directly as a regularization method in machine learning, or as part of broader vector or matrix operations.

In this tutorial, you will discover the different ways to calculate vector lengths or magnitudes, called the vector norm.

After completing this tutorial, you will know:

- The L1 norm that is calculated as the sum of the absolute values of the vector.
- The L2 norm that is calculated as the square root of the sum of the squared vector values.
- The max norm that is calculated as the maximum absolute value in the vector.

Let’s get started.

- **Update Mar/2018**: Fixed typo in max norm equation.
- **Update Sept/2018**: Fixed typo related to the size of the vectors defined.

## Tutorial Overview

This tutorial is divided into 4 parts; they are:

- Vector Norm
- Vector L1 Norm
- Vector L2 Norm
- Vector Max Norm

## Vector Norm

Calculating the size or length of a vector is often required either directly or as part of a broader vector or vector-matrix operation.

The length of the vector is referred to as the vector norm or the vector’s magnitude.

> The length of a vector is a nonnegative number that describes the extent of the vector in space, and is sometimes referred to as the vector’s magnitude or the norm.
>
> Source: Page 112, No Bullshit Guide To Linear Algebra, 2017

The length of the vector is always a positive number, except for a vector of all zero values. It is calculated using some measure that summarizes the distance of the vector from the origin of the vector space. For example, the origin of a vector space for a vector with 3 elements is (0, 0, 0).

Notations are used to represent the vector norm in broader calculations and the type of vector norm calculation almost always has its own unique notation.

We will take a look at a few common vector norm calculations used in machine learning.

## Vector L1 Norm

The length of a vector can be calculated using the L1 norm, where the 1 is a superscript of the L, e.g. L^1.

The notation for the L1 norm of a vector is ||v||_{1}, where 1 is a subscript. This length is sometimes called the taxicab norm or the Manhattan norm.

$$\ell_1(v) = \|v\|_1$$

The L1 norm is calculated as the sum of the absolute vector values, where the absolute value of a scalar uses the notation |a1|. In effect, the norm is a calculation of the Manhattan distance from the origin of the vector space.

$$\|v\|_1 = |a_1| + |a_2| + |a_3|$$

The L1 norm of a vector can be calculated in NumPy using the norm() function with a parameter to specify the norm order, in this case 1.

```python
# l1 norm of a vector
from numpy import array
from numpy.linalg import norm
a = array([1, 2, 3])
print(a)
l1 = norm(a, 1)
print(l1)
```

First, a 1×3 vector is defined, then the L1 norm of the vector is calculated.

Running the example first prints the defined vector and then the vector’s L1 norm.

```
[1 2 3]
6.0
```
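As a cross-check, the sum-of-absolute-values definition can be computed by hand and compared with `norm()`. This is a minimal sketch; the vector with a negative element is an illustrative assumption, chosen to show that the absolute value matters.

```python
# l1 norm computed by hand and compared with numpy
from numpy import array, absolute
from numpy.linalg import norm

# illustrative vector with a negative element (not from the tutorial)
a = array([1, -2, 3])

# sum of absolute values: |1| + |-2| + |3|
l1_manual = absolute(a).sum()

# same quantity via norm() with order 1
l1_numpy = norm(a, 1)

print(l1_manual, l1_numpy)
```

Both values should agree, although `norm()` returns a float while the manual sum keeps the integer type of the array.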

The L1 norm is often used when fitting machine learning algorithms as a regularization method, e.g. a method to keep the coefficients of the model small, and in turn, the model less complex.

## Vector L2 Norm

The length of a vector can be calculated using the L2 norm, where the 2 is a superscript of the L, e.g. L^2.

The notation for the L2 norm of a vector is ||v||_{2} where 2 is a subscript.

$$\ell_2(v) = \|v\|_2$$

The L2 norm calculates the distance of the vector coordinate from the origin of the vector space. As such, it is also known as the Euclidean norm as it is calculated as the Euclidean distance from the origin. The result is a nonnegative distance value, and it is zero only for the all-zero vector.

The L2 norm is calculated as the square root of the sum of the squared vector values.

$$\|v\|_2 = \sqrt{a_1^2 + a_2^2 + a_3^2}$$

The L2 norm of a vector can be calculated in NumPy using the norm() function with default parameters.

```python
# l2 norm of a vector
from numpy import array
from numpy.linalg import norm
a = array([1, 2, 3])
print(a)
l2 = norm(a)
print(l2)
```

First, a 1×3 vector is defined, then the L2 norm of the vector is calculated.

Running the example first prints the defined vector and then the vector’s L2 norm.

```
[1 2 3]
3.74165738677
```
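The square-root-of-sum-of-squares definition can also be written out directly. The sketch below uses the same vector as the example above and compares the manual result with `norm()`; the variable names are only for illustration.

```python
# l2 norm computed by hand and compared with numpy
from numpy import array, sqrt
from numpy.linalg import norm

a = array([1, 2, 3])

# square root of the sum of squares: sqrt(1 + 4 + 9)
l2_manual = sqrt((a ** 2).sum())

# same quantity via norm() with default parameters
l2_numpy = norm(a)

print(l2_manual, l2_numpy)
```

The number of digits printed may differ between Python and NumPy versions, but the two values should match to floating point precision.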

Like the L1 norm, the L2 norm is often used when fitting machine learning algorithms as a regularization method, e.g. a method to keep the coefficients of the model small and, in turn, the model less complex.

By far, the L2 norm is more commonly used than other vector norms in machine learning.

## Vector Max Norm

The length of a vector can be calculated using the maximum norm, also called max norm.

Max norm of a vector is referred to as L^inf where inf is a superscript and can be represented with the infinity symbol. The notation for max norm is ||v||_{inf}, where inf is a subscript.

$$\text{maxnorm}(v) = \|v\|_\infty$$

The max norm is calculated as the maximum absolute value in the vector, hence the name.

$$\|v\|_\infty = \max(|a_1|, |a_2|, |a_3|)$$

The max norm of a vector can be calculated in NumPy using the norm() function with the order parameter set to inf.

```python
# max norm of a vector
from numpy import inf
from numpy import array
from numpy.linalg import norm
a = array([1, 2, 3])
print(a)
maxnorm = norm(a, inf)
print(maxnorm)
```

First, a 1×3 vector is defined, then the max norm of the vector is calculated.

Running the example first prints the defined vector and then the vector’s max norm.

```
[1 2 3]
3.0
```
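Because the max norm takes the largest absolute value, a negative element can determine the result. The sketch below assumes an illustrative vector (not from the tutorial) whose largest-magnitude element is negative, and compares a manual calculation with `norm()`.

```python
# max norm computed by hand and compared with numpy
from numpy import array, absolute, inf
from numpy.linalg import norm

# illustrative vector: the largest-magnitude element is negative
a = array([1, -5, 3])

# largest absolute value: max(|1|, |-5|, |3|)
max_manual = absolute(a).max()

# same quantity via norm() with the order set to inf
max_numpy = norm(a, inf)

print(max_manual, max_numpy)
```

Here the plain maximum of the vector would be 3, while the max norm is 5.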

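A constraint called max norm regularization is also used in machine learning, such as on neural network weights. Note that, as commonly implemented, it caps the L2 norm of each unit’s incoming weight vector at a maximum value, rather than penalizing the L^inf norm described here.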

## Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

- Create 5 examples of each norm using your own data.
- Implement each norm manually for vectors defined as lists.
- Search machine learning papers and find 1 example of each norm being used.

If you explore any of these extensions, I’d love to know.

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

### Books

- Introduction to Linear Algebra, 2016.
- Chapter 2, Linear Algebra, Deep Learning, 2016.


## Summary

In this tutorial, you discovered the different ways to calculate vector lengths or magnitudes, called the vector norm.

Specifically, you learned:

- The L1 norm that is calculated as the sum of the absolute values of the vector.
- The L2 norm that is calculated as the square root of the sum of the squared vector values.
- The max norm that is calculated as the maximum absolute value in the vector.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

Hi Jason,

I have a question: why are they called L1 and L2? Are there any more norms, like L3, L4, etc.?

If so why are we only using L1/L2 norm in machine learning?

Is this any way related to why we use squares of errors instead of taking absolute value of errors to minimize while optimizing?

I don’t know about the reasons for the names off the top of my head, sorry.

Yes, there are nice mathematical properties for mse.

Hi Hari,

The 0, 1 and 2 norms are just the most commonly used cases, but there are infinitely many.

Formally, the l_p norm is defined as $\|x\|_p = \sqrt[p]{\sum_{i} |x_i|^p}$, where $p \in \mathbb{R}$ and $p \ge 1$.

The L2 norm is so named because you sum the squares of the elements in your vector/matrix/tensor and then take the square root. L3 sums the cubes of the absolute values and takes the cube root, and so on. L1 is the sum of the absolute values of the individual elements. They are all instances of the L_p norm (computed by summing the absolute values of the elements, each raised to the p-th power, and then taking the p-th root), as Daniel mentioned.
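The general L_p definition can be sketched as a small function and compared with NumPy. The helper name `p_norm` is hypothetical and only for illustration; it assumes a 1-D array and p >= 1.

```python
# general p-norm of a vector, compared with numpy
from numpy import array, absolute
from numpy.linalg import norm

def p_norm(x, p):
    """Return (sum_i |x_i|^p)^(1/p) for a 1-D array x (hypothetical helper)."""
    return (absolute(x) ** p).sum() ** (1.0 / p)

a = array([1, 2, 3])

# p = 1 and p = 2 recover the L1 and L2 norms; p = 3 gives the L3 norm
for p in (1, 2, 3):
    print(p, p_norm(a, p), norm(a, p))
```

As p grows, the result approaches the max norm, since the largest element dominates the sum.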

I think this can be more detailed like providing actual formula.

like how

L1 is actually the summation (|x1|^p + |x2|^p + |x3|^p + … + |xn|^p)^(1/p) when p = 1.

just a couple of suggestions for clarity.

While writing about the L1 norm, this line doesn’t seem necessary

“The L2 norm of a vector can be calculated in NumPy using the norm() function with a parameter to specify the norm order, in this case 1.”

Also, even though it is not something I would do while programming in the real world, the ‘l’ in l1, l2 might be better represented with capital letters L1, L2 in the Python programming examples.

Thanks Russell!

The calculation for max norm isn’t explained.

Is it taking the vector points [1, 0, 0], [0, 2, 0], and [0, 0, 3] and finding the largest of the sparse vectors?

Thanks for your explanation,

My question is how to calculate quasi-norm such as L(0.5)
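One possible approach, offered as a sketch rather than a definitive answer: apply the same p-norm formula with p = 0.5. For p < 1 the result is only a quasi-norm, because the triangle inequality no longer holds. The vector below is an illustrative assumption, chosen so the square roots are whole numbers.

```python
# L0.5 quasi-norm of a vector using the general p-norm formula
from numpy import array, absolute
from numpy.linalg import norm

# illustrative vector with exact square roots
x = array([1.0, 4.0, 9.0])
p = 0.5

# (sum of |x_i|^0.5)^(1/0.5) = (1 + 2 + 3)^2
quasi = (absolute(x) ** p).sum() ** (1.0 / p)

# norm() applies the same formula for other numeric orders on 1-D input
print(quasi, norm(x, p))
```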

The different ways of finding the vector norm (the length or magnitude of the vector) are L1, L2 and L inf. Shouldn’t the norm of the same vector be the same?

No, there are many ways of calculating the length.

So how can we find the components of a vector from its magnitude and direction? Normally we use euclidean function in that case. I am confused.

My confusion is cleared up. Thank you.

Glad to hear that.

Just wondering: why do we need to convert vectors to unit norm in ML? What is the reason behind this? Also, I was looking at an example of preprocessing on a stock movement dataset and the author used preprocessing.Normalizer(norm='l2'). Is there any particular reason behind this? Does it have anything to do with the sparsity of the data? Sorry for so many questions.

We do this to keep the values in the vector small when learning (optimizing) a machine learning model, which in turn reduces the complexity of the model and results in a better model (better generalization).
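To make the idea of rescaling to unit norm concrete, the sketch below divides each row of a small matrix by its own L2 norm. The data is made up for illustration. It mimics what a row-wise L2 normalizer does in spirit and is not the scikit-learn implementation.

```python
# rescale each row (sample) of a matrix to unit L2 norm
from numpy import array
from numpy.linalg import norm

# illustrative data: three samples with two features each
X = array([[3.0, 4.0],
           [1.0, 0.0],
           [6.0, 8.0]])

# L2 norm of every row, kept as a column so the division broadcasts
row_norms = norm(X, axis=1, keepdims=True)

# each row now has length 1 but keeps its direction
X_unit = X / row_norms

print(X_unit)
print(norm(X_unit, axis=1))
```

Rows that point in the same direction, such as the first and third, become identical after rescaling, so only the direction of each sample is retained. A row of all zeros would need special handling to avoid division by zero.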

The text says ‘a 3×3 vector is defined’ but your code is defining a 1×3 vector: [1,2,3]. Can you correct your text?

Thanks, fixed!

Awesome article. Love this site.

Thanks Chris!

How can I calculate the L1 and L2 norms for 3D matrices?

e.g.:

```
input_shape = (10, 20, 3)
a = np.ones(input_shape) * 2
b = np.ones(input_shape) * 4
x = a - b
l1_norm_of_x = ????
l2_norm_of_x = ????
```

A common norm for a matrix is the Frobenius norm:

https://en.wikipedia.org/wiki/Matrix_norm#Frobenius_norm
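One way to handle the 3D case, assuming the goal is an entrywise norm that treats the whole array as one long vector, is to sum over all elements directly. This is a sketch under that assumption. It extends the Frobenius idea (square root of the sum of all squared entries) beyond two dimensions.

```python
# entrywise L1 and L2 norms of a 3D array
import numpy as np

input_shape = (10, 20, 3)
a = np.ones(input_shape) * 2
b = np.ones(input_shape) * 4
x = a - b  # every entry is -2.0

# sum of absolute values over all 600 entries
l1_norm_of_x = np.abs(x).sum()

# square root of the sum of squares over all entries
l2_norm_of_x = np.sqrt((x ** 2).sum())

# with no order and no axis, norm() flattens the array and returns the 2-norm
l2_check = np.linalg.norm(x)

print(l1_norm_of_x, l2_norm_of_x, l2_check)
```

If a separate norm per slice is wanted instead, the `axis` argument of `norm()` can be used to reduce over one or two chosen axes.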

Is there any rule of thumb to decide which distance metric to use for a problem?

Yes, I have seen some. Mostly it comes down to your preferred outcome – e.g. what you want to capture/handle/promote in the measure.

I read that the L1 norm is better than L2 at capturing small changes in a model’s coefficients, and that L2 increases very slowly near the origin. I didn’t understand why.

Perhaps ask the person that made this statement to you to see exactly what they meant?

Because for any positive x < 1, the squared term used by L2 is smaller than the absolute value used by L1: x^2 < x < 1.
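A quick numeric check makes this visible. The sketch compares the per-coefficient L1 penalty |x| with the squared penalty x^2 that L2 regularization typically uses; the sample values are arbitrary.

```python
# compare the absolute (L1-style) and squared (L2-style) penalty per coefficient
values = [2.0, 0.5, 0.1, 0.01]

abs_penalty = [abs(x) for x in values]
sq_penalty = [x ** 2 for x in values]

for x, a, s in zip(values, abs_penalty, sq_penalty):
    print(x, a, s)
```

For coefficients smaller than 1 in magnitude, the squared penalty shrinks much faster than the absolute one, so it exerts little pressure on values that are already close to zero. Above 1 the relationship reverses.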

I’ve clearly understood the norms, but I want to know how they are used behind the scenes in machine learning and neural networks. Can you please explain in depth how they are used in normalization?

Thank you in advance.

Sure, this post shows how:

https://machinelearningmastery.com/how-to-reduce-overfitting-in-deep-learning-with-weight-regularization/

Hi Jason,

I was wondering, is the L2 norm like the hypotenuse?

And are you using matlab for the operation windows you are posting in this page?

The code examples are all written in python.

my solution to the exercise above. Great article as always

Thanks for sharing!

Hello, I have a sparse matrix of size 4*9 after applying the fit and transform functions (I am a newbie in ML). Now I need to apply the L2 norm to this matrix, but when I try to use your method it doesn’t work as desired. The output (for the top row, without the L2 norm) is:

```
(0, 3) 1
(0, 6) 1
(0, 8) 1
(0, 2) 1
```

but it should be like:

```
(0, 8) 0.38408524091481483
(0, 6) 0.38408524091481483
(0, 3) 0.38408524091481483
(0, 2) 0.5802858236844359
```

What am I doing wrong here, and how should I solve this problem for my matrix?

Below is the dense matrix for reference:

```
[[0 1 1 1 0 0 1 0 1]
 [0 2 0 1 0 1 1 0 1]
 [1 0 0 1 1 0 1 1 1]
 [0 1 1 1 0 0 1 0 1]]
```

Perhaps convert it to a dense matrix first:

https://machinelearningmastery.com/sparse-matrices-for-machine-learning/

Hello sir,

I would like to know whether someone can use the vector max norm in a deep hashing loss function, as some researchers have used the L2 norm in their loss functions. Thanks.

I don’t know, sorry.

||W|| = 1.

what does it mean????

Hi Efran,

This means that the “norm” or magnitude of the vector is length 1. More examples and explanations can be found here:

http://mathonline.wikidot.com/the-norm-of-a-vector

Regards,

Do the vectors need to be unit vector to use L1/L2 norm?

If yes then why is it so?

Hi Kartik…No. These are used to determine the “length” or magnitude of vectors. Once determined, they can be used to create a unit vector:

https://www.cuemath.com/calculus/unit-vector/
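As a minimal sketch of that last step, a unit vector can be formed by dividing a vector by its own L2 norm. The example vector is an illustrative assumption.

```python
# create a unit vector by dividing a vector by its L2 norm
from numpy import array
from numpy.linalg import norm

# illustrative vector with length 5
v = array([3.0, 4.0])

# same direction as v, length 1
unit_v = v / norm(v)

print(unit_v, norm(unit_v))
```

This only works for non-zero vectors, since the all-zero vector has norm 0 and cannot be rescaled.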

Hi Jason, love your blog! I’ve begun playing around with ML in C++. Regarding L1 and L2 normalisation, are these values just scaled (alpha and beta) and applied in the gradient descent phase of the algorithm? I’ve tried code as below but it only converges on a solution when alpha and beta equal 0.0. If I do have the norms in the right place, what size do alpha and beta typically take? Cheers, Ben.

W[i][j] -= learning_rate * dW[i][j] - alpha*L1_norm - beta*L2_norm;

Hi Ben…You may find the following helpful:

https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c
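For illustration, here is one common way the penalties enter a gradient descent step, sketched in Python with NumPy rather than C++. It assumes the loss is the data loss plus `alpha * sum(|W|)` plus `0.5 * beta * sum(W**2)`, so the update uses the gradient of each penalty rather than the norm value itself. All names and numbers are hypothetical.

```python
# one gradient descent step with L1 and L2 penalty gradients added
import numpy as np

# hypothetical weights and data-loss gradient
W = np.array([[0.5, -1.0],
              [2.0, -0.25]])
dW = np.array([[0.2, -0.1],
               [0.0, 0.3]])

learning_rate = 0.1
alpha = 0.01  # L1 strength (illustrative value)
beta = 0.01   # L2 strength (illustrative value)

# gradient of alpha * sum(|W|) is alpha * sign(W)
# gradient of 0.5 * beta * sum(W**2) is beta * W
grad = dW + alpha * np.sign(W) + beta * W

# the penalty gradients are scaled by the learning rate like the data gradient
W_new = W - learning_rate * grad

print(W_new)
```

Under these assumptions both penalty terms pull each weight toward zero. Subtracting the scalar norm value from every weight, as in the line quoted above, shifts all weights by the same amount regardless of their sign, which may explain the failure to converge.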

This website is pure gold when you’re trying to learn about neural networks, thank you guys for really helping me out!

Great feedback Mathias!

Hi Jason. I understood what is L1 norm and L2 norm using this article nicely. I want to know what is L2,1 – norm ?

Hi Vaishali…The following resource may add clarity:

https://ai.stackexchange.com/questions/17304/what-is-the-ell-2-1-norm

Although this is a question that is unrelated to this article, I would appreciate it if you could answer it. when to use (;) in describing a specific probability in the context of mixture models? thank you.

Hi spike…The following resource is an outstanding reference:

https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=3475&context=dissertations_2