What Is Argmax in Machine Learning?

By Jason Brownlee on August 19, 2020 in Linear Algebra

Argmax is a mathematical function that you may encounter in applied machine learning.

For example, you may see “argmax” or “arg max” used in a research paper to describe an algorithm. You may also be instructed to use the argmax function in your algorithm implementation.

This may be the first time that you encounter the argmax function and you may wonder what it is and how it works.

In this tutorial, you will discover the argmax function and how it is used in machine learning.

After completing this tutorial, you will know:

Argmax is an operation that finds the argument that gives the maximum value from a target function.
Argmax is most commonly used in machine learning for finding the class with the largest predicted probability.
Argmax can be implemented manually, although the argmax() NumPy function is preferred in practice.

Let’s get started.

What Is argmax in Machine Learning?
Photo by Bernard Spragg. NZ, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

What Is Argmax?
How Is Argmax Used in Machine Learning?
How to Implement Argmax in Python

What Is Argmax?

Argmax is a mathematical function.

It is typically applied to another function that takes an argument. For example, given a function g() that takes the argument x, the argmax operation of that function would be described as follows:

result = argmax(g(x))

The argmax function returns the argument or arguments (arg) at which the target function returns its maximum (max) value.
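
One common way to write this formally is sketched below. It assumes the inputs are restricted to a domain D (a symbol introduced here for illustration, not used elsewhere in this tutorial) and defines the arg max as the set of inputs at which g reaches its largest value:

```latex
\operatorname*{arg\,max}_{x \in D} g(x) = \{\, x \in D : g(x) \ge g(x') \ \text{for all } x' \in D \,\}
```

When only one input reaches the maximum, the set has a single element and the arg max is usually written as that element directly.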

Consider the example where g(x) is calculated as the square of the x value and the domain or extent of input values (x) is limited to integers from 1 to 5:

g(1) = 1^2 = 1
g(2) = 2^2 = 4
g(3) = 3^2 = 9
g(4) = 4^2 = 16
g(5) = 5^2 = 25

We can intuitively see that the argmax for the function g(x) is 5.

That is, the argument (x) to the target function g() that results in the largest value from the target function (25) is 5. Argmax provides a shorthand for specifying this argument in an abstract way without knowing what the value might be in a specific case.

argmax(g(x)) = 5

Note that this is not the max() of the values returned from the function. This would be 25.

It is also not the max() of the arguments, although in this case the argmax and the max of the arguments are the same, e.g. 5. The argmax() is 5 because g returns the largest value (25) when 5 is provided, not because 5 is the largest argument.
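
The difference between the argmax and the max can be checked with a few lines of Python. This is a minimal sketch that uses the built-in max() function with a key; the names domain, best_x, and best_value are chosen here for illustration.

```python
# target function from the example above: the square of x
def g(x):
    return x ** 2

# domain of input values: integers from 1 to 5
domain = range(1, 6)

# argmax: the argument that produces the largest g(x)
best_x = max(domain, key=g)
# max: the largest value of g(x) itself
best_value = max(g(x) for x in domain)

print('arg max g(x) = %d' % best_x)
print('max g(x) = %d' % best_value)
```

Running the sketch reports 5 for the arg max and 25 for the max.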

Typically, “argmax” is written as two separate words, e.g. “arg max”. For example:

result = arg max(g(x))

It is also common to use the arg max function as an operation without brackets surrounding the target function. This is often how you will see the operation written and used in a research paper or textbook. For example:

result = arg max g(x)

You can also use a similar operation to find the arguments to the target function that result in the minimum value from the target function, called argmin or “arg min.”
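
As a small sketch of argmin, the snippet below uses the built-in min() function with a key. The function h() is hypothetical and chosen only so that the argmin (3) differs from the minimum value (2).

```python
# hypothetical target function, chosen only for illustration
def h(x):
    return (x - 3) ** 2 + 2

# domain of input values: integers from 1 to 5
domain = range(1, 6)

# argmin: the argument that produces the smallest h(x)
best_x = min(domain, key=h)
# min: the smallest value of h(x) itself
best_value = min(h(x) for x in domain)

print('arg min h(x) = %d' % best_x)
print('min h(x) = %d' % best_value)
```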

How Is Argmax Used in Machine Learning?

The argmax function is used throughout the fields of mathematics and machine learning.

Nevertheless, there are specific situations where you will see argmax used in applied machine learning and may need to implement it yourself.

The most common situation for using argmax that you will encounter in applied machine learning is finding the index of the largest value in an array.

Recall that an array is a list or vector of numbers.

It is common for multi-class classification models to predict a vector of probabilities (or probability-like values), with one probability for each class label. The probabilities represent the likelihood that a sample belongs to each of the class labels.

The predicted probabilities are ordered such that the predicted probability at index 0 belongs to the first class, the predicted probability at index 1 belongs to the second class, and so on.

Often, a single class label prediction is required from a set of predicted probabilities for a multi-class classification problem.

This conversion from a vector of predicted probabilities to a class label is most often described using the argmax operation and most often implemented using the argmax function.

Let’s make this concrete with an example.

Consider a multi-class classification problem with three classes: “red,” “blue,” and “green.” The class labels are mapped to integer values for modeling, as follows:

red = 0
blue = 1
green = 2

Each class label integer value maps to an index of a 3-element vector that may be predicted by a model specifying the likelihood that an example belongs to each class.

Consider that a model has made one prediction for an input sample and predicted the following vector of probabilities:

yhat = [0.4, 0.5, 0.1]

We can see that the example has a 40 percent probability of belonging to red, a 50 percent probability of belonging to blue, and a 10 percent probability of belonging to green.

We can apply the argmax function to the vector of probabilities. Here the vector plays the role of the function: the input is a vector element index (array index) and the output is the probability stored at that index.

arg max yhat

We can intuitively see that in this case, the argmax of the vector of predicted probabilities (yhat) is 1, as the probability at array index 1 is the largest value.

Note that this is not the max() of the probabilities, which would be 0.5. Also note that this is not the max of the arguments, which would be 2. Instead it is the argument that results in the maximum value, e.g. 1 that results in 0.5.

arg max yhat = 1

We can then map this integer value back to a class label, which would be “blue.”

arg max yhat = “blue”
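
The mapping from probabilities to a class label can be sketched in a few lines. This assumes the labels are held in a list ordered to match the class integers above; the names labels and index are introduced here for illustration.

```python
# class labels, ordered so that each list index matches the class integer
labels = ['red', 'blue', 'green']
# predicted probabilities for one sample
yhat = [0.4, 0.5, 0.1]
# argmax: the index that holds the largest probability
index = max(range(len(yhat)), key=lambda i: yhat[i])
# map the index back to a class label
print('arg max of %s: %d (%s)' % (yhat, index, labels[index]))
```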

How to Implement Argmax in Python

The argmax function can be implemented in Python for a given vector of numbers.

Argmax from Scratch

First, we can define a function called argmax() that enumerates a provided vector and returns the index of the largest value.

The complete example is listed below.

# argmax function
def argmax(vector):
	index, value = 0, vector[0]
	for i,v in enumerate(vector):
		if v > value:
			index, value = i,v
	return index

# define vector
vector = [0.4, 0.5, 0.1]
# get argmax
result = argmax(vector)
print('arg max of %s: %d' % (vector, result))

Running the example prints the argmax of our test data used in the previous section, which in this case is an index of 1.

arg max of [0.4, 0.5, 0.1]: 1

Argmax with NumPy

Thankfully, there is a built-in version of the argmax() function provided with the NumPy library.

This is the version that you should use in practice.

The example below demonstrates the argmax() NumPy function on the same vector of probabilities.

# numpy implementation of argmax
from numpy import argmax
# define vector
vector = [0.4, 0.5, 0.1]
# get argmax
result = argmax(vector)
print('arg max of %s: %d' % (vector, result))

Running the example prints an index of 1, as is expected.

arg max of [0.4, 0.5, 0.1]: 1
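
One detail worth knowing is what happens when the largest value appears more than once. The NumPy argmax() function returns the index of the first occurrence. The sketch below shows this on a hypothetical vector with a tie, and also shows one way to collect every index that holds the maximum if you need them all.

```python
# behavior of argmax when the maximum value is tied
from numpy import argmax

# hypothetical vector in which the largest value (0.4) appears twice
vector = [0.4, 0.2, 0.4]
# numpy returns the index of the first occurrence of the maximum
first = argmax(vector)
print('first arg max of %s: %d' % (vector, first))
# collect all indices that hold the maximum value
largest = max(vector)
all_indices = [i for i, v in enumerate(vector) if v == largest]
print('all arg max of %s: %s' % (vector, all_indices))
```

The first index reported is 0, and the full list of tied indices is [0, 2].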

It is more likely that you will have a collection of predicted probabilities for multiple samples.

This would be stored as a matrix with one row of predicted probabilities per sample and one column per class label. The desired result of an argmax on this matrix would be a vector with one index (or class label integer) for each row of predictions.

This can be achieved with the argmax() NumPy function by setting the “axis” argument. By default, the argmax would be calculated for the entire matrix, returning a single number. Instead, we can set the axis value to 1 and calculate the argmax across the columns for each row of data.

The example below demonstrates this with a matrix of four rows of predicted probabilities for the three class labels.

# numpy implementation of argmax
from numpy import argmax
from numpy import asarray
# define matrix of predicted probabilities (rows are samples, columns are classes)
probs = asarray([[0.4, 0.5, 0.1], [0.0, 0.0, 1.0], [0.9, 0.0, 0.1], [0.3, 0.3, 0.4]])
print(probs.shape)
# get argmax
result = argmax(probs, axis=1)
print(result)

Running the example first prints the shape of the matrix of predicted probabilities, confirming we have four rows with three columns per row.

The argmax of the matrix is then calculated and printed as a vector, showing four values. This is what we expect, where each row results in a single argmax value or index with the largest probability.

(4, 3)
[1 2 0 2]
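
To see the effect of the “axis” argument more directly, the sketch below compares the default behavior with axis=1 and axis=0 on the same matrix. It also uses the NumPy unravel_index() function to convert the flat index returned by default into a row and column position. The variable names are chosen here for illustration.

```python
# compare the axis options of the numpy argmax function
from numpy import argmax
from numpy import asarray
from numpy import unravel_index

# same matrix of predicted probabilities as above
probs = asarray([[0.4, 0.5, 0.1], [0.0, 0.0, 1.0], [0.9, 0.0, 0.1], [0.3, 0.3, 0.4]])
# default (axis=None): a single index into the flattened matrix
flat = argmax(probs)
print(flat)
# convert the flat index into a (row, column) position
row, col = unravel_index(flat, probs.shape)
print(row, col)
# axis=1: one class index for each row (sample)
per_row = argmax(probs, axis=1)
print(per_row)
# axis=0: one row index for each column (class)
per_col = argmax(probs, axis=0)
print(per_col)
```

The default call returns the flat index 5, which corresponds to row 1, column 2 (the value 1.0). Setting axis=0 instead reports which sample has the largest probability for each class, which is rarely what you want when converting predictions to class labels.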

Summary

In this tutorial, you discovered the argmax function and how it is used in machine learning.

Specifically, you learned:

Argmax is an operation that finds the argument that gives the maximum value from a target function.
Argmax is most commonly used in machine learning for finding the class with the largest predicted probability.
Argmax can be implemented manually, although the argmax() NumPy function is preferred in practice.


