The use of randomness is an important part of the configuration and evaluation of machine learning algorithms.
From the random initialization of weights in an artificial neural network, to the splitting of data into random train and test sets, to the random shuffling of a training dataset in stochastic gradient descent, generating random numbers and harnessing randomness is a required skill.
In this tutorial, you will discover how to generate and work with random numbers in Python.
After completing this tutorial, you will know:
- That randomness can be applied in programs via the use of pseudorandom number generators.
- How to generate random numbers and use randomness via the Python standard library.
- How to generate arrays of random numbers via the NumPy library.
Let’s get started.
Tutorial Overview
This tutorial is divided into three parts; they are:
- Pseudorandom Number Generators
- Random Numbers with the Python Standard Library
- Random Numbers with NumPy
1. Pseudorandom Number Generators
The source of randomness that we inject into our programs and algorithms is a mathematical trick called a pseudorandom number generator.
A random number generator is a system that generates random numbers from a true source of randomness, often something physical, such as a Geiger counter or electrostatic noise, whose measurements are turned into random numbers. We do not need true randomness in machine learning. Instead, we can use pseudorandomness: a sample of numbers that look close to random but were generated using a deterministic process.
Shuffling data and initializing coefficients with random values both rely on pseudorandom number generators. These little programs are often exposed as a function that returns a random number each time it is called. Wrapper functions are often also available that return the randomness as an integer, as a floating point value, drawn from a specific distribution, within a specific range, and so on.
The numbers are generated in a sequence. The sequence is deterministic and is seeded with an initial number. If you do not explicitly seed the pseudorandom number generator, then it may use the current system time in seconds or milliseconds as the seed.
The value of the seed does not matter. Choose anything you wish. What does matter is that the same seeding of the process will result in the same sequence of random numbers.
Let’s make this concrete with some examples.
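Before turning to Python's own generator, the idea of a seeded, deterministic sequence can be sketched with a toy generator. The code below is a minimal linear congruential generator, assuming commonly cited constants; the function name make_lcg and its parameters are hypothetical, and this is not the algorithm that Python or NumPy use.

```python
# Toy linear congruential generator (LCG), for illustration only.
# Assumption: the constants a, c and m are commonly cited LCG parameters;
# Python's random module uses the Mersenne Twister, not this algorithm.

def make_lcg(seed, a=1103515245, c=12345, m=2**31):
    """Return a function that yields the next pseudorandom float in [0, 1)."""
    state = seed

    def next_value():
        nonlocal state
        # the next state depends only on the previous state
        state = (a * state + c) % m
        return state / m

    return next_value

# two generators seeded identically produce identical sequences
gen_a = make_lcg(seed=42)
gen_b = make_lcg(seed=42)
seq_a = [gen_a() for _ in range(5)]
seq_b = [gen_b() for _ in range(5)]
print(seq_a == seq_b)

# a different seed starts a different sequence
gen_c = make_lcg(seed=7)
seq_c = [gen_c() for _ in range(5)]
print(seq_a == seq_c)
```

The seed only sets the starting state; every later value follows from it by the same fixed rule.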
2. Random Numbers with the Python Standard Library
The Python standard library provides a module called random that offers a suite of functions for generating random numbers.
Python uses a popular and robust pseudorandom number generator called the Mersenne Twister.
In this section, we will look at a number of use cases for generating and using random numbers and randomness with the standard Python API.
Seed The Random Number Generator
The pseudorandom number generator is a mathematical function that generates a sequence of nearly random numbers.
It takes a parameter to start off the sequence, called the seed. The function is deterministic, meaning given the same seed, it will produce the same sequence of numbers every time. The choice of seed does not matter.
The seed() function will seed the pseudorandom number generator, taking an integer value as an argument, such as 1 or 7. If the seed() function is not called prior to using randomness, Python seeds the generator automatically, using a source of randomness provided by the operating system if one is available and the current system time otherwise.
The example below demonstrates seeding the pseudorandom number generator, generates some random numbers, and shows that reseeding the generator will result in the same sequence of numbers being generated.
    # seed the pseudorandom number generator
    from random import seed
    from random import random
    # seed random number generator
    seed(1)
    # generate some random numbers
    print(random(), random(), random())
    # reset the seed
    seed(1)
    # generate some random numbers
    print(random(), random(), random())
Running the example seeds the pseudorandom number generator with the value 1, generates 3 random numbers, reseeds the generator, and shows that the same three random numbers are generated.
    0.13436424411240122 0.8474337369372327 0.763774618976614
    0.13436424411240122 0.8474337369372327 0.763774618976614
It can be useful to control the randomness by setting the seed to ensure that your code produces the same result each time, such as in a production model.
For running experiments where randomization is used to control for confounding variables, a different seed may be used for each experimental run.
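A minimal sketch of the second case, assuming a made-up experiment that simply averages a few random draws. The function run_experiment is hypothetical and stands in for whatever procedure you are evaluating.

```python
from random import seed, random

def run_experiment(seed_value, n=3):
    """Hypothetical experiment: return the mean of n random draws."""
    # seed at the start of the run so the run can be repeated exactly
    seed(seed_value)
    draws = [random() for _ in range(n)]
    return sum(draws) / n

# use a different seed for each experimental run
results = {s: run_experiment(s) for s in range(1, 6)}
for s, r in results.items():
    print(s, r)

# repeating a run with its seed reproduces the same result
print(run_experiment(1) == results[1])
```

Recording the seed alongside each result lets any single run be reproduced later.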
Random Floating Point Values
Random floating point values can be generated using the random() function. Values will be generated in the range between 0 and 1, specifically in the interval [0,1).
Values are drawn from a uniform distribution, meaning each value has an equal chance of being drawn.
The example below generates 10 random floating point values.
    # generate random floating point values
    from random import seed
    from random import random
    # seed random number generator
    seed(1)
    # generate random numbers between 0-1
    for _ in range(10):
        value = random()
        print(value)
Running the example generates and prints each random floating point value.
    0.13436424411240122
    0.8474337369372327
    0.763774618976614
    0.2550690257394217
    0.49543508709194095
    0.4494910647887381
    0.651592972722763
    0.7887233511355132
    0.0938595867742349
    0.02834747652200631
The floating point values could be rescaled to a desired range by multiplying them by the size of the new range and adding the min value, as follows:
    scaled value = min + (value * (max - min))
Where min and max are the minimum and maximum values of the desired range respectively, and value is the randomly generated floating point value in the range between 0 and 1.
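The rescaling formula can be sketched as a small helper. The function name scale and the range 1.0 to 3.0 are assumptions chosen for illustration; the standard library's uniform() function draws from a range directly and is shown for comparison.

```python
from random import seed, random, uniform

def scale(value, min_value, max_value):
    """Rescale a value in [0, 1) to the range from min_value to max_value."""
    return min_value + (value * (max_value - min_value))

seed(1)
# five random floats rescaled to the range 1.0 to 3.0
scaled = [scale(random(), 1.0, 3.0) for _ in range(5)]
print(scaled)

# uniform() from the standard library draws from a range in one call
direct = uniform(1.0, 3.0)
print(direct)
```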
Random Integer Values
Random integer values can be generated with the randint() function.
This function takes two arguments: the start and the end of the range for the generated integer values. Random integers are generated within and including the start and end of range values, specifically in the interval [start, end]. Random values are drawn from a uniform distribution.
The example below generates 10 random integer values between 0 and 10.
    # generate random integer values
    from random import seed
    from random import randint
    # seed random number generator
    seed(1)
    # generate some integers
    for _ in range(10):
        value = randint(0, 10)
        print(value)
Running the example generates and prints 10 random integer values.
    2
    9
    1
    4
    1
    7
    7
    7
    10
    6
Random Gaussian Values
Random floating point values can be drawn from a Gaussian distribution using the gauss() function.
This function takes two arguments that correspond to the parameters that control the location and spread of the distribution, specifically the mean and the standard deviation.
The example below generates 10 random values drawn from a Gaussian distribution with a mean of 0.0 and a standard deviation of 1.0.
Note that these parameters are not bounds on the values. The spread of the values is controlled by the bell shape of the distribution; in this case, values are equally likely to fall above or below 0.0, and most fall close to it.
    # generate random Gaussian values
    from random import seed
    from random import gauss
    # seed random number generator
    seed(1)
    # generate some Gaussian values
    for _ in range(10):
        value = gauss(0, 1)
        print(value)
Running the example generates and prints 10 Gaussian random values.
    1.2881847531554629
    1.449445608699771
    0.06633580893826191
    -0.7645436509716318
    -1.0921732151041414
    0.03133451683171687
    -1.022103170010873
    -1.4368294451025299
    0.19931197648375384
    0.13337460465860485
Note: The random module also provides a function normalvariate() that does the same job as gauss(). The former is thread-safe while gauss() is not. However, you rarely run Python code in multiple threads, and gauss() is faster.
Randomly Choosing From a List
Random numbers can be used to randomly choose an item from a list.
For example, if a list had 10 items with indexes between 0 and 9, then you could generate a random integer between 0 and 9 and use it to randomly select an item from the list. The choice() function implements this behavior for you. Selections are made with a uniform likelihood.
The example below generates a list of 20 integers and gives five examples of choosing one random item from the list.
    # choose a random element from a list
    from random import seed
    from random import choice
    # seed random number generator
    seed(1)
    # prepare a sequence
    sequence = [i for i in range(20)]
    print(sequence)
    # make choices from the sequence
    for _ in range(5):
        selection = choice(sequence)
        print(selection)
Running the example first prints the list of integer values, followed by five examples of choosing and printing a random value from the list.
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    4
    18
    2
    8
    3
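The manual approach described in this section, generating a random index and using it to look up an item, can be sketched as follows. This is an illustration of the idea only and is not a claim about how choice() is implemented internally; the list contents are made up.

```python
from random import seed, randint

seed(1)
# a list of 10 items with indexes 0 to 9
sequence = [i * 10 for i in range(10)]

# randint() includes both endpoints, so the last valid index is len - 1
index = randint(0, len(sequence) - 1)
selection = sequence[index]
print(index, selection)
```

In practice choice() is shorter and avoids the off-by-one risk in the index range.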
Random Subsample From a List
We may be interested in repeating the random selection of items from a list to create a randomly chosen subset.
Importantly, once an item is selected from the list and added to the subset, it should not be added again. This is called selection without replacement because once an item from the list is selected for the subset, it is not added back to the original list (i.e. is not made available for re-selection).
This behavior is provided in the sample() function that selects a random sample from a list without replacement. The function takes both the list and the size of the subset to select as arguments. Note that items are not actually removed from the original list; the selected items are returned in a new list.
The example below demonstrates selecting a subset of five items from a list of 20 integers.
    # select a random sample without replacement
    from random import seed
    from random import sample
    # seed random number generator
    seed(1)
    # prepare a sequence
    sequence = [i for i in range(20)]
    print(sequence)
    # select a subset without replacement
    subset = sample(sequence, 5)
    print(subset)
Running the example first prints the list of integer values, then the random sample is chosen and printed for comparison.
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    [4, 18, 2, 8, 3]
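For contrast, selection with replacement, where the same item may be picked more than once, is available in the standard library as choices() (Python 3.6 and later). The sketch below places the two side by side; the variable names are chosen for illustration.

```python
from random import seed, sample, choices

seed(1)
sequence = list(range(20))

# without replacement: every selected item is distinct
without_replacement = sample(sequence, 5)
# with replacement: the same item may appear more than once
with_replacement = choices(sequence, k=5)

print(without_replacement)
print(with_replacement)
# neither function modifies the original list
print(len(sequence))
```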
Randomly Shuffle a List
Randomness can be used to shuffle a list of items, like shuffling a deck of cards.
The shuffle() function can be used to shuffle a list. The shuffle is performed in place, meaning that the list provided as an argument to the shuffle() function is shuffled rather than a shuffled copy of the list being made and returned.
The example below demonstrates randomly shuffling a list of integer values.
    # randomly shuffle a sequence
    from random import seed
    from random import shuffle
    # seed random number generator
    seed(1)
    # prepare a sequence
    sequence = [i for i in range(20)]
    print(sequence)
    # randomly shuffle the sequence
    shuffle(sequence)
    print(sequence)
Running the example first prints the list of integers, then the same list after it has been randomly shuffled.
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    [11, 5, 17, 19, 9, 0, 16, 1, 15, 6, 10, 13, 14, 12, 7, 3, 8, 2, 18, 4]
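Because the shuffle happens in place, a common mistake is to assign the return value of shuffle() to a variable, which yields None. The sketch below shows this, along with one way to get a shuffled copy that leaves the original list untouched; using sample() with the full list length for this purpose is one option among several.

```python
from random import seed, shuffle, sample

seed(1)
sequence = list(range(10))

# shuffle() modifies the list and returns None
result = shuffle(sequence)
print(result)
print(sequence)

# to keep the original order, build a shuffled copy instead
original = list(range(10))
shuffled_copy = sample(original, len(original))
print(original)
print(shuffled_copy)
```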
3. Random Numbers with NumPy
In machine learning, you are likely using libraries such as scikit-learn and Keras.
These libraries make use of NumPy under the covers, a library that makes working with vectors and matrices of numbers very efficient.
NumPy also has its own implementation of a pseudorandom number generator and convenience wrapper functions.
Like the Python standard library, the NumPy functions used in this section are based on the Mersenne Twister pseudorandom number generator.
Let’s look at a few examples of generating random numbers and using randomness with NumPy arrays.
Seed The Random Number Generator
The NumPy pseudorandom number generator is different from the Python standard library pseudorandom number generator.
Importantly, seeding the Python pseudorandom number generator does not impact the NumPy pseudorandom number generator. It must be seeded and used separately.
The seed() function can be used to seed the NumPy pseudorandom number generator, taking an integer as the seed value.
The example below demonstrates how to seed the generator and how reseeding the generator will result in the same sequence of random numbers being generated.
    # seed the pseudorandom number generator
    from numpy.random import seed
    from numpy.random import rand
    # seed random number generator
    seed(1)
    # generate some random numbers
    print(rand(3))
    # reset the seed
    seed(1)
    # generate some random numbers
    print(rand(3))
Running the example seeds the pseudorandom number generator, prints a sequence of random numbers, then reseeds the generator showing that the exact same sequence of random numbers is generated.
    [4.17022005e-01 7.20324493e-01 1.14374817e-04]
    [4.17022005e-01 7.20324493e-01 1.14374817e-04]
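The point that the two generators are seeded separately can be checked directly. The sketch below reseeds the standard library generator between two NumPy draws and shows that the NumPy sequence simply continues, whereas reseeding NumPy restarts it. The variable names are chosen for illustration.

```python
import random
import numpy as np

np.random.seed(1)
first = np.random.rand(3)

# reseeding the standard library generator does not reset NumPy's generator
random.seed(1)
second = np.random.rand(3)

# reseeding NumPy's generator does restart its sequence
np.random.seed(1)
third = np.random.rand(3)

print(np.array_equal(first, second))
print(np.array_equal(first, third))
```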
Array of Random Floating Point Values
An array of random floating point values can be generated with the rand() NumPy function.
If no argument is provided, then a single random value is created; otherwise, the argument specifies the size of the array.
The example below creates an array of 10 random floating point values drawn from a uniform distribution.
    # generate random floating point values
    from numpy.random import seed
    from numpy.random import rand
    # seed random number generator
    seed(1)
    # generate random numbers between 0-1
    values = rand(10)
    print(values)
Running the example generates and prints the NumPy array of random floating point values.
    [4.17022005e-01 7.20324493e-01 1.14374817e-04 3.02332573e-01
     1.46755891e-01 9.23385948e-02 1.86260211e-01 3.45560727e-01
     3.96767474e-01 5.38816734e-01]
Array of Random Integer Values
An array of random integers can be generated using the randint() NumPy function.
This function takes three arguments, the lower end of the range, the upper end of the range, and the number of integer values to generate or the size of the array. Random integers will be drawn from a uniform distribution including the lower value and excluding the upper value, e.g. in the interval [lower, upper).
The example below demonstrates generating an array of random integers.
    # generate random integer values
    from numpy.random import seed
    from numpy.random import randint
    # seed random number generator
    seed(1)
    # generate some integers
    values = randint(0, 10, 20)
    print(values)
Running the example generates and prints an array of 20 random integer values between 0 and 9. The upper value of 10 is excluded.
    [5 8 9 5 0 0 1 7 6 9 2 4 5 2 4 2 4 7 7 9]
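The difference from the standard library's randint(), which includes both endpoints, is easy to overlook. The sketch below draws a larger sample to show that the NumPy upper bound is never returned, and that adding 1 to the upper bound is one way to include it. The sample size of 1000 is an arbitrary choice.

```python
from numpy.random import seed, randint

seed(1)
# the upper bound is exclusive: values fall in 0..9
values = randint(0, 10, 1000)
print(values.min(), values.max())

# add 1 to the upper bound to allow 10 as a possible value
inclusive = randint(0, 10 + 1, 1000)
print(inclusive.min(), inclusive.max())
```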
Array of Random Gaussian Values
An array of random Gaussian values can be generated using the randn() NumPy function.
This function takes a single argument to specify the size of the resulting array. The Gaussian values are drawn from a standard Gaussian distribution; this is a distribution that has a mean of 0.0 and a standard deviation of 1.0.
The example below shows how to generate an array of random Gaussian values.
    # generate random Gaussian values
    from numpy.random import seed
    from numpy.random import randn
    # seed random number generator
    seed(1)
    # generate some Gaussian values
    values = randn(10)
    print(values)
Running the example generates and prints an array of 10 random values from a standard Gaussian distribution.
    [ 1.62434536 -0.61175641 -0.52817175 -1.07296862  0.86540763 -2.3015387
      1.74481176 -0.7612069   0.3190391  -0.24937038]
Values from a standard Gaussian distribution can be scaled by multiplying the value by the standard deviation and adding the mean from the desired scaled distribution. For example:
    scaled value = mean + value * stdev
Where mean and stdev are the mean and standard deviation for the desired scaled Gaussian distribution and value is the randomly generated value from a standard Gaussian distribution.
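A minimal sketch of this scaling, assuming a target mean of 50.0 and a standard deviation of 5.0, both arbitrary values chosen for illustration. A large sample is used so that the sample statistics land close to the targets.

```python
from numpy.random import seed, randn

seed(1)
# assumed target distribution parameters
mean, stdev = 50.0, 5.0

# draw from the standard Gaussian, then shift and stretch
values = randn(10000)
scaled = mean + values * stdev

# the sample statistics should be close to the target parameters
print(scaled.mean(), scaled.std())
```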
Shuffle NumPy Array
A NumPy array can be randomly shuffled in place using the NumPy shuffle() function.
The example below demonstrates the function on a Python list of integers; a NumPy array is shuffled in the same way.
    # randomly shuffle a sequence
    from numpy.random import seed
    from numpy.random import shuffle
    # seed random number generator
    seed(1)
    # prepare a sequence
    sequence = [i for i in range(20)]
    print(sequence)
    # randomly shuffle the sequence
    shuffle(sequence)
    print(sequence)
Running the example first generates and prints a list of 20 integer values, then shuffles the list in place and prints it again.
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    [3, 16, 6, 10, 2, 14, 4, 17, 7, 1, 13, 0, 19, 18, 9, 15, 8, 12, 11, 5]
Modern Ways of Random Number Generation in NumPy
In newer versions of NumPy, you can generate random numbers in the following way:
    import numpy as np

    # build a generator explicitly from a chosen bit generator...
    rng = np.random.Generator(np.random.PCG64())
    # ...or use the default constructor
    rng = np.random.default_rng()

    # uniform from 0 to 1
    value = rng.random()

    # generate 10 Gaussian random numbers
    value = rng.standard_normal(10)

    # generate 20 random integers from 0 to 9 (high is exclusive)
    value = rng.integers(low=0, high=10, size=20)

    # shuffle a sequence in-place
    sequence = np.arange(20)
    rng.shuffle(sequence)
The object rng is a random number generator. You can create multiple such generators, or use the default one. The idea is to allow you to have multiple independent random number generators, so that drawing random numbers from one generator does not affect another. This makes your code more robust (because you can mitigate race conditions in parallel algorithms) and allows you to fine-tune the pseudorandom number generation algorithm.
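The independence of generator objects can be sketched as follows. Passing an integer seed to default_rng() is assumed here so the result is repeatable; the names rng_a, rng_b and rng_c are chosen for illustration.

```python
import numpy as np

# two separate generators created with the same seed
rng_a = np.random.default_rng(1)
rng_b = np.random.default_rng(1)

# drawing from rng_a advances only rng_a's internal state
_ = rng_a.random(5)

# rng_b still starts from the beginning of its own sequence,
# matching a freshly created generator with the same seed
rng_c = np.random.default_rng(1)
b_values = rng_b.random(3)
c_values = rng_c.random(3)
print(np.array_equal(b_values, c_values))
```

Each part of a program can hold its own generator this way, rather than sharing one global state.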
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
- Embrace Randomness in Machine Learning
- random – Generate pseudo-random numbers
- Random sampling in NumPy
- Pseudorandom number generator on Wikipedia
Summary
In this tutorial, you discovered how to generate and work with random numbers in Python.
Specifically, you learned:
- That randomness can be applied in programs via the use of pseudorandom number generators.
- How to generate random numbers and use randomness via the Python standard library.
- How to generate arrays of random numbers via the NumPy library.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Beautiful! Thank you so much! This was just what I needed today and I found it randomly, or should I say pseudorandomly! Haha!
I’m glad it helped.
thanks for great article … It helped me to understand the different ways to generate random numbers..
Thanks.
This is quite helpful Jason.
Thanks
I’m glad to hear it.
Very informative blog!
I have a question:
What is the significance of the number that we pass to .seed() ?
e.g. if I run following codes:
#Code 1:
np.random.seed(0)
np.random.rand(4)
#Code 2:
np.random.seed(10)
np.random.rand(4)
Both show different output. So, what is the difference in np.random.seed(10) and np.random.seed(0) ?
It is fed into the equation that starts the sequence of random numbers. The same seed will give the same sequence of randomness.
So it’s not random. When you run the module, it gives you the same “random” numbers each time, so it’s not random.
Correct, it is not “random”, it is pseudorandom controlled by a mathematical function.
Yea!!! Tks so much Jason. This is perfect for me!
I’m happy to hear that.
Thank you so much Jason.
Slightly off topic: is there any way to save the generated random numbers to a CSV file?
Yes, you can store them in an array and save the array in CSV format.
Perhaps this will help:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html
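A minimal sketch of that suggestion, assuming the numbers are held in a one-dimensional NumPy array. The file name random_numbers.csv and the use of the system temporary directory are hypothetical choices for illustration.

```python
import os
import tempfile
import numpy as np

np.random.seed(1)
values = np.random.rand(10)

# hypothetical output path in the system temporary directory
path = os.path.join(tempfile.gettempdir(), "random_numbers.csv")

# write one value per row in CSV format
np.savetxt(path, values, delimiter=",")

# read the file back to confirm the round trip
loaded = np.loadtxt(path, delimiter=",")
print(np.allclose(values, loaded))
```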
Hi Jason, I am trying to create multiple outcomes (via different seeds) and plot them on the same graph using the NumPy pseudorandom number generator (np.random.RandomState(seed)).
Is there a way to write this in one piece of code rather than writing separate code for, let’s say, 10 different seeds?
George
I’m not sure what you’re trying to achieve exactly?
What I mean is: is there a way to create n different random seeds that each give a different outcome, as you have explained, in one single piece of code?
Specifically, is it possible to have one piece of code that randomly selects n different seeds, rather than writing code with a different seed n times, if I want n different outcomes/samples?
If you need many random numbers, you only need one random seed and you can generate a sequence of many random numbers.
Does that help?
CAN YOU GIVE CODE FOR THAT
The above tutorial shows how to generate a sequence of random numbers.
You can use a while loop for different values of the seed. Then use random.randint(a, b).
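One way to sketch the loop-over-seeds idea is below. It uses a for loop and np.random.RandomState, as mentioned in the question, rather than a while loop with random.randint(); the number of runs and the sample size are arbitrary assumptions.

```python
import numpy as np

n_runs = 10
samples = []
for seed_value in range(n_runs):
    # a separate generator per seed, so each sample is reproducible on its own
    rng = np.random.RandomState(seed_value)
    samples.append(rng.rand(5))

# each entry in samples is one outcome and could be plotted as one line
for seed_value, sample_values in enumerate(samples):
    print(seed_value, sample_values)
```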
I just did it works!
Absolutely. Got it.
Thanks
No problem.
Amazing. Thanks Jason.
Thanks, I’m glad it helped.
Dear Dr Jason,
Thank you for the tutorial.
I had a go at the exercises and came to the conclusion on generating random integers:
To generate a set of random integers where the numbers without repeating = without replacement read the sections:
To generate a set of random integers by putting the numbers ‘back into the hat’ = with replacement = may include repeats read:
Thank you,
Anthony of Sydney
Nice!
Dr Jason,
Thank you for your valuable posts.
I tried the following and got no result – that is “None” is printed
Yet, when I do this,
from random import sample
subset = sample(x,100); #subset the whole sample to get around the original problem
subset
[97, 68, 3, 37, 29, 39, 52, 57, 5, 98, 33, 79, 65, 94, 16, 87, 28, 20, 72, 12, 46, 34, 78, 76, 59, 2, 48, 71, 18, 92, 26, 51, 54, 6, 41, 81, 74, 21, 11, 50, 22, 56, 44, 4, 69, 0, 14, 64, 66, 89, 7, 32, 27, 58, 62, 67, 61, 23, 36, 84, 24, 45, 25, 9, 38, 99, 19, 70, 95, 85, 80, 1, 13, 47, 86, 83, 82, 35, 15, 60, 8, 40, 75, 17, 31, 77, 30, 93, 10, 55, 49, 42, 53, 43, 73, 90, 63, 88, 96, 91]
Why didn’t the “shuffle” command work? That is, why did the supposedly shuffled array produce a “None” result?
Thank you,
Anthony of Sydney
Even after resetting the computer, I could not work out why using the “shuffle” command the result is nothing.
The aim was to generate an array of x and fx, where fx = x**2,
The above works, even the code below.
It seems that when you use shuffle directly on the variable/2d array you can shuffle, but the original array is modified,
For some inexplicable reason, you cannot do this:
Don’t know why please assist.
Thank you,
Anthony of Sydney NSW
The shuffle() function operates on the array in place. It does not return anything:
https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.shuffle.html
Dear Dr Jason,
Thank you
Anthony of Sydney
I think shuffle occurs in place, you have assigned xshuffled “None”.
Dear Dr Jason,
After reading the above comment and the content of the referred page two comments up, it returns “None”.
Thank you for that, it is appreciated.
Anthony of Sydney
hi how to combine this random output in one text file?
and how to combine random output of alphanumeric, alphabetic and integer
You can generate numpy arrays, concatenate them and call savetxt.
This might help:
https://machinelearningmastery.com/how-to-save-a-numpy-array-to-file-for-machine-learning/
thank you so much!!!!
You’re welcome.
i don’t know python. teach me
Sure, start here:
https://machinelearningmastery.com/faq/single-faq/how-do-i-get-started-with-python-programming
Hello I’m new to python and I would like to name my lists of random numbers and add them.
How do I do that?
Say I have two lists of ten random numbers and want to add the two lists to make a 3rd.
Perhaps make the lists into numpy arrays and use the add() function.
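A minimal sketch of that suggestion, assuming two arrays of ten random floats each; the variable names first, second and total are chosen for illustration.

```python
import numpy as np

np.random.seed(1)
# two named arrays of ten random numbers each
first = np.random.rand(10)
second = np.random.rand(10)

# element-wise sum of the two arrays gives the third
total = np.add(first, second)
print(total)
```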
thank you again, easy to understand and to implement! the right approach for beginners like me!
Thanks, I’m happy to hear that!
I need to create 100 random(floating) numbers between 1 and 3. How can i do that?
The above tutorial will show you how exactly!
How do I plot random numbers from 1-100 on a histogram? It is giving me plotted and not all the values.
Good question.
First generate your numbers and store in a list or array.
Then use the matplotlib hist() function and pass it your list or array of numbers.
https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.hist.html
thanks!
You’re welcome.
Very nice tutorial. I came here looking for something I expected at the very end, but didn’t find:
how to generate integer numbers from standard normal distribution?
Something like the equivalent of randint but for a normal instead of a uniform distribution.
Or in other words, something like randn but returns an integer.
In a way it would be something like “randnint”
Is there such a function?
Thanks!
Good question, perhaps generate gaussian real values and either rescale them to your desired range or multiply by 10, 100, 1000, etc. and round the results.
I suspect there are better approaches, it might be a good idea to check the literature for an efficient algorithm.
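A rough sketch of the scale-and-round idea, assuming a target standard deviation of 10 chosen arbitrarily. Rounding slightly changes the distribution, so treat this as an approximation rather than an exact integer Gaussian.

```python
import numpy as np

rng = np.random.default_rng(1)

# assumed target: mean 0, standard deviation 10
values = rng.standard_normal(20) * 10

# round to the nearest integer and convert the type
integers = np.rint(values).astype(int)
print(integers)
```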
thank you
You’re welcome.
Many many thanks Dr, Jason!
It helps me a lot and surely does to others as well.
Beautiful Sharing
Thanks, I’m happy to hear that.
How can I randomly generate information other than just numbers? I would like to generate random sports teams for american football. Any ideas?
You can have a list of sport teams, 1-n then generate a number in 1-n to select a random item from the list.
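A minimal sketch, assuming a short list of placeholder team names. The choice() function from the tutorial picks an item directly, without generating the index by hand.

```python
from random import seed, choice

seed(1)
# hypothetical team names used as placeholders
teams = ["Team A", "Team B", "Team C", "Team D"]

# pick one team at random
selected = choice(teams)
print(selected)
```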
Hello Jason how can I generate random number from a machine number(52 39 70 77 73)
Yes, you can generate random integers, see the above examples.