# Generating Random Numbers in R

Whether working on a machine learning project, a simulation, or other models, you need to generate random numbers in your code. R as a programming language, has several functions for random number generation. In this post, you will learn about them and see how they can be used in a larger program. Specifically, you will learn

• How to generate Gaussian random numbers into a vector
• How to generate uniform random numbers
• How to manipulate random vectors and random matrices

Let’s get started.

Generating Random Numbers in R.
Photo by Kaysha. Some rights reserved.

## Overview

This post is divided into three parts; they are:

• Random Number Generators
• Correlated Multivariate Gaussian Random Number
• Generating Random Numbers From Uniform Distribution

## Random Number Generators

Random numbers are drawn from probability distributions. The most famous distribution should be the Gaussian distribution, also known as the normal distribution.

The standard normal distribution is defined by its density function, $f(x) = (2\pi)^{-1/2} \exp(-x^2/2)$. It has a range of the entire real number, i.e., from negative infinity to positive infinity. Integrating $f(x)$ over this range gives the result 1.0. In mathematics, we have the distribution function $F(x) = \int_{-\infty}^x (2\pi)^{-1/2} \exp(-t^2/2) dt$ defined as the probability that a number is below $x$ in the standard normal distribution.

In R, you have not only the random number generator function based on the standard normal distribution but a few other auxiliary functions as well:

• dnorm(x): The density function $f(x)$
• pnorm(x): The distribution function $F(x)$
• qnorm(x): The quantile function $F^{-1}(x)$ as the inverse function of $F(x)$
• rnorm(k): The random number generator function

The functions dnorm(x), pnorm(x), qnorm(x) would be useful in some projects that you need to deal with some other probability distribution. The function rnorm(k) is the function that gives you a vector of k random values that are drawn from the standard normal distribution. This is the function you use most often.

You can tell that this is the random number generator that produces output from the standard normal distribution by getting a lot of samples from it and plot a histogram:

Histogram of Gaussian random numbers

You can see a bell-shaped histogram centered at zero in the plot. Also, you can tell that majority of the samples are between -1 to +1 (68% of them, to be precise). The freq=FALSE parameter in the hist() function makes the $y$-axis in density instead of frequency. Hence you can match the histogram with the density function.

## Correlated Multivariate Gaussian Random Number

One common use case of standard normal random number generator is to create pairs of correlated Gaussian random number. In other words, you want some bivariate Gaussian random numbers with non-zero correlation.

The algorithm to generate such correlated random numbers is:

1. First, generate same number of independent standard normal random numbers
2. Set up the covariance matrix to specify how the different random numbers are related
3. Take the Cholesky decomposition of the covariance matrix
4. Multiply the matrix of independent standard normal random numbers with the Cholesky decomposition matrix

In code, it is:

Most of the code above is intuitive. The tricky part is adjusting the mean: Standard Gaussian random numbers are centered at zero. The Cholesky decomposition matrix obtained from the line cholesky <- chol(cov) can only adjust the covariance without adjusting the mean. To shift the mean value, you need to add a value uniformly on each column. In the above, you repeat the vector into a matrix of the same shape as the random observation so you can add them.

How you can tell this is correct? One way is to verify the statistics. You can compute and print the correlation matrix, the column mean, and column standard deviation as follows:

These three lines should print the empirical correlation matrix, the mean, and the standard deviation respectively. You can also try to plot the random numbers as a scatter plot:

Which should look like the following:

Bivariate Gaussian random numbers with mean at (0,1) and correlation 0.6

Putting everything together, the following is the complete code:

## Generating Random Numbers From Uniform Distribution

Gaussian is the most used distribution. But there are chances that you will need a different distribution. One example is to simulate the arrival time of the next phone call on a hotline. This is a classic example of the exponential distribution. Obviously, it is not Gaussian since Gaussian random values can be negative.

The exponential distribution has a density function $f(x) = \lambda \exp(-\lambda x)$. Since $x$ runs from zero to positive infinity, the distribution function is $F(x) = 1-\exp(-\lambda x)$. Recall that a distribution function has a value between 0 and 1.

With an arbitrary distribution function, you can use the inverse transform sampling method to generate random numbers using a uniform distribution. The idea is that you can consider $F(x)$ as the standard uniform distribution. Then you can look up the value of $x$ as the generated random number. In this case, you need to define the inverse of the distribution function $F^{-1}(x)$.

The inverse function in exponential distribution is trivial: Set $y = 1 â€“ exp(-lambda x)$, then $x = -\log(1-y)/\lambda$. Hence $F^{-1}(x) = -\log(1-y)/\lambda$. In R, you can generate exponential distributed random numbers as follows:

In the above, you used not rnorm() function but runif() function for a uniform distribution between 0 and 1. There are a whole set of functions for uniform distribution, too, such as dunif() and punif().

You knows the generated numbers follows an exponential distribution if you plot the histogram from the generated numbers:

Exponential distribution generated empirically.

This technique would be useful when you build a discrete event simulation model in R.