As a data analytics platform, R is expected to offer extensive support for statistical tests. In this post, you are going to see how you can run statistical tests using R's built-in functions. Specifically, you are going to learn:

- What a t-test is and how to run it in R
- What an F-test is and how to run it in R

Let’s get started.

## Overview

This post is divided into three parts; they are:

- Are They the Same?
- Two-Sample t-Test for Equal Means
- Other Statistical Tests

## Are They the Same?

Consider the case where you have a regression problem and have built two regression models. By feeding in some test data, you notice that the models **never** perfectly match the expected results but are close enough to be useful. But is one model more accurate than the other?

The accuracy of a regression model is measured by its error, namely, how far the model's prediction is from the actual value. By comparing the mean square error (MSE) of the two models, you might conclude that the one with the lower MSE is better.
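Here is a minimal sketch of that comparison in R. It assumes you already have the actual values and each model's predictions as numeric vectors. The vectors `actual`, `pred1`, and `pred2` are made-up illustrative numbers, and `mse()` is a hypothetical helper written for this example:

```r
# Hypothetical test data: actual values and the predictions of two models
actual <- c(3.0, 5.0, 2.5, 7.0, 4.5)
pred1  <- c(2.8, 5.3, 2.9, 6.4, 4.4)
pred2  <- c(3.4, 4.6, 2.0, 7.5, 5.1)

# Mean square error: the average of the squared prediction errors
mse <- function(y, yhat) mean((y - yhat)^2)

mse1 <- mse(actual, pred1)
mse2 <- mse(actual, pred2)
print(c(model1 = mse1, model2 = mse2))
```

Model 1 has the lower MSE here (0.132 versus 0.236). As the next paragraph explains, a difference like this on a handful of points is not enough on its own to claim that model 1 is better.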

However, there is a problem: the mean of any metric is sensitive to the particular sample set, and such randomness is inevitable. Therefore, you normally cannot expect the means from the two models to be the same. Claiming that one model is better than another based merely on a small difference in the metric is not robust.
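The following sketch shows this randomness directly, using simulated errors rather than a real model. It draws five test sets from the same error distribution and computes the mean squared error on each. Nothing about the "model" changes, yet the mean differs from one test set to the next:

```r
set.seed(1)  # make the random draws reproducible
# Mean squared error of the same simulated model on 5 test sets of 50 points each
sample_mse <- replicate(5, mean(rnorm(50, mean = 0, sd = 1)^2))
print(sample_mse)
```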

In statistics, the rigorous way to make such a claim is the following. First, state a hypothesis, called the **null hypothesis**. Then, state an **alternative hypothesis** that contradicts the null hypothesis. Next, use the data to show that the null hypothesis is very unlikely to hold. If it is, you reject it in favor of the alternative hypothesis.

This is the typical workflow for a statistical test.

## Two-Sample t-Test for Equal Means

The following shows how you can test in R whether two sets of data have equal means:

```r
# Sample a: 249 values
a <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21, 10, 10,
       11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13, 13, 18, 22, 19, 18,
       23, 26, 25, 20, 21, 13, 14, 15, 14, 17, 11, 13, 12, 13, 15, 13, 13, 14, 22, 28,
       13, 14, 13, 14, 15, 12, 13, 13, 14, 13, 12, 13, 18, 16, 18, 18, 23, 11, 12, 13,
       12, 18, 21, 19, 21, 15, 16, 15, 11, 20, 21, 19, 15, 26, 25, 16, 16, 18, 16, 13,
       14, 14, 14, 28, 19, 18, 15, 15, 16, 15, 16, 14, 17, 16, 15, 18, 21, 20, 13, 23,
       20, 23, 18, 19, 25, 26, 18, 16, 16, 15, 22, 22, 24, 23, 29, 25, 20, 18, 19, 18,
       27, 13, 17, 13, 13, 13, 30, 26, 18, 17, 16, 15, 18, 21, 19, 19, 16, 16, 16, 16,
       25, 26, 31, 34, 36, 20, 19, 20, 19, 21, 20, 25, 21, 19, 21, 21, 19, 18, 19, 18,
       18, 18, 30, 31, 23, 24, 22, 20, 22, 20, 21, 17, 18, 17, 18, 17, 16, 19, 19, 36,
       27, 23, 24, 34, 35, 28, 29, 27, 34, 32, 28, 26, 24, 19, 28, 24, 27, 27, 26, 24,
       30, 39, 35, 34, 30, 22, 27, 20, 18, 28, 27, 34, 31, 29, 27, 24, 23, 38, 36, 25,
       38, 26, 22, 36, 27, 27, 32, 28, 31)
# Sample b: 79 values
b <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31, 32, 24,
       26, 29, 24, 24, 33, 33, 32, 28, 19, 32, 34, 26, 30, 22, 22, 33, 39, 36, 28, 27,
       21, 24, 30, 34, 32, 38, 37, 30, 31, 37, 32, 47, 41, 45, 34, 33, 24, 32, 39, 35,
       32, 37, 38, 34, 34, 32, 33, 32, 25, 24, 37, 31, 36, 36, 34, 38, 32, 38, 32)
# Two-sample t-test: are the true means of a and b equal?
print(t.test(a, b))
```

This is formally called the **two-sample t-test**, because you have provided two vectors of numbers, `a` and `b`. The result from the function `t.test(a, b)` is as follows:

```
	Welch Two Sample t-test

data:  a and b
t = -12.946, df = 136.87, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.915248  -8.757621
sample estimates:
mean of x mean of y 
 20.14458  30.48101 
```

The null hypothesis of this test is that the true means of the two samples are equal. From the output above, the p-value is extremely small (below $2.2\times 10^{-16}$). Hence you should reject the null hypothesis in favor of the alternative, which is that the true means are not equal. The hypotheses use the term “true means” because these are quantities you cannot observe directly but can only approximate from the sample data. The “Welch” in the header indicates that, by default, R does not assume the two samples have equal variances.
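The printed report is easy to read, but `t.test()` also returns a list-like `htest` object, so you can apply the rejection rule in code. The sketch below runs on simulated data and assumes a significance level of 0.05. The variable `alpha` is a name chosen for this example; R does not set a significance level for you.

```r
set.seed(42)
x <- rnorm(50, mean = 0, sd = 1)
y <- rnorm(50, mean = 2, sd = 1)

result <- t.test(x, y)   # Welch two-sample t-test, as above
alpha  <- 0.05           # significance level chosen before looking at the data

print(result$p.value)
if (result$p.value < alpha) {
  cat("Reject the null hypothesis: the true means appear to differ\n")
} else {
  cat("Fail to reject the null hypothesis\n")
}
```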

If so, which one has the higher mean? The two-sided test by itself does not tell you. However, the output of the `t.test()` function helps you determine it by providing the sample estimates of the means. In this case, the second one (vector `b`) has a mean of 30.48, which is higher.
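The same result object holds the sample means (`estimate`) and the confidence interval for the difference mean of `x` minus mean of `y` (`conf.int`). In the sketch below, which uses simulated data, the whole interval lies below zero. This is consistent with the second sample having the higher mean:

```r
set.seed(7)
x <- rnorm(60, mean = 10, sd = 2)
y <- rnorm(60, mean = 13, sd = 2)
result <- t.test(x, y)

print(result$estimate)   # named vector: "mean of x", "mean of y"
print(result$conf.int)   # 95% confidence interval for mean(x) - mean(y)

# Name of the sample with the larger estimated mean
higher <- names(which.max(result$estimate))
cat("Higher sample mean:", higher, "\n")
```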

This is how you should normally use the t-test. As another example, you can run the t-test on synthetic data:

```r
# Two samples from normal distributions whose true means differ slightly
a <- rnorm(100, mean=0, sd=1)
b <- rnorm(150, mean=0.2, sd=1)
print(t.test(a,b))
```

In the above code, you generate random numbers into vectors `a` and `b` from normal distributions with slightly different means. The result of the t-test would be similar to the following. Your numbers will differ, because the data are random.

```
	Welch Two Sample t-test

data:  a and b
t = -1.5268, df = 223.86, p-value = 0.1282
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.45642756  0.05791578
sample estimates:
 mean of x  mean of y 
0.02847865 0.22773454 
```

You know that the numbers were generated with different means. However, the difference is small and the samples are not large enough. As a result, the t-test gives a p-value of 0.1282, which is not small enough to reject the null hypothesis.
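The sketch below shows that sample size is what held the test back. It keeps the same true means (0 and 0.2) but draws 5,000 values into each vector. With that much data, the same small difference is very likely to produce a p-value far below 0.05:

```r
set.seed(123)
a <- rnorm(5000, mean = 0,   sd = 1)
b <- rnorm(5000, mean = 0.2, sd = 1)
result <- t.test(a, b)
print(result$p.value)
```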

Usually you would require a p-value below 0.05 (and sometimes below 0.01) to reject the null hypothesis. This is why designing the null and alternative hypotheses matters. It affects how the test is computed, and you also favor the null hypothesis until there is strong enough evidence to rule it out.
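Simulation is one way to understand the 0.05 threshold. In the sketch below, both samples come from the same distribution, so the null hypothesis is true by construction. Every rejection is therefore a false alarm, and the fraction of runs with a p-value below 0.05 should come out close to 5%:

```r
set.seed(2024)
# 2000 experiments in which the null hypothesis (equal means) is true
p_values <- replicate(2000, t.test(rnorm(30), rnorm(30))$p.value)

# Fraction of experiments that would wrongly reject at the 0.05 level
false_rejection_rate <- mean(p_values < 0.05)
print(false_rejection_rate)
```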

## Other Statistical Tests

The test above is called the “two-sample t-test” because you provided two samples. There is also a **one-sample t-test**, as shown below:

```r
a <- rnorm(100, mean=0, sd=1)
# One-sample t-test: is the true mean of a equal to 0.5?
print(t.test(a, mu=0.5))
```

The output of the above would be the following:

```
	One Sample t-test

data:  a
t = -3.5955, df = 99, p-value = 0.0005069
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
 -0.1213488  0.3205669
sample estimates:
 mean of x 
0.09960905 
```

Here the test rejects the null hypothesis, as it reports a small p-value. This means you should not assume that the numbers in vector `a` have a mean of 0.5, the value you passed as `mu=0.5` to the `t.test()` function. At the end of the report, R gives a mean of about 0.1. That is the sample mean, an approximation of the unobservable true mean. The t-test tells you that the true mean is unlikely to be 0.5.
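The sketch below shows what `t.test()` computes in the one-sample case. It reproduces the t statistic $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ and its two-sided p-value (from a t distribution with $n-1$ degrees of freedom) by hand, then checks both against R's result:

```r
set.seed(10)
a  <- rnorm(100, mean = 0, sd = 1)
mu <- 0.5                         # hypothesized true mean
n  <- length(a)

# t statistic: distance of the sample mean from mu, in standard-error units
t_manual <- (mean(a) - mu) / (sd(a) / sqrt(n))
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

result <- t.test(a, mu = mu)
print(c(manual = t_manual, t.test = unname(result$statistic)))
print(c(manual = p_manual, t.test = result$p.value))
```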

The one-sample t-test is useful not for comparing two sets of data, but for checking whether your data fit a presumed expectation.

Besides the t-test, another related and equally useful test is the F-test. Below is an example:

```r
# Same true mean, different standard deviations
a <- rnorm(100, mean=0.5, sd=1.0)
b <- rnorm(150, mean=0.5, sd=1.5)
# F-test: are the true variances of a and b equal?
print(var.test(a, b))
```

The output of the above is as follows:

```
	F test to compare two variances

data:  a and b
F = 0.55678, num df = 99, denom df = 149, p-value = 0.00198
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.3905882 0.8043323
sample estimates:
ratio of variances 
         0.5567847 
```

While the t-test compares means, the F-test compares variances. In R, it is performed with the `var.test()` function. One use is when two regression models produce similar MSE. In that case, the one with the lower error variance is arguably better, as that model tends to be more accurate in the worst case.

Note that the F-test assumes the data are normally distributed. In practice this is often approximately the case, but the result may be distorted if the assumption does not hold.
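One optional way to check the normality assumption before relying on `var.test()` is the Shapiro-Wilk test, available in base R as `shapiro.test()`. This check is a suggestion for this example, not part of the F-test itself. Its null hypothesis is that the data come from a normal distribution, so a small p-value is evidence against normality. In this sketch, the exponential sample is clearly non-normal:

```r
set.seed(5)
normal_data <- rnorm(200, mean = 0.5, sd = 1)
skewed_data <- rexp(200, rate = 1)   # strongly right-skewed, not normal

# Small p-value => evidence against normality
p_normal <- shapiro.test(normal_data)$p.value
p_skewed <- shapiro.test(skewed_data)$p.value
print(c(normal = p_normal, skewed = p_skewed))
```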

In the example above, the data in vectors `a` and `b` are of different sizes. They are generated using R's Gaussian random number generator with the same mean but different standard deviations. The F-test detects the difference, reporting a p-value of 0.00198, which is small enough to reject the null hypothesis. Formally, the F-test's null hypothesis is that the ratio of the variances of the two data sets is 1. This is why the ratio of variances is reported at the end of the output.
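To make the null hypothesis concrete, the sketch below computes the F statistic as the ratio of the two sample variances. It then derives the two-sided p-value from the F distribution with $(n_a - 1, n_b - 1)$ degrees of freedom and compares both values with `var.test()`:

```r
set.seed(99)
a <- rnorm(100, mean = 0.5, sd = 1.0)
b <- rnorm(150, mean = 0.5, sd = 1.5)

# F statistic: ratio of the sample variances (null hypothesis: ratio = 1)
f_manual <- var(a) / var(b)
df1 <- length(a) - 1   # numerator degrees of freedom
df2 <- length(b) - 1   # denominator degrees of freedom

# Two-sided p-value: double the smaller tail probability
lower_tail <- pf(f_manual, df1, df2)
p_manual <- 2 * min(lower_tail, 1 - lower_tail)

result <- var.test(a, b)
print(c(manual = f_manual, var.test = unname(result$statistic)))
print(c(manual = p_manual, var.test = result$p.value))
```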

As an exercise, you can modify the programs above to generate datasets of different sizes and see how well these tests perform. As a general rule, statistical tests are more powerful when you provide more data. With too little data, the tests will have a harder time rejecting the null hypothesis, even when it is false.
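Here is one way to approach that exercise. For several sample sizes, the sketch repeats the F-test example 500 times and records how often it rejects the null hypothesis at the 0.05 level, which estimates the test's power. The sizes and the number of repetitions are arbitrary choices:

```r
set.seed(314)
sizes <- c(10, 30, 100, 300)

# For each size n, the fraction of 500 simulated F-tests that reject the null
rejection_rate <- sapply(sizes, function(n) {
  mean(replicate(500, {
    a <- rnorm(n, mean = 0.5, sd = 1.0)
    b <- rnorm(n, mean = 0.5, sd = 1.5)
    var.test(a, b)$p.value < 0.05
  }))
})
names(rejection_rate) <- sizes
print(rejection_rate)
```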

## Further Readings

You can learn more about the above topics from the following:

### Websites

- NIST Engineering Statistics Handbook, section 1.3.5.3
- Data Used for two-sample t-test

## Summary

In this post, you learned how to perform statistical tests in R. Specifically, you learned:

- What the null and alternative hypotheses are in statistics
- How to use a p-value to reject the null hypothesis
- How to use the t-test and F-test to compare the means and variances of two datasets

Hi Jason!

I want to ask you. In the second example,

```r
a <- rnorm(100, mean=0, sd=1)
b <- rnorm(150, mean=0.2, sd=1)
print(t.test(a,b))
```

you say:

"p-value of 0.1282, which is not small enough to reject the null hypothesis."

So, we have to accept the null hypothesis. Why don't we accept it?

Hi George,

You should accept the null hypothesis by default unless you have strong evidence to reject it. This is how a statistical test is normally meant to be used. Therefore, deciding what the null hypothesis and its alternative are is important. You also have to set a threshold for how strong the evidence must be. Often, a p-value below 0.05 is taken as strong enough.

Hope this helps.

Why use % pipe (example in Keras…).

Python programmers will find almost similar code.

@Rimitti

But the author is explaining how to do things in R. I’m sure there are similar ways to do these things in many different languages.