Statistical Tests in R

By Adrian Tam on September 11, 2023 in R for Data Science

As a data analytics platform, R is expected to offer extensive support for statistical tests. In this post, you are going to see how you can run statistical tests using the built-in functions in R. Specifically, you are going to learn:

What a t-test is and how to run it in R
What an F-test is and how to run it in R

Let’s get started.

Statistical Tests in R.
Photo by Louis Reed. Some rights reserved.

Overview

This post is divided into three parts; they are:

Are They the Same?
Two-Sample t-Test for Equal Means
Other Statistical Tests

Are They the Same?

Let’s consider the case where you have a regression problem and you built two regression models. After feeding in some test data, you notice that neither model perfectly matches the expected result, but both are close enough to be useful. But is one model more accurate than the other?

The metric for the accuracy of a regression model is the error, namely, how far the model’s prediction is from the actual value. By comparing the mean square error (MSE) of the two models, you can say that the one with the lower MSE is better.
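As a small illustration of this comparison, the sketch below computes the MSE of two sets of predictions against the same actual values. The vectors `actual`, `pred_model1`, and `pred_model2` and the helper `mse()` are made-up names and data for this example only.

```r
# Hypothetical actual values and predictions from two models (illustrative data)
actual      <- c(3.0, 5.0, 2.5, 7.0, 4.5)
pred_model1 <- c(2.8, 5.3, 2.9, 6.5, 4.4)
pred_model2 <- c(3.5, 4.2, 2.0, 7.9, 5.1)

# Mean square error: the average of the squared differences
mse <- function(y, yhat) mean((y - yhat)^2)

mse1 <- mse(actual, pred_model1)
mse2 <- mse(actual, pred_model2)
print(c(model1 = mse1, model2 = mse2))
```

With these numbers the first model has the lower MSE, but with only five test points such a comparison says little on its own.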

However, there is a problem: The mean of any metric is sensitive to the sample set, and such randomness is inevitable. Therefore, you normally cannot expect the means from the two models to be the same. Claiming that one model is better than another based merely on a small difference in the metric is not robust.
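To see this sensitivity, the following sketch evaluates one hypothetical model on several random test subsets. The squared errors are simulated here (an assumption for illustration, not real model output), and the seed is arbitrary.

```r
set.seed(42)  # arbitrary seed so the sketch is repeatable

# Simulated squared errors of one hypothetical model on a pool of 1000 test points
sq_err <- rnorm(1000, mean = 0, sd = 1)^2

# MSE measured on five different random test subsets of 50 points each
mse_by_subset <- replicate(5, mean(sample(sq_err, 50)))
print(mse_by_subset)

# The spread across subsets shows how much the MSE moves by chance alone
print(sd(mse_by_subset))
```

The model is the same in every subset, yet the measured MSE changes each time. A small gap between two models can easily fall within this spread.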

In statistics, the rigorous way to make a claim is the following: First, state a hypothesis, called the null hypothesis. Then, state an alternative hypothesis that contradicts the null hypothesis. Next, show that the data you observed would be very unlikely if the null hypothesis were true; in that case, you reject the null hypothesis in favor of the alternative hypothesis.

This is the typical workflow for a statistical test.
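The workflow can be sketched in a few lines of R. This is only an illustration under stated assumptions: the error vectors `err1` and `err2` are simulated, the threshold `alpha` of 0.05 is a common convention rather than a rule, and the test used is t.test(), which the next section introduces.

```r
set.seed(1)  # arbitrary seed, only for repeatability

# Simulated per-sample squared errors of two hypothetical models
err1 <- rnorm(200, mean = 0, sd = 1.0)^2
err2 <- rnorm(200, mean = 0, sd = 1.3)^2

alpha <- 0.05  # significance level, chosen before looking at the data

# Null hypothesis: both models have the same true mean squared error
# Alternative hypothesis: the true mean squared errors differ
result <- t.test(err1, err2)

# Decide by comparing the p-value with the chosen threshold
if (result$p.value < alpha) {
  decision <- "reject the null hypothesis"
} else {
  decision <- "fail to reject the null hypothesis"
}
print(result$p.value)
print(decision)
```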

Two-Sample t-Test for Equal Means

The following shows how you can test in R whether two sets of data have equal means:

a <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18,
       21, 21, 10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14,
       14, 12, 13, 13, 18, 22, 19, 18, 23, 26, 25, 20, 21, 13, 14, 15,
       14, 17, 11, 13, 12, 13, 15, 13, 13, 14, 22, 28, 13, 14, 13, 14,
       15, 12, 13, 13, 14, 13, 12, 13, 18, 16, 18, 18, 23, 11, 12, 13,
       12, 18, 21, 19, 21, 15, 16, 15, 11, 20, 21, 19, 15, 26, 25, 16,
       16, 18, 16, 13, 14, 14, 14, 28, 19, 18, 15, 15, 16, 15, 16, 14,
       17, 16, 15, 18, 21, 20, 13, 23, 20, 23, 18, 19, 25, 26, 18, 16,
       16, 15, 22, 22, 24, 23, 29, 25, 20, 18, 19, 18, 27, 13, 17, 13,
       13, 13, 30, 26, 18, 17, 16, 15, 18, 21, 19, 19, 16, 16, 16, 16,
       25, 26, 31, 34, 36, 20, 19, 20, 19, 21, 20, 25, 21, 19, 21, 21,
       19, 18, 19, 18, 18, 18, 30, 31, 23, 24, 22, 20, 22, 20, 21, 17,
       18, 17, 18, 17, 16, 19, 19, 36, 27, 23, 24, 34, 35, 28, 29, 27,
       34, 32, 28, 26, 24, 19, 28, 24, 27, 27, 26, 24, 30, 39, 35, 34,
       30, 22, 27, 20, 18, 28, 27, 34, 31, 29, 27, 24, 23, 38, 36, 25,
       38, 26, 22, 36, 27, 27, 32, 28, 31)
b <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31,
       32, 31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28, 19, 32, 34, 26,
       30, 22, 22, 33, 39, 36, 28, 27, 21, 24, 30, 34, 32, 38, 37, 30,
       31, 37, 32, 47, 41, 45, 34, 33, 24, 32, 39, 35, 32, 37, 38, 34,
       34, 32, 33, 32, 25, 24, 37, 31, 36, 36, 34, 38, 32, 38, 32)
print(t.test(a, b))

This is formally called the two-sample t-test, since you provided two vectors of numbers, a and b. By default, R runs the Welch variant, which does not assume the two samples have equal variances. The result from the function t.test(a, b) is as follows:

	Welch Two Sample t-test

data:  a and b
t = -12.946, df = 136.87, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.915248  -8.757621
sample estimates:
mean of x mean of y 
 20.14458  30.48101

The null hypothesis of this test is that the true means of the two samples are equal. From the output above, the p-value is extremely small (below $2.2\times 10^{-16}$), so you reject the null hypothesis in favor of the alternative hypothesis, namely that the true means are not equal. The hypothesis uses the term “true means” because the true mean is something you cannot observe directly; you can only estimate it from the sample data.
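To make the reported numbers less of a black box, here is a sketch that recomputes the Welch t statistic, its degrees of freedom, and the two-sided p-value by hand, then compares them with t.test(). The vectors `x` and `y` are small made-up samples, not the data above.

```r
# Two small made-up samples (not the data from the example above)
x <- c(18, 15, 18, 16, 17, 15, 14, 14)
y <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23)

nx <- length(x)
ny <- length(y)
vx <- var(x) / nx   # squared standard error of mean(x)
vy <- var(y) / ny   # squared standard error of mean(y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
df     <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))

# Two-sided p-value: chance of a statistic at least this extreme under the null
p_val <- 2 * pt(-abs(t_stat), df)

res <- t.test(x, y)
print(c(manual = p_val, builtin = res$p.value))
```

The manual and built-in values should agree, which shows that the p-value is the tail probability of a t distribution evaluated at the observed statistic.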

If so, which one has the higher mean? The two-sided test only tells you that the means differ, but the output from the t.test() function helps you determine the direction by reporting the sample means. In this case, the second one (vector b) has a mean of 30.48, which is higher.
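Rather than reading the printed report, you can pull these values out of the object that t.test() returns. The sketch below uses small made-up vectors `x` and `y`; the variable `higher` is a name introduced here for illustration.

```r
# Two small made-up samples (not the data from the example above)
x <- c(18, 15, 18, 16, 17, 15, 14, 14)
y <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23)

res <- t.test(x, y)

# Sample means, named "mean of x" and "mean of y"
print(res$estimate)

# Confidence interval for the difference (mean of x minus mean of y)
print(res$conf.int)

# Pick the sample with the higher estimated mean
higher <- if (res$estimate[["mean of x"]] > res$estimate[["mean of y"]]) "x" else "y"
print(higher)
```

A confidence interval that lies entirely below zero points the same way: the first sample's mean is lower than the second's.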

This is how you should normally use the t-test. As another example, you can run the t-test on synthetic data:

a <- rnorm(100, mean=0, sd=1)
b <- rnorm(150, mean=0.2, sd=1)
print(t.test(a,b))

In the above code, you generated random numbers into vectors a and b, which have slightly different means. Because the numbers are random, your output will vary from run to run. One run of the t-test produced the following:

	Welch Two Sample t-test

data:  a and b
t = -1.5268, df = 223.86, p-value = 0.1282
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.45642756  0.05791578
sample estimates:
 mean of x  mean of y 
0.02847865 0.22773454

Even though you know that the numbers were generated with different means, the difference is so small and the number of samples is so limited that the t-test gave a p-value of 0.1282, which is not small enough to reject the null hypothesis.
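How many samples would be enough? One way to estimate this is the power.t.test() function from R's built-in stats package. Note the assumptions in this sketch: power.t.test() models the classic two-sample t-test with equal group sizes and equal variances, which is only an approximation of the unequal-size Welch setup above, and the 80% power target is a common convention rather than a requirement.

```r
# Samples per group needed to detect a mean difference of 0.2 (sd = 1)
# with 80% power at the 5% significance level
pw <- power.t.test(delta = 0.2, sd = 1, sig.level = 0.05, power = 0.8)
print(ceiling(pw$n))

# Approximate power of the test with only 100 samples per group
pw100 <- power.t.test(n = 100, delta = 0.2, sd = 1, sig.level = 0.05)
print(pw100$power)
```

The required sample size is in the hundreds per group, while 100 samples per group detect the difference well under half of the time. This is consistent with the non-significant result above.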

Usually you would require a p-value below 0.05 (and sometimes 0.01) to reject the null hypothesis. This is why designing the null and alternative hypotheses matters: Not only does it affect how the tests are computed, but you also favor the null hypothesis until there is strong enough evidence to rule it out.
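As an illustration of how the choice of alternative hypothesis changes the computation, t.test() accepts an `alternative` argument. The sketch below compares the default two-sided test with a one-sided test whose alternative is that the mean of a is less than the mean of b. The seed is arbitrary and the data are simulated.

```r
set.seed(7)  # arbitrary seed, only for repeatability
a <- rnorm(100, mean = 0, sd = 1)
b <- rnorm(150, mean = 0.2, sd = 1)

# Default: the alternative is that the true means differ in either direction
two_sided <- t.test(a, b)

# One-sided: the alternative is that the true mean of a is less than that of b
one_sided <- t.test(a, b, alternative = "less")

print(c(two_sided = two_sided$p.value, one_sided = one_sided$p.value))
```

When the sample mean of a is indeed below that of b, the one-sided p-value is half of the two-sided one. The alternative should therefore be fixed before looking at the data, not chosen afterwards to get a smaller p-value.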

Other Statistical Tests

The test above is called the “two-sample t-test” because you provided two samples. There is also a one-sample t-test, as shown below:

a <- rnorm(100, mean=0, sd=1)
print(t.test(a, mu=0.5))

The output of the above would be the following:

	One Sample t-test

data:  a
t = -3.5955, df = 99, p-value = 0.0005069
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
 -0.1213488  0.3205669
sample estimates:
 mean of x 
0.09960905

Here you can see that the test rejected the null hypothesis, as it reported a small p-value. This means you should not assume the numbers in vector a have a true mean of 0.5 (the value you passed as mu=0.5 to the t.test() function). R reported a mean of about 0.1 at the end of the report, but that is the sample mean, which is only an estimate of the unobservable true mean. The t-test tells you that the true mean is unlikely to be 0.5.

The one-sample t-test is useful not for comparing two sets of data, but for checking whether your data fits your presumed expectation.
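The one-sample statistic is simple enough to compute by hand, as the sketch below shows. The names `mu0`, `t_stat`, and `p_val` are introduced here for illustration, and the seed is arbitrary.

```r
set.seed(3)  # arbitrary seed, only for repeatability
a <- rnorm(100, mean = 0, sd = 1)
mu0 <- 0.5   # the presumed mean under the null hypothesis

# t statistic: distance of the sample mean from mu0, in standard errors
t_stat <- (mean(a) - mu0) / (sd(a) / sqrt(length(a)))
df <- length(a) - 1

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df)

res <- t.test(a, mu = mu0)
print(c(manual = p_val, builtin = res$p.value))
```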

Besides the t-test, another related and equally useful test is the F-test. Below is an example:

a <- rnorm(100, mean=0.5, sd=1.0)
b <- rnorm(150, mean=0.5, sd=1.5)
print(var.test(a, b))

The output of the above is as follows:

	F test to compare two variances

data:  a and b
F = 0.55678, num df = 99, denom df = 149, p-value = 0.00198
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.3905882 0.8043323
sample estimates:
ratio of variances 
         0.5567847

While the t-test compares means, the F-test compares variances. In R, it is performed with the var.test() function. It is useful, for example, when two regression models produce similar MSE: the one whose errors have lower variance is preferable, as that model is more accurate in the worst case.

Note that the F-test assumes the data are normally distributed. In practice this is often the case, but the result may be distorted if the assumption does not hold.
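One possible way to check this assumption before running the F-test is the Shapiro-Wilk test, available as shapiro.test() in R's built-in stats package. This is a suggestion added here, not part of the original workflow. Its null hypothesis is that the sample comes from a normal distribution. The vectors `normal_data` and `skewed_data` are simulated for illustration.

```r
set.seed(11)  # arbitrary seed, only for repeatability

# A sample that really is Gaussian, and a clearly skewed (exponential) one
normal_data <- rnorm(100, mean = 0.5, sd = 1.0)
skewed_data <- rexp(100, rate = 1)

sw_normal <- shapiro.test(normal_data)
sw_skewed <- shapiro.test(skewed_data)

# A small p-value is evidence against normality
print(c(normal = sw_normal$p.value, skewed = sw_skewed$p.value))
```

A small p-value for a sample suggests the F-test result on that data should be treated with caution.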

In the example above, the vectors a and b are of different sizes and were generated using the Gaussian random number generator in R with different standard deviations but the same mean. The F-test detects the difference by reporting a p-value of 0.00198, which is small enough to reject the null hypothesis. Formally, the F-test’s null hypothesis is that the ratio of the variances of the two sets of data is 1. This is why the ratio of variances is reported at the end of the output.
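The F statistic is simply the ratio of the two sample variances, and the p-value comes from the F distribution. The sketch below recomputes both by hand and compares them with var.test(). The seed is arbitrary, so the numbers will differ from the output above.

```r
set.seed(5)  # arbitrary seed, only for repeatability
a <- rnorm(100, mean = 0.5, sd = 1.0)
b <- rnorm(150, mean = 0.5, sd = 1.5)

# F statistic: ratio of the sample variances
f_stat <- var(a) / var(b)
df1 <- length(a) - 1   # numerator degrees of freedom
df2 <- length(b) - 1   # denominator degrees of freedom

# Two-sided p-value: twice the smaller tail probability
p_lower <- pf(f_stat, df1, df2)
p_val <- 2 * min(p_lower, 1 - p_lower)

res <- var.test(a, b)
print(c(manual = p_val, builtin = res$p.value))
```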

As an exercise, you can modify the programs above to generate datasets of different sizes and see how well these tests perform. As a general rule, statistical tests have more power to detect a real difference when you provide more data. With too little data, it is harder for the tests to reject the null hypothesis.
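One way to approach this exercise is a small simulation that repeats the t-test many times at each sample size and counts how often the null hypothesis is rejected. The helper `reject_rate()`, the sample sizes, and the number of trials are all choices made for this sketch.

```r
set.seed(123)  # arbitrary seed, only for repeatability

# Fraction of trials in which the t-test rejects the null hypothesis
# when the true means differ by delta (both samples have sd = 1)
reject_rate <- function(n, delta = 0.2, trials = 500, alpha = 0.05) {
  rejected <- replicate(trials, {
    x <- rnorm(n, mean = 0)
    y <- rnorm(n, mean = delta)
    t.test(x, y)$p.value < alpha
  })
  mean(rejected)
}

sizes <- c(20, 100, 500)
rates <- sapply(sizes, reject_rate)
names(rates) <- sizes
print(rates)
```

The rejection rate should climb as the sample size grows, even though the true difference between the means stays the same.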

Summary

In this post, you learned how to perform statistical tests in R. Specifically, you learned:

What null and alternative hypotheses are in statistics
How to use the p-value to reject the null hypothesis
How to use the t-test and F-test to compare the means and variances of two datasets

6 Responses to Statistical Tests in R

George September 13, 2023 at 5:29 pm #

Hi Jason!

I want to ask you. In the second example,

a <- rnorm(100, mean=0, sd=1)
b <- rnorm(150, mean=0.2, sd=1)
print(t.test(a,b))

you say:

"p-value of 0.1282, which is not small enough to reject the null hypothesis."

So, we have to accept the null hypothesis. Why don't we accept it?

- Adrian Tam September 16, 2023 at 4:45 am #
  
  Hi George,
  
  You should accept the null hypothesis by default unless you have strong evidence to reject it. This is how a statistical test normally expects you to do. Therefore, designing what a null hypothesis is and what’s its alternative is important. And also, you have to set a threshold for how strong the evidence is required. Often, we expect a p-value below 0.05 to be strong.
  
  Hope this helps.
  
Rimitti September 22, 2023 at 5:30 pm #

Why use % pipe (example in Keras…).
Python programmers will find almost similar code.

SLC September 25, 2023 at 7:32 pm #

@Rimitti

But, the author is explaining how to do things in R. I’m sure that there are similar ways to do these various things in a lot of different languages.

Yaswanth October 25, 2023 at 2:53 pm #

Insightful

Vyde October 25, 2023 at 2:54 pm #

Resourceful
