• 14. One sample t-test in R

Motivating Example: You understand the t-distribution, and using it to estimate uncertainty and test whether a mean equals \(\mu_0\). But doing this by hand is a lot of work. If only there were an easier way 🤔. Here, we show how R can do this for us.

Learning Goals: By the end of this section, you will be able to:

Perform a one-sample t-test using the t.test() function.
Use the broom package (tidy(), augment()) to generate clean, predictable output from model objects.
Frame a one-sample t-test as a simple linear model using the lm() function.
Interpret the output of summary() and tidy() for a simple linear model.

Code to load and process data

library(tidyr)
library(dplyr)
range_shift_file <- "https://whitlockschluter3e.zoology.ubc.ca/Data/chapter11/chap11q01RangeShiftsWithClimateChange.csv"
range_shift <- read_csv(range_shift_file) |>
  mutate(uphill = elevationalRangeShift > 0)|>
  separate(taxonAndLocation, into = c("taxon", "location"), sep = "_")

Stats in R

“What does a one-sample t-test in R even mean?” you might ask, “since we just did one.” Let me explain…

A poster saying "Easy peasy lemon squeezy." — Figure 1: R can make things easy! Image made with ChatGPT.

We just worked through the calculations for a confidence interval and a hypothesis test step by step. But R can do the entire calculation in a single line. Now that you know how it works and what R is doing, I can show you how to have R do this work for you.

`t.test()`

The t.test() function is the easiest way to run a one-sample t-test in R. There are two relevant arguments:

x: A vector of observations.
mu: \(\mu_0\) – i.e. the expected value of x under the null hypothesis.

Note: Because x must be a vector, I pull() the column out of its tibble.

t.test(x = pull(range_shift,elevationalRangeShift) , mu = 0)


    One Sample t-test

data:  pull(range_shift, elevationalRangeShift)
t = 7.1413, df = 30, p-value = 0.00000006057
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 28.08171 50.57635
sample estimates:
mean of x 
 39.32903

Tidying up R model outputs

The output of t.test() includes:

The test statistic, t.
The degrees of freedom.
The p-value.
The 95% confidence interval.
The sample estimate (in this case the mean).

While these results match our calculations (and are faster and easier to do), there is one problem – they are a mess. They are stored in a format that is easy for humans to read, but harder for a computer to process. The tidy() function in the broom package offers a convenient format:

library(broom)
t.test(x = pull(range_shift,elevationalRangeShift) , mu = 0) |>
  tidy()

t.test() output piped through broom's tidy() function
estimate	statistic	p.value	parameter	conf.low	conf.high	method	alternative
39.33	7.14	6.06e-08	30	28.08	50.58	One Sample t-test	two.sided

The statistic here is the t-value. The statistic column will contain the value of whichever test statistic your analysis produces.

The general linear model framework in R

An alternative approach to the t.test() function is to use R’s framework for general linear models. Remember that a linear model predicts the value of the response variable for the \(i^{th}\) observation \(\hat{Y_i}\) as a function of things in our model. For a one sample t-test, our model contains only the mean, which we represent as the “intercept”, \(a\):

\[\hat{Y_i} = a\]

Actual observations (\(Y_i\)) will deviate from model predictions (\(\hat{Y_i}\)). This deviation, known as the “residual” is denoted by \(e_i\):

See section on Residuals in the linear model chapter for our initial introduction to residuals.

\[Y_i = \hat{Y_i} + e_i\] \[Y_i = a+e_i\]

The lm() function in R builds such linear models. If we are only modeling the mean, the syntax is:

lm(y ~ 1) if y is a vector, or
lm(y ~ 1, data = dataset) if y is a column in the dataset.

As we’ve seen previously, the 1 tells R to fit an intercept-only model. The estimate for this intercept will be the mean of our response variable:

See the section on Mean in the linear model chapter for our initial introduction of the mean as a linear model.

lm(elevationalRangeShift ~ 1, data = range_shift)


Call:
lm(formula = elevationalRangeShift ~ 1, data = range_shift)

Coefficients:
(Intercept)  
      39.33

This may seem like overkill for a one-sample t-test, but the linear model framework generalizes to regression, ANOVA, and more. Seeing a one-sample t-test as a linear model prepares us for the models we’ll meet next.

`summary()` provides p-values and test statistics

lm() just returns the mean, but it actually calculates so much more! To access information like the standard error, t-value, p-value (reported as Pr(>|t|)), and degrees of freedom, pipe the model to summary():

lm(elevationalRangeShift ~ 1, data = range_shift) |>
  summary()


Call:
lm(formula = elevationalRangeShift ~ 1, data = range_shift)

Residuals:
    Min      1Q  Median      3Q     Max 
-58.629 -23.379  -3.529  24.121  69.271 

Coefficients:
            Estimate Std. Error t value     Pr(>|t|)    
(Intercept)   39.329      5.507   7.141 0.0000000606 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 30.66 on 30 degrees of freedom

`tidy()` cleans up the output

As above, we can use broom’s tidy() function to turn this output into a tidy, tabular format

library(broom)
lm(elevationalRangeShift ~ 1, data = range_shift) |>
  tidy()

# A tibble: 1 × 5
  term        estimate std.error statistic      p.value
  <chr>          <dbl>     <dbl>     <dbl>        <dbl>
1 (Intercept)     39.3      5.51      7.14 0.0000000606

Fitted values and residuals from augment()
elevationalRangeShift	.fitted	.resid
58.9	39.329	19.571
7.8	39.329	-31.529
108.6	39.329	69.271
44.8	39.329	5.471
11.1	39.329	-28.229

`augment()` shows residuals (and more)

broom’s augment() function shows each observation’s actual value, predicted value (.fitted), and residual (.resid). I’m only showing the first three columns because they are the only ones we’ve covered so far, and I only show the first five rows to save space.

library(broom)
lm(elevationalRangeShift ~ 1, data = range_shift) |>
  augment()|>
  select(1:3)|>
  slice_head(n = 5)

Stats in R

t.test()

Tidying up R model outputs

The general linear model framework in R

summary() provides p-values and test statistics

tidy() cleans up the output

augment() shows residuals (and more)

`t.test()`

`summary()` provides p-values and test statistics

`tidy()` cleans up the output

`augment()` shows residuals (and more)