• 15. Two t Summary – Applied Biostatistics

Chapter summary

We can naturally build from our description and analysis of a single sample to the more common scenario of comparing two samples. When data meet the assumptions of independence, lack of bias, being well summarized by the mean, normal residuals, and equal variance between groups, we can use the standard t-test machinery (with a slightly different calculation for the standard error) to test null hypotheses and estimate uncertainty. When group variances differ, we can use Welch’s t-test for unequal variance.

Chatbot tutor

Please interact with this custom chatbot (link here). I have made it to help you with this chapter. I suggest interacting with at least ten back-and-forths to ramp up and then stopping when you feel like you got what you needed from it.

Practice Questions

Try these questions! By using the R environment you can work without leaving this “book”. I even pre-loaded all the packages you need!

SETUP: There are plenty of reasons to choose your partner carefully. In much of the biological world a key reason is “evolutionary fitness” - presumably organisms evolve to choose mates that will help them make more (or healthier) children. This could, for example explain Kermit’s resistance in one of the more complex love stories of our time, as frogs and pigs are unlikely to make healthy children..

To evaluate this idea Swierk & Langkilde (2019), identified a male’s top choice out of two female wood frogs and then had them mate with the preferred or unpreferred female and counted the number of hatched eggs.

The R code below loads the data. It is meant to be loaded onto this console below too, but if something goes wrong, just paste this into your R console outside of this book

library(dplyr);    library(readr);  library(ggplot2);   library(janitor)
frog_link <- "https://raw.githubusercontent.com/ybrandvain/biostat/master/data/Swierk_Langkilde_BEHECO_1.csv"
frogs <- read_csv(frog_link) |>
  clean_names()

Q1. Complete the code above, what pattern do you see?

Similar numbers (or slight differences) of hatched eggs in preferred and nonpreferred treatments Way more hatched eggs in the preferred treatment Way more hatched eggs in the nonpreferred treatment

Panel A shows a histogram of hatched eggs across all treatments, with most values clustered around 250 but spread up to 1000. Panel B splits this into two histograms: the nonpreferred treatment (red, left) with a broader, flatter spread, and the preferred treatment (teal, right) with a strong peak near 250 hatched eggs. Both panels use the same x-axis (0–1000 hatched eggs) and y-axis (count). — Figure 2: Histograms of the number of hatched eggs across treatments. (A) Distribution when pooling both treatments together. (B) Distributions shown separately for the nonpreferred and the preferred treatments.

Q2. Consider Figure 2, above. Which plot is more useful to evaluate the normality assumption of the two sample t-test?

A) Both treatments combined. B) Separate data by treatment. It depends, you should integrate information from both. Neither plot is appropriate

Q3. Consider the output of the code above. Should you feel comfortable assuming equal variance?

The variance is not equal – the variance in the “preferred” group is 1.20 times greater than the non-preferred group. But remember that we don’t worry about differences in variance until the difference is more than 4X.

Q4. For now we’ll go on with a t-test approach, regardless. So find the pooled variance in the R interface above .

\[s^2_p = \frac{\text{df}_1 \times s^2_1 + \text{df}_2 \times s^2_2}{\text{df}_1+\text{df}_2} \text{, and df}_i = n_i-1\]

We can copy and paste R output into a calculator (that calculator might be R). I think this is best for understanding.

frogs |> 
  group_by(treatment)|>
  summarise(MEAN = mean(hatched_eggs),
            VAR  = var(hatched_eggs),
            N    = n())

# A tibble: 2 × 4
  treatment     MEAN    VAR     N
  <chr>        <dbl>  <dbl> <int>
1 nonpreferred  414. 56118.    29
2 preferred     345. 67412.    27

# Pooled variance
((56118*(29-1)) +(67412*(27-1))) / (29 + 27 - 2)

[1] 61555.85

OR

We do this in one long workflow in R. I think this is the best practice for getting exact answers.

frogs |> 
    group_by(treatment)|>
    summarise(MEAN = mean(hatched_eggs),
              VAR  = var(hatched_eggs),
              N    = n())|>
    summarise(pooled_var = sum((N-1)*VAR) / (sum(N)-2) )

# A tibble: 1 × 1
  pooled_var
       <dbl>
1     61556.

Q5. Given the answers above, characterize this effect size.

Not worth reporting (Cohen's D is less than 0.01) Tiny (Cohen's D is between 0.01 and 0.20) Small (Cohen's D is between 0.20 and 0.50) Medium (Cohen's D is between 0.50 and 0.80) Large (Cohen's D is between 0.80 and 1.10) Very large (Cohen's D is between 1.20 and 2.00) Huge (Cohen's D is greater than 2.00)

Q6. From this Cohen’s D value, we conclude that:

We should reject the null hypothesis We should fail to reject the null hypothesis The null is false The null is true Even if statistically significant, this effect size would be practically negligible.

Q7 State the null hypothesis

Q8 State the alternative hypothesis

Q9. Which values are NOT IN the 95% confidence interval or the difference (i.e. preferred - nonpreferred) select all that apply:

t.test(hatched_eggs ~ treatment, 
       data = frogs, 
       var.equal = TRUE)


    Two Sample t-test

data:  hatched_eggs by treatment
t = 1.0372, df = 54, p-value = 0.3043
alternative hypothesis: true difference in means between group nonpreferred and group preferred is not equal to 0
95 percent confidence interval:
 -64.20569 201.84554
sample estimates:
mean in group nonpreferred    mean in group preferred 
                  413.9310                   345.1111

Q10) The output of the code above shows a p-value of $\approx 0.30$, so we fail to reject the null hypothesis. The output also shows a t-value of $\approx 1$.

Q10A) If all I knew was that $t \approx 1$, but did not kow the degrees of freedom, would I be able to reject the null at $\alpha = 0.05$?

Q10B) Say for a different dataset I knew that $t = 2$ (rather than one) but did not kow the degrees of freedom, would I be able to reject the null at $\alpha = 0.05$?

When the absolute value of t is less than 1.96 we will fail to reject the null hypothesis at $\alpha = 0.05$, regardless of the degrees of freedom.

Whenever the absolute value of t is greater than 1.96 we may or may not fail to reject the null hypothesis at $\alpha = 0.05$, depending on the degrees of freedom.

If we have more than five degrees of freedom, we will reject the null when the absolute value of t is greater than 2.5.

Q11. We fail to reject the null hypothesis. This means the null is .

Q12. If you had to make a bet, the safer bet in this case is that the null hypothesis is

In practice, it is very unlikely that two population means are exactly equal. Even if the true difference is small, it is rarely zero. Statistical tests help us decide whether we have enough evidence to detect that difference. Not whether it exists at all.

This is why we do not accept the null - we simply fail to reject it.

📊 Glossary of Terms

Assumptions

Independence: Each observation must be independent of others; in two-sample designs, each group’s values should not influence the other.
Unbiased Sampling: Data should be collected without systematic bias so that results generalize to the population.
Normality of Residuals: Within each group, the distribution of residuals (observed – group mean) should be approximately normal.
Homoscedasticity (Equal Variance): The spread of values should be roughly the same in each group. In practice, differences smaller than a 4:1 ratio usually have little effect.

Summaries & Estimates

Group Mean ($\bar{x}$): The average of all observations in a group.
Variance ($s^2$): The spread of observations within a group, calculated as $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$.
Pooled Variance ($s_p^2$): A weighted average of group variances used in the standard two-sample t-test: $s_p^2 = \frac{df_1 s_1^2 + df_2 s_2^2}{df_1 + df_2}$
Difference in Means ($\Delta \bar{x}$): The difference between the two group averages, $\Delta \bar{x} = \bar{x}_1 - \bar{x}_2$.
Cohen’s D: A standardized measure of effect size for two means: $d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}$

The Two-Sample t-Test

Test Statistic (t): Measures how many standard errors separate the observed difference from the null hypothesis difference (usually 0): $t = \frac{(\bar{x}_1 - \bar{x}_2)}{s_{\overline{x_1 - x_2}}}$
Degrees of Freedom (df): For the standard test with equal variance: $df = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$
Welch’s t-Test: A version of the t-test that does not assume equal variance. The denominator uses each group’s variance scaled by its sample size, and $df$ are approximated with the Welch–Satterthwaite equation.
Wilcoxon Rank-Sum Test: A non-parametric alternative to the two-sample t-test that compares the ranks of values between groups, effectively testing for differences in medians.

🛠️ Key R Functions

Functions

t.test(): Performs a one-sample, two-sample, or paired t-test. Use the formula syntax t.test(y ~ group, data = df, var.equal = TRUE) for the standard two-sample test. By default, Welch’s test is used (var.equal = FALSE).
wilcox.test(): Performs the Wilcoxon rank-sum test (Mann–Whitney U test), a non-parametric alternative that compares medians between two groups.
broom::tidy(): Converts test outputs (like from t.test()) into tidy data frames, making it easier to report and manipulate results.

Syntax

Two-sample t-test (equal variance assumed)

t.test(y ~ group, data = df, var.equal = TRUE)

Two-sample Welch’s t-test (default in R, no equal variance assumption)

t.test(y ~ group, data = df)

Wilcoxon rank-sum test (non-parametric alternative)

wilcox.test(y ~ group, data = df, exact = FALSE)

Additional resources

Videos:

Other resources:

Datanovia: t-test