In the previous section, we calculated a difference between pink- and white-petaled RILs at site SM of \(1.76 - 0.733 = 1.027\) visits. We also quantified uncertainty in this estimate with a standard error of 0.262. Recall the equation for the t-statistic:
\[t = \frac{\bar{x}-\mu_0}{s_{\bar{x}}} = \frac{(\bar{x}_1-\bar{x}_2)-0}{s_{\bar{x}_1-\bar{x}_2}}\]

In this case,

\[t = \frac{1.76 - 0.733}{0.262} \approx 3.91\]

(The exact value, 3.9082, comes from the unrounded means and standard error.)
Since we had 105 degrees of freedom, our critical t-value will be just a hair greater than 1.96 (about 1.98). Because our observed t-value of \(\approx 4\) is way bigger than our critical t-value of \(\approx 2\), we strongly reject the null hypothesis that pink- and white-flowered Clarkia xantiana subspecies parviflora RILs at site SM are visited equally by pollinators.
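To make the arithmetic concrete, here is the same calculation in base R. The rounded means and standard error are taken straight from the text above (not recomputed from the raw data), so the t-value differs slightly from R's exact output:

```r
# t statistic from the rounded summary values in the text
diff_means <- 1.76 - 0.733          # difference in group means
se_diff    <- 0.262                 # standard error of the difference
t_obs      <- diff_means / se_diff  # observed t (about 3.92 with rounded inputs)

# two-sided critical value at alpha = 0.05 with 105 degrees of freedom
t_crit <- qt(0.975, df = 105)       # about 1.98

c(t_obs = t_obs, t_crit = t_crit)
```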
We can use the “formula” syntax in the t.test() function to have R test this null for us. Note that we set var.equal = TRUE. We see that this provides p-values and 95% confidence intervals identical to what we calculated ourselves:
t.test(visits ~ petal_color, data = SR_visits, var.equal = TRUE)
Two Sample t-test
data: visits by petal_color
t = 3.9082, df = 105, p-value = 0.0001649
alternative hypothesis: true difference in means between group pink and group white is not equal to 0
95 percent confidence interval:
0.505242 1.545845
sample estimates:
mean in group pink mean in group white
1.758946 0.733402
Again, we can make this output tidy / easier to process in R with the tidy() function in the broom package:
For some reason that I don’t understand, tidy() labels the column with the “degrees of freedom” as “parameter”. smdh.
library(broom)
t.test(visits ~ petal_color, data = SR_visits, var.equal = TRUE) |>
  tidy()
estimate  estimate1  estimate2  statistic  p.value  parameter  conf.low  conf.high  method             alternative
   1.026      1.759      0.733      3.908  1.65e-4        105     0.505      1.546  Two Sample t-test  two.sided
If our data met assumptions we would be done. But they do not. So let’s validate our conclusions with tests whose assumptions better match our data.
A two-sample t-test is often robust
Now that we know we can reject the null when the data are transformed to meet the assumptions of the two-sample t-test, and we have a bootstrap estimate of the 95% confidence interval, we can complement these analyses with an analysis of the untransformed data. This step is not always necessary or reliable, but here it helps us tell a coherent story.
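As a refresher, a percentile bootstrap of the difference in means takes only a few lines of base R. The toy Poisson samples below stand in for the real SR_visits data (group sizes and means are loosely based on the summaries above), so the exact interval will differ from the one reported earlier:

```r
set.seed(7)
pink  <- rpois(54, 1.76)   # toy stand-in for pink-petaled RIL visits
white <- rpois(53, 0.73)   # toy stand-in for white-petaled RIL visits

# resample each group with replacement and recompute the difference in means
boot_diffs <- replicate(10000,
  mean(sample(pink, replace = TRUE)) - mean(sample(white, replace = TRUE)))

quantile(boot_diffs, c(0.025, 0.975))   # percentile bootstrap 95% CI
```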
Now let’s use R to test the null hypothesis and estimate confidence intervals. For fun, let’s compare results from an analysis assuming equal variances to one that does not:
library(broom)
library(dplyr)   # for bind_rows()
bind_rows(
  t.test(visits ~ petal_color, data = SR_visits, var.equal = TRUE)  |> tidy(),
  t.test(visits ~ petal_color, data = SR_visits, var.equal = FALSE) |> tidy()
)
estimate  estimate1  estimate2  statistic  p.value  parameter  conf.low  conf.high  method                   alternative
   1.026      1.759      0.733      3.908  1.65e-4      105.0     0.505      1.546  Two Sample t-test        two.sided
   1.026      1.759      0.733      4.047  1.61e-4       89.6     0.522      1.529  Welch Two Sample t-test  two.sided
Welch’s two-sample t-test does not assume equal variances and, in practice, is essentially never worse than the standard two-sample t-test (that’s why R uses it as the default).
However, the standard two-sample t-test is often good enough. It has the advantages of being mathematically simpler and of connecting directly to the broader linear-modeling framework, so we usually teach it first. In practice the difference between the two tests rarely matters, except when the group variances are massively different.
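If you are curious where Welch’s fractional degrees of freedom come from, the Welch–Satterthwaite approximation can be computed directly from group summary statistics. Here is a base-R sketch; the sample sizes and standard deviations below are illustrative stand-ins, not the actual SR values:

```r
# Welch–Satterthwaite degrees of freedom from group summary statistics
welch_df <- function(s1, n1, s2, n2) {
  v1 <- s1^2 / n1                 # squared standard error, group 1
  v2 <- s2^2 / n2                 # squared standard error, group 2
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

# with unequal spreads, df falls below the pooled n1 + n2 - 2 = 105
welch_df(s1 = 1.6, n1 = 54, s2 = 1.0, n2 = 53)
```

When the two standard deviations (and sample sizes) are equal, the formula collapses back to the pooled df of n1 + n2 - 2.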
Permutation test
Of course, rather than using a parametric test, we could find a p-value by a permutation test. As I introduced previously, permutation tests build a null sampling distribution by shuffling the relation between explanatory and response variables, and comparing our actual difference in means to this permuted distribution. One way to do this in R is with the pipeline in the infer package:
library(infer)

# observed difference in means
raw_diff <- SR_visits |>
  specify(visits ~ petal_color) |>       # response ~ explanatory
  calculate(stat = "diff in means",      # difference in group means
            order = c("pink", "white"))

# PERMUTE
perm_SR_visits <- SR_visits |>
  specify(visits ~ petal_color) |>       # specify response ~ explanatory
  hypothesize(null = "independence") |>
  generate(reps = 1000000, type = "permute") |>
  calculate(stat = "diff in means", order = c("pink", "white"))

perm_SR_visits |>
  get_p_value(obs_stat = pull(raw_diff), direction = "two-sided")
# A tibble: 1 × 1
  p_value
    <dbl>
1 0.00006
Here, our p-value of 0.00006 is a bit smaller than that estimated from our t-test. But this would not meaningfully change any of our conclusions.
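For intuition, the same permutation logic can be written in a few lines of base R, without infer. Again, toy Poisson data stand in for SR_visits, so the p-value will not match exactly:

```r
set.seed(42)
visits      <- c(rpois(54, 1.76), rpois(53, 0.73))   # toy stand-in data
petal_color <- rep(c("pink", "white"), c(54, 53))

obs_diff <- mean(visits[petal_color == "pink"]) -
            mean(visits[petal_color == "white"])

# shuffle the labels to break the visits ~ petal_color association
perm_diffs <- replicate(10000, {
  shuffled <- sample(petal_color)
  mean(visits[shuffled == "pink"]) - mean(visits[shuffled == "white"])
})

mean(abs(perm_diffs) >= abs(obs_diff))   # two-sided permutation p-value
```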
Writing up results
Now we can write up our results. Note this takes some thinking because we had to make some decisions. Here’s my attempt.
Note that I chose to report values from Welch’s t-test (which does not assume equal variance) on untransformed data. To me, this strikes a balance between interpretability and statistical correctness. Although the result is essentially the same as above, the conservatism of Welch’s t-test may help build readers’ trust.
At the Sawmill Road site, pink-flowered Clarkia xantiana ssp. parviflora RILs received, on average, more pollinator visits during a 15-minute observation than white-flowered RILs (mean pink = 1.76, mean white = 0.73). The mean difference of 1.03 visits was statistically significant (Welch’s t = 4.05, df = 89.6, p = 0.00016) and associated with a moderate-to-large effect size (Cohen’s d = 0.75). Results were robust to the right skew in the data: bootstrap confidence intervals closely matched analytic ones, and a permutation-based p-value (0.00006) supported the same conclusion. We reject the null and conclude that pink-flowered plants attract more pollinators than white-flowered plants at this site.
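For reference, a Cohen’s d like the one reported above can be reproduced from summary statistics with the pooled-standard-deviation formula. The standard deviations below are illustrative values chosen to give d near the reported 0.75, since the raw data are not reproduced here:

```r
# Cohen's d: difference in means divided by the pooled standard deviation
cohens_d <- function(m1, s1, n1, m2, s2, n2) {
  sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
  (m1 - m2) / sp
}

cohens_d(m1 = 1.759, s1 = 1.55, n1 = 54,
         m2 = 0.733, s2 = 1.15, n2 = 53)   # about 0.75 with these toy SDs
```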