17. Significance groups

Motivating Scenario: You have conducted your post-hoc tests, so for each pair of groups you now know whether the difference in means is statistically significant. Now you want to summarize all these pairwise differences effectively. Our goal here is to simplify a complex table of pairwise comparisons into a compact, easy-to-read, interpretable summary.

Learning Goals: By the end of this subchapter, you should be able to:

Derive and label significance groups (“a”, “b”, etc.) from pairwise comparison results.
Explain the meaning of overlapping letters (e.g., “a,b”) in significance group displays.
Visualize significance groups on plots with ggplot to communicate post-hoc results clearly.

comparison	p.adj	significant
S22-SR	<0.0001	TRUE
SM-SR	0.3378	FALSE
S6-SR	<0.0001	TRUE
SM-S22	<0.0001	TRUE
S6-S22	0.4859	FALSE
S6-SM	<0.0001	TRUE

So now we know which pairs differ from one another! But there is a real mental burden in trying to make sense of results from many pairwise tests. Defining “significance groups” helps us present the results of a post-hoc test: we assign the same letter to groups that do not differ significantly, and different letters to groups that do. In our case:

Sites SR and S22 differ significantly from one another, so we put them in different significance groups.
- We’ll assign SR to group “a”.
- We’ll assign S22 to group “b”.
Site SM does not differ significantly from site SR, but does differ significantly from site S22.
- We’ll assign SM to group “a”.
Site S6 differs significantly from sites SR and SM, but does not differ significantly from site S22.
- We’ll assign S6 to group “b”.
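The hand derivation above can be sketched in a few lines of base R. This is an illustrative brute-force approach and not necessarily the algorithm cld() uses internally. It treats each letter as a largest possible set of sites whose members are all mutually non-significant, and it gives every site each letter whose set contains it. The only input is the `significant` column of the table above. The object names (`same`, `cliques`, `sig_letters`) are our own, and because the code enumerates every subset it is only practical for a handful of groups.

```r
# `significant` column from the pairwise table, keyed by comparison name
significant <- c("S22-SR" = TRUE, "SM-SR"  = FALSE, "S6-SR" = TRUE,
                 "SM-S22" = TRUE, "S6-S22" = FALSE, "S6-SM" = TRUE)

# Order in which groups receive letters (SR first, so it gets "a")
groups <- c("SR", "S22", "SM", "S6")
k <- length(groups)

# same[i, j] is TRUE when groups i and j do NOT differ significantly
same <- matrix(TRUE, k, k, dimnames = list(groups, groups))
for (cmp in names(significant)) {
  pair <- strsplit(cmp, "-", fixed = TRUE)[[1]]
  same[pair[1], pair[2]] <- !significant[[cmp]]
  same[pair[2], pair[1]] <- !significant[[cmp]]
}

# Every non-empty subset of groups, as a logical membership vector
subsets <- lapply(seq_len(2^k - 1), function(i) {
  as.integer(intToBits(i))[seq_len(k)] == 1L
})

# A candidate letter is a set of groups that are all mutually "same"
cliques <- Filter(function(s) all(same[s, s]), subsets)

# Keep only sets that are not contained in a larger such set
is_maximal <- vapply(cliques, function(s) {
  !any(vapply(cliques, function(t) all(t[s]) && sum(t) > sum(s), logical(1)))
}, logical(1))
cliques <- cliques[is_maximal]

# Hand out letters in the order of each set's first member
cliques <- cliques[order(vapply(cliques, which.max, integer(1)))]

# Each group gets every letter whose set contains it
sig_letters <- vapply(seq_len(k), function(g) {
  in_set <- vapply(cliques, function(s) s[g], logical(1))
  paste(letters[which(in_set)], collapse = ",")
}, character(1))
names(sig_letters) <- groups
sig_letters
```

With the sites listed as SR, S22, SM, S6, this reproduces the letters we assigned by hand: SR and SM get “a”, and S22 and S6 get “b”.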

These letter-based summaries are handy for communicating complex post-hoc results. But remember: they are shorthand for the results of your post-hoc tests, not a new analysis. We can communicate these groups by adding them to a plot:

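library(dplyr)    # group_by(), summarise(), mutate(), case_when()
library(ggplot2)  # ggplot(), geom_boxplot(), geom_text()
library(ggforce)  # geom_sina()

# Assumes clarkia_hz (with columns site and admix_proportion) is already loaded.
# One row per site: summary statistics plus the hand-assigned significance group.
hz_summary <- clarkia_hz |>
  group_by(site) |>
  summarise(mean_admix_proportion = mean(admix_proportion),
            mean_plus_two_sd = mean_admix_proportion + 2 * sd(admix_proportion)) |>
  mutate(sig_group = case_when(site %in% c("SR", "SM") ~ "a",
                               site %in% c("S22", "S6") ~ "b"))

clarkia_hz |>
  ggplot(aes(x = site, y = admix_proportion, color = site)) +
  geom_boxplot(outliers = FALSE) +  # hide boxplot outliers (ggplot2 >= 3.5.0); sina shows every point
  geom_sina(size = 5, alpha = .64) +
  # Letter labels at y = 0.05, nudged to the right of each box
  geom_text(data = hz_summary, aes(y = .05, label = sig_group),
            size = 8, position = position_nudge(x = .3)) +
  theme(legend.position = "none")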

Tricky cases

comparison	significant
X-Y	TRUE
X-Z	FALSE
Y-Z	FALSE

Consider the hypothetical (and not uncommon) post-hoc outcome shown in the table above. Here:

Groups X and Y significantly differ. So we put them in different significance groups.
- We’ll assign X to group “a”.
- We’ll assign Y to group “b”.
But Z differs from neither group. So,
- We’ll assign Z to groups “a” and “b” because it does not significantly differ from either group. In a plot we would show this as “a,b”.

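When you see a category with two letters (like “a,b”), it means that category does not differ significantly from the categories in group “a” or from those in group “b”. However, every category assigned only to “a” differs significantly from every category assigned only to “b”. More generally, two categories that share at least one letter do not differ significantly, and two categories that share no letter do.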
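One way to convince yourself that a letter display is right is to check it against this rule, one comparison at a time. The helper below, `check_display()`, is hypothetical and not part of any package. For each comparison, it asks whether “shares a letter” matches “not significantly different”. The input is the X/Y/Z outcome from the table above.

```r
# Hypothetical post-hoc outcome from the X/Y/Z table
significant <- c("X-Y" = TRUE, "X-Z" = FALSE, "Y-Z" = FALSE)

# For each comparison, TRUE if the display agrees with the test:
# two groups should share a letter exactly when they do NOT differ significantly
check_display <- function(display, significant) {
  vapply(names(significant), function(cmp) {
    pair <- strsplit(cmp, "-", fixed = TRUE)[[1]]
    shares_letter <- length(intersect(display[[pair[1]]], display[[pair[2]]])) > 0
    shares_letter == !significant[[cmp]]
  }, logical(1))
}

# Z carries both letters, as in the text: every comparison agrees
check_display(list(X = "a", Y = "b", Z = c("a", "b")), significant)

# Giving Z only "a" would wrongly imply that Y and Z differ
check_display(list(X = "a", Y = "b", Z = "a"), significant)
```

The second call flags the Y-Z comparison, which is why Z needs both letters.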

R automation

As you might imagine, working all this out by hand can get messy. Luckily, you can pipe your glht() output into the cld() function from the multcomp package, which assigns categories to significance groups for you:

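library(multcomp)  # glht(), mcp(), cld()

# Fit the linear model, run all Tukey pairwise comparisons among sites,
# then convert the results into significance-group letters.
# mcp() is designed for factor terms, so site should be stored as a factor.
lm(admix_proportion ~ site, data = clarkia_hz) |>
  glht(linfct = mcp(site = "Tukey")) |>
  cld()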

 SR S22  SM  S6 
"a" "b" "a" "b"
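You can pull the letters out of the cld() result and join them onto a per-group summary, rather than typing them into case_when() by hand. The sketch below uses R's built-in PlantGrowth data as a stand-in, because clarkia_hz is not bundled here. It assumes the letters are stored in the `mcletters$Letters` component of the object cld() returns, which is our understanding of what the printed output shows. If your multcomp version differs, inspect the object with str(). The names `fit`, `cld_out`, `group_letters` and `letter_table` are our own.

```r
library(multcomp)  # glht(), mcp(), cld()

# PlantGrowth ships with R: plant weight for a control and two treatments.
# Its group column is already a factor.
fit <- lm(weight ~ group, data = PlantGrowth)

cld_out <- fit |>
  glht(linfct = mcp(group = "Tukey")) |>
  cld()

# Assumption: the named letter vector lives in cld_out$mcletters$Letters
group_letters <- cld_out$mcletters$Letters

# A lookup table that could be joined onto a per-group summary
# (like hz_summary) in place of a hand-written case_when()
letter_table <- data.frame(group     = names(group_letters),
                           sig_group = unname(group_letters))
letter_table
```

Joining `letter_table` onto a summary by group (for example with dplyr's left_join()) keeps the plot labels in sync with the test results whenever the data change.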