16. F and ANOVA in R
Motivating Scenario: Now that you know what goes into calculating the sums of squares, degrees of freedom, mean squares, and F, you could do it yourself. Or you could save time and effort by asking R to do this for you. Here, we run an ANOVA in R and learn how to read the results it gives back.
Learning Goals: By the end of this subchapter, you should be able to:
In the previous sections we worked through the math and theory behind ANOVA, and even used R to do the calculations step by step. I hope this helps you understand what is going on behind the scenes. Now that we know what’s going on, we can let R do the work for us. Here, I show you two ways to have R calculate the F statistic and conduct an ANOVA.
Along the way, we consider how to interpret R’s output and how to convert it to a “tidy” format with broom’s tidy() and glance() functions.
The aov() |> summary() pipeline
The aov() function is built specifically to conduct ANOVA in R. It uses the familiar formula syntax: aov(RESPONSE ~ EXPLANATORY, data = DATA)
admix_anova <- aov(admix_proportion ~ petal_color, data = clarkia_hz)
The aov() object contains the relevant sums of squares (which match our hand calculations!) and the degrees of freedom. To actually view the ANOVA table, we use summary() on the aov object. This table includes the sums of squares, mean squares, the F statistic, and the associated p-value.
summary(admix_anova)
            Df   Sum Sq   Mean Sq F value   Pr(>F)
petal_color  1 0.001831 0.0018312   76.54 3.49e-11 ***
Residuals   44 0.001053 0.0000239
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
27 observations deleted due to missingness
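To connect this table back to the hand calculations, here is a minimal sketch that rebuilds F and its p-value from sums of squares and compares them with what summary() reports. It assumes R's built-in PlantGrowth data as a stand-in (the Clarkia data are not bundled with R), and the pg_* names are made up for illustration.

```r
# Built-in example data (an assumption; not the Clarkia data used in this chapter)
data(PlantGrowth)

# Fit the ANOVA and pull the table out of summary()
pg_anova <- aov(weight ~ group, data = PlantGrowth)
pg_table <- summary(pg_anova)[[1]]

# Recompute the pieces by hand
grand_mean  <- mean(PlantGrowth$weight)
group_means <- ave(PlantGrowth$weight, PlantGrowth$group)  # each observation's group mean

ss_model <- sum((group_means - grand_mean)^2)          # among-group sum of squares
ss_error <- sum((PlantGrowth$weight - group_means)^2)  # within-group sum of squares
df_model <- nlevels(PlantGrowth$group) - 1
df_error <- nrow(PlantGrowth) - nlevels(PlantGrowth$group)

# F is the ratio of mean squares; the p-value is the upper tail of the F distribution
f_by_hand <- (ss_model / df_model) / (ss_error / df_error)
p_by_hand <- pf(f_by_hand, df_model, df_error, lower.tail = FALSE)

# Compare with the values in R's table (first row is the model term)
f_from_aov <- pg_table[["F value"]][1]
p_from_aov <- pg_table[["Pr(>F)"]][1]
all.equal(f_by_hand, f_from_aov)
all.equal(p_by_hand, p_from_aov)
```

Both comparisons should return TRUE, which is a useful sanity check whenever the table's numbers look surprising.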
broom’s tidy() function provides this output in a “tidy” format, and its glance() function shows some relevant summaries of the model (including \(R^2\)).
library(broom)
tidy(admix_anova)
# A tibble: 2 × 6
  term           df   sumsq    meansq statistic   p.value
  <chr>       <dbl>   <dbl>     <dbl>     <dbl>     <dbl>
1 petal_color     1 0.00183 0.00183        76.5  3.49e-11
2 Residuals      44 0.00105 0.0000239      NA   NA

glance(admix_anova)
# A tibble: 1 × 6
  logLik   AIC   BIC deviance  nobs r.squared
   <dbl> <dbl> <dbl>    <dbl> <int>     <dbl>
1   180. -355. -349.  0.00105    46     0.635
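The r.squared value that glance() reports is the proportion of the total sum of squares attributed to the model term, so it can be recovered from the ANOVA table alone. The sketch below does this with the sums of squares printed above, then checks the same identity on R's built-in PlantGrowth data (a stand-in dataset; the pg_* names are illustrative).

```r
# Sums of squares copied from the ANOVA table printed above
ss_model <- 0.001831   # petal_color row
ss_error <- 0.001053   # Residuals row

r2_clarkia <- ss_model / (ss_model + ss_error)
round(r2_clarkia, 3)   # 0.635, matching glance()

# The same identity on built-in data (an assumption, not the chapter's data)
pg_anova <- aov(weight ~ group, data = PlantGrowth)
pg_ss    <- summary(pg_anova)[[1]][["Sum Sq"]]   # model SS, then residual SS

r2_by_hand <- pg_ss[1] / sum(pg_ss)
r2_from_r  <- summary.lm(pg_anova)$r.squared     # aov objects are also lm objects
all.equal(r2_by_hand, r2_from_r)
```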
The lm() |> anova() pipeline
Alternatively, we can fit a linear model and then hand it to anova() to create the ANOVA table. This produces the same ANOVA table as aov() |> summary(), but the intermediate object is an lm object rather than an aov object. Sometimes one format is easier to work with than the other, but the results are the same.
lm(admix_proportion ~ petal_color, data = clarkia_hz) |>
  anova()
Analysis of Variance Table

Response: admix_proportion
            Df    Sum Sq    Mean Sq F value    Pr(>F)
petal_color  1 0.0018312 0.00183122  76.539 3.486e-11 ***
Residuals   44 0.0010527 0.00002393
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
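To see that the two pipelines agree, the following sketch runs both on the same data and compares the numbers in the resulting tables. It again assumes the built-in PlantGrowth data as a stand-in, with illustrative pg_* names.

```r
# Route 1: aov() |> summary(); the table is the first element of the summary
pg_aov_table <- aov(weight ~ group, data = PlantGrowth) |>
  summary() |>
  (\(s) s[[1]])()

# Route 2: lm() |> anova()
pg_lm_table <- lm(weight ~ group, data = PlantGrowth) |>
  anova()

# The two routes give the same degrees of freedom, sums of squares, and F
all.equal(pg_aov_table[["Df"]],      pg_lm_table[["Df"]])
all.equal(pg_aov_table[["Sum Sq"]],  pg_lm_table[["Sum Sq"]])
all.equal(pg_aov_table[["F value"]], pg_lm_table[["F value"]])
```

The anonymous function `\(s) s[[1]]` needs R 4.1 or later, the same version that introduced the native pipe used throughout this section.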
If we want more information about our model, we can pass the lm object to glance():
lm(admix_proportion ~ petal_color, data = clarkia_hz) |>
  glance()
# A tibble: 1 × 12
  r.squared adj.r.squared   sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl>   <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.635         0.627 0.00489      76.5 3.49e-11     1   180. -355. -349.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
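Several of these columns follow directly from the ANOVA table: sigma is the square root of the residual mean square, statistic is the F value, and adj.r.squared rescales \(R^2\) by the degrees of freedom. A rough sketch, using values copied from the output above and then a check on the built-in PlantGrowth data (a stand-in; pg_* names are illustrative):

```r
# Values copied from the output printed above
ms_error  <- 0.00002393   # residual mean square from the ANOVA table
r_squared <- 0.635        # r.squared from glance()
n_obs     <- 46           # nobs
df_resid  <- 44           # residual degrees of freedom

sigma_clarkia  <- sqrt(ms_error)
adj_r2_clarkia <- 1 - (1 - r_squared) * (n_obs - 1) / df_resid
signif(sigma_clarkia, 3)   # 0.00489
round(adj_r2_clarkia, 3)   # 0.627

# The same relationships on built-in data (an assumption, not the chapter's data)
pg_fit <- lm(weight ~ group, data = PlantGrowth)
pg_sum <- summary(pg_fit)
pg_tab <- anova(pg_fit)

# Residual standard error equals the square root of the residual mean square
all.equal(pg_sum$sigma, sqrt(pg_tab[["Mean Sq"]][2]))
# The overall F statistic of the model equals the F value in the ANOVA table
all.equal(unname(pg_sum$fstatistic["value"]), pg_tab[["F value"]][1])
```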
Here is a brief summary of common linear model workflows in R and their outputs.
| Task | Function(s) | Output |
|---|---|---|
| Run ANOVA directly | aov() \|> summary() | ANOVA table |
| Run via linear model | lm() \|> anova() | same ANOVA table, but model object is lm |
| Tidy results | broom::tidy() | tidy table of terms |
| Model summary | broom::glance() | includes R², df, etc. |