Motivating Scenario:
You are beginning to think about statistical models, and want to use something you understand well to ramp you up to more complex models.
Learning Goals: By the end of this subchapter, you should be able to:
Understand the mean as a linear model.
Recognize that modeling a variable with just its mean is fitting a simple linear model with no predictors.
Interpret the output of this simple, mean-only lm().
The mean again?
“But we already had a section on the mean, and besides I’ve known what a mean was for years. Why another section on this?”
You, probably.
We are beginning our tour of interpreting linear models with the mean. We start with the mean, not because I doubt that you understand what a mean is. I know that you know how to calculate the mean as \(\overline{y} = \frac{\sum y_i}{n}\). Instead, we are starting here because your solid understanding of this familiar concept will help you better understand linear models.
In a simple linear model with no predictors, the intercept is the mean, and the only other term is the residual variation (see the next section on residuals). So we predict the \(i^{th}\) individual’s value of the response variable to be:
\[\hat{y}_i = b_0\]
where \(b_0\) is the intercept (i.e., the sample mean). This means that the model predicts the same value for every observation: the mean.
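As a tiny worked example (with numbers invented purely for illustration): take three observations and fit a mean-only model by hand.

\[
y = (2,\, 4,\, 6), \qquad
b_0 = \overline{y} = \frac{2 + 4 + 6}{3} = 4, \qquad
\hat{y}_1 = \hat{y}_2 = \hat{y}_3 = 4
\]

Every individual gets the same prediction, \(\overline{y}\), no matter what its observed value was.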
In R, you build linear models with the lm() function: lm(response ~ explanatory1 + explanatory2 + ..., data = data_set). In a simple model with no predictors you type lm(response ~ 1, data = data_set), where the 1 stands for the intercept.
So to model the proportion of hybrid seed in GC with no explanatory variables, type:
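A sketch of what that call could look like. Note that the data frame gc_data and the column prop_hybrid below are hypothetical stand-ins with made-up values for illustration, not the actual data set used in this book:

```r
# Hypothetical stand-in for the GC data: a few invented hybrid proportions
gc_data <- data.frame(prop_hybrid = c(0.10, 0.25, 0.40, 0.05, 0.30))

# An intercept-only model: "~ 1" means "no predictors, just an intercept"
mean_only_model <- lm(prop_hybrid ~ 1, data = gc_data)
mean_only_model
```

Printing the model shows a single coefficient, labeled (Intercept), which is simply the sample mean of prop_hybrid.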
Figure 1: A bunch of Clarkia seeds. How many do you think are hybrids?
The output gives us the estimated intercept, which, in this case with no predictors, is simply the mean (see above). The code below verifies this (except that R prints a different number of digits).
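Here is a minimal sketch of that check, using a small invented vector in place of the real data:

```r
# Made-up response values for illustration
y <- c(3, 7, 5, 9, 6)

# Fit the intercept-only model
fit <- lm(y ~ 1)

coef(fit)["(Intercept)"]  # the fitted intercept: 6
mean(y)                   # the sample mean: also 6
```

The two numbers match: with no predictors, the least-squares estimate of the intercept is exactly the sample mean.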