A cartoon on correlation from xkcd. The original rollover text says: “Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there’.” See this link for a more detailed explanation.
Associations reveal how variables relate to one another, e.g., whether they tend to increase together, differ across groups, or cluster. Differences in conditional means (or proportions) describe how a numeric (or categorical) response variable varies across levels of a categorical explanatory variable. While these summaries can highlight patterns, interpretation requires care: strong associations don’t necessarily imply causation, and predictions may not hold across contexts or datasets.
Chatbot tutor
Please interact with this custom chatbot (for ChatGPT or Gemini) that I made to help you with this chapter. I suggest at least ten back-and-forths to ramp up, then stopping when you feel you have gotten what you needed from it.
Practice Questions
Try these questions! By using the R environment you can work without leaving this “book”. To help you jump right into thinking and analysis, I have formatted the titanic data and subset it to include only adult males who were in “1st” class or “Crew”. I call the result titanic_males. If you’re curious to see what I did, expand the code below.
The set of questions below focuses on comparing the association between pollinator visitation (no_visits vs some_visits) to the association between petal color and proportion hybrid seed. Use the webR console above to work through these!
Q6) The difference in mean anther-stigma distance conditional on being visited (some_visits - no_visits) is:
Q7) According to traditional interpretations of Cohen’s d, this “effect” is:
group_by() ([dplyr]): Groups data for grouped summaries like conditional proportions or means.
summarise() ([dplyr]): Summarizes multiple rows into a single value, e.g., a mean, covariance, or correlation.
mean() ([base R]): Computes means (or proportions). In this chapter we combine this with group_by() to find conditional means (or conditional proportions).
We often combine these functions in the following chain of operations: data |> group_by() |> summarize(mean()).
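As a minimal sketch of that chain, the example below computes a conditional mean and a conditional proportion for each group, then takes the difference in conditional means. The data frame `flowers` and its columns (`visited`, `asd_mm`, `any_hybrid`) are made-up stand-ins for illustration and are not the chapter's data. It assumes dplyr is installed and R is version 4.1 or later (for the native pipe).

```r
library(dplyr)

# Hypothetical toy data, invented for this sketch (not the chapter's dataset)
flowers <- data.frame(
  visited    = c("no_visits", "no_visits", "no_visits",
                 "some_visits", "some_visits", "some_visits"),
  asd_mm     = c(0.5, 0.7, 0.6, 1.0, 1.2, 1.1),    # numeric response
  any_hybrid = c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE)  # logical response
)

# Conditional mean and conditional proportion within each level of `visited`.
# The mean of a logical vector is the proportion of TRUE values.
cond_summary <- flowers |>
  group_by(visited) |>
  summarise(
    mean_asd    = mean(asd_mm),
    prop_hybrid = mean(any_hybrid)
  )

# Difference in conditional means: some_visits minus no_visits
diff_means <- cond_summary$mean_asd[cond_summary$visited == "some_visits"] -
  cond_summary$mean_asd[cond_summary$visited == "no_visits"]

print(cond_summary)
print(diff_means)
```

With these invented numbers, the difference in conditional means is about 0.5. The same pattern should carry over to real data by swapping in the actual data frame and column names.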
Calling Bullshit has a fantastic set of videos on correlation and causation.
Correlation and Causation: “Correlations are often used to make claims about causation. Be careful about the direction in which causality goes. For example: do food stamps cause poverty?”
What are Correlations?: “Jevin provides an informal introduction to linear correlations.”
Correlation Exercise: “When is correlation all you need, and causation is beside the point? Can you figure out which way causality goes for each of several correlations?”
Common Causes: “We explain how common causes can generate correlations between otherwise unrelated variables, and look at the correlational evidence that storks bring babies. We look at the need to think about multiple contributing causes. The fallacy of post hoc ergo propter hoc: the mistaken belief that if two events happen sequentially, the first must have caused the second.”
Manipulative Experiments: “We look at how manipulative experiments can be used to work out the direction of causation in correlated variables, and sum up the questions one should ask when presented with a correlation.”