Chapter 12: Chi-Square Tests & Inference for Regression
Learning Objectives
- Carry out a chi-square goodness-of-fit test and interpret the result in context
- Construct and interpret a two-way table; compute expected counts
- Perform chi-square tests for independence and homogeneity; identify when each applies
- Distinguish between a test for independence and a test for homogeneity
- Conduct a $t$-test for the slope of a regression line and construct a CI for $\beta$
- Interpret computer regression output to extract $b$, $SE_b$, $t$, and $p$-value
12.1 Chi-Square Goodness-of-Fit Test
The goodness-of-fit test asks: does a single categorical variable follow a specified distribution? It compares the observed counts from a sample against the expected counts from the hypothesized distribution.
Hypotheses and Test Statistic
$H_0$: The population distribution matches the specified distribution (the stated proportions are correct).
$H_a$: The population distribution does not match — at least one proportion differs from the specified value.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
- $O$ = observed count; $E$ = expected count = $n \cdot p_i$
- $df = k - 1$ where $k$ = number of categories
- $\chi^2$ is always $\geq 0$; large values give small $p$-values
- Always right-tailed: $p\text{-value} = P(\chi^2 > \chi^2_{\text{stat}})$
Conditions
- Random: Data come from a random sample or randomized experiment.
- Large Counts: All expected counts are at least 5 (not the observed counts).
Example 12.1 — Is a Die Fair?
A die is rolled 60 times. Expected count for each face: $E = 60/6 = 10$. Observed counts:
| Face | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Observed | 8 | 11 | 9 | 13 | 7 | 12 |
| Expected | 10 | 10 | 10 | 10 | 10 | 10 |
Conditions: Random ✓; all expected counts = 10 $\geq$ 5 ✓.
$$\chi^2 = \frac{(8-10)^2}{10} + \frac{(11-10)^2}{10} + \frac{(9-10)^2}{10} + \frac{(13-10)^2}{10} + \frac{(7-10)^2}{10} + \frac{(12-10)^2}{10}$$
$$= \frac{4}{10} + \frac{1}{10} + \frac{1}{10} + \frac{9}{10} + \frac{9}{10} + \frac{4}{10} = 0.4 + 0.1 + 0.1 + 0.9 + 0.9 + 0.4 = 2.8$$
$df = 6 - 1 = 5$. From the $\chi^2$-table: $P(\chi^2 > 2.8) \approx 0.73$.
Conclusion: Since $p\text{-value} = 0.73 \gg 0.05$, we fail to reject $H_0$. There is not convincing evidence that the die is unfair — the results are consistent with a fair die.
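The hand computation above can be checked with a short sketch (standard-library Python only; the variable names are our own, not part of any standard API):

```python
# Chi-square goodness-of-fit statistic for Example 12.1 (is the die fair?)
observed = [8, 11, 9, 13, 7, 12]   # counts for faces 1-6
n = sum(observed)                   # 60 rolls
expected = [n / 6] * 6              # fair die: E = 60/6 = 10 per face

# chi^2 = sum of (O - E)^2 / E over all categories
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1              # k - 1 = 5

print(chi2_stat, df)  # 2.8 5
```

The same loop works for any goodness-of-fit problem: swap in the observed counts and the hypothesized proportions times $n$ for `expected`.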
A chi-square goodness-of-fit test yields $\chi^2 = 15.2$, $df = 4$, $p\text{-value} = 0.004$ at $\alpha = 0.05$. Interpret this result.
Show Answer
Chi-square distributions for $df = 1$ (blue), $df = 5$ (green), and $df = 10$ (purple). All are right-skewed; as $df$ increases, the distribution shifts right and becomes less skewed.
Figure 12.1 — Chi-Square Distributions for Various Degrees of Freedom
12.2 Chi-Square Test for Independence
The test for independence asks: are two categorical variables associated in a single population? Data are arranged in a two-way (contingency) table.
Expected Counts and Test Statistic
$$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{table total}}$$
$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}, \qquad df = (r-1)(c-1)$$
where $r$ = number of rows and $c$ = number of columns.
$H_0$: The two variables are independent (no association).
$H_a$: The two variables are associated.
Example 12.2 — Gender and Subject Preference
200 students are surveyed on gender (Male/Female) and preferred subject (Math/English).
| | Math | English | Total |
|---|---|---|---|
| Male | 60 | 40 | 100 |
| Female | 45 | 55 | 100 |
| Total | 105 | 95 | 200 |
Expected counts (using $E = \text{row total} \times \text{col total} / \text{grand total}$):
- Male–Math: $(100 \times 105)/200 = 52.5$
- Male–English: $(100 \times 95)/200 = 47.5$
- Female–Math: $(100 \times 105)/200 = 52.5$
- Female–English: $(100 \times 95)/200 = 47.5$
All expected counts $\geq 5$ ✓.
$$\chi^2 = \frac{(60-52.5)^2}{52.5} + \frac{(40-47.5)^2}{47.5} + \frac{(45-52.5)^2}{52.5} + \frac{(55-47.5)^2}{47.5}$$
$$= \frac{56.25}{52.5} + \frac{56.25}{47.5} + \frac{56.25}{52.5} + \frac{56.25}{47.5} = 1.071 + 1.184 + 1.071 + 1.184 = 4.51$$
$df = (2-1)(2-1) = 1$. $p\text{-value} = P(\chi^2 > 4.51) \approx 0.034$.
Conclusion: Since $0.034 < 0.05$, we reject $H_0$. There is convincing evidence of an association between gender and preferred subject.
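The expected-count formula and the statistic for Example 12.2 can be reproduced in a few lines (a standard-library sketch; names are our own):

```python
# Expected counts and chi-square statistic for a 2x2 independence test
# (Example 12.2: gender vs. preferred subject).
observed = [[60, 40],   # Male:   Math, English
            [45, 55]]   # Female: Math, English

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# E_ij = (row i total)(column j total) / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2_stat = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
                for i in range(2) for j in range(2))
df = (2 - 1) * (2 - 1)

print(expected)             # [[52.5, 47.5], [52.5, 47.5]]
print(round(chi2_stat, 2))  # 4.51
```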
Example 12.3 — Verifying Conditions
For a test of independence to be valid:
- Random: Data come from a random sample of one population.
- Large Counts: All expected counts $\geq 5$ (check every cell of the expected table).
- Independent observations: Each individual appears in only one cell.
If any expected count is below 5, combine categories (if meaningful) or collect more data.
A test for independence has $df = (3-1)(4-1) = 6$. How many rows and columns does the table have?
Show Answer
Two-way table visualization: each square's shade represents the magnitude of $(O-E)^2/E$ contribution. Darker squares contribute more to the $\chi^2$ statistic, indicating where the association is strongest.
Figure 12.2 — Two-Way Table Residual Contributions
12.3 Chi-Square Test for Homogeneity
The test for homogeneity is structurally identical to the test for independence — same formula, same $df$, same conditions — but tests a different question: are the distributions of one categorical variable the same across multiple populations?
Independence vs. Homogeneity: Key Distinction
- Test for Independence: One sample from one population; ask whether two categorical variables are related within that population.
- Test for Homogeneity: Separate independent samples from multiple populations; ask whether a single categorical variable has the same distribution across all populations.
How to tell them apart: look at how data were collected. If you took one random sample and measured two variables → independence. If you took separate samples from multiple groups → homogeneity.
Example 12.4 — Lunch Preferences Across Three Schools
Researchers independently sampled students from three schools (A, B, C) and recorded preferred lunch option (pizza, salad, sandwich). Observed counts:
| | Pizza | Salad | Sandwich | Total |
|---|---|---|---|---|
| School A | 30 | 15 | 25 | 70 |
| School B | 25 | 20 | 15 | 60 |
| School C | 20 | 25 | 25 | 70 |
| Total | 75 | 60 | 65 | 200 |
State: $H_0$: The distribution of lunch preference is the same in all three schools. $H_a$: At least one school has a different distribution.
$df = (3-1)(3-1) = 4$. Compute expected counts, verify all $\geq 5$, calculate $\chi^2$, find $p$-value from chi-square table.
Expected for School A–Pizza: $(70 \times 75)/200 = 26.25$. (All other expected counts computed similarly.)
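The remaining arithmetic can be finished with a standard-library sketch (variable names are our own; the tail-probability shortcut below is the exact chi-square survival function only when $df$ is even, which holds here since $df = 4$):

```python
import math

# Finishing Example 12.4: lunch preference across three schools.
observed = [[30, 15, 25],   # School A: pizza, salad, sandwich
            [25, 20, 15],   # School B
            [20, 25, 25]]   # School C

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand = sum(row_totals)
expected = [[r * c / grand for c in col_totals] for r in row_totals]
assert all(e >= 5 for row in expected for e in row)   # Large Counts ✓

chi2_stat = sum((o - e) ** 2 / e
                for orow, erow in zip(observed, expected)
                for o, e in zip(orow, erow))
df = (3 - 1) * (3 - 1)   # 4

# For df = 2k, P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!  (k = 2 here)
half = chi2_stat / 2
p_value = math.exp(-half) * (1 + half)

print(round(chi2_stat, 2), round(p_value, 3))  # 6.48 0.166
```

With a $p$-value near 0.166, well above $\alpha = 0.05$, we would fail to reject $H_0$: these data do not give convincing evidence that the lunch-preference distributions differ across the three schools.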
AP Exam Tip: The chi-square formula $\chi^2 = \sum(O-E)^2/E$ and the procedure are identical for both independence and homogeneity. The only difference is the conclusion language: for independence, say "evidence of an association between [variable A] and [variable B]." For homogeneity, say "evidence that the distribution of [variable] differs across [populations]."
12.4 Inference for Regression (Slope)
The LSRL computed from sample data, $\hat{y} = a + bx$, is an estimate of the true population regression line $\mu_y = \alpha + \beta x$. We can test whether a linear relationship actually exists in the population by testing whether the true slope $\beta$ is zero.
Hypothesis Test for Slope $\beta$
$H_0$: $\beta = 0$ (no linear relationship between $x$ and $y$ in the population)
$H_a$: $\beta \neq 0$ (a linear relationship exists; two-sided is standard)
Test statistic: $t = \dfrac{b}{SE_b}$, where $SE_b$ is the standard error of the sample slope.
Degrees of freedom: $df = n - 2$.
Confidence interval for $\beta$: $b \pm t^* \cdot SE_b$
The value $SE_b$ is almost always given in computer output — you are not expected to compute it by hand on the AP exam.
Reading Computer Regression Output
AP exam problems often give a computer output table. The key values to extract are:
- Coef (b): The sample slope (estimate of $\beta$)
- SE Coef ($SE_b$): Standard error of the slope
- T-Value or t-stat: $t = b / SE_b$
- P-Value: Two-sided $p$-value for testing $H_0: \beta = 0$
Example 12.5 — Significance Test for Slope
A study of 20 students examines the relationship between study hours ($x$) and exam score ($y$). Computer output gives: $b = 2.45$, $SE_b = 0.38$. Test $H_0: \beta = 0$ vs. $H_a: \beta \neq 0$ at $\alpha = 0.05$.
Conditions: Linear form (check residual plot) ✓; Random sample ✓; Normal/Equal spread of residuals ✓.
$df = 20 - 2 = 18$.
$$t = \frac{b}{SE_b} = \frac{2.45}{0.38} = 6.45$$
$p\text{-value} = 2P(t > 6.45) < 0.0001$ (extremely small).
Conclusion: Since $p\text{-value} < 0.0001 \ll 0.05$, we reject $H_0$. There is very strong evidence of a linear relationship between study hours and exam score; because $b = 2.45 > 0$, the relationship is positive.
Example 12.6 — Confidence Interval for Slope $\beta$
Using the same data: $b = 2.45$, $SE_b = 0.38$, $df = 18$, $t^* = 2.101$ (for 95% CI).
$$b \pm t^* \cdot SE_b = 2.45 \pm 2.101 \cdot 0.38 = 2.45 \pm 0.798 = (1.65,\ 3.25)$$
Interpretation: We are 95% confident that for each additional hour of study, the true mean exam score increases by between 1.65 and 3.25 points.
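Both the test statistic from Example 12.5 and the interval from Example 12.6 follow mechanically from $b$, $SE_b$, and a table lookup; a minimal sketch (standard-library Python, with $t^* = 2.101$ read from a $t$-table for $df = 18$ rather than computed):

```python
# Slope inference for Examples 12.5-12.6: study hours vs. exam score.
b, se_b = 2.45, 0.38
n = 20
df = n - 2            # 18

t_stat = b / se_b     # test statistic for H0: beta = 0
t_star = 2.101        # t critical value for 95% confidence, df = 18

margin = t_star * se_b
ci = (b - margin, b + margin)

print(round(t_stat, 2))                  # 6.45
print(round(ci[0], 2), round(ci[1], 2))  # 1.65 3.25
```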
A regression test yields $t = 4.2$, $df = 28$, $p\text{-value} = 0.0002$. The null hypothesis is $H_0: \beta = 0$. What does rejecting $H_0$ mean in context?
Show Answer
Regression scatterplot with least-squares line and approximate 95% confidence band for the slope. Each data point represents a student's study hours and exam score. The band widens at extreme $x$-values.
Figure 12.3 — Regression Line with Confidence Band for Slope
Practice Problems
A sample of 80 people is classified by blood type: A=28, B=20, AB=8, O=24. The U.S. distribution is A=40%, B=11%, AB=4%, O=45%. Test whether this sample matches the U.S. distribution at $\alpha=0.05$.
Show Solution
In a $3 \times 4$ two-way table, what are the degrees of freedom for a chi-square test?
Show Solution
Explain why a chi-square test statistic is always non-negative and why the $p$-value always comes from the right tail.
Show Solution
A regression of weight ($y$, lbs) on height ($x$, inches) for $n=30$ gives $b=4.8$, $SE_b=1.2$. Construct a 95% CI for $\beta$. ($df=28$, $t^*=2.048$)
Show Solution
CI: $4.8 \pm 2.048 \cdot 1.2 = 4.8 \pm 2.458 = (2.34,\ 7.26)$.
We are 95% confident that for each additional inch of height, the true mean weight increases by between 2.34 and 7.26 pounds.
Two schools independently survey students about smartphone ownership (yes/no). School 1: $n=100$, 72 own one. School 2: $n=80$, 52 own one. Should you use a test for independence or homogeneity? Explain.
Show Solution
Computer output shows: Predictor = temperature, Coef = $-0.82$, SE Coef = $0.19$, T = $-4.32$, P = $0.001$. Interpret the slope and the test result ($\alpha=0.05$).
Show Solution
A goodness-of-fit test has $\chi^2 = 3.2$, $df = 5$. The $p$-value is approximately 0.67. State the conclusion and explain what this $p$-value tells you about the observed vs. expected counts.
Show Solution
AP FRQ — Four-Step Chi-Square Independence Test: 150 students are classified by grade level (9th/10th/11th) and participation in extracurriculars (yes/no). Observed: 9th: 30 yes, 20 no; 10th: 25 yes, 25 no; 11th: 35 yes, 15 no. Test for independence at $\alpha=0.05$.
Show Solution
State: $H_0$: Grade level and extracurricular participation are not associated. $H_a$: Grade level and extracurricular participation are associated. Plan: Chi-square test for independence. Random ✓. Totals: Yes=90, No=60, 9th=50, 10th=50, 11th=50, Total=150. Expected: E(9th,Yes)=50(90)/150=30; E(9th,No)=20; E(10th,Yes)=30; E(10th,No)=20; E(11th,Yes)=30; E(11th,No)=20. All ≥5 ✓.
Do: $\chi^2=(30-30)^2/30+(20-20)^2/20+(25-30)^2/30+(25-20)^2/20+(35-30)^2/30+(15-20)^2/20=0+0+0.833+1.25+0.833+1.25=4.17$. $df=(3-1)(2-1)=2$. $p\text{-value}=P(\chi^2>4.17)\approx0.124$.
Conclude: Since $0.124>0.05$, fail to reject $H_0$. There is not convincing evidence of an association between grade level and extracurricular participation.
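The Do step above can be verified with a standard-library sketch (names are our own; for $df = 2$ the chi-square tail probability is exactly $e^{-x/2}$, so no table is needed):

```python
import math

# Verifying the FRQ: grade level vs. extracurricular participation.
observed = [[30, 20],   # 9th:  yes, no
            [25, 25],   # 10th
            [35, 15]]   # 11th

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand = sum(row_totals)
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2_stat = sum((o - e) ** 2 / e
                for orow, erow in zip(observed, expected)
                for o, e in zip(orow, erow))
df = (3 - 1) * (2 - 1)              # 2
p_value = math.exp(-chi2_stat / 2)  # exact only for df = 2

print(round(chi2_stat, 2), round(p_value, 2))  # 4.17 0.12
```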
📋 Chapter Summary
Chi-Square Tests
$\chi^2 = \displaystyle\sum \dfrac{(O - E)^2}{E}$ — measures how far observed counts are from expected counts. Larger values suggest more evidence against $H_0$.
Goodness-of-Fit: Tests whether a single categorical variable has a specified distribution. $df = k - 1$ where $k$ = number of categories.
Independence: Tests whether two categorical variables are independent (using a two-way table from one sample). $df = (r-1)(c-1)$.
Homogeneity: Tests whether several populations have the same distribution of a categorical variable. Same formula and $df$ as the independence test; different sampling design.
Conditions
- Random — data from a random sample or randomized experiment
- Large counts — all expected counts $E \geq 5$
- Independent observations — when sampling without replacement, check the 10% condition: $n \leq 10\%$ of the population