Chapter 12: Chi-Square Tests & Inference for Regression
Learning Objectives
- Carry out a chi-square goodness-of-fit test and interpret the result in context
- Construct and interpret a two-way table; compute expected counts
- Perform chi-square tests for independence and homogeneity; identify when each applies
- Distinguish between a test for independence and a test for homogeneity
- Conduct a $t$-test for the slope of a regression line and construct a CI for $\beta$
- Interpret computer regression output to extract $b$, $SE_b$, $t$, and $p$-value
12.1 Chi-Square Goodness-of-Fit Test
The goodness-of-fit test asks: does a single categorical variable follow a specified distribution? It compares the observed counts from a sample against the expected counts from the hypothesized distribution.
Hypotheses and Test Statistic
$H_0$: The population distribution matches the specified distribution (the stated proportions are correct).
$H_a$: The population distribution does not match — at least one proportion differs from the specified value.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
- $O$ = observed count; $E$ = expected count = $n \cdot p_i$
- $df = k - 1$ where $k$ = number of categories
- $\chi^2$ is always $\geq 0$; large values give small $p$-values
- Always right-tailed: $p\text{-value} = P(\chi^2 > \chi^2_{\text{stat}})$
Conditions
- Random: Data come from a random sample or randomized experiment.
- Large Counts: All expected counts are at least 5 (not the observed counts).
Example 12.1 — Is a Die Fair?
A die is rolled 60 times. Expected count for each face: $E = 60/6 = 10$. Observed counts:
| Face | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Observed | 8 | 11 | 9 | 13 | 7 | 12 |
| Expected | 10 | 10 | 10 | 10 | 10 | 10 |
Conditions: Random ✓; all expected counts = 10 $\geq$ 5 ✓.
$$\chi^2 = \frac{(8-10)^2}{10} + \frac{(11-10)^2}{10} + \frac{(9-10)^2}{10} + \frac{(13-10)^2}{10} + \frac{(7-10)^2}{10} + \frac{(12-10)^2}{10}$$
$$= \frac{4}{10} + \frac{1}{10} + \frac{1}{10} + \frac{9}{10} + \frac{9}{10} + \frac{4}{10} = 0.4 + 0.1 + 0.1 + 0.9 + 0.9 + 0.4 = 2.8$$
$df = 6 - 1 = 5$. From the $\chi^2$-table: $P(\chi^2 > 2.8) \approx 0.73$.
Conclusion: Since $p\text{-value} = 0.73 \gg 0.05$, we fail to reject $H_0$. There is not convincing evidence that the die is unfair — the results are consistent with a fair die.
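The hand computation above can be checked with a short sketch (standard-library Python only; the variable names are our own, not part of any standard API):

```python
# Chi-square goodness-of-fit statistic for Example 12.1 (is the die fair?)
observed = [8, 11, 9, 13, 7, 12]   # counts for faces 1-6
n = sum(observed)                   # 60 rolls
expected = [n / 6] * 6              # fair die: E = 60/6 = 10 per face

# chi^2 = sum of (O - E)^2 / E over all categories
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1              # k - 1 = 5

print(chi2_stat, df)  # 2.8 5
```

The same loop works for any goodness-of-fit problem: swap in the observed counts and the hypothesized proportions times $n$ for `expected`.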
A chi-square goodness-of-fit test yields $\chi^2 = 15.2$, $df = 4$, $p\text{-value} = 0.004$ at $\alpha = 0.05$. Interpret this result.
Show Answer
Chi-square distributions for $df = 1$ (blue), $df = 5$ (green), and $df = 10$ (purple). All are right-skewed; as $df$ increases, the distribution shifts right and becomes less skewed.
Figure 12.1 — Chi-Square Distributions for Various Degrees of Freedom
12.2 Chi-Square Test for Independence
The test for independence asks: are two categorical variables associated in a single population? Data are arranged in a two-way (contingency) table.
Expected Counts and Test Statistic
$$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{table total}}$$
$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}, \qquad df = (r-1)(c-1)$$
where $r$ = number of rows and $c$ = number of columns.
$H_0$: The two variables are independent (no association).
$H_a$: The two variables are associated.
Example 12.2 — Gender and Subject Preference
200 students are surveyed on gender (Male/Female) and preferred subject (Math/English).
| | Math | English | Total |
|---|---|---|---|
| Male | 60 | 40 | 100 |
| Female | 45 | 55 | 100 |
| Total | 105 | 95 | 200 |
Expected counts (using $E = \text{row total} \times \text{col total} / \text{grand total}$):
- Male–Math: $(100 \times 105)/200 = 52.5$
- Male–English: $(100 \times 95)/200 = 47.5$
- Female–Math: $(100 \times 105)/200 = 52.5$
- Female–English: $(100 \times 95)/200 = 47.5$
All expected counts $\geq 5$ ✓.
$$\chi^2 = \frac{(60-52.5)^2}{52.5} + \frac{(40-47.5)^2}{47.5} + \frac{(45-52.5)^2}{52.5} + \frac{(55-47.5)^2}{47.5}$$
$$= \frac{56.25}{52.5} + \frac{56.25}{47.5} + \frac{56.25}{52.5} + \frac{56.25}{47.5} = 1.071 + 1.184 + 1.071 + 1.184 = 4.51$$
$df = (2-1)(2-1) = 1$. $p\text{-value} = P(\chi^2 > 4.51) \approx 0.034$.
Conclusion: Since $0.034 < 0.05$, we reject $H_0$. There is convincing evidence of an association between gender and preferred subject.
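The expected-count formula and the statistic for Example 12.2 can be reproduced in a few lines (a standard-library sketch; names are our own):

```python
# Expected counts and chi-square statistic for a 2x2 independence test
# (Example 12.2: gender vs. preferred subject).
observed = [[60, 40],   # Male:   Math, English
            [45, 55]]   # Female: Math, English

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# E_ij = (row i total)(column j total) / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2_stat = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
                for i in range(2) for j in range(2))
df = (2 - 1) * (2 - 1)

print(expected)             # [[52.5, 47.5], [52.5, 47.5]]
print(round(chi2_stat, 2))  # 4.51
```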
Example 12.3 — Verifying Conditions
For a test of independence to be valid:
- Random: Data come from a random sample of one population.
- Large Counts: All expected counts $\geq 5$ (check every cell of the expected table).
- Independent observations: Each individual appears in only one cell.
If any expected count is below 5, combine categories (if meaningful) or collect more data.
A test for independence has $df = (3-1)(4-1) = 6$. How many rows and columns does the table have?
Show Answer
Two-way table visualization: each square's shade represents the magnitude of $(O-E)^2/E$ contribution. Darker squares contribute more to the $\chi^2$ statistic, indicating where the association is strongest.
Figure 12.2 — Two-Way Table Residual Contributions
12.3 Chi-Square Test for Homogeneity
The test for homogeneity is structurally identical to the test for independence — same formula, same $df$, same conditions — but tests a different question: are the distributions of one categorical variable the same across multiple populations?
Independence vs. Homogeneity: Key Distinction
- Test for Independence: One sample from one population; ask whether two categorical variables are related within that population.
- Test for Homogeneity: Separate independent samples from multiple populations; ask whether a single categorical variable has the same distribution across all populations.
How to tell them apart: look at how data were collected. If you took one random sample and measured two variables → independence. If you took separate samples from multiple groups → homogeneity.
Example 12.4 — Lunch Preferences Across Three Schools
Researchers independently sampled students from three schools (A, B, C) and recorded preferred lunch option (pizza, salad, sandwich). Observed counts:
| | Pizza | Salad | Sandwich | Total |
|---|---|---|---|---|
| School A | 30 | 15 | 25 | 70 |
| School B | 25 | 20 | 15 | 60 |
| School C | 20 | 25 | 25 | 70 |
| Total | 75 | 60 | 65 | 200 |
State: $H_0$: The distribution of lunch preference is the same in all three schools. $H_a$: At least one school has a different distribution.
$df = (3-1)(3-1) = 4$. Compute expected counts, verify all $\geq 5$, calculate $\chi^2$, find $p$-value from chi-square table.
Expected for School A–Pizza: $(70 \times 75)/200 = 26.25$. (All other expected counts computed similarly.)
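The remaining arithmetic can be finished with a standard-library sketch (variable names are our own; the tail-probability shortcut below is the exact chi-square survival function only when $df$ is even, which holds here since $df = 4$):

```python
import math

# Finishing Example 12.4: lunch preference across three schools.
observed = [[30, 15, 25],   # School A: pizza, salad, sandwich
            [25, 20, 15],   # School B
            [20, 25, 25]]   # School C

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand = sum(row_totals)
expected = [[r * c / grand for c in col_totals] for r in row_totals]
assert all(e >= 5 for row in expected for e in row)   # Large Counts ✓

chi2_stat = sum((o - e) ** 2 / e
                for orow, erow in zip(observed, expected)
                for o, e in zip(orow, erow))
df = (3 - 1) * (3 - 1)   # 4

# For df = 2k, P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!  (k = 2 here)
half = chi2_stat / 2
p_value = math.exp(-half) * (1 + half)

print(round(chi2_stat, 2), round(p_value, 3))  # 6.48 0.166
```

With a $p$-value near 0.166, well above $\alpha = 0.05$, we would fail to reject $H_0$: these data do not give convincing evidence that the lunch-preference distributions differ across the three schools.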
AP Exam Tip: The chi-square formula $\chi^2 = \sum(O-E)^2/E$ and the procedure are identical for both independence and homogeneity. The only difference is the conclusion language: for independence, say "evidence of an association between [variable A] and [variable B]." For homogeneity, say "evidence that the distribution of [variable] differs across [populations]."
12.4 Inference for Regression (Slope)
The LSRL computed from sample data, $\hat{y} = a + bx$, is an estimate of the true population regression line $\mu_y = \alpha + \beta x$. We can test whether a linear relationship actually exists in the population by testing whether the true slope $\beta$ is zero.
Hypothesis Test for Slope $\beta$
$H_0$: $\beta = 0$ (no linear relationship between $x$ and $y$ in the population)
$H_a$: $\beta \neq 0$ (a linear relationship exists; two-sided is standard)
Test statistic: $t = \dfrac{b}{SE_b}$, where $SE_b$ is the standard error of the sample slope.
Degrees of freedom: $df = n - 2$.
Confidence interval for $\beta$: $b \pm t^* \cdot SE_b$
The value $SE_b$ is almost always given in computer output — you are not expected to compute it by hand on the AP exam.
Reading Computer Regression Output
AP exam problems often give a computer output table. The key values to extract are:
- Coef (b): The sample slope (estimate of $\beta$)
- SE Coef ($SE_b$): Standard error of the slope
- T-Value or t-stat: $t = b / SE_b$
- P-Value: Two-sided $p$-value for testing $H_0: \beta = 0$
Example 12.5 — Significance Test for Slope
A study of 20 students examines the relationship between study hours ($x$) and exam score ($y$). Computer output gives: $b = 2.45$, $SE_b = 0.38$. Test $H_0: \beta = 0$ vs. $H_a: \beta \neq 0$ at $\alpha = 0.05$.
Conditions: Linear form (check residual plot) ✓; Random sample ✓; Normal/Equal spread of residuals ✓.
$df = 20 - 2 = 18$.
$$t = \frac{b}{SE_b} = \frac{2.45}{0.38} = 6.45$$
$p\text{-value} = 2P(t > 6.45) < 0.0001$ (extremely small).
Conclusion: Since $p\text{-value} < 0.0001 \ll 0.05$, we reject $H_0$. There is very strong evidence of a linear relationship between study hours and exam score; because $b = 2.45 > 0$, the relationship is positive.
Example 12.6 — Confidence Interval for Slope $\beta$
Using the same data: $b = 2.45$, $SE_b = 0.38$, $df = 18$, $t^* = 2.101$ (for 95% CI).
$$b \pm t^* \cdot SE_b = 2.45 \pm 2.101 \cdot 0.38 = 2.45 \pm 0.798 = (1.65,\ 3.25)$$
Interpretation: We are 95% confident that for each additional hour of study, the true mean exam score increases by between 1.65 and 3.25 points.
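Both the test statistic from Example 12.5 and the interval from Example 12.6 follow mechanically from $b$, $SE_b$, and a table lookup; a minimal sketch (standard-library Python, with $t^* = 2.101$ read from a $t$-table for $df = 18$ rather than computed):

```python
# Slope inference for Examples 12.5-12.6: study hours vs. exam score.
b, se_b = 2.45, 0.38
n = 20
df = n - 2            # 18

t_stat = b / se_b     # test statistic for H0: beta = 0
t_star = 2.101        # t critical value for 95% confidence, df = 18

margin = t_star * se_b
ci = (b - margin, b + margin)

print(round(t_stat, 2))                  # 6.45
print(round(ci[0], 2), round(ci[1], 2))  # 1.65 3.25
```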
A regression test yields $t = 4.2$, $df = 28$, $p\text{-value} = 0.0002$. The null hypothesis is $H_0: \beta = 0$. What does rejecting $H_0$ mean in context?
Show Answer
Regression scatterplot with least-squares line and approximate 95% confidence band for the slope. Each data point represents a student's study hours and exam score. The band widens at extreme $x$-values.
Figure 12.3 — Regression Line with Confidence Band for Slope
Practice Problems
A sample of 80 people is classified by blood type: A=28, B=20, AB=8, O=24. The U.S. distribution is A=40%, B=11%, AB=4%, O=45%. Test whether this sample matches the U.S. distribution at $\alpha=0.05$.
Show Solution
In a $3 \times 4$ two-way table, what are the degrees of freedom for a chi-square test?
Show Solution
Explain why a chi-square test statistic is always non-negative and why the $p$-value always comes from the right tail.
Show Solution
A regression of weight ($y$, lbs) on height ($x$, inches) for $n=30$ gives $b=4.8$, $SE_b=1.2$. Construct a 95% CI for $\beta$. ($df=28$, $t^*=2.048$)
Show Solution
CI: $4.8 \pm 2.048 \cdot 1.2 = 4.8 \pm 2.458 = (2.34,\ 7.26)$.
We are 95% confident that for each additional inch of height, the true mean weight increases by between 2.34 and 7.26 pounds.
Two schools independently survey students about smartphone ownership (yes/no). School 1: $n=100$, 72 own one. School 2: $n=80$, 52 own one. Should you use a test for independence or homogeneity? Explain.
Show Solution
Computer output shows: Predictor = temperature, Coef = $-0.82$, SE Coef = $0.19$, T = $-4.32$, P = $0.001$. Interpret the slope and the test result ($\alpha=0.05$).
Show Solution
A goodness-of-fit test has $\chi^2 = 3.2$, $df = 5$. The $p$-value is approximately 0.67. State the conclusion and explain what this $p$-value tells you about the observed vs. expected counts.
Show Solution
AP FRQ — Four-Step Chi-Square Independence Test: 150 students are classified by grade level (9th/10th/11th) and participation in extracurriculars (yes/no). Observed: 9th: 30 yes, 20 no; 10th: 25 yes, 25 no; 11th: 35 yes, 15 no. Test for independence at $\alpha=0.05$.
Show Solution
State: $H_0$: Grade level and extracurricular participation are not associated. $H_a$: Grade level and extracurricular participation are associated. Plan: Chi-square test for independence. Random ✓. Totals: Yes=90, No=60, 9th=50, 10th=50, 11th=50, Total=150. Expected: E(9th,Yes)=50(90)/150=30; E(9th,No)=20; E(10th,Yes)=30; E(10th,No)=20; E(11th,Yes)=30; E(11th,No)=20. All ≥5 ✓.
Do: $\chi^2=(30-30)^2/30+(20-20)^2/20+(25-30)^2/30+(25-20)^2/20+(35-30)^2/30+(15-20)^2/20=0+0+0.833+1.25+0.833+1.25=4.17$. $df=(3-1)(2-1)=2$. $p\text{-value}=P(\chi^2>4.17)\approx0.124$.
Conclude: Since $0.124>0.05$, fail to reject $H_0$. There is not convincing evidence of an association between grade level and extracurricular participation.
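The Do step above can be verified with a standard-library sketch (names are our own; for $df = 2$ the chi-square tail probability is exactly $e^{-x/2}$, so no table is needed):

```python
import math

# Verifying the FRQ: grade level vs. extracurricular participation.
observed = [[30, 20],   # 9th:  yes, no
            [25, 25],   # 10th
            [35, 15]]   # 11th

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand = sum(row_totals)
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2_stat = sum((o - e) ** 2 / e
                for orow, erow in zip(observed, expected)
                for o, e in zip(orow, erow))
df = (3 - 1) * (2 - 1)              # 2
p_value = math.exp(-chi2_stat / 2)  # exact only for df = 2

print(round(chi2_stat, 2), round(p_value, 2))  # 4.17 0.12
```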
📋 Chapter Summary
Chi-Square Tests
$\chi^2 = \displaystyle\sum \dfrac{(O - E)^2}{E}$ — measures how far observed counts are from expected counts. Larger values suggest more evidence against $H_0$.
Goodness-of-Fit: Tests whether a single categorical variable has a specified distribution. $df = k - 1$ where $k$ = number of categories.
Independence: Tests whether two categorical variables are independent (using a two-way table from one sample). $df = (r-1)(c-1)$.
Homogeneity: Tests whether several populations have the same distribution of a categorical variable. Same formula and $df$ as the independence test; different sampling design.
Conditions
- Random — data from a random sample or randomized experiment
- Large counts — all expected counts $E \geq 5$
- Independent observations — when sampling without replacement, check the 10% condition: $n \leq 10\%$ of the population