Chapter 11: Significance Tests
Learning Objectives
- State hypotheses $H_0$ and $H_a$ correctly in the context of a problem
- Interpret a $p$-value in context and make a conclusion at a given $\alpha$
- Distinguish Type I error ($\alpha$) from Type II error ($\beta$) and power ($1-\beta$)
- Carry out a one-proportion $z$-test with conditions verified
- Carry out a one-sample $t$-test and connect it to confidence intervals
- Apply two-sample $z$- and $t$-tests and recognize when to use a pooled proportion
11.1 The Logic of Significance Testing
A significance test uses sample data to evaluate a claim about a population parameter. The four-step framework used on the AP exam is: State → Plan → Do → Conclude.
The Four-Step Process
- State: Identify the parameter in context. Write $H_0$ (null hypothesis) and $H_a$ (alternative hypothesis) using proper notation.
- Plan: Choose the appropriate test. Verify conditions (Random, Normal/Large Counts, 10%).
- Do: Calculate the test statistic and find the $p$-value.
- Conclude: Compare $p$-value to $\alpha$. State conclusion in context — reject or fail to reject $H_0$.
Hypotheses
The null hypothesis $H_0$ represents the claim of no effect, no difference, or a specific parameter value (e.g., $H_0: p = 0.30$). The alternative hypothesis $H_a$ represents what we are trying to find evidence for. Alternatives are:
- One-sided (left): $H_a: p < p_0$
- One-sided (right): $H_a: p > p_0$
- Two-sided: $H_a: p \neq p_0$ (most conservative; use when the question does not specify a direction of interest)
p-Value and Significance Level
The $p$-value is the probability of obtaining a test statistic as extreme as (or more extreme than) the observed value, assuming $H_0$ is true. A small $p$-value means the observed data would be unlikely if $H_0$ were true, giving evidence against $H_0$.
The significance level $\alpha$ is the threshold. We reject $H_0$ when $p\text{-value} < \alpha$.
Type I and Type II Errors
| Decision | $H_0$ is actually TRUE | $H_0$ is actually FALSE |
|---|---|---|
| Reject $H_0$ | Type I Error (probability = $\alpha$) | Correct decision (Power = $1-\beta$) |
| Fail to reject $H_0$ | Correct decision | Type II Error (probability = $\beta$) |
- Type I error ($\alpha$): Reject a true $H_0$ — "false positive." Probability = $\alpha$.
- Type II error ($\beta$): Fail to reject a false $H_0$ — "false negative." Probability = $\beta$.
- Power = $1 - \beta$: Probability of correctly rejecting a false $H_0$. Increased by larger $n$, larger effect size, or larger $\alpha$.
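The bullets above say that $P(\text{Type I error}) = \alpha$ and that power grows with $n$, effect size, and $\alpha$. The sketch below checks those claims for a right-sided one-proportion $z$-test. It is a minimal illustration under stated assumptions: it uses the Normal approximation for $\hat{p}$ and only the Python standard library (`statistics.NormalDist`). The helper name `prob_reject_right` is hypothetical, and the "true" proportion 0.38 is an illustrative alternative, not a value given in the text.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard Normal distribution (mean 0, sd 1)


def prob_reject_right(p0, p_true, n, alpha):
    """P(reject H0: p = p0 in favor of Ha: p > p0) when the true proportion is p_true.

    Uses the Normal approximation for p_hat. At p_true == p0 this equals alpha
    (the Type I error rate); for p_true > p0 it is the power, 1 - beta.
    """
    z_star = PHI.inv_cdf(1 - alpha)            # one-sided critical value
    se_null = sqrt(p0 * (1 - p0) / n)          # SE computed assuming H0 is true
    cutoff = p0 + z_star * se_null             # reject H0 when p_hat > cutoff
    se_true = sqrt(p_true * (1 - p_true) / n)  # SE under the true proportion
    return 1 - PHI.cdf((cutoff - p_true) / se_true)


# p0 = 0.30 as in Example 11.2; p_true = 0.38 is a hypothetical alternative.
for n, alpha in [(100, 0.05), (200, 0.05), (100, 0.01)]:
    print(n, alpha, round(prob_reject_right(0.30, 0.38, n, alpha), 3))
print("Type I error rate:", round(prob_reject_right(0.30, 0.30, 100, 0.05), 3))
```

Doubling $n$ raises the printed power, and lowering $\alpha$ from 0.05 to 0.01 reduces it. When the true proportion equals $p_0$, the rejection probability is exactly $\alpha$.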
Example 11.1 — One-Sample $t$-Test (Right-Sided)
A quality-control team suspects that a machine produces parts whose mean diameter $\mu$ exceeds 50 mm. A random sample of $n = 36$ parts gives $\bar{x} = 53.2$ mm and $s = 9$ mm. Test at $\alpha = 0.05$.
State: $H_0: \mu = 50$; $H_a: \mu > 50$ (one-sided right)
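Plan: One-sample $t$-test. Conditions: Random ✓; Normal/Large Sample: $n = 36 \geq 30$ ✓; 10%: the 36 sampled parts are fewer than 10% of all parts the machine produces ✓.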
Do:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{53.2 - 50}{9/\sqrt{36}} = \frac{3.2}{1.5} = 2.13 \quad (df = 35)$$
From the $t$-table: $p\text{-value} \approx 0.020$.
Conclude: Since $0.020 < 0.05 = \alpha$, we reject $H_0$. There is convincing evidence that the true mean diameter is greater than 50 mm.
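The Do step of Example 11.1 can be checked in code. This is a minimal sketch that uses only the Python standard library. Because the standard library has no $t$ distribution, the hypothetical helper `t_cdf` integrates the $t$ density numerically with Simpson's rule instead of reading a $t$-table. That technique is swapped in here for illustration and is not the method the text uses.

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density from 0 to |t| with Simpson's rule (steps even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)  # Simpson weights 4, 2, 4, ...
    area = total * h / 3                                 # area between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def one_sample_t_right(xbar, mu0, s, n):
    """One-sample t-test of H0: mu = mu0 vs Ha: mu > mu0; returns (t, df, p-value)."""
    df = n - 1
    t = (xbar - mu0) / (s / sqrt(n))
    return t, df, 1 - t_cdf(t, df)  # right-tail area beyond t


t, df, p = one_sample_t_right(53.2, 50, 9, 36)  # Example 11.1
print(f"t = {t:.2f}, df = {df}, p-value = {p:.3f}")
```

The same function reproduces Example 11.4 when called as `one_sample_t_right(12.4, 10, 5.6, 25)`.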
In the context of Example 11.1, describe what a Type I error would be.
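A Type I error would occur if the team concluded that the true mean diameter is greater than 50 mm when it is actually 50 mm.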
AP Exam Tip: NEVER say "accept $H_0$." Always say "fail to reject $H_0$." A large $p$-value does not prove $H_0$ is true — it only means we lack sufficient evidence against it. Also, always write hypotheses in terms of the parameter ($\mu$, $p$), not the statistic ($\bar{x}$, $\hat{p}$).
One-sided $p$-value visualization: the shaded red area (right tail beyond $t = 2.13$) represents the $p$-value $\approx 0.020$. Under $H_0$, results this extreme occur only 2% of the time.
Figure 11.1 — Right-Tail $p$-Value (One-Sided Test)
11.2 One-Proportion z-Test
The one-proportion $z$-test tests a claim about a population proportion $p$. The key difference from the confidence interval is that both the conditions and the standard error use $p_0$ (the null value), not $\hat{p}$, because the test assumes $H_0$ is true.
One-Proportion z-Test
$H_0: p = p_0$; conditions use $p_0$: $np_0 \geq 10$ and $n(1-p_0) \geq 10$.
Test statistic:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$$
$p$-value from the standard Normal: one-sided ($P(Z > z)$ or $P(Z < z)$) or two-sided ($2P(Z > |z|)$).
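The box above can be written as one function. This is a minimal sketch assuming Python 3.8+ (for `statistics.NormalDist`). The function name and the `alternative` strings are hypothetical choices made for this example. Because it uses the unrounded $z$, its $p$-values can differ slightly in the third decimal from the table-based values in Examples 11.2 and 11.3.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(p_hat, p0, n, alternative="two-sided"):
    """One-proportion z-test of H0: p = p0. Returns (z, p-value).

    alternative: "greater" (Ha: p > p0), "less" (Ha: p < p0), or "two-sided".
    The standard error uses p0, not p_hat, because the test assumes H0 is true.
    """
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    phi = NormalDist().cdf  # standard Normal CDF
    if alternative == "greater":
        p_value = 1 - phi(z)
    elif alternative == "less":
        p_value = phi(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


print(one_prop_z_test(0.38, 0.30, 100, "greater"))  # Example 11.2
print(one_prop_z_test(33 / 75, 0.50, 75))           # Example 11.3
```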
Example 11.2 — Right-Sided One-Proportion z-Test
A company claims 30% of customers prefer their product. A competitor suspects the true proportion is higher. A random sample of $n = 100$ customers finds $\hat{p} = 0.38$. Test at $\alpha = 0.05$.
State: $H_0: p = 0.30$; $H_a: p > 0.30$
Plan: One-proportion $z$-test. Conditions using $p_0 = 0.30$: $100(0.3)=30\geq10$ ✓; $100(0.7)=70\geq10$ ✓; Random ✓; 10% ✓.
Do:
$$z = \frac{0.38 - 0.30}{\sqrt{(0.30)(0.70)/100}} = \frac{0.08}{0.0458} = 1.75$$
$p\text{-value} = P(Z > 1.75) = 1 - 0.9599 = 0.0401$
Conclude: Since $0.0401 < 0.05$, we reject $H_0$. There is convincing evidence that the true proportion preferring this product is greater than 0.30.
Example 11.3 — Two-Sided One-Proportion z-Test
A coin is flipped 75 times and lands heads 33 times. Test whether the coin is fair: $H_0: p = 0.50$; $H_a: p \neq 0.50$, where $p$ is the true probability of heads. $\alpha = 0.05$.
$\hat{p} = 33/75 = 0.44$.
$$z = \frac{0.44 - 0.50}{\sqrt{(0.50)(0.50)/75}} = \frac{-0.06}{0.0577} = -1.04$$
Two-sided $p\text{-value} = 2 \cdot P(Z < -1.04) = 2(0.149) = 0.298$.
Conclude: Since $0.298 > 0.05$, we fail to reject $H_0$. There is not convincing evidence that the coin is unfair.
$H_0: p = 0.25$; $H_a: p > 0.25$; $\hat{p} = 0.32$, $n = 200$. Calculate $z$ and determine if the result is significant at $\alpha = 0.05$.
$z = (0.32-0.25)/\sqrt{(0.25)(0.75)/200} = 0.07/0.03062 = 2.29$.
$p\text{-value} = P(Z > 2.29) \approx 0.011 < 0.05$. Reject $H_0$. There is convincing evidence that $p > 0.25$.
Two-sided $p$-value: both tails beyond $z = \pm 1.04$ are shaded. The total area (both red tails combined) equals $p\text{-value} = 0.298$, which is not small enough to reject $H_0$.
Figure 11.2 — Two-Sided $p$-Value Visualization
11.3 One-Sample t-Test
When testing a claim about a population mean $\mu$ with unknown $\sigma$, use the one-sample $t$-test. It follows the same four-step structure as the $z$-test, but uses the $t$-distribution with $df = n - 1$.
One-Sample t-Test
$H_0: \mu = \mu_0$; test statistic:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad df = n - 1$$
The $p$-value is found from the $t$-distribution with $df = n - 1$.
Connection to confidence intervals: Reject $H_0: \mu = \mu_0$ at level $\alpha$ (two-sided) if and only if $\mu_0$ falls outside the $(1-\alpha) \times 100\%$ confidence interval for $\mu$.
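The duality above can be checked numerically. This is a minimal sketch under stated assumptions. It uses only the Python standard library and the same Simpson's-rule $t$ CDF idea as a stand-in for a $t$-table. It finds $t^*$ by bisection, and every helper name is hypothetical. It reuses the summary statistics of Example 11.4 ($\bar{x} = 12.4$, $s = 5.6$, $n = 25$) but runs a two-sided test, which differs from that example's one-sided alternative.

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density from 0 to |t| with Simpson's rule."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_critical(conf, df):
    """t* with P(-t* < T < t*) = conf, found by bisection on t_cdf."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 50.0  # wide enough for usual confidence levels unless df is tiny
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ci_and_two_sided_test(xbar, s, n, mu0, alpha=0.05):
    """(1 - alpha) t-interval for mu and the two-sided p-value for H0: mu = mu0."""
    df = n - 1
    se = s / sqrt(n)
    t_star = t_critical(1 - alpha, df)
    ci = (xbar - t_star * se, xbar + t_star * se)
    t = (xbar - mu0) / se
    p_value = 2 * (1 - t_cdf(abs(t), df))
    return ci, p_value


for mu0 in (10, 11):
    ci, p = ci_and_two_sided_test(12.4, 5.6, 25, mu0)
    outside = not (ci[0] <= mu0 <= ci[1])
    print(mu0, [round(b, 2) for b in ci], round(p, 3), outside, p < 0.05)
```

For each null value, "$\mu_0$ outside the interval" and "$p$-value $< 0.05$" come out the same, as the if-and-only-if statement predicts.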
Example 11.4 — One-Sample $t$-Test for a Mean
A nutritionist claims a new diet reduces LDL cholesterol by more than 10 points on average. A random sample of $n = 25$ patients on the diet shows a mean reduction of $\bar{x} = 12.4$ points with $s = 5.6$. Test at $\alpha = 0.05$.
State: $H_0: \mu = 10$; $H_a: \mu > 10$ (one-sided right)
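Plan: One-sample $t$-test. Conditions: Random ✓; Normal: $n = 25 < 30$, so we must assume the population of LDL reductions is approximately Normal (a graph of the sample data should show no strong skewness or outliers) ✓; 10% ✓.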
Do: $df = 24$.
$$t = \frac{12.4 - 10}{5.6/\sqrt{25}} = \frac{2.4}{1.12} = 2.14$$
From $t$-table ($df=24$, one-sided): $p\text{-value} \approx 0.021$.
Conclude: Since $0.021 < 0.05$, we reject $H_0$. There is convincing evidence that the mean LDL reduction is greater than 10 points.
Example 11.5 — Connection Between CI and Significance Test
If a 95% CI for $\mu$ is $(11.2, 13.6)$ and you perform a two-sided test of $H_0: \mu = 10$ at $\alpha = 0.05$, what is the conclusion?
Since $\mu_0 = 10$ falls outside the interval $(11.2, 13.6)$, we reject $H_0$ at $\alpha = 0.05$. For a population mean, a two-sided test at level $\alpha$ and a $(1-\alpha) \times 100\%$ confidence interval always give consistent conclusions.
$n = 16$, $\bar{x} = 48$, $s = 4$. Test $H_0: \mu = 50$ vs. $H_a: \mu \neq 50$. Find $t$, estimate the $p$-value, and state your conclusion at $\alpha = 0.05$.
$t = \dfrac{48 - 50}{4/\sqrt{16}} = \dfrac{-2}{1} = -2.00$, with $df = 15$.
Two-sided $p\text{-value} = 2P(t < -2.0) \approx 2(0.032) = 0.064$.
Since $0.064 > 0.05$, we fail to reject $H_0$. There is not convincing evidence that $\mu \neq 50$.
One-sided $t$-test: shaded right tail beyond $t = 2.14$ on a $t$-distribution with $df = 24$. The $p$-value $\approx 0.021$ is less than $\alpha = 0.05$, providing evidence to reject $H_0$.
Figure 11.3 — One-Sided $t$-Test $p$-Value ($df = 24$)
11.4 Two-Sample Tests and Chi-Square Preview
When comparing two groups, we use two-sample procedures. The key is whether data are proportions or means, and whether samples are independent.
Two-Sample z-Test for Proportions
Test $H_0: p_1 = p_2$. Use the pooled sample proportion $\hat{p}_c = \dfrac{x_1 + x_2}{n_1 + n_2}$ in the standard error (since $H_0$ assumes they are equal).
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}$$
Two-Sample t-Test for Means
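Test $H_0: \mu_1 = \mu_2$ with the unpooled statistic (AP Statistics does not pool variances):
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Technology computes the degrees of freedom with the Welch–Satterthwaite approximation. The conservative alternative is to use the smaller of $n_1 - 1$ and $n_2 - 1$.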
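The unpooled two-sample statistic and its degrees of freedom can be sketched as below. This is a minimal illustration assuming only the Python standard library. The df is the Welch–Satterthwaite formula, which is the approximation calculators generally use. The function name `welch_t` and the summary statistics are hypothetical; they do not come from the text.

```python
from math import sqrt


def welch_t(xbar1, s1, n1, xbar2, s2, n2):
    """Unpooled two-sample t statistic for H0: mu1 = mu2.

    Returns (t, df_welch, df_conservative): df_welch is the Welch-Satterthwaite
    approximation; df_conservative is min(n1 - 1, n2 - 1).
    """
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors of each mean
    t = (xbar1 - xbar2) / sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df_welch, min(n1 - 1, n2 - 1)


# Hypothetical summary statistics for two independent random samples
t, df_w, df_c = welch_t(82.0, 6.0, 20, 78.0, 8.0, 25)
print(round(t, 3), round(df_w, 2), df_c)
```

The Welch df always falls between the conservative value $\min(n_1 - 1, n_2 - 1)$ and $n_1 + n_2 - 2$. Using the smaller df therefore gives a slightly larger (more cautious) $p$-value.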
Brief Introduction to Chi-Square
The chi-square test is used when both variables are categorical. Instead of a $z$- or $t$-statistic, we compute $\chi^2 = \sum \dfrac{(O-E)^2}{E}$. Chi-square tests always use right-tail $p$-values and cover goodness-of-fit, independence, and homogeneity (Chapter 12).
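The chi-square statistic is a simple sum over categories. This minimal sketch uses only the Python standard library. The function name and the counts are hypothetical, chosen so that the arithmetic is easy to follow: $(50-40)^2/40 + (30-40)^2/40 + (20-20)^2/20 = 2.5 + 2.5 + 0 = 5.0$. The $p$-value line relies on a fact specific to 2 degrees of freedom. Other df need a table or software.

```python
from math import exp


def chi_square_stat(observed, expected):
    """Chi-square statistic: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical goodness-of-fit counts for three categories (not from the text)
observed = [50, 30, 20]
expected = [40, 40, 20]
stat = chi_square_stat(observed, expected)

# With k = 3 categories, df = k - 1 = 2, and the chi-square distribution with
# 2 df has the closed-form right tail P(X > x) = e^(-x/2).
p_value = exp(-stat / 2)
print(stat, round(p_value, 4))
```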
Example 11.6 — Two-Sample z-Test for Proportions
Two groups: Group 1 has $n_1 = 120$ with 66 successes; Group 2 has $n_2 = 100$ with 45 successes. Test $H_0: p_1 = p_2$ vs. $H_a: p_1 \neq p_2$ at $\alpha = 0.05$.
$\hat{p}_1 = 66/120 = 0.55$; $\hat{p}_2 = 45/100 = 0.45$.
Pooled: $\hat{p}_c = (66+45)/(120+100) = 111/220 = 0.505$.
$$z = \frac{0.55 - 0.45}{\sqrt{0.505 \cdot 0.495 \cdot (1/120 + 1/100)}} = \frac{0.10}{\sqrt{0.2500 \cdot 0.01833}} = \frac{0.10}{0.0677} = 1.48$$
Two-sided $p\text{-value} = 2P(Z > 1.48) = 2(0.069) = 0.138$.
Conclude: Since $0.138 > 0.05$, we fail to reject $H_0$. There is not convincing evidence of a difference between $p_1$ and $p_2$.
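Example 11.6 can be reproduced with a short function. This is a minimal sketch assuming Python 3.8+ (`statistics.NormalDist`). The function name is hypothetical, and the test is two-sided only. Because it keeps $z$ unrounded, its $p$-value is about 0.14 rather than the table-based 0.138. The conclusion does not change.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided two-sample z-test of H0: p1 = p2 using the pooled proportion.

    Returns (pooled proportion, z, p-value).
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)                    # pooled proportion under H0
    se = sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))  # SE assuming p1 = p2
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
    return pc, z, p_value


pc, z, p = two_prop_z_test(66, 120, 45, 100)  # Example 11.6
print(f"pooled = {pc:.3f}, z = {z:.2f}, p-value = {p:.3f}")
```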
AP Exam Tip: In a two-sample $z$-test for proportions, always use the pooled proportion $\hat{p}_c$ in the denominator — not the individual $\hat{p}_1$ and $\hat{p}_2$. Conversely, for a confidence interval for $p_1 - p_2$, you use the individual $\hat{p}$s (no pooling). Also verify Large Counts using $\hat{p}_c$ for tests.
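To make the tip concrete, the sketch below computes both standard errors for the Example 11.6 counts. It also runs the Large Counts check with $\hat{p}_c$. This is a minimal illustration using only the Python standard library, and all function names are hypothetical. For these data the two standard errors are close but not identical. The pooled one belongs in the test, and the unpooled one belongs in the interval.

```python
from math import sqrt


def pooled_se(x1, n1, x2, n2):
    """SE of p1_hat - p2_hat assuming H0: p1 = p2 (use for the significance test)."""
    pc = (x1 + x2) / (n1 + n2)
    return sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))


def unpooled_se(x1, n1, x2, n2):
    """SE of p1_hat - p2_hat without assuming equal proportions (use for the CI)."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def large_counts_ok_pooled(n1, n2, pc):
    """Large Counts for the test: n*pc and n*(1 - pc) are at least 10 in both groups."""
    return all(c >= 10 for c in (n1 * pc, n1 * (1 - pc), n2 * pc, n2 * (1 - pc)))


x1, n1, x2, n2 = 66, 120, 45, 100  # Example 11.6 counts
print(round(pooled_se(x1, n1, x2, n2), 4), round(unpooled_se(x1, n1, x2, n2), 4))
print(large_counts_ok_pooled(n1, n2, (x1 + x2) / (n1 + n2)))
```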
Practice Problems
$H_0: p = 0.40$, $H_a: p < 0.40$. Sample: $n=80$, $\hat{p}=0.33$. Carry out a one-proportion $z$-test at $\alpha = 0.05$.
$z=(0.33-0.40)/\sqrt{(0.4)(0.6)/80}=-0.07/0.05477=-1.28$.
Left-tail $p\text{-value}=P(Z<-1.28)=0.100$.
Since $0.100 > 0.05$, fail to reject $H_0$. Insufficient evidence that $p < 0.40$.
Describe how increasing the significance level from $\alpha = 0.01$ to $\alpha = 0.05$ affects (a) the probability of a Type I error, (b) the power of the test.
(a) The probability of a Type I error increases from 0.01 to 0.05, because $P(\text{Type I error}) = \alpha$.
(b) Power increases — a larger $\alpha$ makes it easier to reject $H_0$, including when it is truly false. There is a trade-off between Type I error rate and power.
A random sample of $n=25$ has $\bar{x}=78.3$, $s=9.4$. Test $H_0: \mu=75$, $H_a: \mu>75$ at $\alpha=0.05$. ($df=24$, $t^*=1.711$)
$t = \dfrac{78.3 - 75}{9.4/\sqrt{25}} = \dfrac{3.3}{1.88} = 1.755$, with $df = 24$.
Since $1.755 > 1.711$ (critical value at $\alpha=0.05$, one-sided), $p\text{-value} < 0.05$. Reject $H_0$. There is convincing evidence that $\mu > 75$.
A two-sided test yields $p\text{-value} = 0.032$. A 95% CI for $\mu$ is $(12.1, 19.7)$. Is $\mu_0 = 20$ inside or outside the CI, and does this match the test's conclusion?
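$\mu_0 = 20$ lies outside the interval $(12.1, 19.7)$, which corresponds to rejecting $H_0: \mu = 20$ at $\alpha = 0.05$. The test agrees: $p\text{-value} = 0.032 < 0.05$, so we reject $H_0$.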
A medical test has significance level $\alpha=0.01$. The doctors decide the Type II error (missing a disease) is more costly than a Type I error. Should they increase or decrease $\alpha$?
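Increase $\alpha$ (for example, from 0.01 to 0.05). A larger $\alpha$ makes it easier to reject $H_0$, which raises power and lowers $\beta$, at the cost of more Type I errors (false positives).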
What does it mean for a result to be "statistically significant" but not "practically significant"? Give an example.
Two proportions: Group A has $n_1=150$, $\hat{p}_1=0.60$; Group B has $n_2=180$, $\hat{p}_2=0.52$. Test $H_0: p_1=p_2$ vs. $H_a: p_1>p_2$ at $\alpha=0.05$. Find the pooled proportion and $z$.
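$\hat{p}_c = \dfrac{150(0.60) + 180(0.52)}{150 + 180} = \dfrac{183.6}{330} \approx 0.5564$.
$z=(0.60-0.52)/\sqrt{0.5564\cdot0.4436\cdot(1/150+1/180)}=0.08/\sqrt{0.2468\cdot0.01222}=0.08/0.05492\approx1.457$.
$p\text{-value}=P(Z>1.46)\approx0.072>0.05$. Fail to reject $H_0$. There is not convincing evidence that $p_1 > p_2$.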
AP FRQ — Full Four-Step: A school claims 70% of students complete homework daily. A researcher surveys 120 randomly selected students; 78 say yes. Test the school's claim at $\alpha = 0.05$.
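State: $H_0: p = 0.70$; $H_a: p \neq 0.70$, where $p$ is the true proportion of all students at the school who complete homework daily.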
Plan: One-proportion $z$-test. Random ✓; $120(0.7)=84\geq10$, $120(0.3)=36\geq10$ ✓; 10% ✓.
Do: $\hat{p}=78/120=0.65$. $z=(0.65-0.70)/\sqrt{(0.70)(0.30)/120}=(-0.05)/0.04183=-1.196$. $p\text{-value}=2P(Z<-1.20)=2(0.115)=0.230$.
Conclude: Since $0.230>0.05$, fail to reject $H_0$. There is not convincing evidence that the true proportion differs from 70%.
📋 Chapter Summary
Hypothesis Testing Framework
Null hypothesis ($H_0$): The default claim (no effect, no difference). Always stated in terms of a parameter. We assume $H_0$ is true and look for evidence against it.
Alternative hypothesis ($H_a$): What we're trying to find evidence for. Can be one-sided ($<$ or $>$) or two-sided ($\neq$). Chosen before seeing the data.
$p$-value: The probability of getting results as extreme as or more extreme than those observed, assuming $H_0$ is true. Small $p$-value → strong evidence against $H_0$.
Significance level ($\alpha$): The threshold for "surprising" data (typically $\alpha = 0.05$). Reject $H_0$ if $p\text{-value} < \alpha$.
Error Types
Type I error: Rejecting $H_0$ when it is actually true. Probability = $\alpha$. A "false positive."
Type II error: Failing to reject $H_0$ when it is actually false. Probability = $\beta$. A "false negative." Power $= 1 - \beta$.
Test Statistics
One-sample $t$-test: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with $df = n-1$. Used when testing a claim about a population mean with unknown $\sigma$.
One-proportion $z$-test: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$. Used when testing a claim about a population proportion.