Chapter 11: Significance Tests

AP Statistics · Statistical Inference · 3 interactive graphs · 8 practice problems

Learning Objectives

State hypotheses $H_0$ and $H_a$ correctly in the context of a problem
Interpret a $p$-value in context and make a conclusion at a given $\alpha$
Distinguish Type I error ($\alpha$) from Type II error ($\beta$) and power ($1-\beta$)
Carry out a one-proportion $z$-test with conditions verified
Carry out a one-sample $t$-test and connect it to confidence intervals
Apply two-sample $z$- and $t$-tests and recognize when to use a pooled proportion

11.1 The Logic of Significance Testing

A significance test uses sample data to evaluate a claim about a population parameter. The four-step framework used on the AP exam is: State → Plan → Do → Conclude.

The Four-Step Process

State: Identify the parameter in context. Write $H_0$ (null hypothesis) and $H_a$ (alternative hypothesis) using proper notation.
Plan: Choose the appropriate test. Verify conditions (Random, Normal/Large Counts, 10%).
Do: Calculate the test statistic and find the $p$-value.
Conclude: Compare $p$-value to $\alpha$. State conclusion in context — reject or fail to reject $H_0$.

Hypotheses

The null hypothesis $H_0$ represents the claim of no effect, no difference, or a specific parameter value (e.g., $H_0: p = 0.30$). The alternative hypothesis $H_a$ represents what we are trying to find evidence for. Alternatives are:

One-sided (left): $H_a: p < p_0$
One-sided (right): $H_a: p > p_0$
Two-sided: $H_a: p \neq p_0$ (most conservative; use when direction is unknown)

p-Value and Significance Level

The $p$-value is the probability of obtaining a test statistic as extreme as (or more extreme than) the observed value, assuming $H_0$ is true. A small $p$-value means the observed data would be unlikely if $H_0$ were true, giving evidence against $H_0$.

The significance level $\alpha$ is the threshold. We reject $H_0$ when $p\text{-value} < \alpha$.

Type I and Type II Errors

	$H_0$ is actually TRUE	$H_0$ is actually FALSE
Reject $H_0$	Type I Error (probability = $\alpha$)	Correct decision (Power = $1-\beta$)
Fail to reject $H_0$	Correct decision	Type II Error (probability = $\beta$)

Type I error ($\alpha$): Reject a true $H_0$ — "false positive." Probability = $\alpha$.
Type II error ($\beta$): Fail to reject a false $H_0$ — "false negative." Probability = $\beta$.
Power = $1 - \beta$: Probability of correctly rejecting a false $H_0$. Increased by larger $n$, larger effect size, or larger $\alpha$.
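The effect of $n$, effect size, and $\alpha$ on power can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the AP formula sheet: it uses the Normal approximation for a right-sided one-proportion test, and the function name `power_one_prop_right` and the "true" proportions passed to it are hypothetical choices.

```python
from statistics import NormalDist

def power_one_prop_right(p0, p_true, n, alpha=0.05):
    """Approximate power of a right-sided one-proportion z-test (Normal approximation)."""
    std_normal = NormalDist()
    # Smallest sample proportion that leads to rejecting H0: p = p0
    cutoff = p0 + std_normal.inv_cdf(1 - alpha) * (p0 * (1 - p0) / n) ** 0.5
    # Chance that p-hat lands beyond the cutoff when the true proportion is p_true
    se_true = (p_true * (1 - p_true) / n) ** 0.5
    return 1 - std_normal.cdf((cutoff - p_true) / se_true)

# Hypothetical scenarios: vary n, alpha, and the true proportion one at a time
for p_true, n, alpha in [(0.38, 100, 0.05), (0.38, 400, 0.05),
                         (0.38, 100, 0.10), (0.45, 100, 0.05)]:
    power = power_one_prop_right(0.30, p_true, n, alpha)
    print(f"true p = {p_true}, n = {n}, alpha = {alpha}: power = {power:.3f}")
```

When the true proportion equals $p_0$, the same calculation returns $\alpha$, which is the Type I error probability.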

Example 11.1 — One-Sample $t$-Test (Right-Sided)

A quality-control team believes a machine produces parts with mean diameter greater than 50 mm ($\mu > 50$). A random sample of $n = 36$ parts gives $\bar{x} = 53.2$ mm and $s = 9$ mm. Test at $\alpha = 0.05$.

State: $H_0: \mu = 50$; $H_a: \mu > 50$ (one-sided right)

Plan: One-sample $t$-test. Conditions: Random ✓; $n=36\geq30$ ✓; 10% ✓.

Do:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{53.2 - 50}{9/\sqrt{36}} = \frac{3.2}{1.5} = 2.13 \quad (df = 35)$$

From the $t$-table: $p\text{-value} \approx 0.020$.

Conclude: Since $0.020 < 0.05 = \alpha$, we reject $H_0$. There is convincing evidence that the true mean diameter is greater than 50 mm.
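The arithmetic in Example 11.1 can be reproduced without a $t$-table. This is a minimal sketch, assuming only the Python standard library: the helper names (`t_pdf`, `t_upper_tail`, `one_sample_t`) are hypothetical, and the tail area is approximated by Simpson's rule rather than taken from a statistics package or calculator.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) by integrating the density on [0, |t|] with Simpson's rule."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3      # area to the right of |t|
    return upper if t >= 0 else 1 - upper

def one_sample_t(xbar, mu0, s, n):
    """Return the t statistic and its degrees of freedom."""
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

# Example 11.1: H0: mu = 50 vs. Ha: mu > 50
t, df = one_sample_t(xbar=53.2, mu0=50, s=9, n=36)
p_value = t_upper_tail(t, df)
print(f"t = {t:.2f}, df = {df}, right-tail p-value = {p_value:.3f}")
```

The result should land close to the table values above; small differences come from rounding $t$ before looking it up.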

TRY IT

In the context of Example 11.1, describe what a Type I error would be.

Answer:

A Type I error would be concluding that the true mean diameter is greater than 50 mm (rejecting $H_0$) when in fact the true mean is exactly 50 mm. The quality-control team would unnecessarily halt or recalibrate the machine.

AP Exam Tip: NEVER say "accept $H_0$." Always say "fail to reject $H_0$." A large $p$-value does not prove $H_0$ is true — it only means we lack sufficient evidence against it. Also, always write hypotheses in terms of the parameter ($\mu$, $p$), not the statistic ($\bar{x}$, $\hat{p}$).

One-sided $p$-value visualization: the shaded red area (right tail beyond $t = 2.13$) represents the $p$-value $\approx 0.020$. Under $H_0$, results this extreme occur only 2% of the time.

Figure 11.1 — Right-Tail $p$-Value (One-Sided Test)

11.2 One-Proportion z-Test

The one-proportion $z$-test tests a claim about a population proportion $p$. The key difference from the confidence interval is that conditions use $p_0$ (the null value), not $\hat{p}$.

One-Proportion z-Test

$H_0: p = p_0$; conditions use $p_0$: $np_0 \geq 10$ and $n(1-p_0) \geq 10$.

Test statistic:

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$$

$p$-value from the standard Normal: one-sided ($P(Z > z)$ or $P(Z < z)$) or two-sided ($2P(Z > |z|)$).
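The test statistic and the three $p$-value rules above can be collected into one small function. This is a sketch only: the function name `one_prop_z_test`, its `alternative` argument, and the sample values in the usage line are assumptions made for illustration.

```python
from statistics import NormalDist

def one_prop_z_test(p_hat, p0, n, alternative="greater"):
    """One-proportion z-test; returns (z, p_value)."""
    # Large Counts condition is checked with the null value p0, not p_hat
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("Large Counts condition fails: need n*p0 >= 10 and n*(1-p0) >= 10")
    z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5
    cdf = NormalDist().cdf
    if alternative == "greater":        # Ha: p > p0
        p_value = 1 - cdf(z)
    elif alternative == "less":         # Ha: p < p0
        p_value = cdf(z)
    elif alternative == "two-sided":    # Ha: p != p0
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value

# Hypothetical data: p-hat = 0.56 from n = 150, testing H0: p = 0.50 two-sided
z, p_value = one_prop_z_test(0.56, 0.50, 150, alternative="two-sided")
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

Because the function works with the unrounded $z$, its $p$-values can differ in the third decimal place from answers read off a Normal table.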

Example 11.2 — Right-Sided One-Proportion z-Test

A company claims 30% of customers prefer their product. A competitor suspects the true proportion is higher. A random sample of $n = 100$ customers finds $\hat{p} = 0.38$. Test at $\alpha = 0.05$.

State: $H_0: p = 0.30$; $H_a: p > 0.30$

Plan: One-proportion $z$-test. Conditions using $p_0 = 0.30$: $100(0.3)=30\geq10$ ✓; $100(0.7)=70\geq10$ ✓; Random ✓; 10% ✓.

Do:

$$z = \frac{0.38 - 0.30}{\sqrt{(0.30)(0.70)/100}} = \frac{0.08}{0.0458} = 1.75$$

$p\text{-value} = P(Z > 1.75) = 1 - 0.9599 = 0.0401$

Conclude: Since $0.0401 < 0.05$, we reject $H_0$. There is convincing evidence that the true proportion preferring this product is greater than 0.30.

Example 11.3 — Two-Sided One-Proportion z-Test

Test whether a coin is fair. In 75 flips, the coin lands heads 33 times. $H_0: p = 0.50$; $H_a: p \neq 0.50$; $\alpha = 0.05$. Large Counts: $75(0.50) = 37.5 \geq 10$ for both heads and tails ✓.

$\hat{p} = 33/75 = 0.44$.

$$z = \frac{0.44 - 0.50}{\sqrt{(0.50)(0.50)/75}} = \frac{-0.06}{0.0577} = -1.04$$

Two-sided $p\text{-value} = 2 \cdot P(Z < -1.04) = 2(0.149) = 0.298$.

Conclude: Since $0.298 > 0.05$, we fail to reject $H_0$. There is not convincing evidence that the coin is unfair.

TRY IT

$H_0: p = 0.25$; $H_a: p > 0.25$; $\hat{p} = 0.32$, $n = 200$. Calculate $z$ and determine if the result is significant at $\alpha = 0.05$.

Answer:

Conditions: $200(0.25)=50\geq10$ ✓; $200(0.75)=150\geq10$ ✓.
$z = (0.32-0.25)/\sqrt{(0.25)(0.75)/200} = 0.07/0.03062 = 2.29$.
$p\text{-value} = P(Z > 2.29) \approx 0.011 < 0.05$. Reject $H_0$. There is convincing evidence that $p > 0.25$.

Two-sided $p$-value: both tails beyond $z = \pm 1.04$ are shaded. The total area (both red tails combined) equals $p\text{-value} = 0.298$, which is not small enough to reject $H_0$.

Figure 11.2 — Two-Sided $p$-Value Visualization

11.3 One-Sample t-Test

When testing a claim about a population mean $\mu$ with unknown $\sigma$, use the one-sample $t$-test. It follows the same four-step structure as the $z$-test, but uses the $t$-distribution with $df = n - 1$.

One-Sample t-Test

$H_0: \mu = \mu_0$; test statistic:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad df = n - 1$$

The $p$-value is found from the $t$-distribution with $df = n - 1$.

Connection to confidence intervals: Reject $H_0: \mu = \mu_0$ at level $\alpha$ (two-sided) if and only if $\mu_0$ falls outside the $(1-\alpha) \times 100\%$ confidence interval for $\mu$.

Example 11.4 — One-Sample $t$-Test for a Mean

A nutritionist claims a new diet reduces LDL cholesterol by more than 10 points on average. A random sample of $n = 25$ patients shows $\bar{x} = 12.4$ points reduction, $s = 5.6$. Test at $\alpha = 0.05$.

State: $H_0: \mu = 10$; $H_a: \mu > 10$ (one-sided right)

Plan: One-sample $t$-test. Conditions: Random ✓; $n=25$ — assume Normal population ✓; 10% ✓.

Do: $df = 24$.

$$t = \frac{12.4 - 10}{5.6/\sqrt{25}} = \frac{2.4}{1.12} = 2.14$$

From $t$-table ($df=24$, one-sided): $p\text{-value} \approx 0.021$.

Conclude: Since $0.021 < 0.05$, we reject $H_0$. There is convincing evidence that the mean LDL reduction is greater than 10 points.

Example 11.5 — Connection Between CI and Significance Test

If a 95% CI for $\mu$ is $(11.2, 13.6)$ and you perform a two-sided test of $H_0: \mu = 10$ at $\alpha = 0.05$, what is the conclusion?

Since $\mu_0 = 10$ falls outside the interval $(11.2, 13.6)$, we reject $H_0$ at $\alpha = 0.05$. The two procedures always give consistent conclusions for two-sided tests.
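The agreement between a two-sided test and a confidence interval can be checked numerically. The sketch below rests on assumptions: the summary statistics ($n = 20$, $\bar{x} = 12.4$, $s = 2.5$) are hypothetical values chosen so the interval lands near $(11.2, 13.6)$, the helper names are invented for illustration, and $t^*$ is found by bisection on a Simpson's-rule tail area instead of a table.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df):
    """Critical value t* leaving area alpha/2 in the upper tail (bisection search)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha / 2:
            lo = mid      # tail still too large, so t* is further out
        else:
            hi = mid
    return (lo + hi) / 2

def ci_and_test(xbar, s, n, mu0, alpha=0.05):
    """Return the (1 - alpha) confidence interval and the two-sided p-value."""
    df = n - 1
    se = s / math.sqrt(n)
    margin = t_critical(alpha, df) * se
    t = (xbar - mu0) / se
    return (xbar - margin, xbar + margin), 2 * t_upper_tail(abs(t), df)

for mu0 in (10, 12):   # one null value outside the interval, one inside
    (low, high), p_value = ci_and_test(12.4, 2.5, 20, mu0)
    print(f"mu0 = {mu0}: CI = ({low:.2f}, {high:.2f}), "
          f"outside CI: {not low <= mu0 <= high}, reject H0: {p_value < 0.05}")
```

For each null value, "outside CI" and "reject H0" should match, which is the duality the example describes.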

TRY IT

$n = 16$, $\bar{x} = 48$, $s = 4$. Test $H_0: \mu = 50$ vs. $H_a: \mu \neq 50$. Find $t$, estimate the $p$-value, and state your conclusion at $\alpha = 0.05$.

Answer:

$t = (48-50)/(4/\sqrt{16}) = -2.0/1.0 = -2.0$. $df = 15$.
Two-sided $p\text{-value} = 2P(t < -2.0) \approx 2(0.032) = 0.064$.
Since $0.064 > 0.05$, we fail to reject $H_0$. There is not convincing evidence that $\mu \neq 50$.

One-sided $t$-test: shaded right tail beyond $t = 2.14$ on a $t$-distribution with $df = 24$. The $p$-value $\approx 0.021$ is less than $\alpha = 0.05$, providing evidence to reject $H_0$.

Figure 11.3 — One-Sided $t$-Test $p$-Value ($df = 24$)

11.4 Two-Sample Tests and Chi-Square Preview

When comparing two groups, we use two-sample procedures. The key is whether data are proportions or means, and whether samples are independent.

Two-Sample z-Test for Proportions

Test $H_0: p_1 = p_2$. Use the pooled sample proportion $\hat{p}_c = \dfrac{x_1 + x_2}{n_1 + n_2}$ in the standard error (since $H_0$ assumes they are equal).

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}$$

Two-Sample t-Test for Means

Test $H_0: \mu_1 = \mu_2$. Do not pool variances (AP Statistics uses the unpooled formula). The degrees of freedom are computed by the calculator (conservative: use smaller of $n_1-1$, $n_2-1$).
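As a sketch of the unpooled calculation, the code below computes the two-sample $t$ statistic together with two choices of degrees of freedom. It assumes the calculator's value is the Welch-Satterthwaite approximation, which is the usual formula for unpooled tests; the function name and the summary statistics are hypothetical.

```python
import math

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Unpooled two-sample t statistic with Welch and conservative df."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # squared standard errors of each mean
    t = (xbar1 - xbar2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (assumed here to match calculator output)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    df_conservative = min(n1 - 1, n2 - 1)
    return t, df_welch, df_conservative

# Hypothetical summary statistics for two independent groups
t, df_welch, df_cons = two_sample_t(82.0, 6.0, 30, 78.5, 8.0, 25)
print(f"t = {t:.3f}, Welch df = {df_welch:.1f}, conservative df = {df_cons}")
```

The conservative choice never exceeds the Welch value, so it gives a $p$-value that is at least as large.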

Brief Introduction to Chi-Square

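Chi-square tests are used with counts of categorical data: one categorical variable compared with a claimed distribution (goodness-of-fit), or two categorical variables or groups (independence and homogeneity). Instead of a $z$- or $t$-statistic, we compute $\chi^2 = \sum \dfrac{(O-E)^2}{E}$, where $O$ is an observed count and $E$ is the count expected under $H_0$. Chi-square tests always use right-tail $p$-values (Chapter 12).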
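To make the $\chi^2$ formula concrete, here is a minimal sketch that sums $(O-E)^2/E$ over categories. The function name and the counts are hypothetical; finding the $p$-value is left to Chapter 12.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: 60 observations, 20 expected in each of three categories
observed = [10, 20, 30]
expected = [20, 20, 20]
print(chi_square_statistic(observed, expected))  # 100/20 + 0/20 + 100/20 = 10.0
```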

Example 11.6 — Two-Sample z-Test for Proportions

Two groups: Group 1 has $n_1 = 120$ with 66 successes; Group 2 has $n_2 = 100$ with 45 successes. Test $H_0: p_1 = p_2$ vs. $H_a: p_1 \neq p_2$ at $\alpha = 0.05$.

$\hat{p}_1 = 66/120 = 0.55$; $\hat{p}_2 = 45/100 = 0.45$.

Pooled: $\hat{p}_c = (66+45)/(120+100) = 111/220 = 0.505$.

$$z = \frac{0.55 - 0.45}{\sqrt{0.505 \cdot 0.495 \cdot (1/120 + 1/100)}} = \frac{0.10}{\sqrt{0.2500 \cdot 0.01833}} = \frac{0.10}{0.0677} = 1.48$$

Two-sided $p\text{-value} = 2P(Z > 1.48) = 2(0.069) = 0.138$.

Conclude: Since $0.138 > 0.05$, we fail to reject $H_0$. There is not convincing evidence of a difference between $p_1$ and $p_2$.
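The pooled calculation in Example 11.6 can be verified with a short function. This is a sketch with a hypothetical function name; it handles only the two-sided alternative and works with unrounded values, so its output differs slightly from the hand calculation.

```python
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided two-sample z-test for proportions using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)             # combined successes / combined sample size
    se = (p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_pooled, z, p_value

# Example 11.6: 66 of 120 successes vs. 45 of 100 successes
p_pooled, z, p_value = two_prop_z_test(66, 120, 45, 100)
print(f"pooled = {p_pooled:.3f}, z = {z:.2f}, two-sided p-value = {p_value:.3f}")
```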

AP Exam Tip: In a two-sample $z$-test for proportions, always use the pooled proportion $\hat{p}_c$ in the denominator — not the individual $\hat{p}_1$ and $\hat{p}_2$. Conversely, for a confidence interval for $p_1 - p_2$, you use the individual $\hat{p}$s (no pooling). Also verify Large Counts using $\hat{p}_c$ for tests.

Practice Problems

$H_0: p = 0.40$, $H_a: p < 0.40$. Sample: $n=80$, $\hat{p}=0.33$. Carry out a one-proportion $z$-test at $\alpha = 0.05$.

Solution:

Conditions: $80(0.4)=32\geq10$ ✓; $80(0.6)=48\geq10$ ✓.
$z=(0.33-0.40)/\sqrt{(0.4)(0.6)/80}=(-0.07)/0.05477=-1.28$.
Left-tail $p\text{-value}=P(Z<-1.28)=0.100$.
Since $0.100 > 0.05$, fail to reject $H_0$. Insufficient evidence that $p < 0.40$.

Describe how increasing the significance level from $\alpha = 0.01$ to $\alpha = 0.05$ affects (a) the probability of a Type I error, (b) the power of the test.

Solution:

(a) Type I error probability increases from 0.01 to 0.05 — more likely to falsely reject a true $H_0$.
(b) Power increases — a larger $\alpha$ makes it easier to reject $H_0$, including when it is truly false. There is a trade-off between Type I error rate and power.

A random sample of $n=25$ has $\bar{x}=78.3$, $s=9.4$. Test $H_0: \mu=75$, $H_a: \mu>75$ at $\alpha=0.05$. ($df=24$, $t^*=1.711$)

Solution:

$t=(78.3-75)/(9.4/\sqrt{25})=3.3/1.88=1.755$.
Since $1.755 > 1.711$ (critical value at $\alpha=0.05$, one-sided), $p\text{-value} < 0.05$. Reject $H_0$. There is convincing evidence that $\mu > 75$.

A two-sided test yields $p\text{-value} = 0.032$. A 95% CI for $\mu$ is $(12.1, 19.7)$. Is $\mu_0 = 20$ inside or outside the CI, and does this match the test's conclusion?

Solution:

$\mu_0 = 20$ is outside $(12.1, 19.7)$. This is consistent with the test: $p\text{-value} = 0.032 < 0.05$, so we reject $H_0: \mu = 20$ at $\alpha = 0.05$. The CI and hypothesis test always agree for two-sided tests at the same level.

A medical test has significance level $\alpha=0.01$. The doctors decide the Type II error (missing a disease) is more costly than a Type I error. Should they increase or decrease $\alpha$?

Solution:

They should increase $\alpha$ (e.g., to 0.05 or 0.10). A larger $\alpha$ reduces the Type II error rate by making it easier to reject $H_0$, at the cost of more false positives. When missing a disease is dangerous, reducing false negatives is the priority.

What does it mean for a result to be "statistically significant" but not "practically significant"? Give an example.

Solution:

Statistical significance means $p\text{-value} < \alpha$; practical significance means the effect size matters in the real world. Example: with $n = 1{,}000{,}000$, a test showing a drug reduces blood pressure by 0.1 mm Hg might give $p = 0.0001$ (statistically significant) but a 0.1 mm Hg reduction has no clinical meaning (not practically significant).
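A quick calculation shows how sample size alone can drive the $p$-value. This sketch makes several assumptions that are not in the problem: a population standard deviation of 20 mm Hg, a right-sided test, and a $z$ statistic in place of $t$ (reasonable only because $n$ is large in the case of interest). The function name is hypothetical.

```python
from statistics import NormalDist

def right_tail_p_value(effect, sd, n):
    """Right-tail p-value for an observed mean difference `effect` (z approximation)."""
    z = effect / (sd / n ** 0.5)
    return 1 - NormalDist().cdf(z)

# Same 0.1 mm Hg reduction, assumed sd = 20 mm Hg, very different sample sizes
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: p-value = {right_tail_p_value(0.1, 20, n):.2g}")
```

The effect size never changes; only the standard error shrinks, which is why a tiny $p$-value says nothing about whether 0.1 mm Hg matters clinically.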

Two proportions: Group A has $n_1=150$, $\hat{p}_1=0.60$; Group B has $n_2=180$, $\hat{p}_2=0.52$. Test $H_0: p_1=p_2$ vs. $H_a: p_1>p_2$ at $\alpha=0.05$. Find the pooled proportion and $z$.

Solution:

Success counts: $x_1 = 0.60(150) = 90$ and $x_2 = 0.52(180) = 93.6$, rounded to 94 because a count must be a whole number. Pooled: $\hat{p}_c=(90+94)/(150+180)=184/330\approx0.5576$.
$z=(0.60-0.52)/\sqrt{0.5576\cdot0.4424\cdot(1/150+1/180)}=0.08/\sqrt{0.2467\cdot0.01222}=0.08/0.05491=1.457$.
$p\text{-value}=P(Z>1.46)\approx0.072>0.05$. Fail to reject $H_0$. There is not convincing evidence that $p_1 > p_2$.

AP FRQ — Full Four-Step: A school claims 70% of students complete homework daily. A researcher surveys 120 randomly selected students; 78 say yes. Test the school's claim at $\alpha = 0.05$.

Solution:

State: $H_0: p=0.70$; $H_a: p\neq0.70$. $p$ = true proportion who complete homework daily.
Plan: One-proportion $z$-test. Random ✓; $120(0.7)=84\geq10$, $120(0.3)=36\geq10$ ✓; 10% ✓.
Do: $\hat{p}=78/120=0.65$. $z=(0.65-0.70)/\sqrt{(0.70)(0.30)/120}=(-0.05)/0.04183=-1.196$. $p\text{-value}=2P(Z<-1.20)=2(0.115)=0.230$.
Conclude: Since $0.230>0.05$, fail to reject $H_0$. There is not convincing evidence that the true proportion differs from 70%.

Chapter Summary

Hypothesis Testing Framework

Null Hypothesis ($H_0$)

The default claim (no effect, no difference). Always stated in terms of a parameter. We assume $H_0$ is true and look for evidence against it.

Alternative Hypothesis ($H_a$)

What we're trying to show evidence for. Can be one-sided ($<$ or $>$) or two-sided ($\neq$). Chosen before seeing data.

p-value

The probability of getting results as extreme or more extreme than observed, assuming $H_0$ is true. Small p-value → strong evidence against $H_0$.

Significance Level $\alpha$

The threshold for "surprising" data (typically $\alpha = 0.05$). Reject $H_0$ if $p\text{-value} < \alpha$.

Error Types

Type I Error

Rejecting $H_0$ when it is actually true. Probability = $\alpha$. A "false positive."

Type II Error

Failing to reject $H_0$ when it is actually false. Probability = $\beta$. A "false negative." Power $= 1 - \beta$.

One-Sample $t$-test

$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with $df = n-1$. Used when testing a claim about a population mean with unknown $\sigma$.

One-Proportion $z$-test

$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$. Used when testing a claim about a population proportion.

Key Terms

Null Hypothesis: The claim of no effect or no difference. Assumed true until evidence suggests otherwise.

p-value: Probability of observing data at least as extreme as ours, assuming $H_0$ is true. Not the probability that $H_0$ is true.

Statistical Significance: A result is statistically significant at level $\alpha$ if $p\text{-value} < \alpha$. This means the observed data would be unlikely if $H_0$ were true.

Type I Error: Rejecting $H_0$ when $H_0$ is actually true. Probability equals the significance level $\alpha$.

Type II Error: Failing to reject $H_0$ when $H_a$ is true. Probability is $\beta$; power of the test is $1-\beta$.

Power: $1 - \beta$, the probability of correctly rejecting $H_0$ when $H_a$ is true. Increased by larger $n$, larger effect size, or larger $\alpha$.
