| Method | Description | Advantage |
|---|---|---|
| Simple random | Every member has equal chance; use random number table/GDC | Unbiased, no prior knowledge needed |
| Systematic | Select every $k$th item from list (e.g., every 10th) | Simple to apply, spread across population |
| Stratified | Divide into groups (strata), sample proportionally from each | Ensures representation of all subgroups |
| Convenience | Use whoever is available | Easy, but biased |
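
The first three methods in the table can be sketched in Python using only the standard library. This is a minimal illustration, not a prescribed procedure: the function names, the population of student ID numbers, and the junior/senior strata are all assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Pick n members so that every member has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, seed=None):
    """Pick every k-th member after a random start among the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n, seed=None):
    """Sample from each stratum in proportion to its size.

    strata maps a stratum name to a list of its members. Rounding can make
    the overall total differ slightly from n.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = round(n * len(members) / total)
        sample[name] = rng.sample(members, share)
    return sample

# Hypothetical population: student ID numbers 1..100
students = list(range(1, 101))
print(simple_random_sample(students, 10, seed=1))
print(systematic_sample(students, 10, seed=1))

# Hypothetical strata: 60 juniors and 40 seniors, sample of 10 overall
strata = {"junior": students[:60], "senior": students[60:]}
picked = stratified_sample(strata, 10, seed=1)
print({name: len(members) for name, members in picked.items()})
# {'junior': 6, 'senior': 4}
```

The stratified sample keeps the 60:40 split of the population, which is what "sample proportionally from each" means in the table.
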
Pearson's correlation coefficient $r$ measures the strength and direction of a linear relationship between two variables: $-1 \le r \le 1$.
Regression line of $y$ on $x$: $\hat{y} = ax + b$ (minimises the sum of squared vertical residuals). It always passes through the mean point $(\bar{x}, \bar{y})$.
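
To see where $a$, $b$ and $r$ come from, the sketch below computes them directly from the standard least-squares formulas. The five data points are hypothetical, chosen only to keep the arithmetic small, and the function names are illustrative. On a GDC the same values come from the linear regression function.

```python
from math import sqrt
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a*x + b, the regression line of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx           # gradient
    b = y_bar - a * x_bar     # intercept, forces the line through the mean point
    return a, b

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data (not taken from the notes)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
r = pearson_r(xs, ys)
print(round(a, 3), round(b, 3), round(r, 3))   # 0.6 2.2 0.775

# The fitted line passes through (x-bar, y-bar)
print(abs(a * mean(xs) + b - mean(ys)) < 1e-9)  # True
```
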
Spearman's rank correlation $r_s$: use when the data are ordinal (ranked) or when the relationship is monotonic but not linear.
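
One common way to obtain $r_s$ is to rank each variable and then compute Pearson's $r$ on the ranks. The sketch below follows that approach, with tied values sharing the mean of their ranks. The data ($y = x^3$) are hypothetical, picked because the relationship is monotonic but not linear, and the helper names are illustrative.

```python
from math import sqrt
from statistics import mean

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def spearman_rs(xs, ys):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y = x**3 is monotonic but not linear
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

print(pearson_r(xs, ys) < 1)           # True: the points are not on a straight line
print(round(spearman_rs(xs, ys), 3))   # 1.0: the ranks agree perfectly
```
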
Study hours ($x$) and test scores ($y$) give regression line $\hat{y} = 7.2x + 34$, with $r = 0.91$.
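
As an illustration of how this line might be used, take an assumed input of $x = 5$ study hours (a value chosen for the example, not taken from the data). Substituting into the regression equation gives the predicted score:

$$\hat{y} = 7.2(5) + 34 = 70$$

The gradient 7.2 means each additional study hour is associated with about 7.2 more marks on average. Because $r = 0.91$ indicates a strong positive linear correlation, a prediction like this is reasonable as long as the $x$-value lies within the range of the observed data.
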
The normal distribution is bell-shaped and symmetric about the mean $\mu$. The standard deviation $\sigma$ controls the spread.
Standardisation: $Z = \dfrac{X-\mu}{\sigma}$, where $Z\sim N(0,1)$.
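Key percentages (empirical rule): about 68% of values lie within $\mu \pm \sigma$, about 95% within $\mu \pm 2\sigma$, and about 99.7% within $\mu \pm 3\sigma$.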
Use GDC normalcdf($a$, $b$, $\mu$, $\sigma$) for $P(a \le X \le b)$. Use invNorm($p$, $\mu$, $\sigma$) for the value with $P(X \le x) = p$.
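
For checking answers without a calculator, the two GDC functions can be mimicked with Python's standard-library `statistics.NormalDist`. The wrapper names `normalcdf` and `inv_norm` below are chosen to mirror the calculator; they are not built-in Python functions.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """P(lower <= X <= upper) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(p, mu=0.0, sigma=1.0):
    """The value x with P(X <= x) = p for X ~ N(mu, sigma^2)."""
    return NormalDist(mu, sigma).inv_cdf(p)

# Empirical-rule check on the standard normal Z ~ N(0, 1)
for k in (1, 2, 3):
    print(k, round(normalcdf(-k, k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# The z-value with 97.5% of the distribution below it
print(round(inv_norm(0.975), 2))   # 1.96
```
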
Test scores are $N(68, 12^2)$. Find: (a) $P(X > 80)$; (b) the score exceeded by only 10% of students.
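
One way to work this example, sketched with the standardisation formula and standard normal values (a GDC gives the same results directly):

$$\text{(a)}\quad P(X > 80) = P\!\left(Z > \dfrac{80 - 68}{12}\right) = P(Z > 1) \approx 0.159$$

$$\text{(b)}\quad P(X \le x) = 0.90 \;\Rightarrow\; x = 68 + 1.2816 \times 12 \approx 83.4$$

In part (b) the score is exceeded by 10% of students, so 90% lie below it; that is why the probability passed to invNorm is 0.90 rather than 0.10.
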
The $\chi^2$ test determines whether two categorical variables are independent in a two-way contingency table.
Hypotheses: $H_0$: the two variables are independent; $H_1$: they are not independent.
Expected frequency: $E_{ij} = \dfrac{(\text{row }i\text{ total}) \times (\text{column }j\text{ total})}{\text{grand total}}$
Test statistic: $\chi^2_{\text{calc}} = \displaystyle\sum \dfrac{(O-E)^2}{E}$
Degrees of freedom: $\nu = (\text{rows}-1)(\text{columns}-1)$
Decision rule: Reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_{\text{crit}}$ (or if $p$-value $<$ significance level, typically 5%).
A 2×3 contingency table records diet type (vegetarian, vegan, omnivore) vs. health outcome (good, poor) for 200 people. Observed frequencies:
| | Vegetarian | Vegan | Omnivore | Total |
|---|---|---|---|---|
| Good health | 30 | 25 | 55 | 110 |
| Poor health | 20 | 15 | 55 | 90 |
| Total | 50 | 40 | 110 | 200 |
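
The steps of the test can be traced on this table with a short Python sketch. It assumes a 5% significance level, for which the critical value with 2 degrees of freedom is 5.991. The p-value line uses a closed form that is valid only when there are exactly 2 degrees of freedom; for other tables a GDC or a statistics library would be needed.

```python
import math

# Observed frequencies from the table above
observed = [
    [30, 25, 55],  # good health: vegetarian, vegan, omnivore
    [20, 15, 55],  # poor health: vegetarian, vegan, omnivore
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E_ij = (row i total) * (column j total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Test statistic: sum of (O - E)^2 / E over every cell
chi2_calc = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1)(columns - 1)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Shortcut valid ONLY for 2 degrees of freedom: p = exp(-chi2 / 2)
p_value = math.exp(-chi2_calc / 2)

chi2_crit = 5.991  # assumed 5% significance level, 2 degrees of freedom

print(expected)   # [[27.5, 22.0, 60.5], [22.5, 18.0, 49.5]]
print(round(chi2_calc, 3), dof, round(p_value, 3))   # 2.525 2 0.283
print("reject H0" if chi2_calc > chi2_crit else "do not reject H0")
# do not reject H0
```

Under these assumptions $\chi^2_{\text{calc}} \approx 2.53 < 5.991$, so there is insufficient evidence at the 5% level to conclude that diet type and health outcome are not independent.
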
$163 = 175 - 1.5\times8$ and $191 = 175+2\times8$, so $P(163 \le X \le 191) = P(-1.5 \le Z \le 2) \approx 0.9772 - 0.0668 = 0.9104$. The GDC confirms this: normalcdf(163, 191, 175, 8) $\approx 0.910$, or about 91.0%.
$r_s = 0.85$ indicates a strong positive monotonic relationship between the two sets of ranked scores. Spearman's rank correlation is preferred when data is ordinal (ranked), when the relationship may be monotonic but not linear, or when there are outliers that would distort the Pearson coefficient.
$\chi^2_{\text{crit}}(2)$ at 5% significance level $= 5.991$. Since $8.34 > 5.991$, reject $H_0$. There is sufficient evidence at the 5% significance level to conclude that gender and preferred subject are not independent.