Statistics & Probability — IB Math AA

1. Descriptive Statistics

Descriptive statistics summarise the key features of a data set.

Measures of Centre & Spread

Mean: $\bar{x} = \dfrac{\sum x_i}{n}$ (or $\dfrac{\sum f_i x_i}{\sum f_i}$ for grouped data)
Median: Middle value when the data are ordered (the mean of the two middle values when $n$ is even)
Mode: Most frequent value
Range: max $-$ min
IQR: $Q_3 - Q_1$ (interquartile range)
Population standard deviation: $\sigma = \sqrt{\dfrac{\sum(x_i - \bar{x})^2}{n}}$
Sample standard deviation: $s = \sqrt{\dfrac{\sum(x_i - \bar{x})^2}{n-1}}$

Which SD to use? Use the population standard deviation ($\sigma$, divides by $n$) when you have all the data. Use the sample standard deviation ($s$, divides by $n-1$) when your data is a sample from a larger population. IB calculators typically give both — read the question carefully.
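As a rough illustration of the two divisors, the sketch below uses Python's standard `statistics` module on the eight values from Practice Problem Q1. The variable names are illustrative only; a GDC reports the same quantities under its own labels.

```python
import statistics

# Values reused from Practice Problem Q1, purely for illustration.
data = [3, 7, 8, 12, 15, 17, 21, 25]

mean = statistics.mean(data)         # sum of the values divided by n
median = statistics.median(data)     # mean of the two middle values, since n is even
pop_sd = statistics.pstdev(data)     # population SD: divides by n
sample_sd = statistics.stdev(data)   # sample SD: divides by n - 1

print(f"mean = {mean}, median = {median}")
print(f"population SD = {pop_sd:.3f}, sample SD = {sample_sd:.3f}")
```

Because the sample formula divides by the smaller number $n-1$, the sample standard deviation is always slightly larger than the population one for the same data.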

2. Probability Rules

Core Probability Formulas

$0 \leq P(A) \leq 1$ for any event $A$
$P(A') = 1 - P(A)$ (complement)
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Mutually exclusive: $P(A \cap B) = 0 \implies P(A \cup B) = P(A) + P(B)$
Conditional probability: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$
Independent events: $P(A \cap B) = P(A) \cdot P(B)$
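The rules above can be checked on a small sample space. The sketch below assumes one roll of a fair six-sided die and two hypothetical events, $A$ (even) and $B$ (at least 4); the helper `prob` is an illustrative name, not part of any library.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # hypothetical event: the roll is even
B = {4, 5, 6}   # hypothetical event: the roll is at least 4

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & omega), len(omega))

# Complement rule: P(A') = 1 - P(A)
assert prob(omega - A) == 1 - prob(A)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_A_given_B = prob(A & B) / prob(B)
print(p_A_given_B)    # 2/3

# Independence requires P(A and B) = P(A) * P(B)
independent = prob(A & B) == prob(A) * prob(B)
print(independent)    # False
```

Here $P(A \mid B) = \frac{2}{3} \neq P(A) = \frac{1}{2}$, so knowing that $B$ occurred changes the probability of $A$, and the events are not independent.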

Bayes' Theorem (HL)

HL: $P(A \mid B) = \dfrac{P(B \mid A) \cdot P(A)}{P(B)}$. Often solved using a tree diagram: multiply along each branch to get the joint probabilities, then add the branches that end in $B$ to obtain $P(B)$.

3. Discrete Distributions — Binomial

If $X$ counts the number of successes in $n$ independent trials, each with probability $p$ of success, then $X \sim B(n, p)$.

Binomial Distribution

$$P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}$$

Expected value: $E(X) = np$

Variance: $\text{Var}(X) = np(1-p)$

Standard deviation: $\sigma = \sqrt{np(1-p)}$
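A minimal sketch of the binomial formula, using only the standard library. The parameters $n = 10$, $p = 0.3$ are hypothetical, and `binomial_pmf` is an illustrative helper name. The sketch checks numerically that the probabilities sum to 1 and that the mean and variance agree with $np$ and $np(1-p)$.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 10, 0.3   # hypothetical parameters, for illustration only
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

total = sum(pmf)                                         # should be 1
mean = sum(r * pr for r, pr in enumerate(pmf))           # should equal n * p
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf))  # n * p * (1 - p)

print(f"total = {total:.6f}, mean = {mean:.4f}, variance = {variance:.4f}")
```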

4. Normal Distribution

The normal distribution $X \sim N(\mu, \sigma^2)$ is a symmetric, bell-shaped continuous distribution.

Normal Distribution Key Facts

Mean $= \mu$, Variance $= \sigma^2$, SD $= \sigma$
Symmetrical about $\mu$
Standardisation: $Z = \dfrac{X - \mu}{\sigma}$, where $Z \sim N(0, 1)$
Approximately 68% of data lie within $\mu \pm \sigma$; 95% within $\mu \pm 2\sigma$; 99.7% within $\mu \pm 3\sigma$

In the IB exam, all normal distribution probabilities are found using the GDC (graphic display calculator), not tables.
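For readers without a GDC to hand, the key facts above can be reproduced with `statistics.NormalDist` from the Python standard library. This is only a sketch: the values $\mu = 50$, $\sigma = 4$, $x = 55$ in the second half are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)   # standard normal distribution

def within(k):
    """P(-k < Z < k): probability of lying within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {within(k):.4f}")   # 0.6827, 0.9545, 0.9973

# Standardisation: P(X < x) for X ~ N(mu, sigma^2) equals P(Z < (x - mu) / sigma).
mu, sigma, x = 50, 4, 55        # hypothetical values
direct = NormalDist(mu, sigma).cdf(x)
standardised = Z.cdf((x - mu) / sigma)
print(f"{direct:.4f} {standardised:.4f}")      # the two values agree
```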

5. Regression & Correlation

Pearson's product-moment correlation coefficient (PMCC), $r$, measures the strength and direction of a linear relationship:

$r = 1$: perfect positive linear correlation
$r = -1$: perfect negative linear correlation
$r = 0$: no linear correlation

Regression Line of $y$ on $x$

$$\hat{y} = a + bx, \quad \text{where } b = \frac{S_{xy}}{S_{xx}}, \quad a = \bar{y} - b\bar{x}$$

Here $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$ and $S_{xx} = \sum (x_i - \bar{x})^2$.

The regression line always passes through $(\bar{x}, \bar{y})$.

Use the regression line of $y$ on $x$ to predict $y$ from a given $x$ value.

Caution: Correlation does not imply causation. Do not extrapolate beyond the data range — predictions become unreliable.
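The regression formulas can be sketched directly from their definitions. The five data points below are hypothetical. The expression $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$ used for the PMCC is the standard definition rather than something stated in these notes; in the exam both $r$ and the regression line come from the GDC.

```python
from math import sqrt

# Hypothetical bivariate data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and products about the means.
S_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
S_xx = sum((xi - x_bar) ** 2 for xi in x)
S_yy = sum((yi - y_bar) ** 2 for yi in y)

b = S_xy / S_xx                  # gradient of the regression line of y on x
a = y_bar - b * x_bar            # intercept
r = S_xy / sqrt(S_xx * S_yy)     # PMCC (standard definition, assumed here)

print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.4f}")

# Prediction inside the data range (interpolation), e.g. at x = 3.5.
y_pred = a + b * 3.5
```

Substituting $x = \bar{x}$ into the fitted line returns $\bar{y}$, which is the numerical counterpart of the statement that the line passes through $(\bar{x}, \bar{y})$.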

6. Hypothesis Testing (HL)

HL: Hypothesis testing involves making a statistical decision about a population parameter based on sample data.

The general procedure:

State $H_0$ (null hypothesis) and $H_1$ (alternative hypothesis).
Choose a significance level $\alpha$ (e.g., 5% = 0.05).
Calculate the test statistic and corresponding $p$-value using the GDC.
Decision: if $p \leq \alpha$, reject $H_0$; if $p > \alpha$, do not reject $H_0$.
Conclude in context.

Common tests in IB AA HL:

$t$-test: Test a claim about a population mean when $\sigma$ is unknown.
Chi-squared ($\chi^2$) test: Test for independence between two categorical variables using a contingency table.
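The five-step procedure can be sketched for a $\chi^2$ test of independence. Everything specific here is an assumption for illustration: the $2 \times 2$ table of counts is hypothetical, no continuity correction is applied, and the $p$-value shortcut $P(\chi^2_1 > x) = \operatorname{erfc}\!\left(\sqrt{x/2}\right)$ is valid only for 1 degree of freedom. For larger tables the GDC's built-in test is the practical route.

```python
from math import erfc, sqrt

# Hypothetical 2x2 contingency table of observed counts.
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts under H0 (the two variables are independent).
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Test statistic: sum of (O - E)^2 / E over all cells.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# A 2x2 table has (2 - 1)(2 - 1) = 1 degree of freedom.
# For 1 degree of freedom only: p-value = erfc(sqrt(chi2 / 2)).
p_value = erfc(sqrt(chi2 / 2))

alpha = 0.05
decision = "reject H0" if p_value <= alpha else "do not reject H0"
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}: {decision}")
```

With these counts every expected frequency is 25, the statistic is 4, and the $p$-value is about 0.0455, so $H_0$ is rejected at the 5% level.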

Worked Examples

Worked Example 1

Binomial Distribution — Probability Calculation

A fair six-sided die is rolled 8 times. Let $X$ be the number of times a 6 is rolled. Find $P(X \geq 2)$.

Identify: $X \sim B\!\left(8, \frac{1}{6}\right)$. We need $P(X \geq 2) = 1 - P(X \leq 1) = 1 - P(X=0) - P(X=1)$.

$P(X=0) = \binom{8}{0}\left(\frac{1}{6}\right)^0\!\left(\frac{5}{6}\right)^8 = \left(\frac{5}{6}\right)^8 \approx 0.2326$

$P(X=1) = \binom{8}{1}\left(\frac{1}{6}\right)^1\!\left(\frac{5}{6}\right)^7 = 8 \times \frac{1}{6} \times \left(\frac{5}{6}\right)^7 \approx 0.3721$

$P(X \geq 2) = 1 - 0.2326 - 0.3721 = \boxed{0.395}$ (to 3 s.f.)
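The arithmetic in this example can be double-checked with a short sketch; `binomial_pmf` is an illustrative helper, not a library function.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 8, 1 / 6                  # eight rolls, success = rolling a 6

p0 = binomial_pmf(0, n, p)       # P(X = 0)
p1 = binomial_pmf(1, n, p)       # P(X = 1)
answer = 1 - p0 - p1             # P(X >= 2) via the complement

print(round(p0, 4), round(p1, 4), round(answer, 3))   # 0.2326 0.3721 0.395
```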

Worked Example 2

Normal Distribution — Finding a Probability and an Inverse Normal Value

The mass of apples is normally distributed with mean $\mu = 180$ g and standard deviation $\sigma = 15$ g. (a) Find $P(X > 200)$. (b) Find the value of $m$ such that $P(X > m) = 0.1$.

Part (a): Standardise: $Z = \dfrac{200 - 180}{15} = \dfrac{20}{15} \approx 1.333$.

$P(X > 200) = P(Z > 1.333) \approx \boxed{0.0912}$ (from GDC: normalcdf$(200, \infty, 180, 15) \approx 0.0912$)

Part (b): $P(X > m) = 0.1$ means $P(X \leq m) = 0.9$. Using the GDC: invNorm$(0.9, 180, 15) \approx \boxed{199.2}$ g.
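As a cross-check away from the calculator, both parts can be reproduced with `statistics.NormalDist`. In this sketch `cdf` plays the role of normalcdf with a lower bound of $-\infty$, and `inv_cdf` plays the role of invNorm.

```python
from statistics import NormalDist

apples = NormalDist(mu=180, sigma=15)   # mass in grams

# Part (a): P(X > 200) = 1 - P(X <= 200)
p_over_200 = 1 - apples.cdf(200)

# Part (b): find m with P(X <= m) = 0.9
m = apples.inv_cdf(0.9)

print(f"P(X > 200) = {p_over_200:.4f}, m = {m:.1f} g")
```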

Worked Example 3

Conditional Probability — Tree Diagram

Box A contains 3 red and 2 blue balls. Box B contains 1 red and 4 blue balls. A box is chosen at random, then a ball is drawn. Given that the ball is red, find the probability it came from Box A.

$P(\text{Red} \mid A) = \frac{3}{5}$, $P(\text{Red} \mid B) = \frac{1}{5}$, $P(A) = P(B) = \frac{1}{2}$.

$P(\text{Red}) = P(A) \cdot P(\text{Red} \mid A) + P(B) \cdot P(\text{Red} \mid B) = \frac{1}{2} \cdot \frac{3}{5} + \frac{1}{2} \cdot \frac{1}{5} = \frac{3}{10} + \frac{1}{10} = \frac{4}{10} = \frac{2}{5}$

$P(A \mid \text{Red}) = \dfrac{P(A \cap \text{Red})}{P(\text{Red})} = \dfrac{3/10}{2/5} = \dfrac{3}{10} \times \dfrac{5}{2} = \boxed{\dfrac{3}{4}}$
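The tree-diagram calculation translates directly into exact fractions. In the sketch below the dictionary names are illustrative; each joint probability is one branch of the tree.

```python
from fractions import Fraction

# First-stage branches: which box is chosen.
p_box = {"A": Fraction(1, 2), "B": Fraction(1, 2)}

# Second-stage branches: P(Red | box).
p_red_given = {"A": Fraction(3, 5), "B": Fraction(1, 5)}

# Multiply along each branch to get P(box and Red).
joint = {box: p_box[box] * p_red_given[box] for box in p_box}

# Add across the branches that end in Red.
p_red = sum(joint.values())

# Conditional probability: P(A | Red) = P(A and Red) / P(Red)
p_A_given_red = joint["A"] / p_red

print(p_red, p_A_given_red)   # 2/5 3/4
```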

Practice Problems

Q1. A dataset has values 3, 7, 8, 12, 15, 17, 21, 25. Find the median, lower quartile $Q_1$, upper quartile $Q_3$, and IQR.

Ordered data (8 values): 3, 7, 8, 12, 15, 17, 21, 25.

Median $= \frac{12+15}{2} = 13.5$.

Lower half: 3, 7, 8, 12 → $Q_1 = \frac{7+8}{2} = 7.5$.

Upper half: 15, 17, 21, 25 → $Q_3 = \frac{17+21}{2} = 19$.

IQR $= Q_3 - Q_1 = 19 - 7.5 = \mathbf{11.5}$.
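The "median of each half" method used here can be sketched as follows. The function name `quartiles` is illustrative, and for an odd number of values the sketch assumes the middle value is left out of both halves. Other quartile conventions, including some software defaults, can give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For odd n, skip the middle value so that it belongs to neither half (assumption).
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

q1, med, q3 = quartiles([3, 7, 8, 12, 15, 17, 21, 25])
iqr = q3 - q1
print(q1, med, q3, iqr)   # 7.5 13.5 19.0 11.5
```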

Q2. A player makes a free throw with probability 0.7. Find the probability of making exactly 6 out of 10 free throws. Also find the expected number and standard deviation.

$X \sim B(10, 0.7)$.

$P(X=6) = \binom{10}{6}(0.7)^6(0.3)^4 = 210 \times 0.117649 \times 0.0081 \approx \mathbf{0.200}$

$E(X) = np = 10 \times 0.7 = \mathbf{7}$

$\sigma = \sqrt{np(1-p)} = \sqrt{10 \times 0.7 \times 0.3} = \sqrt{2.1} \approx \mathbf{1.45}$

Q3. Test scores are normally distributed with $\mu = 65$ and $\sigma = 8$. What percentage of students score between 50 and 80?

Standardise: $z_1 = \frac{50-65}{8} = -1.875$ and $z_2 = \frac{80-65}{8} = 1.875$.

$P(50 < X < 80) = P(-1.875 < Z < 1.875) \approx 0.9696 - 0.0304 \approx \mathbf{93.9\%}$

(Using GDC: normalcdf$(50, 80, 65, 8) \approx 0.939$.)
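The same probability can be computed as a difference of two cumulative values, which is what normalcdf does internally. A short sketch using the standard library:

```python
from statistics import NormalDist

scores = NormalDist(mu=65, sigma=8)

upper = scores.cdf(80)        # P(X < 80), i.e. P(Z < 1.875)
lower = scores.cdf(50)        # P(X < 50), i.e. P(Z < -1.875)
p_between = upper - lower     # P(50 < X < 80)

print(f"{upper:.4f} - {lower:.4f} = {p_between:.4f}")
```

By symmetry the two tail areas are equal, so `lower` equals `1 - upper`.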

Q4. Two events $A$ and $B$ satisfy $P(A) = 0.4$, $P(B) = 0.5$, $P(A \cap B) = 0.2$. Are $A$ and $B$ independent? Find $P(A \mid B)$.

Check independence: $P(A) \times P(B) = 0.4 \times 0.5 = 0.2 = P(A \cap B)$. Since this holds, $A$ and $B$ are independent.

$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{0.2}{0.5} = \mathbf{0.4}$ (equal to $P(A)$, confirming independence).