Markov chains, steady-state distributions, χ² tests, and confidence intervals
Math AI HL Topic 4 Extension
A Markov chain is a stochastic process in which the probability of moving to the next state depends only on the current state, not on the history. This is called the Markov property.
State after $n$ steps:
$$s_n = s_0 \cdot P^n$$
where $s_0$ is the initial state (row) vector and $P$ is the transition matrix.
Transition matrix structure: Each entry $P_{ij} \geq 0$ and $\sum_j P_{ij} = 1$ for each row $i$.
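The two conditions above can be checked mechanically. The following is a minimal sketch, assuming the matrix is stored as a list of rows; the function name `is_transition_matrix` and the tolerance `tol` are illustrative choices, not part of the syllabus.

```python
import math

def is_transition_matrix(P, tol=1e-9):
    """Return True if P is square, has no negative entries,
    and every row sums to 1 (within a small tolerance)."""
    n = len(P)
    for row in P:
        if len(row) != n:
            return False          # not square
        if any(p < 0 for p in row):
            return False          # a probability cannot be negative
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            return False          # row does not sum to 1
    return True

print(is_transition_matrix([[0.7, 0.3], [0.4, 0.6]]))  # True
print(is_transition_matrix([[0.7, 0.4], [0.4, 0.6]]))  # False
```

A tolerance is used because decimal probabilities are not stored exactly in floating point.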
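A transition diagram (state diagram) is a directed graph in which each node represents a state and each directed edge from state $i$ to state $j$ is labelled with the transition probability $P_{ij}$.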
Example 1: The weather in a city is either Sunny (S) or Rainy (R). If today is sunny, tomorrow is sunny with probability 0.7. If today is rainy, tomorrow is sunny with probability 0.4. Write the transition matrix and find the state vector after 2 days if today is sunny.
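One way to check an answer to this example is to carry out $s_2 = s_0 P^2$ numerically. The sketch below assumes the states are ordered (Sunny, Rainy) and that row $i$ of the matrix describes today's state; the helper names `step` and `state_after` are hypothetical.

```python
def step(state, P):
    """One step of the chain: row vector `state` times matrix P."""
    n = len(P)
    return [sum(state[i] * P[i][j] for i in range(n)) for j in range(n)]

def state_after(s0, P, n_steps):
    """Apply `step` n_steps times, i.e. compute s0 * P^n_steps."""
    s = list(s0)
    for _ in range(n_steps):
        s = step(s, P)
    return s

# Rows/columns ordered (Sunny, Rainy); each row sums to 1.
P_weather = [[0.7, 0.3],
             [0.4, 0.6]]
s0 = [1.0, 0.0]                      # today is sunny

s2 = state_after(s0, P_weather, 2)
print([round(x, 4) for x in s2])     # [0.61, 0.39]
```

So, under these assumptions, the probability of a sunny day two days from now is 0.61 and of a rainy day is 0.39.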
A regular Markov chain (some power of $P$ has all positive entries) converges to a unique steady-state distribution $\pi$, regardless of the initial state.
The steady-state vector $\pi = (\pi_1, \pi_2, \ldots)$ satisfies:
$$\boldsymbol{\pi} P = \boldsymbol{\pi} \quad \text{and} \quad \sum_i \pi_i = 1$$
Method 1 (for 2 states): Solve $\pi_1 p_{12} = \pi_2 p_{21}$ (balance equations) with $\pi_1 + \pi_2 = 1$.
Method 2 (general): Solve $(P^T - I)\boldsymbol{\pi}^T = \mathbf{0}$ as a homogeneous system, then normalise.
Method 3 (technology): Compute $P^n$ for large $n$ — all rows converge to $\pi$.
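Method 3 can be imitated without a GDC by multiplying a starting vector by $P$ repeatedly until it stops changing, which is equivalent to reading off $s_0 P^n$ for large $n$. This is only a sketch: the function name, the uniform starting vector, the tolerance, and the step limit are all assumptions, and the 3-state matrix is borrowed from practice question 1 as a test case.

```python
def step(state, P):
    """Row vector `state` times matrix P."""
    n = len(P)
    return [sum(state[i] * P[i][j] for i in range(n)) for j in range(n)]

def steady_state_by_iteration(P, tol=1e-12, max_steps=10_000):
    """Iterate s -> sP from a uniform start until successive
    vectors differ by less than `tol` in every component."""
    n = len(P)
    s = [1.0 / n] * n
    for _ in range(max_steps):
        nxt = step(s, P)
        if max(abs(a - b) for a, b in zip(nxt, s)) < tol:
            return nxt
        s = nxt
    raise RuntimeError("did not converge; the chain may not be regular")

P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4]]
pi = steady_state_by_iteration(P)

# Check the defining property: pi * P should reproduce pi.
residual = max(abs(a - b) for a, b in zip(step(pi, P), pi))
print([round(x, 4) for x in pi], residual < 1e-9)
```

Checking that $\boldsymbol{\pi} P \approx \boldsymbol{\pi}$ at the end is a useful habit, because a chain that is not regular may fail to settle.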
Using the weather Markov chain from Example 1 ($P_{SS}=0.7$, $P_{SR}=0.3$, $P_{RS}=0.4$, $P_{RR}=0.6$), find the long-run proportion of sunny days.
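For a 2-state chain, the balance equation $\pi_1 p_{12} = \pi_2 p_{21}$ together with $\pi_1 + \pi_2 = 1$ gives $\pi_1 = \dfrac{p_{21}}{p_{12}+p_{21}}$. The sketch below applies this to the weather chain using exact fractions; the function name `two_state_steady_state` is hypothetical.

```python
from fractions import Fraction

def two_state_steady_state(p12, p21):
    """Solve pi1*p12 = pi2*p21 with pi1 + pi2 = 1."""
    total = p12 + p21
    return p21 / total, p12 / total

# Weather chain: p12 = P_SR = 0.3, p21 = P_RS = 0.4
pi_S, pi_R = two_state_steady_state(Fraction(3, 10), Fraction(4, 10))
print(pi_S, pi_R)          # 4/7 3/7
print(round(float(pi_S), 4))  # 0.5714
```

In the long run about 57% of days are sunny, whatever the weather is today.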
The $\chi^2$ goodness of fit test determines whether observed data follows a claimed probability distribution.
Test statistic:
$$\chi^2_{\text{calc}} = \sum \frac{(O-E)^2}{E}$$
where $O$ = observed frequency and $E$ = expected frequency.
Degrees of freedom: $\nu = k - 1 - m$, where $k$ = number of categories, $m$ = number of parameters estimated from data.
Decision rule: Reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_{\text{crit}}$ at the chosen significance level $\alpha$.
Conditions: Expected frequencies $E \geq 5$ for each cell (merge categories if needed).
A die is rolled 120 times. Results: 1=18, 2=22, 3=25, 4=14, 5=20, 6=21. Test at 5% significance whether the die is fair.
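The arithmetic for this example can be laid out as follows. This is a sketch of the hand calculation rather than a replacement for the GDC test; the critical value 11.070 is the standard table value for $\nu = 5$ at the 5% level, and no parameters are estimated from the data, so $m = 0$.

```python
observed = [18, 22, 25, 14, 20, 21]        # faces 1..6
n = sum(observed)                          # 120 rolls
k = len(observed)
expected = [n / k] * k                     # fair die: 20 per face

# All expected frequencies are at least 5, so no merging is needed.
assert all(e >= 5 for e in expected)

chi2_calc = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1                                 # m = 0 parameters estimated
chi2_crit = 11.070                         # table value, df = 5, alpha = 0.05

reject_h0 = chi2_calc > chi2_crit
print(round(chi2_calc, 3), df, reject_h0)  # 3.5 5 False
```

Since $3.5 < 11.070$, $H_0$ is not rejected: there is insufficient evidence at the 5% level that the die is unfair.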
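The $t$-test is used to test hypotheses about population means when the population variance is unknown and has to be estimated from the sample. This matters most for small $n$, where the heavier tails of the $t$-distribution account for the extra uncertainty that the $z$-test ignores.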
Two-sample $t$-test: Tests $H_0: \mu_1 = \mu_2$ (the two population means are equal).
Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}$
Degrees of freedom: Welch's approximation (use GDC).
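To see what the GDC does behind the scenes, the statistic above and the Welch–Satterthwaite degrees of freedom can be computed directly. This is a minimal sketch: the function name `welch_t` is hypothetical and the two samples are made-up illustrative numbers, not data from these notes. The p-value is left to the GDC.

```python
import math
from statistics import mean, variance   # variance() is the sample variance s^2

def welch_t(x1, x2):
    """Return (t, df) for the unpooled two-sample t-test."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1) / n1, variance(x2) / n2   # s^2 / n for each sample
    t = (mean(x1) - mean(x2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Illustrative (invented) samples
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, df = welch_t(group_a, group_b)
print(round(t_stat, 3), round(df, 2))
```

The degrees of freedom are usually not a whole number, which is why the notes recommend technology for this test.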
Paired $t$-test: For matched pairs (before/after), compute differences $d_i = x_{i1}-x_{i2}$, then test $H_0: \mu_d = 0$.
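The paired test reduces to a one-sample $t$-test on the differences, with $t = \dfrac{\bar{d}}{s_d/\sqrt{n}}$ and $n-1$ degrees of freedom. The sketch below shows that reduction; the function name `paired_t` is hypothetical and the before/after values are invented for illustration only.

```python
import math
from statistics import mean, stdev     # stdev() is the sample standard deviation

def paired_t(first, second):
    """Return (t, df) for H0: mean difference = 0."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(first, second)]   # d_i = x_i1 - x_i2
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))
    return t, n - 1

# Illustrative (invented) matched pairs
before = [10, 12, 9, 11]
after = [8, 11, 9, 8]
t_stat, df = paired_t(before, after)
print(round(t_stat, 3), df)
```

Pairing removes the variation between subjects, so only the spread of the differences enters the standard error.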
| Error Type | Definition | Probability | Controlled by |
|---|---|---|---|
| Type I | Reject $H_0$ when $H_0$ is true (false positive) | $\alpha$ (significance level) | Choice of $\alpha$ |
| Type II | Fail to reject $H_0$ when $H_1$ is true (false negative) | $\beta$ | Sample size, effect size |
A confidence interval (CI) is a range of plausible values for the true population parameter, constructed so that (for a 95% CI) 95% of such intervals computed from repeated sampling would contain the true parameter.
Known $\sigma$ (use $z$): $\bar{x} \pm z^* \cdot \dfrac{\sigma}{\sqrt{n}}$, where $z^* = 1.96$ for 95%.
Unknown $\sigma$ (use $t$): $\bar{x} \pm t^* \cdot \dfrac{s}{\sqrt{n}}$, where $t^*$ has $n-1$ degrees of freedom.
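Both formulas have the same shape: estimate plus or minus critical value times standard error. A small sketch, assuming the critical value ($z^*$ or $t^*$) is looked up separately and passed in; the function name `mean_ci` is hypothetical, and the numbers are those of practice question 3.

```python
import math

def mean_ci(xbar, sd, n, crit):
    """Confidence interval xbar +/- crit * sd / sqrt(n).
    Pass sigma with z*, or s with t* on n-1 degrees of freedom."""
    margin = crit * sd / math.sqrt(n)
    return xbar - margin, xbar + margin

# n = 16, mean 42.3, s = 5.1, t* = 2.131 (15 degrees of freedom)
lower, upper = mean_ci(42.3, 5.1, 16, 2.131)
print(round(lower, 2), round(upper, 2))   # 39.58 45.02
```

Note that the interval narrows in proportion to $1/\sqrt{n}$, so halving its width needs four times the sample size.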
1. A Markov chain has 3 states with transition matrix $P = \begin{pmatrix}0.5&0.3&0.2\\0.1&0.6&0.3\\0.4&0.2&0.4\end{pmatrix}$. If the initial state vector is $s_0 = (1,0,0)$, find $s_1$ and $s_2$.
$s_1 = s_0 P = (1,0,0)P = (0.5, 0.3, 0.2)$ (just the first row of $P$)
$s_2 = s_1 P = (0.5,0.3,0.2)\begin{pmatrix}0.5&0.3&0.2\\0.1&0.6&0.3\\0.4&0.2&0.4\end{pmatrix}$
$= (0.25+0.03+0.08,\ 0.15+0.18+0.04,\ 0.10+0.09+0.08) = (0.36, 0.37, 0.27)$
2. For a 2-state Markov chain with $P = \begin{pmatrix}0.8&0.2\\0.3&0.7\end{pmatrix}$, find the steady-state distribution.
Balance equation: $\pi_1 \cdot 0.2 = \pi_2 \cdot 0.3$, and $\pi_1 + \pi_2 = 1$.
$0.2\pi_1 = 0.3(1-\pi_1) \Rightarrow 0.5\pi_1 = 0.3 \Rightarrow \pi_1 = 0.6$, $\pi_2 = 0.4$.
Steady-state: $\boldsymbol{\pi} = (0.6, 0.4)$
3. A sample of 16 observations has mean 42.3 and standard deviation 5.1. Construct a 95% confidence interval for the population mean (use $t^*_{15} = 2.131$).
$\bar{x} \pm t^* \cdot \dfrac{s}{\sqrt{n}} = 42.3 \pm 2.131 \times \dfrac{5.1}{\sqrt{16}} = 42.3 \pm 2.131 \times 1.275$
$= 42.3 \pm 2.717$
95% CI: $(39.58,\ 45.02)$
We are 95% confident the true population mean lies between 39.58 and 45.02.