A-Level Further Mathematics – Chapter 11: Further Statistics

Edexcel · AQA · OCR A-Level Further Mathematics · Updated March 2026

Contents

  1. 11.1 Poisson Distribution
  2. 11.2 Continuous Random Variables
  3. 11.3 Continuous Distributions: Uniform and Exponential
  4. 11.4 Chi-Squared ($\chi^2$) Tests
  5. 11.5 Correlation and Regression
  6. 11.6 Further Hypothesis Testing: the $t$-Distribution
  7. Practice Problems

11.1 Poisson Distribution

Definition — Poisson Distribution

$X \sim \text{Po}(\lambda)$ if $X$ counts the number of random events occurring in a fixed interval of time or space, where events occur independently and at a constant average rate $\lambda$. The probability mass function is:

$$P(X = r) = \frac{e^{-\lambda}\lambda^r}{r!}, \quad r = 0, 1, 2, \ldots$$

Key properties: $\mathrm{E}(X) = \lambda$, $\mathrm{Var}(X) = \lambda$ (mean equals variance).

The Poisson conditions are: events occur (i) randomly, (ii) independently, (iii) singly (not in clusters), and (iv) at a constant average rate. When the rate changes (e.g. over a different time period), scale $\lambda$ proportionally.

Poisson as a Limit of the Binomial

If $X \sim B(n, p)$ with $n$ large and $p$ small such that $np = \lambda$ remains constant, then as $n \to \infty$:

$$B(n, p) \approx \text{Po}(\lambda)$$

A common rule of thumb: use the Poisson approximation when $n > 50$ and $p < 0.1$ (or $np < 5$).

Figure 11.1 — Probability distribution of $X \sim \text{Po}(3)$, showing $P(X = r)$ for $r = 0, 1, \ldots, 10$. The distribution is right-skewed with mode at $r = 2$ and $r = 3$.

Example 11.1.1 — Basic Poisson Probabilities

Calls arrive at a switchboard at an average rate of 4 per minute. Find $P(X = 2)$ and $P(X \leq 2)$ where $X \sim \text{Po}(4)$.

$P(X = 2) = \dfrac{e^{-4} \cdot 4^2}{2!} = \dfrac{e^{-4} \cdot 16}{2} = 8e^{-4} \approx 0.1465$.

$P(X \leq 2) = P(0) + P(1) + P(2) = e^{-4}\left(1 + 4 + 8\right) = 13e^{-4} \approx 0.2381$.
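These probabilities are quick to check numerically; a short Python sketch (the helper name `poisson_pmf` is ours, not from any library):

```python
import math

def poisson_pmf(lam, r):
    """P(X = r) for X ~ Po(lam), using the pmf e^{-lam} * lam^r / r!."""
    return math.exp(-lam) * lam**r / math.factorial(r)

# Example 11.1.1: calls arrive at rate 4 per minute, so X ~ Po(4)
p_eq_2 = poisson_pmf(4, 2)                         # 8e^{-4}  ≈ 0.1465
p_le_2 = sum(poisson_pmf(4, r) for r in range(3))  # 13e^{-4} ≈ 0.2381
```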

Example 11.1.2 — Changing the Time Period

Defects in a sheet of metal occur at an average of 2 per m². Find the probability of exactly 5 defects in a 3 m² sheet.

New rate: $\lambda = 2 \times 3 = 6$. $P(X = 5) = \dfrac{e^{-6} \cdot 6^5}{5!} = \dfrac{e^{-6} \cdot 7776}{120} = 64.8e^{-6} \approx 0.1606$.

Example 11.1.3 — Poisson Approximation to Binomial

A large batch of components contains 0.6% defective items. Find the approximate probability that a random sample of 200 contains at most 2 defectives.

$\lambda = np = 200 \times 0.006 = 1.2$. Using $X \sim \text{Po}(1.2)$:
$P(X \leq 2) = e^{-1.2}(1 + 1.2 + 0.72) = 2.92e^{-1.2} \approx 0.8795$.
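To see how good the approximation is here, one can compare the exact binomial tail with the Poisson value (standard library only; `math.comb` requires Python 3.8+):

```python
import math

n, p, lam = 200, 0.006, 1.2

# Exact binomial tail: P(X <= 2) for X ~ B(200, 0.006)
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(3))

# Poisson approximation: P(X <= 2) for X ~ Po(1.2)
approx = math.exp(-lam) * (1 + lam + lam**2 / 2)
```

The two values agree to roughly three decimal places, which is typical when $n$ is large and $p$ is small.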

Example 11.1.4 — Sum of Independent Poisson Variables

If $X \sim \text{Po}(2)$ and $Y \sim \text{Po}(3)$ independently, find $P(X + Y = 4)$.

$X + Y \sim \text{Po}(5)$. $P(X + Y = 4) = \dfrac{e^{-5} \cdot 5^4}{4!} = \dfrac{625e^{-5}}{24} \approx 0.1755$.

Exam Tip

The sum of independent Poisson variables is again Poisson: if $X \sim \text{Po}(\lambda_1)$ and $Y \sim \text{Po}(\lambda_2)$ independently, then $X + Y \sim \text{Po}(\lambda_1 + \lambda_2)$. State this result explicitly in your working.
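The additive property can be verified numerically by convolving the two pmfs, as a quick sanity check in Python:

```python
import math

def pmf(lam, r):
    """Poisson pmf P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)

# P(X + Y = 4) by convolution of Po(2) and Po(3)...
conv = sum(pmf(2, k) * pmf(3, 4 - k) for k in range(5))

# ...against Po(5) evaluated directly (Example 11.1.4)
direct = pmf(5, 4)
```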

11.2 Continuous Random Variables

Definition — Probability Density Function (PDF)

A continuous random variable $X$ has a probability density function $f(x)$ if, for all $a \leq b$:

$$P(a \leq X \leq b) = \int_a^b f(x)\,dx$$

Required conditions: $f(x) \geq 0$ for all $x$, and $\displaystyle\int_{-\infty}^{\infty} f(x)\,dx = 1$.

Definition — Cumulative Distribution Function (CDF)

$$F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t)\,dt$$

Properties: $F(-\infty) = 0$, $F(\infty) = 1$, $F$ is non-decreasing, and $f(x) = F'(x)$.

Key Formulae for Expectation and Variance

$$\mathrm{E}(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$$

$$\mathrm{E}(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$$

$$\mathrm{Var}(X) = \mathrm{E}(X^2) - [\mathrm{E}(X)]^2$$

The median $m$ satisfies $F(m) = \tfrac{1}{2}$; the mode is the value of $x$ at which $f(x)$ is maximised.

Example 11.2.1 — Finding the Constant $k$

A continuous random variable has PDF $f(x) = kx(2-x)$ for $0 \leq x \leq 2$, and $0$ otherwise. Find $k$.

$\displaystyle\int_0^2 kx(2-x)\,dx = k\int_0^2 (2x - x^2)\,dx = k\left[x^2 - \tfrac{x^3}{3}\right]_0^2 = k\left(4 - \tfrac{8}{3}\right) = k \cdot \tfrac{4}{3} = 1$.

Therefore $k = \tfrac{3}{4}$.
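The normalisation can be confirmed from the antiderivative, e.g. in Python (the helper name `A` is just illustrative):

```python
def A(x):
    """Antiderivative of x(2 - x), i.e. x^2 - x^3/3."""
    return x**2 - x**3 / 3

# k must satisfy k * [A(2) - A(0)] = 1
k = 1 / (A(2) - A(0))   # A(2) = 4/3, so k = 3/4
```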

Example 11.2.2 — Finding Mean, Variance, and Median

For the distribution in Example 11.2.1 with $f(x) = \tfrac{3}{4}x(2-x)$, $0 \leq x \leq 2$:

$\mathrm{E}(X) = \tfrac{3}{4}\displaystyle\int_0^2 x^2(2-x)\,dx = \tfrac{3}{4}\left[\tfrac{2x^3}{3} - \tfrac{x^4}{4}\right]_0^2 = \tfrac{3}{4}\left(\tfrac{16}{3} - 4\right) = \tfrac{3}{4} \cdot \tfrac{4}{3} = 1$. (By symmetry, $\mathrm{E}(X) = 1$.)

$\mathrm{E}(X^2) = \tfrac{3}{4}\displaystyle\int_0^2 x^3(2-x)\,dx = \tfrac{3}{4}\left[\tfrac{x^4}{2} - \tfrac{x^5}{5}\right]_0^2 = \tfrac{3}{4}\left(8 - \tfrac{32}{5}\right) = \tfrac{3}{4} \cdot \tfrac{8}{5} = \tfrac{6}{5}$.

$\mathrm{Var}(X) = \tfrac{6}{5} - 1 = \tfrac{1}{5}$. Median: by symmetry, $m = 1$.

Example 11.2.3 — Finding the CDF and a Percentile

For $f(x) = \tfrac{3}{4}x(2-x)$, $0 \leq x \leq 2$, find $F(x)$ and the upper quartile.

$F(x) = \tfrac{3}{4}\displaystyle\int_0^x t(2-t)\,dt = \tfrac{3}{4}\left[t^2 - \tfrac{t^3}{3}\right]_0^x = \tfrac{3}{4}\left(x^2 - \tfrac{x^3}{3}\right) = \tfrac{3x^2}{4} - \tfrac{x^3}{4}$.

For the upper quartile $q$: $F(q) = \tfrac{3}{4}$. Solving $\tfrac{3q^2}{4} - \tfrac{q^3}{4} = \tfrac{3}{4}$, i.e. $q^3 - 3q^2 + 3 = 0$. Numerically, $q \approx 1.347$.
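No algebraic solution is expected for the cubic, so a numerical method is appropriate. A bisection sketch (valid because $F$ is increasing on $[0, 2]$, since $f \geq 0$):

```python
def F(x):
    """CDF from Example 11.2.3: F(x) = 3x^2/4 - x^3/4 on [0, 2]."""
    return 0.75 * x**2 - 0.25 * x**3

# Bisection for F(q) = 0.75 on [0, 2]
lo, hi = 0.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    if F(mid) < 0.75:
        lo = mid
    else:
        hi = mid
q = (lo + hi) / 2   # q ≈ 1.347
```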

Exam Tip

For a CDF question, always verify $F(0) = 0$ and $F(\text{upper limit}) = 1$ after integrating. If they do not hold, you have made an error in the limits or the integral.

11.3 Continuous Distributions: Uniform and Exponential

Uniform Distribution $U(a, b)$

$$f(x) = \frac{1}{b-a}, \quad a \leq x \leq b$$

$\mathrm{E}(X) = \dfrac{a+b}{2}$, $\quad\mathrm{Var}(X) = \dfrac{(b-a)^2}{12}$.

Exponential Distribution $\text{Exp}(\lambda)$

$$f(x) = \lambda e^{-\lambda x}, \quad x \geq 0$$

$F(x) = 1 - e^{-\lambda x}$, $\quad\mathrm{E}(X) = \dfrac{1}{\lambda}$, $\quad\mathrm{Var}(X) = \dfrac{1}{\lambda^2}$.

The exponential distribution models waiting times between Poisson events. It has the memoryless property: $P(X > s + t \mid X > s) = P(X > t)$.

Figure 11.2 — Exponential PDFs $f(x) = \lambda e^{-\lambda x}$ for $\lambda = 0.5$ (blue), $\lambda = 1$ (red), and $\lambda = 2$ (green). Higher $\lambda$ gives steeper decay and smaller mean.

Example 11.3.1 — Exponential Probability

The lifetime (in years) of a light bulb follows $\text{Exp}(0.5)$. Find the probability the bulb lasts more than 3 years.

$P(X > 3) = 1 - F(3) = e^{-0.5 \times 3} = e^{-1.5} \approx 0.2231$.

Example 11.3.2 — Memoryless Property

Using Example 11.3.1, given the bulb has already lasted 2 years, find the probability it lasts a further 3 years.

By the memoryless property: $P(X > 5 \mid X > 2) = P(X > 3) = e^{-1.5} \approx 0.2231$.

The past lifetime is irrelevant — the bulb is "as good as new" at any given moment.
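The memoryless identity can be checked numerically from the survival function $P(X > x) = e^{-\lambda x}$:

```python
import math

def survival(lam, x):
    """P(X > x) for X ~ Exp(lam)."""
    return math.exp(-lam * x)

lam = 0.5
# P(X > 5 | X > 2) = P(X > 5) / P(X > 2)
conditional = survival(lam, 5) / survival(lam, 2)
unconditional = survival(lam, 3)   # both equal e^{-1.5}
```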

Example 11.3.3 — Uniform Distribution

A bus arrives uniformly between 8:00 and 8:20. Find the probability of waiting more than 12 minutes and the expected waiting time.

$X \sim U(0, 20)$ (minutes). $P(X > 12) = \dfrac{20-12}{20} = \dfrac{8}{20} = 0.4$.

$\mathrm{E}(X) = \dfrac{0 + 20}{2} = 10$ minutes.

Example 11.3.4 — Relationship Between Poisson and Exponential

Events occur at a Poisson rate of $\lambda = 3$ per hour. Find the probability the next event occurs within 15 minutes.

Waiting time $T \sim \text{Exp}(3)$ (per hour), so for 15 min = 0.25 hours:
$P(T \leq 0.25) = 1 - e^{-3 \times 0.25} = 1 - e^{-0.75} \approx 0.5276$.

Exam Tip

The connection between the Poisson distribution (events per unit time) and the exponential distribution (waiting time between events) appears frequently. If $X \sim \text{Po}(\lambda)$ counts events per unit time, then the waiting time $T \sim \text{Exp}(\lambda)$.

11.4 Chi-Squared ($\chi^2$) Tests

The $\chi^2$ test assesses whether observed frequencies differ significantly from expected frequencies, under a stated null hypothesis $H_0$.

Chi-Squared Test Statistic

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where $O$ is the observed frequency and $E$ is the expected frequency for each cell. Under $H_0$, this statistic follows a $\chi^2_\nu$ distribution with $\nu$ degrees of freedom.

Requirement: All expected frequencies should be at least 5. If not, merge adjacent cells and reduce $\nu$ accordingly.

Example 11.4.1 — Goodness-of-Fit Test

A die is rolled 120 times. Test at the 5% level whether the die is fair.

Face       1    2    3    4    5    6
Observed  17   22   19   23   18   21
Expected  20   20   20   20   20   20

$\chi^2 = \dfrac{9}{20} + \dfrac{4}{20} + \dfrac{1}{20} + \dfrac{9}{20} + \dfrac{4}{20} + \dfrac{1}{20} = \dfrac{28}{20} = 1.4$.

$\nu = 6 - 1 = 5$. Critical value at 5%: $\chi^2_{5, 0.05} = 11.07$. Since $1.4 < 11.07$, do not reject $H_0$ — no evidence the die is unfair.
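The statistic is a one-line fold over the table; a Python sketch:

```python
observed = [17, 22, 19, 23, 18, 21]   # die faces 1..6
expected = [120 / 6] * 6              # 20 each under H0: fair die

chi2 = sum((o - e)**2 / e for o, e in zip(observed, expected))  # 1.4
```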

Example 11.4.2 — Contingency Table

Test at the 5% level whether gender is independent of choice of A-Level subject (Mathematics or Biology) for a sample of 200 students.

         Maths   Biology   Total
Male        60        40     100
Female      30        70     100
Total       90       110     200

Expected frequencies: $E_{11} = \frac{100 \times 90}{200} = 45$, $E_{12} = 55$, $E_{21} = 45$, $E_{22} = 55$.

$\chi^2 = \dfrac{(60-45)^2}{45} + \dfrac{(40-55)^2}{55} + \dfrac{(30-45)^2}{45} + \dfrac{(70-55)^2}{55} = 5 + \tfrac{225}{55} + 5 + \tfrac{225}{55} \approx 18.18$.

$\nu = (2-1)(2-1) = 1$. Critical value: $\chi^2_{1, 0.05} = 3.841$. Since $18.18 > 3.841$, reject $H_0$ — there is evidence that gender and subject choice are not independent.
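The expected frequencies and statistic follow mechanically from the row and column totals; a sketch in Python:

```python
obs = [[60, 40],   # Male:   Maths, Biology
       [30, 70]]   # Female: Maths, Biology

row_tot = [sum(r) for r in obs]
col_tot = [sum(c) for c in zip(*obs)]
grand = sum(row_tot)

chi2 = 0.0
for i in range(2):
    for j in range(2):
        e = row_tot[i] * col_tot[j] / grand   # row total x column total / grand total
        chi2 += (obs[i][j] - e)**2 / e        # accumulates to ≈ 18.18
```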

Exam Tip

In a contingency table, the expected frequency formula is $E = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$. Always check that all $E \geq 5$; if any are less than 5, combine rows or columns and state you are doing so.

11.5 Correlation and Regression

Pearson's Product-Moment Correlation Coefficient

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

where $S_{xy} = \sum x_i y_i - \dfrac{(\sum x_i)(\sum y_i)}{n}$, $\;S_{xx} = \sum x_i^2 - \dfrac{(\sum x_i)^2}{n}$, $\;S_{yy} = \sum y_i^2 - \dfrac{(\sum y_i)^2}{n}$.

$r$ lies in $[-1, 1]$. $r = +1$ perfect positive linear correlation; $r = -1$ perfect negative; $r = 0$ no linear correlation.

Hypothesis Test for Zero Correlation

To test $H_0: \rho = 0$ against $H_1: \rho \neq 0$ (or one-tailed), compute $r$ and compare with critical values from the PMCC table at the given significance level and sample size $n$ (degrees of freedom $\nu = n - 2$).

Alternatively, use the test statistic: $t = r\sqrt{\dfrac{n-2}{1-r^2}} \sim t_{n-2}$ under $H_0$.

Example 11.5.1 — Computing PMCC

For the data $\sum x = 30$, $\sum y = 40$, $\sum x^2 = 200$, $\sum y^2 = 360$, $\sum xy = 260$, $n = 5$, find $r$.

$S_{xx} = 200 - \tfrac{30^2}{5} = 200 - 180 = 20$.

$S_{yy} = 360 - \tfrac{40^2}{5} = 360 - 320 = 40$.

$S_{xy} = 260 - \tfrac{30 \times 40}{5} = 260 - 240 = 20$.

$r = \dfrac{20}{\sqrt{20 \times 40}} = \dfrac{20}{\sqrt{800}} = \dfrac{20}{20\sqrt{2}} = \dfrac{1}{\sqrt{2}} \approx 0.707$.
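The summary-statistic route translates directly into code; a Python check of Example 11.5.1:

```python
import math

n = 5
sum_x, sum_y = 30, 40
sum_x2, sum_y2, sum_xy = 200, 360, 260

Sxx = sum_x2 - sum_x**2 / n        # 20
Syy = sum_y2 - sum_y**2 / n        # 40
Sxy = sum_xy - sum_x * sum_y / n   # 20

r = Sxy / math.sqrt(Sxx * Syy)     # 1/sqrt(2) ≈ 0.707
```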

Example 11.5.2 — Regression Line and Prediction

Using the data in Example 11.5.1, find the regression line of $y$ on $x$.

$b = \dfrac{S_{xy}}{S_{xx}} = \dfrac{20}{20} = 1$. $\quad a = \bar{y} - b\bar{x} = 8 - 1 \times 6 = 2$.

Regression line: $\hat{y} = 2 + x$. For $x = 7$: $\hat{y} = 9$ (interpolation — reliable).
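Continuing with the same summary statistics, the gradient and intercept fall out in a few lines:

```python
# From Example 11.5.1: Sxy = 20, Sxx = 20, means xbar = 6, ybar = 8
Sxy, Sxx = 20, 20
xbar, ybar = 30 / 5, 40 / 5

b = Sxy / Sxx          # gradient = 1
a = ybar - b * xbar    # intercept = 2

y_hat = a + b * 7      # prediction at x = 7 gives 9
```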

Example 11.5.3 — Non-Linear Regression Using Coding

Data suggest $y \approx ab^x$. Taking logarithms: $\ln y = \ln a + x \ln b$. Let $Y = \ln y$; then $Y = A + Bx$ is a linear model. Compute the regression of $Y$ on $x$, then $a = e^A$, $b = e^B$.
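A sketch of the coding procedure on hypothetical data generated from $y = 2 \times 1.5^x$ (illustrative values, not from the text; an exact fit, so the regression recovers $a$ and $b$ exactly):

```python
import math

# Hypothetical exact data from y = 2 * 1.5^x
xs = [0, 1, 2, 3, 4]
ys = [2 * 1.5**x for x in xs]

# Code the data: Y = ln y, so Y = A + Bx with A = ln a, B = ln b
Ys = [math.log(y) for y in ys]

n = len(xs)
Sxx = sum(x * x for x in xs) - sum(xs)**2 / n
SxY = sum(x * Y for x, Y in zip(xs, Ys)) - sum(xs) * sum(Ys) / n

B = SxY / Sxx
A = sum(Ys) / n - B * sum(xs) / n

a, b = math.exp(A), math.exp(B)   # recovers a = 2, b = 1.5
```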

Example 11.5.4 — Hypothesis Test for Correlation

For $n = 10$ and computed $r = 0.65$, test at the 5% level whether there is positive linear correlation ($H_1: \rho > 0$).

Using the $t$-test approach: $t = r\sqrt{\dfrac{n-2}{1-r^2}} = 0.65\sqrt{\dfrac{8}{1-0.4225}} = 0.65\sqrt{\dfrac{8}{0.5775}} = 0.65 \times 3.722 = 2.419$.

$\nu = n - 2 = 8$. One-tailed $t_{8, 0.05} = 1.860$. Since $2.419 > 1.860$, reject $H_0$ — there is evidence of positive linear correlation at the 5% level.

Alternatively, compare $r = 0.65$ with the critical value from the PMCC table for $n = 10$ at 5% (one-tailed), which is $0.5494$. Since $0.65 > 0.5494$, the same conclusion follows.
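The $t$ statistic from Example 11.5.4 takes two lines of Python:

```python
import math

n, r = 10, 0.65
t = r * math.sqrt((n - 2) / (1 - r**2))   # ≈ 2.419, with df = n - 2 = 8
```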

Exam Tip

The regression line of $y$ on $x$ minimises the sum of squared residuals in the $y$-direction. It is used to predict $y$ given $x$. Do not use the regression line of $y$ on $x$ to predict $x$ from $y$ — use the regression line of $x$ on $y$ for that purpose.

11.6 Further Hypothesis Testing: the $t$-Distribution

When the population variance is unknown and must be estimated from a small sample, the test statistic follows a $t$-distribution rather than the standard normal $Z$. The $t$-distribution has heavier tails than the normal and depends on the degrees of freedom $\nu$.

One-Sample $t$-Test

Testing $H_0: \mu = \mu_0$ with unknown $\sigma^2$, using sample mean $\bar{x}$, sample variance $s^2$, sample size $n$:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \quad \text{where } s^2 = \frac{1}{n-1}\sum(x_i - \bar{x})^2$$

Under $H_0$, $t \sim t_{n-1}$ (Student's $t$-distribution with $\nu = n-1$ degrees of freedom).

Two-Sample and Paired $t$-Tests

Paired $t$-test: Compute differences $d_i = x_i - y_i$, then apply the one-sample $t$-test to $\{d_i\}$ with $H_0: \mu_d = 0$.

Independent two-sample $t$-test (equal variances): Use pooled variance $s_p^2 = \dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ and test statistic $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$ with $\nu = n_1 + n_2 - 2$.

Example 11.6.1 — One-Sample $t$-Test

A sample of 10 readings gives $\bar{x} = 52.3$ and $s = 4.1$. Test $H_0: \mu = 50$ against $H_1: \mu > 50$ at the 5% level.

$t = \dfrac{52.3 - 50}{4.1/\sqrt{10}} = \dfrac{2.3}{1.2966} \approx 1.774$.

$\nu = 9$. One-tailed critical value at 5%: $t_{9, 0.05} = 1.833$. Since $1.774 < 1.833$, do not reject $H_0$ at the 5% level.
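The statistic for Example 11.6.1 computed directly:

```python
import math

xbar, mu0, s, n = 52.3, 50, 4.1, 10
t = (xbar - mu0) / (s / math.sqrt(n))   # ≈ 1.774, with df = n - 1 = 9
```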

Example 11.6.2 — Paired $t$-Test

Six students sat a test before and after a revision course. Differences (after $-$ before) are: $3, 5, -1, 4, 2, 3$. Test at 5% whether the course improved performance.

$\bar{d} = \tfrac{16}{6} = 2.667$. $\sum d_i^2 = 9 + 25 + 1 + 16 + 4 + 9 = 64$.
$s_d^2 = \dfrac{64 - 6 \times 2.667^2}{5} = \dfrac{64 - 42.667}{5} = \dfrac{21.333}{5} = 4.267$, $s_d = 2.066$.
$t = \dfrac{2.667}{2.066/\sqrt{6}} = \dfrac{2.667}{0.843} \approx 3.163$.

$\nu = 5$. $t_{5, 0.05}^{\text{one-tail}} = 2.015$. Since $3.163 > 2.015$, reject $H_0$ — there is evidence the course improved performance.
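A sketch of the paired computation, working from the raw differences:

```python
import math

d = [3, 5, -1, 4, 2, 3]   # after - before
n = len(d)
dbar = sum(d) / n

# Unbiased sample variance of the differences (divide by n - 1)
s2 = (sum(x * x for x in d) - n * dbar**2) / (n - 1)

t = dbar / math.sqrt(s2 / n)   # ≈ 3.162, with df = n - 1 = 5
```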

Example 11.6.3 — $t$-Test vs $Z$-Test

Key comparison: use a $Z$-test when the population variance $\sigma^2$ is known (or the sample is large enough for $s^2$ to be treated as $\sigma^2$); use a $t$-test when $\sigma^2$ is unknown and must be estimated from a small sample. Because the $t$-distribution has heavier tails, its critical values exceed the corresponding $Z$ values, and the gap narrows as $\nu$ increases: as $\nu \to \infty$, $t_\nu \to N(0, 1)$. For example, the two-tailed 5% critical value is $1.960$ for $Z$ but $2.262$ for $t_9$.

Example 11.6.4 — Independent Two-Sample $t$-Test

Two independent groups of students took a test. Group 1 ($n_1 = 8$): $\bar{x}_1 = 71$, $s_1 = 6$. Group 2 ($n_2 = 10$): $\bar{x}_2 = 65$, $s_2 = 8$. Assuming equal population variances, test at 5% whether the group means differ ($H_1: \mu_1 \neq \mu_2$).

Pooled variance: $s_p^2 = \dfrac{7(36) + 9(64)}{16} = \dfrac{252 + 576}{16} = \dfrac{828}{16} = 51.75$, $s_p = 7.194$.

$t = \dfrac{71 - 65}{7.194\sqrt{1/8 + 1/10}} = \dfrac{6}{7.194 \times 0.4743} = \dfrac{6}{3.412} \approx 1.758$.

$\nu = 16$. Two-tailed $t_{16, 0.025} = 2.120$. Since $|t| = 1.758 < 2.120$, do not reject $H_0$ — insufficient evidence of a difference at 5%.
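The pooled computation as a Python sketch, keeping intermediate values to full precision:

```python
import math

n1, xbar1, s1 = 8, 71.0, 6.0
n2, xbar2, s2 = 10, 65.0, 8.0

# Pooled estimate of the common variance
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)   # 51.75

t = (xbar1 - xbar2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))      # ≈ 1.758, df = 16
```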

Summary — Choosing the Right Test

Use a $Z$-test when $\sigma^2$ is known or $n$ is large; a one-sample $t$-test for a mean with unknown variance and small $n$; a paired $t$-test when the two samples are naturally matched (apply the one-sample test to the differences); a pooled two-sample $t$-test for two independent samples with equal variances; a $\chi^2$ goodness-of-fit test to compare observed frequencies with a hypothesised distribution; a $\chi^2$ contingency test for independence of two categorical variables; and the PMCC test (or its $t$-statistic form) for linear correlation.

Exam Tip

Always use the unbiased estimate $s^2 = \dfrac{1}{n-1}\sum(x_i - \bar{x})^2$ (dividing by $n-1$, not $n$) in $t$-tests. Dividing by $n$ gives the biased sample variance; the $t$-test requires the unbiased version.

Practice Problems

Problem 1

$X \sim \text{Po}(2.5)$. Find $P(X = 3)$ and $P(X \geq 2)$.

Show solution

$P(X = 3) = \dfrac{e^{-2.5}(2.5)^3}{6} = \dfrac{15.625e^{-2.5}}{6} \approx 0.2138$.

$P(X \geq 2) = 1 - P(X=0) - P(X=1) = 1 - e^{-2.5}(1 + 2.5) = 1 - 3.5e^{-2.5} \approx 1 - 0.2873 = 0.7127$.

Problem 2

$X \sim \text{Po}(4)$ and $Y \sim \text{Po}(6)$ independently. Find $P(X + Y = 10)$.

Show solution

$X + Y \sim \text{Po}(10)$. $P(X+Y=10) = \dfrac{e^{-10}10^{10}}{10!} = \dfrac{10^{10}e^{-10}}{3628800} \approx 0.1251$.

Problem 3

A continuous random variable has PDF $f(x) = \tfrac{3}{8}x^2$ for $0 \leq x \leq 2$. Find $\mathrm{E}(X)$ and $\mathrm{Var}(X)$.

Show solution

$\mathrm{E}(X) = \int_0^2 \tfrac{3}{8}x^3\,dx = \tfrac{3}{8}\left[\tfrac{x^4}{4}\right]_0^2 = \tfrac{3}{8} \times 4 = \tfrac{3}{2}$.

$\mathrm{E}(X^2) = \int_0^2 \tfrac{3}{8}x^4\,dx = \tfrac{3}{8}\left[\tfrac{x^5}{5}\right]_0^2 = \tfrac{3}{8} \times \tfrac{32}{5} = \tfrac{12}{5}$.

$\mathrm{Var}(X) = \tfrac{12}{5} - \left(\tfrac{3}{2}\right)^2 = \tfrac{12}{5} - \tfrac{9}{4} = \tfrac{48-45}{20} = \tfrac{3}{20}$.

Problem 4

$X \sim \text{Exp}(2)$. Find (a) $P(X > 1)$, (b) $P(1 < X < 2)$, and (c) the median of $X$.

Show solution

(a) $P(X > 1) = e^{-2} \approx 0.1353$.

(b) $P(1 < X < 2) = e^{-2} - e^{-4} \approx 0.1353 - 0.0183 = 0.1170$.

(c) Median $m$: $F(m) = 1 - e^{-2m} = \tfrac{1}{2} \Rightarrow e^{-2m} = \tfrac{1}{2} \Rightarrow m = \tfrac{\ln 2}{2} \approx 0.347$.

Problem 5

A fair six-sided die is rolled 60 times. The frequencies are: 1:8, 2:11, 3:9, 4:14, 5:7, 6:11. Carry out a $\chi^2$ goodness-of-fit test at the 10% significance level.

Show solution

Expected frequency each: $60/6 = 10$. $H_0$: die is fair.

$\chi^2 = \dfrac{4}{10}+\dfrac{1}{10}+\dfrac{1}{10}+\dfrac{16}{10}+\dfrac{9}{10}+\dfrac{1}{10} = \dfrac{32}{10} = 3.2$.

$\nu = 5$. $\chi^2_{5, 0.10} = 9.236$. Since $3.2 < 9.236$, do not reject $H_0$ — no evidence the die is unfair.

Problem 6

From a sample of size $n = 8$: $\bar{x} = 24.5$, $s = 3.2$. Test $H_0: \mu = 22$ against $H_1: \mu \neq 22$ at the 5% significance level.

Show solution

$t = \dfrac{24.5 - 22}{3.2/\sqrt{8}} = \dfrac{2.5}{1.131} \approx 2.210$.

$\nu = 7$. Two-tailed 5% critical value: $t_{7, 0.025} = 2.365$. Since $|t| = 2.210 < 2.365$, do not reject $H_0$ at 5%.

Problem 7

Compute the PMCC for the data: $(1,2), (2,4), (3,5), (4,4), (5,7)$.

Show solution

$n = 5$, $\sum x = 15$, $\sum y = 22$, $\sum x^2 = 55$, $\sum y^2 = 110$, $\sum xy = 76$.

$S_{xx} = 55 - 45 = 10$, $S_{yy} = 110 - 96.8 = 13.2$, $S_{xy} = 76 - 66 = 10$.

$r = \dfrac{10}{\sqrt{10 \times 13.2}} = \dfrac{10}{\sqrt{132}} \approx \dfrac{10}{11.49} \approx 0.870$.

Problem 8

Find the regression line of $y$ on $x$ for the data in Problem 7, and predict $y$ when $x = 6$.

Show solution

$b = S_{xy}/S_{xx} = 10/10 = 1$. $\bar{x} = 3$, $\bar{y} = 4.4$.

$a = 4.4 - 1 \times 3 = 1.4$. Regression line: $\hat{y} = 1.4 + x$.

For $x = 6$: $\hat{y} = 1.4 + 6 = 7.4$. (Extrapolation — treat with caution.)

Problem 9

Pairs of observations give differences $d$: $2, -1, 3, 0, 4, 1, 2, -1$. Use a paired $t$-test to test $H_0: \mu_d = 0$ against $H_1: \mu_d > 0$ at 5%.

Show solution

$n = 8$, $\sum d = 10$, $\bar{d} = 1.25$. $\sum d^2 = 4+1+9+0+16+1+4+1 = 36$.

$s_d^2 = \dfrac{36 - 8(1.25)^2}{7} = \dfrac{36 - 12.5}{7} = \dfrac{23.5}{7} = 3.357$. $s_d = 1.833$.

$t = \dfrac{1.25}{1.833/\sqrt{8}} = \dfrac{1.25}{0.648} \approx 1.929$. $\nu = 7$, $t_{7,0.05}^{\text{one-tail}} = 1.895$.

Since $1.929 > 1.895$, reject $H_0$ — evidence that $\mu_d > 0$.

Problem 10

A $3 \times 2$ contingency table has observed frequencies: $(10, 20), (15, 15), (5, 15)$. The row totals are 30, 30, 20, and the column totals are 30 and 50. Carry out a $\chi^2$ test of independence at 5%.

Show solution

Grand total: 80. Expected frequencies: $E_{ij} = \dfrac{R_i \times C_j}{80}$.

$E_{11} = 30\times30/80 = 11.25$, $E_{12} = 18.75$; $E_{21} = 11.25$, $E_{22} = 18.75$; $E_{31} = 7.5$, $E_{32} = 12.5$.

$\chi^2 = \dfrac{1.5625}{11.25}+\dfrac{1.5625}{18.75}+\dfrac{(3.75)^2}{11.25}+\dfrac{(3.75)^2}{18.75}+\dfrac{(2.5)^2}{7.5}+\dfrac{(2.5)^2}{12.5}$

$= 0.139 + 0.083 + 1.25 + 0.75 + 0.833 + 0.5 = 3.556$.

$\nu = (3-1)(2-1) = 2$. $\chi^2_{2,0.05} = 5.991$. Since $3.556 < 5.991$, do not reject $H_0$ — no evidence of dependence.