Chapter 8: Data, Statistics & Probability
Statistics lets us make sense of the world by organizing, displaying, and analyzing data. Probability gives us a mathematical framework for measuring how likely events are to occur. Together, these ideas appear in science, sports, business, medicine, and everyday decision-making. In this final pre-algebra chapter, you will learn to compute key statistics, read and create data displays, and calculate probabilities for simple and compound events.
8.1 Measures of Central Tendency
A measure of central tendency is a single value that summarizes a data set by identifying its "center."
Mean, Median, and Mode
- Mean (average): Add all values and divide by the count. $$\bar{x} = \frac{\text{sum of all values}}{\text{number of values}}$$
- Median: The middle value when data are arranged in order. If there is an even number of values, the median is the average of the two middle values.
- Mode: The value(s) that appear most often. A data set may have no mode, one mode, or multiple modes.
Example 1 — Mean, Median, Mode, and Range
Quiz scores for 9 students: $\{72, 85, 90, 72, 88, 95, 78, 85, 82\}$
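A worked solution, following the definitions above (range is taken as maximum minus minimum, as defined in Section 8.2):

- Ordered data: $72, 72, 78, 82, 85, 85, 88, 90, 95$
- Mean: $$\bar{x} = \frac{72+72+78+82+85+85+88+90+95}{9} = \frac{747}{9} = 83$$
- Median: the 5th of the 9 ordered values, $85$
- Mode: $72$ and $85$ (each appears twice)
- Range: $95 - 72 = 23$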
Which Measure to Use?
- Mean: Best for symmetric data without outliers.
- Median: Best when data are skewed or have outliers (e.g., income data).
- Mode: Best for categorical data or when identifying the most common value.
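To see why the median is usually preferred when outliers are present, the sketch below compares the mean and median of a small list before and after one extreme value is added. The income figures and variable names are hypothetical, chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical yearly incomes in thousands of dollars (illustrative data only).
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [500]  # add one extreme value

print("without outlier:", mean(incomes), median(incomes))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

With these numbers the mean jumps from 35 to 112.5, while the median only moves from 35 to 36.5, so the median still describes a typical value.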
8.2 Measures of Spread
Measures of spread describe how spread out or tightly clustered a data set is.
Range, IQR, and MAD
- Range: $\text{Range} = \text{Maximum} - \text{Minimum}$
- Interquartile Range (IQR): The range of the middle 50% of data. $$\text{IQR} = Q_3 - Q_1$$ where $Q_1$ = median of the lower half and $Q_3$ = median of the upper half. When the data set has an odd number of values, the median itself is left out of both halves.
- Mean Absolute Deviation (MAD): The average distance each data point is from the mean. $$\text{MAD} = \frac{\sum |x_i - \bar{x}|}{n}$$
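As an illustration of the range and MAD formulas, here is a minimal Python sketch applied to the quiz scores from Example 1. The function name `mean_absolute_deviation` is an illustrative choice, not a standard library call.

```python
def mean_absolute_deviation(values):
    """Average distance of each value from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

quiz_scores = [72, 85, 90, 72, 88, 95, 78, 85, 82]  # data from Example 1

data_range = max(quiz_scores) - min(quiz_scores)
mad = mean_absolute_deviation(quiz_scores)

print(data_range)     # 23
print(round(mad, 2))  # 6.22
```

The mean is 83 and the absolute deviations add up to 56, so the MAD is $\frac{56}{9} \approx 6.22$: on average, a score lies about 6 points from the mean.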
Example 2 — Finding Quartiles and IQR
Data set (already ordered): $\{4, 7, 9, 11, 14, 16, 18, 21, 25\}$
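A worked solution, assuming the median is left out of both halves because the number of values is odd (this convention matches the five-number summary given for this data in Section 8.3):

- Median ($Q_2$): the 5th of the 9 values, $14$
- Lower half $\{4, 7, 9, 11\}$: $Q_1 = \dfrac{7+9}{2} = 8$
- Upper half $\{16, 18, 21, 25\}$: $Q_3 = \dfrac{18+21}{2} = 19.5$
- $\text{IQR} = Q_3 - Q_1 = 19.5 - 8 = 11.5$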
8.3 Data Displays
Frequency Tables
A frequency table tallies how often each value (or range of values) appears.
| Test Score Range | Tally | Frequency | Relative Frequency |
|---|---|---|---|
| 60 – 69 | II | 2 | 10% |
| 70 – 79 | IIII | 4 | 20% |
| 80 – 89 | IIII III | 8 | 40% |
| 90 – 99 | IIII I | 6 | 30% |
| Total | | 20 | 100% |
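Each relative frequency in the table is the frequency divided by the total. A short Python sketch of that calculation, using the frequencies from the table (the dictionary layout is just one convenient way to hold them):

```python
# Frequencies copied from the table above.
frequencies = {"60-69": 2, "70-79": 4, "80-89": 8, "90-99": 6}
total = sum(frequencies.values())

# Relative frequency = frequency / total number of data values.
relative = {interval: count / total for interval, count in frequencies.items()}

for interval, share in relative.items():
    print(f"{interval}: {share:.0%}")
```

The relative frequencies should always add up to 100%, which is a quick way to check a table.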
Bar Graphs vs. Histograms
| Feature | Bar Graph | Histogram |
|---|---|---|
| Data type | Categorical (e.g., favorite colors) | Numerical, continuous (e.g., heights) |
| Bars | Separated by gaps | Bars touch (no gaps) |
| X-axis | Categories | Intervals / ranges of values |
| Use case | Comparing categories | Showing distribution of numerical data |
Box Plots (Box-and-Whisker Plots)
A box plot displays the five-number summary of a data set: minimum, $Q_1$, median ($Q_2$), $Q_3$, and maximum.
For the data in Example 2, the five-number summary is:
| Min | $Q_1$ | Median ($Q_2$) | $Q_3$ | Max |
|---|---|---|---|---|
| 4 | 8 | 14 | 19.5 | 25 |
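The five-number summary can also be computed programmatically. The sketch below assumes the median-of-halves convention used in this chapter (the median is excluded from both halves when the count is odd). Spreadsheets and statistics libraries often use interpolation instead and may report slightly different quartiles. The function name is illustrative.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]              # values below the median position
    upper = data[len(data) - half:]  # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

summary = five_number_summary([4, 7, 9, 11, 14, 16, 18, 21, 25])
print(summary)
```

For the Example 2 data this gives a minimum of 4, $Q_1 = 8$, median 14, $Q_3 = 19.5$, and maximum 25, matching the table.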
Scatter Plots and Correlation
A scatter plot displays two numerical variables as ordered pairs on a coordinate plane. We look for a correlation between them:
- Positive correlation: As $x$ increases, $y$ tends to increase. (Points slope upward ↗)
- Negative correlation: As $x$ increases, $y$ tends to decrease. (Points slope downward ↘)
- No correlation: No clear pattern; points are scattered randomly.
A line of best fit (trend line) is drawn through the middle of the data to model the relationship and make predictions.
Example 3 — Interpreting a Scatter Plot
A scatter plot shows hours of TV watched per day ($x$) vs. test score ($y$). As hours of TV increase, scores decrease. This is a negative correlation. A student who watches 4 hours of TV might be predicted to score around 65 based on the trend line.
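A trend line can be drawn by eye, but one common way to compute it is the least-squares method, sketched below. This goes a step beyond what the chapter requires, and the data points are hypothetical, invented so that the result resembles Example 3. They are not taken from an actual scatter plot.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: hours of TV per day and test scores (not from the text).
hours = [0, 1, 2, 3, 4, 5, 6]
scores = [86, 79, 76, 70, 64, 61, 54]

slope, intercept = least_squares_line(hours, scores)
predicted = slope * 4 + intercept  # predicted score for 4 hours of TV

print(slope < 0)         # True: the trend line slopes downward
print(round(predicted))  # 65
```

A negative slope corresponds to a negative correlation, and substituting $x = 4$ into the line gives the predicted score.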
8.4 Introduction to Probability
Basic Probability Vocabulary
- Experiment: A procedure with a well-defined set of outcomes (e.g., rolling a die).
- Sample Space ($S$): The set of all possible outcomes. Rolling a die: $S = \{1, 2, 3, 4, 5, 6\}$.
- Event: A subset of the sample space. Example: rolling an even number: $E = \{2, 4, 6\}$.
- Probability of Event $E$ (when all outcomes in the sample space are equally likely): $$P(E) = \frac{\text{number of favorable outcomes}}{\text{total number of outcomes}}$$
Probability always satisfies $0 \leq P(E) \leq 1$. An impossible event has probability $0$; a certain event has probability $1$.
Theoretical vs. Experimental Probability
- Theoretical probability is calculated by reasoning about the possible outcomes, without performing an experiment. Example: for a fair coin, $P(\text{heads}) = \tfrac{1}{2}$.
- Experimental probability is based on actual trial results: $$P_{\text{exp}}(E) = \frac{\text{number of times } E \text{ occurred}}{\text{total number of trials}}$$ As the number of trials increases, the experimental probability tends to get closer to the theoretical probability (Law of Large Numbers).
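The relationship between the two kinds of probability can be explored with a simulation. The sketch below flips a simulated fair coin and reports the experimental probability of heads for different numbers of trials. The function name and the fixed seed are illustrative choices, and the exact values printed depend on the random number generator.

```python
import random

def experimental_probability(trials, seed=0):
    """Flip a simulated fair coin `trials` times; return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

for trials in (10, 100, 10_000):
    print(trials, experimental_probability(trials))
```

With only 10 flips the result can be far from $\tfrac{1}{2}$. With 10,000 flips it is typically very close.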
Example 4 — Simple Probability
A bag contains 3 red, 5 blue, and 2 green marbles. One marble is drawn at random. Find:
- $P(\text{red}) = \dfrac{3}{10} = 0.3 = 30\%$
- $P(\text{not green}) = \dfrac{8}{10} = \dfrac{4}{5} = 80\%$
- $P(\text{yellow}) = \dfrac{0}{10} = 0$ (impossible)
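The same answers can be checked with exact fractions in Python. This is a minimal sketch. The `bag` dictionary and the `probability` helper are illustrative names, and the calculation assumes every marble is equally likely to be drawn.

```python
from fractions import Fraction

bag = {"red": 3, "blue": 5, "green": 2}  # marbles from Example 4
total = sum(bag.values())

def probability(colors):
    """P(drawing one of the given colors), assuming equally likely marbles."""
    favorable = sum(bag.get(color, 0) for color in colors)
    return Fraction(favorable, total)

p_red = probability(["red"])
p_not_green = probability(["red", "blue"])
p_yellow = probability(["yellow"])  # no yellow marbles in the bag

print(p_red, p_not_green, p_yellow)  # 3/10 4/5 0
```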
8.5 Complementary and Compound Events
Complement Rule
The complement of event $E$ (written $E'$ or $\bar{E}$) consists of all outcomes not in $E$.
$$P(E') = 1 - P(E)$$
Independent vs. Dependent Events
- Independent events: The outcome of one does not affect the other. Example: flipping a coin twice.
- Dependent events: The outcome of one does affect the other. Example: drawing two cards without replacement.
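One way to see the difference is to list every equally likely outcome of two draws, once with replacement and once without. The bag below (two red chips and one blue chip) is a hypothetical example, not one from the text, and the helper name is illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

# Hypothetical bag with two red chips and one blue chip (not from the text).
chips = ["R1", "R2", "B"]

def p_both_red(draws):
    """Fraction of equally likely two-draw outcomes in which both chips are red."""
    draws = list(draws)
    both_red = [d for d in draws if d[0].startswith("R") and d[1].startswith("R")]
    return Fraction(len(both_red), len(draws))

with_replacement = p_both_red(product(chips, repeat=2))   # 9 ordered outcomes
without_replacement = p_both_red(permutations(chips, 2))  # 6 ordered outcomes

print(with_replacement, without_replacement)  # 4/9 1/3
```

With replacement, the second draw is unaffected by the first, so the draws are independent. Without replacement, the first draw changes what is left in the bag, so the draws are dependent and the probability is different.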
Multiplication Rule for Independent Events:
$$P(A \text{ and } B) = P(A) \times P(B)$$
Addition Rule for Mutually Exclusive Events (events that cannot both occur):
$$P(A \text{ or } B) = P(A) + P(B)$$
Example 5 — Compound Events (Independent)
A fair coin is flipped and a fair die is rolled. Find the probability of getting heads and rolling a 4.
$$P(\text{heads}) = \frac{1}{2}, \quad P(\text{rolling a 4}) = \frac{1}{6}$$
$$P(\text{heads and 4}) = \frac{1}{2} \times \frac{1}{6} = \frac{1}{12}$$
8.6 Tree Diagrams and the Counting Principle
Fundamental Counting Principle
If one event can occur in $m$ ways and, for each of those ways, a second event can occur in $n$ ways, then the two events together can occur in $m \times n$ ways. This extends to any number of events:
$$\text{Total outcomes} = n_1 \times n_2 \times n_3 \times \cdots$$
Example 6 — Tree Diagram: Coin Flipped Twice
List all possible outcomes of flipping a fair coin twice and find the probability of getting exactly one head.
Sample space: $S = \{HH, HT, TH, TT\}$ — 4 equally likely outcomes.
Exactly one head: $\{HT, TH\}$ → 2 outcomes.
$$P(\text{exactly one head}) = \frac{2}{4} = \frac{1}{2}$$
Example 7 — Counting Principle
A lunch menu offers 3 sandwiches, 4 sides, and 2 drinks. How many different lunch combinations are possible?
$$3 \times 4 \times 2 = 24 \text{ combinations}$$
Example 8 — Probability with a Frequency Table
A survey of 50 students records their favorite subject:
| Subject | Math | Science | English | History |
|---|---|---|---|---|
| Students | 18 | 12 | 13 | 7 |
A student is chosen at random. Find $P(\text{Math or Science})$.
$$P(\text{Math}) = \frac{18}{50}, \quad P(\text{Science}) = \frac{12}{50}$$
$$P(\text{Math or Science}) = \frac{18}{50} + \frac{12}{50} = \frac{30}{50} = \frac{3}{5} = 60\%$$
Practice Problems
Practice — Chapter 8
- Find the mean, median, mode, and range of: $\{14, 22, 9, 14, 31, 18, 22, 14\}$.
- A data set has values $\{3, 7, 8, 12, 15, 17, 21, 24\}$. Find $Q_1$, $Q_3$, and the IQR.
- Describe the difference between a bar graph and a histogram.
- A scatter plot of study hours vs. exam score shows a positive correlation. What does this mean in context?
- A card is drawn at random from a standard 52-card deck. Find $P(\text{drawing an ace})$.
- A bag has 4 red and 6 blue chips. Find $P(\text{not red})$.
- Two fair dice are rolled. How many outcomes are in the sample space? Find $P(\text{sum} = 7)$.
- A restaurant offers 4 entrees, 3 salads, and 5 desserts. How many three-course meals are possible?
- A coin is flipped 3 times. Draw a tree diagram and find $P(\text{exactly 2 tails})$.
- In 200 spins of a spinner, red appeared 68 times. What is the experimental probability of red? If the theoretical probability is $\frac{1}{3}$, compare the two values.
Answers
- Ordered: $9,14,14,14,18,22,22,31$. Mean $= \frac{144}{8} = 18$; Median $= \frac{14+18}{2} = 16$; Mode $= 14$; Range $= 22$.
- Lower half: $\{3,7,8,12\}$ → $Q_1 = \frac{7+8}{2} = 7.5$; Upper half: $\{15,17,21,24\}$ → $Q_3 = \frac{17+21}{2} = 19$; IQR $= 11.5$.
- Bar graphs show categorical data with gaps between bars; histograms show continuous numerical data with no gaps.
- Students who study more hours tend to score higher on the exam.
- $P(\text{ace}) = \frac{4}{52} = \frac{1}{13} \approx 7.7\%$.
- $P(\text{not red}) = \frac{6}{10} = \frac{3}{5} = 60\%$.
- $6 \times 6 = 36$ outcomes. Pairs summing to 7: $(1,6),(2,5),(3,4),(4,3),(5,2),(6,1)$ → $P = \frac{6}{36} = \frac{1}{6}$.
- $4 \times 3 \times 5 = 60$ meals.
- 8 outcomes: HHH,HHT,HTH,HTT,THH,THT,TTH,TTT. Exactly 2 tails: HTT,THT,TTH → $P = \frac{3}{8}$.
- $P_{\text{exp}}(\text{red}) = \frac{68}{200} = 0.34 = 34\%$. Theoretical: $\frac{1}{3} \approx 33.3\%$. The values are very close, consistent with the Law of Large Numbers.
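The sample spaces in the dice and coin answers above can be double-checked by brute-force enumeration, which is what a tree diagram does by hand. A minimal sketch using Python's `itertools.product` (variable names are illustrative):

```python
from fractions import Fraction
from itertools import product

# Two dice: every ordered pair of faces is one outcome.
dice = list(product(range(1, 7), repeat=2))
p_sum_7 = Fraction(sum(1 for a, b in dice if a + b == 7), len(dice))

# Three coin flips: every string of H/T of length 3 is one outcome.
flips = ["".join(f) for f in product("HT", repeat=3)]
p_two_tails = Fraction(sum(1 for f in flips if f.count("T") == 2), len(flips))

print(len(dice), p_sum_7)       # 36 1/6
print(len(flips), p_two_tails)  # 8 3/8
```

The number of outcomes in each list agrees with the Counting Principle: $6 \times 6 = 36$ and $2 \times 2 \times 2 = 8$.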
Chapter Summary
- Mean = sum ÷ count; Median = middle value (ordered); Mode = most frequent; Range = max − min.
- IQR $= Q_3 - Q_1$ measures the spread of the middle 50%; MAD measures average distance from the mean.
- Data displays: frequency tables, bar graphs (categorical), histograms (numerical), box plots (5-number summary), scatter plots (two variables).
- Correlation describes a relationship between two variables: positive, negative, or none.
- Probability: $P(E) = \dfrac{\text{favorable}}{\text{total}}$; always between 0 and 1.
- Complement: $P(E') = 1 - P(E)$.
- Independent compound events: $P(A \text{ and } B) = P(A) \times P(B)$.
- Tree diagrams list all outcomes systematically; the Counting Principle multiplies choices for total outcomes.
- Experimental probability tends to get closer to theoretical probability as the number of trials increases.