Chapter 1: Exploring Data — Distributions

AP Statistics • Chapter 1 • Updated March 2026 • ~45 min read

Learning Objectives

1.1 Types of Variables

Statistics begins with data — information collected about individuals. Before analyzing data, we must identify the type of variable being measured, since different variable types require different methods.

Definition: Types of Variables

A categorical variable (also called qualitative) places each individual into one of several groups or categories. Examples: eye color, gender, country of birth, letter grade (A/B/C/D/F).

A quantitative variable takes numerical values for which arithmetic makes sense. Examples: height in cm, SAT score, temperature, number of siblings.

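Quantitative variables can be further divided into discrete variables, which take countable values (such as number of siblings), and continuous variables, which can take any value in an interval (such as height in cm).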

Example 1.1 — Identifying Variable Types

A survey of 30 AP Statistics students records the following information. Classify each variable.

Variable | Type | Reason
Favorite subject | Categorical | Places student in a category (Math, English, …)
Hours studied per week | Quantitative (continuous) | Numerical, arithmetic makes sense
Number of AP exams taken | Quantitative (discrete) | Countable whole numbers
Grade in AP Stats (A/B/C) | Categorical | Letter grades are categories, not numbers
TRY IT

A researcher records: (a) blood type of each patient, (b) systolic blood pressure, (c) number of hospitalizations. Classify each variable.

Answer:
(a) Blood type: Categorical — types A, B, AB, O are categories
(b) Systolic blood pressure: Quantitative (continuous) — a measurement that can be any positive number
(c) Number of hospitalizations: Quantitative (discrete) — a countable whole number (0, 1, 2, …)

1.2 Displaying Distributions with Graphs

To understand a dataset, we start by making a graph. The graph reveals the distribution of a variable — what values occur and how often.

Dotplots

A dotplot places each data value as a dot above a number line. Dotplots work well for small datasets and show individual values clearly.

Example 1.2 — Reading a Dotplot

The number of text messages sent by 12 students in one hour: 3, 5, 5, 7, 8, 8, 8, 10, 12, 12, 15, 20

Each value gets one dot. Stacked dots indicate repeated values. We can see immediately that most students sent 5–12 messages, and that the largest value, 20, sits apart from the rest as an unusually high value.
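A dotplot can also be sketched in plain text. The short Python example below is one possible illustration, not part of the original example: it draws the text-message data sideways, with one row per value and one dot per observation. The function name `text_dotplot` is our own.

```python
from collections import Counter

def text_dotplot(values):
    """Return one text line per integer from min to max, with one dot per observation."""
    counts = Counter(values)  # a missing value has count 0, so its row has no dots
    lines = []
    for value in range(min(values), max(values) + 1):
        lines.append(f"{value:>3} | " + "." * counts[value])
    return lines

# Data from Example 1.2
messages = [3, 5, 5, 7, 8, 8, 8, 10, 12, 12, 15, 20]

for line in text_dotplot(messages):
    print(line)
```

The row for 8 shows three stacked dots, and the empty rows between 15 and 20 make the gap before the largest value easy to see.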

Histograms

A histogram divides the range of data into equal-width intervals (called bins) and displays the count or percent of observations in each bin. Histograms work well for large datasets.

How to Construct a Histogram

  1. Choose a convenient number of bins (typically 5–10)
  2. Make the bins equal in width, covering the full range
  3. Count the observations in each bin
  4. Draw bars of height = frequency (or relative frequency); bars touch each other
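The construction steps above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the function name `histogram_counts`, the choice of bins that include their left endpoint but not their right, and the starting point 0 are not prescribed by the text. The data are the text-message counts from Example 1.2.

```python
def histogram_counts(values, low, width, n_bins):
    """Count observations in n_bins equal-width bins starting at `low`.

    Bin i covers the interval [low + i*width, low + (i+1)*width).
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - low) // width)
        if not 0 <= index < n_bins:
            raise ValueError(f"{v} falls outside the bins")
        counts[index] += 1
    return counts

# Data from Example 1.2, grouped into 5 bins of width 5
messages = [3, 5, 5, 7, 8, 8, 8, 10, 12, 12, 15, 20]
counts = histogram_counts(messages, low=0, width=5, n_bins=5)

for i, count in enumerate(counts):
    left = 5 * i
    print(f"[{left:>2}, {left + 5:>2}): " + "#" * count)
```

Changing `width` (and `n_bins` to match) shows how the apparent shape of a histogram depends on the bins chosen.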


Figure 1.1 — Histogram of Test Scores (n = 30)

Describing Shape

When you look at a histogram (or any distribution graph), describe its shape:

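Distribution Shapes

  1. Symmetric: roughly mirror-image on both sides of center
  2. Skewed right: the tail extends to the right
  3. Skewed left: the tail extends to the left
  4. Unimodal / Bimodal: one or two distinct peaks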

Three distribution shapes: symmetric (blue), right-skewed (red), left-skewed (green)

Figure 1.2 — Symmetric vs. Skewed Distributions

AP Exam Tip: When describing a distribution, always address Shape, Center, Spread, and any Outliers (SCSO or "SOCS"). Free-response graders look for all four components.

1.3 Boxplots and the Five-Number Summary

A boxplot (box-and-whisker plot) summarizes a distribution using five key values called the five-number summary: Minimum, Q1, Median (Q2), Q3, Maximum.

Five-Number Summary

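Given a dataset sorted in order:

  1. Minimum: the smallest value
  2. Q1 (first quartile): the median of the lower half of the data
  3. Median (Q2): the middle value, or the mean of the two middle values when n is even
  4. Q3 (third quartile): the median of the upper half of the data
  5. Maximum: the largest value

When n is odd, the median itself is left out of both halves.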

The Interquartile Range (IQR) $= Q_3 - Q_1$ measures the spread of the middle 50% of data.

Example 1.3 — Computing the Five-Number Summary

AP exam scores for 15 students (sorted):
1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5

Step 1 — Median: The 8th value = 4

Step 2 — Q1: Lower half = {1, 2, 2, 3, 3, 3, 4}; median = 3

Step 3 — Q3: Upper half = {4, 4, 4, 5, 5, 5, 5}; median = 5

Five-number summary: Min = 1, Q1 = 3, Median = 4, Q3 = 5, Max = 5

IQR = Q3 − Q1 = 5 − 3 = 2
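The steps in this example can be checked with a short Python sketch. It follows the median-of-halves method used above, leaving the median out of both halves when n is odd. The function name `five_number_summary` is our own, and statistical software often uses slightly different quartile conventions, so other tools may report different Q1 and Q3 values for the same data.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])

# Data from Example 1.3
scores = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5]
minimum, q1, med, q3, maximum = five_number_summary(scores)

print("Five-number summary:", minimum, q1, med, q3, maximum)
print("IQR:", q3 - q1)
```

For the 15 scores this reproduces the summary 1, 3, 4, 5, 5 and an IQR of 2.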

Identifying Outliers

The 1.5 × IQR Rule for Outliers

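An observation is a suspected outlier if it falls:

  1. Below the lower fence: $Q_1 - 1.5 \times \text{IQR}$
  2. Above the upper fence: $Q_3 + 1.5 \times \text{IQR}$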

On a modified boxplot, outliers are plotted as individual points; whiskers extend only to the last non-outlier value.
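As a rough sketch, the 1.5 × IQR rule translates directly into code. The helper names `outlier_fences` and `suspected_outliers` are our own, and the comparison is strict, so a value exactly on a fence is not flagged. The extra score of 9 appended below is hypothetical and is added only to show a flagged value.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def suspected_outliers(values, q1, q3):
    """Return the values strictly below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

# Quartiles from Example 1.3: Q1 = 3, Q3 = 5
scores = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5]
print(outlier_fences(3, 5))                  # (0.0, 8.0)
print(suspected_outliers(scores, 3, 5))      # []

# Hypothetical extra value, used only to demonstrate a flagged observation
print(suspected_outliers(scores + [9], 3, 5))  # [9]
```

On a modified boxplot, the values this function returns are the ones drawn as separate points.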


Figure 1.3 — Modified Boxplot with Outlier Detection

TRY IT

A dataset has Q1 = 12, Q3 = 20. Calculate the IQR and the outlier fences.

Answer:
IQR = Q3 − Q1 = 20 − 12 = 8
Lower fence: Q1 − 1.5(8) = 12 − 12 = 0
Upper fence: Q3 + 1.5(8) = 20 + 12 = 32
Any value below 0 or above 32 is a suspected outlier.

1.4 Comparing Distributions

A common AP Statistics task is to compare two or more distributions. Use side-by-side boxplots or back-to-back stemplots. Always compare shape, center, spread, and outliers in context.

Example 1.4 — Comparing Two Distributions

Two classes take the same quiz. Class A: min=52, Q1=68, median=74, Q3=82, max=96. Class B: min=60, Q1=72, median=80, Q3=85, max=92.

Center: Class B has a higher median (80 vs 74), suggesting that a typical student in Class B scored higher than a typical student in Class A.

Spread: Class A has a larger IQR (82−68=14) vs Class B (85−72=13), so Class A is slightly more variable.

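Shape: Both distributions appear roughly symmetric based on the summary values, though Class B's longer lower whisker (72 − 60 = 12 points vs. 92 − 85 = 7 points) hints at a slight left skew.

Outliers: Neither class has outliers by the 1.5 × IQR rule. Class A's fences are 47 and 103 and Class B's are 52.5 and 104.5, so every minimum and maximum lies inside them.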
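The arithmetic behind this comparison can be laid out in a short Python sketch. The dictionary layout and the function name `describe` are our own choices. The code only computes the numbers; the comparison in context still has to be written in words.

```python
# Five-number summaries from Example 1.4
class_a = {"min": 52, "q1": 68, "median": 74, "q3": 82, "max": 96}
class_b = {"min": 60, "q1": 72, "median": 80, "q3": 85, "max": 92}

def describe(summary):
    """Compute spread measures and 1.5 x IQR fences from a five-number summary."""
    iqr = summary["q3"] - summary["q1"]
    lower = summary["q1"] - 1.5 * iqr
    upper = summary["q3"] + 1.5 * iqr
    return {
        "iqr": iqr,
        "range": summary["max"] - summary["min"],
        "lower_fence": lower,
        "upper_fence": upper,
        # True when neither the minimum nor the maximum is a suspected outlier
        "extremes_inside_fences": lower <= summary["min"] and summary["max"] <= upper,
    }

for name, summary in [("Class A", class_a), ("Class B", class_b)]:
    print(name, "median:", summary["median"], describe(summary))
```

Because the minimum and maximum of each class lie inside the fences, no observation in either class can be a suspected outlier.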

AP Exam Tip: When comparing distributions, always write comparisons in context and use comparative language ("Class B's median is higher than Class A's median"). Simply listing each distribution's statistics without comparing earns partial credit only.

Practice Problems

Problem 1

A sample of 10 students recorded how many hours they sleep per night: 6, 7, 7, 8, 8, 8, 9, 9, 10, 12
(a) Find the five-number summary.
(b) Calculate the IQR.
(c) Identify any outliers using the 1.5 × IQR rule.

Solution:
(a) Min = 6, Q1 = 7 (median of {6,7,7,8,8}), Median = 8, Q3 = 9 (median of {8,9,9,10,12}), Max = 12
Five-number summary: 6 | 7 | 8 | 9 | 12

(b) IQR = Q3 − Q1 = 9 − 7 = 2

(c) Lower fence = 7 − 1.5(2) = 4; Upper fence = 9 + 1.5(2) = 12
The value 12 equals the upper fence but is not strictly beyond it, so no outliers by the strict rule. (Note: some texts use "≥ fence" rather than "> fence" — clarify which your teacher uses.)
Problem 2

A histogram shows that the distribution of household incomes in a city is strongly skewed right.
(a) What does the skewed-right shape tell us about most households vs. a few households?
(b) Would you expect the mean income to be greater than or less than the median income? Explain.

Solution:
(a) Skewed right means most households earn moderate incomes (clustered at the left), while a few households earn very high incomes that create a long right tail.

(b) The mean will be greater than the median. The few extremely high earners pull the mean toward the right tail, but the median (middle value) is not affected by extreme values. This is a classic pattern in income data.
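
The pull of a long right tail on the mean can be seen in a small numerical sketch. The incomes below are hypothetical values in thousands of dollars, invented only for illustration. They are not data from the problem.

```python
from statistics import mean, median

# Hypothetical household incomes (thousands of dollars): nine moderate, one very high
incomes = [32, 38, 41, 45, 48, 52, 55, 60, 75, 400]

print("With the high earner:   ", "mean =", mean(incomes), " median =", median(incomes))

# Drop the single extreme value and recompute
typical = incomes[:-1]
print("Without the high earner:", "mean =", round(mean(typical), 1), " median =", median(typical))
```

Removing the one extreme income moves the mean from 84.6 to about 49.6, while the median only shifts from 50 to 48. That difference in sensitivity is what "resistant" means.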
Problem 3

Classify each variable as categorical or quantitative:
(a) ZIP code   (b) Annual rainfall in mm   (c) Shirt size (S/M/L/XL)   (d) Number of siblings

Solution:
(a) ZIP code: Categorical — ZIP codes are labels (arithmetic like "average ZIP" is meaningless)
(b) Annual rainfall: Quantitative (continuous)
(c) Shirt size: Categorical — ordered categories, but not truly numerical
(d) Number of siblings: Quantitative (discrete)
Problem 4 — AP Free Response Style

Two competing tutoring programs (Program A and Program B) report the following SAT Math score gains for a sample of students:

Program A: Min=20, Q1=40, Median=60, Q3=80, Max=150
Program B: Min=30, Q1=50, Median=65, Q3=75, Max=100

Compare the distributions of score gains for the two programs. Write a complete response using the SOCS framework.

Solution:
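Shape: Program A appears right-skewed, since its maximum (150) is far above Q3 (80) while its minimum (20) is only 20 points below Q1 (40), suggesting a long right tail. Program B appears roughly symmetric, with whiskers of similar length (20 points below Q1, 25 points above Q3).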

Center: Program B has a slightly higher median score gain (65 points) compared to Program A (60 points), suggesting Program B typically produces marginally larger gains.

Spread: Program A has greater variability: IQR = 80 − 40 = 40, compared to Program B's IQR = 75 − 50 = 25. Program A's range is also larger (130 vs. 70). Program A's score gains are less consistent than Program B's.

Outliers: Program A's maximum of 150 is a likely outlier. Check: upper fence = 80 + 1.5(40) = 140; since 150 > 140, the value of 150 is a suspected outlier. No outliers are apparent in Program B.
Problem 5 — Multiple Choice Style

A distribution has Q1 = 45 and Q3 = 65. Which of the following values would be classified as an outlier?
(A) 20   (B) 35   (C) 70   (D) 80   (E) None of these

Solution:
IQR = 65 − 45 = 20
Lower fence = 45 − 1.5(20) = 45 − 30 = 15
Upper fence = 65 + 1.5(20) = 65 + 30 = 95

Values below 15 or above 95 are outliers. Checking options:
(A) 20: between 15 and 95 → not an outlier
(B) 35: between 15 and 95 → not an outlier
(C) 70: between 15 and 95 → not an outlier
(D) 80: between 15 and 95 → not an outlier

None of the listed values is an outlier. Answer: (E) None of these. (This tests whether students carefully apply the fence rule rather than guessing "the biggest number".)
Problem 6

A dataset of 20 values is described: the mean is 55 and the median is 42. What does this tell you about the shape of the distribution? Explain your reasoning.

Solution:
Since the mean (55) is greater than the median (42), the distribution is likely skewed right (positively skewed). A few large values are pulling the mean above the median. The median is resistant to extreme values, so it remains lower when there is a long right tail. The difference of 13 between mean and median suggests a noticeable skew.
Problem 7

Identify an appropriate graph for each situation:
(a) Display the distribution of birth months (Jan–Dec) for 50 students
(b) Compare the heights of male and female students in a class of 60
(c) Show the distribution of 200 SAT scores

Solution:
(a) Bar chart — birth month is categorical; a pie chart also works
(b) Side-by-side boxplots — best for comparing two groups on a quantitative variable; back-to-back stemplot also works for smaller datasets
(c) Histogram — 200 observations of a quantitative variable; individual values would be too crowded for a dotplot or stemplot
Problem 8 — Challenge

A dataset has the property that the mean, median, and mode are all equal.
(a) What shape does the distribution likely have?
(b) Give a specific example of such a dataset with 5 values.

Solution:
(a) When mean = median = mode, the distribution is likely symmetric and unimodal (bell-shaped). In a perfectly symmetric distribution, all three measures of center coincide at the axis of symmetry.

(b) Example: 2, 4, 4, 4, 6
Mean = (2+4+4+4+6)/5 = 20/5 = 4 ✓
Median = middle value = 4 ✓
Mode = most frequent = 4 ✓
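
The three checks in part (b) can be confirmed with Python's built-in `statistics` module. The second dataset below is a hypothetical example of our own, included to show why part (a) says "likely": its mean, median, and mode all equal 4 even though it is not symmetric.

```python
from statistics import mean, median, mode

def centers(data):
    """Return (mean, median, mode) for a dataset with a single mode."""
    return mean(data), median(data), mode(data)

# Dataset from part (b): symmetric around 4
symmetric = [2, 4, 4, 4, 6]
print(centers(symmetric))

# Hypothetical dataset: same three centers, but 0 and 9 are not mirror images around 4
lopsided = [0, 3, 4, 4, 4, 4, 9]
print(centers(lopsided))
```

Equal measures of center are consistent with a symmetric shape but do not guarantee one, so a graph is still the best way to judge shape.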

📋 Chapter Summary

Types of Data

Categorical Variable

Records which group or category an individual belongs to. Examples: gender, color, region. Summarized with frequency tables and bar charts.

Quantitative Variable

Records numerical values where arithmetic makes sense. Examples: height, temperature, income. Summarized with histograms, dotplots, boxplots.

Distribution

Describes the pattern of values: shape (symmetric, skewed, bimodal), center (mean/median), spread (range/IQR/SD), and outliers.

Comparing Distributions

Use parallel boxplots or back-to-back stemplots. Compare shape, center, spread, and outliers in context. Always use comparative language.

Graph Types

Histogram

Groups quantitative data into intervals (bins). Shows shape clearly. Use for large datasets.

Boxplot

Shows the five-number summary: min, Q1, median, Q3, max. Outliers plotted individually beyond 1.5×IQR from Q1/Q3.

Dotplot / Stemplot

Shows every data value. Useful for small datasets to see exact values and identify gaps or clusters.

Bar Chart

For categorical data. Bars represent frequency or relative frequency for each category. Bars should NOT touch.

Shape Descriptions

  1. Symmetric — roughly mirror-image on both sides of center
  2. Skewed right — tail extends to the right; typically mean > median
  3. Skewed left — tail extends to the left; typically mean < median
  4. Unimodal / Bimodal — one or two distinct peaks

📘 Key Terms

Individual: A person or object described by data.
Variable: A characteristic that takes different values for different individuals.
Categorical Variable: Places individuals into groups or categories; values are labels, not numbers.
Quantitative Variable: Takes numerical values with a meaningful scale; arithmetic makes sense.
Distribution: Pattern of values in data, described by shape, center, spread, and any outliers.
Skewness: Asymmetry in a distribution. Right-skewed: long tail right; left-skewed: long tail left.