Chapter 6: Designing Studies

AP Statistics · Data Collection & Study Design · 3 interactive graphs · 8 practice problems

Learning Objectives

Distinguish between population, sample, and census; identify the sampling frame
Describe and compare probability sampling methods: SRS, stratified, cluster, and systematic
Identify and explain sources of bias in surveys: undercoverage, non-response, and response bias
Distinguish between observational studies and experiments
Apply the four principles of experimental design: control, randomization, replication, and blocking
Describe completely randomized designs and randomized block designs including matched pairs

6.1 Population, Sample, and Sampling Methods

When we want to learn about a group, we rarely have the resources to study every individual. Instead, we select a sample from the larger population and use what we learn from the sample to draw conclusions about the population.

Definition: Key Terms

Population: The entire group of individuals we want information about.
Sample: A subset of individuals selected from the population to represent it.
Census: An attempt to collect data from every individual in the population.
Sampling frame: The list of individuals from which the sample is actually drawn. A poor sampling frame causes undercoverage bias.
Simple Random Sample (SRS): A sample selected so that every group of $n$ individuals from the population has an equal chance of being chosen. Each individual also has an equal chance of selection.

Probability Sampling Methods

A probability sample uses a chance mechanism to select individuals, so every member of the population has a known probability of being chosen. The most important probability samples are:

Simple Random Sample (SRS): Assign each individual a number; use a random number generator or table to select $n$ individuals. Every set of $n$ individuals is equally likely to be chosen.
Stratified Random Sample: Divide the population into non-overlapping groups called strata (based on a shared characteristic such as grade level or gender), then take an SRS from each stratum. Ensures representation from each subgroup.
Cluster Sample: Divide the population into groups called clusters (often geographic or naturally occurring), randomly select some clusters, then survey all individuals in the selected clusters. More practical when the population is spread over a large area.
Systematic Sample: Randomly choose a starting point, then select every $k$th individual from a list. For example, if you want a sample of 50 from 500, you select every 10th name after a random start.
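
The four methods above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure. The population of 500 numbered individuals, the ten equal-size groups used as both strata and clusters, and all function names are assumptions made for the example.

```python
import random

rng = random.Random(6)             # fixed seed so the sketch is repeatable
population = list(range(1, 501))   # hypothetical individuals numbered 1..500

# Hypothetical grouping: 10 groups of 50 consecutive individuals,
# reused below as strata and as clusters.
groups = [population[i:i + 50] for i in range(0, 500, 50)]

def srs(pop, n):
    """Simple random sample: every set of n individuals is equally likely."""
    return rng.sample(pop, n)

def stratified(strata, n_per_stratum):
    """Take a separate SRS of n_per_stratum from every stratum."""
    return [x for stratum in strata for x in rng.sample(stratum, n_per_stratum)]

def cluster(clusters, n_clusters):
    """Randomly pick whole clusters and keep everyone in them."""
    chosen = rng.sample(clusters, n_clusters)
    return [x for c in chosen for x in c]

def systematic(pop, k):
    """Random start among the first k individuals, then every k-th one."""
    start = rng.randrange(k)
    return pop[start::k]

srs_sample = srs(population, 50)
strat_sample = stratified(groups, 5)      # 5 from each of the 10 strata
cluster_sample = cluster(groups, 1)       # everyone in 1 randomly chosen cluster
sys_sample = systematic(population, 10)   # every 10th individual
```

Each call returns 50 individuals, but the set of samples that could have been drawn differs from method to method, which is what distinguishes the four designs.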

Non-Probability Sampling (Avoid These)

Voluntary response samples allow individuals to choose whether to participate (e.g., online polls, call-in surveys). These are biased because people with strong opinions — often negative — are more likely to respond, so the sample does not represent the population.

Convenience samples select individuals who are easy to reach (e.g., surveying people in the hallway). These are biased because accessible individuals may differ systematically from the broader population.

Example 6.1 — Comparing SRS and Voluntary Response

A school wants to survey 50 of its 500 students about cafeteria food quality.

SRS approach: Assign each student a number from 001 to 500. Use a random number table or calculator to select 50 distinct numbers. Every student has an equal $\frac{50}{500} = 10\%$ chance of selection, and every group of 50 has an equal chance of being chosen.

Voluntary response approach: Post a sign-up sheet in the cafeteria. Students who feel strongly (likely those who dislike the food) are more likely to sign up. The sample will overrepresent dissatisfied students and underrepresent satisfied or neutral students — producing a biased, unrepresentative result.

Conclusion: Because the SRS relies on chance rather than self-selection, it is an unbiased sampling method; the voluntary response sample is likely biased toward negative opinions.

TRY IT

A city wants to know residents' opinion on building a new park. The city has five distinct neighborhoods of roughly equal size. Describe how to take a stratified random sample by neighborhood.

Show Answer

Step 1: Define the strata — the five neighborhoods are the five strata.
Step 2: Determine sample size per stratum. If you want 100 total respondents, plan to select 20 from each neighborhood.
Step 3: Within each neighborhood, obtain a list of residents (the sampling frame for that stratum) and use an SRS to select 20 residents.
Advantage: This guarantees proportional representation from all five neighborhoods, which a single SRS from the whole city might miss by chance.
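
The steps in this answer can be sketched in Python as follows. The sampling frame here is invented for illustration: the neighborhood labels `N1` to `N5`, the 1,000 residents per neighborhood, and the resident IDs are all assumptions.

```python
import random
from collections import Counter

rng = random.Random(1)

# Hypothetical sampling frame: 5 neighborhoods with 1,000 residents each.
frame = {f"N{h}": [f"N{h}-{i:04d}" for i in range(1000)] for h in range(1, 6)}

# Stratified random sample: a separate SRS of 20 within each neighborhood.
stratified_sample = {name: rng.sample(residents, 20)
                     for name, residents in frame.items()}

# For comparison, a single SRS of 100 from the whole city.
everyone = [r for residents in frame.values() for r in residents]
citywide_srs = rng.sample(everyone, 100)
srs_counts = Counter(r.split("-")[0] for r in citywide_srs)

print({name: len(s) for name, s in stratified_sample.items()})  # 20 each, by construction
print(dict(srs_counts))  # counts per neighborhood vary by chance
```

The stratified counts are fixed at 20 per neighborhood, while the citywide SRS counts depend on the luck of the draw.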

Population (gray) vs. sample (green) — the highlighted points represent the 50 selected from a population of 500.

Figure 6.1 — Population and Sample Visualization

6.2 Sources of Bias in Surveys

A biased sample or survey design systematically favors certain outcomes over others. Bias means the results will consistently deviate from the truth in a predictable direction. Increasing sample size does not fix bias — it only makes the biased result more precisely wrong.
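
A small simulation can make the point about sample size concrete. Everything numeric below is assumed for illustration: a true dissatisfaction rate of 30%, and voluntary response rates of 50% for dissatisfied people and 10% for everyone else. The SRS is modeled as independent draws with full response, which is a simplification.

```python
import random

rng = random.Random(42)
TRUE_P = 0.30  # assumed true share of dissatisfied individuals

def voluntary_estimate(n):
    """Collect n voluntary responses; dissatisfied people respond more often."""
    responses = []
    while len(responses) < n:
        dissatisfied = rng.random() < TRUE_P
        respond_prob = 0.50 if dissatisfied else 0.10  # assumed response rates
        if rng.random() < respond_prob:
            responses.append(dissatisfied)
    return sum(responses) / n

def srs_estimate(n):
    """Random selection with full response: everyone selected is recorded."""
    return sum(rng.random() < TRUE_P for _ in range(n)) / n

for n in (100, 1000, 10000):
    print(n, round(srs_estimate(n), 3), round(voluntary_estimate(n), 3))
```

Under these assumed rates, the voluntary response estimate settles near $0.15 / 0.22 \approx 0.68$ as $n$ grows, not near the true value of 0.30. A larger sample only narrows the estimate around the wrong number.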

Types of Bias in Surveys

Undercoverage bias: Some groups of the population have a lower probability of being included in the sample than others. Example: a telephone survey that only calls landlines will underrepresent younger adults who use only cell phones.
Non-response bias: Individuals who are selected for the sample cannot be contacted or choose not to participate. If non-respondents differ systematically from respondents, the results are biased.
Response bias (including wording bias): Respondents give inaccurate or dishonest answers. Causes include social desirability (answering in a way that seems "acceptable"), leading question wording, question order effects, and the presence of an interviewer. Response bias is distinct from voluntary response bias, which concerns who ends up in the sample rather than how they answer.

Example 6.2 — Identifying Bias in a Survey Question

An online poll asks: "Do you agree that our city is failing its children by not funding playgrounds?" The results show 79% agree.

Bias identified: This is a classic example of response bias due to question wording (a leading question). The phrase "failing its children" is emotionally charged and pushes respondents toward agreement. A neutral question such as "Should the city increase funding for playgrounds?" would likely produce noticeably different results.

Additionally, since this is an online poll with self-selection, it also suffers from voluntary response bias — people with strong feelings (those who strongly agree or disagree) are more likely to participate than those with moderate views.

TRY IT

A survey is mailed to 1,000 randomly selected households; only 120 respond. What type of bias is most concerning, and why?

Show Answer

Non-response bias is most concerning. Only 12% of selected households responded. If the 880 non-respondents differ systematically from the 120 respondents (for example, if homeowners respond more than renters, or if busier households are less likely to reply), then the 120 responses do not represent the full sample of 1,000 selected households, let alone the broader population. With such a high non-response rate, we cannot trust that the respondents are representative.
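
To see how much uncertainty 880 missing responses can hide, consider a worst-case calculation. The count of 72 "yes" answers is hypothetical (the problem does not give any survey results); only the 1,000 selected and 120 responding come from the problem.

```python
selected = 1000
responded = 120
yes_among_respondents = 72   # hypothetical: 60% of the 120 respondents say "yes"

response_rate = responded / selected
nonrespondents = selected - responded

# Extreme cases: every non-respondent would have said "no", or all "yes".
lowest = yes_among_respondents / selected
highest = (yes_among_respondents + nonrespondents) / selected

print(f"response rate: {response_rate:.0%}")
print(f"share of yes among respondents: {yes_among_respondents / responded:.0%}")
print(f"possible share among all selected: {lowest:.1%} to {highest:.1%}")
```

Without information about the non-respondents, the observed 60% is compatible with almost any true value among the 1,000 selected households.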

Survey response visualization: 1,000 households were selected, but only 120 responded — a 12% response rate.

Figure 6.2 — Non-Response Rate: 1,000 Selected, 120 Responded

6.3 Principles of Experimental Design

An observational study observes individuals and measures variables without attempting to influence the responses. We can find associations, but we cannot establish causation. An experiment deliberately imposes a treatment on individuals in order to observe their responses — experiments can establish causation when properly designed.

Definition: Experiment vs. Observational Study

Observational study: The researcher observes and records data without intervening. Can reveal associations but cannot prove causation (lurking variables may be responsible).
Experiment: The researcher imposes one or more treatments on experimental units and measures the response. When randomization is used, experiments can establish cause and effect.
Explanatory variable (factor): The variable whose effect on the response variable is being studied. Different values of the factor are called levels.
Response variable: The outcome that is measured after applying the treatment.
Treatment: A specific condition applied to experimental units (a combination of factor levels).
Experimental units: The individuals (people, animals, plots, etc.) to which treatments are applied.

The Four Principles of Experimental Design

Control: Keep all variables that might affect the response constant across treatment groups — except for the treatment itself. This includes using a control group (a group that receives no treatment or a standard/placebo treatment) for comparison.
Randomization: Randomly assign experimental units to treatment groups. Randomization balances out the effects of lurking variables (known and unknown) across groups, making the groups roughly equivalent at the start of the experiment.
Replication: Apply each treatment to enough experimental units to reduce the effect of chance variation. More replication produces more reliable estimates of treatment effects.
Blocking: Group similar experimental units into blocks before randomizing. Within each block, randomly assign treatments. This reduces variability and increases the ability to detect real treatment differences.

Placebo and Double-Blind Experiments

A placebo is a fake treatment (such as a sugar pill) that looks identical to the real treatment. It is used to account for the placebo effect — the tendency for people to respond positively simply because they believe they are being treated.

In a blind experiment, subjects do not know which treatment they received. In a double-blind experiment, neither the subjects nor the researchers who interact with them know which treatment was assigned. Double-blinding keeps subjects' expectations and researcher bias from affecting the treatment groups differently.

Example 6.3 — Designing an Experiment

Design an experiment to test whether listening to music improves math test scores.

Factor (explanatory variable): Listening condition — music vs. no music (silence)
Levels: Two levels — (1) classical music, (2) silence (control)
Treatments: Music group listens to classical music during the test; control group completes the test in silence
Experimental units: Students in the study
Response variable: Math test score
Control group: Students who take the test in silence
Randomization: Randomly assign students to the music or silence condition
Replication: Use a large enough sample (e.g., at least 30 per group) so that random differences in student ability average out
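
The randomization step in this design might be carried out as in the sketch below. The roster of 60 students, their IDs, and the helper name `randomize_two_groups` are assumptions for illustration.

```python
import random

def randomize_two_groups(units, seed=None):
    """Shuffle the units, then split them into two equal-size groups."""
    rng = random.Random(seed)
    shuffled = list(units)      # copy so the original roster is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

students = [f"S{i:02d}" for i in range(1, 61)]   # hypothetical roster of 60 students
music, silence = randomize_two_groups(students, seed=2024)

print(len(music), len(silence))
```

Because chance alone decides who lands in each group, differences in ability, motivation, and other student characteristics tend to balance out between the music and silence conditions.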

TRY IT

What is the purpose of a placebo in a medical experiment?

Show Answer

A placebo controls for the placebo effect — the tendency of subjects to feel or perform better simply because they believe they have received a treatment, regardless of its actual effect. By giving the control group an identical-appearing fake treatment, researchers ensure that any difference in outcomes between the treatment and control groups is due to the actual drug, not to psychological expectations. Without a placebo, we cannot separate the drug's true pharmacological effect from the psychological benefit of being treated.

AP Exam Tip — Confounding vs. Lurking Variables: A lurking variable is associated with both the explanatory and response variables but is not part of the study (common in observational studies). A confounding variable is a variable in an experiment whose effect on the response cannot be separated from the effect of the explanatory variable. Randomization in experiments controls for both — it is the key reason why well-designed experiments can establish causation while observational studies cannot.

6.4 Completely Randomized and Block Designs

There are two major experimental designs tested on the AP exam: the completely randomized design and the randomized block design.

Definition: Experimental Design Types

Completely Randomized Design (CRD): All experimental units are randomly assigned to treatments with no prior grouping. The simplest experimental design. Works best when subjects are relatively homogeneous (similar to each other).
Randomized Block Design: Experimental units are first grouped into blocks — groups of similar units — then randomly assigned to treatments within each block. Blocking on a variable that is related to the response reduces variability and makes it easier to detect treatment differences.
Matched Pairs Design: A special case of a block design with exactly two treatments. Each block contains two units that are matched on relevant characteristics (or the same individual receives both treatments in random order). Differences within pairs are used to measure the treatment effect.

Why Block?

Blocking removes a known source of variability from the error. If we know that GPA is related to test performance, we should block on GPA so that each block contains students with similar GPA. This ensures that differences between high-GPA and low-GPA students do not mask differences between treatments. Blocking increases statistical power — the ability to detect a real treatment effect.

Example 6.4 — CRD vs. Block Design

A researcher tests two study methods (Method A and Method B) on 40 students to see which produces higher exam scores.

Completely Randomized Design:

Number the 40 students 01–40.
Use a random process to assign 20 students to Method A and 20 to Method B.
Compare mean exam scores between the two groups.

Block Design (blocking on GPA: high / low):

Divide the 40 students into two blocks: 20 with high GPA and 20 with low GPA.
Within the high-GPA block, randomly assign 10 students to Method A and 10 to Method B.
Within the low-GPA block, randomly assign 10 students to Method A and 10 to Method B.
Compare mean exam scores within each block, then combine.

Why the block design is better here: GPA is related to exam performance. By blocking on GPA, we ensure that both study methods are tested on similar students within each block. The comparison is fairer and the variability due to GPA differences is removed, making it easier to detect a true difference between the two study methods.
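
A sketch of the block design's randomization, under stated assumptions: the 40 student IDs and their GPAs are made up, and the blocks are formed by ranking on GPA and splitting at the midpoint.

```python
import random

rng = random.Random(7)

# Hypothetical roster: 40 students with made-up GPAs between 2.0 and 4.0.
students = {f"S{i:02d}": round(rng.uniform(2.0, 4.0), 2) for i in range(1, 41)}

# Form two blocks of 20 by ranking students on GPA.
ranked = sorted(students, key=students.get, reverse=True)
blocks = {"high GPA": ranked[:20], "low GPA": ranked[20:]}

# Randomize separately within each block: 10 to Method A, 10 to Method B.
assignment = {}
for block_name, members in blocks.items():
    shuffled = members[:]
    rng.shuffle(shuffled)
    for s in shuffled[:10]:
        assignment[s] = (block_name, "A")
    for s in shuffled[10:]:
        assignment[s] = (block_name, "B")
```

The key difference from a completely randomized design is that the shuffle happens inside each block, so each method is guaranteed exactly 10 high-GPA and 10 low-GPA students.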

TRY IT

In one form of matched pairs design, each subject receives BOTH treatments in random order (or a before/after measurement is taken). What is the key advantage of this design?

Show Answer

The key advantage is that each subject serves as their own control. Because the same individual receives both treatments, individual differences (ability, baseline health, motivation, etc.) are held constant within each pair. The main variability that remains is the within-person difference between the two treatments. This makes matched pairs designs very powerful for detecting treatment effects, especially when there is large variability between individuals. The analysis focuses on the difference within each pair, not the raw scores.
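
The sketch below illustrates the idea with invented numbers. The eight subjects and their scores are made up purely for illustration and do not come from any study; the point is only how the analysis is organized.

```python
import random
from statistics import mean, stdev

rng = random.Random(3)

# Made-up scores for illustration only: subject -> (score under A, score under B).
scores = {
    "P1": (70, 74), "P2": (85, 88), "P3": (60, 61), "P4": (92, 95),
    "P5": (78, 80), "P6": (55, 60), "P7": (88, 87), "P8": (66, 70),
}

# Randomize which treatment each subject receives first.
order = {s: rng.choice([("A", "B"), ("B", "A")]) for s in scores}

# Analyze the within-subject differences, not the raw scores.
differences = [b - a for a, b in scores.values()]
mean_difference = mean(differences)

spread_between_subjects = stdev(a for a, _ in scores.values())
spread_of_differences = stdev(differences)

print(differences, mean_difference)
print(round(spread_between_subjects, 2), round(spread_of_differences, 2))
```

In this invented data the raw scores vary widely from person to person, while the differences vary much less, which is why pairing makes a small treatment effect easier to see.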

Randomized block design: Block 1 (high GPA) and Block 2 (low GPA), each split between Treatment A (green) and Treatment B (blue).

Figure 6.3 — Randomized Block Design: Two Blocks, Two Treatments

Practice Problems

A researcher surveys 200 of 2,000 employees by selecting every 10th name on an alphabetical list after a random start. What sampling method is this? Is it an SRS?

Show Solution

This is a systematic random sample. It is not an SRS because not every group of 200 employees has an equal chance of being chosen. Only 10 samples are possible, one for each starting position among the first 10 names, and two employees who are adjacent on the list can never appear in the same sample. In a true SRS, any combination of 200 employees could be selected.
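
The gap between the two methods can be quantified by counting possible samples. The variable names are illustrative; the numbers come from the problem.

```python
import math

N, n, k = 2000, 200, 10

srs_samples = math.comb(N, n)   # every set of 200 employees is possible under an SRS
systematic_samples = k          # one sample per possible starting position

inclusion_probability = n / N   # the same for each employee under both methods

print(len(str(srs_samples)))    # number of digits in the SRS count
print(systematic_samples)
print(inclusion_probability)
```

Each employee has a 1-in-10 chance of selection under either method, but an SRS also requires every group of 200 to be equally likely, and the systematic sample allows only 10 groups.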

Identify the type of bias: A survey asks "Don't you agree that more homework hurts student well-being?" and 82% agree.

Show Solution

This is response bias due to question wording (a leading question). The phrase "Don't you agree" and the framing "hurts student well-being" both push respondents toward agreement. A neutral question, such as "What is your opinion on the amount of homework students receive?", would avoid this source of bias.

A study finds that people who own pets have lower blood pressure. Can we conclude that owning a pet lowers blood pressure? Explain.

Show Solution

No. This is an observational study, so we cannot conclude causation. There may be lurking variables: for example, people who are less stressed or more physically active may be more likely to own pets AND have lower blood pressure. The pet ownership and lower blood pressure could both be caused by a third variable (such as activity level or temperament) rather than one causing the other. Only a randomized experiment could establish causation.

An experiment tests three fertilizer types on corn yield using 30 plots. Each fertilizer is randomly assigned to 10 plots. Identify: experimental units, factor, levels, and response variable.

Show Solution

Experimental units: The 30 corn plots.
Factor (explanatory variable): Type of fertilizer.
Levels: Three levels — the three different fertilizer types.
Treatments: The three fertilizer types (one per group of 10 plots).
Response variable: Corn yield (e.g., bushels per plot).

In a drug trial, neither patients nor doctors know who received the drug vs. placebo. What is this called and why is it important?

Show Solution

This is a double-blind experiment. It is important for two reasons: (1) it keeps the placebo effect from distorting the comparison, because patients who do not know whether they received the drug cannot respond differently based on expectation alone; (2) it prevents researcher bias, because doctors who do not know which group a patient is in cannot unconsciously treat them differently or record outcomes in a biased way. Double-blinding makes it much more credible that an observed difference is due to the drug.

A school tests two teaching methods. They block by prior math achievement (above/below median), then randomly assign within blocks. Why is blocking helpful here?

Show Solution

Prior math achievement is strongly related to the response variable (future math performance). Without blocking, one method might randomly end up with more high-achieving students, making it appear more effective. By blocking on prior achievement, each teaching method is compared on students with similar prior ability within each block. This removes the variability due to prior achievement from the error, making it easier to detect a true difference in teaching method effectiveness.

AP FRQ: A company wants to study the effect of background music (classical, jazz, no music) on employee productivity. Design a completely randomized experiment with 90 employees. State the factor, treatments, and response variable, and explain how randomization would work.

Show Solution

Factor: Type of background music.
Treatments (levels): Three treatments — (1) classical music, (2) jazz music, (3) no music (silence/control).
Response variable: Employee productivity (e.g., units produced per hour or tasks completed per shift).
Randomization: Number the 90 employees 01–90. Use a random number generator to randomly assign 30 employees to each of the three treatment groups. All other conditions (work environment, shift length, task type) are held constant across groups.
Replication: Each treatment is applied to 30 employees, providing sufficient replication to reduce the effect of individual variation.
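
The randomization described in this solution could be implemented along these lines. The helper name `assign_treatments` is hypothetical, and the sketch assumes the number of units divides evenly among the treatments.

```python
import random

def assign_treatments(units, treatments, seed=None):
    """Shuffle the units and deal them into equal-size treatment groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)   # assumes an even split
    return {t: sorted(shuffled[i * size:(i + 1) * size])
            for i, t in enumerate(treatments)}

employees = list(range(1, 91))   # employees numbered 01 to 90
groups = assign_treatments(employees, ["classical", "jazz", "no music"], seed=90)

for treatment, members in groups.items():
    print(treatment, len(members))
```

Shuffling once and cutting the list into thirds gives every employee the same chance of landing in each treatment group while keeping the groups at exactly 30.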

A voluntary response survey on a news website asks readers to rate the president's performance. Explain why this produces biased results and what type of bias is present.

Show Solution

This produces voluntary response bias. People who choose to participate in online polls tend to have stronger-than-average opinions (usually negative) about the subject. Those with moderate or neutral views are much less likely to click and respond. As a result, the sample overrepresents people with extreme opinions. Additionally, the sample only includes people who visited that particular news website, causing undercoverage bias — the website's audience may skew toward a particular political or demographic group that does not represent the broader population.

Chapter Summary

Study Types

Observational Study

Researchers observe and record data without imposing treatments. Can show association but NOT causation due to potential confounding variables.

Experiment

Researchers impose treatments and randomly assign subjects. Can establish cause-and-effect relationships when properly designed.

SRS (Simple Random Sample)

Every individual and every group of size $n$ has an equal chance of selection. The gold standard for surveys.

Stratified Random Sample

Divide the population into strata (homogeneous groups), then take an SRS from each stratum. Often more precise than an SRS of the same total size when individuals within each stratum are similar.

Principles of Experimental Design

Control

Hold other variables that might affect the response constant across groups. Use a control group (often given a placebo) as a baseline to isolate the treatment effect.

Randomization

Randomly assign subjects to treatments to balance out confounding variables. Makes groups roughly equivalent before treatment.

Replication

Use enough subjects so that results are reliable and random variation is reduced.

Blocking

Group similar subjects into blocks before randomizing within blocks. Reduces variability from known confounders (like gender or age).

Key Terms

Observational Study: Data collected without imposing treatments. Shows association only; cannot establish causation.

Experiment: Researchers impose treatments and randomly assign subjects. Can establish causation.

Confounding Variable: A variable associated with both the explanatory variable and the response, potentially distorting the apparent relationship.

Randomization: Random assignment of subjects to treatments balances confounding variables and allows causal inference.

Bias: Systematic error that makes a sample unrepresentative of the population. Cannot be corrected by increasing sample size.

Double-Blind: Neither subjects nor those measuring outcomes know which treatment was received. Reduces placebo effect and measurement bias.