Chapter 6: Designing Studies
Learning Objectives
- Distinguish between population, sample, and census; identify the sampling frame
- Describe and compare probability sampling methods: SRS, stratified, cluster, and systematic
- Identify and explain sources of bias in surveys: undercoverage, non-response, and response bias
- Distinguish between observational studies and experiments
- Apply the four principles of experimental design: control, randomization, replication, and blocking
- Describe completely randomized designs and randomized block designs including matched pairs
6.1 Population, Sample, and Sampling Methods
When we want to learn about a group, we rarely have the resources to study every individual. Instead, we select a sample from the larger population and use what we learn from the sample to draw conclusions about the population.
Definition: Key Terms
- Population: The entire group of individuals we want information about.
- Sample: A subset of individuals selected from the population to represent it.
- Census: An attempt to collect data from every individual in the population.
- Sampling frame: The list of individuals from which the sample is actually drawn. A poor sampling frame causes undercoverage bias.
- Simple Random Sample (SRS): A sample selected so that every group of $n$ individuals from the population has an equal chance of being chosen. Each individual also has an equal chance of selection.
Probability Sampling Methods
A probability sample uses a chance mechanism to select individuals, so every member of the population has a known probability of being chosen. The most important probability samples are:
- Simple Random Sample (SRS): Assign each individual a number; use a random number generator or table to select $n$ individuals. Every set of $n$ individuals is equally likely to be chosen.
- Stratified Random Sample: Divide the population into non-overlapping groups called strata (based on a shared characteristic such as grade level or gender), then take an SRS from each stratum. Ensures representation from each subgroup.
- Cluster Sample: Divide the population into groups called clusters (often geographic or naturally occurring), randomly select some clusters, then survey all individuals in the selected clusters. More practical when the population is spread over a large area.
- Systematic Sample: Randomly choose a starting point, then select every $k$th individual from a list. For example, if you want a sample of 50 from 500, you select every 10th name after a random start.
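The four methods above can be sketched in a few lines of Python. This is a minimal illustration, not a survey tool: the population of IDs 1 to 500, the five strata, the 25 clusters, and all function names are assumptions made up for the example, and the fixed seed is there only so the run is reproducible.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every set of n individuals is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take an SRS of n_per_stratum from each stratum (dict: label -> list)."""
    return {label: rng.sample(members, n_per_stratum)
            for label, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then keep everyone in them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [person for label in chosen for person in clusters[label]]

def systematic_sample(population, k, rng):
    """Random start among the first k, then every k-th individual."""
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(0)            # fixed seed only for reproducibility
population = list(range(1, 501))  # hypothetical IDs 1..500

srs = simple_random_sample(population, 50, rng)
# Hypothetical strata: five groups of 100 consecutive IDs
strata = {g: population[g * 100:(g + 1) * 100] for g in range(5)}
strat = stratified_sample(strata, 10, rng)
# Hypothetical clusters: 25 groups of 20 consecutive IDs
clusters = {c: population[c * 20:(c + 1) * 20] for c in range(25)}
clus = cluster_sample(clusters, 3, rng)
syst = systematic_sample(population, 10, rng)

# Sample sizes: SRS, stratified (5 x 10), cluster (3 x 20), systematic
print(len(srs), sum(len(v) for v in strat.values()), len(clus), len(syst))
# prints: 50 50 60 50
```

Note that the cluster sample's size depends on how many clusters are drawn and how large they are, while the other three methods fix the sample size in advance.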
Non-Probability Sampling (Avoid These)
Voluntary response samples allow individuals to choose whether to participate (e.g., online polls, call-in surveys). These are biased because people with strong opinions — often negative — are more likely to respond, so the sample does not represent the population.
Convenience samples select individuals who are easy to reach (e.g., surveying people in the hallway). These are biased because accessible individuals may differ systematically from the broader population.
Example 6.1 — Comparing SRS and Voluntary Response
A school wants to survey 50 of its 500 students about cafeteria food quality.
SRS approach: Assign each student a number from 001 to 500. Use a random number table or calculator to select 50 distinct numbers. Every student has an equal $\frac{50}{500} = 10\%$ chance of selection, and every group of 50 has an equal chance of being chosen.
Voluntary response approach: Post a sign-up sheet in the cafeteria. Students who feel strongly (likely those who dislike the food) are more likely to sign up. The sample will overrepresent dissatisfied students and underrepresent satisfied or neutral students — producing a biased, unrepresentative result.
Conclusion: The SRS produces an unbiased sample; the voluntary response sample is likely biased toward negative opinions.
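The 10% selection chance in the SRS approach can be checked by counting samples. As a sketch, assuming the 50 students are drawn without replacement so that all $\binom{500}{50}$ possible samples are equally likely, the samples that contain one particular student are those that fill the remaining 49 places from the other 499 students:

```latex
P(\text{a given student is selected})
  = \frac{\binom{499}{49}}{\binom{500}{50}}
  = \frac{499!}{49!\,450!} \cdot \frac{50!\,450!}{500!}
  = \frac{50}{500} = 0.10
```

The same counting argument gives $n/N$ for any SRS of size $n$ from a population of size $N$.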
A city wants to know residents' opinion on building a new park. The city has five distinct neighborhoods of roughly equal size. Describe how to take a stratified random sample by neighborhood.
Answer:
Step 1: Define the strata. Each of the five neighborhoods is one stratum.
Step 2: Determine sample size per stratum. If you want 100 total respondents, plan to select 20 from each neighborhood.
Step 3: Within each neighborhood, obtain a list of residents (the sampling frame for that stratum) and use an SRS to select 20 residents.
Advantage: This guarantees proportional representation from all five neighborhoods, which a single SRS from the whole city might miss by chance.
Population (gray) vs. sample (green) — the highlighted points represent the 50 selected from a population of 500.
Figure 6.1 — Population and Sample Visualization
6.2 Sources of Bias in Surveys
A biased sample or survey design systematically favors certain outcomes over others. Bias means the results will consistently deviate from the truth in a predictable direction. Increasing sample size does not fix bias — it only makes the biased result more precisely wrong.
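A small simulation can make the point about sample size concrete. This is a sketch under invented assumptions: the population of 10,000 households, the 6,000/4,000 landline split, and the 30% and 60% support rates are made-up numbers chosen only for illustration, and `sample_proportion` is a hypothetical helper.

```python
import random

rng = random.Random(1)  # fixed seed only for reproducibility

# Hypothetical population (all numbers are made up for illustration):
# 6,000 landline households, 30% favor a proposal;
# 4,000 cell-only households, 60% favor it.
landline = [1] * 1800 + [0] * 4200
cell_only = [1] * 2400 + [0] * 1600
population = landline + cell_only
true_p = sum(population) / len(population)  # 4200 / 10000 = 0.42

def sample_proportion(frame, n):
    """Proportion of 'favor' responses in an SRS of size n from the frame."""
    return sum(rng.sample(frame, n)) / n

for n in (100, 1000, 5000):
    biased = sample_proportion(landline, n)      # frame leaves out cell-only homes
    unbiased = sample_proportion(population, n)  # SRS from the full population
    print(f"n={n:5d}  landline-only={biased:.3f}  "
          f"full SRS={unbiased:.3f}  truth={true_p:.2f}")
```

In this setup the landline-only estimate should settle near 0.30 as $n$ grows rather than near the true 0.42, while the full-population SRS should move toward 0.42. A larger sample narrows the spread of the estimate but leaves the undercoverage in place.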
Types of Bias in Surveys
- Undercoverage bias: Some groups of the population have a lower probability of being included in the sample than others. Example: a telephone survey that only calls landlines will underrepresent younger adults who use only cell phones.
- Non-response bias: Individuals who are selected for the sample cannot be contacted or choose not to participate. If non-respondents differ systematically from respondents, the results are biased.
- Response bias (including wording bias): Respondents give inaccurate or dishonest answers. Causes include social desirability (answering in a way that seems "acceptable"), leading question wording, question order effects, and the presence of an interviewer. This is distinct from voluntary response bias, which concerns who ends up in the sample rather than how they answer.
Example 6.2 — Identifying Bias in a Survey Question
An online poll asks: "Do you agree that our city is failing its children by not funding playgrounds?" The results show 79% agree.
Bias identified: This is a classic example of response bias due to question wording (a leading question). The phrase "failing its children" is emotionally charged and pushes respondents toward agreement. A neutral question such as "Should the city increase funding for playgrounds?" would likely produce a noticeably different result.
Additionally, since this is an online poll with self-selection, it also suffers from voluntary response bias — people with strong feelings (those who strongly agree or disagree) are more likely to participate than those with moderate views.
A survey is mailed to 1,000 randomly selected households; only 120 respond. What type of bias is most concerning, and why?
Survey response visualization: 1,000 households were selected, but only 120 responded — a 12% response rate.
Figure 6.2 — Non-Response Rate: 1,000 Selected, 120 Responded
6.3 Principles of Experimental Design
An observational study observes individuals and measures variables without attempting to influence the responses. We can find associations, but we cannot establish causation. An experiment deliberately imposes a treatment on individuals in order to observe their responses — experiments can establish causation when properly designed.
Definition: Experiment vs. Observational Study
- Observational study: The researcher observes and records data without intervening. Can reveal associations but cannot prove causation (lurking variables may be responsible).
- Experiment: The researcher imposes one or more treatments on experimental units and measures the response. When randomization is used, experiments can establish cause and effect.
- Explanatory variable (factor): The variable whose effect on the response variable is being studied. Different values of the factor are called levels.
- Response variable: The outcome that is measured after applying the treatment.
- Treatment: A specific condition applied to experimental units (a combination of factor levels).
- Experimental units: The individuals (people, animals, plots, etc.) to which treatments are applied.
The Four Principles of Experimental Design
- Control: Keep all variables that might affect the response constant across treatment groups — except for the treatment itself. This includes using a control group (a group that receives no treatment or a standard/placebo treatment) for comparison.
- Randomization: Randomly assign experimental units to treatment groups. Randomization balances out the effects of lurking variables (known and unknown) across groups, making the groups roughly equivalent at the start of the experiment.
- Replication: Apply each treatment to enough experimental units to reduce the effect of chance variation. More replication produces more reliable estimates of treatment effects.
- Blocking: Group similar experimental units into blocks before randomizing. Within each block, randomly assign treatments. This reduces variability and increases the ability to detect real treatment differences.
Placebo and Double-Blind Experiments
A placebo is a fake treatment (such as a sugar pill) that looks identical to the real treatment. It is used to account for the placebo effect — the tendency for people to respond positively simply because they believe they are being treated.
In a blind experiment, subjects do not know which treatment they received. In a double-blind experiment, neither the subjects nor the researchers who interact with them know which treatment was assigned. Double-blinding does not eliminate the placebo effect; it keeps that effect equal across treatment groups and prevents researchers' expectations from influencing how they treat subjects or record results.
Example 6.3 — Designing an Experiment
Design an experiment to test whether listening to music improves math test scores.
- Factor (explanatory variable): Listening condition during the test
- Levels: Two levels: (1) classical music, (2) silence (control)
- Treatments: Music group listens to classical music during the test; control group completes the test in silence
- Experimental units: Students in the study
- Response variable: Math test score
- Control group: Students who take the test in silence
- Randomization: Randomly assign students to the music or silence condition
- Replication: Use a large enough sample (e.g., at least 30 per group) so that random differences in student ability average out
What is the purpose of a placebo in a medical experiment?
AP Exam Tip — Confounding vs. Lurking Variables: A lurking variable is associated with both the explanatory and response variables but is not part of the study (common in observational studies). A confounding variable is a variable in an experiment whose effect on the response cannot be separated from the effect of the explanatory variable. Randomization in experiments controls for both — it is the key reason why well-designed experiments can establish causation while observational studies cannot.
6.4 Completely Randomized and Block Designs
There are two major experimental designs tested on the AP exam: the completely randomized design and the randomized block design.
Definition: Experimental Design Types
- Completely Randomized Design (CRD): All experimental units are randomly assigned to treatments with no prior grouping. The simplest experimental design. Works best when subjects are relatively homogeneous (similar to each other).
- Randomized Block Design: Experimental units are first grouped into blocks — groups of similar units — then randomly assigned to treatments within each block. Blocking on a variable that is related to the response reduces variability and makes it easier to detect treatment differences.
- Matched Pairs Design: A special case of a block design with exactly two treatments. Each block contains two units that are matched on relevant characteristics (or the same individual receives both treatments in random order). Differences within pairs are used to measure the treatment effect.
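For the version of matched pairs in which the same individual receives both treatments, the two key steps (randomizing the order, then analyzing within-pair differences) might look like the sketch below. The subject IDs and the response values are placeholders invented purely to show the calculation, and both function names are hypothetical.

```python
import random
from statistics import mean

def randomize_order(subjects, rng):
    """For each subject, flip a fair coin to decide which treatment comes first."""
    return {s: rng.choice([("A", "B"), ("B", "A")]) for s in subjects}

def mean_paired_difference(pairs):
    """pairs: list of (response under A, response under B), one tuple per pair."""
    return mean(a - b for a, b in pairs)

rng = random.Random(2)  # fixed seed only for reproducibility
orders = randomize_order(["s1", "s2", "s3", "s4"], rng)  # hypothetical subject IDs
print(orders)

# Placeholder responses, invented only to illustrate the arithmetic
pairs = [(12, 10), (15, 14), (9, 9), (11, 8)]
print(mean_paired_difference(pairs))  # (2 + 1 + 0 + 3) / 4 = 1.5
```

Working with the differences means each subject is compared only with themselves, so variation between subjects drops out of the comparison.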
Why Block?
Blocking removes a known source of variability from the error. If we know that GPA is related to test performance, we should block on GPA so that each block contains students with similar GPA. This ensures that differences between high-GPA and low-GPA students do not mask differences between treatments. Blocking increases statistical power — the ability to detect a real treatment effect.
Example 6.4 — CRD vs. Block Design
A researcher tests two study methods (Method A and Method B) on 40 students to see which produces higher exam scores.
Completely Randomized Design:
- Number the 40 students 01–40.
- Use a random process to assign 20 students to Method A and 20 to Method B.
- Compare mean exam scores between the two groups.
Block Design (blocking on GPA: high / low):
- Divide the 40 students into two blocks: 20 with high GPA and 20 with low GPA.
- Within the high-GPA block, randomly assign 10 students to Method A and 10 to Method B.
- Within the low-GPA block, randomly assign 10 students to Method A and 10 to Method B.
- Compare mean exam scores within each block, then combine.
Why the block design is better here: GPA is related to exam performance. By blocking on GPA, we ensure that both study methods are tested on similar students within each block. The comparison is fairer and the variability due to GPA differences is removed, making it easier to detect a true difference between the two study methods.
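The block design steps in this example can be sketched as follows. The student IDs and which IDs fall in which GPA block are assumptions for illustration; in practice the blocks would come from the students' actual GPAs. The helper `assign_within_blocks` is a hypothetical name, and it assumes each block size divides evenly among the treatments.

```python
import random

def assign_within_blocks(blocks, treatments, rng):
    """Randomly split each block evenly among the treatments.

    blocks: dict mapping block label -> list of unit IDs.
    Returns a dict mapping unit ID -> (block label, treatment).
    Assumes each block size is a multiple of the number of treatments.
    """
    assignment = {}
    for label, units in blocks.items():
        shuffled = units[:]   # copy so the caller's list is not modified
        rng.shuffle(shuffled)  # randomize separately inside each block
        per_group = len(shuffled) // len(treatments)
        for i, treatment in enumerate(treatments):
            for unit in shuffled[i * per_group:(i + 1) * per_group]:
                assignment[unit] = (label, treatment)
    return assignment

rng = random.Random(3)  # fixed seed only for reproducibility
# Hypothetical IDs: students 1-20 form the high-GPA block, 21-40 the low-GPA block
blocks = {"high GPA": list(range(1, 21)), "low GPA": list(range(21, 41))}
assignment = assign_within_blocks(blocks, ["Method A", "Method B"], rng)

for label in blocks:
    for method in ("Method A", "Method B"):
        count = sum(1 for v in assignment.values() if v == (label, method))
        print(label, method, count)  # each block/method cell holds 10 students
```

The shuffle happens once per block, never across blocks, which is what distinguishes this from a completely randomized design.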
In one common form of the matched pairs design, each subject receives both treatments in random order (or a before/after measurement is taken). What is the key advantage of this design?
Randomized block design: Block 1 (high GPA) and Block 2 (low GPA), each split between Treatment A (green) and Treatment B (blue).
Figure 6.3 — Randomized Block Design: Two Blocks, Two Treatments
Practice Problems
A researcher surveys 200 of 2,000 employees by selecting every 10th name on an alphabetical list after a random start. What sampling method is this? Is it an SRS?
Identify the type of bias: A survey asks "Don't you agree that more homework hurts student well-being?" and 82% agree.
A study finds that people who own pets have lower blood pressure. Can we conclude that owning a pet lowers blood pressure? Explain.
An experiment tests three fertilizer types on corn yield using 30 plots. Each fertilizer is randomly assigned to 10 plots. Identify: experimental units, factor, levels, and response variable.
Solution:
Experimental units: The 30 plots.
Factor (explanatory variable): Type of fertilizer.
Levels: Three levels — the three different fertilizer types.
Treatments: The three fertilizer types (one per group of 10 plots).
Response variable: Corn yield (e.g., bushels per plot).
In a drug trial, neither patients nor doctors know who received the drug vs. placebo. What is this called and why is it important?
A school tests two teaching methods. They block by prior math achievement (above/below median), then randomly assign within blocks. Why is blocking helpful here?
AP FRQ: A company wants to study the effect of background music (classical, jazz, no music) on employee productivity. Design a completely randomized experiment with 90 employees. State the factor, treatments, and response variable, and explain how randomization would work.
Solution:
Factor (explanatory variable): Type of background music.
Treatments (levels): Three treatments: (1) classical music, (2) jazz music, (3) no music (silence/control).
Response variable: Employee productivity (e.g., units produced per hour or tasks completed per shift).
Randomization: Number the 90 employees 01–90. Use a random number generator to randomly assign 30 employees to each of the three treatment groups. All other conditions (work environment, shift length, task type) are held constant across groups.
Replication: Each treatment is applied to 30 employees, providing sufficient replication to reduce the effect of individual variation.
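The randomization step described here could be carried out as in the sketch below: shuffle the 90 employee numbers, then deal them into three groups of 30. The function name `completely_randomized` and the seed are assumptions for illustration, and any equivalent chance process (such as drawing numbered slips) would serve the same purpose.

```python
import random

def completely_randomized(units, treatments, rng):
    """Shuffle all units, then split them into equal-sized treatment groups.

    Assumes the number of units is a multiple of the number of treatments.
    """
    shuffled = units[:]   # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    return {t: sorted(shuffled[i * size:(i + 1) * size])
            for i, t in enumerate(treatments)}

rng = random.Random(4)          # fixed seed only for reproducibility
employees = list(range(1, 91))  # employees numbered 01-90
groups = completely_randomized(employees, ["classical", "jazz", "no music"], rng)

for treatment, members in groups.items():
    print(treatment, len(members), members[:5])  # group size and first few IDs
```

Because every employee is shuffled together with no prior grouping, every split of the 90 employees into three groups of 30 is equally likely.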
A voluntary response survey on a news website asks readers to rate the president's performance. Explain why this produces biased results and what type of bias is present.
Chapter Summary
Study Types
- Observational study: Researchers observe and record data without imposing treatments. Can show association but not causation, because lurking or confounding variables may explain the relationship.
- Experiment: Researchers impose treatments and randomly assign subjects. Can establish cause-and-effect relationships when properly designed.
Sampling Methods
- Simple Random Sample (SRS): Every individual and every group of size $n$ has an equal chance of selection. The gold standard for surveys.
- Stratified Random Sample: Divide the population into strata (groups that are similar on a relevant characteristic), then take an SRS from each stratum. Often more precise than an SRS of the same total size.
Principles of Experimental Design
- Control: Hold other variables that might affect the response constant across treatment groups. Use a control group (no treatment, a standard treatment, or a placebo) to isolate the treatment effect.
- Randomization: Randomly assign subjects to treatments to balance out lurking variables, known and unknown, across groups. Makes groups roughly equivalent before treatment.
- Replication: Use enough subjects so that results are reliable and random variation is reduced.
- Blocking: Group similar subjects into blocks before randomizing within blocks. Reduces variability from known sources (like gender or age).