Unit 6: Inference for Categorical Data: Proportions

Proportions, categorical data, and the logic of inference

A categorical variable places individuals into categories (yes/no, success/failure, supports/doesn’t support, defective/not defective). When there are exactly two categories, it’s common to label one category as a success and the other as a failure. These labels don’t mean “good” or “bad”; they’re just names.

A population proportion is the fraction of the entire population that falls into the “success” category. We typically denote it by p. Because you almost never observe an entire population, you collect data from a sample and compute the **sample proportion**, denoted \hat p (read “p-hat”).

Inference is the process of using sample data to draw conclusions about a population. In this unit, inferential questions often look like:

  • “What is the true proportion of voters who support a policy?” (estimate p with a confidence interval)
  • “Is the proportion of defective items more than 2%?” (test a claim about p)
  • “Is the proportion of success different between two groups or treatments?” (compare p_1 and p_2)

Why we can do inference with proportions

The key idea is sampling variability: even if the population proportion p is fixed, different random samples produce different \hat p values. If we understand the **sampling distribution** of \hat p (how \hat p behaves across many random samples), we can judge whether an observed \hat p is “typical” under a proposed value of p, and we can quantify uncertainty when estimating p.

For large enough samples, the sampling distribution of \hat p is approximately Normal (bell-shaped). This Normal approximation is what makes z procedures for proportions work.
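The idea of sampling variability can be seen directly in a short simulation. This is a sketch with hypothetical values (p = 0.30, n = 100), not tied to any example in this unit:

```python
import random
from statistics import mean, stdev

# Hypothetical setup: repeatedly draw SRSs of size n from a population
# with true proportion p, and watch how the sample proportions vary.
random.seed(1)
p, n, reps = 0.30, 100, 10_000

phats = [sum(random.random() < p for _ in range(n)) / n
         for _ in range(reps)]

print(round(mean(phats), 3))   # center: close to p = 0.30
print(round(stdev(phats), 3))  # spread: close to sqrt(p(1-p)/n), about 0.046
```

A histogram of these 10,000 \hat p values would look roughly bell-shaped and centered at p, which is exactly what the Normal approximation claims.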

Notation you must be fluent with

Quantity              | Meaning                                                 | Typical notation
----------------------|---------------------------------------------------------|-----------------
Population proportion | True long-run proportion of successes in the population | p
Sample proportion     | Proportion of successes in your sample                  | \hat p = x/n
Number of successes   | Count of successes in the sample                        | x
Sample size           | Number of individuals in the sample                     | n
Null value            | Claimed population proportion in a test                 | p_0

A common point of confusion: p is a fixed (but usually unknown) parameter. \hat p is a statistic that changes from sample to sample.

Exam Focus
  • Typical question patterns:
    • You are given n and x (or a sample proportion) and asked for a confidence interval for p.
    • You are given a claim like “at most 0.30” or “equal to 0.30” and asked to run a significance test about p.
    • You are asked to check conditions (random, 10% condition, large counts) and interpret results in context.
  • Common mistakes:
    • Treating \hat p like it is the true p (“The population is 0.52”) instead of recognizing sampling uncertainty.
    • Mixing up categorical (proportions) vs quantitative (means) inference.
    • Forgetting to connect the conclusion to the context and to the parameter (a population proportion, not a sample proportion).

Conditions for inference and the sampling distribution of a sample proportion

All inference procedures rely on assumptions. At a minimum, you must check for independence in how the data were collected and whether a Normal model for the sampling distribution is appropriate.

The two standard assumptions (and how we justify them)

1) Independence assumption. Individuals in a sample (or an experiment) must be independent of each other. This is ideally achieved through random sampling (or random assignment in an experiment). Independence across samples is obtained by selecting two (or more) separate random samples, or by randomizing individuals into separate treatment groups.

Because sampling is often done without replacement, sample size can also affect independence. If the sample is too large relative to the population, dependence becomes a concern. A common guideline is the 10% condition (10% Rule):

n \le 0.10N

where N is the population size.

2) Normality assumption (for proportions). Inference for proportions uses a Normal model for the sampling distribution of \hat p, even though the underlying count of successes is Binomial. The Binomial distribution is approximately Normal when the expected counts of successes and failures are large enough.

A helpful piece of notation is q = 1-p. The classic guideline is that both expected counts should be at least 10:

np \ge 10

nq \ge 10

(For hypothesis tests, those expected counts are computed using the null value; details appear below.)

Side note (still worth knowing): Inference for means is based on a Normal model for the sampling distribution of \bar x. That model is exactly correct if the population is Normal and is approximately correct for large samples by the CLT (a common rule of thumb is n \ge 30). This is often called the Normal/Large Sample condition for means.

What \hat p is mathematically

If you take a sample of size n and define success/failure, then the number of successes x follows a Binomial model when observations are independent and the probability of success is constant:

x \sim Binomial(n, p)

Then:

\hat p = x/n

So \hat p is just a rescaled Binomial count.

Center (expected value)

Across many random samples of size n from a population with proportion p, the average of \hat p is p:

E(\hat p) = p

This is why \hat p is an **unbiased estimator** of p in random sampling.

Spread (standard deviation)

The standard deviation of \hat p (the standard deviation of the sampling distribution of \hat p) is:

\sigma_{\hat p} = \sqrt{\frac{p(1-p)}{n}}

This formula shows two big ideas:

  1. If n increases, the variability of \hat p decreases like 1/\sqrt{n}.
  2. Variability is largest near p = 0.5 and smaller when p is near 0 or 1.
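Both behaviors follow directly from the formula and can be checked numerically (the particular p and n values below are just illustrations):

```python
import math

def sd_phat(p, n):
    # standard deviation of the sampling distribution of p-hat
    return math.sqrt(p * (1 - p) / n)

# Idea 1: quadrupling n halves the spread (the 1/sqrt(n) behavior).
print(round(sd_phat(0.5, 100), 4))   # 0.05
print(round(sd_phat(0.5, 400), 4))   # 0.025

# Idea 2: for fixed n, the spread peaks at p = 0.5.
sds = {p: round(sd_phat(p, 100), 4) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
print(sds)
```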

Standard error (estimated standard deviation)

A standard error is an estimate of the standard deviation of a sampling distribution.

For a one-proportion confidence interval, the standard error is typically estimated by substituting \hat p for p:

SE_{\hat p} = \sqrt{\frac{\hat p(1-\hat p)}{n}}

For a one-proportion test, the standard error is computed under the null hypothesis, using p_0.

When is \hat p approximately Normal?

In AP Statistics this is usually checked with a large counts condition.

  • For a confidence interval (where you don’t assume a specific p), use the sample counts:

n\hat p \ge 10

n(1-\hat p) \ge 10

  • For a significance test about p (where the null specifies p_0), check counts under the null:

np_0 \ge 10

n(1-p_0) \ge 10

This distinction matters because tests are evaluating what would happen if the null were true.

Example 6.1 (Normal model check for \hat p)

If we pick a simple random sample of size n = 80 from a large population, which of the following values of the population proportion p would allow use of the Normal model for the sampling distribution of \hat p?

  1. 0.10
  2. 0.15
  3. 0.90
  4. 0.95
  5. 0.99

Solution: The relevant condition is that both np \ge 10 and n(1-p) \ge 10.

  • For p = 0.10, np = 80(0.10) = 8, not at least 10.
  • For p = 0.90, n(1-p) = 80(0.10) = 8, not at least 10.
  • For p = 0.95, n(1-p) = 80(0.05) = 4, not at least 10.
  • For p = 0.99, n(1-p) = 80(0.01) = 0.8, not at least 10.
  • For p = 0.15, np = 80(0.15) = 12 and n(1-p) = 80(0.85) = 68, both at least 10.

So the only value that works is p = 0.15.
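The same check can be scripted. This sketch simply encodes the large counts rule and applies it to the five candidate values of p from the example:

```python
def large_counts_ok(p, n, cutoff=10):
    # Normal model is reasonable when both expected counts reach the cutoff
    return n * p >= cutoff and n * (1 - p) >= cutoff

n = 80
for p in (0.10, 0.15, 0.90, 0.95, 0.99):
    print(p, large_counts_ok(p, n))   # only p = 0.15 prints True
```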

Exam Focus
  • Typical question patterns:
    • “Explain why it is reasonable to use a Normal approximation for \hat p.”
    • “Check conditions for inference.” (You must name and verify them.)
    • “How does changing n affect the margin of error?”
  • Common mistakes:
    • Using n\hat p and n(1-\hat p) for a **test** when you should use np_0 and n(1-p_0).
    • Forgetting the 10% condition when sampling without replacement from a known (or described) finite population.
    • Claiming Normality because “the sample size is large” without checking successes and failures.

One-sample z confidence interval for a population proportion

A confidence interval gives a range of plausible values for an unknown parameter, and here the parameter is a single population proportion p.

The meaning of a confidence interval (what the percentage really refers to)

The confidence level (like 90%, 95%, or 99%) is the long-run success rate of the method. If you repeatedly took random samples of size n and built a C\% confidence interval each time using the same procedure, about C\% of those intervals would capture the true population parameter (for proportions, p; in other settings, it could be \mu).

For any particular interval computed from a single sample, the parameter either is in the interval or it isn’t. After the data are collected, the probability that p is in this already-computed interval is effectively 1 or 0. The confidence percentage describes the method, not the one interval.

Why the interval has the form “estimate ± margin of error”

A z interval is built in the form:

estimate ± margin of error

The standard error measures how far the sample statistic typically varies from the population parameter. The margin of error is a multiple of the standard error, with that multiplier determined by how confident you wish to be in your procedure.

The one-proportion z interval (formula)

The standard one-sample z confidence interval for p is:

\hat p \pm z^*\sqrt{\frac{\hat p(1-\hat p)}{n}}

The margin of error is:

ME = z^*\sqrt{\frac{\hat p(1-\hat p)}{n}}

Conditions for a one-proportion z interval

You should justify these in words, in context:

  1. Random: the data come from a random sample or a randomized experiment.
  2. 10% condition (when sampling without replacement):

n \le 0.10N

  3. Large counts:

n\hat p \ge 10

n(1-\hat p) \ge 10

How to interpret the interval in context

A correct interpretation names the confidence level, the parameter p, and the population/context.

Template:

“We are C\% confident that the true proportion p of [population] who [success definition] is between [lower] and [upper].”

Avoid saying there is a C\% probability that p is in the interval.

Worked example: building and interpreting a confidence interval

Scenario: A random sample of 500 registered voters in a city is asked whether they support a proposed public transit expansion. Suppose 280 say “yes.”

1) Identify the parameter.

  • p = true proportion of all registered voters in the city who support the expansion.

2) Compute the sample proportion.

\hat p = \frac{280}{500} = 0.56

3) Check conditions.

  • Random: stated random sample.
  • 10%: if the city has far more than 500 voters, n \le 0.10N is reasonable.
  • Large counts:

n\hat p = 500(0.56) = 280 \ge 10

n(1-\hat p) = 500(0.44) = 220 \ge 10

4) Compute the interval (95%).

SE = \sqrt{\frac{0.56(0.44)}{500}} \approx 0.0222

Using the standard Normal critical value for 95% confidence (often taken as 1.96):

ME = 1.96(0.0222) \approx 0.0435

Confidence interval:

0.56 \pm 0.0435

So approximately:

  • Lower: 0.5165
  • Upper: 0.6035

5) Interpretation.

“We are 95% confident that the true proportion of all registered voters in the city who support the transit expansion is between about 0.52 and 0.60.”
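The arithmetic in steps 2 through 4 can be reproduced with a short calculation (using z* = 1.96, as in the text):

```python
import math

def one_prop_z_interval(x, n, z_star=1.96):
    # estimate +/- z* * sqrt(phat(1 - phat)/n)
    phat = x / n
    se = math.sqrt(phat * (1 - phat) / n)
    me = z_star * se
    return phat - me, phat + me

lo, hi = one_prop_z_interval(280, 500)
print(round(lo, 4), round(hi, 4))   # 0.5165 0.6035
```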

Example 6.2 (99% confidence interval, plus evidence about a claim)

  1. If 42% of a simple random sample of 550 young adults say that whoever asks for the date should pay for the first date, determine a 99% confidence interval estimate for the true proportion of all young adults who would say this.
  2. Does this confidence interval give convincing evidence in support of the claim that fewer than 50% of young adults would say this?

Solution:

1) Parameter:

  • p represents the proportion of all young adults who would say that whoever asks for the date should pay for the first date.

Checks:

  • SRS is given.
  • 10% condition is reasonable since 550 is far less than 10% of all young adults.
  • Large counts using \hat p = 0.42:

n\hat p = 550(0.42) = 231 \ge 10

n(1-\hat p) = 550(0.58) = 319 \ge 10

Standard error:

SE = \sqrt{\frac{0.42(0.58)}{550}} \approx 0.021

For 99% confidence, the critical value is about 2.576, meaning about 99% of sample proportions should be within 2.576 standard deviations of the population proportion.

Interval:

0.42 \pm 2.576(0.021) = 0.42 \pm 0.054

So the 99% confidence interval is (0.366, 0.474). The margin of error is 0.054.

Conclusion in context: We are 99% confident that the true proportion of young adults who would say this is between 0.366 and 0.474.

2) Evidence about the claim: Yes. Because all values in the confidence interval (0.366 to 0.474) are less than 0.50, the interval gives convincing evidence in support of the claim that fewer than 50% of young adults would say this.
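A quick calculation reproduces both parts of the example; here the 99% critical value is computed from the Normal model rather than looked up:

```python
import math
from statistics import NormalDist

conf = 0.99
z_star = NormalDist().inv_cdf((1 + conf) / 2)   # about 2.576

phat, n = 0.42, 550
me = z_star * math.sqrt(phat * (1 - phat) / n)
lo, hi = phat - me, phat + me

print(round(lo, 3), round(hi, 3))   # (0.366, 0.474)
print(hi < 0.50)                    # True: whole interval is below 0.50
```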

Common misunderstandings to avoid

  • Confidence is not probability about p after the fact; it’s about the method’s long-run success rate.
  • The interval is about a population proportion, not “42% of the sample” (that part is already known).
  • A narrower interval does not necessarily mean a “better” sample; it often just means a larger n.

Exam Focus
  • Typical question patterns:
    • “Calculate and interpret a 90% (or 95%, 99%) confidence interval for p.”
    • “How would the interval change if n were larger / confidence level were higher?”
    • “Check conditions and explain whether a z interval is appropriate.”
  • Common mistakes:
    • Using the wrong standard error (for intervals you use \hat p, not p_0).
    • Interpreting a 95% CI as “there is a 95% chance p is in this interval.”
    • Failing to define “success” and the population clearly in context.

One-sample z test for a population proportion

A significance test evaluates whether sample data provide convincing evidence against a specific claim about a parameter. In this unit, the claim is about a single proportion p.

The core logic of a hypothesis test

You start by assuming a particular value of p is true (the null hypothesis). Then you ask:

“If that null value were true, how likely is it that we would see a sample proportion at least as extreme as the one we observed?”

If that likelihood is very small, the data are inconsistent with the null hypothesis, and you reject it.

Hypotheses for a one-proportion test

The null is an equality statement about the population proportion:

H_0: p = p_0

The alternative is a strict inequality, chosen from the research question (not from the sample):

H_a: p \ne p_0

H_a: p > p_0

H_a: p < p_0

The test statistic (z)

For a one-proportion z test:

z = \frac{\hat p - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}

Important: the standard error in the denominator uses p_0, not \hat p, because you are modeling variability under the null hypothesis.

The p-value

The p-value is the probability, assuming H_0 is true, of getting a statistic at least as extreme as the one observed (in the direction of H_a).

  • For H_a: p > p_0, it’s the right-tail probability.
  • For H_a: p < p_0, it’s the left-tail probability.
  • For H_a: p \ne p_0, it’s the combined probability in both tails beyond |z|.

Significance level and decisions

A significance level \alpha (often 0.05) is a chosen cutoff for what counts as “small.” It is also called the \alpha-risk, because it controls the probability of a Type I error.

  • If p-value \le \alpha: reject H_0.
  • If p-value > \alpha: fail to reject H_0.

“Fail to reject” is not the same as “accept.”

Conditions for a one-proportion z test

  1. Random: random sample or randomized experiment.
  2. 10% condition if sampling without replacement.
  3. Large counts under the null:

np_0 \ge 10

n(1-p_0) \ge 10

Because the p-value is conditional on H_0 being true, you use p_0 when checking these conditions and when computing the standard deviation in the test statistic.

Type I and Type II errors, \beta, and power

Every hypothesis test risks being wrong.

  • Type I error: Reject H_0 when H_0 is actually true. Probability is controlled by \alpha.
  • Type II error: Fail to reject H_0 when H_a is actually true. The probability of a Type II error is often called \beta.

The power of a test against a particular alternative is:

1-\beta

Power is the probability of rejecting a false null hypothesis when that particular alternative is true. Ways to increase power include increasing sample size and increasing \alpha (though increasing \alpha also increases the chance of a Type I error). Also, a true parameter value farther from the hypothesized null value is easier to detect, which tends to increase power.

A key limitation: questions like “What is the power of this test?” or “What is the probability of a Type II error?” cannot be answered without specifying a particular alternative value for the population parameter.

Illustration (NASA launch): Suppose H_0 is that all systems are operating satisfactorily for a NASA launch.

  • Type I error: delay the launch by mistakenly thinking something is malfunctioning when everything is actually OK.
  • Type II error: fail to delay the launch by mistakenly thinking everything is OK when something is actually malfunctioning.

Power is the probability of recognizing a particular malfunction.
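The definitions above can be made concrete with a computation. This sketch uses the Normal model for \hat p for a right-tailed test; the null value 0.02 and the alternative 0.035 are hypothetical choices for illustration:

```python
import math
from statistics import NormalDist

def power_right_tailed(p0, p_a, n, alpha=0.05):
    # Reject H0: p = p0 (vs Ha: p > p0) when phat exceeds this cutoff:
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    cutoff = p0 + z_alpha * math.sqrt(p0 * (1 - p0) / n)
    # Power = P(phat exceeds the cutoff) when the true proportion is p_a
    sd_a = math.sqrt(p_a * (1 - p_a) / n)
    return 1 - NormalDist().cdf((cutoff - p_a) / sd_a)

# Larger n means higher power against the same alternative:
print(round(power_right_tailed(0.02, 0.035, 400), 2))
print(round(power_right_tailed(0.02, 0.035, 1000), 2))
```

Note that the function needs a specific alternative p_a: without one, "What is the power?" has no answer, which is exactly the limitation described above.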

Worked example: a one-proportion z test (defective lightbulbs)

Scenario: A company claims that only 2% of its lightbulbs are defective. A quality-control inspector randomly tests 400 bulbs and finds 14 defective.

Let “success” = defective.

1) Parameter:

  • p = true proportion of all bulbs produced that are defective.

2) Hypotheses:

H_0: p = 0.02

H_a: p > 0.02

3) Compute \hat p:

\hat p = \frac{14}{400} = 0.035

4) Check conditions (large counts under the null):

np_0 = 400(0.02) = 8

This is not at least 10, so the Normal approximation is questionable. In an AP-style response, you should explicitly note that the large counts condition is not met, so a one-proportion z test may not be appropriate (an exact binomial method would be more suitable, though it is not the main focus here).

To still illustrate mechanics with adequate counts, suppose instead the inspector tested 1000 bulbs and found 35 defective (so \hat p = 0.035 still).

Now:

np_0 = 1000(0.02) = 20 \ge 10

n(1-p_0) = 1000(0.98) = 980 \ge 10

5) Test statistic:

z = \frac{0.035 - 0.02}{\sqrt{\frac{0.02(0.98)}{1000}}} \approx 3.39

6) p-value: For a right-tailed test, P(Z \ge 3.39) is very small.

7) Conclusion: At the 5% significance level, reject the company’s claim that the defect rate is 2%. There is convincing evidence that the true proportion of defective bulbs is greater than 0.02.
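The test statistic and p-value for the adjusted scenario (n = 1000, x = 35) can be verified numerically; note that the standard error uses p_0, not \hat p:

```python
import math
from statistics import NormalDist

def one_prop_z_test(x, n, p0, tail="greater"):
    # Standard error is computed under the null value p0
    phat = x / n
    z = (phat - p0) / math.sqrt(p0 * (1 - p0) / n)
    if tail == "greater":
        p_value = 1 - NormalDist().cdf(z)
    elif tail == "less":
        p_value = NormalDist().cdf(z)
    else:  # two-sided
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_prop_z_test(35, 1000, 0.02)
print(round(z, 2))   # 3.39
print(p_value)       # very small, far below 0.05
```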

Example 6.3 (one-proportion z test with two possible \alpha levels and error consequences)

  1. A union spokesperson claims that 75% of union members will support a strike if their basic demands are not met. A company negotiator believes the true percentage is lower and runs a hypothesis test. What is the conclusion if 87 out of a simple random sample of 125 union members say they will strike?
  2. For each of the two possible answers (depending on significance level), what error might have been committed and what would be a possible consequence?

Solution:

1) Parameter: Let p represent the proportion of all union members who will support a strike.

Hypotheses:

H_0: p = 0.75

H_a: p < 0.75

Procedure: One-sample z-test for a population proportion.

Checks:

np_0 = 125(0.75) = 93.75 \ge 10

n(1-p_0) = 125(0.25) = 31.25 \ge 10

SRS is given, and we assume 125 is less than 10% of the total membership.

Mechanics: Calculator software (such as 1-PropZTest on the TI-84 or Z-1-PROP on the Casio Prizm) gives:

  • z = -1.394
  • p-value = 0.0816
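These calculator values can be reproduced directly from the formulas, with no special software assumed:

```python
import math
from statistics import NormalDist

# 1-PropZTest by hand: x = 87, n = 125, H0: p = 0.75, Ha: p < 0.75
x, n, p0 = 87, 125, 0.75
phat = x / n                                     # 0.696
z = (phat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = NormalDist().cdf(z)                    # left-tail probability

print(round(z, 3), round(p_value, 4))   # -1.394 0.0816
```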

Conclusion in context (showing the role of \alpha):

  • If \alpha = 0.05, then 0.0816 > 0.05, so fail to reject H_0. There is not sufficient evidence at the 5% significance level that the true percentage of union members who support a strike is less than 75%.
  • If \alpha = 0.10, then 0.0816 < 0.10, so reject H_0. There is sufficient evidence at the 10% significance level that the true percentage of union members who support a strike is less than 75%.

2) Possible errors and consequences:

  • If you fail to reject H_0 (treating the p-value as large at \alpha = 0.05), a Type II error is possible: the true proportion is actually less than 0.75, but you did not detect it. A possible consequence is that the union might call a strike thinking they have greater support than they actually do.
  • If you reject H_0 (treating the p-value as small at \alpha = 0.10), a Type I error is possible: the true proportion really is 0.75, but you rejected it. A possible consequence is that the union might not call for a strike thinking they don’t have sufficient support when they actually do.

How confidence intervals connect to tests

For a one-proportion problem, a two-sided hypothesis test at level \alpha is closely connected to a confidence interval with confidence level 1-\alpha.

  • If a value p_0 is outside a 95% confidence interval, then a two-sided test of H_0: p = p_0 at \alpha = 0.05 would reject.
  • If p_0 is inside the interval, the corresponding test would fail to reject.

Exam Focus
  • Typical question patterns:
    • “Perform a one-proportion z test and interpret the p-value.”
    • “State hypotheses (including correct direction) based on a claim.”
    • “Explain a Type I error and a Type II error in context.”
  • Common mistakes:
    • Using \hat p instead of p_0 in the standard error for the test statistic.
    • Choosing the alternative based on the sample result (direction-shopping).
    • Writing conclusions about the sample (“the sample shows…”) instead of the population proportion p.

Comparing two proportions: parameters, design, and sampling distribution

Many real questions are comparative: “Is the success rate different between two groups?” This requires inference about two population proportions.

When you have two proportions (common designs)

You typically see one of these designs:

  1. Two independent random samples from two populations.
  2. Randomized experiment with two treatments (random assignment creates independent groups).

In both cases, you end up with two sample proportions:

\hat p_1 = x_1/n_1

\hat p_2 = x_2/n_2

And two parameters:

  • p_1 = true proportion of successes in population (or treatment group) 1
  • p_2 = true proportion of successes in population (or treatment group) 2

The parameter of interest is usually the difference:

p_1 - p_2

Why “difference in sample proportions” makes sense

The statistic \hat p_1 - \hat p_2 estimates p_1 - p_2. If there is truly no difference in the populations, \hat p_1 - \hat p_2 should tend to be near 0 (with random variation).

Sampling distribution of \hat p_1 - \hat p_2 (center and spread)

When the two samples are independent and each sample is large enough for Normal approximation, the sampling distribution of \hat p_1 - \hat p_2 is approximately Normal.

  • Mean:

E(\hat p_1 - \hat p_2) = p_1 - p_2

  • Standard deviation:

\sigma_{\hat p_1 - \hat p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}

In practice, because p_1 and p_2 are unknown, we estimate variability using sample proportions when building a confidence interval.

Standard error for confidence intervals (unpooled)

For confidence intervals, the standard error is typically:

SE_{\hat p_1 - \hat p_2} = \sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}

Conditions for two-proportion inference

You should justify these for each group.

  1. Random: two random samples or random assignment to treatments.
  2. Independence:
    • Between groups: the two samples/groups are independent (no pairing).
    • Within groups: observations are independent (often via the 10% condition if sampling without replacement).

For sampling without replacement from two populations:

n_1 \le 0.10N_1

n_2 \le 0.10N_2

  3. Large counts: for an interval, use sample counts in each group:

n_1\hat p_1 \ge 10

n_1(1-\hat p_1) \ge 10

n_2\hat p_2 \ge 10

n_2(1-\hat p_2) \ge 10

In making calculations and drawing conclusions from specific samples, it is important both that the samples be simple random samples (or otherwise random) and that they be taken independently of each other.

Exam Focus
  • Typical question patterns:
    • Identify whether a scenario is two independent samples or an experiment and define p_1 and p_2 correctly.
    • Decide whether a two-proportion z procedure is appropriate by checking conditions for both groups.
    • Explain why results from a randomized experiment can support a cause-and-effect conclusion, but two-sample observational comparisons typically cannot.
  • Common mistakes:
    • Treating paired data (like before/after on the same people) as two independent samples. (Paired designs require different methods.)
    • Forgetting to define which group is “1” and which is “2,” then interpreting p_1 - p_2 backward.
    • Checking large counts with totals rather than separately for each group.

Two-sample z confidence interval for a difference in proportions

A two-proportion confidence interval estimates the difference p_1 - p_2.

The two-proportion z interval (formula)

(\hat p_1 - \hat p_2) \pm z^*\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}

Interpreting the interval correctly

A correct interpretation must include the confidence level, the parameter p_1 - p_2, which populations/treatments are being compared, and the direction (which group minus which group).

Template:

“We are C\% confident that the true difference p_1 - p_2 between [group 1] and [group 2] is between [lower] and [upper].”

If the interval does not include 0, that suggests a real difference in proportions (at the matching confidence level).

Worked example: reminder systems (Email vs Text)

Scenario: Students are randomly assigned to either Email reminders (group 1) or Text reminders (group 2). At the end of the week:

  • Email: 72 of 120 students completed the form.
  • Text: 85 of 110 students completed the form.

1) Parameters:

  • p_1 = true proportion who would complete with Email.
  • p_2 = true proportion who would complete with Text.

2) Sample proportions:

\hat p_1 = \frac{72}{120} = 0.60

\hat p_2 = \frac{85}{110} \approx 0.7727

Difference:

\hat p_1 - \hat p_2 \approx -0.1727

3) Conditions: large counts are satisfied in each group.

4) 95% interval:

SE \approx 0.060

ME = 1.96(0.060) \approx 0.118

Interval:

-0.1727 \pm 0.118

Approximately:

  • Lower: -0.291
  • Upper: -0.055

5) Interpretation: We are 95% confident that the true difference (Email minus Text) is between about -0.29 and -0.06. Because the interval is entirely negative, Text reminders likely lead to a higher completion proportion than Email reminders. Since this was a randomized experiment, a cause-and-effect statement is reasonable for this setting.
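The interval can be verified with a short calculation using the unpooled standard error:

```python
import math

def two_prop_z_interval(x1, n1, x2, n2, z_star=1.96):
    # Unpooled SE: each group contributes its own phat(1 - phat)/n term
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    me = z_star * se
    diff = p1 - p2
    return diff - me, diff + me

lo, hi = two_prop_z_interval(72, 120, 85, 110)   # Email minus Text
print(round(lo, 3), round(hi, 3))   # entirely negative: Text group higher
```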

Example 6.4 (90% confidence interval for nurse job satisfaction)

  1. Suppose that 84% of an SRS of 125 nurses working 7:00 a.m. to 3:00 p.m. shifts in city hospitals express positive job satisfaction, while only 72% of an independent SRS of 150 nurses on 11:00 p.m. to 7:00 a.m. shifts express similar fulfillment. Establish a 90% confidence interval estimate for the difference.
  2. Based on the interval, is there convincing evidence that the 7 a.m. to 3 p.m. shift has higher satisfaction?

Solution:

1) Parameters:

  • p_1 = proportion of all city-hospital nurses on 7:00 a.m. to 3:00 p.m. shifts with positive job satisfaction.
  • p_2 = proportion of all city-hospital nurses on 11:00 p.m. to 7:00 a.m. shifts with positive job satisfaction.

Procedure: Two-sample z-interval for p_1 - p_2.

Checks: Large counts are satisfied in each group using the sample proportions; samples are independent SRSs; and sample sizes are assumed less than 10% of the corresponding nurse populations.

Mechanics (matching common calculator output): 2-PropZInt on the TI-84 (or 2-Prop ZInterval on the Casio Prizm) gives:

  • Interval: (0.0391, 0.2009)

You can also see the structure:

  • Observed difference: 0.84 - 0.72 = 0.12
  • For 90% confidence, critical z-scores are about \pm 1.645
  • Standard error (reported): 0.0492
  • Margin of error: 1.645(0.0492) \approx 0.081
  • Interval estimate: 0.12 \pm 0.081

Conclusion in context: We are 90% confident that the true proportion of satisfied nurses on 7:00 a.m. to 3:00 p.m. shifts is between 0.039 and 0.201 higher than the true proportion for nurses on 11:00 p.m. to 7:00 a.m. shifts.

2) Evidence: Yes. Because the entire interval is positive, it gives convincing evidence that nurses on the 7 a.m. to 3 p.m. shift have a higher job satisfaction proportion.

A practical interpretation tip

When interpreting a difference in proportions, translate to percentage points. For example, an interval from -0.29 to -0.06 suggests group 1 is about 6 to 29 percentage points lower than group 2.

Exam Focus
  • Typical question patterns:
    • “Construct and interpret a confidence interval for p_1 - p_2.”
    • “Does the interval provide evidence of a difference? Explain using whether 0 is in the interval.”
    • “In an experiment, interpret the result as evidence of an effect of treatment.”
  • Common mistakes:
    • Interpreting a negative difference backward (forgetting the order group 1 minus group 2).
    • Using a pooled proportion in the standard error for a confidence interval (pooling is for tests, not standard intervals).
    • Making a cause-and-effect claim from two independent samples without random assignment.

Two-sample z test for a difference in proportions

A two-proportion significance test evaluates whether data provide convincing evidence that p_1 and p_2 differ (or that one is larger).

Hypotheses for comparing two proportions

Most commonly the null states equality (difference 0):

H_0: p_1 - p_2 = 0

Equivalent wording is H_0: p_1 = p_2.

Common alternatives are:

H_a: p_1 - p_2 \ne 0

H_a: p_1 - p_2 > 0

H_a: p_1 - p_2 < 0

Why pooling happens in the test

Under the null hypothesis H_0: p_1 = p_2, both groups share a common proportion. If the null is true, the best estimate of that common proportion uses data from both samples combined.

The pooled (combined) proportion is:

\hat p_c = \frac{x_1 + x_2}{n_1 + n_2}

Pooling is not a shortcut; it matches the model under the null hypothesis.

The two-proportion z test statistic

z = \frac{(\hat p_1 - \hat p_2) - 0}{\sqrt{\hat p_c(1-\hat p_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}

Conditions for a two-proportion z test

  1. Random: two independent random samples or random assignment.
  2. Independence within and between groups (and 10% condition if sampling without replacement).
  3. Large counts under the null, which uses pooled counts:

n_1\hat p_c \ge 10

n_1(1-\hat p_c) \ge 10

n_2\hat p_c \ge 10

n_2(1-\hat p_c) \ge 10

Two points worth stressing:

  • Sample proportions from the same population can vary from each other.
  • What you are really comparing are plausible values (such as via confidence intervals), not just single sample points.

Worked example: reminder-system experiment (Email vs Text)

Using the same data:

  • Email (group 1): x_1 = 72, n_1 = 120
  • Text (group 2): x_2 = 85, n_2 = 110

If we define the parameter as p_1 - p_2 (Email minus Text), then “Text higher” corresponds to p_1 - p_2 < 0.

1) Hypotheses:

H_0: p_1 - p_2 = 0

H_a: p_1 - p_2 < 0

2) Sample proportions:

\hat p_1 = 0.60

\hat p_2 \approx 0.7727

3) Pooled proportion:

\hat p_c = \frac{72+85}{120+110} = \frac{157}{230} \approx 0.6826

4) Large counts under the null: satisfied using \hat p_c.

5) Compute z:

z \approx -2.81

6) p-value: For a left-tailed test, P(Z \le -2.81) is small.

7) Conclusion: At the 5% significance level, reject H_0. There is convincing evidence that the completion proportion is higher with Text reminders than with Email reminders.
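A short calculation confirms the pooled test statistic for this experiment; the pooled proportion appears only in the standard error:

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    # Pooled proportion: best estimate of the common p under H0: p1 = p2
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_prop_z_test(72, 120, 85, 110)   # Email minus Text
p_value = NormalDist().cdf(z)           # left-tailed (Ha: p1 - p2 < 0)
print(round(z, 2))   # -2.81
print(p_value)       # small, below 0.05
```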

Example 6.5 (two-proportion z test plus CI consistency)

  1. In a random sample of 1500 First Nations children in Canada, 162 were in child welfare care, while in an independent random sample of 1600 non-Aboriginal children, 23 were in child welfare care. Do the data give significant evidence that a greater proportion of First Nations children are in child welfare care?
  2. Does a 95% confidence interval for the difference in proportions give a result consistent with the hypothesis test conclusion?

Solution:

1) Parameters:

  • p_1 = proportion of all First Nations children in Canada in child welfare care.
  • p_2 = proportion of all non-Aboriginal children in Canada in child welfare care.

Hypotheses:

H_0: p_1 - p_2 = 0

H_a: p_1 - p_2 > 0

Procedure: Two-sample z-test for a difference of two population proportions.

Checks: Large counts under the null (using the pooled proportion) are satisfied; the samples are random and independent by design; and it is reasonable to assume both sample sizes are less than 10% of their populations.

Mechanics: Calculator software (such as 2-PropZTest) gives:

  • z = 11.0
  • p-value displayed as 0.000 (meaning extremely small, not literally zero)

Conclusion in context: With such a small p-value (less than 0.05), reject H_0. There is convincing evidence that the true proportion of First Nations children in child welfare care is greater than the true proportion of non-Aboriginal children in child welfare care.

2) CI consistency: Calculator software (such as 2-PropZInt) gives that we are 95% confident the true difference p_1 - p_2 is between 0.077 and 0.110. Since this interval is entirely positive, it is consistent with rejecting H_0 in favor of p_1 > p_2.
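Both parts of Example 6.5 can be checked in a few lines. A sketch of what `2-PropZTest` and `2-PropZInt` compute under the hood, assuming the usual pooled-test / unpooled-interval formulas:

```python
from statistics import NormalDist

x1, n1 = 162, 1500   # First Nations sample
x2, n2 = 23, 1600    # non-Aboriginal sample

p1, p2 = x1 / n1, x2 / n2

# Test: pooled SE, because H0 assumes a common proportion
p_c = (x1 + x2) / (n1 + n2)
se_pooled = (p_c * (1 - p_c) * (1 / n1 + 1 / n2)) ** 0.5
z = (p1 - p2) / se_pooled            # ≈ 11.0

# Interval: unpooled SE, because no equality is assumed
se_unpooled = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
z_star = NormalDist().inv_cdf(0.975)  # ≈ 1.96 for 95% confidence
lo = (p1 - p2) - z_star * se_unpooled
hi = (p1 - p2) + z_star * se_unpooled  # interval ≈ (0.077, 0.110)
```

The interval lies entirely above 0, matching the test's rejection of H_0 in favor of p_1 > p_2.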

Linking test results to confidence intervals

If you built a 95% confidence interval for p_1 - p_2 and it did not include 0, you would reject a two-sided test at \alpha = 0.05. Be careful: a one-sided test does not match a two-sided interval in the same simple way.

Exam Focus
  • Typical question patterns:
    • “Carry out a two-proportion z test and interpret the p-value.”
    • “Explain why a pooled proportion is used in the test.”
    • “Write hypotheses and identify the correct tail based on wording (greater/less/different).”
  • Common mistakes:
    • Not pooling for the test (using the interval SE instead of the null SE).
    • Checking large counts using \hat p_1 and \hat p_2 instead of \hat p_c for the test.
    • Writing H_0: \hat p_1 = \hat p_2 (hypotheses must be about parameters p_1 and p_2).

Communicating inference well: conclusions, context, and practical meaning

On AP free-response questions, earning full credit is not just about computing numbers. You are assessed on whether you can tell the statistical story clearly and correctly.

The parameter is the main character

Every conclusion should be about the relevant parameter:

  • One sample: p
  • Two samples: p_1 - p_2

A strong conclusion explicitly mentions the population(s) or treatment groups and the definition of “success.”

Statistical significance vs practical significance

A result can be statistically significant (small p-value) but still not practically important.

  • With very large n, even tiny differences can produce small p-values because standard errors shrink.
  • Practical significance asks whether the effect size matters in the real world.

Confidence intervals help because they show a range of plausible effect sizes, not just whether an effect exists.

Association vs causation

How you got the data controls what you can claim:

  • Randomized experiment: rejecting H_0 supports a treatment effect (causation) for that experimental setting.
  • Observational study: a significant difference supports association, not causation, due to possible confounding.

Writing a complete inference response (AP-style structure)

A clear four-part organization often earns the most credit:

  • State: define parameters and state hypotheses or identify what interval you are constructing.
  • Plan: name the procedure and check conditions.
  • Do: show calculations (test statistic, p-value, interval).
  • Conclude: interpret in context, using p-value language or interval language.

Interpreting p-values with precision

A good p-value interpretation includes the null model:

“If H_0 were true (that p = p_0 or that p_1 = p_2), the probability of observing a sample result at least as extreme as the one we got is [p-value].”

Avoid saying:

  • “There is a [p-value] chance the null hypothesis is true.”
  • “The probability the alternative is true is …”

Choosing the correct procedure (one vs two proportion)

Ask yourself:

  • One group estimating/testing a single proportion: one-proportion z procedures.
  • Two independent groups comparing proportions: two-proportion z procedures.

A common trap: “before and after” on the same individuals is paired data, not two independent proportions.

Exam Focus
  • Typical question patterns:
    • “Interpret the confidence interval / p-value in context.”
    • “Explain whether a statistically significant result implies a meaningful effect.”
    • “Explain whether you can conclude causation and why.”
  • Common mistakes:
    • Writing conclusions without referencing the population(s) (“people” instead of “all registered voters in the city”).
    • Confusing “significant” with “important.”
    • Claiming causation from an observational comparison.

Deeper understanding and frequent pitfalls in proportion inference

This section ties ideas together and highlights places where reasoning (not arithmetic) often breaks.

Why conditions are more than a formality

The z procedures rely on an approximate Normal model. When large counts fail, the distribution of \hat p (or \hat p_1 - \hat p_2) can be skewed, and z-based p-values and intervals can be inaccurate.

A subtle point: even if n is “large,” if p is very small (rare events), you can still have too few expected successes. That’s why the check is about counts, not just n.

What changes the margin of error (and what doesn’t)

From the one-proportion margin of error:

ME = z^*\sqrt{\frac{\hat p(1-\hat p)}{n}}

  • Increasing confidence level increases z^*, so ME increases.
  • Increasing the sample size n makes **ME decrease** (like 1/\sqrt{n}).
  • The value of \hat p(1-\hat p) is largest near 0.5.

What does not directly change ME: the population size N (unless the 10% condition fails badly, in which case assumptions are in trouble).
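These behaviors are easy to verify directly from the formula. A sketch (the helper name `margin_of_error` is ours):

```python
def margin_of_error(p_hat, n, z_star=1.96):
    """Margin of error for a one-proportion z interval."""
    return z_star * (p_hat * (1 - p_hat) / n) ** 0.5

# Quadrupling n halves the margin of error (the 1/sqrt(n) effect):
me_100 = margin_of_error(0.5, 100)   # 1.96 * 0.05  = 0.098
me_400 = margin_of_error(0.5, 400)   # 1.96 * 0.025 = 0.049

# p_hat(1 - p_hat) peaks at p_hat = 0.5, so ME is widest there:
assert margin_of_error(0.5, 100) > margin_of_error(0.1, 100)
```

Note that the population size N appears nowhere in the function, which is the point of the last remark above.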

Two procedures, two standard errors: unpooled vs pooled

Why the SE differs between the two-proportion interval and the test:

  • Confidence interval: you do not assume p_1 = p_2, so you estimate variability separately in each group using \hat p_1 and \hat p_2.
  • Significance test for H_0: p_1 = p_2: under the null, both groups share a common p, so you estimate it with the pooled proportion \hat p_c.

Mixing these up can noticeably change results and cost reasoning points.
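To see that the two standard errors genuinely differ, here is a quick comparison using the Email vs Text data from the worked example (a sketch of the two formulas side by side):

```python
x1, n1 = 72, 120
x2, n2 = 85, 110
p1, p2 = x1 / n1, x2 / n2

# Unpooled SE (confidence interval): each group estimated separately
se_ci = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5

# Pooled SE (significance test): one common p estimated under H0
p_c = (x1 + x2) / (n1 + n2)
se_test = (p_c * (1 - p_c) * (1 / n1 + 1 / n2)) ** 0.5

print(round(se_ci, 4), round(se_test, 4))  # close, but not equal
```

For these data the pooled SE (~0.0614) is slightly larger than the unpooled SE (~0.0600); with other data the ordering can flip, which is why using the wrong one near a significance cutoff can change the conclusion.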

Edge-case wording that changes the alternative hypothesis

The words in the prompt matter:

  • “Higher than” implies H_a uses >.
  • “Different from” implies H_a uses \ne.
  • “Decreased” implies H_a uses <.

Be especially careful with “at least,” “at most,” “no more than,” and “no less than.” These often set the null boundary and determine the tail direction.

A note about plus-four intervals

Some courses teach the plus-four method for a one-proportion interval (adding two successes and two failures) to improve performance for small samples. AP Statistics primarily emphasizes the standard one-proportion z interval with the large counts condition. If large counts fail, the safest AP response is to state that conditions are not met and the z interval may not be appropriate.

Interpreting “not significant” correctly

Failing to reject H_0 does not prove H_0. It may mean:

  • the null is actually true, or
  • the effect exists but the sample was too small (low power), or
  • the data were too variable.

Confidence intervals help clarify this: a wide interval containing 0 (in two-proportion problems) suggests high uncertainty.

Rounding and calculator output

Good habits:

  • Carry extra decimals in intermediate steps (especially for standard errors and pooled proportions).
  • Round final answers reasonably (often 3 decimals for z, 4 for p-values, and 3 decimals for proportions, unless instructed otherwise).
  • If a calculator shows a p-value of 0.000, interpret it as “extremely small,” not literally zero.
  • Make sure your conclusion would not change due to rounding near the significance cutoff.
Exam Focus
  • Typical question patterns:
    • “Explain how increasing n would affect the p-value / margin of error.”
    • “The conditions are not met; what does that imply about the reliability of the result?”
    • “A calculator gives a p-value of 0.000; how should you interpret that?”
  • Common mistakes:
    • Treating “fail to reject” as “prove no difference.”
    • Using a confidence interval to justify a one-sided test conclusion without careful direction reasoning.
    • Believing that a tiny p-value implies a large or important effect.