Unit 5 Notes: Understanding Sampling Distributions (Proportions and Means)

Sampling Distribution of a Sample Proportion

What a sampling distribution is (and why you care)

When you take a sample, you usually compute some statistic (a number calculated from the sample) like a sample proportion or a sample mean. If you took a different random sample of the same size from the same population, you would almost certainly get a different statistic.

A sampling distribution describes this idea formally: it is the distribution of a statistic over all possible random samples of a fixed size from the population. You can almost never list “all possible samples” in real life, but thinking this way is powerful because it lets you predict how much your statistic tends to vary.

This matters because inference (confidence intervals and significance tests) is built on one key question:

  • If the population parameter were really p (or \mu), how likely is it that random sampling would produce the statistic you observed?

To answer that, you need the sampling distribution.

The statistic: sample proportion

The sample proportion is the fraction of sampled individuals who have a certain characteristic (“success”). Notation:

  • Parameter (fixed, population): p = true population proportion
  • Statistic (random, sample): \hat{p} = sample proportion

If you take a sample of size n and count successes X, then:

\hat{p} = \frac{X}{n}

The value of \hat{p} changes from sample to sample, so \hat{p} has a sampling distribution.

How the sampling distribution of \hat{p} behaves

Under random sampling, the sampling distribution of \hat{p} has a predictable center and spread.

Center (mean)

The sampling distribution is centered at the true population proportion:

\mu_{\hat{p}} = p

This is why \hat{p} is called an **unbiased estimator** of p: over many random samples, it does not systematically overestimate or underestimate the true proportion.

Spread (standard deviation)

The standard deviation of the sampling distribution of \hat{p} is:

\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}

Interpretation:

  • Larger n makes \sigma_{\hat{p}} smaller, so sample proportions cluster more tightly around p.
  • Proportions near 0.5 have the largest variability because p(1-p) is largest near 0.5.

In practice, you usually don’t know p, so when you estimate the spread from data you use a related idea called the **standard error** (often using \hat{p} in place of p). But for probability questions about the sampling distribution (when p is given), you use \sigma_{\hat{p}}.
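These center-and-spread facts can be checked with a short simulation. This is a sketch with assumed values (p = 0.3, n = 100, 5000 repetitions), not numbers from the notes; it compares the SD of many simulated \hat{p} values to the formula \sqrt{p(1-p)/n}.

```python
# Simulation sketch (assumed values): compare the SD of many simulated
# sample proportions to the formula sqrt(p(1-p)/n).
import math
import random
import statistics

random.seed(1)
p, n, reps = 0.3, 100, 5000   # assumed population proportion and sample size

# Each repetition: draw n Bernoulli(p) outcomes and record p-hat = X/n.
p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

theory = math.sqrt(p * (1 - p) / n)    # SD of the sampling distribution
empirical = statistics.pstdev(p_hats)  # SD of the simulated p-hats
print(round(theory, 4), round(empirical, 4))  # the two should be close
```

The simulated values also cluster around p itself, illustrating the unbiasedness claim above.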

Shape: when is \hat{p} approximately Normal?

The distribution of \hat{p} becomes approximately Normal when the sample is large enough that you expect at least 10 successes and 10 failures (using the population proportion p):

  • np \ge 10
  • n(1-p) \ge 10

This is commonly called the Large Counts condition. When it holds, you can use Normal probability calculations with:

\hat{p} \approx N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)

This approximation is closely connected to the Central Limit Theorem (CLT), but AP Statistics often treats proportions with this specific “large counts” rule.

Conditions you must check (why they exist)

Sampling distribution results rely on how the sample was obtained.

  1. Random condition: The data should come from a random sample or randomized experiment. Without randomness, the probability model for the statistic is not trustworthy.

  2. Independence condition: Outcomes should be (approximately) independent.

    • When sampling without replacement from a finite population, independence is approximately true if the sample is not too large.
    • AP Statistics uses the 10% condition: n \le 0.1N, where N is the population size.
  3. Large Counts condition (for Normal approximation): np \ge 10 and n(1-p) \ge 10.

A common misconception is thinking that “random” automatically means “independent.” Random sampling without replacement creates dependence when the sample is a big fraction of the population, which is exactly why the 10% condition matters.

Worked example: probability involving \hat{p}

A manufacturer knows that the true defect rate is p = 0.08. A quality engineer randomly samples n = 200 items. What is the probability the sample defect proportion is more than 0.12?

Step 1: Check conditions

  • Random: assume yes (random sample).
  • 10%: if the population is large enough that 200 \le 0.1N, independence is reasonable.
  • Large Counts:
    • np = 200(0.08) = 16
    • n(1-p) = 200(0.92) = 184
      Both are at least 10, so Normal approximation is reasonable.

Step 2: Find mean and standard deviation

\mu_{\hat{p}} = 0.08

\sigma_{\hat{p}} = \sqrt{\frac{0.08(0.92)}{200}}

Compute:

\sigma_{\hat{p}} = \sqrt{\frac{0.0736}{200}} = \sqrt{0.000368} \approx 0.0192

Step 3: Convert to a z-score and use Normal probability

z = \frac{0.12 - 0.08}{0.0192} \approx 2.08

So:

P(\hat{p} > 0.12) \approx P(Z > 2.08)

Using standard Normal probabilities, this is about 0.019.

What this means: Even with a true defect rate of 8%, you will occasionally see a sample as high as 12% just due to random sampling variability—but it’s fairly rare (about 2%).
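The three steps of this calculation can be reproduced with Python’s standard library (statistics.NormalDist, available since Python 3.8). The numbers are the ones from the worked example.

```python
# The defect-rate example in code: P(p-hat > 0.12) when p = 0.08, n = 200.
import math
from statistics import NormalDist

p, n = 0.08, 200
sigma = math.sqrt(p * (1 - p) / n)  # sigma_p-hat, about 0.0192
z = (0.12 - p) / sigma              # z-score, about 2.08
prob = 1 - NormalDist().cdf(z)      # upper-tail Normal probability
print(round(prob, 3))
```

This prints a tail probability of about 0.019, matching the hand calculation (tiny differences come from rounding \sigma_{\hat{p}} to 0.0192 above).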

Notation and meaning: parameter vs statistic (quick reference)
| Concept | Population (parameter) | Sample (statistic) | Sampling distribution center | Sampling distribution spread |
| --- | --- | --- | --- | --- |
| Proportion | p | \hat{p} | p | \sqrt{\frac{p(1-p)}{n}} |

Exam Focus
  • Typical question patterns:
    • “Assuming p is ___ and n is ___, find P(\hat{p} > \text{value})” (or between two values).
    • “Verify conditions for using a Normal model for \hat{p}, then compute a probability.”
    • “Explain how increasing n changes the sampling distribution of \hat{p}.”
  • Common mistakes:
    • Using the Large Counts check with \hat{p} when the problem gives p for a probability calculation (for this type of question, use p).
    • Forgetting the 10% condition and treating sampling without replacement as independent when n is a large fraction of N.
    • Confusing the distribution of the data (0/1 outcomes) with the distribution of \hat{p} (a proportion that can take many values).

Sampling Distribution of a Sample Mean

What changes when you move from proportions to means

A proportion is built from yes/no outcomes. A mean is built from numerical measurements (heights, waiting times, scores). The central idea is the same: if you repeatedly take random samples of size n and compute the sample mean each time, those means form a distribution.

  • Parameter: \mu = population mean
  • Statistic: \bar{x} = sample mean

If your sample is x_1, x_2, \dots, x_n, then:

\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}

Because the sample changes, \bar{x} changes, so \bar{x} has a sampling distribution.

Center of the sampling distribution of \bar{x}

The sampling distribution is centered at the population mean:

\mu_{\bar{x}} = \mu

So \bar{x} is an **unbiased estimator** of \mu.

Spread: the standard deviation of \bar{x}

If the population standard deviation is \sigma, then:

\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

This formula is one of the most important ideas in statistics: averaging reduces variability. Intuitively, individual observations bounce around a lot, but the average of many observations tends to be more stable.

You’ll often hear \sigma_{\bar{x}} called the **standard deviation of the sampling distribution** or (in inference contexts) the **standard error** of the mean when \sigma is estimated.
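The \sigma/\sqrt{n} scaling is worth internalizing numerically. A tiny sketch with an assumed population SD of 10: each fourfold increase in n halves the standard deviation of \bar{x}.

```python
# sigma / sqrt(n) scaling: each 4x increase in n halves the SD of x-bar.
import math

sigma = 10.0  # assumed population SD
for n in (25, 100, 400):
    print(n, sigma / math.sqrt(n))  # 2.0, then 1.0, then 0.5
```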

Shape: when is \bar{x} Normal (or approximately Normal)?

The shape of the sampling distribution of \bar{x} depends on the population distribution and the sample size.

There are two main pathways to Normality:

  1. If the population is Normal, then \bar{x} is exactly Normal for any sample size n.

  2. If the population is not Normal, then \bar{x} becomes approximately Normal when n is large enough (this is the Central Limit Theorem, developed more fully in the next major section).

A key misconception is thinking the CLT makes the data Normal. It does not. The raw observations can remain skewed or weird; it’s the distribution of the sample mean that becomes approximately Normal as nn grows.

Conditions to use the sampling distribution results for \bar{x}

As with proportions, you need randomness and (approximate) independence.

  1. Random condition: data come from a random sample or randomized experiment.

  2. Independence / 10% condition: when sampling without replacement from a finite population of size N, check:

n \le 0.1N

  3. Normality condition for using a Normal model for \bar{x}:
    • Population is Normal, or
    • Sample size is large enough for CLT to apply (often “large” is around 30 in many textbook settings, but on AP you justify based on context, skewness/outliers, and sample size rather than a single magic cutoff).

Worked example: probability involving \bar{x}

Suppose the amount of soda filled into bottles has population mean \mu = 500 mL and population standard deviation \sigma = 4 mL. A random sample of n = 36 bottles is selected. What is the probability the sample mean fill is less than 498.5 mL?

Step 1: Check conditions

  • Random: assume a random sample.
  • 10%: plausible if the day’s production is far more than 360 bottles.
  • Shape: even if the fill amounts are not perfectly Normal, n = 36 is reasonably large; in many realistic manufacturing settings the distribution is roughly symmetric anyway.

Step 2: Compute the mean and standard deviation of \bar{x}

\mu_{\bar{x}} = 500

\sigma_{\bar{x}} = \frac{4}{\sqrt{36}} = \frac{4}{6} \approx 0.6667

Step 3: Convert to z-score and compute probability

z = \frac{498.5 - 500}{0.6667} = \frac{-1.5}{0.6667} \approx -2.25

So:

P(\bar{x} < 498.5) \approx P(Z < -2.25)

This is about 0.012.

Interpretation: Even though individual bottles vary with standard deviation 4 mL, the average of 36 bottles varies much less (standard deviation about 0.67 mL). So getting a sample mean as low as 498.5 mL is rare.
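The same calculation can be checked with Python’s stdlib NormalDist (a sketch; the numbers are from the bottle example above).

```python
# Bottle example: P(x-bar < 498.5) when mu = 500, sigma = 4, n = 36.
import math
from statistics import NormalDist

mu, sigma, n = 500.0, 4.0, 36
sigma_xbar = sigma / math.sqrt(n)             # 4/6, about 0.667
prob = NormalDist(mu, sigma_xbar).cdf(498.5)  # P(x-bar < 498.5)
print(round(sigma_xbar, 4), round(prob, 3))
```

This prints a probability of about 0.012, agreeing with the z-table answer.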

Connecting back to inference

When you build a confidence interval for \mu or run a test about \mu, you’re standardizing how far your sample mean is from the hypothesized mean using the typical sampling variability of \bar{x}. This is why the sampling distribution formulas show up everywhere in Unit 6 and beyond.

Exam Focus
  • Typical question patterns:
    • “Given \mu, \sigma, and n, find P(\bar{x} > \text{value}) or between two values.”
    • “Describe the sampling distribution of \bar{x}: shape, center, spread (with conditions).”
    • “How does changing n affect the distribution of \bar{x}?”
  • Common mistakes:
    • Using \sigma_{\bar{x}} = \sigma/n instead of dividing by \sqrt{n}.
    • Treating \sigma (population SD) as if it shrinks with larger samples; it’s the SD of \bar{x} that shrinks.
    • Ignoring strong skew/outliers with small n and claiming Normality without justification.

Central Limit Theorem

What the CLT says in plain language

The Central Limit Theorem (CLT) is a foundational result that explains why Normal models show up so often.

It says: if you take many random samples of size n from a population with mean \mu and standard deviation \sigma, then as n becomes large, the sampling distribution of the sample mean \bar{x} becomes approximately Normal, no matter what the population distribution looks like (as long as it has a well-defined mean and standard deviation).

The CLT is not magic; it’s a statement about what happens when you average many independent pieces of randomness. Extreme values tend to get “smoothed out” by averaging.

Why the CLT matters

Without the CLT, you would often need to know the exact population distribution to do probability calculations or inference about means. The CLT lets you use Normal-based methods broadly because it provides a justification for why xˉ\bar{x} is approximately Normal in many real settings.

That “approximately” is crucial. The approximation can be excellent, decent, or poor depending on the situation.

How the CLT works conceptually (what improves the approximation)

Several factors affect how quickly the sampling distribution becomes close to Normal:

  • Sample size n: bigger n generally makes the sampling distribution more Normal.
  • Population shape:
    • If the population is already close to Normal, even small n works well.
    • If the population is strongly skewed or has outliers, you generally need a larger n.
  • Independence: the CLT assumes observations behave like independent draws. This is why random sampling and the 10% condition matter.

A helpful analogy: imagine adding up many small random “nudges.” No single nudge determines the final result; the total becomes more predictable in shape.
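You can watch this happen in a short simulation. The setup below is an assumption for illustration (an exponential population with mean 5, which is strongly right-skewed): as n grows, the distribution of sample means stays centered near 5 while its spread shrinks like 5/\sqrt{n} and its shape becomes increasingly bell-like.

```python
# CLT simulation sketch (assumed population: exponential with mean 5,
# strongly right-skewed). For each n, draw 3000 samples and summarize
# the resulting sample means.
import random
import statistics

random.seed(2)
mu = 5.0  # exponential with rate 1/mu has mean 5 and SD 5
for n in (2, 10, 50):
    means = [statistics.fmean(random.expovariate(1 / mu) for _ in range(n))
             for _ in range(3000)]
    # center stays near mu; spread shrinks roughly like 5/sqrt(n)
    print(n, round(statistics.fmean(means), 2), round(statistics.pstdev(means), 2))
```

Plotting a histogram of `means` for each n (not shown) makes the shape change visible: skewed for n = 2, close to Normal for n = 50.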

Formal statement for the mean

For sufficiently large n:

\bar{x} \approx N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)

More explicitly:

\mu_{\bar{x}} = \mu

\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

and the shape is approximately Normal when CLT conditions are met.

CLT and proportions: how they connect

A sample proportion \hat{p} is the mean of 0-1 indicator variables (1 for success, 0 for failure). Because of that, a CLT-type result applies to proportions too. In AP Statistics, this is operationalized through the Large Counts condition:

  • np \ge 10
  • n(1-p) \ge 10

When those expected counts are large enough, the sampling distribution of \hat{p} is approximately Normal.

So you can think of the “Normal approximation for \hat{p}” as a special case of the CLT applied to Bernoulli trials.
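The Large Counts check is mechanical enough to write as a tiny helper. `large_counts_ok` is a hypothetical name for illustration, not a standard library function.

```python
# Hypothetical helper for the Large Counts condition: both expected counts
# n*p and n*(1-p) must be at least 10.
def large_counts_ok(p: float, n: int, threshold: float = 10) -> bool:
    return n * p >= threshold and n * (1 - p) >= threshold

print(large_counts_ok(0.08, 200))  # True: expected counts 16 and 184
print(large_counts_ok(0.02, 100))  # False: only 2 expected successes
```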

Worked example: using CLT when the population is skewed

Suppose the waiting time at a busy café has mean \mu = 6.8 minutes and standard deviation \sigma = 4.5 minutes. The distribution of individual waiting times is strongly right-skewed (a few very long waits).

A manager takes a random sample of n = 50 customers and computes the mean waiting time. Approximate the probability that the sample mean exceeds 8 minutes.

Step 1: Justify CLT use

  • Random sample: assume yes.
  • Independence: assume the sample is less than 10% of all customers in the time period of interest.
  • Population is skewed, but n = 50 is fairly large, so the CLT gives a reasonable Normal approximation for \bar{x}.

Step 2: Describe the sampling distribution of \bar{x}

\mu_{\bar{x}} = 6.8

\sigma_{\bar{x}} = \frac{4.5}{\sqrt{50}}

Compute:

\sigma_{\bar{x}} \approx \frac{4.5}{7.071} \approx 0.636

Step 3: Compute z-score

z = \frac{8 - 6.8}{0.636} \approx 1.89

So:

P(\bar{x} > 8) \approx P(Z > 1.89)

This is about 0.029.

Important interpretation: Individual waits are very skewed, but the average of 50 waits is much less skewed, and a Normal model for \bar{x} is often usable.
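The café numbers in code, as a stdlib sketch with the same assumptions:

```python
# Cafe example: P(x-bar > 8) when mu = 6.8, sigma = 4.5, n = 50.
import math
from statistics import NormalDist

mu, sigma, n = 6.8, 4.5, 50
sigma_xbar = sigma / math.sqrt(n)               # about 0.636
prob = 1 - NormalDist(mu, sigma_xbar).cdf(8.0)  # upper tail past 8 minutes
print(round(sigma_xbar, 3), round(prob, 3))
```

The unrounded z-score here is about 1.886, so the computed tail probability lands near 0.03; the small gap from the 0.029 above is just rounding of intermediate steps.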

What goes wrong: common CLT misunderstandings
  1. “If n is large, the data are Normal.”
    Wrong. The CLT is about the distribution of \bar{x} (or sums/averages), not the distribution of individual observations.

  2. “n = 30 always guarantees Normal.”
    Not guaranteed. It depends on the population shape. For extremely skewed data with outliers, you may need larger n for a good approximation.

  3. Forgetting independence
    If data are dependent (for example, sampling a large fraction without replacement, or measuring repeated outcomes from the same individual), CLT conclusions can fail or require more advanced methods.

A brief note on sums (same idea, different scale)

Sometimes problems are phrased in terms of the sum of observations rather than the mean. If S = x_1 + x_2 + \cdots + x_n, then (under similar conditions):

\mu_S = n\mu

\sigma_S = \sigma\sqrt{n}

The sum and the mean contain the same information (since \bar{x} = S/n), but they have different units and different spreads.
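A quick numeric check of the two scales, reusing the bottle numbers from earlier (\sigma = 4, n = 36):

```python
# Sum vs mean: SD of the sum grows like sigma*sqrt(n); SD of the mean
# shrinks like sigma/sqrt(n). Same information, different scales.
import math

sigma, n = 4.0, 36
sd_sum = sigma * math.sqrt(n)   # 4 * 6 = 24.0
sd_mean = sigma / math.sqrt(n)  # 4 / 6, about 0.667
print(sd_sum, round(sd_mean, 3))
```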

Exam Focus
  • Typical question patterns:
    • “Population is skewed with given \mu and \sigma. For sample size n, approximate a probability about \bar{x} using the CLT.”
    • “Explain why the sampling distribution of \bar{x} is approximately Normal even though the population is not.”
    • “Decide whether a Normal approximation is reasonable and justify using conditions (random, 10%, and sample size vs skew/outliers).”
  • Common mistakes:
    • Claiming CLT applies without mentioning randomness/independence.
    • Using CLT language for proportions but forgetting the Large Counts check.
    • Mixing up the standard deviation of individuals \sigma with the standard deviation of the sample mean \sigma_{\bar{x}}.