AP Statistics: Inference for Quantitative Data (Means)

The Student’s t-Distribution

Why Not the Normal Distribution?

In previous units involving sample proportions, we used the Normal (z) distribution because we could assume the sampling distribution was approximately Normal. However, when working with quantitative data (means), we face a problem: to calculate the standardized score (z-score), we need the population standard deviation ($\sigma$).

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

In the real world, if we don't know the population mean ($\mu$), we almost certainly do not know the population standard deviation ($\sigma$) either. Therefore, we must estimate $\sigma$ using the sample standard deviation ($s$). Substituting $s$ for $\sigma$ introduces extra variability, so the standardized statistic no longer follows a Normal distribution exactly. Instead, it follows the Student’s t-distribution.

Characteristics of the t-Distribution

The t-distribution was published in 1908 by W.S. Gosset (under the pseudonym "Student").

Shape: Like the Standard Normal, it is bell-shaped and symmetric, centered at 0.
Spread: It has "fatter" tails and a lower peak than the Standard Normal. This accounts for the extra uncertainty introduced by estimating $\sigma$ with $s$.
Degrees of Freedom (df): There is a different t-distribution for each number of degrees of freedom, which depends on the sample size.
- For a one-sample t-statistic: $df = n - 1$.

Comparison of Normal Distribution and t-distributions with various degrees of freedom

Key Trend: As the sample size ($n$) increases, the degrees of freedom increase, and the t-distribution approaches the Standard Normal distribution. The tails get thinner as our estimate $s$ becomes more reliable.
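To see the "fatter tails, lower peak" pattern in numbers, here is a small Python sketch (standard library only; the helper names `t_pdf` and `normal_pdf` are our own). It evaluates the standard closed-form density of the t-distribution and of the Standard Normal at the center (x = 0) and in the tail (x = 3) for a few degrees of freedom.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids overflow of the gamma function when df is large
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the Standard Normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Compare the height at the center (x = 0) and out in the tail (x = 3).
for df in (2, 9, 30, 1000):
    print(f"t, df={df:>4}: center={t_pdf(0, df):.4f}  tail={t_pdf(3, df):.5f}")
print(f"Standard Normal: center={normal_pdf(0):.4f}  tail={normal_pdf(3):.5f}")
```

As df grows, the center value rises toward the Normal's peak and the tail value shrinks toward the Normal's, which is the Key Trend in numerical form.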

Inference for a Single Mean

1. Conditions for Inference

Before calculating a confidence interval or performing a test for a mean, you must verify three conditions. Use the mnemonic Rx3 (Random, 10%, Normal/Large).

Random: The data must come from a random sample or a randomized experiment. This prevents bias.
10% Condition: If sampling without replacement, the sample size ($n$) must be less than 10% of the population size ($N$). This allows us to treat observations as independent.
- Check: $n < 0.10N$
Normal/Large Sample: We need the sampling distribution of $\bar{x}$ to be approximately Normal.
- If the population is Normal: Any sample size works.
- If population shape is unknown:
 - $n \ge 30$: The Central Limit Theorem (CLT) tells us the sampling distribution of $\bar{x}$ is approximately Normal.
 - $n < 30$ : You must plot the sample data (dot plot or box plot). If there is no strong skewness and no outliers, the t-procedure is robust enough to use.
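The Normal/Large decision above can be sketched as a rough screening function. This is a minimal sketch under stated assumptions: `check_conditions` is a hypothetical helper, the Random condition is left to the study design (it cannot be read off the numbers), and the common 1.5 × IQR outlier rule stands in for looking at a graph. It cannot detect strong skewness, so you should still plot the data.

```python
import statistics

def check_conditions(sample, population_size, population_normal=False):
    """Rough screen of the 10% and Normal/Large conditions for a one-sample t-procedure.

    Random cannot be checked from the numbers; it depends on how the data were collected.
    """
    n = len(sample)
    ten_percent_ok = n < 0.10 * population_size
    if population_normal or n >= 30:
        normal_ok = True  # Normal population, or the CLT covers a large sample
    else:
        # Assumption: flag outliers with the 1.5 * IQR rule; skewness still needs a graph.
        q1, _, q3 = statistics.quantiles(sample, n=4)
        iqr = q3 - q1
        normal_ok = all(q1 - 1.5 * iqr <= x <= q3 + 1.5 * iqr for x in sample)
    return {"10%": ten_percent_ok, "Normal/Large": normal_ok}

# Hypothetical small sample with one extreme value, from a population of 1,000.
print(check_conditions([10, 11, 12, 11, 10, 12, 11, 40], population_size=1000))
# {'10%': True, 'Normal/Large': False}
```

`statistics.quantiles` uses its default "exclusive" method, so its quartiles may differ slightly from a graphing calculator's.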

2. Standard Error of the Mean

When we estimate the standard deviation of the sampling distribution of $\bar{x}$ using $s$, we call the estimate the Standard Error (SE).

$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$
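Computing SE from raw data takes two steps: find $s$ (with the $n - 1$ divisor), then divide by $\sqrt{n}$. A minimal Python sketch; `standard_error` is our own name and the data are made up for illustration.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the sample mean: s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return s / math.sqrt(len(sample))

print(round(standard_error([1, 2, 3, 4, 5]), 4))  # 0.7071
```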

3. One-Sample t-Interval (Confidence Interval)

A confidence interval estimates the true population mean $\mu$.

$\text{Point Estimate} \pm (\text{Critical Value}) \cdot (\text{Standard Error})$

$\bar{x} \pm t^*_{df} \cdot \frac{s}{\sqrt{n}}$

$t^*$ depends on the confidence level (e.g., 95%) and the degrees of freedom ($df = n-1$).

➥ Example 7.1: Gas Mileage

A random sample of 10 cars of a new model shows a mean gas mileage of 27.2 mpg with a standard deviation of 1.8 mpg. The population is approximately normally distributed. Construct a 95% confidence interval.

Solution: (PANIC Method)

P (Parameter): Let $\mu$ be the true mean gas mileage of all cars of this new model.
A (Assumptions):
- Random: Stated in problem.
- 10%: Assume 10 cars < 10% of all cars of this model.
- Normal: Population stated to be approximately normal.
N (Name): One-Sample t-Interval.
I (Interval):
- $df = 10 - 1 = 9$. For 95% confidence, inverse-t gives $t^* \approx 2.262$.
- $CI = 27.2 \pm 2.262(\frac{1.8}{\sqrt{10}}) \approx 27.2 \pm 1.29$
- Result: (25.91, 28.49)
C (Conclusion): We are 95% confident that the interval from 25.91 to 28.49 mpg captures the true mean gas mileage of this new car model.
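The calculator's inverse-t step can be imitated without a calculator: integrate the t density numerically to get the CDF, then search for the value that leaves the right area in the upper tail. This is a sketch under our own choices (Simpson's rule with 2,000 strips, bisection on [0, 50]); the function names are hypothetical, and the result should agree with the table value $t^* \approx 2.262$ for df = 9.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about 0 (steps must be even)."""
    h = abs(x) / steps
    total = t_pdf(0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_star(confidence, df):
    """Critical value leaving (1 - confidence) / 2 in the upper tail, found by bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def one_sample_t_interval(x_bar, s, n, confidence):
    """x-bar +/- t* * s / sqrt(n), with df = n - 1."""
    margin = t_star(confidence, n - 1) * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 7.1: n = 10 cars, x-bar = 27.2 mpg, s = 1.8 mpg, 95% confidence
low, high = one_sample_t_interval(27.2, 1.8, 10, 0.95)
print(f"({low:.2f}, {high:.2f})")  # (25.91, 28.49)
```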

4. One-Sample t-Test (Significance Test)

Used to test a claim about a population mean.

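Null Hypothesis: $H_0: \mu = \mu_0$
Alternative Hypothesis: $H_a: \mu > \mu_0$, $H_a: \mu < \mu_0$, or $H_a: \mu \neq \mu_0$
Test Statistic:
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with $df = n - 1$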

➥ Example 7.2: AC Usage

A manufacturer claims their unit uses 6.5 kWh per day. A consumer agency suspects it uses more. A random sample of 50 units has a mean of 7.0 kWh per day and $s = 1.4$ kWh per day.

Solution: (PHANTOMS Method)

P: $\mu$ = true mean daily electricity usage of the new AC units.
H: $H_0: \mu = 6.5$ vs. $H_a: \mu > 6.5$
A: Random sample? Yes. 10% condition? Assume 50 < 10% of all units produced. Normal/Large? $n = 50 \ge 30$, so the CLT applies.
N: One-Sample t-Test.
T: $t = \frac{7.0 - 6.5}{1.4/\sqrt{50}} \approx 2.525$
O (Obtain P-value): Using $df = 49$, $P(t > 2.525) \approx 0.0074$.
M (Make Decision): Since $0.0074 < \alpha = 0.05$, we reject $H_0$.
S (State Conclusion): There is convincing evidence that the true mean daily electricity usage of these AC units is greater than 6.5 kWh.
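The T and O steps can be checked numerically: compute the test statistic, then take the upper-tail area of the t-distribution with df = 49 by integrating its density with Simpson's rule. This is a sketch under our own assumptions (the `t_pdf`/`t_cdf` helpers and the integration method are our choices, not a calculator's algorithm), with $\alpha = 0.05$ as in the M step.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    h = abs(x) / steps
    total = t_pdf(0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Example 7.2: H0: mu = 6.5 vs. Ha: mu > 6.5, n = 50, x-bar = 7.0, s = 1.4
mu_0, x_bar, s, n = 6.5, 7.0, 1.4, 50
t = (x_bar - mu_0) / (s / math.sqrt(n))
p_value = 1 - t_cdf(t, n - 1)  # upper tail, because Ha is ">"
alpha = 0.05
print(f"t = {t:.3f}, P-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```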

Inference for Paired Data

Distinguishing Paired Data from Two Sample Data

Crucial Concept: Do not confuse Paired Data with Two Independent Samples.

Paired Data (Dependent): Data come in pairs. Examples: pre-test and post-test scores for the same student, measurements on twins, or right arm vs. left arm. You are interested in the mean of the differences ($\mu_{diff}$).
Two Independent Samples: The two groups have no relationship. Example: The mean height of men vs. the mean height of women.

The Paired t-Procedure

For paired data, we calculate the difference for each pair first ($x_{diff} = x_2 - x_1$). Then, we perform a one-sample t-test or t-interval on that single list of differences.
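Because the paired procedure is just a one-sample procedure on the differences, the only new step is building the list of differences. A minimal Python sketch; `paired_differences` is a hypothetical helper and the scores are invented for illustration, not real data.

```python
import math
import statistics

def paired_differences(first, second):
    """Differences second - first, one per pair (e.g. After - Before)."""
    if len(first) != len(second):
        raise ValueError("each subject needs exactly one value in each list")
    return [b - a for a, b in zip(first, second)]

# Hypothetical before/after scores for five students (illustration only).
before = [70, 65, 80, 75, 60]
after = [74, 66, 85, 79, 63]

diffs = paired_differences(before, after)   # a single list of differences
mean_d = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
t = mean_d / (s_d / math.sqrt(len(diffs)))  # one-sample t for H0: mu_diff = 0, df = n - 1
print(diffs, mean_d, round(s_d, 3), round(t, 3))
```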

➥ Example 7.5: SAT Scores (Paired)

30 randomly selected students take an SAT prep class. We have each student's score before and after the class, and we want to estimate the mean improvement with 90% confidence.

Common Mistake: Calculating the mean of the "Before" group and the mean of the "After" group and treating them as two samples. Wrong! These are the same students.

Correct Method:

Calculate $diff = \text{After} - \text{Before}$ for every student.
Find $\bar{x}_{diff} = 42.25$ and $s_{diff} = 27.92$ (given).
Procedure: Paired t-Interval (one-sample t-interval on the differences), $df = 30 - 1 = 29$.
Calc: $42.25 \pm t^*_{29} \cdot \frac{27.92}{\sqrt{30}}$, where $t^*_{29} \approx 1.699$ for 90% confidence.
Result: (33.59, 50.91).
Conclusion: We are 90% confident the true mean improvement is between 33.59 and 50.91 points.
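The Calc step can be reproduced directly from the given summary statistics of the differences. This sketch assumes 90% confidence (the level used in the conclusion) and the t-table value $t^*_{29} \approx 1.699$.

```python
import math

# Example 7.5: summary statistics of the differences (After - Before)
x_bar_diff, s_diff, n = 42.25, 27.92, 30
t_star = 1.699  # 90% confidence, df = 29, from a t-table

standard_error = s_diff / math.sqrt(n)
margin = t_star * standard_error
low, high = x_bar_diff - margin, x_bar_diff + margin
print(f"({low:.2f}, {high:.2f})")  # (33.59, 50.91)
```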

Logic Flowchart: Deciding between One-Sample, Paired, and Two-Sample tests

Inference for the Difference of Two Means

When we have two independent groups, we compare their population means ($\mu_1 - \mu_2$).

1. Conditions

Random: Two independent random samples OR randomized assignment in an experiment.
10%: Check it for each sample separately (needed when sampling without replacement from populations, not for randomized experiments).
Normal/Large: Both $n_1 \ge 30$ and $n_2 \ge 30$, OR both populations are Normal, OR graphs of both samples show no strong skewness or outliers.

2. Formulas for Two Independent Samples

Standard Error:
$SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Confidence Interval:
$(\bar{x}_1 - \bar{x}_2) \pm t^* \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Two-Sample t-Statistic:
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{\text{hypothesized}}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
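The three formulas translate directly into code. A minimal Python sketch (the function names are our own); it deliberately leaves $t^*$ as an input, since choosing the degrees of freedom is a separate decision.

```python
import math

def two_sample_se(s1, n1, s2, n2):
    """Unpooled standard error of x-bar_1 - x-bar_2."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def two_sample_t(x_bar1, s1, n1, x_bar2, s2, n2, hypothesized_diff=0.0):
    """t statistic for H0: mu_1 - mu_2 = hypothesized_diff (unpooled)."""
    return ((x_bar1 - x_bar2) - hypothesized_diff) / two_sample_se(s1, n1, s2, n2)

def two_sample_interval(x_bar1, s1, n1, x_bar2, s2, n2, t_star):
    """(x-bar_1 - x-bar_2) +/- t* * SE; t* must come from a table or calculator."""
    margin = t_star * two_sample_se(s1, n1, s2, n2)
    diff = x_bar1 - x_bar2
    return diff - margin, diff + margin

# Made-up inputs chosen so each variance term equals 1: SE = sqrt(1 + 1)
print(round(two_sample_se(3, 9, 4, 16), 3))  # 1.414
```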

3. Degrees of Freedom (The Messy Part)

There are two ways to calculate df for two samples:

Calculator Method (Satterthwaite approximation): A complex formula that usually yields a non-integer df. Use this for the most accurate results on the AP exam.
Conservative Method: $df = \min(n_1 - 1, n_2 - 1)$. This is easier for hand calculations but gives a wider interval (and a less powerful test).
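Both df methods fit in a few lines. The sketch below uses the standard Welch-Satterthwaite formula, $df = \frac{(v_1 + v_2)^2}{v_1^2/(n_1 - 1) + v_2^2/(n_2 - 1)}$ with $v_i = s_i^2/n_i$; the function names are our own, and the inputs are the Company A / Company B numbers from Example 7.4.

```python
def satterthwaite_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom (unpooled)."""
    v1 = s1**2 / n1  # squared standard error of x-bar_1
    v2 = s2**2 / n2  # squared standard error of x-bar_2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def conservative_df(n1, n2):
    """Smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

# Example 7.4 summary statistics (Company A: s = 37, n = 40; Company B: s = 43, n = 35)
print(round(satterthwaite_df(37, 40, 43, 35), 1), conservative_df(40, 35))  # 67.6 34
```

The Satterthwaite value always falls between the conservative df and $n_1 + n_2 - 2$, which is why the conservative method gives a slightly wider interval.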

Note: Do NOT Pool. In AP Statistics, we generally do not pool variances (pooling assumes $\sigma_1 = \sigma_2$) unless there is a strong reason to believe the population standard deviations are equal. Use unpooled procedures for calculations.

➥ Example 7.4: Computer Downtime

Company A: $n=40$ , $\bar{x}=125$ , $s=37$ .
Company B: $n=35$ , $\bar{x}=115$ , $s=43$ .
Claim to test: Company A has more downtime ($\mu_A > \mu_B$).

Analysis:

$H_0: \mu_A - \mu_B = 0$
$H_a: \mu_A - \mu_B > 0$
Using a calculator (2-SampTTest, unpooled), we get $t \approx 1.07$, $df \approx 67.6$, and $P\text{-value} \approx 0.14$.
Conclusion: Since $0.14 > \alpha = 0.05$, we fail to reject $H_0$. We do not have convincing evidence that Company A's true mean downtime is greater than Company B's.
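The calculator output for this example can be reproduced from the summary statistics: unpooled SE, the two-sample t statistic, Satterthwaite df, and the upper-tail area of the t-distribution. A sketch under our own assumptions: the helpers are hypothetical and integrate the t density with Simpson's rule, which is not necessarily how a calculator computes the P-value.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution (df may be a decimal)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    h = abs(x) / steps
    total = t_pdf(0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Example 7.4: Company A vs. Company B, Ha: mu_A - mu_B > 0
x_a, s_a, n_a = 125, 37, 40
x_b, s_b, n_b = 115, 43, 35

v_a, v_b = s_a**2 / n_a, s_b**2 / n_b  # squared standard errors
se = math.sqrt(v_a + v_b)
t = (x_a - x_b) / se
df = (v_a + v_b) ** 2 / (v_a**2 / (n_a - 1) + v_b**2 / (n_b - 1))
p_value = 1 - t_cdf(t, df)  # upper tail, because Ha is ">"
print(f"t = {t:.2f}, df = {df:.1f}, P-value = {p_value:.2f}")
```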

Connecting Confidence Intervals and Test Conclusions

Confidence Intervals (CI) and Significance Tests are mathematically consistent.

Two-Sided Test ($\alpha$): If the Null value lies outside the corresponding $1 - \alpha$ Confidence Interval, you generally Reject $H_0$.
One-Sided Test: The connection is looser: a one-sided test at level $\alpha$ matches a $1 - 2\alpha$ confidence interval (for example, $\alpha = 0.05$ with a 90% interval). If the CI is entirely above the Null value, it supports a "GREATER THAN" alternative.

➥ Example 7.8: Consistency Check

Suppose we test $H_0: \mu_{diff} = 0$ vs. $H_a: \mu_{diff} \neq 0$ at $\alpha = 0.05$.

If the 95% Confidence Interval is (-3.4, 23.4):
- Since 0 is inside the interval, it is a plausible value.
- We would Fail to Reject $H_0$.
If the 95% Confidence Interval is (0.5, 19.5):
- Since 0 is NOT in the interval, 0 is not plausible.
- We would Reject $H_0$.
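The decision rule in this example fits in a few lines. A minimal sketch: `two_sided_decision` is our own name, and it assumes the interval and the test use matching levels (a 95% interval with $\alpha = 0.05$).

```python
def two_sided_decision(interval, null_value):
    """Decision for a two-sided test at alpha = 1 - C, using the C-level confidence interval."""
    low, high = interval
    if low <= null_value <= high:
        return "fail to reject H0"  # null value is plausible
    return "reject H0"              # null value is not plausible

# The two intervals from Example 7.8, with H0: mu_diff = 0
print(two_sided_decision((-3.4, 23.4), 0))  # fail to reject H0
print(two_sided_decision((0.5, 19.5), 0))   # reject H0
```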

Power, Type I, and Type II Errors

Although these concepts were introduced earlier, they frequently appear with means on Free Response Questions.

Type I Error ($\alpha$): Rejecting $H_0$ when $H_0$ is actually true (False Positive).
Type II Error ($\beta$): Failing to reject $H_0$ when $H_0$ is false (False Negative).
Power ($1 - \beta$): The probability of correctly rejecting a false null hypothesis.

Visualizing Type I Error, Type II Error, and Power in hypothesis testing

Key Relationships:

Increase $n$ (Sample Size): Power increases. (It's easier to detect a difference with more data).
Increase $\alpha$ (Significance Level): Power increases, but Type I error risk increases.
Effect Size: If the true mean is very far from the null mean, Power increases.
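To see the sample-size and effect-size relationships in action, here is a small Monte Carlo sketch (our own illustration, with hypothetical settings): it assumes a Normal population with $\sigma = 1$, tests $H_0: \mu = 0$ against $H_a: \mu > 0$ at $\alpha = 0.05$, and counts how often the one-sample t-test rejects. The critical values 1.833 (df = 9) and 1.685 (df = 39) are standard t-table entries.

```python
import math
import random

# One-sided (alpha = 0.05) critical values t* from a t-table, keyed by sample size n (df = n - 1).
T_CRIT = {10: 1.833, 40: 1.685}

def estimated_power(n, true_mean, mu_0=0.0, sigma=1.0, reps=2000, seed=1):
    """Share of simulated samples in which the t-test of H0: mu = mu_0 vs. Ha: mu > mu_0
    rejects H0, when the population is Normal(true_mean, sigma)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        if (x_bar - mu_0) / (s / math.sqrt(n)) > T_CRIT[n]:
            rejections += 1
    return rejections / reps

for n in (10, 40):
    print(f"n = {n}: estimated power = {estimated_power(n, true_mean=0.5):.2f}")
```

Setting `true_mean` equal to `mu_0` makes every rejection a Type I error, so that estimate should land near $\alpha = 0.05$.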

Simulation and P-Values

Sometimes, instead of a formula, we use simulation to estimate a P-value.

Logic:

Assume $H_0$ is true.
Simulate the sampling process 100 or 1000 times.
Count how often a result at least as extreme as your observed sample occurs.
Estimated P-value = (Count of extreme simulations) / (Total simulations).
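The four steps map onto a short loop. A minimal sketch, assuming a one-sided test where "extreme" means "at least as large"; `simulated_p_value` and the Normal(0, 1) null model below are hypothetical choices for illustration, and in practice you would replace the simulator with whatever process $H_0$ describes.

```python
import random

def simulated_p_value(observed, simulate_statistic, trials=1000, seed=0):
    """Estimated one-sided P-value: the share of statistics simulated under H0
    that are at least as large as the observed statistic."""
    rng = random.Random(seed)
    extreme = sum(1 for _ in range(trials) if simulate_statistic(rng) >= observed)
    return extreme / trials

def mean_of_20_under_h0(rng):
    # Hypothetical null model: mean of 20 observations from a Normal(0, 1) population.
    return sum(rng.gauss(0, 1) for _ in range(20)) / 20

p = simulated_p_value(observed=0.5, simulate_statistic=mean_of_20_under_h0)
print(p)
```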

➥ Example 7.6 Review (MAD)

If a simulation of 100 trials shows that a MAD value of 0.06 or greater occurred only 3 times, the estimated P-value is 3/100 = 0.03. Since 0.03 < 0.05, the result is statistically significant at the 5% level.

Common Mistakes & Pitfalls

Confusing Paired vs. Two-Sample:
- Tip: Look for the data source. Are there two separate groups of people (2-Sample)? Or is it one person measured twice (Paired)?
- Error: Using a 2-Sample t-test on paired data ignores the pairing, which usually inflates the standard error and substantially reduces the Power.
Using z instead of t:
- Rule: If you use $s$ (sample SD) to estimate $\sigma$, you MUST use t.
- Correction: Only use z if the problem explicitly says "Population Standard Deviation is known" (rare).
Interpreting the CI Incorrectly:
- Wrong: "There is a 95% probability that the sample mean is in the interval."
- Wrong: "95% of the data values are in the interval."
- Correct: "We are 95% confident that the interval captures the true population mean."
Misidentifying Degrees of Freedom:
- Remember $df = n-1$ for one sample. Don't use $n$.
- For two samples, write down the calculator's df value.
Not Checking Conditions:
- In FRQs, you get credit for identifying the procedure, but you lose substantial points if you don't explicitly check Random, 10%, and Normal conditions in context.