Unit 8: Inference for Categorical Data: Chi-Square
Categorical Data and the Logic of Chi-Square Inference
What “categorical inference” is (and why it looks different from means)
When you work with categorical variables, each observation falls into a label or group (for example: political party, brand chosen, color, yes/no). Unlike quantitative data, you do not compute averages of categories. Instead, you summarize categorical data with counts and proportions in each category.
In this unit, the goal is to use sample data to decide whether an observed categorical pattern is plausibly explained by chance variation, or whether it provides evidence against a proposed model. A perfect fit cannot be expected even if a model is correct, so the focus is on discrepancies: are the differences between what you observed and what you would expect small enough to attribute to random variation, or large enough to be statistically significant?
The “expected vs observed” mindset
All chi-square tests start with a null hypothesis that acts like a long-run “blueprint” for how counts should fall into categories.
- In a goodness-of-fit setting, the claim specifies the long-run proportions for a single categorical variable.
- In a homogeneity setting, the claim says multiple populations (or treatments) have the same distribution of a categorical variable.
- In an independence setting, the claim says two categorical variables are not associated in a single population.
In every case, you compute expected counts under the null hypothesis, compare them to observed counts, and aggregate the discrepancies.
What chi-square tests can and cannot tell you
Chi-square tests can tell you whether the data provide evidence that:
- a proposed categorical distribution does not fit (goodness-of-fit)
- group distributions differ (homogeneity)
- two categorical variables are associated (independence)
They do not automatically tell you:
- which specific categories (or cells) drove the difference (you need follow-up analysis)
- whether the result is important in practice (you need context and effect-size thinking)
- causation, unless the data come from an experiment with random assignment
A common misconception is “a significant chi-square means two variables are strongly related.” Significance only says the pattern is unlikely under the null. With very large samples, even tiny, unimportant differences can be significant.
Exam Focus
- Typical question patterns:
- Identify the correct chi-square procedure from a scenario (one-way vs two-way table; independence vs homogeneity).
- Interpret a description like “compare observed to expected counts” and match it to the right test.
- Explain why you cannot use a chi-square test on percentages alone (you need counts).
- Common mistakes:
- Treating categorical inference like a mean/proportion z or t test.
- Claiming “proves” instead of “provides evidence,” or claiming causation without random assignment.
- Ignoring that chi-square methods are based on counts and expected counts, not raw percentages.
The Chi-Square Distribution, Test Statistic, and Degrees of Freedom
What the chi-square distribution represents
A chi-square distribution is a family of right-skewed distributions used to model sums of squared standardized deviations. In categorical inference, if the null hypothesis is true and conditions are met, the chi-square test statistic follows (approximately) a chi-square distribution.
Key shape facts that often show up in explanations:
- A chi-square distribution has only nonnegative values.
- It is not symmetric; it is always skewed to the right.
- Different chi-square distributions have different shapes depending on the degrees of freedom (df).
- As df increases, the distribution becomes less skewed and looks closer to a normal distribution.
If you are using a critical-value approach (rather than only a p-value), you need this distribution to decide how large a computed χ² statistic must be to be “significant.”
The chi-square test statistic: how it measures mismatch
The chi-square statistic sums the weighted discrepancies between observed and expected counts over every category or cell. For a one-way table or a two-way table:
\chi^2 = \sum \frac{(O - E)^2}{E}
Where:
- O is an observed count in a category (or cell)
- E is the expected count for that category (or cell) assuming the null is true
- the sum is over all categories/cells
Why this specific structure works: O-E measures the deviation, squaring prevents cancellation and penalizes large deviations, and dividing by E scales the deviation relative to how big you expected the count to be.
Important interpretations:
- \chi^2 is always nonnegative.
- The smaller the \chi^2 value, the better the fit to the null model.
- Large \chi^2 values indicate the observed counts are far from expected counts relative to what the null predicts.
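As a minimal sketch (the counts and proportions here are illustrative, not from any example in this unit), the statistic is a one-line fold over observed/expected pairs:

```python
def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all categories or cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Claimed proportions (0.2, 0.5, 0.3) with n = 100 give expected counts 20, 50, 30.
stat = chi_square_stat([18, 55, 27], [20, 50, 30])
# 0.2 + 0.5 + 0.3 = 1.0: a small value, meaning a close fit to the null model
```

Note how each squared deviation is scaled by its own expected count, so a miss of 5 matters more in a small-expected-count category than in a large one.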
Degrees of freedom: how many “independent deviations” are possible
Degrees of freedom describe how many counts can vary freely once the null model and totals are fixed.
- Goodness-of-fit (one-way table with k categories):
df = k - 1
- Two-way table with r rows and c columns:
df = (r - 1)(c - 1)
P-values with chi-square: why tests are right-tailed
The p-value is the probability of obtaining a \chi^2 value as extreme as (or more extreme than) the one computed, assuming the null hypothesis is true. “Extreme” for chi-square means “large,” so chi-square tests are right-tailed.
A very small \chi^2 just means the data match the null unusually well; that is not evidence against the null.
Chi-square approximation: why expected counts matter
The chi-square distribution is an approximation that improves as expected counts get larger. When expected counts are tiny, the true sampling distribution of \chi^2 under the null is not well-approximated by a chi-square curve, making p-values unreliable. This is why every chi-square procedure emphasizes checking expected counts.
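One way to see why the chi-square curve is the right reference distribution is simulation: if a null model with k categories is really true, repeated samples produce χ² values whose distribution looks chi-square with df = k - 1. A stdlib-only Python sketch (the proportions are hypothetical):

```python
import random

def simulate_null_chi2(probs, n, sims, seed=0):
    """Draw `sims` samples of size n from the claimed distribution `probs`
    and return the chi-square statistic for each sample."""
    rng = random.Random(seed)
    k = len(probs)
    expected = [n * p for p in probs]
    stats = []
    for _ in range(sims):
        counts = [0] * k
        for cat in rng.choices(range(k), weights=probs, k=n):
            counts[cat] += 1
        stats.append(sum((o - e) ** 2 / e for o, e in zip(counts, expected)))
    return stats

stats = simulate_null_chi2([0.2, 0.5, 0.3], n=100, sims=2000)
# The average simulated statistic lands near df = k - 1 = 2, and a histogram
# of `stats` is right-skewed with only nonnegative values: the chi-square
# shape facts listed above, produced by the null model itself.
```

With tiny expected counts the simulated distribution becomes lumpy and drifts away from the smooth chi-square curve, which is exactly why the expected-count condition exists.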
Exam Focus
- Typical question patterns:
- Compute df for a given table, then interpret what it means.
- Given a test statistic and df, find or interpret a p-value (from output or a chi-square table).
- Explain why chi-square tests are right-tailed.
- Common mistakes:
- Using df = k instead of k - 1 for goodness-of-fit.
- Using df = rc - 1 instead of (r - 1)(c - 1) for two-way tables.
- Forgetting that the p-value is the right-tail probability.
Chi-Square Goodness-of-Fit Test (One-Way Table)
What a goodness-of-fit test asks
A chi-square goodness-of-fit test checks whether a single categorical variable in a population follows a claimed distribution. You use it when:
- there is one categorical variable
- there is a claimed set of proportions for the categories
- you collect a sample and count how many observations fall in each category
This matches the idea that the null hypothesis represents a “good fit” to a theoretical or claimed distribution, while the alternative says the fit is poor (at least one proportion differs).
Hypotheses: how to write them clearly
In words:
- Null hypothesis: the population distribution matches the claimed proportions.
- Alternative hypothesis: the population distribution does not match the claimed proportions.
If there are k categories with claimed proportions p_1, p_2, \dots, p_k, you can write:
H_0: p_1=\text{claimed}_1,\ p_2=\text{claimed}_2,\ \dots,\ p_k=\text{claimed}_k
H_a: \text{At least one } p_i \text{ differs from its claim}
AP scoring typically rewards hypotheses written in context (naming the variable and population), not just symbols.
Expected counts in goodness-of-fit
If the null is true and the total sample size is n, then the expected count in category i is:
E_i = n p_i
Conditions (what must be true to trust the p-value)
You should justify:
- Random: data come from a random sample (or a randomized experiment with a single categorical response).
- Independence: observations are independent (often supported by random sampling and the 10% condition when sampling without replacement).
- Large expected counts: all expected counts are sufficiently large for the chi-square approximation (a common AP rule of thumb is at least 5 in each category).
If some expected counts are too small, you can sometimes combine categories (pooling) if it makes sense, and then recompute hypotheses (as needed) and df.
Carrying out the test: the full process
- Compute expected counts E_i = n p_i.
- Compute the test statistic:
\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}
- Use df = k - 1.
- Find the p-value as the right-tail area under the chi-square distribution with that df.
- Conclude in context by linking your decision to the p-value.
Worked example: candy color distribution
A company claims a bag of candy has these color proportions:
- Red: 0.20
- Blue: 0.25
- Green: 0.15
- Yellow: 0.20
- Orange: 0.20
A student randomly selects a bag and counts n = 200 candies, getting:
- Red 50
- Blue 40
- Green 20
- Yellow 45
- Orange 45
Step 1: State hypotheses
H_0: \text{The true color distribution matches } (0.20, 0.25, 0.15, 0.20, 0.20)
H_a: \text{The true color distribution does not match the claim}
Step 2: Expected counts
E_{red} = 200(0.20) = 40
E_{blue} = 200(0.25) = 50
E_{green} = 200(0.15) = 30
E_{yellow} = 200(0.20) = 40
E_{orange} = 200(0.20) = 40
All expected counts are at least 5.
Step 3: Compute \chi^2
- Red contribution: 2.5
- Blue contribution: 2.0
- Green contribution: 3.333
- Yellow contribution: 0.625
- Orange contribution: 0.625
\chi^2 \approx 9.083
Step 4: Degrees of freedom
df = 5 - 1 = 4
Step 5: P-value and conclusion
Use technology or a chi-square table to find the right-tail probability for \chi^2 = 9.083 with df = 4. If the p-value is below your significance level (like 0.05), reject H_0 and conclude the color distribution differs from the company claim.
Interpretation tip: this test does not, by itself, tell you exactly which color(s) are “wrong,” but you can see which categories contribute most to \chi^2 (here, green and red are large contributors).
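The five steps can be reproduced end to end in a stdlib-only Python sketch. The `chi2_sf` helper is a hand-rolled stand-in for a calculator's right-tail χ²cdf, built from the standard recurrence for the chi-square tail at integer df:

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(chi-square with df degrees of freedom > x), integer df,
    via the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1)."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                 # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))      # Q(1/2, y)
    while a < df / 2.0 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

claimed = [0.20, 0.25, 0.15, 0.20, 0.20]         # red, blue, green, yellow, orange
observed = [50, 40, 20, 45, 45]
n = sum(observed)                                # 200
expected = [n * p for p in claimed]              # [40, 50, 30, 40, 40]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(claimed) - 1                            # 4
p = chi2_sf(chi2, df)
# chi2 is about 9.083 and p is about 0.059, so at alpha = 0.05 this sample
# does not quite provide convincing evidence against the company's claim.
```

The sorted per-category contributions confirm the interpretation tip: green (3.333) and red (2.5) dominate the statistic.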
Example 8.1: liquor stores across city regions
A large city is divided into four distinct socioeconomic regions. The area percentages of the regions are 12%, 38%, 32%, and 18%. In a random sample of 55 liquor stores, the numbers from each region are 4, 16, 26, and 9.
1) Expected counts if liquor stores followed the same proportions as region areas
E_1 = 55(0.12) = 6.6
E_2 = 55(0.38) = 20.9
E_3 = 55(0.32) = 17.6
E_4 = 55(0.18) = 9.9
2) Test whether the observed counts differ significantly from the expected counts
- Hypotheses (in context):
H_0: \text{Liquor stores are distributed across the four regions in proportions } (0.12,0.38,0.32,0.18)
H_a: \text{Liquor stores are not distributed across the four regions in those proportions}
Procedure: chi-square goodness-of-fit test.
Checks:
- Randomization: the sample is stated to be random.
- Large expected counts: 6.6, 20.9, 17.6, and 9.9 are all greater than 5.
- Independence (10% condition): assume 55 is less than 10% of all liquor stores in the city.
Compute the chi-square statistic (showing the idea through contributions):
- Region 1 contribution: approximately 1.024
- Region 2 contribution: approximately 1.148
- Region 3 contribution: approximately 4.009
- Region 4 contribution: approximately 0.082
\chi^2 \approx 6.264
Degrees of freedom:
df = 4 - 1 = 3
P-value (right-tail):
- P = P(\chi^2 > 6.264) = 0.099
- On a TI-84-style calculator, one way to compute is with chi2cdf(6.264, 1000, 3) = 0.099.
- Using a built-in χ² GOF-Test by entering observed and expected lists can also return approximately \chi^2 = 6.262 and P = 0.099.
Conclusion (linked to the p-value): because 0.099 > 0.05, fail to reject H_0. There is not convincing evidence that liquor stores are distributed across the four regions in proportions different from the region area proportions.
What can go wrong in goodness-of-fit
- Using observed proportions instead of counts (you can convert proportions back to counts if you know n).
- Computing expected counts from sample proportions instead of the claimed proportions.
- Forgetting to check expected counts before trusting the chi-square approximation.
Exam Focus
- Typical question patterns:
- Given a claimed distribution and sample counts, compute expected counts and \chi^2.
- Interpret computer output for a goodness-of-fit test (identify df and what the p-value means).
- Decide whether categories should be combined due to small expected counts.
- Common mistakes:
- Using df = k rather than k - 1.
- Computing expected counts from the sample proportions instead of the claimed proportions.
- Concluding which category caused significance without any follow-up reasoning (or claiming a single category “proved” different).
Chi-Square Test for Homogeneity (Comparing Distributions Across Groups)
What a homogeneity test asks
A chi-square test for homogeneity compares the distribution of a categorical variable across two or more populations or treatments. You use it when:
- there is one categorical response variable
- there are multiple groups (different populations or different treatments)
- you want to know whether the category proportions are the same across groups
Typical data collection is separate random samples from each population, or random assignment to treatments followed by comparing the response distributions.
Homogeneity vs independence: why they look the same but mean different things
A homogeneity test and an independence test use the same mechanics on a two-way table. The difference is the design and the interpretation:
- Homogeneity: separate samples from different populations or groups; compare distributions.
- Independence: one sample from one population; assess association between two variables.
On AP questions, identifying which test to use often depends entirely on the wording about how the data were collected.
Hypotheses for homogeneity
- Null hypothesis: the distribution of the categorical variable is the same in all groups.
- Alternative hypothesis: at least one group has a different distribution.
Expected counts in a two-way table (used for homogeneity and independence)
For a table with row totals, column totals, and grand total n:
E_{ij} = \frac{(\text{row total}_i)(\text{column total}_j)}{n}
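In code, this formula is a short computation over the margins. As a sketch, using the 9th/12th-grade lunch table that appears in the worked example in this section:

```python
def expected_counts(table):
    """Expected count for each cell: (row total)(column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

lunch = [[30, 10, 20],   # 9th graders:  pizza, salad, sandwich
         [20, 25, 15]]   # 12th graders: pizza, salad, sandwich
exp = expected_counts(lunch)
# [[25.0, 17.5, 17.5], [25.0, 17.5, 17.5]] -- every expected count is at least 5
```

Because both row totals are 60 here, the two expected rows are identical; with unequal group sizes the expected rows differ but each still follows the same marginal column proportions.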
Conditions
You should justify:
- Random samples from each population (or random assignment to treatments)
- Independence of observations within each group (and often the 10% condition if sampling without replacement)
- Expected counts sufficiently large (commonly at least 5 in every cell)
A common mistake is to check observed counts; the condition is about expected counts.
Worked example: comparing distributions across two groups
A school wants to know whether preferred lunch option differs between 9th graders and 12th graders. They take separate random samples.
Observed counts:
| Grade | Pizza | Salad | Sandwich | Total |
|---|---|---|---|---|
| 9th | 30 | 10 | 20 | 60 |
| 12th | 20 | 25 | 15 | 60 |
| Total | 50 | 35 | 35 | 120 |
Hypotheses:
H_0: \text{The distribution of lunch preference is the same for 9th and 12th graders}
H_a: \text{The distribution of lunch preference differs for 9th and 12th graders}
Example expected counts (9th-Pizza):
E = \frac{60 \cdot 50}{120} = 25
Compute \chi^2 by summing all cell contributions; here:
\chi^2 \approx 9.142
Degrees of freedom:
df = (2-1)(3-1) = 2
A small right-tail p-value would provide evidence that lunch preferences are not distributed the same across grades.
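The full computation for this table fits in a short sketch; conveniently, for df = 2 the chi-square right-tail area has the exact closed form e^(-χ²/2), so no table lookup is needed:

```python
import math

observed = [[30, 10, 20],    # 9th graders:  pizza, salad, sandwich
            [20, 25, 15]]    # 12th graders: pizza, salad, sandwich

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(3))

df = (2 - 1) * (3 - 1)       # 2
p = math.exp(-chi2 / 2)      # exact right-tail area when df = 2
# chi2 is about 9.143 and p is about 0.010, so at alpha = 0.05 there is
# convincing evidence that lunch preference distributions differ by grade.
```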
Example 8.3: job satisfaction across employee groups
AP Statistics students take independent simple random samples of school employees in a large city (this can be viewed as stratified sampling by job category):
- 100 teachers, 82 satisfied
- 60 administrators, 38 satisfied
- 45 custodians, 34 satisfied
- 55 secretaries, 36 satisfied
A natural two-way table is “Job category” by “Satisfied/Not satisfied.” Observed counts:
| Job category | Satisfied | Not satisfied | Total |
|---|---|---|---|
| Teachers | 82 | 18 | 100 |
| Administrators | 38 | 22 | 60 |
| Custodians | 34 | 11 | 45 |
| Secretaries | 36 | 19 | 55 |
| Total | 190 | 70 | 260 |
Hypotheses:
H_0: \text{The true proportion satisfied is the same across all job categories}
H_a: \text{At least two job categories differ in the true proportion satisfied}
Procedure: chi-square test for homogeneity.
Checks:
- Independent SRSs from each group are stated.
- 10% condition is assumed to hold within each job category population.
- Expected counts are all greater than 5. For example, expected satisfied teachers:
E = \frac{100 \cdot 190}{260} \approx 73.077
Technology approach: entering observed counts into a matrix and running a χ²-Test returns approximately:
- \chi^2 = 8.707
- df = (4 - 1)(2 - 1) = 3
- P = 0.0335
Conclusion (linked to the p-value): because 0.0335 < 0.05, reject H_0. There is convincing evidence that the true proportion of employees satisfied with their jobs is not the same across all job categories.
Interpretation: what the result actually means
Rejecting H_0 means the distributions differ across groups; it does not automatically tell you exactly which groups differ without follow-up analysis (for example, comparing conditional proportions or residuals). Also, unless job categories were randomly assigned (they are not), you should not interpret the result as showing a causal effect.
Exam Focus
- Typical question patterns:
- Identify this as a homogeneity test based on “separate random samples from multiple groups.”
- Compute expected counts from margins and then compute \chi^2.
- Interpret the conclusion as a statement about distributions across populations.
- Common mistakes:
- Calling it an independence test when the design clearly uses multiple samples.
- Checking “all observed counts are at least 5” instead of expected counts.
- Writing hypotheses about individual cells instead of the overall distribution across groups.
Chi-Square Test for Independence (Association in a Two-Way Table)
What an independence test asks
A chi-square test for independence asks whether two categorical variables are associated in a single population. You use it when:
- you take one random sample from one population
- you record two categorical variables for each individual
- you want to know whether the variables are independent (not related)
Independence here means the distribution of one variable is the same across the categories of the other variable.
Independence as a probability statement
Independence can be expressed as:
- conditional distributions match overall distributions
- equivalently, events satisfy the probability rule
P(A \cap B) = P(A)P(B)
Hypotheses for independence
H_0: \text{The two categorical variables are independent in the population}
H_a: \text{The two categorical variables are associated in the population}
Expected counts are computed the same way as homogeneity
Under independence:
E_{ij} = \frac{(\text{row total}_i)(\text{column total}_j)}{n}
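This is the multiplication rule at work: under the null, estimate P(row i) and P(column j) by their marginal sample proportions and multiply, so that
E_{ij} = n \cdot \frac{\text{row total}_i}{n} \cdot \frac{\text{column total}_j}{n} = \frac{(\text{row total}_i)(\text{column total}_j)}{n}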
Conditions
You should justify:
- Random sample (or randomized experiment producing the table)
- Independence of observations (often supported by the sampling method and the 10% condition)
- Large expected counts (commonly at least 5 in every cell)
A frequent practical issue is sparse tables: lots of categories create many cells, and some expected counts drop below 5.
Worked example: association between screen time and sleep category
A random sample of 150 students reports daily screen time category and sleep category.
Observed counts:
| Screen time | Under 7 | 7 to 9 | Over 9 | Total |
|---|---|---|---|---|
| Low | 10 | 25 | 15 | 50 |
| Medium | 20 | 25 | 5 | 50 |
| High | 30 | 15 | 5 | 50 |
| Total | 60 | 65 | 25 | 150 |
Hypotheses:
H_0: \text{Screen time category and sleep category are independent among students in the population sampled}
H_a: \text{Screen time category and sleep category are associated}
Example expected count (Low, Under 7):
E = \frac{50 \cdot 60}{150} = 20
Computing all nine contributions gives:
\chi^2 \approx 21.077
Degrees of freedom:
df = (3-1)(3-1) = 4
A very small right-tail p-value would provide strong evidence of association.
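A quick arithmetic check of this table in Python (df = 4 also has a closed-form right tail, e^(-x/2)(1 + x/2), so no table is needed):

```python
import math

observed = [[10, 25, 15],   # Low screen time:    under 7 / 7 to 9 / over 9 hours
            [20, 25,  5],   # Medium screen time
            [30, 15,  5]]   # High screen time

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(3) for j in range(3))

df = (3 - 1) * (3 - 1)                        # 4
p = math.exp(-chi2 / 2) * (1 + chi2 / 2)      # exact right-tail area when df = 4
# p is about 0.0003: very strong evidence of an association between
# screen time category and sleep category in the sampled population.
```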
Example 8.2: party affiliation and marijuana legalization support
In a nationwide telephone poll of 1000 randomly selected adults classified as Democrats, Republicans, and Independents, respondents were asked two questions: party affiliation and whether they support legalization of marijuana. The goal is to test whether support is independent of party affiliation at a 5% significance level.
Hypotheses:
H_0: \text{Party affiliation and support for legalizing marijuana are independent}
H_a: \text{Party affiliation and support for legalizing marijuana are not independent}
Mechanics: putting the observed counts into a matrix and running a χ²-Test (for example, on a TI-84, Casio Prizm, or HP Prime) gives approximately:
- \chi^2 = 94.5
- df = (3 - 1)(2 - 1) = 2
- P < 0.001 (reported by the calculator as 0.000)
The calculator also stores expected counts in a second matrix.
Conditions: a random sample is stated; n = 1000 is less than 10% of all adults; and all expected cell counts are greater than 5.
Conclusion (linked to the p-value): because the p-value is far below 0.05, reject H_0. There is convincing evidence of an association between party affiliation and support for legalizing marijuana among adults.
Interpreting association carefully (including the key caution)
Even if you reject independence, you cannot necessarily claim any direct causal relationship. You can make a statement about a link or relationship between the two variables, but you are not justified in claiming that one causes the other unless the data came from a randomized experiment with random assignment.
Exam Focus
- Typical question patterns:
- Identify an independence test from “one random sample, two categorical variables measured.”
- Compute expected counts and \chi^2 or interpret computer output.
- Interpret a significant result as “evidence of association,” not a specific causal story.
- Common mistakes:
- Confusing independence with “equal counts in each cell.” Independence is about patterns of proportions, not equal numbers.
- Saying “the variables are dependent” without clarifying “associated in the population.”
- Concluding causation from an observational sample.
Interpreting Results, Follow-Up Analysis, and Practical Significance
What rejection and non-rejection really mean
After you compute a p-value, you decide relative to a significance level \alpha.
- If p-value < \alpha, reject the null: the observed pattern is unlikely under the null model.
- If p-value > \alpha, fail to reject the null: you do not have convincing evidence against the null model.
Failing to reject is not the same as proving the null is true. The effect might be small, or the sample might be too small to detect it.
Why the chi-square test doesn’t directly tell you “where” the difference is
Because \chi^2 is a sum, the same statistic can come from one cell being very far from expected or many cells being moderately far from expected. When you reject H_0, follow-up analysis is usually needed to explain what drove the result.
Cell contributions and residuals
Each cell’s contribution to \chi^2 is:
\frac{(O - E)^2}{E}
A more directional diagnostic is the standardized residual:
\frac{O - E}{\sqrt{E}}
Positive residuals mean observed > expected; negative residuals mean observed < expected. Larger absolute residuals suggest cells that drive the association or difference. Some technology outputs report residuals (often called Pearson residuals).
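As a sketch, here are the standardized residuals for the screen-time and sleep table from the earlier worked example (hand-computed here, not pulled from any particular package's output):

```python
import math

observed = [[10, 25, 15],   # rows: Low / Medium / High screen time
            [20, 25,  5],   # cols: under 7 / 7 to 9 / over 9 hours of sleep
            [30, 15,  5]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

resid = [[(observed[i][j] - expected[i][j]) / math.sqrt(expected[i][j])
          for j in range(3)] for i in range(3)]
# resid[2][0] (High screen time, under 7 hours) is about +2.24 and
# resid[0][2] (Low screen time, over 9 hours) is about +2.31:
# these cells hold far more students than independence predicts,
# and they drive the significant chi-square statistic.
```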
Follow-up analysis in context
If you reject independence in a study-method-by-pass/fail table, a good follow-up statement connects to observed-versus-expected patterns, such as: “Students using practice problems passed more often than expected under independence, while students rereading passed less often than expected.”
Practical significance vs statistical significance
A statistically significant chi-square result can occur with a huge sample even when differences in proportions are small. To judge practical importance, compare conditional proportions and consider real-world impact (costs, harms, benefits).
Example of interpreting conditional distributions
In an independence test, computing conditional proportions (the distribution of sleep categories within each screen time group, for example) helps describe the nature of the association and avoids misleading comparisons when group sizes differ.
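For the same screen-time table, the row-conditional distributions take one line of Python, and they tell the story more directly than the raw counts:

```python
observed = [[10, 25, 15],   # Low screen time
            [20, 25,  5],   # Medium screen time
            [30, 15,  5]]   # High screen time

# Distribution of sleep categories (under 7 / 7 to 9 / over 9) within each row.
conditional = [[count / sum(row) for count in row] for row in observed]
# Low:    20% / 50% / 30%
# Medium: 40% / 50% / 10%
# High:   60% / 30% / 10%
# Heavier screen time shifts the sleep distribution toward under 7 hours.
```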
Exam Focus
- Typical question patterns:
- Interpret a significant chi-square result with a clear contextual conclusion.
- Use residuals or contributions to identify which cells drive significance.
- Compare conditional proportions to describe the nature of an association.
- Common mistakes:
- Writing “accept H_0” instead of “fail to reject H_0.”
- Claiming a significant result identifies the exact categories responsible without follow-up.
- Confusing statistical significance with practical importance.
Conditions, Warnings, and What to Do When Assumptions Fail
The three big condition themes
Across chi-square procedures, conditions fall into three buckets.
- How were data produced? Random sample(s) or random assignment supports inference beyond the sample.
- Are observations independent? Each individual should contribute to exactly one cell in the table.
- Are expected counts large enough? Chi-square inference relies on an approximation that can break with sparse expected counts.
The expected count condition (and why it matters)
If expected counts are very small, the chi-square approximation can be inaccurate and p-values can be unreliable. A common AP Statistics rule is that all expected counts should be at least 5.
Fixing small expected counts: combining categories (pooling)
If some expected counts are too small, one remedy is to combine categories, but only when it is logically defensible (for example, combining “Strongly disagree” with “Disagree”). When pooling, you must:
- recompute observed counts for the combined categories
- recompute expected counts under the null
- adjust degrees of freedom (fewer categories or cells)
Pooling should not be used as a way to “force” a procedure to work if it destroys the meaning of the question.
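As a sketch with hypothetical numbers (the counts and proportions below are invented for illustration), pooling the last two categories of a goodness-of-fit setup looks like:

```python
observed = [38, 33, 19, 6, 4]                   # hypothetical one-way counts
claimed = [0.40, 0.30, 0.20, 0.06, 0.04]        # hypothetical null proportions
n = sum(observed)                               # 100
expected = [n * p for p in claimed]             # [40, 30, 20, 6, 4]; last is below 5

# Combine the last two categories so every expected count is at least 5,
# then recompute both the statistic and the degrees of freedom.
pooled_obs = observed[:-2] + [observed[-2] + observed[-1]]   # [38, 33, 19, 10]
pooled_exp = expected[:-2] + [expected[-2] + expected[-1]]   # [40, 30, 20, 10]
chi2 = sum((o - e) ** 2 / e for o, e in zip(pooled_obs, pooled_exp))
df = len(pooled_obs) - 1                        # 3, not 4
```

The df drops because pooling leaves fewer free categories; forgetting to adjust it is a common source of wrong p-values.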
When pooling is not appropriate
Pooling may be inappropriate when categories are inherently meaningful or when so many cells are sparse that pooling would destroy interpretability. (Outside typical AP scope, alternative methods such as Fisher’s exact test can be used for small 2 by 2 tables.)
Independence of observations: the “one person, one cell” rule
Violations include counting multiple responses from the same person as separate observations or treating repeated-measures data as if they were independent. Lack of independence can make chi-square p-values misleading (often too small).
Sampling design matters for conclusions
- With a random sample, you can generalize to the population sampled.
- With random assignment, you may be able to argue causation about how treatment affects the distribution of a categorical response.
- With convenience sampling, computations may be correct, but inference to a broader population is not justified.
Exam Focus
- Typical question patterns:
- Check conditions and explicitly justify them in context.
- Decide whether pooling categories is appropriate and then recompute df.
- Explain how the data collection method affects the scope of inference (generalization and causation).
- Common mistakes:
- Checking expected counts using observed counts.
- Ignoring dependence caused by repeated measures or clustered sampling.
- Stating conclusions about “all people” when the sample was not random.
Using Technology and Writing AP-Style Conclusions
What technology output usually provides
Calculator or computer output for chi-square tests commonly includes:
- the test statistic \chi^2
- degrees of freedom df
- p-value
- sometimes expected counts and residuals
For goodness-of-fit, many calculators have a χ² GOF-Test where you enter observed and expected lists. For two-way tables (independence or homogeneity), a χ²-Test typically uses a matrix input and stores expected counts in a second matrix.
What you still must do (even with output)
Even if technology gives the p-value, you still must:
- name the correct test (goodness-of-fit, homogeneity, independence)
- state hypotheses in context
- check and state conditions
- make a conclusion in context that matches the design and scope of inference
Students often lose points more for missing these communication pieces than for arithmetic.
Writing a strong conclusion (template logic, not a memorized script)
A strong conclusion connects:
- Decision (reject or fail to reject)
- Strength of evidence (p-value relative to \alpha)
- The claim you are evaluating (null vs alternative)
- Context and scope (population, variables, and whether causation is allowed)
Example phrasing for independence:
- “Because the p-value is less than 0.05, we reject H_0. There is convincing evidence that screen time category and sleep category are associated in the population of students from which the random sample was taken.”
Example phrasing for homogeneity:
- “Because the p-value is greater than 0.05, we fail to reject H_0. The data do not provide convincing evidence that lunch preference distributions differ between 9th and 12th graders at this school.”
Example phrasing for goodness-of-fit:
- “We reject H_0. The sample provides evidence that the candy color distribution in these bags does not match the company’s claimed proportions.”
Avoid words like “proved” or “guaranteed,” and avoid causal language unless random assignment justifies it.
Typical AP free-response expectations
On free-response, expect to do most or all of the following:
- Identify the correct chi-square procedure and justify why it fits.
- State hypotheses clearly in context.
- Compute expected counts (and sometimes the test statistic).
- Check conditions (randomness, independence, expected counts).
- Use output or a chi-square table to obtain a p-value.
- Conclude in context.
- If significant, describe the nature of the difference/association (conditional proportions or residuals).
Common calculator/table pitfalls
- Wrong df entered leads to the wrong p-value.
- Mixing up observed and expected lists in goodness-of-fit.
- Rounding too early; keep reasonable precision during intermediate steps.
A quick recognition guide (conceptual)
- One categorical variable, claimed proportions, one sample: goodness-of-fit.
- One categorical response compared across multiple groups with separate samples or treatments: homogeneity.
- Two categorical variables measured on one sample: independence.
Exam Focus
- Typical question patterns:
- Interpret calculator/computer output: identify procedure, interpret \chi^2, df, and p-value.
- Write a complete conclusion including scope of inference and (if experimental) causation.
- Perform follow-up interpretation using conditional proportions or residuals.
- Common mistakes:
- Giving a conclusion with no context (no variables or population mentioned).
- Forgetting to address conditions when asked for a “complete inference.”
- Stating causation from observational data or overgeneralizing beyond the sampling frame.