Unit 8: Inference for Categorical Data: Chi-Square

AP Statistics

Last updated 2:11 AM on 3/12/26

50 Terms

Categorical data

Data in which each observation falls into a label or group (e.g., party, brand, color, yes/no) rather than a numerical measurement.

Categorical inference

Using sample counts/proportions to judge whether an observed categorical pattern is plausibly due to chance variation under a proposed model.

Counts

The number of observations in each category/cell; the primary summary measure used in chi-square procedures.

Proportions

Counts divided by the total sample size; useful for describing categorical distributions, but chi-square calculations must use counts.

Observed count (O)

The actual sample count in a category (one-way table) or cell (two-way table).

Expected count (E)

The count predicted in a category/cell if the null hypothesis model were true.

Null hypothesis (H0) in chi-square

A “blueprint” model specifying how counts should fall into categories in the long run (distribution, equal distributions across groups, or independence).

Alternative hypothesis (Ha) in chi-square

The claim that the observed categorical pattern does not match the null model (at least one proportion differs / distributions differ / variables are associated).

Goodness-of-fit test

A chi-square test that checks whether one categorical variable follows a claimed distribution of proportions in a population.

Test for homogeneity

A chi-square test that compares the distribution of one categorical response variable across two or more populations/treatments (based on separate samples or random assignment).

Test for independence

A chi-square test that evaluates whether two categorical variables are associated within one population (based on one sample measuring both variables).

One-way table

A table of counts for a single categorical variable across k categories (used in goodness-of-fit).

Two-way table

A table of counts for two categorical variables arranged in r rows and c columns (used in homogeneity or independence).

Margins (row/column totals)

The totals for each row and each column in a two-way table, used to compute expected counts.

Grand total (n)

The total number of observations in the table; used in expected-count formulas.

Chi-square distribution

A family of right-skewed distributions on nonnegative values, indexed by degrees of freedom, used to approximate the sampling distribution of the chi-square test statistic when $H_0$ is true.

Right-skewed

A distribution shape with a long tail to the right; chi-square distributions are always right-skewed (less skewed as df increases).

Degrees of freedom (df)

The parameter that picks out which chi-square distribution to use; it counts how many category/cell counts are free to vary once the totals are fixed, and it sets the distribution's center and shape (the mean of a chi-square distribution equals its $df$).

df for goodness-of-fit

For $k$ categories in a one-way table, $df = k - 1.$

df for two-way tables

For an $r \times c$ table, $df = (r - 1)(c - 1).$

Chi-square test statistic (χ²)

A measure of overall mismatch between observed and expected counts: $\chi^2 = \sum (O - E)^2/E.$

Cell contribution to χ²

The amount a single category/cell adds to $\chi^2$: $(O - E)^2/E.$
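
The statistic and its cell contributions can be sketched in a few lines of Python. The observed and expected counts below are made up for illustration; they do not come from any real study.

```python
# Hypothetical observed and expected counts for a 3-category one-way table.
observed = [30, 50, 20]
expected = [25.0, 50.0, 25.0]

# Each cell's contribution is (O - E)^2 / E.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# The chi-square statistic is the sum of the cell contributions.
chi_square = sum(contributions)

print(contributions)  # [1.0, 0.0, 1.0]
print(chi_square)     # 2.0
```

The middle category matches its expected count exactly, so it adds nothing; the two outer categories account for the whole statistic.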

Standardized residual

A directional diagnostic for a cell: $(O - E)/\sqrt{E}$; positive means observed $>$ expected, negative means observed $<$ expected.

Pearson residual

Another name often used for standardized residuals in chi-square output: $(O - E)/\sqrt{E}.$
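
A minimal sketch of the residual calculation, using the same kind of made-up counts (hypothetical, for illustration only). It also shows that squaring the residuals and adding them up recovers the $\chi^2$ statistic.

```python
import math

# Hypothetical observed and expected counts for a 3-category table.
observed = [30, 50, 20]
expected = [25.0, 50.0, 25.0]

# Residual for each cell: (O - E) / sqrt(E). The sign shows the direction.
residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

# Squared residuals are the cell contributions, so they sum to chi-square.
chi_square = sum(r ** 2 for r in residuals)

print(residuals)   # [1.0, 0.0, -1.0]
print(chi_square)  # 2.0
```

The first category has more observations than expected (positive residual) and the last has fewer (negative residual), which the unsigned contributions alone would not reveal.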

Right-tailed test (chi-square)

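A test whose p-value is the area in the right tail of the distribution; chi-square tests are always right-tailed because only large $\chi^2$ values (large mismatch between observed and expected counts) are evidence against $H_0$.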

P-value (chi-square)

The probability, assuming $H_0$ is true, of getting a $\chi^2$ statistic at least as large as the one computed.
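
On the AP exam this right-tail area comes from a calculator or table. As a sketch of what that lookup computes, the function below (a hypothetical helper named `chi_square_p_value`, not part of any library) uses only Python's standard library and assumes `df` is a positive integer.

```python
import math

def chi_square_p_value(statistic, df):
    """Right-tail area P(X >= statistic) for a chi-square distribution
    with a positive integer number of degrees of freedom."""
    z = statistic / 2.0
    if df % 2 == 0:
        a = 1.0
        tail = math.exp(-z)              # exact right-tail area for df = 2
    else:
        a = 0.5
        tail = math.erfc(math.sqrt(z))   # exact right-tail area for df = 1
    # Step up two degrees of freedom at a time using the recurrence
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while 2 * a < df:
        tail += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1.0
    return tail

print(round(chi_square_p_value(2.0, 2), 4))  # 0.3679
```

With a statistic of 2.0 and $df = 2$, the p-value is about 0.37, so a mismatch that size would be unremarkable if $H_0$ were true.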

Significance level (α)

The cutoff probability used to decide whether to reject $H_0$ (e.g., 0.05).

Reject H0

Decision made when $p\text{-value} < \alpha$; conclude the data provide evidence against the null categorical model.

Fail to reject H0

Decision made when $p\text{-value} \geq \alpha$; conclude there is not convincing evidence against the null model (this is not proof that $H_0$ is true).

Statistical significance

A result is statistically significant when the p-value is small enough to reject $H_0$, indicating the pattern is unlikely under the null model.

Practical significance

Whether the size/impact of the observed differences matters in context; can differ from statistical significance, especially with large samples.

Chi-square approximation

Using a chi-square distribution to approximate the sampling distribution of $\chi^2$ under $H_0$; works better when expected counts are sufficiently large.

Large expected counts condition

A requirement for reliable chi-square inference; common AP rule of thumb: all expected counts are at least 5.

Random condition

A chi-square inference condition requiring data from a random sample (or random assignment in an experiment) to support broader inference.

Independence condition

A chi-square inference condition requiring observations to be independent; often supported by sampling design and the 10% condition.

10% condition

When sampling without replacement, the sample size should be less than 10% of the population to help justify independence.

One person, one cell rule

Each individual should contribute to exactly one cell in the table; violations (e.g., repeated measures) break independence and can mislead p-values.

Pooling (combining categories)

Combining categories to increase expected counts when some are too small; must be logically defensible and requires recomputing E and df.
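
A minimal sketch of checking the "all expected counts at least 5" rule of thumb and then pooling. The proportions and counts are hypothetical, and the code simply assumes the last two categories can sensibly be combined, which in practice has to be argued from context.

```python
# Hypothetical one-way table: claimed proportions and observed counts (n = 32).
claimed = [0.5, 0.25, 0.125, 0.125]
observed = [15, 9, 5, 3]
n = sum(observed)

expected = [n * p for p in claimed]            # [16.0, 8.0, 4.0, 4.0]
condition_met = all(e >= 5 for e in expected)  # False: two expected counts are below 5

# Pool the last two categories (assumed here to be logically combinable).
pooled_claimed = claimed[:2] + [claimed[2] + claimed[3]]
pooled_observed = observed[:2] + [observed[2] + observed[3]]
pooled_expected = [n * p for p in pooled_claimed]  # [16.0, 8.0, 8.0]

# Recompute df after pooling: k - 1 drops from 3 to 2.
df_before = len(claimed) - 1
df_after = len(pooled_claimed) - 1

print(condition_met, pooled_expected, df_before, df_after)
```

After pooling, every expected count is at least 5, but the test now has one fewer degree of freedom and can no longer distinguish between the two combined categories.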

Sparse table

A two-way table with many small expected counts (often from too many categories), which can make chi-square approximations unreliable.

Follow-up analysis (after significant χ²)

Additional work (e.g., residuals or conditional proportions) to determine which cells/categories drive the significant result.

Conditional distribution

Proportions of one variable within each category of another variable (e.g., sleep distribution within each screen-time group) used to describe associations.
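
As an illustration, conditional distributions can be computed by dividing each row of a two-way table by its row total. The table below is hypothetical: rows are two groups and columns are three response categories.

```python
# Hypothetical 2 x 3 table of counts (rows = groups, columns = responses).
observed = [[10, 20, 30],
            [20, 20, 20]]

# Conditional distribution of the response within each row (group).
conditional = [[round(count / sum(row), 3) for count in row] for row in observed]

print(conditional)  # [[0.167, 0.333, 0.5], [0.333, 0.333, 0.333]]
```

The two rows show visibly different distributions, which is the kind of pattern a chi-square test then checks against chance variation.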

Association (in chi-square independence)

A relationship where the distribution of one categorical variable differs across levels of another; evidence occurs when independence is rejected.

Independence (probability statement)

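Events $A$ and $B$ are independent if $P(A \cap B) = P(A)P(B);$ for a two-way table, independence means the conditional distribution of one variable is the same at every level of the other in the population, so sample conditional distributions should look similar.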
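
The product rule can be checked directly on a table of counts. The sketch below uses a hypothetical $2 \times 2$ table built so that the rule holds exactly, with exact fractions to avoid rounding; real sample data would rarely match this perfectly.

```python
from fractions import Fraction

# Hypothetical 2 x 2 table: rows are A / not A, columns are B / not B.
table = [[20, 30],
         [40, 60]]
n = sum(sum(row) for row in table)          # 150

p_a = Fraction(sum(table[0]), n)            # P(A) = 50/150
p_b = Fraction(table[0][0] + table[1][0], n)  # P(B) = 60/150
p_a_and_b = Fraction(table[0][0], n)        # P(A and B) = 20/150

print(p_a_and_b == p_a * p_b)  # True
```

Here both rows have the same conditional distribution (40% in column B), which is exactly what the product rule implies.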

Expected count in goodness-of-fit

For category $i$ with claimed proportion $p_i$ in sample size $n,$ $E_i = n \cdot p_i.$
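
A short worked example of $E_i = n \cdot p_i$, using a hypothetical sample size and hypothetical claimed proportions:

```python
# Hypothetical goodness-of-fit setup: n = 100 with a claimed 25% / 50% / 25% split.
n = 100
claimed = [0.25, 0.5, 0.25]

# Expected count for each category is n times its claimed proportion.
expected = [n * p for p in claimed]

# Degrees of freedom for goodness-of-fit: number of categories minus 1.
df = len(claimed) - 1

print(expected)  # [25.0, 50.0, 25.0]
print(df)        # 2
```

Expected counts need not be whole numbers in general, and they always add up to $n$ because the claimed proportions add up to 1.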

Expected count in two-way tables

For cell (i,j), $E_{ij} = \frac{(\text{row total}_i \times \text{column total}_j)}{n}$ under $H_0$ for homogeneity or independence.
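
The row-total times column-total rule can be sketched as follows. The $2 \times 3$ table of counts is hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
# Hypothetical 2 x 3 table of observed counts.
observed = [[10, 20, 30],
            [20, 20, 20]]

row_totals = [sum(row) for row in observed]        # [60, 60]
col_totals = [sum(col) for col in zip(*observed)]  # [30, 40, 50]
n = sum(row_totals)                                # 120

# Expected count for each cell: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Degrees of freedom for an r x c table: (r - 1)(c - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected)  # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
print(df)        # 2
```

Because the two row totals are equal here, both rows get the same expected counts; the statistic works out to roughly 5.33.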

Critical-value approach

A method that compares the computed $\chi^2$ statistic to a cutoff from the chi-square distribution (with the given $df$) instead of relying only on a p-value.
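
A minimal sketch of the critical-value decision for $\alpha = 0.05$. The cutoffs are the standard chi-square table values for $df = 1$ through $4$; the helper name `reject_null` and the test statistics passed to it are hypothetical.

```python
# Standard chi-square critical values for alpha = 0.05 (right-tail area 0.05).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def reject_null(chi_square, df):
    """Return True when the statistic exceeds the alpha = 0.05 cutoff for this df."""
    return chi_square > CRITICAL_05[df]

print(reject_null(5.33, 2))  # False
print(reject_null(8.10, 2))  # True
```

This gives the same decision as comparing the p-value to $\alpha$: a statistic beyond the cutoff has a right-tail area smaller than 0.05.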

Scope of inference

What conclusions are justified based on design: random sampling supports generalization; random assignment supports causation.

Random assignment

Assigning individuals to treatments by chance; allows causal conclusions about how treatment affects a categorical response distribution.

Observational study

A study with no random assignment; can show association but does not justify causal conclusions.

AP-style conclusion

A conclusion that states reject/fail to reject, references p-value vs α, describes the claim in context (variables/population), and respects scope (no unwarranted causation).