Unit 8: Inference for Categorical Data: Chi-Square

Last updated 2:11 AM on 3/12/26

50 Terms

1

Categorical data

Data in which each observation falls into a label or group (e.g., party, brand, color, yes/no) rather than a numerical measurement.

2

Categorical inference

Using sample counts/proportions to judge whether an observed categorical pattern is plausibly due to chance variation under a proposed model.

3

Counts

The number of observations in each category/cell; the primary summary measure used in chi-square procedures.

4

Proportions

Counts divided by the total sample size; useful for describing categorical distributions, but chi-square calculations must use counts, not proportions.

5

Observed count (O)

The actual sample count in a category (one-way table) or cell (two-way table).

6

Expected count (E)

The count predicted in a category/cell if the null hypothesis model were true.

7

Null hypothesis (H0) in chi-square

A “blueprint” model specifying how counts should fall into categories in the long run: a specified distribution (goodness-of-fit), the same distribution across groups (homogeneity), or independence of two variables (independence).

8

Alternative hypothesis (Ha) in chi-square

The claim that the observed categorical pattern does not match the null model (at least one proportion differs / distributions differ / variables are associated).

9

Goodness-of-fit test

A chi-square test that checks whether one categorical variable follows a claimed distribution of proportions in a population.

10

Test for homogeneity

A chi-square test that compares the distribution of one categorical response variable across two or more populations/treatments (based on separate samples or random assignment).

11

Test for independence

A chi-square test that evaluates whether two categorical variables are associated within one population (based on one sample measuring both variables).

12

One-way table

A table of counts for a single categorical variable across k categories (used in goodness-of-fit).

13

Two-way table

A table of counts for two categorical variables arranged in r rows and c columns (used in homogeneity or independence).

14

Margins (row/column totals)

The totals for each row and each column in a two-way table, used to compute expected counts.

15

Grand total (n)

The total number of observations in the table; used in expected-count formulas.

16

Chi-square distribution

A family of right-skewed distributions on nonnegative values; when H0 is true and conditions are met, the sampling distribution of the χ² test statistic is approximately chi-square.

17

Right-skewed

A distribution shape with a long tail to the right; chi-square distributions are always right-skewed (less skewed as df increases).

18

Degrees of freedom (df)

The number of counts that can vary freely once totals and the null model are fixed; determines the chi-square distribution shape.

19

df for goodness-of-fit

For k categories in a one-way table, df = k − 1.

20

df for two-way tables

For an r×c table, df = (r − 1)(c − 1).
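
A minimal Python sketch of the two df rules above; the function names are illustrative only and do not come from any library.

```python
def df_goodness_of_fit(k: int) -> int:
    """Degrees of freedom for a one-way table with k categories: k - 1."""
    if k < 2:
        raise ValueError("need at least 2 categories")
    return k - 1


def df_two_way(r: int, c: int) -> int:
    """Degrees of freedom for an r x c two-way table: (r - 1)(c - 1)."""
    if r < 2 or c < 2:
        raise ValueError("need at least 2 rows and 2 columns")
    return (r - 1) * (c - 1)


print(df_goodness_of_fit(4))  # 3
print(df_two_way(3, 4))       # 6
```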

21

Chi-square test statistic (χ²)

A measure of overall mismatch between observed and expected counts: χ² = Σ (O − E)²/E.

22

Cell contribution to χ²

The amount a single category/cell adds to χ²: (O − E)²/E.
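
As a sketch (hypothetical helper names, plain Python lists, and made-up counts for a four-category table with 100 observations and an expected count of 25 per category), χ² is simply the sum of the per-cell contributions:

```python
from typing import List, Sequence


def cell_contributions(observed: Sequence[float], expected: Sequence[float]) -> List[float]:
    """Return (O - E)^2 / E for each category/cell, in input order."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """chi^2 is the sum of the cell contributions."""
    return sum(cell_contributions(observed, expected))


# Hypothetical one-way table: 100 observations, 4 equally likely categories under H0.
observed_counts = [30, 14, 34, 22]
expected_counts = [25, 25, 25, 25]
print(cell_contributions(observed_counts, expected_counts))            # [1.0, 4.84, 3.24, 0.36]
print(round(chi_square_statistic(observed_counts, expected_counts), 2))  # 9.44
```

Listing the contributions first makes it easy to see which cells (here the second and third) drive most of the statistic.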

23

Standardized residual

A directional diagnostic for a cell: (O − E)/√E. Positive means observed > expected; negative means observed < expected. Squaring it gives the cell's contribution (O − E)²/E.

24

Pearson residual

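Another name for (O − E)/√E, common in software output. Some software (for example R's chisq.test) instead reserves “standardized residual” for an adjusted version that also accounts for the row and column totals, so check which one the output reports.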
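
A short sketch computing (O − E)/√E for a hypothetical one-way table with 4 categories and E = 25 each; the function name is illustrative. Squaring each residual recovers that cell's (O − E)²/E contribution.

```python
import math


def pearson_residuals(observed, expected):
    """Return (O - E) / sqrt(E) for each category/cell."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


# Hypothetical one-way table: 4 categories, expected count 25 in each.
observed_counts = [30, 14, 34, 22]
expected_counts = [25, 25, 25, 25]
residuals = pearson_residuals(observed_counts, expected_counts)
for o, e, r in zip(observed_counts, expected_counts, residuals):
    if r > 0:
        direction = "more than expected"
    elif r < 0:
        direction = "fewer than expected"
    else:
        direction = "exactly as expected"
    print(f"O={o}, E={e}, residual={r:+.2f} ({direction})")
```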

25

Right-tailed test (chi-square)

Chi-square p-values come from the right tail because only large χ² values indicate strong evidence against H0.

26

P-value (chi-square)

The probability, assuming H0 is true, of getting a χ² statistic at least as large as the one computed.
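
A calculator, table, or statistics library (for example scipy.stats.chi2.sf) normally supplies this right-tail area. The plain-Python sketch below computes it for integer df only, assuming the standard closed forms at df = 1 and df = 2 and the recurrence Q(k + 2) = Q(k) + (x/2)^(k/2)·e^(−x/2)/Γ(k/2 + 1) for the upper-tail probability Q; the function name chi2_sf is hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(X >= x) for a chi-square distribution with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting points: df = 1 uses erfc, df = 2 is a plain exponential.
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(half))
    else:
        k, q = 2, math.exp(-half)
    # Step up two df at a time: Q(k + 2) = Q(k) + half**(k/2) * exp(-half) / Gamma(k/2 + 1),
    # evaluated in log space to avoid overflow for large x or df.
    while k < df:
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


# Example: chi^2 = 9.44 with df = 3 (e.g., a 4-category goodness-of-fit test).
p_value = chi2_sf(9.44, 3)
print(f"p-value = {p_value:.4f}")
```

Only the right tail is computed, matching the right-tailed nature of chi-square tests.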

27

Significance level (α)

The cutoff probability used to decide whether to reject H0 (e.g., 0.05).

28

Reject H0

Decision made when p-value < α; conclude the data provide evidence against the null categorical model.

29

Fail to reject H0

Decision made when p-value ≥ α; conclude there is not convincing evidence against the null model (this is not proof that H0 is true).
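
A tiny sketch of the reject / fail-to-reject rule (hypothetical function name; α defaults to 0.05 here). A p-value exactly equal to α is treated as fail to reject.

```python
def chi_square_decision(p_value: float, alpha: float = 0.05) -> str:
    """Compare a chi-square p-value to the significance level alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if p_value < alpha:
        return "Reject H0: convincing evidence against the null model."
    return "Fail to reject H0: not convincing evidence against the null model (H0 is not proven)."


print(chi_square_decision(0.024))  # Reject H0: ...
print(chi_square_decision(0.31))   # Fail to reject H0: ...
```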

30

Statistical significance

A result is statistically significant when the p-value is small enough to reject H0, indicating the pattern is unlikely under the null model.

31

Practical significance

Whether the size/impact of the observed differences matters in context; can differ from statistical significance, especially with large samples.

32

Chi-square approximation

Using a chi-square distribution to approximate the sampling distribution of χ² under H0; works better when expected counts are sufficiently large.

33

Large expected counts condition

A requirement for reliable chi-square inference; common AP rule of thumb: all expected counts are at least 5.
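
A minimal check, assuming the expected counts have already been computed (for a two-way table, pass a flattened list of its cells). The condition concerns expected counts, not observed counts. Function names and counts are made up for illustration.

```python
def small_expected_cells(expected, minimum=5.0):
    """Indices of cells whose expected count is below `minimum`."""
    return [i for i, e in enumerate(expected) if e < minimum]


def large_counts_ok(expected, minimum=5.0):
    """True when every expected count is at least `minimum` (common AP rule: 5)."""
    return not small_expected_cells(expected, minimum)


# Hypothetical expected counts for a 4-category goodness-of-fit test.
expected_counts = [40.0, 32.0, 4.5, 3.5]
print(large_counts_ok(expected_counts))       # False
print(small_expected_cells(expected_counts))  # [2, 3]
```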

34

Random condition

A chi-square inference condition requiring data from a random sample (or random assignment in an experiment) to support broader inference.

35

Independence condition

A chi-square inference condition requiring observations to be independent; often supported by sampling design and the 10% condition.

36

10% condition

When sampling without replacement, the sample size should be less than 10% of the population to help justify independence.
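
The 10% check is simple arithmetic; this sketch follows the strict “less than 10%” wording above, and the sample and population sizes in the example are hypothetical.

```python
def ten_percent_ok(sample_size: int, population_size: int) -> bool:
    """Sample must be less than 10% of the population (sampling without replacement)."""
    # Compare 10 * n with N to avoid floating-point rounding from 0.10 * N.
    return 10 * sample_size < population_size


print(ten_percent_ok(500, 12000))  # True: 500 is about 4% of 12,000
print(ten_percent_ok(500, 4000))   # False: 500 is 12.5% of 4,000
```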

37

One person, one cell rule

Each individual should contribute to exactly one cell in the table; violations (e.g., repeated measures) break independence and can mislead p-values.

38

Pooling (combining categories)

Combining categories to increase expected counts when some are too small; must be logically defensible and requires recomputing E and df.
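
A sketch of pooling for a one-way table only (pooling rows or columns of a two-way table follows the same idea but is not shown). The helper name, group layout, and counts are hypothetical. Both O and E are summed within each merged group, and df is recomputed from the new number of categories.

```python
def pool_categories(observed, expected, groups):
    """Merge categories by summing O and E within each group of indices.

    `groups` is a list of index lists, e.g. [[0], [1], [2, 3]] merges
    categories 2 and 3. Every original index must appear exactly once.
    Returns (pooled observed, pooled expected, new goodness-of-fit df).
    """
    used = sorted(i for g in groups for i in g)
    if used != list(range(len(observed))):
        raise ValueError("each category index must appear in exactly one group")
    pooled_o = [sum(observed[i] for i in g) for g in groups]
    pooled_e = [sum(expected[i] for i in g) for g in groups]
    return pooled_o, pooled_e, len(groups) - 1


# Hypothetical table where the last two categories have expected counts below 5.
observed_counts = [41, 30, 5, 4]
expected_counts = [40.0, 32.0, 4.5, 3.5]
pooled_o, pooled_e, new_df = pool_categories(observed_counts, expected_counts, [[0], [1], [2, 3]])
print(pooled_o, pooled_e, new_df)  # [41, 30, 9] [40.0, 32.0, 8.0] 2
```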

39

Sparse table

A two-way table with many small expected counts (often from too many categories), which can make chi-square approximations unreliable.

40

Follow-up analysis (after significant χ²)

Additional work (e.g., residuals or conditional proportions) to determine which cells/categories drive the significant result.

41

Conditional distribution

Proportions of one variable within each category of another variable (e.g., sleep distribution within each screen-time group) used to describe associations.
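
A sketch computing row-wise conditional distributions from a two-way table of counts. The function name is hypothetical and the screen-time/sleep counts are invented purely for illustration.

```python
def conditional_distributions(table, row_labels, col_labels):
    """For each row, the proportion of that row's total falling in each column."""
    result = {}
    for label, row in zip(row_labels, table):
        row_total = sum(row)
        result[label] = {col: count / row_total for col, count in zip(col_labels, row)}
    return result


# Invented counts: rows are screen-time groups, columns are sleep categories.
counts = [[20, 30],
          [45, 15]]
dists = conditional_distributions(
    counts,
    row_labels=["low screen time", "high screen time"],
    col_labels=["8+ hours sleep", "under 8 hours sleep"],
)
for group, dist in dists.items():
    print(group, {k: round(v, 2) for k, v in dist.items()})
```

Clearly different rows, as in this invented example, are what an association looks like descriptively; the chi-square test judges whether differences that large could plausibly be chance.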

42

Association (in chi-square independence)

A relationship in which the distribution of one categorical variable differs across levels of another; rejecting H0 of independence provides evidence of an association.

43

Independence (probability statement)

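A and B are independent if P(A ∩ B) = P(A)P(B); in a two-way table, independence means the conditional distribution of one variable is the same at every level of the other (in sample data, only approximately the same).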

44

Expected count in goodness-of-fit

For category i with claimed proportion p_i and sample size n, E_i = n·p_i.
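
A minimal sketch of E_i = n·p_i, using a made-up claimed distribution; the helper name is hypothetical.

```python
from typing import List, Sequence


def gof_expected(n: int, proportions: Sequence[float]) -> List[float]:
    """E_i = n * p_i for each category under the claimed distribution in H0."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("claimed proportions must sum to 1")
    return [n * p for p in proportions]


# Hypothetical claim: 50% / 30% / 20% across three categories, sample of n = 200.
print(gof_expected(200, [0.5, 0.3, 0.2]))  # [100.0, 60.0, 40.0]
```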

45

Expected count in two-way tables

For cell (i, j), E_ij = (row total_i × column total_j)/n under H0 for homogeneity or independence.
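
A sketch of the row total × column total / n rule on an invented 2×2 table; the function name is hypothetical. As a sanity check, the expected counts reproduce the observed row and column totals.

```python
def two_way_expected(table):
    """E_ij = (row total_i * column total_j) / n for each cell of an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Invented 2 x 2 table of observed counts (row totals 50 and 60, column totals 65 and 45).
observed = [[20, 30],
            [45, 15]]
for row in two_way_expected(observed):
    print([round(e, 2) for e in row])  # [29.55, 20.45] then [35.45, 24.55]
```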

46

Critical-value approach

A method that compares the computed χ² statistic to a cutoff from the chi-square distribution (with given df) instead of relying only on a p-value.
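
A sketch of the critical-value approach at α = 0.05. The cutoffs are standard chi-square table values rounded to three decimals, tabulated only for df = 1 to 5 to keep the example short; the names are hypothetical.

```python
# Upper-tail critical values chi^2* for alpha = 0.05 (standard table, rounded).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def critical_value_decision(chi_sq: float, df: int) -> str:
    """Reject H0 at alpha = 0.05 when the statistic exceeds the table cutoff."""
    if df not in CRITICAL_05:
        raise ValueError("this sketch only tabulates df = 1 to 5")
    cutoff = CRITICAL_05[df]
    if chi_sq > cutoff:
        return f"chi^2 = {chi_sq:.2f} > {cutoff}: reject H0 at alpha = 0.05"
    return f"chi^2 = {chi_sq:.2f} <= {cutoff}: fail to reject H0 at alpha = 0.05"


print(critical_value_decision(9.44, 3))  # chi^2 = 9.44 > 7.815: reject H0 at alpha = 0.05
```

This reaches the same decision as comparing the p-value to α, but it reports only whether the statistic passes the cutoff, not how far into the tail it lies.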

47

Scope of inference

What conclusions are justified based on design: random sampling supports generalization; random assignment supports causation.

48

Random assignment

Assigning individuals to treatments by chance; allows causal conclusions about how treatment affects a categorical response distribution.

49

Observational study

A study with no random assignment; can show association but does not justify causal conclusions.

50

AP-style conclusion

A conclusion that states reject/fail to reject, references p-value vs α, describes the claim in context (variables/population), and respects scope (no unwarranted causation).
