Categorical data
Data in which each observation falls into a label or group (e.g., party, brand, color, yes/no) rather than a numerical measurement.
Categorical inference
Using sample counts/proportions to judge whether an observed categorical pattern is plausibly due to chance variation under a proposed model.
Counts
The number of observations in each category/cell; the primary summary measure used in chi-square procedures.
Proportions
Counts divided by the total sample size; useful for describing categorical distributions, but chi-square calculations require counts.
Observed count (O)
The actual sample count in a category (one-way table) or cell (two-way table).
Expected count (E)
The count predicted in a category/cell if the null hypothesis model were true.
Null hypothesis (H0) in chi-square
A “blueprint” model specifying how counts should fall into categories in the long run (a claimed distribution, the same distribution across groups, or independence of two variables).
Alternative hypothesis (Ha) in chi-square
The claim that the observed categorical pattern does not match the null model (at least one proportion differs / distributions differ / variables are associated).
Goodness-of-fit test
A chi-square test that checks whether one categorical variable follows a claimed distribution of proportions in a population.
Test for homogeneity
A chi-square test that compares the distribution of one categorical response variable across two or more populations/treatments (based on separate samples or random assignment).
Test for independence
A chi-square test that evaluates whether two categorical variables are associated within one population (based on one sample measuring both variables).
One-way table
A table of counts for a single categorical variable across k categories (used in goodness-of-fit).
Two-way table
A table of counts for two categorical variables arranged in r rows and c columns (used in homogeneity or independence).
Margins (row/column totals)
The totals for each row and each column in a two-way table, used to compute expected counts.
Grand total (n)
The total number of observations in the table; used in expected-count formulas.
Chi-square distribution
A family of right-skewed distributions on nonnegative values used to model chi-square test statistics when H0 is true (approximately).
Right-skewed
A distribution shape with a long tail to the right; chi-square distributions are always right-skewed (less skewed as df increases).
Degrees of freedom (df)
The number of counts that can vary freely once totals and the null model are fixed; determines the chi-square distribution shape.
df for goodness-of-fit
For k categories in a one-way table, df = k − 1.
df for two-way tables
For an r×c table, df = (r − 1)(c − 1).
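A minimal Python sketch of the two df formulas on these cards. The function names are illustrative and do not come from any library.

```python
def df_goodness_of_fit(k: int) -> int:
    """Degrees of freedom for a one-way table with k categories: k - 1."""
    if k < 2:
        raise ValueError("need at least 2 categories")
    return k - 1


def df_two_way(r: int, c: int) -> int:
    """Degrees of freedom for an r x c two-way table: (r - 1)(c - 1)."""
    if r < 2 or c < 2:
        raise ValueError("need at least 2 rows and 2 columns")
    return (r - 1) * (c - 1)


print(df_goodness_of_fit(6))  # 5
print(df_two_way(3, 4))       # 6
```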
Chi-square test statistic (χ²)
A measure of overall mismatch between observed and expected counts: χ² = Σ (O − E)²/E.
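As a sketch, the statistic can be computed directly from matching lists of observed and expected counts. The die-roll data below are made up for illustration, and `chi_square_statistic` is a hypothetical helper name.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories or cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die; H0 says each face is expected 10 times.
observed = [8, 12, 9, 11, 6, 14]
expected = [10.0] * 6

chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 2))  # 4.2
```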
Cell contribution to χ²
The amount a single category/cell adds to χ²: (O − E)²/E.
Standardized residual
A directional diagnostic for a cell: (O − E)/√E; positive means observed > expected, negative means observed < expected.
Pearson residual
The residual (O − E)/√E. AP materials often call this the standardized residual, though some software reserves "standardized residual" for a further-adjusted version.
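A short sketch of the residual (O − E)/√E for each cell, using made-up die-roll counts. The helper name `pearson_residuals` is an assumption for illustration. Squaring each residual gives that cell's contribution to χ², so the squares sum to the test statistic.

```python
import math


def pearson_residuals(observed, expected):
    """Return (O - E) / sqrt(E) for each category or cell."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


# Hypothetical data: 60 rolls of a die, 10 expected per face under H0.
observed = [8, 12, 9, 11, 6, 14]
expected = [10.0] * 6

residuals = pearson_residuals(observed, expected)
for face, res in enumerate(residuals, start=1):
    direction = "above" if res > 0 else "below"
    print(f"face {face}: residual {res:+.2f} ({direction} expected)")

# The squared residuals add up to the chi-square statistic.
chi2 = sum(res ** 2 for res in residuals)
```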
Right-tailed test (chi-square)
Chi-square p-values come from the right tail because only large χ² values indicate strong evidence against H0.
P-value (chi-square)
The probability, assuming H0 is true, of getting a χ² statistic at least as large as the one computed.
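One way to see the right-tail probability in code is a small standard-library sketch. `chi2_sf` here is a hand-rolled helper (an assumption, not a library function) that evaluates the chi-square upper-tail area for integer df using the standard recurrence for the regularized upper incomplete gamma function. In practice a statistics package or calculator would normally supply this value.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Starting values: Q(1, h) = exp(-h) for even df, Q(1/2, h) = erfc(sqrt(h)) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Recurrence: Q(a + 1, h) = Q(a, h) + h^a * exp(-h) / Gamma(a + 1).
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Hypothetical example: chi-square statistic 4.2 with df = 5.
p_value = chi2_sf(4.2, 5)
print(f"p-value = {p_value:.3f}")
```

Because the p-value is a right-tail area, larger χ² values always give smaller p-values.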
Significance level (α)
The cutoff probability used to decide whether to reject H0 (e.g., 0.05).
Reject H0
Decision made when p-value < α; conclude the data provide evidence against the null categorical model.
Fail to reject H0
Decision made when p-value ≥ α; conclude there is not convincing evidence against the null model (not proof H0 is true).
Statistical significance
A result is statistically significant when the p-value is small enough to reject H0, indicating the pattern is unlikely under the null model.
Practical significance
Whether the size/impact of the observed differences matters in context; can differ from statistical significance, especially with large samples.
Chi-square approximation
Using a chi-square distribution to approximate the sampling distribution of χ² under H0; works better when expected counts are sufficiently large.
Large expected counts condition
A requirement for reliable chi-square inference; common AP rule of thumb: all expected counts are at least 5.
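A minimal sketch of checking this condition in code, assuming the expected counts have already been computed and flattened into one list. The function name and the sample values are illustrative.

```python
def all_expected_counts_large(expected, minimum=5):
    """Return True when every expected count is at least `minimum`."""
    return all(e >= minimum for e in expected)


# Hypothetical expected counts.
print(all_expected_counts_large([10.0, 10.0, 10.0]))  # True
print(all_expected_counts_large([12.5, 4.2, 8.3]))    # False
```

The check applies to expected counts, not observed counts; an observed count below 5 does not by itself violate the condition.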
Random condition
A chi-square inference condition requiring data from a random sample (or random assignment in an experiment) to support broader inference.
Independence condition
A chi-square inference condition requiring observations to be independent; often supported by sampling design and the 10% condition.
10% condition
When sampling without replacement, the sample size should be less than 10% of the population to help justify independence.
One person, one cell rule
Each individual should contribute to exactly one cell in the table; violations (e.g., repeated measures) break independence and can mislead p-values.
Pooling (combining categories)
Combining categories to increase expected counts when some are too small; must be logically defensible and requires recomputing E and df.
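A sketch of pooling with made-up counts. `pool_categories` is a hypothetical helper that merges the chosen categories into one. It can be applied to expected counts as well as observed counts because, under H0, the expected count of a merged category is the sum of the expected counts of its parts.

```python
def pool_categories(counts, indices_to_pool):
    """Merge the listed categories into a single category placed last."""
    pooled = sum(counts[i] for i in indices_to_pool)
    kept = [c for i, c in enumerate(counts) if i not in indices_to_pool]
    return kept + [pooled]


# Hypothetical one-way table with n = 60; the last two expected counts are below 5.
observed = [26, 17, 10, 5, 2]
expected = [24.0, 18.0, 12.0, 3.0, 3.0]

obs_pooled = pool_categories(observed, {3, 4})
exp_pooled = pool_categories(expected, {3, 4})
print(obs_pooled)  # [26, 17, 10, 7]
print(exp_pooled)  # [24.0, 18.0, 12.0, 6.0]

# df must be recomputed from the new number of categories.
df_before = len(observed) - 1    # 4
df_after = len(obs_pooled) - 1   # 3
```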
Sparse table
A two-way table with many small expected counts (often from too many categories), which can make chi-square approximations unreliable.
Follow-up analysis (after significant χ²)
Additional work (e.g., residuals or conditional proportions) to determine which cells/categories drive the significant result.
Conditional distribution
Proportions of one variable within each category of another variable (e.g., sleep distribution within each screen-time group) used to describe associations.
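A small sketch of conditional distributions computed within each row of a two-way table. The counts are invented, and the row and column meanings in the comments are assumptions chosen to echo the screen-time example.

```python
def row_conditional_distributions(table):
    """For each row, divide every count by that row's total."""
    return [[count / sum(row) for count in row] for row in table]


# Hypothetical counts. Rows: low / high screen time.
# Columns: enough sleep / not enough sleep.
table = [
    [30, 20],
    [10, 40],
]

conditional = row_conditional_distributions(table)
print(conditional)  # [[0.6, 0.4], [0.2, 0.8]]
```

Rows with clearly different conditional distributions, as in this made-up table, are what an association looks like descriptively; the chi-square test then judges whether such a difference is plausibly due to chance.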
Association (in chi-square independence)
A relationship where the distribution of one categorical variable differs across levels of another; evidence occurs when independence is rejected.
Independence (probability statement)
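A and B are independent if P(A ∩ B) = P(A)P(B); in tables, independence implies that the conditional distributions of one variable are the same across levels of the other (sample tables will match only approximately).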
Expected count in goodness-of-fit
For category i with claimed proportion p_i and sample size n, E_i = n·p_i.
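A minimal sketch of the goodness-of-fit expected counts, multiplying the sample size by each claimed proportion. The function name, sample size, and proportions are made up for illustration.

```python
def expected_counts_gof(n, proportions):
    """Return n * p_i for each claimed proportion p_i under H0."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("claimed proportions must sum to 1")
    return [n * p for p in proportions]


# Hypothetical claim: three categories with proportions 0.5, 0.3, 0.2; n = 200.
expected = expected_counts_gof(200, [0.5, 0.3, 0.2])
print([round(e, 2) for e in expected])  # [100.0, 60.0, 40.0]
```

Expected counts do not need to be whole numbers and should not be rounded before computing χ².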
Expected count in two-way tables
For cell (i, j), E_ij = (row total_i × column total_j)/n under H0 for homogeneity or independence.
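A sketch of the two-way expected-count formula, (row total × column total)/n, applied to every cell. The table is hypothetical and `expected_counts_two_way` is an illustrative helper name.

```python
def expected_counts_two_way(table):
    """Return the table of (row total * column total) / n for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table: row totals 50 and 50, column totals 40 and 60, n = 100.
table = [
    [30, 20],
    [10, 40],
]

expected = expected_counts_two_way(table)
print(expected)  # [[20.0, 30.0], [20.0, 30.0]]
```

The expected table keeps the same row and column totals as the observed table, which makes a quick sanity check.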
Critical-value approach
A method that compares the computed χ² statistic to a cutoff from the chi-square distribution (with given df) instead of relying only on a p-value.
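A sketch of finding the cutoff numerically with the standard library only. Both helpers are assumptions for illustration: `chi2_sf` is a hand-rolled upper-tail function for integer df, and `chi2_critical_value` searches by bisection for the value whose right-tail area equals α. A printed table or statistics package would normally provide the cutoff.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Recurrence: Q(a + 1, h) = Q(a, h) + h^a * exp(-h) / Gamma(a + 1).
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_critical_value(alpha: float, df: int, tol: float = 1e-9) -> float:
    """Find the cutoff whose right-tail area is alpha, by bisection."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # widen the bracket until the tail is small enough
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical decision: statistic 4.2 with df = 5 at alpha = 0.05.
cutoff = chi2_critical_value(0.05, 5)
reject_h0 = 4.2 >= cutoff
print(f"cutoff = {cutoff:.3f}, reject H0: {reject_h0}")
```

Rejecting when the statistic reaches the cutoff gives the same decision as rejecting when the p-value is below α.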
Scope of inference
What conclusions are justified based on design: random sampling supports generalization; random assignment supports causation.
Random assignment
Assigning individuals to treatments by chance; allows causal conclusions about how treatment affects a categorical response distribution.
Observational study
A study with no random assignment; can show association but does not justify causal conclusions.
AP-style conclusion
A conclusion that states reject/fail to reject, references p-value vs α, describes the claim in context (variables/population), and respects scope (no unwarranted causation).