How to Do a Chi Square Test for AP Biology
What You Need to Know
What it is (and why AP Bio loves it)
A chi-square test (usually a goodness-of-fit test in AP Biology) checks whether your observed results are close enough to your expected results that any difference could reasonably be due to random chance.
You’ll use it most often for:
- Mendelian genetics crosses (do offspring counts fit a 3:1, 9:3:3:1, 1:2:1, etc.?)
- Hardy–Weinberg genotype frequencies (do observed genotypes match expected p^2 : 2pq : q^2?)
- Any categorical data where you have predicted proportions.
Key idea: Chi-square doesn’t “prove” your expected ratio is correct. It tests whether the deviation from expectation is small enough to be explained by chance.
The core formula
You compute the test statistic:
\chi^2 = \sum \frac{(O - E)^2}{E}
- O = observed count in a category
- E = expected count in that category
- Sum across all categories
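As a minimal sketch, the formula translates almost directly into code. The function name `chi_square` and the sample counts are illustrative assumptions, not part of any required AP Biology procedure.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E across categories."""
    if len(observed) != len(expected):
        raise ValueError("need one expected count per observed count")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Illustrative counts: 65 and 35 observed where 75 and 25 were expected.
statistic = chi_square([65, 35], [75, 25])
print(f"{statistic:.2f}")  # 5.33
```

Each category contributes one term, so a category far from its expected count dominates the total.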
Hypotheses (what you’re actually claiming)
- Null hypothesis H_0: Any difference between observed and expected is due to chance (the model fits).
- Alternative hypothesis H_A: The difference is not due to chance (the model does not fit).
Decision rule (AP Bio style)
You compare your \chi^2 value to a critical value from a chi-square table using:
- degrees of freedom df
- a significance level, usually \alpha = 0.05
If \chi^2 is **greater than or equal to** the critical value, you **reject** H_0.
If \chi^2 is **less than** the critical value, you **fail to reject** H_0.
“Fail to reject” is the correct wording (not “accept”).
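The decision rule can be sketched as a small lookup plus a comparison. The dictionary `CRITICAL_05` and the function `decide` are hypothetical names; the numbers are standard chi-square table entries for \alpha = 0.05.

```python
# Critical values at alpha = 0.05, keyed by degrees of freedom
# (standard chi-square table entries).
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}


def decide(chi_sq, df):
    """Apply the rule: reject H0 when chi-square >= the critical value."""
    critical = CRITICAL_05[df]
    return "reject H0" if chi_sq >= critical else "fail to reject H0"


print(decide(5.33, 1))   # reject H0
print(decide(0.461, 1))  # fail to reject H0
```

Note that the function never returns "accept H0", matching the wording rule above.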
Step-by-Step Breakdown
The full workflow (what to do on an FRQ)
State the hypotheses clearly.
- H_0: Observed counts match the expected ratio; deviations are due to chance.
- H_A: Observed counts do not match the expected ratio; deviations are not due to chance.
Write the expected ratio or expected probabilities.
Examples: 3:1, 9:3:3:1, p^2:2pq:q^2, etc.
Compute expected counts for each category.
- Find total N.
- Convert ratio to proportions and multiply by N.
If ratio is a:b:c:... with sum S, then for each category:
E_i = \frac{\text{ratio part}_i}{S} \cdot N
Make a quick table and calculate each term.
Use columns: category | O | E | O-E | (O-E)^2 | \frac{(O-E)^2}{E}.
Sum to get \chi^2.
\chi^2 = \sum \frac{(O-E)^2}{E}
Find degrees of freedom.
For goodness-of-fit:
df = k - 1
where k = number of categories (phenotypes or genotypes you’re counting).
Choose \alpha (usually 0.05) and get the critical value.
Use the chi-square table at \alpha = 0.05 and your df.
Compare and conclude in words (biological meaning).
- If \chi^2 \ge \chi^2_{critical}: reject H_0 → results do not fit expectation; something besides chance likely affected outcomes.
- If \chi^2 < \chi^2_{critical}: fail to reject H_0 → results are consistent with expectation; deviations likely due to chance.
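The whole workflow above can be sketched as one function. This is an illustration under stated assumptions: `goodness_of_fit` and `CRITICAL_05` are hypothetical names, the critical values are standard table entries for \alpha = 0.05, and the 1:2:1 counts are made up for the example.

```python
# Critical values at alpha = 0.05 for df = 1..4 (standard table entries).
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}


def goodness_of_fit(observed, ratio):
    """Run the goodness-of-fit steps for observed counts and an expected ratio."""
    total = sum(observed)                                # total N
    parts = sum(ratio)                                   # S, the sum of ratio parts
    expected = [part / parts * total for part in ratio]  # E_i = part_i / S * N
    terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    chi_sq = sum(terms)                                  # sum of the per-category terms
    df = len(observed) - 1                               # k - 1
    critical = CRITICAL_05[df]
    decision = "reject H0" if chi_sq >= critical else "fail to reject H0"
    return expected, chi_sq, df, critical, decision


# Hypothetical counts for a cross expected to give 1:2:1 (made up for illustration).
expected, chi_sq, df, critical, decision = goodness_of_fit([28, 49, 23], [1, 2, 1])
print(expected)          # [25.0, 50.0, 25.0]
print(round(chi_sq, 2))  # 0.54
print(df, critical)      # 2 5.99
print(decision)          # fail to reject H0
```

On an FRQ you still need to state the hypotheses and write the conclusion in biological terms; the code only covers the arithmetic steps.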
Mini worked walkthrough (annotated)
Suppose a monohybrid cross expects 3:1 purple:white. You observe O = 65 purple and O = 35 white.
- Total N = 100
- Expected counts: purple E = \frac{3}{4} \cdot 100 = 75; white E = \frac{1}{4} \cdot 100 = 25
Compute:
- Purple term: \frac{(65-75)^2}{75} = \frac{100}{75} = 1.33
- White term: \frac{(35-25)^2}{25} = \frac{100}{25} = 4.00
So:
\chi^2 = 1.33 + 4.00 = 5.33
Degrees of freedom: df = 2-1 = 1.
Critical value at \alpha = 0.05, df = 1 is 3.84.
Since 5.33 > 3.84, **reject** H_0.
Key Formulas, Rules & Facts
Formulas and definitions
| Item | Formula | When to use | Notes |
|---|---|---|---|
| Chi-square statistic | \chi^2 = \sum \frac{(O-E)^2}{E} | Goodness-of-fit for categorical counts | Bigger \chi^2 = bigger mismatch |
| Expected count from ratio | E_i = \frac{\text{ratio part}_i}{\sum \text{parts}} \cdot N | Mendelian ratios (e.g., 9:3:3:1) | Compute E before chi-square |
| Degrees of freedom | df = k - 1 | Chi-square goodness-of-fit | k = number of categories |
| Decision rule | Compare \chi^2 to critical value at df and \alpha | Most AP Bio problems | If \chi^2 \ge \chi^2_{critical} → reject H_0 |
Common critical values (most used on AP Bio)
(These are for \alpha = 0.05.)
| df | \chi^2_{critical} |
|---|---|
| 1 | 3.84 |
| 2 | 5.99 |
| 3 | 7.81 |
| 4 | 9.49 |
| 5 | 11.07 |
| 6 | 12.59 |
| 7 | 14.07 |
If your table gives ranges or multiple \alpha values, AP Bio typically expects \alpha = 0.05 unless stated otherwise.
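If you are curious where the table values come from, the sketch below recomputes them with only the Python standard library. This is not something AP Biology asks you to do. The helper names `chi2_cdf` and `critical_value` are hypothetical, the cumulative probability uses a textbook power series for the incomplete gamma function, and the search assumes the critical value lies below 200 (ample for small df).

```python
import math


def chi2_cdf(x, df):
    """P(chi-square variable with df degrees of freedom <= x), via a power series."""
    if x <= 0:
        return 0.0
    a, half_x = df / 2, x / 2
    term = 1.0
    total = 1.0
    n = 0
    # Series: sum over n of half_x^n / ((a+1)(a+2)...(a+n)); stop once terms are negligible.
    while term > 1e-15 * total:
        n += 1
        term *= half_x / (a + n)
        total += term
    # Prefactor half_x^a * exp(-half_x) / Gamma(a + 1), computed in log space.
    return total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a + 1))


def critical_value(df, alpha=0.05):
    """Find x with chi2_cdf(x, df) = 1 - alpha by bisection on [0, 200] (assumed range)."""
    lo, hi = 0.0, 200.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in range(1, 8):
    print(df, round(critical_value(df), 2))
```

The printed values should agree with the table above to two decimal places. For df = 2 there is a closed form, -2 ln(0.05) ≈ 5.99, which makes a handy sanity check.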
Assumptions / conditions you should check
- Counts, not percentages (convert to counts if needed).
- Categories are mutually exclusive (each observation fits one category).
- Observations are independent (one offspring doesn’t determine another).
- Expected counts should not be too small; a common rule is each E \ge 5 (AP-level expectation: “expected values should be sufficiently large”).
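The last condition is easy to check mechanically. A minimal sketch, assuming the common E \ge 5 rule of thumb; `small_expected` and `expected_from_ratio` are hypothetical helper names, and the total of 48 is made up to show a failing case.

```python
def expected_from_ratio(ratio, total):
    """Scale ratio parts to expected counts for a sample of size `total`."""
    parts = sum(ratio)
    return [part / parts * total for part in ratio]


def small_expected(expected, minimum=5):
    """Return (index, value) pairs for expected counts below the minimum."""
    return [(i, e) for i, e in enumerate(expected) if e < minimum]


# A 9:3:3:1 ratio with a hypothetical total of 48, then with a total of 160.
print(small_expected(expected_from_ratio([9, 3, 3, 1], 48)))   # [(3, 3.0)]
print(small_expected(expected_from_ratio([9, 3, 3, 1], 160)))  # []
```

An empty list means every expected count meets the threshold; a non-empty list tells you which categories to mention as a limitation.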
Examples & Applications
Example 1: Monohybrid cross (fits expectation)
A cross predicts 3:1 phenotype ratio. Observed: purple 547, white 193.
- Total N = 740
- Expected: purple E = \frac{3}{4}\cdot 740 = 555; white E = \frac{1}{4}\cdot 740 = 185
Compute terms:
- Purple: \frac{(547-555)^2}{555} = \frac{64}{555} \approx 0.115
- White: \frac{(193-185)^2}{185} = \frac{64}{185} \approx 0.346
\chi^2 \approx 0.461
df = 2-1 = 1, critical = 3.84.
Since 0.461 < 3.84, **fail to reject** H_0 → data are consistent with 3:1.
Example 2: Dihybrid cross (fits expectation)
Expected ratio 9:3:3:1 for 4 phenotypes; total N = 160.
Observed counts: 90, 30, 28, 12.
Expected counts:
- 9/16\cdot160 = 90
- 3/16\cdot160 = 30
- 3/16\cdot160 = 30
- 1/16\cdot160 = 10
The first two categories have O = E, so their terms are 0. Compute only the nonzero differences:
- Third category: \frac{(28-30)^2}{30} = \frac{4}{30} \approx 0.133
- Fourth category: \frac{(12-10)^2}{10} = \frac{4}{10} = 0.4
\chi^2 \approx 0.533
df = 4-1 = 3, critical = 7.81.
Since 0.533 < 7.81, **fail to reject** H_0.
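The O/E table for this example can be generated with a short loop. This is only a sketch of the table-first habit, using the counts from Example 2; the column layout is an arbitrary choice.

```python
observed = [90, 30, 28, 12]   # counts from Example 2
ratio = [9, 3, 3, 1]

total = sum(observed)
expected = [part / sum(ratio) * total for part in ratio]

print(f"{'cat':>3} {'O':>4} {'E':>6} {'O-E':>6} {'(O-E)^2':>8} {'term':>6}")
chi_sq = 0.0
for i, (o, e) in enumerate(zip(observed, expected), start=1):
    diff = o - e
    term = diff ** 2 / e          # square first, then divide by E
    chi_sq += term
    print(f"{i:>3} {o:>4} {e:>6.1f} {diff:>6.1f} {diff ** 2:>8.1f} {term:>6.3f}")

print(f"chi-square = {chi_sq:.3f}")  # chi-square = 0.533
```

The first two rows print a term of 0.000, which is why the hand calculation can skip them.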
Example 3: Monohybrid cross (reject expectation)
Observed: 65 dominant phenotype, 35 recessive phenotype; expected 3:1.
- N = 100
- Expected: 75 and 25
- \chi^2 = \frac{(65-75)^2}{75} + \frac{(35-25)^2}{25} = 1.33 + 4.00 = 5.33
- df = 1; critical 3.84
Since 5.33 > 3.84, **reject** H_0 → the data are unlikely to reflect a 3:1 ratio with only chance deviation; some non-random factor may have affected the results.
Example 4: Hardy–Weinberg genotype fit
A population has allele frequencies p = 0.6 and q = 0.4. Total individuals N = 200.
Expected genotypes:
- p^2 = 0.36 → E(AA) = 0.36\cdot200 = 72
- 2pq = 0.48 → E(Aa) = 96
- q^2 = 0.16 → E(aa) = 32
Observed genotypes: AA = 80, Aa = 70, aa = 50.
Compute:
- \frac{(80-72)^2}{72} = \frac{64}{72} \approx 0.889
- \frac{(70-96)^2}{96} = \frac{676}{96} \approx 7.042
- \frac{(50-32)^2}{32} = \frac{324}{32} = 10.125
\chi^2 \approx 18.056
df = 3-1 = 2 (here p and q are given rather than estimated from the observed counts); critical at df=2 is 5.99.
Since 18.056 > 5.99, **reject** H_0 → observed genotype frequencies significantly deviate from HW expectations.
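The Hardy–Weinberg expected counts and the statistic for this example can be checked with a few lines. A minimal sketch: `hardy_weinberg_expected` is a hypothetical helper name, and p is taken as given (0.6), as in the example.

```python
def hardy_weinberg_expected(p, total):
    """Expected AA, Aa, aa counts from p^2, 2pq, q^2 for a population of size `total`."""
    q = 1 - p
    return [p * p * total, 2 * p * q * total, q * q * total]


observed = [80, 70, 50]  # AA, Aa, aa from Example 4
expected = hardy_weinberg_expected(0.6, sum(observed))
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(e, 1) for e in expected])  # [72.0, 96.0, 32.0]
print(round(chi_sq, 3))                 # 18.056
```

Most of the statistic comes from the heterozygote deficit and the excess of aa individuals, which is worth pointing out when you interpret the result.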
Common Mistakes & Traps
Mixing up observed and expected
- Wrong: Plugging observed values into the expected column or vice versa.
- Why it matters: The whole statistic is based on differences O-E.
- Fix: Always compute E from the _ratio/probability_ first, then compare to O.
Using the ratio numbers as expected counts without scaling
- Wrong: Treating 9:3:3:1 as expected counts 9,3,3,1 even when N \ne 16.
- Fix: Convert ratio to fractions of the total and multiply by N.
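To see how badly the unscaled ratio distorts the statistic, the sketch below runs both versions on the counts from Example 2. The function name `chi_square` is an illustrative assumption.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E across categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [90, 30, 28, 12]  # Example 2 counts, N = 160
ratio = [9, 3, 3, 1]

unscaled = ratio                                                 # WRONG: ratio parts used as counts
scaled = [part / sum(ratio) * sum(observed) for part in ratio]   # correct: 90, 30, 30, 10

print(round(chi_square(observed, unscaled), 2))  # 1301.33
print(round(chi_square(observed, scaled), 3))    # 0.533
```

The wrong version would lead you to reject a model that the data actually fit well.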
Forgetting to square the difference (or squaring after dividing)
- Wrong: Using \frac{O-E}{E} or doing \left(\frac{O-E}{E}\right)^2.
- Correct: \frac{(O-E)^2}{E} (square first, then divide).
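A quick numerical comparison shows that the three versions give very different answers. This sketch uses the 65/35 observed and 75/25 expected counts from the walkthrough; the variable names are illustrative.

```python
observed, expected = [65, 35], [75, 25]
pairs = list(zip(observed, expected))

no_square = sum((o - e) / e for o, e in pairs)            # WRONG: difference never squared
square_after = sum(((o - e) / e) ** 2 for o, e in pairs)  # WRONG: squared after dividing
correct = sum((o - e) ** 2 / e for o, e in pairs)         # square first, then divide by E

print(round(no_square, 3))     # 0.267
print(round(square_after, 3))  # 0.178
print(round(correct, 3))       # 5.333
```

Both wrong versions fall below the df = 1 critical value of 3.84, so either mistake would flip the conclusion from reject to fail to reject.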
Incorrect degrees of freedom
- Wrong: Using df = N-1 (sample size) or df = \text{number of traits}-1.
- Correct: df = k-1 where k is the number of categories you actually have counts for.
Saying “accept the null”
- Wrong: “We accept H_0.”
- Why it’s wrong: Statistics doesn’t prove H_0; it only assesses evidence against it.
- Fix: Say fail to reject H_0.
Using the wrong chi-square table column (wrong \alpha)
- Wrong: Comparing to a 0.01 column when the problem expects 0.05.
- Fix: Default to \alpha = 0.05 unless the prompt specifies otherwise.
Rounding too aggressively mid-calculation
- Wrong: Rounding each term heavily (e.g., to whole numbers).
- Fix: Keep a few decimals for each term; round at the end.
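The effect of early rounding is easy to demonstrate with the counts from Example 1. A minimal sketch; rounding each term to a whole number is deliberately extreme to make the point.

```python
observed, expected = [547, 193], [555, 185]  # Example 1 counts and expected values
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

rounded_at_end = round(sum(terms), 3)          # keep decimals, round once at the end
rounded_early = sum(round(t) for t in terms)   # WRONG: each term rounded to a whole number

print(rounded_at_end)  # 0.461
print(rounded_early)   # 0
```

Here the conclusion happens to be the same, but a statistic close to the critical value could end up on the wrong side of it.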
Ignoring small expected counts
- Issue: If some E values are very small, the chi-square approximation becomes less reliable.
- Fix (AP-level): Note it as a limitation, or (if allowed) combine rare categories logically.
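If combining categories is allowed, one way to do it is sketched below. Everything here is illustrative: `merge_categories` is a hypothetical helper, the observed counts are made up, and whether merging two phenotypes makes biological sense depends on the problem.

```python
def merge_categories(counts, indices):
    """Combine the categories at `indices` into one; the merged count goes last."""
    kept = [c for i, c in enumerate(counts) if i not in indices]
    return kept + [sum(counts[i] for i in indices)]


observed = [26, 10, 8, 4]         # hypothetical counts, N = 48
expected = [27.0, 9.0, 9.0, 3.0]  # 9:3:3:1 scaled to N = 48; the last E is below 5

obs_merged = merge_categories(observed, [2, 3])
exp_merged = merge_categories(expected, [2, 3])
chi_sq = sum((o - e) ** 2 / e for o, e in zip(obs_merged, exp_merged))
df = len(obs_merged) - 1          # categories drop from 4 to 3, so df drops from 3 to 2

print(obs_merged, exp_merged)  # [26, 10, 12] [27.0, 9.0, 12.0]
print(round(chi_sq, 3), df)    # 0.148 2
```

Merge the observed and expected counts the same way, and remember to use the new number of categories when you look up the critical value.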
Memory Aids & Quick Tricks
| Trick / mnemonic | What it helps you remember | When to use it |
|---|---|---|
| “O–E, square, over E, then sum” | The exact structure of \frac{(O-E)^2}{E} | Any chi-square computation |
| “df = boxes − 1” | df = k-1 where k = number of categories | Choosing the right row in the table |
| “Big chi = bye null” | Large \chi^2 means poor fit → reject H_0 | Interpreting results quickly |
| Ratio → fractions → counts | Convert expected ratios to expected counts correctly | Genetics crosses with a ratio |
| Table-first habit | Prevents arithmetic/sign errors by organizing terms | FRQs and multi-category problems |
Quick Review Checklist
- You can state H_0 (chance explains differences) and H_A (chance doesn’t).
- You can compute expected counts using E_i = \frac{\text{part}}{\text{sum}}\cdot N.
- You can calculate \chi^2 = \sum \frac{(O-E)^2}{E} correctly (square first).
- You can find df = k-1 using the number of categories.
- You can use \alpha = 0.05 and compare to the correct critical value.
- You conclude using: reject H_0 or **fail to reject** H_0 (in context).
- You check that expected counts are reasonably large (ideally E \ge 5).
You’ve got this—if you can set up the O/E table cleanly, the rest is just careful arithmetic.