How to Do a Chi Square Test for AP Biology

What You Need to Know

What it is (and why AP Bio loves it)

A chi-square test (usually a goodness-of-fit test in AP Biology) checks whether your observed results are close enough to your expected results that any difference could reasonably be due to random chance.

You’ll use it most often for:

Mendelian genetics crosses (do offspring counts fit a $3:1$ , $9:3:3:1$ , $1:2:1$ , etc.?)
Hardy–Weinberg genotype frequencies (do observed genotypes match expected $p^2 : 2pq : q^2$ ?)
Any categorical data where you have predicted proportions.

Key idea: Chi-square doesn’t “prove” your expected ratio is correct. It tests whether the deviation from expectation is small enough to be explained by chance.

The core formula

You compute the test statistic:

$\chi^2 = \sum \frac{(O - E)^2}{E}$

$O$ = observed count in a category
$E$ = expected count in that category
Sum across all categories
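
As a minimal sketch of the formula above, the sum can be written in a few lines of Python. The function name `chi_square` and the sample counts are illustrative assumptions, not part of any AP-provided tool.

```python
def chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per category")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Illustrative counts: 65 and 35 observed, 75 and 25 expected.
stat = chi_square([65, 35], [75, 25])
print(round(stat, 2))  # 5.33
```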

Hypotheses (what you’re actually claiming)

Null hypothesis $H_0$ : Any difference between observed and expected is due to chance (the model fits).
Alternative hypothesis $H_A$ : The difference is not due to chance (the model does not fit).

Decision rule (AP Bio style)

You compare your $\chi^2$ value to a critical value from a chi-square table using:

degrees of freedom $df$
a significance level, usually $\alpha = 0.05$

If $\chi^2$ is **greater than or equal to** the critical value, you **reject** $H_0$ .
If $\chi^2$ is **less than** the critical value, you **fail to reject** $H_0$ .

“Fail to reject” is the correct wording (not “accept”).
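
The decision rule can be sketched as a small lookup plus a comparison. The dictionary `CRITICAL_05` and the function `decide` are hypothetical names chosen for this example; the critical values are the standard $\alpha = 0.05$ table entries.

```python
# Critical values at alpha = 0.05, keyed by degrees of freedom
# (standard chi-square table values).
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}


def decide(chi_sq, df):
    """Apply the decision rule: reject H0 when chi_sq >= critical value."""
    critical = CRITICAL_05[df]
    return "reject H0" if chi_sq >= critical else "fail to reject H0"


print(decide(5.33, 1))   # reject H0
print(decide(0.461, 1))  # fail to reject H0
```

Note that the function never returns "accept H0", mirroring the wording rule above.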

Step-by-Step Breakdown

The full workflow (what to do on an FRQ)

1. State the hypotheses clearly.
   - $H_0$ : Observed counts match the expected ratio; deviations are due to chance.
   - $H_A$ : Observed counts do not match the expected ratio; deviations are not due to chance.
2. Write the expected ratio or expected probabilities.
   Examples: $3:1$ , $9:3:3:1$ , $p^2:2pq:q^2$ , etc.
3. Compute expected counts for each category.
   - Find total $N$ .
   - Convert ratio to proportions and multiply by $N$ .
   If the ratio is $a:b:c:...$ with sum $S$ , then for each category:
   $E_i = \frac{\text{ratio part}_i}{S} \cdot N$
4. Make a quick table and calculate each term.
   Use columns: category | $O$ | $E$ | $O-E$ | $(O-E)^2$ | $\frac{(O-E)^2}{E}$ .
5. Sum to get $\chi^2$ .
   $\chi^2 = \sum \frac{(O-E)^2}{E}$
6. Find degrees of freedom.
   For goodness-of-fit:
   $df = k - 1$
   where $k$ = number of categories (phenotypes or genotypes you’re counting).
7. Choose $\alpha$ (usually $0.05$ ) and get the critical value.
   Use the chi-square table at $\alpha = 0.05$ and your $df$ .
8. Compare and conclude in words (biological meaning).
   - If $\chi^2 \ge \chi^2_{critical}$ : reject $H_0$ → results do not fit expectation; something besides chance likely affected outcomes.
   - If $\chi^2 < \chi^2_{critical}$ : fail to reject $H_0$ → results are consistent with expectation; deviations can reasonably be attributed to chance.
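
The workflow above can be sketched end to end as one function. This is only an illustration: `goodness_of_fit` and `CRITICAL_05` are hypothetical names, the significance level is fixed at $\alpha = 0.05$, and the sample input is a $9:3:3:1$ cross with $160$ offspring.

```python
# Critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07}


def goodness_of_fit(observed, ratio):
    """Run the workflow: expected counts, chi-square, df, decision."""
    if len(observed) != len(ratio):
        raise ValueError("need one ratio part per observed category")
    total = sum(observed)                                 # N
    parts = sum(ratio)                                    # S
    expected = [part / parts * total for part in ratio]   # E_i = part_i / S * N
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                                # k - 1
    reject = chi_sq >= CRITICAL_05[df]
    return {"expected": expected, "chi_sq": chi_sq, "df": df, "reject_H0": reject}


result = goodness_of_fit([90, 30, 28, 12], [9, 3, 3, 1])
print(result["expected"])                 # [90.0, 30.0, 30.0, 10.0]
print(round(result["chi_sq"], 3))         # 0.533
print(result["df"], result["reject_H0"])  # 3 False
```

The function returns the intermediate values rather than only the verdict, since an FRQ usually asks you to show the expected counts and $df$ as well.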

Mini worked walkthrough (annotated)

Suppose a monohybrid cross expects $3:1$ purple:white. You observe $O = 65$ purple and $O = 35$ white.

Total $N = 100$
Expected counts: purple $E = \frac{3}{4} \cdot 100 = 75$ ; white $E = \frac{1}{4} \cdot 100 = 25$

Compute:

Purple term: $\frac{(65-75)^2}{75} = \frac{100}{75} \approx 1.33$
White term: $\frac{(35-25)^2}{25} = \frac{100}{25} = 4.00$

So:
$\chi^2 \approx 1.33 + 4.00 = 5.33$

Degrees of freedom: $df = 2-1 = 1$ .
Critical value at $\alpha = 0.05$ , $df = 1$ is $3.84$ .

Since $5.33 > 3.84$ , **reject** $H_0$ → the deviation from $3:1$ is larger than chance alone would readily explain.
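
The same walkthrough can be laid out as the category table from the workflow. The sketch below prints one row per category; the column widths and variable names are arbitrary choices made for this example.

```python
rows = [("purple", 65, 75.0), ("white", 35, 25.0)]  # (category, O, E)

print(f"{'category':<10}{'O':>5}{'E':>7}{'O-E':>7}{'(O-E)^2':>9}{'term':>7}")
total = 0.0
for category, o, e in rows:
    diff = o - e
    term = diff ** 2 / e          # (O - E)^2 / E for this category
    total += term
    print(f"{category:<10}{o:>5}{e:>7.1f}{diff:>7.1f}{diff ** 2:>9.1f}{term:>7.2f}")
print(f"chi-square = {total:.2f}")  # chi-square = 5.33
```

Both categories are off by $10$, but the white term is three times larger because it is divided by the smaller expected count.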

Key Formulas, Rules & Facts

Formulas and definitions

| Item | Formula | When to use | Notes |
| --- | --- | --- | --- |
| Chi-square statistic | $\chi^2 = \sum \frac{(O-E)^2}{E}$ | Goodness-of-fit for categorical counts | Bigger $\chi^2$ = bigger mismatch |
| Expected count from ratio | $E_i = \frac{\text{ratio part}_i}{\sum \text{parts}} \cdot N$ | Mendelian ratios (e.g., $9:3:3:1$ ) | Compute $E$ before chi-square |
| Degrees of freedom | $df = k - 1$ | Chi-square goodness-of-fit | $k$ = number of categories |
| Decision rule | Compare $\chi^2$ to critical value at $df$ and $\alpha$ | Most AP Bio problems | If $\chi^2 \ge \chi^2_{critical}$ → reject $H_0$ |

Common critical values (most used on AP Bio)

(These are for $\alpha = 0.05$ .)

| $df$ | $\chi^2_{critical}$ |
| --- | --- |
| $1$ | $3.84$ |
| $2$ | $5.99$ |
| $3$ | $7.81$ |
| $4$ | $9.49$ |
| $5$ | $11.07$ |
| $6$ | $12.59$ |
| $7$ | $14.07$ |

If your table gives ranges or multiple $\alpha$ values, AP Bio typically expects $\alpha = 0.05$ unless stated otherwise.
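
For the curious, two of the table entries can be sanity-checked with the standard library alone. This goes slightly beyond what AP Bio asks: it relies on the closed-form upper-tail probabilities of the chi-square distribution for $df = 1$ and $df = 2$, and the function names are made up for this sketch.

```python
import math


def p_value_df1(chi_sq):
    """Upper-tail probability for chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_sq / 2))


def p_value_df2(chi_sq):
    """Upper-tail probability for chi-square with 2 degrees of freedom."""
    return math.exp(-chi_sq / 2)


# The tabulated critical values should leave about 5% in the upper tail.
print(round(p_value_df1(3.84), 3))  # 0.05
print(round(p_value_df2(5.99), 3))  # 0.05
```

In other words, a critical value at $\alpha = 0.05$ is the $\chi^2$ value that chance alone would exceed only about $5\%$ of the time if $H_0$ were true.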

Assumptions / conditions you should check

Counts, not percentages (convert to counts if needed).
Categories are mutually exclusive (each observation fits one category).
Observations are independent (one offspring doesn’t determine another).
Expected counts should not be too small; a common rule is each $E \ge 5$ (AP-level expectation: “expected values should be sufficiently large”).
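
Two of these checks are easy to automate. The sketch below uses hypothetical helper names and a made-up $9:3:3:1$ cross with only $N = 48$ offspring to show how a small expected count would be flagged.

```python
def percentages_to_counts(percentages, total):
    """Convert category percentages (0-100) to counts, rounded to whole individuals."""
    return [round(pct / 100 * total) for pct in percentages]


def small_expected(expected, minimum=5):
    """Return the indices of categories whose expected count is below minimum."""
    return [i for i, e in enumerate(expected) if e < minimum]


# Hypothetical 9:3:3:1 cross with only N = 48 offspring.
expected = [part / 16 * 48 for part in (9, 3, 3, 1)]
print(expected)                  # [27.0, 9.0, 9.0, 3.0]
print(small_expected(expected))  # [3]
```

Here the last category has $E = 3$, so the $E \ge 5$ guideline is not met and the result should be interpreted with caution.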

Examples & Applications

Example 1: Monohybrid cross (fits expectation)

A cross predicts $3:1$ phenotype ratio. Observed: purple $547$ , white $193$ .

Total $N = 740$
Expected: purple $E = \frac{3}{4}\cdot 740 = 555$ ; white $E = \frac{1}{4}\cdot 740 = 185$

Compute terms:

Purple: $\frac{(547-555)^2}{555} = \frac{64}{555} \approx 0.115$
White: $\frac{(193-185)^2}{185} = \frac{64}{185} \approx 0.346$

$\chi^2 \approx 0.461$

$df = 2-1 = 1$ , critical $= 3.84$ .
Since $0.461 < 3.84$ , **fail to reject** $H_0$ → data are consistent with $3:1$ .

Example 2: Dihybrid cross (fits expectation)

Expected ratio $9:3:3:1$ for 4 phenotypes; total $N = 160$ .
Observed counts: $90, 30, 28, 12$ .

Expected counts:

$9/16\cdot160 = 90$
$3/16\cdot160 = 30$
$3/16\cdot160 = 30$
$1/16\cdot160 = 10$

The first two categories match their expected counts exactly, so they contribute $0$ ; only the last two terms are nonzero:

Third category: $\frac{(28-30)^2}{30} = \frac{4}{30} \approx 0.133$
Fourth category: $\frac{(12-10)^2}{10} = \frac{4}{10} = 0.4$

$\chi^2 \approx 0.533$

$df = 4-1 = 3$ , critical $= 7.81$ .
Since $0.533 < 7.81$ , **fail to reject** $H_0$ .

Example 3: Monohybrid cross (reject expectation)

Observed: $65$ dominant phenotype, $35$ recessive phenotype; expected $3:1$ .

$N = 100$
Expected: $75$ and $25$
$\chi^2 = \frac{(65-75)^2}{75} + \frac{(35-25)^2}{25} \approx 1.33 + 4.00 = 5.33$
$df = 1$ ; critical $3.84$

Since $5.33 > 3.84$ , **reject** $H_0$ → likely not a $3:1$ outcome (or some non-random factor impacted results).

Example 4: Hardy–Weinberg genotype fit

A population has allele frequencies $p = 0.6$ and $q = 0.4$ . Total individuals $N = 200$ .
Expected genotypes:

$p^2 = 0.36$ → $E(AA) = 0.36\cdot200 = 72$
$2pq = 0.48$ → $E(Aa) = 96$
$q^2 = 0.16$ → $E(aa) = 32$

Observed genotypes: $AA = 80$ , $Aa = 70$ , $aa = 50$ .

Compute:

$\frac{(80-72)^2}{72} = \frac{64}{72} \approx 0.889$
$\frac{(70-96)^2}{96} = \frac{676}{96} \approx 7.042$
$\frac{(50-32)^2}{32} = \frac{324}{32} = 10.125$

$\chi^2 \approx 18.056$

$df = 3-1 = 2$ (the allele frequencies here are given, not estimated from these counts); critical at $df=2$ is $5.99$ .
Since $18.056 > 5.99$ , **reject** $H_0$ → observed genotype frequencies significantly deviate from HW expectations.
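
The Hardy–Weinberg example can be reproduced with a short sketch. It takes $p$ as given, exactly as the example does; the function name `hardy_weinberg_expected` is an assumption made for illustration.

```python
def hardy_weinberg_expected(p, total):
    """Expected genotype counts (AA, Aa, aa) for allele frequency p."""
    q = 1 - p
    return [p * p * total, 2 * p * q * total, q * q * total]


observed = [80, 70, 50]                       # AA, Aa, aa
expected = hardy_weinberg_expected(0.6, 200)  # p^2 N, 2pq N, q^2 N
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(e, 1) for e in expected])  # [72.0, 96.0, 32.0]
print(round(chi_sq, 3))                 # 18.056
print(chi_sq >= 5.99)                   # True
```

Most of the statistic comes from the heterozygote deficit ($70$ observed vs. $96$ expected) and the excess of $aa$ individuals.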

Common Mistakes & Traps

Mixing up observed and expected
- Wrong: Plugging observed values into the expected column or vice versa.
- Why it matters: The whole statistic is based on differences $O-E$ .
- Fix: Always compute $E$ from the _ratio/probability_ first, then compare to $O$ .
Using the ratio numbers as expected counts without scaling
- Wrong: Treating $9:3:3:1$ as expected counts $9,3,3,1$ even when $N \ne 16$ .
- Fix: Convert ratio to fractions of the total and multiply by $N$ .
Forgetting to square the difference (or squaring after dividing)
- Wrong: Using $\frac{O-E}{E}$ or doing $\left(\frac{O-E}{E}\right)^2$ .
- Correct: $\frac{(O-E)^2}{E}$ (square first, then divide).
Incorrect degrees of freedom
- Wrong: Using $df = N-1$ (sample size) or $df = (\text{number of traits}) - 1$ .
- Correct: $df = k-1$ where $k$ is the number of categories you actually have counts for.
Saying “accept the null”
- Wrong: “We accept $H_0$ .”
- Why it’s wrong: Statistics doesn’t prove $H_0$ ; it only assesses evidence against it.
- Fix: Say fail to reject $H_0$ .
Using the wrong chi-square table column (wrong $\alpha$ )
- Wrong: Comparing to a $0.01$ column when the problem expects $0.05$ .
- Fix: Default to $\alpha = 0.05$ unless the prompt specifies otherwise.
Rounding too aggressively mid-calculation
- Wrong: Rounding each term heavily (e.g., to whole numbers).
- Fix: Keep a few decimals for each term; round at the end.
Ignoring small expected counts
- Issue: If some $E$ values are very small, the chi-square approximation becomes less reliable.
- Fix (AP-level): Note it as a limitation, or (if allowed) combine rare categories logically.
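
To see how much the arithmetic traps above change the answer, here is a small comparison on the $65$ / $35$ counts with a $3:1$ expectation. The variable names are illustrative only.

```python
observed = [65, 35]
expected = [75.0, 25.0]
pairs = list(zip(observed, expected))

correct = sum((o - e) ** 2 / e for o, e in pairs)          # square, then divide
not_squared = sum((o - e) / e for o, e in pairs)           # forgot to square
squared_after = sum(((o - e) / e) ** 2 for o, e in pairs)  # divided, then squared
unscaled = sum((o - e) ** 2 / e for o, e in zip(observed, [3, 1]))  # ratio used as E

print(round(correct, 2))        # 5.33
print(round(not_squared, 2))    # 0.27
print(round(squared_after, 2))  # 0.18
print(round(unscaled, 2))       # 2437.33
```

Only the first value leads to the right conclusion; the two small values would wrongly suggest a good fit, and the unscaled version is wildly inflated.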

Memory Aids & Quick Tricks

| Trick / mnemonic | What it helps you remember | When to use it |
| --- | --- | --- |
| “O–E, square, over E, then sum” | The exact structure of $\frac{(O-E)^2}{E}$ | Any chi-square computation |
| “df = boxes − 1” | $df = k-1$ where $k$ = number of categories | Choosing the right row in the table |
| “Big chi = bye null” | Large $\chi^2$ means poor fit → reject $H_0$ | Interpreting results quickly |
| Ratio → fractions → counts | Convert expected ratios to expected counts correctly | Genetics crosses with a ratio |
| Table-first habit | Prevents arithmetic/sign errors by organizing terms | FRQs and multi-category problems |

Quick Review Checklist

You can state $H_0$ (chance explains differences) and $H_A$ (chance doesn’t).
You can compute expected counts using $E_i = \frac{\text{part}}{\text{sum}}\cdot N$ .
You can calculate $\chi^2 = \sum \frac{(O-E)^2}{E}$ correctly (square first).
You can find $df = k-1$ using the number of categories.
You can use $\alpha = 0.05$ and compare to the correct critical value.
You conclude using: reject $H_0$ or **fail to reject** $H_0$ (in context).
You check that expected counts are reasonably large (ideally $E \ge 5$ ).

You’ve got this—if you can set up the $O/E$ table cleanly, the rest is just careful arithmetic.