ACT Math Statistics & Probability: Learning Notes with Worked Examples

Center and Spread of Distributions

When you collect numerical data (test scores, heights, reaction times), you usually want two big pieces of information:

  1. Center: a “typical” value—where the data tends to sit.
  2. Spread: how far the data typically varies around that center.

Thinking in terms of center and spread helps you compare groups quickly and makes graphs (like histograms and boxplots) meaningful rather than just “pictures of numbers.”

What “distribution” means

A distribution is the pattern of values a variable takes and how often each value occurs. You can see a distribution in:

  • a list of numbers
  • a frequency table
  • a histogram/dotplot
  • a boxplot

ACT questions often give you one of these representations and ask you to interpret center/spread or compare two distributions.

Measures of center

Mean (average) is the sum of the values divided by how many there are. It is sensitive to extreme values (outliers): if the data has one unusually large or small value, the mean gets pulled toward it.

If the data values are x_1, x_2, \dots, x_n, the mean is:

\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}

  • \bar{x} is the sample mean.
  • n is the number of data points.

Median is the middle value when data is ordered. It is resistant to outliers.

  • If n is odd: the median is the single middle value.
  • If n is even: the median is the average of the two middle values.

Mode is the most frequent value. It’s less emphasized on ACT Math but sometimes appears in “which measure changes” questions.

Why it matters: If a distribution is skewed (has a long tail), the median often represents “typical” better than the mean.
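To see the three measures of center side by side, here is a short sketch using Python's standard statistics module (the data set is the skewed list used in a worked example later in these notes):

```python
# Sketch: mean vs median vs mode on a small right-skewed data set.
# The single large value (30) pulls the mean up; the median resists it.
import statistics

data = [4, 5, 5, 6, 6, 6, 30]

mean = statistics.mean(data)      # sensitive to the outlier
median = statistics.median(data)  # resistant to the outlier
mode = statistics.mode(data)      # most frequent value

print(mean)    # about 8.86
print(median)  # 6
print(mode)    # 6
```

The gap between the mean (about 8.86) and the median (6) is exactly the skew/outlier effect described above.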

Measures of spread

Spread describes variability.

Range is the simplest measure:

\text{range} = \text{max} - \text{min}

Range is very sensitive to outliers because it uses only the two most extreme values.

Interquartile range (IQR) measures the spread of the middle 50%:

\text{IQR} = Q_3 - Q_1

  • Q_1 is the first quartile (25th percentile)
  • Q_3 is the third quartile (75th percentile)

IQR is resistant to outliers and connects naturally to boxplots.

Standard deviation measures typical distance from the mean. On ACT, you’re more likely to interpret standard deviation than compute it from scratch, but the idea is important:

  • Larger standard deviation means data points are more spread out.
  • Smaller standard deviation means data points cluster near the mean.

Formulas (mostly for understanding or if provided):

Population standard deviation:

\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}

Sample standard deviation:

s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}

  • \mu is population mean, N is population size.
  • \bar{x} is sample mean, n is sample size.
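If you want to check the two standard deviation formulas against each other, Python's statistics module implements both (the data set here is a made-up example chosen so the arithmetic comes out cleanly):

```python
# Sketch: population vs sample standard deviation.
# pstdev divides by N; stdev divides by n - 1 (Bessel's correction).
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

sigma = statistics.pstdev(data)  # sqrt(sum((x - mu)^2) / N)
s = statistics.stdev(data)       # sqrt(sum((x - xbar)^2) / (n - 1))

print(sigma)  # 2.0
print(s)      # about 2.14 (always a bit larger than pstdev)
```

Dividing by n - 1 instead of N makes the sample version slightly larger, which corrects for the fact that a sample tends to underestimate population spread.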

Shape, skew, and outliers

Skewed right means the right tail is longer (some large values). Typically:

  • mean > median

Skewed left means the left tail is longer (some small values). Typically:

  • mean < median

Outliers are values unusually far from the rest. A common rule tied to boxplots is the 1.5 IQR rule:

  • lower fence: Q_1 - 1.5\cdot \text{IQR}
  • upper fence: Q_3 + 1.5\cdot \text{IQR}

Values outside these fences are often flagged as outliers.
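The fence calculation can be sketched in a few lines of Python. Note that quartile conventions vary between textbooks and software, so the exact quartile values here assume the "inclusive" method; the data set is made up for illustration:

```python
# Sketch: the 1.5*IQR rule for flagging outliers.
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 15, 40]  # 40 looks suspicious

# Quartiles via the "inclusive" convention (an assumption; conventions vary).
q1, _q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(iqr)       # 8.0
print(outliers)  # [40]
```

Here Q_1 = 5 and Q_3 = 13, so the upper fence is 13 + 1.5(8) = 25, and 40 is flagged.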

What goes wrong: Students often choose mean as “best center” automatically. If the data are skewed or have an outlier, the median is often a better “typical value.”

Boxplots: reading center and spread visually

A boxplot shows:

  • minimum
  • Q_1
  • median
  • Q_3
  • maximum

The length of the box is the IQR. A longer box means more variability in the middle half of the data.

Worked examples

Example 1 (mean vs median with an outlier).
Data: 4, 5, 5, 6, 6, 6, 30

  • Mean:

\bar{x} = \frac{4+5+5+6+6+6+30}{7} = \frac{62}{7} \approx 8.86

  • Median: the 4th value (ordered list has 7 items) is 6.

The mean is much larger because 30 pulls it upward. The median better matches the “typical” cluster around 5–6.

Example 2 (IQR from quartiles).
Suppose Q_1 = 12 and Q_3 = 20.

\text{IQR} = 20 - 12 = 8

So the middle half of the data spans 8 units.

Exam Focus
  • Typical question patterns:
    • Compare two groups using mean/median and IQR/range from a boxplot.
    • Decide which measure of center is more appropriate given skew/outliers.
    • Predict how mean/median changes if a value is added or changed.
  • Common mistakes:
    • Using the mean when the problem context suggests outliers (income, home prices) where median is more representative.
    • Confusing IQR with range (IQR is middle 50%, range is max minus min).
    • Reading a boxplot incorrectly (the box is not “most of the data,” it’s the middle 50%).

Data Collection Methods and Bias

Statistics isn’t only about calculating numbers; it’s also about whether the numbers are trustworthy. How the data are collected determines what you’re allowed to conclude.

Populations, samples, and parameters

A population is the entire group you care about (all students in a school, all voters in a state). A sample is the subset you actually measure.

A parameter describes a population (like the true population mean \mu). A statistic describes a sample (like the sample mean \bar{x}). The big idea is that you use sample statistics to estimate population parameters.

Sampling methods

A good sample should be representative—it should look like the population in the ways that matter.

Simple random sample (SRS) means every individual has an equal chance of being selected.

Stratified sample: split the population into groups (strata) that matter (grade level, region), then randomly sample within each group. This is helpful when you want to ensure each subgroup is represented.

Cluster sample: split into clusters (like classrooms), randomly choose some clusters, then sample everyone in chosen clusters. This can be cheaper but risks picking unrepresentative clusters.

Convenience sample: sample whoever is easy to reach. This is common in real life and often biased.
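The difference between a simple random sample and a stratified sample can be sketched with the standard random module (the student roster here is entirely hypothetical):

```python
# Sketch: simple random sampling vs stratified sampling.
import random

random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical population: (student_id, grade) pairs, 25 per grade.
population = [(i, grade) for grade in (9, 10, 11, 12) for i in range(25)]

# Simple random sample: every student equally likely to be chosen.
srs = random.sample(population, 20)

# Stratified sample: randomly sample 5 within each grade, so every
# grade is guaranteed to be represented.
stratified = []
for grade in (9, 10, 11, 12):
    stratum = [s for s in population if s[1] == grade]
    stratified.extend(random.sample(stratum, 5))

print(len(srs), len(stratified))  # 20 20
```

An SRS of 20 could, by chance, miss a grade entirely; the stratified sample cannot, which is exactly why stratifying helps when subgroups matter.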

Why it matters: ACT questions often test whether a conclusion is valid based on the sampling method. “Random” isn’t a buzzword; it’s what allows generalization.

Bias: what it is and how it sneaks in

Bias is a systematic error that pushes results away from the truth.

Common types:

  • Selection bias: some groups are more likely to be included than others (surveying only people at a gym to estimate exercise habits).
  • Nonresponse bias: selected participants don’t respond, and nonresponders differ from responders.
  • Response bias: people lie or misremember (especially on sensitive questions).
  • Wording bias: leading questions push respondents.

A key skill is identifying what group is missing or overrepresented and how that would affect results.

Observational studies vs experiments

In an observational study, you observe outcomes without assigning treatments (for example, record sleep hours and GPA). Observational studies can show association (two variables move together) but cannot establish causation reliably.

In an experiment, researchers assign a treatment to subjects (new study method vs old study method). Experiments can support causal conclusions when well-designed.

Key experiment features:

  • Control group: baseline comparison.
  • Random assignment: subjects assigned to groups randomly; this reduces confounding.
  • Blinding: subjects (single-blind) and/or researchers (double-blind) don’t know group assignments; this reduces expectation effects.

Confounding variable: a variable related to both the explanatory variable and the response that can create misleading conclusions.

What goes wrong: Students often confuse “random sample” (for generalizing to a population) with “random assignment” (for cause-and-effect within an experiment). They solve different problems.

Worked examples

Example 1 (detecting selection bias).
A teacher wants to know average time students spend on homework. They ask students who attend after-school tutoring.

This likely overestimates homework time because tutoring attendees may spend more time on schoolwork than the typical student.

Example 2 (causation vs correlation).
A study finds that students who drink more water score higher on exams.

This is observational. A possible confounder is overall health habits (students with healthier routines might both drink water and study more). You cannot conclude water causes higher scores without an experiment.

Exam Focus
  • Typical question patterns:
    • Identify whether a study is observational or experimental and what conclusions are justified.
    • Spot the type of bias in a survey description.
    • Choose an improved sampling method (often “random” or “stratified”).
  • Common mistakes:
    • Claiming causation from an observational study.
    • Thinking a large convenience sample eliminates bias (size does not fix bias).
    • Mixing up random sampling with random assignment.

Bivariate Data and Scatterplots

So far, a distribution describes one variable. Bivariate data involves pairs of values—each individual contributes two measurements (like height and weight). The goal is to understand the relationship between the variables.

What a scatterplot shows

A scatterplot graphs paired data (x, y).

  • x is often the explanatory (independent) variable.
  • y is often the response (dependent) variable.

A scatterplot helps you see:

  • direction: positive (upward trend) or negative (downward trend)
  • form: linear, curved, or no clear pattern
  • strength: how tightly points cluster around a pattern
  • outliers: points far from the general trend

Why it matters: Many ACT questions are visual. You may be asked which scatterplot matches a description, or whether a linear model is reasonable.

Correlation: a number for linear strength

Correlation (usually denoted r) measures the strength and direction of a linear relationship.

  • r is between -1 and 1.
  • r > 0: positive association
  • r < 0: negative association
  • r \approx 0: weak linear association

Important limitations:

  • Correlation does not imply causation.
  • Correlation can be misleading if the relationship is nonlinear (a curved pattern can have r \approx 0 even when variables are strongly related).
  • Outliers can greatly change correlation.
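To make the outlier warning concrete, here is a sketch that computes Pearson's r directly from its definition (the data values are made up for illustration):

```python
# Sketch: computing Pearson's r by hand and watching one outlier
# drag a perfect correlation down.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]           # perfectly linear
print(pearson_r(x, y))          # 1.0

y_outlier = [2, 4, 6, 8, 1]     # last point breaks the pattern
print(pearson_r(x, y_outlier))  # about 0.11
```

One moved point takes r from exactly 1 down to near 0, which is why you should always look at the scatterplot, not just the number.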

Interpreting slope visually (rate of change)

Even before regression, you can interpret steepness:

  • A steeper upward trend means y increases more per unit of x.
  • A negative trend means y decreases as x increases.

This intuition supports later work with regression lines.

Worked examples

Example 1 (interpret a scatterplot description).
If you’re told: “As temperature increases, ice cream sales increase, and the points lie close to a line,” you should expect:

  • positive association
  • strong relationship
  • roughly linear form

So r would be positive and closer to 1 than to 0.

Example 2 (recognizing nonlinearity).
Suppose a scatterplot of braking distance vs speed curves upward (distance increases faster at higher speeds). A linear model may systematically underpredict at high speeds and overpredict at low speeds. That’s a sign you might need a curved model (such as quadratic).

Exam Focus
  • Typical question patterns:
    • Decide whether association is positive/negative/none from a scatterplot.
    • Identify outliers and describe their effect on a trend.
    • Judge whether a linear model is appropriate or whether curvature suggests another model.
  • Common mistakes:
    • Treating correlation as “how related” in general (it measures linear association only).
    • Ignoring outliers that clearly break the pattern.
    • Confusing steep slope with strong correlation (strength is about scatter around the pattern, not steepness).

Linear and Quadratic Regression

Once you see a relationship in bivariate data, you often want a model to predict y from x. Regression is the process of finding an equation that best fits the data.

On the ACT, you’re often given the regression equation or asked to interpret it, use it for prediction, or decide which model fits better.

Linear regression: fitting a line

A linear regression model uses a line:

y = mx + b

  • m is the slope (predicted change in y for a 1-unit increase in x).
  • b is the y-intercept (predicted y when x = 0).

Why it matters: A line is a simple, powerful model, but it only makes sense if the scatterplot is roughly linear.

Residuals: how far predictions miss

A residual is:

\text{residual} = y - \hat{y}

  • y is the actual observed value.
  • \hat{y} is the predicted value from the model.

Residuals tell you whether the model fits well:

  • small residuals (in magnitude) indicate good fit
  • a pattern in residuals (like a curve) suggests the model is missing a nonlinear relationship

Interpolation vs extrapolation

Interpolation predicts within the range of observed x values.

Extrapolation predicts beyond the observed range. Extrapolation is risky because the relationship may change outside the data range.

ACT questions sometimes test whether a prediction is reasonable given the data’s x-range.

Quadratic regression: fitting a curve

If the scatterplot shows curvature, a quadratic model may fit better:

y = ax^2 + bx + c

  • a controls how strongly the parabola curves and whether it opens up (a > 0) or down (a < 0).

Quadratic models appear when growth accelerates/decelerates, or when there’s a turning point.

How ACT tends to handle regression

ACT problems often avoid heavy computation and focus on interpretation:

  • Given y = 2.3x + 5.1, interpret 2.3 and 5.1 in context.
  • Use the model to predict y for a specific x.
  • Compare two models and choose the better fit based on a scatterplot.

Worked examples

Example 1 (using a linear regression equation).
A trend line for hours studied x and test score y is:

y = 4x + 60

Interpretation:

  • Slope 4: each additional hour studied predicts about 4 more points.
  • Intercept 60: a student who studies 0 hours is predicted to score 60.

Prediction: for x = 6 hours:

y = 4(6) + 60 = 84

Example 2 (residual and model fit).
Using the same model, suppose a student studied x = 6 hours and actually scored y = 78.

Predicted score \hat{y} = 84, so residual:

y - \hat{y} = 78 - 84 = -6

A negative residual means the model overpredicted by 6 points.
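The prediction and residual steps from Examples 1 and 2 can be checked with a few lines of code:

```python
# Sketch: prediction and residual for the trend line y = 4x + 60.
def predict(x):
    return 4 * x + 60  # slope 4, intercept 60

x = 6
y_hat = predict(x)        # predicted score
y_actual = 78             # observed score from Example 2
residual = y_actual - y_hat

print(y_hat)     # 84
print(residual)  # -6 (model overpredicted by 6)
```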

Example 3 (when quadratic is more sensible).
A ball is thrown upward. Its height rises, then falls. Height vs time has a clear turning point, so a quadratic model is more appropriate than a line.

Exam Focus
  • Typical question patterns:
    • Interpret slope and intercept in words, using context.
    • Plug in an x-value to predict y and compare to an actual value (residual).
    • Decide whether linear or quadratic is a better fit based on the scatterplot’s shape.
  • Common mistakes:
    • Treating the intercept as meaningful when x = 0 is outside the data range (it may be a mathematical artifact).
    • Extrapolating far beyond the observed data and trusting the result.
    • Thinking “best fit” means the line passes through most points exactly (it usually won’t).

Calculating Probabilities and Sample Spaces

Probability is about quantifying uncertainty. On ACT Math, probability problems typically involve counting outcomes correctly and using the right denominator.

The basic probability rule

If all outcomes are equally likely:

P(A) = \frac{\text{number of favorable outcomes}}{\text{total number of outcomes}}

A good habit is to build the sample space (the set of all possible outcomes) before you calculate.
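Building the sample space explicitly is easy to do in code. Here is a sketch for two fair dice (a made-up illustration, not from the notes' worked examples):

```python
# Sketch: enumerate the full sample space for two dice, then count
# favorable outcomes to get a probability.
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs

favorable = [pair for pair in sample_space if sum(pair) == 7]
p = len(favorable) / len(sample_space)

print(len(sample_space))  # 36
print(len(favorable))     # 6
print(p)                  # 6/36, about 0.167
```

Listing all 36 pairs before counting is exactly the "build the sample space first" habit.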

Complements: “at least one” problems

The complement of event A is “not A,” written as A^c. The key rule:

P(A^c) = 1 - P(A)

This is extremely useful when “at least one” is hard to count directly.

Addition rule (including overlap)

For events A and B:

P(A \cup B) = P(A) + P(B) - P(A \cap B)

  • A \cup B means “A or B (or both).”
  • A \cap B means “A and B.”

If A and B are mutually exclusive (cannot happen together), then P(A \cap B) = 0 and the formula simplifies.

Multiplication rule and independence

For events A and B:

P(A \cap B) = P(A)P(B|A)

If events are independent (one does not affect the other), then P(B|A) = P(B), so:

P(A \cap B) = P(A)P(B)

What goes wrong: Students often multiply probabilities when they see the word “and,” but “and” requires careful thinking: are the events independent? Are you sampling with or without replacement?

Conditional probability

Conditional probability means the probability of B given A happened:

P(B|A) = \frac{P(A \cap B)}{P(A)}

A common ACT setup is a table of counts (two-way table). The denominator for conditional probability is the total within the given condition.
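The "restricted denominator" idea is easy to verify with exact fractions. The counts here match the class described in a worked example later in this section (12 students play a sport, 9 of whom also play an instrument):

```python
# Sketch: conditional probability restricts the denominator to the
# "given" group, not the whole class.
from fractions import Fraction

sport = 12          # students who play a sport (the "given" group)
sport_and_inst = 9  # of those, students who also play an instrument

p_inst_given_sport = Fraction(sport_and_inst, sport)
print(p_inst_given_sport)  # 3/4
```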

Counting outcomes: fundamental counting principle

If one choice can be made in m ways and, for each of those, a second choice can be made in n ways, then the two choices together can be made in mn ways.

This supports counting sample spaces.

Permutations and combinations

These appear when you’re choosing items.

Permutation (order matters):

P(n,r) = \frac{n!}{(n-r)!}

Combination (order does not matter):

\binom{n}{r} = \frac{n!}{r!(n-r)!}

  • n! means factorial: n! = n(n-1)(n-2)\cdots 2\cdot 1

Quick decision tip: if arranging or assigning roles, order matters (permutation). If selecting a group where order doesn’t matter, use combinations.
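Python's math module has both counts built in (math.perm and math.comb, available in Python 3.8+), which makes the order-matters distinction easy to check:

```python
# Sketch: permutations vs combinations for choosing 3 of 10 students.
import math

# Assigning 3 distinct roles (president, VP, secretary): order matters.
print(math.perm(10, 3))  # 720

# Choosing a 3-person committee: order does not matter.
print(math.comb(10, 3))  # 120

# The two counts differ by the 3! orderings of each chosen group.
print(math.perm(10, 3) // math.factorial(3))  # 120
```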

Worked examples

Example 1 (equally likely outcomes).
A fair die is rolled. Probability of rolling a number greater than 4:

Favorable outcomes: {5,6} so 2 outcomes. Total outcomes: 6.

P(\text{greater than 4}) = \frac{2}{6} = \frac{1}{3}

Example 2 (conditional probability from counts).
In a class, 12 students play a sport, 18 do not. Of the 12 who play a sport, 9 also play an instrument.

Probability a randomly selected student plays an instrument given they play a sport:

The “given sport” group size is 12, and favorable (sport and instrument) is 9.

P(\text{instrument} | \text{sport}) = \frac{9}{12} = \frac{3}{4}

Example 3 (complement for “at least one”).
A bag has 5 red and 3 blue marbles. Two marbles are drawn without replacement. Find probability of at least one blue.

It’s easier to compute the complement: “no blue” means both are red.

P(\text{both red}) = \frac{5}{8} \cdot \frac{4}{7} = \frac{20}{56} = \frac{5}{14}

So:

P(\text{at least one blue}) = 1 - \frac{5}{14} = \frac{9}{14}
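The complement calculation from Example 3 can be verified exactly with fractions:

```python
# Sketch: "at least one blue" via the complement, drawing two marbles
# without replacement from 5 red and 3 blue.
from fractions import Fraction

p_both_red = Fraction(5, 8) * Fraction(4, 7)  # "no blue" = both red
p_at_least_one_blue = 1 - p_both_red

print(p_both_red)           # 5/14
print(p_at_least_one_blue)  # 9/14
```

Note the second factor is 4/7, not 5/8: without replacement, the first draw changes the counts for the second.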

Example 4 (combinations).
How many ways to choose 3 students from 10 to form a committee?

Order doesn’t matter:

\binom{10}{3} = \frac{10!}{3!7!} = \frac{10\cdot 9\cdot 8}{3\cdot 2\cdot 1} = 120

Exam Focus
  • Typical question patterns:
    • Build a sample space (often using counting) and compute a probability.
    • Use complements for “at least one” or “not” events.
    • Use a two-way table to compute conditional probabilities.
  • Common mistakes:
    • Using the wrong denominator for conditional probability (it must be restricted to the “given” group).
    • Forgetting “without replacement” changes probabilities on the second draw.
    • Mixing up permutations and combinations (order matters vs not).

Normal Distributions

A normal distribution is a common model for real-world measurements influenced by many small effects (heights, measurement errors). It’s the classic “bell curve.” On the ACT, you’re typically asked to interpret a normal curve diagram, use the empirical rule, compare values using z-scores, or reason about percentages.

Key features of the normal curve

A normal distribution is:

  • symmetric around its center
  • unimodal (one peak)
  • fully described by two parameters:
    • the mean \mu (center)
    • the standard deviation \sigma (spread)

Because it’s symmetric, the mean and median coincide.

Standardizing with z-scores

A z-score tells how many standard deviations a value x is from the mean:

z = \frac{x - \mu}{\sigma}

  • If z = 0, then x = \mu.
  • If z = 1, then x is one standard deviation above the mean.
  • If z = -2, then x is two standard deviations below the mean.

Why it matters: z-scores let you compare values from different normal distributions. For example, a score of 85 might be impressive on a hard test but average on an easy test—z-scores capture that.
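That comparison is easy to sketch in code; the two test distributions below are made-up numbers chosen to illustrate the point:

```python
# Sketch: the same raw score can have very different z-scores on
# different tests.
def z_score(x, mu, sigma):
    return (x - mu) / sigma

# A raw score of 85 on two hypothetical tests:
z_hard = z_score(85, 70, 10)  # hard test: mean 70, sd 10
z_easy = z_score(85, 85, 10)  # easy test: mean 85, sd 10

print(z_hard)  # 1.5 (well above average)
print(z_easy)  # 0.0 (exactly average)
```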

The empirical rule (68–95–99.7)

For a normal distribution:

  • About 68% of data fall within \mu \pm \sigma.
  • About 95% fall within \mu \pm 2\sigma.
  • About 99.7% fall within \mu \pm 3\sigma.

Because the curve is symmetric, you can split these percentages across left and right halves.

For example, within \mu \pm 2\sigma is about 95%, so outside that range is about 5% total, about 2.5% in each tail.
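The tail-splitting arithmetic can be written as a tiny lookup (all values here are the usual empirical-rule approximations, not exact normal probabilities):

```python
# Sketch: the 68-95-99.7 rule, plus splitting the outside area
# evenly between the two tails by symmetry.
within = {1: 0.68, 2: 0.95, 3: 0.997}  # approx. P(within k standard deviations)

k = 2
outside = 1 - within[k]  # both tails together
one_tail = outside / 2   # symmetric split

print(outside)   # about 0.05
print(one_tail)  # about 0.025
```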

Using areas as probabilities

In a normal distribution model, “probability” corresponds to “area under the curve.” So questions like “What is the probability a randomly selected value is above x?” translate to “What fraction of the area is to the right of x?”

Often ACT problems avoid detailed z-tables and instead use:

  • the empirical rule
  • labeled diagrams
  • symmetry arguments

Worked examples

Example 1 (z-score).
Test scores are approximately normal with mean \mu = 70 and standard deviation \sigma = 10. What is the z-score of x = 85?

z = \frac{85 - 70}{10} = \frac{15}{10} = 1.5

So 85 is 1.5 standard deviations above the mean.

Example 2 (empirical rule probability).
Heights are approximately normal with mean \mu = 64 inches and standard deviation \sigma = 3 inches. About what percent of heights are between 61 and 67 inches?

Notice 61 and 67 are one standard deviation below and above the mean:

  • 64 - 3 = 61
  • 64 + 3 = 67

So this is \mu \pm \sigma, which is about 68%.

Example 3 (tail probability using empirical rule).
Using the same height distribution, about what percent are taller than 70 inches?

Compute how many standard deviations above the mean 70 is:

z = \frac{70 - 64}{3} = 2

So 70 is \mu + 2\sigma. About 95% are within \mu \pm 2\sigma, leaving 5% outside, split equally:

  • About 2.5% are above \mu + 2\sigma.

Exam Focus
  • Typical question patterns:
    • Compute and interpret a z-score (“how unusual is this value?”).
    • Use the empirical rule to estimate percentages within 1, 2, or 3 standard deviations.
    • Use symmetry to find “above” or “below” probabilities from a diagram.
  • Common mistakes:
    • Confusing variance with standard deviation (variance is \sigma^2; standard deviation is its square root).
    • Forgetting that the two tails split the outside area evenly in a symmetric normal distribution.
    • Treating the empirical rule as exact rather than an approximation.