Unit 4: Probability, Random Variables, and Probability Distributions

Probability as a long-run model for randomness (Law of Large Numbers)

Probability in AP Statistics is not mainly about “guessing.” It’s a way to model random processes so you can predict how often outcomes happen in the long run. The key idea is that even when individual outcomes are uncertain, patterns across many repetitions can be predictable.

A random process is one where individual outcomes can’t be predicted with certainty, but there is still long-run regularity. For example, when flipping a fair coin repeatedly, you can’t predict the next flip, but over many flips the proportion of heads tends to settle near 0.5.

This long-run stability is formalized by the Law of Large Numbers: as an experiment is repeated many times, the relative frequency of an event tends to get closer to the true probability of the event. In other words, probability is long-run relative frequency, which is what gives randomness its long-run predictability.

The Law of Large Numbers relies on two conditions:

  • The chance process (and the probability of the event) does not change from trial to trial.
  • Conclusions are based on a very large number of observations.

A helpful mental model is:

  • Short run: results can look “weird” (streaks happen).
  • Long run: proportions stabilize near the theoretical probability.

A common misconception is the gambler’s fallacy: believing that after a long streak, the opposite outcome is “due.” Random processes don’t have memory in that way. If a coin is fair, the probability of heads on the next flip is still 0.5 no matter what happened before.
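A quick simulation illustrates this stabilization (a minimal sketch; the seed and checkpoint counts are arbitrary choices, not part of any example in this unit):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

heads = 0
running = {}                          # checkpoint -> proportion of heads so far
for flip in range(1, 10_001):
    heads += random.random() < 0.5    # True counts as 1 head
    if flip in (10, 100, 1_000, 10_000):
        running[flip] = heads / flip

# Short-run proportions can wander; the long-run proportion settles near 0.5.
for n, prop in running.items():
    print(f"after {n:>6} flips: proportion of heads = {prop:.3f}")
```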

Example 4.1 (Law of Large Numbers and strategy)

There are two games involving flipping a fair coin.

  • Game 1: You win a prize if the percent of heads is between 45% and 55%.
  • Game 2: You win if the percent of heads is more than 60%.

For each game, would you rather flip 20 times or 200 times?

Solution: The true probability of heads is 0.5. By the Law of Large Numbers, the more tosses you make, the closer the relative frequency tends to be to 0.5.

  • In Game 1, you want the relative frequency to be near 0.5, so you would rather have 200 flips.
  • In Game 2, you want an unusually high head rate (more than 60%), which is more plausible in a short run than a long run, so you would rather have only 20 flips.
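The strategy in the solution can also be checked exactly with binomial probabilities (a sketch using only the standard library; the cutoffs treat 45%–55% as inclusive and "more than 60%" as strict):

```python
from math import comb

def prob_heads_between(n, lo, hi):
    """P(lo <= X <= hi) for X ~ Bin(n, 0.5), the number of heads in n fair flips."""
    return sum(comb(n, k) for k in range(lo, hi + 1)) / 2**n

# Game 1: percent of heads between 45% and 55%
game1_short = prob_heads_between(20, 9, 11)      # 9-11 heads of 20
game1_long  = prob_heads_between(200, 90, 110)   # 90-110 heads of 200

# Game 2: percent of heads more than 60%
game2_short = prob_heads_between(20, 13, 20)     # 13+ heads of 20
game2_long  = prob_heads_between(200, 121, 200)  # 121+ heads of 200
```

The numbers confirm the Law of Large Numbers reasoning: the near-0.5 window is far more likely with 200 flips, while the unusually-high-heads window is far more likely with only 20.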

Simulation as a probability tool

When a situation is too complicated for exact probability calculations (or when you want to check your reasoning), you can use a simulation. A simulation imitates a random process using a device like random digit tables, a random number generator, or physical randomizers.

The basic structure:

  1. Define what counts as an outcome.
  2. Assign random numbers/digits to represent outcomes with the correct probabilities.
  3. Run trials that mimic the real process.
  4. Repeat many times.
  5. Use the proportion of simulated outcomes as an estimate of probability.

Worked example: setting up a simulation

A basketball player makes a free throw with probability 0.7. Estimate the probability they make at least 2 of their next 3 free throws.

  • Model: each shot is a success with probability 0.7.
  • Assign digits 00–99:
    • 00–69 = make
    • 70–99 = miss
  • One trial (read three two-digit numbers): 12, 88, 03 gives make, miss, make, so 2 makes (counts as “at least 2”).

Repeat 100–200 trials; the fraction with at least 2 makes estimates the probability.
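The free-throw setup above translates directly into code (a sketch; the trial count and seed are arbitrary, and the exact binomial answer is included for comparison):

```python
import random

random.seed(1)  # reproducible run

def one_trial():
    """Simulate 3 shots at 70%; return True if at least 2 are made."""
    makes = sum(random.random() < 0.7 for _ in range(3))
    return makes >= 2

n_trials = 10_000
estimate = sum(one_trial() for _ in range(n_trials)) / n_trials

# Exact answer for comparison: P(X >= 2) for X ~ Bin(3, 0.7)
exact = 3 * 0.7**2 * 0.3 + 0.7**3
```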

A common simulation mistake is assigning digits incorrectly so the simulated probability does not match the context (for example, using 00–70 for 0.7, which actually represents 0.71 if 70 is included).

Exam Focus
  • Typical question patterns:
    • “Describe how you would use a simulation to estimate…” (must specify assignments and what counts as success)
    • “Use the results of a simulation to estimate a probability” (compute a proportion)
    • “Explain why the simulation is appropriate / what assumptions it makes” (often independence and stable probability)
  • Common mistakes:
    • Forgetting to define a clear success condition for each trial
    • Assigning random digits incorrectly so the simulated probabilities don’t match the context
    • Running too few trials and treating the estimate as exact

Building probability models with events and sample spaces

To calculate probabilities, start with a clear description of possible outcomes.

Sample spaces and events

A sample space is the set of all possible outcomes. An event is a subset of outcomes.

Example: Roll a standard die.

  • Sample space: {1, 2, 3, 4, 5, 6}
  • Event A = “roll an even number” = {2, 4, 6}

Equally likely outcomes (and when that assumption fails)

If outcomes are equally likely, then:

P(A)=\frac{\text{number of outcomes in }A}{\text{number of outcomes in sample space}}

Many real situations are not equally likely, so you must rely on given probabilities, data, or an appropriate model.

Event relationships: complements, unions, intersections

  • Complement A^c: “A does not happen”

P(A^c)=1-P(A)

  • Union A \cup B: “A or B (or both)” (inclusive OR)
  • Intersection A \cap B: “A and B”

Mutually exclusive events

Events are mutually exclusive (disjoint) if they cannot both occur:

P(A \cap B)=0

Example: On one die roll, “roll a 2” and “roll a 5” are mutually exclusive, but “roll a 2” and “roll an even number” are not.
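Python's set operations mirror these event relationships; a minimal sketch using the die-roll events:

```python
# Sample space for one roll of a fair die, plus events
sample_space = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}    # "roll an even number"
two  = {2}          # "roll a 2"
five = {5}          # "roll a 5"

def prob(event):
    """Equally likely outcomes: favorable count over total count."""
    return len(event) / len(sample_space)

disjoint = (two & five == set())          # mutually exclusive: empty intersection
overlap = two & even                      # shared outcome {2}: not disjoint
p_complement = prob(sample_space - even)  # complement rule gives 1 - P(even)
```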

Exam Focus
  • Typical question patterns:
    • Translate words into symbols (union/intersection/complement)
    • Use Venn-diagram logic to compute probabilities
    • Decide whether events are mutually exclusive based on the context
  • Common mistakes:
    • Treating “or” as exclusive when it is inclusive
    • Confusing mutually exclusive with independent
    • Forgetting complements in “at least one” or “none” situations

Probability rules you actually use (and how to choose the right one)

Probability rules are the “grammar” of probability models. The key is matching the rule to the structure.

The addition rule (general)

For any events A and B:

P(A \cup B)=P(A)+P(B)-P(A \cap B)

If A and B are mutually exclusive, then P(A \cap B)=0 and:

P(A \cup B)=P(A)+P(B)

The multiplication rule (general)

“And” probabilities are often handled with conditional probability:

P(A \cap B)=P(A)P(B\mid A)

Equivalently:

P(A \cap B)=P(B)P(A\mid B)

Using complements strategically

Many “at least one” problems are easiest via the complement.

P(\text{at least one})=1-P(\text{none})

Worked example: addition rule in context

A school reports:

  • P(\text{sport})=0.40
  • P(\text{band})=0.25
  • P(\text{both})=0.10

Find P(\text{sport or band}).

P(A \cup B)=0.40+0.25-0.10=0.55

Worked example: complement with repeated trials

A machine produces a defective item with probability 0.02. Inspect 10 items; find the probability of at least one defective (assume independence).

P(\text{at least one defective})=1-P(\text{none defective})

P(\text{none defective})=0.98^{10}

P(\text{at least one defective})=1-0.98^{10}

Common mistake: computing 10 \times 0.02 and treating it as a probability; that is an expected count, not a probability.
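The complement calculation, and the contrast with the expected count, in a few lines (a sketch):

```python
p_defective = 0.02
n = 10

p_none = (1 - p_defective) ** n    # P(no defectives among 10) = 0.98^10
p_at_least_one = 1 - p_none        # complement rule, about 0.183

expected_defectives = n * p_defective  # 0.2 -- an expected count, not a probability
```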

Exam Focus
  • Typical question patterns:
    • “Find the probability of at least one…” (often complements)
    • “Given P(A), P(B), and P(A \cap B), find P(A \cup B)”
    • Two-step “and” events requiring multiplication with a conditional probability
  • Common mistakes:
    • Using the mutually exclusive addition rule when events can overlap
    • Multiplying repeated-trial probabilities without justifying independence
    • Confusing expected value (like np) with probability

Conditional probability, independence, and multistage probability

Conditional probability updates probabilities given new information. Independence describes when that new information doesn’t change anything.

Conditional probability

The conditional probability P(A\mid B) is the probability that A occurs given that B occurred:

P(A\mid B)=\frac{P(A \cap B)}{P(B)}

This requires P(B)>0.

Independence

Events A and B are independent if learning one occurred doesn’t change the probability of the other. Equivalent statements include:

P(A\mid B)=P(A)

and

P(A \cap B)=P(A)P(B)

Mutually exclusive events with positive probabilities cannot be independent: their intersection has probability 0, while independence would require P(A \cap B)=P(A)P(B)>0.

Example 4.2 (Basic probability rules using a two-way table)

A standard literacy test has 100 multiple-choice questions, each with five possible answers. There is no penalty for guessing. A score of 60 is passing; 80 is superior. When answers are unknown, test takers use one of three strategies: guess, choose answer (c), or choose the longest answer. A two-way table summarizes results for 1000 test takers. Using that table:

  1. Probability a test taker uses the “guess” strategy:

P(\text{guess})=0.3

  2. Probability a test taker scores 60–79:

P(60\text{–}79)=0.53

  3. Probability a test taker does not score 60–79:

P(\text{not }60\text{–}79)=1-0.53=0.47

  4. Probability a test taker chooses strategy “answer (c)” and scores 80–100 (a joint probability): compute

P(\text{answer (c)} \cap 80\text{–}100)=\frac{\text{count in the (answer (c), 80–100) cell}}{1000}

  5. Probability a test taker chooses “longest answer” or scores 0–59:

P(\text{longest} \cup 0\text{–}59)=P(\text{longest})+P(0\text{–}59)-P(\text{longest} \cap 0\text{–}59)

(all quantities are found from the table by dividing counts by 1000)

  6. Probability a test taker chooses “guess” given the score was 0–59:

P(\text{guess}\mid 0\text{–}59)=0.333

  7. Probability a test taker scored 80–100 given they chose “longest answer”: compute

P(80\text{–}100\mid \text{longest})=\frac{\text{count in the (longest, 80–100) cell}}{\text{total count in the longest row}}

  8. Are “guess” and scoring 0–59 independent? Check whether

P(\text{guess}\mid 0\text{–}59)=P(\text{guess})

Since 0.333 \ne 0.3, they are not independent.

  9. Are “longest answer” and scoring 80–100 mutually exclusive? Since

P(\text{longest} \cap 80\text{–}100)=\frac{135}{1000}

which is not 0, they are not mutually exclusive.

A recurring skill here is choosing the correct denominator: conditional probabilities use the row/column total determined by the “given.”

Two-way tables (a reliable tool)

Two-way tables force clarity about totals and conditions.

Example: 200 students were surveyed about having a driver’s license and a part-time job.

                 Job (Yes)   Job (No)   Total
License (Yes)       60          40       100
License (No)        30          70       100
Total               90         110       200

Let L = has a license, J = has a job.

  • P(L)=100/200=0.5
  • P(J)=90/200=0.45
  • P(L \cap J)=60/200=0.3
  • P(J\mid L)=60/100=0.6
  • P(L\mid J)=60/90

To check independence, test whether P(J\mid L)=P(J). Here 0.6 \ne 0.45, so not independent.
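Working from counts makes the denominators explicit (a sketch using the table above):

```python
# Counts from the 200-student table, keyed by (license, job)
counts = {
    ("yes", "yes"): 60, ("yes", "no"): 40,
    ("no",  "yes"): 30, ("no",  "no"): 70,
}
total = sum(counts.values())   # 200

license_total = counts[("yes", "yes")] + counts[("yes", "no")]   # row total: 100
job_total     = counts[("yes", "yes")] + counts[("no", "yes")]   # column total: 90

p_J = job_total / total                                 # 0.45
p_J_given_L = counts[("yes", "yes")] / license_total    # 0.6: denominator is the "given" group

# Independence check: P(J | L) should equal P(J); here 0.6 != 0.45, so not independent
```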

Tree diagrams and multistage probability

Tree diagrams help when events happen in stages; each branch is labeled with a conditional probability. Multiply along a path to get the probability of that full sequence, and add path probabilities to get totals.

Example 4.3 (Multistage probability using a tree)

On a university campus, 60%, 30%, and 10% of computers use Windows, Apple, and Linux, respectively. A new virus affects 3% of Windows, 2% of Apple, and 1% of Linux systems. The probability a randomly selected computer has the virus is:

P(\text{virus})=0.60(0.03)+0.30(0.02)+0.10(0.01)=0.025
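The tree calculation is the law of total probability: multiply along each branch, then add the paths (a sketch):

```python
# (share of campus computers, virus rate on that platform)
branches = {
    "Windows": (0.60, 0.03),
    "Apple":   (0.30, 0.02),
    "Linux":   (0.10, 0.01),
}

# Multiply along each path, then add across paths
p_virus = sum(share * rate for share, rate in branches.values())   # 0.025
```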

Bayes’ rule (as a reasoning tool)

AP Statistics often expects you to “reverse a condition,” using a table, tree, or rates.

P(A\mid B)=\frac{P(B\mid A)P(A)}{P(B)}

Often it is easier and less error-prone to use counts in a table rather than memorizing the formula.

Worked example: Bayes with a real interpretation

A disease affects 1% of a population. A test has:

  • P(\text{positive}\mid \text{disease})=0.95
  • P(\text{positive}\mid \text{no disease})=0.02

Using a base of 10,000 people:

  • Disease: 100, with 95 positives
  • No disease: 9900, with 198 positives

Total positives: 293, so

P(\text{disease}\mid \text{positive})=\frac{95}{293}

Key lesson: rare conditions create many false positives even with a good test.
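The same counting argument in code (a sketch; the base of 10,000 people is the one used above):

```python
base = 10_000
diseased = base * 0.01          # 100 people have the disease
healthy = base - diseased       # 9900 do not

true_positives = diseased * 0.95    # 95 correct positives
false_positives = healthy * 0.02    # 198 false positives

p_disease_given_positive = true_positives / (true_positives + false_positives)
# 95 / 293: only about a third of positives actually have the disease
```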

Exam Focus
  • Typical question patterns:
    • Compute P(A\mid B) from a two-way table (“given that,” “among those who…”)
    • Decide independence using conditional probabilities or P(A \cap B)=P(A)P(B)
    • Multistage probability with a tree diagram (add paths for totals)
    • Reverse a conditional probability (Bayes reasoning)
  • Common mistakes:
    • Using the grand total when a conditional probability is requested
    • Declaring independence because values “look close” rather than checking equality
    • Mixing up P(A\mid B) and P(B\mid A)

Discrete random variables, probability distributions, and cumulative distributions

A random variable turns outcomes into numbers so you can compute averages, variability, and compare situations quantitatively.

Random variables

A random variable is a numerical variable whose value comes from a random process. In this unit the focus is mostly on discrete random variables, which take a countable set of values.

Examples:

  • X = number of heads in 5 coin flips (0–5)
  • Y = number of customers arriving in the next hour (0, 1, 2, …)

Probability distribution (discrete)

A probability distribution is a list or formula that gives the probability of each outcome/value.

Rules:

  • Each probability satisfies 0 \le P(X=x) \le 1
  • Total probability sums to 1:

\sum P(X=x)=1

A common mistake is confusing values of X with probabilities; a correct table must clearly separate possible x values from P(X=x).

Worked example: checking whether a table is a valid distribution

A proposed distribution:

x          0      1      2      3
P(X=x)   0.10   0.30   0.40   0.25

Total probability is:

0.10+0.30+0.40+0.25=1.05

This is not valid because probabilities must sum to 1.
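The two validity rules are easy to encode (a sketch; the "corrected" table below is a hypothetical fix for illustration, not part of the original problem):

```python
def is_valid_distribution(probs, tol=1e-9):
    """A discrete distribution needs every probability in [0, 1] and a total of 1."""
    return all(0 <= p <= 1 for p in probs) and abs(sum(probs) - 1) <= tol

proposed = [0.10, 0.30, 0.40, 0.25]    # totals 1.05 -- not valid
corrected = [0.10, 0.30, 0.40, 0.20]   # hypothetical fix totaling 1.00
```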

Cumulative probability distribution

A cumulative probability distribution gives, for each value x, the probability that the random variable takes a value less than or equal to x. In notation, it focuses on values like P(X \le x).

Example 4.11 (Cumulative probability in context)

Using 2019 AP Statistics exam score probabilities, the probability a student did not receive college credit (assuming a 3 or higher earns credit) is the probability of scoring a 1 or 2:

P(\text{no credit})=0.211+0.197=0.408

Exam Focus
  • Typical question patterns:
    • “Is this a valid probability distribution? Justify.” (nonnegative; sum to 1)
    • Construct a probability distribution from a description or from counts
    • Interpret P(X \le k), P(X < k), P(X \ge k) from a table (watch inclusive vs exclusive)
    • Use cumulative probabilities to answer “at most,” “no more than,” and “less than or equal to” questions
  • Common mistakes:
    • Forgetting the total must be 1 (within rounding)
    • Using counts as probabilities without dividing by the total
    • Misreading “at most” (means \le) vs “less than” (means <)

Mean (expected value), variance, and standard deviation of a discrete random variable

Once you have a distribution, you can compute the “typical” value and how much it varies.

Expected value (mean)

The expected value is the long-run average:

\mu_X=E(X)=\sum xP(X=x)

The mean does not have to be a value the random variable can actually take (for example, expected heads in one flip is 0.5).

Variance and standard deviation

Variance is expected squared distance from the mean:

\sigma_X^2=\sum (x-\mu_X)^2P(X=x)

Standard deviation is:

\sigma_X=\sqrt{\sigma_X^2}

A useful shortcut is:

\sigma_X^2=E(X^2)-(E(X))^2

where

E(X^2)=\sum x^2P(X=x)

Worked example: compute mean and standard deviation

A game pays based on a spin:

x (dollars won)    0     5     20
P(X=x)            0.6   0.3   0.1

Mean:

\mu_X=0(0.6)+5(0.3)+20(0.1)=3.5

Variance:

\sigma_X^2=(0-3.5)^2(0.6)+(5-3.5)^2(0.3)+(20-3.5)^2(0.1)=35.25

Standard deviation:

\sigma_X=\sqrt{35.25}
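Both the definition and the shortcut formula can be checked numerically (a sketch for the spin-game table):

```python
from math import sqrt

values = [0, 5, 20]
probs = [0.6, 0.3, 0.1]

mean = sum(x * p for x, p in zip(values, probs))                    # 3.5
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))  # 35.25
sd = sqrt(variance)

# Shortcut: Var(X) = E(X^2) - (E(X))^2
e_x2 = sum(x * x * p for x, p in zip(values, probs))                # 47.5
shortcut_variance = e_x2 - mean ** 2
```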

Example 4.4 (Lottery expected payoff)

A charity sells 10,000 tickets at 1 dollar each. One winner receives 7,500 dollars.

For a ticket holder, the payoff accounts for the cost of the ticket:

  • Win: payoff 7,499 with probability 1/10000
  • Lose: payoff −1 with probability 9999/10000

Expected payoff:

E(X)=7499\left(\frac{1}{10000}\right)+(-1)\left(\frac{9999}{10000}\right)=-0.25

Interpretation: the average result per ticket holder is a 0.25 dollar loss (equivalently, the charity’s expected gain is 0.25 dollars per ticket).

Example 4.5 (Mean, variance, standard deviation; “surprising”)

A highway engineer models miles of highway laid per day:

  • Clear: 5 miles with probability 0.6
  • Rain: 2 miles with probability 0.3
  • Snow: 1 mile with probability 0.1

Mean:

\mu=5(0.6)+2(0.3)+1(0.1)=3.7

Compute E(X^2):

E(X^2)=25(0.6)+4(0.3)+1(0.1)=16.3

Variance:

\sigma^2=16.3-(3.7)^2=2.61

Standard deviation:

\sigma=\sqrt{2.61}

Interpretation: in the long run, the crew averages 3.7 miles per day, and day-to-day mileage typically varies by about 1.62 miles from the mean.

Would it be surprising to lay 10 miles in one day? The z-score is:

z=\frac{10-3.7}{\sqrt{2.61}}

This is about 3.9 standard deviations above the mean, so it would be very surprising.

Expected value in decision-making (fair games)

Expected value is used to judge long-run fairness.

If it costs 4 dollars to play the earlier spin game with E(X)=3.5, then expected net gain is:

E(\text{net})=3.5-4=-0.5

A common misconception is “I could still win big, so it’s worth it.” Expected value is a long-run average, not a guarantee for one play.

Exam Focus
  • Typical question patterns:
    • Compute \mu_X, \sigma_X, and \sigma_X^2 from a distribution table
    • Interpret expected value in context (games, profit, cost)
    • Decide whether an outcome is “surprising” using standard deviations from the mean
    • Compare two random variables by center and spread
  • Common mistakes:
    • Forgetting to multiply by probabilities when computing E(X)
    • Treating \sum x^2P(X=x) as variance (it is E(X^2))
    • Interpreting expected value as a guaranteed outcome

Transforming and combining random variables (and sets)

Many real problems involve changing units or combining outcomes.

Linear transformations

For a transformation

Y=a+bX

the mean and standard deviation transform as:

\mu_Y=a+b\mu_X

\sigma_Y=|b|\sigma_X

Adding a constant shifts the mean but does not change standard deviation. Multiplying by a constant multiplies both mean and standard deviation by that constant.

Example: converting scores

If X has mean 70 and standard deviation 8:

  • If Y=X+5 then \mu_Y=75 and \sigma_Y=8.
  • If Y=1.10X then \mu_Y=77 and \sigma_Y=8.8.

Combining random variables: sums and differences

Means always add/subtract:

\mu_{X+Y}=\mu_X+\mu_Y

\mu_{X-Y}=\mu_X-\mu_Y

If X and Y are independent, variances add (for both sums and differences):

\sigma_{X+Y}^2=\sigma_X^2+\sigma_Y^2

\sigma_{X-Y}^2=\sigma_X^2+\sigma_Y^2

Then

\sigma_{X+Y}=\sqrt{\sigma_X^2+\sigma_Y^2}

Common mistake: adding standard deviations directly; you add variances.

Worked example: total cost (independent variables)

Let shipping cost X have mean 6 and standard deviation 2. Let packaging cost Y have mean 3 and standard deviation 1. Assume independence. Total cost T=X+Y.

\mu_T=6+3=9

\sigma_T^2=2^2+1^2=5

\sigma_T=\sqrt{5}

Example 4.6 (Sums of sets, then mean and variance)

Two people each draw one card from their own bags.

  • Bag 1 has X=\{1,9,20,74\}
  • Bag 2 has Y=\{5,15,55\}

All cards in a bag are equally likely, and the draws are independent.

List the set of possible sums:

W=\{6,16,56,14,24,64,25,35,75,79,89,129\}

Mean of the totals:

\mu_W=\mu_X+\mu_Y

Compute means:

\mu_X=\frac{1+9+20+74}{4}=26

\mu_Y=\frac{5+15+55}{3}=25

So

\mu_W=26+25=51

Variance relationship (independence):

\sigma_W^2=\sigma_X^2+\sigma_Y^2

Compute variances (equally likely values):

\sigma_X^2=\frac{(1-26)^2+(9-26)^2+(20-26)^2+(74-26)^2}{4}=813.5

\sigma_Y^2=\frac{(5-25)^2+(15-25)^2+(55-25)^2}{3}=466.6666667

So

\sigma_W^2=1280.1666667

and

\sigma_W=\sqrt{1280.1666667}

which is about 35.78. Interpretation: the sum of the two drawn cards averages 51, with a standard deviation of about 35.78.
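Since the draws are independent and each of the 12 pairs is equally likely, the mean and variance results can be verified by brute force over all pairs (a sketch):

```python
from itertools import product

bag1 = [1, 9, 20, 74]
bag2 = [5, 15, 55]

# All 12 equally likely (card1, card2) pairs and their sums
sums = [x + y for x, y in product(bag1, bag2)]

mean_w = sum(sums) / len(sums)                            # 51, matching mu_X + mu_Y
var_w = sum((w - mean_w) ** 2 for w in sums) / len(sums)  # about 1280.17
```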

Example 4.7 (Sums of random variables)

An insurance salesperson models the number of new auto and home policies sold per day.

Mean calculations:

\mu_{\text{auto}}=(0)(0.2)+(1)(0.4)+(2)(0.3)+(3)(0.1)=1.3

\mu_{\text{home}}=(0)(0.5)+(1)(0.3)+(2)(0.2)=0.7

So

\mu_{\text{total}}=1.3+0.7=2.0

Variance calculations:

\sigma_{\text{auto}}^2=(0-1.3)^2(0.2)+(1-1.3)^2(0.4)+(2-1.3)^2(0.3)+(3-1.3)^2(0.1)=0.81

\sigma_{\text{home}}^2=(0-0.7)^2(0.5)+(1-0.7)^2(0.3)+(2-0.7)^2(0.2)=0.61

Assuming independence:

\sigma_{\text{total}}^2=0.81+0.61=1.42

So

\sigma_{\text{total}}=\sqrt{1.42}

This independence assumption may not be realistic if some customers buy both.

Example 4.8 (Transforming payoffs)

A carnival game has payoffs:

  • 2 dollars with probability 0.5
  • 5 dollars with probability 0.4
  • 10 dollars with probability 0.1

Mean:

\mu=2(0.5)+5(0.4)+10(0.1)=4

So you should be willing to pay any amount less than or equal to 4 dollars to play (based on expected winnings).

Compute standard deviation using E(X^2)-(E(X))^2:

E(X^2)=4(0.5)+25(0.4)+100(0.1)=22

\sigma^2=22-16=6

\sigma=\sqrt{6}

If 4 is added to each payoff, the mean increases by 4 but the standard deviation stays the same:

\mu_{X+4}=\mu_X+4

\sigma_{X+4}=\sigma_X

If each payoff is tripled, both mean and standard deviation are tripled:

\mu_{3X}=3\mu_X

\sigma_{3X}=3\sigma_X

Exam Focus
  • Typical question patterns:
    • Apply Y=a+bX to update mean and standard deviation
    • Combine two random variables (total, difference) and compute mean and standard deviation under independence
    • Decide whether independence is reasonable in context
  • Common mistakes:
    • Adding standard deviations instead of variances
    • Adding a constant to the standard deviation (shifts do not change spread)
    • Forgetting the absolute value in \sigma_Y=|b|\sigma_X

The binomial distribution: counting successes in a fixed number of trials

A binomial model fits repeated two-outcome trials when you count successes in a fixed number of trials.

When a binomial model applies (BINS)

A random variable X is binomial when:

  1. Binary outcomes (success/failure)
  2. Independent trials
  3. Fixed Number of trials n
  4. Same probability of success p each trial

A common mnemonic is BINS.

Notation and probability formula

X \sim Bin(n,p)

Probability of exactly k successes:

P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}

Mean and standard deviation

\mu=np

\sigma=\sqrt{np(1-p)}

Worked example: binomial probability (guessing on a quiz)

A 10-question multiple-choice quiz has 4 choices per question; a student guesses on all questions. Let X be the number correct.

This is binomial with n=10 and p=0.25:

X \sim Bin(10,0.25)

Probability of exactly 3 correct:

P(X=3)=\binom{10}{3}(0.25)^3(0.75)^7

“At least” and “at most”

For example:

P(X \ge 3)=1-[P(X=0)+P(X=1)+P(X=2)]

Common mistake: treating “at least 3” as P(X=3).
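A small helper shows both the exact-k formula and the complement approach for "at least" (a sketch):

```python
from math import comb

def binom_pmf(n, p, k):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.25   # guessing on 10 questions with 4 choices each

p_exactly_3 = binom_pmf(n, p, 3)                              # about 0.2503
p_at_least_3 = 1 - sum(binom_pmf(n, p, k) for k in range(3))  # about 0.4744
```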

Binomial with sampling: the 10% condition

Sampling without replacement is not truly independent, but AP Statistics often allows a binomial model when the sample size is no more than 10% of the population size.

Example 4.9 (Binomial probabilities with defective lightbulbs)

Suppose the probability a lightbulb is defective is 0.1 (so good is 0.9).

1) Probability four bulbs are all defective:

P=(0.1)^4=0.0001

2) Probability exactly two out of three are defective:

P=3(0.1)^2(0.9)=0.027

3) Probability exactly three out of eight are defective:

P=\binom{8}{3}(0.1)^3(0.9)^5=56\times 0.00059049=0.03306744

Technology note: on a TI-84,

\text{binompdf}(8,0.1,3)=0.03306744

Exam Focus
  • Typical question patterns:
    • Identify whether a situation is binomial and justify using BINS
    • Compute P(X=k) or cumulative probabilities like P(X \ge k)
    • Use \mu=np and \sigma=\sqrt{np(1-p)} to describe center and spread
  • Common mistakes:
    • Using binomial when p changes across trials
    • Forgetting the combination term \binom{n}{k}
    • Confusing P(X \ge k) with P(X=k)

The geometric distribution: waiting time until the first success

A geometric model fits repeated independent trials with constant success probability when you count how many trials occur until the first success.

When a geometric model applies

Geometric conditions:

  • two outcomes (success/failure)
  • independent trials
  • constant probability of success p
  • X counts the number of trials until the first success (including the success trial)

Key difference from binomial:

  • Binomial: fixed n, count successes
  • Geometric: success count fixed at 1 (the first success), count trials

Probability formula

For k=1,2,3,\dots:

P(X=k)=(1-p)^{k-1}p

Mean and standard deviation

\mu=\frac{1}{p}

\sigma=\sqrt{\frac{1-p}{p^2}}

Memoryless property

Geometric distributions are memoryless: after many failures, the chance of success on the next trial is still p. This connects directly to avoiding gambler’s fallacy thinking.

Worked example: geometric probability (call center)

A call center resolves an issue on any attempt with probability 0.2, independently. Let X be the number of attempts until the first success.

Probability the first success is on attempt 4:

P(X=4)=(0.8)^3(0.2)

Probability of success within the first 3 attempts:

P(X \le 3)=1-(0.8)^3

Common mistake: using exponent k instead of k-1; the exponent counts failures.
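The same two calculations in code; the exponent k - 1 counts the failures before the first success (a sketch):

```python
p = 0.2   # chance of resolving the issue on any single attempt

def geom_pmf(p, k):
    """P(X = k): k - 1 failures, then the first success on trial k."""
    return (1 - p) ** (k - 1) * p

p_fourth_attempt = geom_pmf(p, 4)   # (0.8)^3 (0.2) = 0.1024
p_within_three = 1 - (1 - p) ** 3   # complement of "3 straight failures" = 0.488
```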

Example 4.10 (Geometric with p = 0.12)

Suppose only 12% of men in ancient Greece were honest. Let X be the number of men Diogenes must meet to encounter the first honest man. This is geometric with p=0.12.

First honest man is the third man met:

P(X=3)=(0.88)^2(0.12)=0.092928

First honest man occurs no later than the fourth man met:

P(X \le 4)=0.40030464

You can compute this by summing the first four geometric probabilities or by using a geometric CDF.

Technology note: on a TI-84,

\text{geometpdf}(0.12,3)=0.092928

\text{geometcdf}(0.12,4)=0.40030464

To receive full credit for geometric distribution probability calculations, students should state:

  1. Name of the distribution (geometric)
  2. Parameter (for example, p=0.12)
  3. The trial on which the first success occurs (for example, X=3)
  4. The correct probability value (for example, 0.092928)

Exam Focus
  • Typical question patterns:
    • Identify a scenario as geometric (waiting time) and justify the conditions
    • Compute P(X=k) or P(X \le k) using sums or complements
    • Interpret \mu=1/p as expected waiting time
  • Common mistakes:
    • Off-by-one errors (geometric counts start at 1)
    • Treating “within k trials” as P(X=k) instead of P(X \le k)
    • Using geometric when p changes across trials

Connecting the models: choosing tools and interpreting results

A major goal of Unit 4 is not just calculation but model choice and interpretation. On free-response questions, clear justification and context-based interpretation often earn as much credit as the computation.

Choosing the right tool

Start by identifying structure:

  • One-time events with “and/or/given that”: use probability rules, tables, trees, conditional probability.
  • Repeated independent trials with fixed n counting successes: binomial.
  • Repeated independent trials waiting for first success: geometric.
  • Messy situations or checks: simulation.

Before choosing a named distribution, write down:

  • What are the trials?
  • What counts as success?
  • Is p constant?
  • Is independence reasonable?
  • What does the random variable count?

Interpreting probability statements as long-run frequencies

A strong interpretation treats probability as long-run behavior. If you find P(X \ge 2)=0.31 in a binomial setting, a strong interpretation is: in the long run, about 31% of repeated groups of n trials will result in at least 2 successes.

A note on calculator use

Using calculator binomial/geometric commands is fine, but you still must identify n and p correctly, state what probability you computed (like P(X \le 4)), and show enough setup to make your work auditable. A common calculator-driven error is mixing up strict vs inclusive inequalities (for example, P(X < 4) versus P(X \le 4)).