AP Statistics Unit 4 Notes: Random Variables, Expected Value, and Combining Outcomes

Discrete Random Variables and Probability Distributions

What a random variable is (and why we bother)

A random variable is a rule that assigns a numerical value to each outcome of a chance process. In AP Statistics, you usually name a random variable with a capital letter like X, and its possible numerical values with lowercase like x.

This matters because probability questions often start with messy outcomes (sequences of coin flips, categories of survey responses, etc.). By translating outcomes into numbers, you can summarize uncertainty with tools like a probability distribution, and later compute center and spread (mean and standard deviation) the same way you do with data—except now you’re describing a theoretical long-run pattern, not a sample.

A discrete random variable is one that takes on a countable set of values (often integers): 0, 1, 2, 3, … For example:

  • X = number of heads in 5 coin flips (possible values 0 through 5)
  • Y = number of customers arriving in the next hour (0, 1, 2, …)

A common misconception is to think the random variable is the outcome (like “HHTHT”). It’s not—the random variable is the number you compute from the outcome (like “3 heads”).

Probability distribution for a discrete random variable

A probability distribution of a discrete random variable lists every possible value of the variable and the probability that it occurs.

If X takes values x_1, x_2, …, x_k, then its distribution gives P(X = x_i) for each value.

A valid discrete probability distribution must satisfy:

  1. Every probability is between 0 and 1:
    0 ≤ P(X = x_i) ≤ 1
  2. The probabilities add to 1:
    P(X = x_1) + P(X = x_2) + … + P(X = x_k) = 1

You’ll most often see distributions shown as a table or as a probability histogram (bars at each value with heights equal to the probabilities). A probability histogram can look like a data histogram, but conceptually it is different: it represents a model (long-run relative frequencies), not a dataset.

Notation you’ll see (quick reference)
Idea | Common notation | Meaning
Random variable | X | The process-defined numerical outcome
A specific value | x | One possible number the variable can take
Probability | P(X = x) | Chance that X equals that value
Mean / expected value | μ_X or E(X) | Long-run average of X
Standard deviation | σ_X | Long-run typical distance of X from its mean
Variance | σ_X² or Var(X) | Square of the standard deviation

Building a distribution from a chance process

To create a probability distribution, you typically:

  1. Define the random variable clearly in context (what is being counted or measured?).
  2. List all possible values it can take.
  3. Find the probability for each value (often using counting, binomial probability, geometric probability, or a provided model).
  4. Check that probabilities sum to 1.

A common error is to skip step 1 (definition) and then misinterpret what a value like X = 2 means. On AP questions, you should be able to say something like: “Let X be the number of defective bulbs in a sample of 10.” Then X = 2 has a clear meaning.

Example 1: Creating a probability distribution (3 coin flips)

Suppose you flip a fair coin 3 times. Let X be the number of heads.

Step 1: Possible values
X ∈ {0, 1, 2, 3}

Step 2: Probabilities
There are 2³ = 8 equally likely outcomes. Count outcomes with each number of heads:

  • X = 0: 1 outcome (TTT), so P(X = 0) = 1/8
  • X = 1: 3 outcomes (HTT, THT, TTH), so P(X = 1) = 3/8
  • X = 2: 3 outcomes (HHT, HTH, THH), so P(X = 2) = 3/8
  • X = 3: 1 outcome (HHH), so P(X = 3) = 1/8

Distribution table

x         0     1     2     3
P(X = x)  1/8   3/8   3/8   1/8

Check: probabilities add to 1/8 + 3/8 + 3/8 + 1/8 = 8/8 = 1.
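If you want to verify a small example like this by brute force, a short Python sketch (the variable names here are illustrative) can enumerate all eight outcomes and tally the counts:

```python
from fractions import Fraction
from itertools import product

# All 2^3 equally likely outcomes of 3 fair coin flips
outcomes = list(product("HT", repeat=3))

# X = number of heads; each outcome contributes probability 1/8
dist = {}
for outcome in outcomes:
    x = outcome.count("H")
    dist[x] = dist.get(x, 0) + Fraction(1, len(outcomes))

# Validity check: the probabilities must sum to 1
assert sum(dist.values()) == 1
```

Using exact fractions avoids rounding error, so the sum-to-1 check is exact rather than approximate.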

Cumulative probability (sometimes asked)

Sometimes you’re asked for probabilities like P(X ≤ 2) rather than P(X = 2). For discrete variables, you usually compute these by adding the relevant probabilities.

The cumulative distribution function (CDF) is defined by:
F(x) = P(X ≤ x)

For the coin example:
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 1/8 + 3/8 + 3/8 = 7/8

A common mistake is to treat P(X ≤ 2) as if it were P(X = 2). The symbol matters.
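The “add everything up to x” idea translates directly into code. A minimal sketch, using the coin-flip distribution from Example 1 (the `cdf` helper is just an illustration):

```python
from fractions import Fraction

# Distribution of X = number of heads in 3 fair flips (Example 1)
dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def cdf(dist, x):
    """F(x) = P(X <= x): add the probabilities of all values up to x."""
    return sum(p for value, p in dist.items() if value <= x)

p_le_2 = cdf(dist, 2)   # 1/8 + 3/8 + 3/8 = 7/8
```

Note that `p_le_2` is not the same as `dist[2]`, which is exactly the P(X ≤ 2) vs P(X = 2) distinction.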

Exam Focus
  • Typical question patterns:
    • “Define a random variable for this situation and give its probability distribution.”
    • “Is this table a valid probability distribution? Justify.”
    • “Find P(X ≤ a) or P(a ≤ X ≤ b) from a distribution.”
  • Common mistakes:
    • Forgetting to check that probabilities sum to 1 (or failing to notice a missing probability).
    • Mixing up outcomes with random variable values (writing probabilities for sequences instead of counts).
    • Misreading symbols like P(X < 2) vs P(X ≤ 2) when the variable is discrete.

Mean and Standard Deviation of Random Variables

The big idea: long-run center and long-run variability

When you compute the mean of a dataset, you’re summarizing the average of observed values. For a random variable, you’re summarizing what would happen in the long run if you repeated the chance process many times.

The mean of a random variable is also called its expected value. The word “expected” does not mean “guaranteed” or even “most likely.” It means the long-run average value.

For example, if a game has expected winnings of 0.50 dollars, you should not expect to win exactly 0.50 dollars each time. You might win 5 dollars sometimes and lose 1 dollar other times—but over many plays, the average tends toward 0.50 dollars.

Expected value (mean) for a discrete random variable

If X takes values x_1, x_2, …, x_k with probabilities p_1, p_2, …, p_k, then the mean (expected value) is the probability-weighted average:

μ_X = E(X) = x_1·p_1 + x_2·p_2 + … + x_k·p_k

Interpretation: μ_X is what you’d get if you averaged an enormous number of repetitions of the random process.

A very common student error is to average the possible values without weighting by probability. If some outcomes are more likely than others, they must count more.
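The difference between the weighted and unweighted average is easy to see numerically. A small sketch with a made-up distribution (the numbers are hypothetical):

```python
def expected_value(values, probs):
    # Probability-weighted average: likelier values count more
    return sum(x * p for x, p in zip(values, probs))

# Hypothetical skewed distribution
values = [0, 1, 2]
probs = [0.7, 0.2, 0.1]

mu = expected_value(values, probs)   # 0(0.7) + 1(0.2) + 2(0.1) = 0.4
naive = sum(values) / len(values)    # unweighted average: 1.0, which is wrong here
```

Because the value 0 occurs 70% of the time, the true long-run average (0.4) sits far below the unweighted average of the possible values (1.0).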

Variance and standard deviation: measuring spread of a random variable

The standard deviation of a random variable describes the typical distance between X and its mean μ_X in the long run.

For a discrete random variable, the variance is:

σ_X² = (x_1 − μ_X)²·p_1 + (x_2 − μ_X)²·p_2 + … + (x_k − μ_X)²·p_k

and the standard deviation is:

σ_X = √(σ_X²)

This mirrors what you do with data: deviations from the mean, squared, averaged (with probabilities as weights), then square-rooted.

There is also a computational shortcut that can be helpful:

σ_X² = E(X²) − (E(X))²

where

E(X²) = x_1²·p_1 + x_2²·p_2 + … + x_k²·p_k

This shortcut is useful when the distribution values are messy, but you must be careful with parentheses: square the expectation, not the other way around.
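You can confirm that the definition and the shortcut agree on a concrete case. A sketch using the coin-flip distribution from Example 1, with exact fractions:

```python
from fractions import Fraction

# X = number of heads in 3 fair coin flips (Example 1)
xs = [0, 1, 2, 3]
ps = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

mu = sum(x * p for x, p in zip(xs, ps))                    # E(X) = 3/2
var_def = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))   # definition of variance
e_x2 = sum(x ** 2 * p for x, p in zip(xs, ps))             # E(X^2)
var_shortcut = e_x2 - mu ** 2                              # E(X^2) - (E(X))^2
```

Both routes give 3/4; note the parentheses in `mu ** 2`, which squares the expectation rather than taking the expectation of the square.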

Example 2: Mean and standard deviation from a distribution

A company offers a coupon that results in the following discount (in dollars). Let X be the discount a randomly selected customer receives.

x         0      5      10
P(X = x)  0.50   0.40   0.10

Step 1: Compute the mean

E(X) = 0(0.50) + 5(0.40) + 10(0.10)

E(X) = 0 + 2 + 1 = 3

So μ_X = 3 dollars. In the long run, the company gives an average discount of 3 dollars per customer.

Step 2: Compute the variance and standard deviation
Use the definition:

σ_X² = (0 − 3)²(0.50) + (5 − 3)²(0.40) + (10 − 3)²(0.10)

σ_X² = 9(0.50) + 4(0.40) + 49(0.10)

σ_X² = 4.5 + 1.6 + 4.9 = 11.0

σ_X = √11

If you approximate, √11 is about 3.32 dollars, meaning the discount typically differs from the mean by a bit over 3 dollars.

Example 3: Expected value as “fair price” (a classic AP framing)

A carnival game costs c dollars to play. You spin a wheel:

  • With probability 0.2 you win 10 dollars.
  • With probability 0.8 you win 0 dollars.

Let W be your winnings in dollars (not profit).

Expected winnings

E(W) = 10(0.2) + 0(0.8) = 2

On average, you win 2 dollars per play.

If you care about profit, define P = W − c. Then:

E(P) = E(W − c) = E(W) − c = 2 − c

A “fair” price (expected profit 0) would solve 2 − c = 0, so c = 2. If the game costs more than 2 dollars, you should expect to lose money in the long run.

Common misconception: students sometimes think a fair game means you win about half the time. Not necessarily—fairness is about expected value, not win rate.
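The fair-price computation above is just a weighted average followed by solving E(W) − c = 0. A minimal sketch of the spinner game:

```python
# Winnings W for one spin of the wheel (Example 3)
values = [10, 0]
probs = [0.2, 0.8]

e_w = sum(v * p for v, p in zip(values, probs))  # E(W) = 10(0.2) + 0(0.8) = 2

# Expected profit E(W - c) = E(W) - c is 0 exactly when c = E(W)
fair_price = e_w
```

Notice the win rate here is only 0.2, yet the game is fair at c = 2: fairness is about the expected value, not how often you win.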

Linear transformations and how they affect mean and standard deviation

In many problems, you define a new variable by transforming an old one, such as converting units, adding a fixed fee, or scaling a reward.

If

Y = a + bX

then:

μ_Y = a + b·μ_X

σ_Y = |b|·σ_X

Why this makes sense:

  • Adding a shifts every outcome by the same amount, so the center shifts by a but the spread does not change.
  • Multiplying by b stretches or shrinks distances from the mean by a factor of |b|, so the standard deviation scales by |b|.

A frequent mistake is to add a to the standard deviation. You do not: standard deviation measures spread, and adding a constant doesn’t change how spread out values are.
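Both rules can be checked directly on a distribution: transforming each value while keeping the probabilities fixed gives the distribution of Y. A sketch using the discount distribution from Example 2 with an arbitrary (and deliberately negative) scale:

```python
import math

# Distribution of X (the discount amounts from Example 2)
xs = [0, 5, 10]
ps = [0.50, 0.40, 0.10]

def mean(vals, probs):
    return sum(v * p for v, p in zip(vals, probs))

def sd(vals, probs):
    m = mean(vals, probs)
    return math.sqrt(sum((v - m) ** 2 * p for v, p in zip(vals, probs)))

a, b = 7, -2                     # arbitrary shift and negative scale
ys = [a + b * x for x in xs]     # transform each value; probabilities are unchanged
```

The negative b is the instructive part: the mean uses b itself, but the standard deviation uses |b|, so σ_Y is still positive.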

Exam Focus
  • Typical question patterns:
    • “Given this distribution, find μ_X and σ_X and interpret them in context.”
    • “A new variable is defined by Y = a + bX. Find μ_Y and σ_Y.”
    • “Find the expected value of winnings/profit; determine whether a game is fair.”
  • Common mistakes:
    • Computing E(X) by averaging the x values without using probabilities as weights.
    • Forgetting to square-root at the end when asked for standard deviation (reporting variance instead).
    • Incorrect transformation rules, especially adding a constant to standard deviation or forgetting the absolute value on b.

Combining Random Variables

Why combining random variables is such a powerful move

Many real situations are built from parts:

  • Total cost = item cost + shipping
  • Total points = points from multiple questions
  • Total wait time = wait time for bus + travel time

Each piece can be modeled as a random variable. Combining them lets you predict the long-run behavior of a total (mean) and how much that total varies (standard deviation).

The key skill is knowing which results always hold and which require independence.

Adding and subtracting random variables: expected value

If X and Y are random variables, then expected values add exactly the way you wish they would:

E(X + Y) = E(X) + E(Y)

E(X − Y) = E(X) − E(Y)

This does not require independence.

Interpretation: in the long run, the average total is the sum of the long-run averages.
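The “no independence needed” claim is worth seeing in an extreme case. In this hypothetical sketch, Y is completely determined by X (think of the top and bottom faces of a die, which always sum to 7), yet the means still add:

```python
import statistics

# One value per face of a fair die (all equally likely)
x = [1, 2, 3, 4, 5, 6]
y = [7 - xi for xi in x]                # Y is perfectly dependent on X
s = [xi + yi for xi, yi in zip(x, y)]   # the sum is always exactly 7

mean_x = statistics.fmean(x)   # 3.5
mean_y = statistics.fmean(y)   # 3.5
mean_s = statistics.fmean(s)   # 7.0 -- means add even without independence
```

The same example shows why the variance rule is different: Var(X + Y) here is 0, not Var(X) + Var(Y), precisely because X and Y are dependent.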

Adding and subtracting random variables: variability (standard deviation)

Spread is trickier. If X and Y are independent, then their variances add:

Var(X + Y) = Var(X) + Var(Y)

and since Var(X) = σ_X², you can write:

σ_(X+Y) = √(σ_X² + σ_Y²)

Similarly, for independent variables:

Var(X − Y) = Var(X) + Var(Y)

So:

σ_(X−Y) = √(σ_X² + σ_Y²)

Two big “what can go wrong” warnings:

  1. Standard deviations do not add. Even when independent, you add variances, not standard deviations.
  2. If X and Y are not independent, you generally cannot use these variance rules without additional information.

On the AP exam, if you’re supposed to add variances, the problem will typically indicate independence (or give a context that strongly implies it, like results of separate trials).
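A quick simulation (a hypothetical two-dice setup, chosen because separate dice are clearly independent) illustrates both warnings at once: variances add, while summed standard deviations overstate the spread:

```python
import random
import statistics

random.seed(1)
n = 100_000

# Independent draws: two separate fair dice
x = [random.randint(1, 6) for _ in range(n)]
y = [random.randint(1, 6) for _ in range(n)]
s = [a + b for a, b in zip(x, y)]

var_sum = statistics.pvariance(s)                             # Var(X + Y)
var_add = statistics.pvariance(x) + statistics.pvariance(y)   # variances add (approx.)
sd_add = statistics.pstdev(x) + statistics.pstdev(y)          # NOT the sd of the sum
```

With 100,000 trials, `var_sum` and `var_add` agree closely, while `sd_add` (about 3.4) is well above the actual standard deviation of the sum (about 2.4).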

Linear combinations (the most general AP-level rule set)

A linear combination looks like:

T = a + bX + cY

For means, constants and coefficients behave normally:

μ_T = a + b·μ_X + c·μ_Y

For standard deviations, you typically handle one coefficient at a time using variance. If X and Y are independent:

Var(T) = b²·Var(X) + c²·Var(Y)

So:

σ_T = √(b²·σ_X² + c²·σ_Y²)

Again, independence is what lets you avoid extra covariance terms.
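The rules above are simple enough to apply mechanically. A sketch with hypothetical summary values (the means, standard deviations, and coefficients below are all made up for illustration):

```python
import math

# Hypothetical independent X and Y with known summaries
mu_x, sd_x = 30.0, 4.0
mu_y, sd_y = 50.0, 6.0
a, b, c = 10.0, 2.0, -1.0   # T = a + bX + cY

mu_t = a + b * mu_x + c * mu_y                             # 10 + 60 - 50 = 20
sd_t = math.sqrt(b ** 2 * sd_x ** 2 + c ** 2 * sd_y ** 2)  # sqrt(64 + 36) = 10
```

Note that the constant a appears in the mean but not in the standard deviation, and the negative coefficient c enters the variance as c² = 1, so its sign is irrelevant to the spread.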

Example 4: Total cost with a fixed fee (combining and transforming)

Let X be the amount (in dollars) a customer spends on items. Suppose μ_X = 45 and σ_X = 12. Shipping is a flat 7 dollars. Let T be the total cost.

Model:

T = X + 7

Mean:

μ_T = μ_X + 7 = 45 + 7 = 52

Standard deviation (adding a constant does not change spread):

σ_T = σ_X = 12

Interpretation: average total cost is 52 dollars, and totals typically vary by about 12 dollars from that average.

Example 5: Sum of independent scores

A student’s total score S is the sum of two independent section scores, A and B.

  • μ_A = 30, σ_A = 4
  • μ_B = 50, σ_B = 6

Let

S = A + B

Mean:

μ_S = μ_A + μ_B = 30 + 50 = 80

Standard deviation (independent, so add variances):

σ_S = √(σ_A² + σ_B²) = √(4² + 6²) = √(16 + 36) = √52

So σ_S is about 7.21.

A very common mistake is to compute σ_S = 4 + 6 = 10. That would overstate the spread because it treats typical deviations as if they always point in the same direction at the same time.
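The two computations side by side, so the size of the error is visible:

```python
import math

# Section-score standard deviations from Example 5
sd_a, sd_b = 4.0, 6.0

sd_correct = math.sqrt(sd_a ** 2 + sd_b ** 2)   # sqrt(52), about 7.21
sd_wrong = sd_a + sd_b                          # 10: overstates the spread
```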

Example 6: Difference of independent variables (net gain)

A store’s daily net gain N (in dollars) is revenue minus cost:

N = R − C

Suppose across days:

  • μ_R = 1200, σ_R = 250
  • μ_C = 800, σ_C = 180
  • Assume R and C are independent.

Mean:

μ_N = μ_R − μ_C = 1200 − 800 = 400

Standard deviation:

σ_N = √(σ_R² + σ_C²) = √(250² + 180²)

σ_N = √(62500 + 32400) = √94900

So σ_N is about 308.1.

Notice that the subtraction affects the mean (center) but not the way variances combine when independent.
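A simulation makes this concrete. Here revenue and cost are modeled as independent normal draws, which is an assumption for illustration (Example 6 only gives means and standard deviations, not a shape):

```python
import math
import random
import statistics

random.seed(42)
n = 100_000

# Hypothetical independent daily revenue and cost (normal models for illustration)
r = [random.gauss(1200, 250) for _ in range(n)]
c = [random.gauss(800, 180) for _ in range(n)]
net = [ri - ci for ri, ci in zip(r, c)]

mean_net = statistics.fmean(net)   # lands near 400
sd_net = statistics.pstdev(net)    # lands near sqrt(250^2 + 180^2), about 308.1
```

The simulated mean reflects the subtraction (1200 − 800), while the simulated spread matches the added variances, not σ_R − σ_C.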

Independence: what it means here (and how to spot it)

Two random variables X and Y are independent if knowing the value of one gives no information about the other. In many AP settings, independence comes from:

  • Separate trials (coin flips, spins, draws with replacement)
  • Measurements on different individuals chosen independently

But be cautious: variables computed from the same trial are often dependent. For example, in a single hand of cards, “number of hearts” and “number of red cards” are related—knowing one changes what you expect about the other.

If a problem does not indicate independence and you’re asked for the standard deviation of a sum/difference, that’s a clue you may need more information (or the problem is structured so independence is reasonable from context).

Exam Focus
  • Typical question patterns:
    • “Let T = X + Y (or X − Y). Find μ_T and σ_T given means and standard deviations; assume independence.”
    • “A quantity is transformed (fee, tax, unit conversion). Find the new mean and standard deviation.”
    • “Compare variability of a total to variability of components; interpret what changes center vs spread.”
  • Common mistakes:
    • Adding standard deviations directly instead of adding variances.
    • Using the variance-addition rule without independence (or without it being justified).
    • Treating subtraction as if it subtracts variability (it doesn’t; under independence, variances still add).