AP Statistics Unit 2 Study Guide: Exploring Two-Variable Data (Tables, Scatterplots, Correlation, Regression, and Causation)

Last updated 9:37 PM on 3/9/26

50 Terms

One-variable data

Data consisting of measurements on a single variable, analyzed by describing its distribution (center, spread, shape).

Two-variable data

Data with two variables measured on each individual, used to study whether and how the variables change together.

Association

A relationship between two variables where knowing the value of one provides information about the likely value of the other.

Categorical variable

A variable that places individuals into categories or groups (e.g., blood type, political party).

Quantitative variable

A variable that records numerical values for which arithmetic operations make sense (e.g., height, time).

Explanatory variable (x)

The variable used to explain or predict changes in another variable; often called the predictor.

Response variable (y)

The outcome variable you want to predict or understand; it responds to changes in the explanatory variable.

Paired data

Two-variable data where each individual contributes a matched pair of values (x, y); the pairing must be kept intact.

Two-way table

A table that displays counts for combinations of two categorical variables.

Contingency table

Another name for a two-way table of counts for two categorical variables.

Joint frequency

An interior cell count in a two-way table for a specific row-and-column category combination.

Marginal frequency

A row total or column total in a two-way table (a total for one variable, ignoring the other).

Table total (n)

The grand total of all counts in a two-way table; the overall sample size.

Marginal distribution

The distribution (often as proportions/percentages) of one categorical variable, ignoring the other variable.

Conditional relative frequency

A proportion computed within a given row or column (within a condition) to compare groups for association.
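
As a small illustration of the marginal and conditional calculations, here is a Python sketch. The table, its category names (Junior/Senior, Yes/No), and all counts are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
# Hypothetical two-way table of counts: rows are groups, columns are responses.
table = {
    "Junior": {"Yes": 30, "No": 20},
    "Senior": {"Yes": 15, "No": 35},
}

# Table total n: the sum of every joint frequency.
n = sum(sum(row.values()) for row in table.values())

# Marginal distribution of the column variable (ignores the row variable).
column_totals = {}
for row in table.values():
    for col, count in row.items():
        column_totals[col] = column_totals.get(col, 0) + count
marginal = {col: total / n for col, total in column_totals.items()}

# Conditional relative frequencies: each count divided by its own row total.
conditional = {
    group: {col: count / sum(row.values()) for col, count in row.items()}
    for group, row in table.items()
}

print(marginal)
print(conditional)
```

In this made-up table, 60% of juniors but only 30% of seniors answered Yes. Because the conditional distributions differ across groups, the two variables appear to be associated.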

Segmented bar chart

A display for two categorical variables where each bar represents a group and segments show conditional relative frequencies (each bar totals 100%).

Mosaic plot

A display for two categorical variables that shows conditional proportions and can also reflect different group sizes via widths/areas.

Simpson’s paradox

A situation where an overall association reverses direction or disappears when the data are split into meaningful subgroups defined by a lurking variable.
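
A numeric sketch can make the reversal concrete. The counts below are invented for illustration (two hypothetical methods, A and B, tried on "easy" and "hard" cases); they are not from any real study.

```python
# Hypothetical (successes, trials) for two methods within two subgroups.
data = {
    "easy": {"A": (9, 10), "B": (80, 100)},
    "hard": {"A": (30, 100), "B": (2, 10)},
}

# Success rate of each method inside each subgroup.
within = {
    group: {method: s / t for method, (s, t) in methods.items()}
    for group, methods in data.items()
}

# Success rate of each method after pooling the subgroups together.
overall = {}
for method in ("A", "B"):
    successes = sum(data[group][method][0] for group in data)
    trials = sum(data[group][method][1] for group in data)
    overall[method] = successes / trials

print(within)   # A has the higher rate in both subgroups
print(overall)  # B has the higher rate in the pooled data
```

Method A wins within each subgroup, yet B wins overall, because A was mostly used on hard cases and B mostly on easy ones. Case difficulty plays the role of the lurking variable here.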

Bivariate quantitative data

A dataset containing two quantitative variables measured on the same individuals.

Scatterplot

A graph of paired quantitative data where each individual is plotted as a point (x, y) to visualize a relationship.

Direction (association)

Whether y tends to increase as x increases (positive), decrease as x increases (negative), or show no clear trend.

Form (scatterplot)

The overall shape of a relationship in a scatterplot (e.g., linear, curved).

Strength (scatterplot)

How closely points cluster around the form (e.g., weak, moderate, strong).

Outlier

An unusual point that does not fit the overall pattern; it can strongly affect correlation and regression.

Correlation (r)

A numerical measure of the direction and strength of a linear relationship between two quantitative variables.

Correlation properties (linear, unitless, not resistant)

Key facts about r: it measures only linear association, has no units, always lies between −1 and 1 (inclusive), and is not resistant, so a single outlier can change it greatly.
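
Two of these properties are easy to check numerically. The sketch below uses a small hypothetical data set and a hand-written helper named `correlation` (an illustrative name, not a library function). It rescales x as if changing units, then adds one unusual point.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation r computed from sums of deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)

# Unitless: multiplying x by a positive constant (a unit change) leaves r alone.
r_rescaled = correlation([2.54 * v for v in x], y)

# Not resistant: one added point far from the pattern changes r a lot.
r_with_outlier = correlation(x + [10], y + [0])

print(round(r, 3), round(r_rescaled, 3), round(r_with_outlier, 3))
```

With these made-up numbers, the single extra point (10, 0) is enough to turn a moderately strong positive correlation into a negative one.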

Standardized value (z-score)

A value expressed in standard deviation units: z = (value − mean) / standard deviation.
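
One common use of z-scores in this unit is the formula r = (1 / (n − 1)) Σ z_x z_y, which averages the products of paired z-scores. A minimal sketch, assuming a small hypothetical data set and using the sample standard deviation:

```python
from statistics import mean, stdev

# Hypothetical paired data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def z_scores(values):
    """Standardize each value: (value - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

zx, zy = z_scores(x), z_scores(y)
n = len(x)

# Correlation as the sum of z-score products divided by n - 1.
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

print(round(r, 4))
```

Points where both z-scores share a sign (both above or both below their means) contribute positive products and pull r upward. Opposite signs pull it downward.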

Coefficient of determination (r^2)

For linear regression, the proportion of variation in y accounted for by the regression of y on x (between 0 and 1).
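
To see what "proportion of variation accounted for" means numerically, r^2 can be computed as 1 − SSE/SST, where SSE is the sum of squared residuals and SST is the sum of squared deviations of y from its mean. The sketch below assumes a small hypothetical data set and fits the line with the usual least-squares formulas.

```python
# Hypothetical paired data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares slope and intercept.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

# SSE: variation left over after using the line.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
# SST: total variation of y about its mean.
sst = sum((yi - y_bar) ** 2 for yi in y)

r_squared = 1 - sse / sst
print(round(r_squared, 3))
```

For these made-up values, about 60% of the variation in y is accounted for by the linear relationship with x.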

Least-squares regression line

The line that minimizes the sum of squared vertical residuals: Σ(y − ŷ)^2.

Regression equation (ŷ = a + bx)

The equation of a regression line giving predicted response ŷ from explanatory value x, with intercept a and slope b.
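
The slope and intercept of the least-squares line can be found from summary statistics: b = r · s_y / s_x and a = ȳ − b · x̄. Here is a minimal sketch on a hypothetical data set, checking that this agrees with the deviation-based slope formula.

```python
from statistics import mean, stdev

# Hypothetical paired data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)

# Sum of cross-deviations and sum of squared x-deviations.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)

b = sxy / sxx          # slope
a = y_bar - b * x_bar  # intercept; the line passes through (x_bar, y_bar)

# Same slope from the correlation and the two standard deviations.
r = sxy / ((len(x) - 1) * stdev(x) * stdev(y))
b_from_r = r * stdev(y) / stdev(x)

print(f"y-hat = {a:.1f} + {b:.1f}x")
```

For these made-up values the fitted equation is ŷ = 2.2 + 0.6x: each 1-unit increase in x goes with a predicted increase of 0.6 in y.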

Predicted value (ŷ)

The model’s predicted value of the response variable y for a given x.

Slope (b)

In ŷ = a + bx, the predicted change in y for a 1-unit increase in x (with units of y per unit of x).

Intercept (a)

In ŷ = a + bx, the predicted value of y when x = 0; may or may not be meaningful in context.

Extrapolation

Using a regression model to predict outside the observed range of x; risky because the pattern may not continue.
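
One way to keep extrapolation in view is to flag any prediction whose x-value lies outside the observed range. The sketch below is only an illustration: the function name `predict`, the line ŷ = 2.2 + 0.6x, and the observed range 1 to 5 are all assumed for the example.

```python
def predict(x_new, a, b, x_min, x_max):
    """Return (predicted y, True if x_new lies outside the observed x-range)."""
    y_hat = a + b * x_new
    is_extrapolation = x_new < x_min or x_new > x_max
    return y_hat, is_extrapolation

# Assumed regression line and observed x-range (hypothetical values).
a, b = 2.2, 0.6
x_min, x_max = 1, 5

inside = predict(3, a, b, x_min, x_max)    # within the data: reasonable
outside = predict(20, a, b, x_min, x_max)  # far beyond the data: risky

print(inside, outside)
```

The formula produces a number for x = 20 just as readily as for x = 3. Nothing in the arithmetic signals that the linear pattern may not hold out there, so the range check has to be made separately.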

Residual

The difference between an observed and predicted value: residual = y − ŷ (the signed vertical distance from the regression line to the point).

Residual sign (positive vs negative)

Positive residual: model underestimated (point above line). Negative residual: model overestimated (point below line).
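
A short sketch of the residual calculation and its sign, assuming a hypothetical data set and taking ŷ = 2.2 + 0.6x as the fitted line for it:

```python
# Hypothetical paired data and an assumed fitted line y-hat = 2.2 + 0.6x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

# residual = observed y minus predicted y-hat
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Positive residual: point above the line, so the model underestimated.
# Negative residual: point below the line, so the model overestimated.
# (No residual is exactly zero in this example.)
labels = ["underestimated" if e > 0 else "overestimated" for e in residuals]

for xi, e, label in zip(x, residuals, labels):
    print(f"x = {xi}: residual = {e:+.1f} ({label})")
```

The residuals from a least-squares line sum to zero (up to rounding), which is a quick check on the arithmetic.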

Residual plot

A graph of residuals versus x (or versus ŷ) used to check whether a linear model is appropriate.

Curvature (residual plot pattern)

A systematic curved pattern in a residual plot, suggesting the relationship is not linear and a different model may be needed.

Changing spread (fan shape)

A residual-plot pattern where variability increases or decreases with x, suggesting non-constant variability.

Standard deviation of residuals (s)

A typical prediction error size in y-units: s = sqrt( Σ(y − ŷ)^2 / (n − 2) ).
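
The formula for s can be traced step by step. This sketch assumes a hypothetical data set and takes ŷ = 2.2 + 0.6x as its least-squares line.

```python
from math import sqrt

# Hypothetical paired data and an assumed fitted line y-hat = 2.2 + 0.6x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

n = len(x)

# Sum of squared residuals, sum of (y - y-hat)^2.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Divide by n - 2 (two estimated quantities: slope and intercept), then root.
s = sqrt(sse / (n - 2))

print(round(s, 3))
```

Here s is about 0.89, so predictions from this line are typically off by roughly 0.9 y-units.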

Regression outlier

A point with an unusually large residual (far above or below the regression line compared to other points).

Influential point

A point whose removal would noticeably change the regression line (slope/intercept) and possibly r and r^2.

Leverage

A measure of how far an x-value is from the mean of x; high-leverage points have strong potential to affect the regression line.
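
A direct way to test whether a high-leverage point is influential is to fit the line with and without it and compare. The data and the extra point (15, 2) below are hypothetical, and `fit_line` is a helper written just for this sketch.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical data, plus one point whose x-value is far from the others.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
extra_x, extra_y = 15, 2

a_without, b_without = fit_line(x, y)
a_with, b_with = fit_line(x + [extra_x], y + [extra_y])

print(f"without the point: slope = {b_without:.3f}")
print(f"with the point:    slope = {b_with:.3f}")
```

In this made-up example the single point at x = 15 flips the slope from positive to negative, so it is both high-leverage and influential.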

Transformation (to achieve linearity)

Changing variables (often using log/ln or power transforms) to make a curved relationship more nearly linear and improve a linear model.
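
As an illustration, exponential growth becomes linear after taking the natural log of y. The data below are hypothetical, generated exactly from y = 3 · 2^x, and `fit_and_r` is a helper written for this sketch.

```python
from math import exp, log, sqrt

def fit_and_r(x, y):
    """Return least-squares intercept, slope, and correlation r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / sqrt(sxx * syy)

# Hypothetical data that follow y = 3 * 2**x exactly.
x = [0, 1, 2, 3, 4]
y = [3, 6, 12, 24, 48]

a_raw, b_raw, r_raw = fit_and_r(x, y)                    # line fit to curved data
a_log, b_log, r_log = fit_and_r(x, [log(v) for v in y])  # line fit to (x, ln y)

# Back-transform: ln(y-hat) = a_log + b_log * x  means  y-hat = e^a_log * (e^b_log)^x.
multiplier = exp(a_log)
growth = exp(b_log)

print(round(r_raw, 3), round(r_log, 3), round(multiplier, 3), round(growth, 3))
```

The transformed data are perfectly linear here only because the example was built that way. With real data, a residual plot of the transformed fit is still needed to judge whether the transformation worked.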

Lurking variable

A third variable, not included in the analysis, that affects both variables being studied and may explain or distort an apparent association.

Confounding

When the effects of two variables on a response cannot be separated (common in observational comparisons).

Observational study

A study where researchers observe and record data without assigning treatments; it can show an association but cannot, on its own, establish causation.

Controlled experiment

A study where researchers impose treatments (often with random assignment); can support cause-and-effect conclusions if well designed.

Random assignment

Randomly assigning individuals to treatments; supports a cause-and-effect conclusion (for the participants).

Random sampling

Randomly selecting individuals from a population; supports generalizing results to the population.