Unit 2: Exploring Two-Variable Data

AP Statistics

Last updated 2:11 AM on 3/12/26

50 Terms

Two-variable (bivariate) data set

A data set that records two pieces of information (two variables) for each individual (person, object, or event) to study a relationship.

Categorical variable

A variable that places individuals into groups or categories (e.g., brand, gender, yes/no, region).

Quantitative variable

A numerical variable for which arithmetic operations are meaningful (e.g., height, time, income).

Categorical vs categorical analysis

Studying the relationship between two categorical variables, typically using two-way tables and conditional distributions.

Quantitative vs quantitative analysis

Studying the relationship between two quantitative variables, typically using scatterplots, correlation, and linear regression.

Categorical vs quantitative analysis

Comparing a quantitative variable across categories of a categorical variable, often using side-by-side boxplots or dotplots by group.

Explanatory variable

The variable used to help explain or predict another variable; often labeled x in regression and placed on the horizontal axis.

Response variable

The outcome variable being predicted or explained; often labeled y in regression and placed on the vertical axis.

Association

A pattern or relationship between two variables (without automatically implying that one causes the other).

Causation

A cause-and-effect relationship where changes in one variable produce changes in another; not guaranteed by association alone.

Lurking variable

A variable not included in the analysis that may help explain the relationship observed between two variables.

Confounding

When the effects of two variables are mixed together so their individual effects on a response cannot be separated.

Two-way table

A table of counts for combinations of categories from two categorical variables, used to examine possible relationships.

Contingency table

Another name for a two-way table organizing counts for two categorical variables.

Row variable

In a two-way table, the categorical variable whose categories label the rows; conditioning on a row category means dividing that row's cell counts by its row total.

Column variable

In a two-way table, the categorical variable whose categories label the columns; conditioning on a column category means dividing that column's cell counts by its column total.

Table total (grand total)

The sum of all cell counts in a two-way table.

Joint relative frequency

A proportion for a specific cell in a two-way table: (cell count) ÷ (table total).
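
As a rough illustration of the (cell count) ÷ (table total) formula, the sketch below computes every joint relative frequency for a small two-way table. The counts and category names are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Hypothetical counts: rows are class year, columns are preferred study method.
table = {
    "junior": {"flashcards": 30, "practice tests": 20},
    "senior": {"flashcards": 10, "practice tests": 40},
}

# Table total (grand total): the sum of every cell count.
grand_total = sum(count for row in table.values() for count in row.values())

# Joint relative frequency of each cell: cell count / table total.
joint = {
    (row_name, col_name): count / grand_total
    for row_name, row in table.items()
    for col_name, count in row.items()
}

for cell, proportion in joint.items():
    print(cell, proportion)
```

Because every individual falls in exactly one cell, the joint relative frequencies should add up to 1.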

Marginal frequency

A row total or column total in the margins of a two-way table (the “totals” for one variable).

Marginal distribution

The distribution of one variable alone from a two-way table, found by converting marginal totals to proportions/percentages.
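
A minimal sketch of turning marginal totals into marginal distributions, assuming a hypothetical two-way table stored as a nested dictionary (the counts and category names are invented for illustration):

```python
# Hypothetical counts: rows are class year, columns are preferred study method.
table = {
    "junior": {"flashcards": 30, "practice tests": 20},
    "senior": {"flashcards": 10, "practice tests": 40},
}

grand_total = sum(count for row in table.values() for count in row.values())

# Marginal frequencies: the row totals and column totals.
row_totals = {row_name: sum(row.values()) for row_name, row in table.items()}
col_totals = {}
for row in table.values():
    for col_name, count in row.items():
        col_totals[col_name] = col_totals.get(col_name, 0) + count

# Marginal distributions: each marginal total divided by the table total.
row_marginal = {name: total / grand_total for name, total in row_totals.items()}
col_marginal = {name: total / grand_total for name, total in col_totals.items()}

print(row_marginal)
print(col_marginal)
```

Each marginal distribution ignores the other variable entirely, which is why it cannot reveal an association by itself.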

Conditional distribution

The distribution of one variable restricted to a specific category of the other variable (i.e., “given that…”).

Conditional relative frequency

A proportion computed within a subgroup (row or column), used to compare groups and judge association.

Difference in proportions

A numerical comparison of two conditional proportions (often used to describe the size of an association for categorical variables).
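
Conditional relative frequencies and their difference can be sketched as follows. The table is hypothetical, and `conditional_distribution` is an illustrative helper name, not a standard function.

```python
# Hypothetical counts: rows are class year, columns are preferred study method.
table = {
    "junior": {"flashcards": 30, "practice tests": 20},
    "senior": {"flashcards": 10, "practice tests": 40},
}

def conditional_distribution(row):
    """Distribution of the column variable within one row ("given that row")."""
    row_total = sum(row.values())
    return {col_name: count / row_total for col_name, count in row.items()}

junior = conditional_distribution(table["junior"])
senior = conditional_distribution(table["senior"])

# Difference in proportions: compare the same category across the two groups.
diff = junior["flashcards"] - senior["flashcards"]

print(junior)
print(senior)
print(diff)
```

Here the proportion preferring flashcards is 30/50 among juniors and 10/50 among seniors, so the two conditional distributions differ and the table suggests an association.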

Segmented bar chart (100% stacked bar chart)

A graph that compares conditional distributions by using bars scaled to 100% so segment lengths represent conditional percentages.

Independence (categorical variables)

Two categorical variables are independent if knowing one variable’s category does not change the conditional distribution of the other.
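
One way to check this definition on an observed table is to compare the conditional distribution in every row. The sketch below assumes hypothetical tables and an illustrative helper, `is_independent`; it describes the observed counts only and is not a significance test.

```python
def is_independent(table, tol=1e-9):
    """Return True if every row has the same conditional distribution.

    `table` maps row category -> {column category: count}.
    """
    distributions = []
    for row in table.values():
        row_total = sum(row.values())
        distributions.append({c: n / row_total for c, n in row.items()})
    first = distributions[0]
    return all(
        abs(dist[c] - first[c]) <= tol
        for dist in distributions
        for c in first
    )

# Hypothetical tables: the first has identical row distributions (40% / 60%).
no_association = {"A": {"yes": 20, "no": 30}, "B": {"yes": 40, "no": 60}}
association = {"A": {"yes": 30, "no": 20}, "B": {"yes": 10, "no": 40}}

print(is_independent(no_association))
print(is_independent(association))
```

In real data the conditional distributions are rarely exactly equal, so in practice the question is whether they are close enough to be explained by chance variation.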

Simpson’s paradox

A situation where a trend present in several groups disappears or reverses when the groups are combined, often due to a lurking variable.
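
The reversal is easier to believe with numbers. The sketch below uses invented (successes, attempts) counts for two hypothetical treatments, A and B, split by case difficulty, which plays the role of the lurking variable.

```python
# Hypothetical (successes, attempts) for two treatments in two case types.
data = {
    "easy cases": {"A": (9, 10), "B": (80, 100)},
    "hard cases": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, attempts):
    return successes / attempts

# Within each group, treatment A has the higher success rate.
for group, results in data.items():
    print(group, {t: rate(*counts) for t, counts in results.items()})

# Combine the groups by adding counts, not by averaging rates.
overall = {}
for treatment in ("A", "B"):
    successes = sum(data[g][treatment][0] for g in data)
    attempts = sum(data[g][treatment][1] for g in data)
    overall[treatment] = rate(successes, attempts)

# Overall, treatment B has the higher rate: the direction reverses.
print("overall", overall)
```

The reversal happens because treatment A was used mostly on hard cases and treatment B mostly on easy ones, so the combined rates mix the treatment effect with case difficulty.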

Scatterplot

A graph of paired quantitative data with one point per individual, used to assess direction, form, strength, and unusual features.

Direction (scatterplot)

Whether y tends to increase as x increases (positive) or decrease as x increases (negative).

Positive association

An association where larger values of one quantitative variable tend to be paired with larger values of the other.

Negative association

An association where larger values of one quantitative variable tend to be paired with smaller values of the other.

Form (scatterplot)

The overall shape of the relationship in a scatterplot (e.g., linear, curved) and whether clusters appear.

Linear relationship

A relationship that is well summarized by a straight line pattern in a scatterplot.

Nonlinear (curved) relationship

A relationship with clear curvature; linear tools like correlation and LSRL can be misleading if the form is curved.

Strength (scatterplot)

How closely points follow the overall form (especially a line if linear); not the same as having a steep slope.

Cluster (scatterplot)

A grouping of points in a scatterplot that may suggest subgroups or a missing categorical variable.

Outlier (scatterplot)

A point far from the overall pattern that can affect correlation and regression and may indicate an error or special case.

Correlation coefficient (r)

A unit-free number measuring the direction and strength of the linear association between two quantitative variables.

Range of r

The correlation r is always between -1 and 1, inclusive.

Unit-free property of r

Correlation has no units and does not change when measurement units are changed (e.g., inches to centimeters).
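
The unit-free property can be checked numerically. The sketch below computes r by hand for hypothetical paired data, then recomputes it after converting x from inches to centimeters. `pearson_r` is an illustrative helper, not a library function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation r = S_xy / sqrt(S_xx * S_yy) for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical paired data: height in inches and some response y.
height_in = [60, 62, 65, 68, 70]
y = [2, 4, 5, 4, 5]

height_cm = [2.54 * h for h in height_in]  # same heights, new units

r_in = pearson_r(height_in, y)
r_cm = pearson_r(height_cm, y)
print(r_in, r_cm)
```

The two values agree up to floating-point rounding because rescaling x by a positive constant scales the numerator and denominator of r by the same factor.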

Non-resistance of r

Correlation is not resistant: a single outlier or high-leverage point can strongly change $r$.
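
To see how much one point can matter, the sketch below computes r for five hypothetical points and again after adding a single unusual point at (10, 0). `pearson_r` is an illustrative helper written out by hand.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation r = S_xy / sqrt(S_xx * S_yy) for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data with a moderately strong positive linear pattern.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_before = pearson_r(xs, ys)

# Add one point with an extreme x-value that breaks the pattern.
r_after = pearson_r(xs + [10], ys + [0])

print(r_before, r_after)
```

With these made-up values, the single added point is enough to change the sign of r from positive to negative.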

Leverage

A property of a point with an extreme x-value (far from x̄) that gives it strong potential to affect the regression line.

Least-squares regression line (LSRL)

The regression line that minimizes the sum of squared residuals, $\sum (y_i - \hat{y}_i)^2$, and passes through $(\bar{x}, \bar{y})$.
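
A minimal sketch of fitting the line, assuming the standard least-squares formulas $b = S_{xy} / S_{xx}$ and $a = \bar{y} - b\bar{x}$. The data and the helper name `least_squares_line` are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

print(a, b)
# The fitted line evaluated at x-bar should reproduce y-bar.
print(a + b * x_bar, y_bar)
```

Defining the intercept as $a = \bar{y} - b\bar{x}$ is what forces the line through the point of means.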

Regression equation

A linear prediction model written as ŷ = a + bx that predicts the response y from the explanatory variable x.

Predicted value (ŷ)

The value of the response variable predicted by the regression equation for a given $x$.

Slope (b) in regression

The predicted change in y for each 1-unit increase in x (with units of “y-units per x-unit”).

Intercept (a) in regression

The predicted y-value when x = 0; meaningful only if x = 0 is within the data range and sensible in context.

Residual

The prediction error for a point: $\text{residual} = y - \hat{y}$; positive means the model underpredicted, negative means it overpredicted.
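
The sign convention can be illustrated with a short sketch. It assumes hypothetical data and a fitted line ŷ = 2.2 + 0.6x, which is the least-squares line for these five made-up points.

```python
# Hypothetical data and an assumed fitted line: y-hat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

# Residual = observed y minus predicted y-hat.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Positive residual: the point lies above the line, so the model underpredicted.
labels = ["underpredicted" if res > 0 else "overpredicted" for res in residuals]

for x, res, label in zip(xs, residuals, labels):
    print(x, round(res, 2), label)
```

For a least-squares line the residuals sum to zero up to rounding, so overpredictions and underpredictions balance out.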

Residual plot

A graph of residuals versus x used to check model appropriateness; good models show random scatter around 0 with roughly constant spread.

Coefficient of determination (r²)

The proportion of the variability in the response variable $y$ that is explained by the least-squares regression on $x$ (between 0 and 1).
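
One common way to compute this proportion is $r^2 = 1 - \text{SSE}/\text{SST}$, where SSE is the sum of squared residuals and SST is the total squared variation of y about its mean. The sketch below applies it to hypothetical data, assuming the line ŷ = 2.2 + 0.6x is the least-squares fit for those points.

```python
# Hypothetical data and its assumed least-squares line: y-hat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

y_bar = sum(ys) / len(ys)

# SSE: variation in y left unexplained by the line (sum of squared residuals).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# SST: total variation in y about its mean.
sst = sum((y - y_bar) ** 2 for y in ys)

r_squared = 1 - sse / sst
print(r_squared)
```

With these made-up values, about 60% of the variability in y is accounted for by the linear model, and the remaining 40% shows up as residual scatter.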

Extrapolation

Using a regression model to predict y for x-values outside the observed data range; risky because the linear trend may not continue.
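
A simple guard against accidental extrapolation is to flag any prediction whose x-value lies outside the observed range. The sketch below is one possible way to do that. The `predict` helper, the data, and the line ŷ = 2.2 + 0.6x are all hypothetical.

```python
def predict(x, a, b, x_min, x_max):
    """Return (y_hat, is_extrapolation) for the line y-hat = a + b*x."""
    y_hat = a + b * x
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation

# Hypothetical observed x-values and an assumed fitted line.
xs = [1, 2, 3, 4, 5]
a, b = 2.2, 0.6

inside = predict(3, a, b, min(xs), max(xs))
outside = predict(20, a, b, min(xs), max(xs))

print(inside)
print(outside)
```

The line still returns a number for x = 20, but nothing in the data supports the assumption that the linear pattern continues that far.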