One-variable data
Data consisting of measurements on a single variable, analyzed by describing its distribution (center, spread, shape).
Two-variable data
Data with two variables measured on each individual, used to study whether and how the variables change together.
Association
A relationship between two variables where knowing the value of one provides information about the likely value of the other.
Categorical variable
A variable that places individuals into categories or groups (e.g., blood type, political party).
Quantitative variable
A variable that records numerical values for which arithmetic operations make sense (e.g., height, time).
Explanatory variable (x)
The variable used to explain or predict changes in another variable; often called the predictor.
Response variable (y)
The outcome variable you want to predict or understand; it responds to changes in the explanatory variable.
Paired data
Two-variable data where each individual contributes a matched pair of values (x, y); the pairing must be kept intact.
Two-way table
A table that displays counts for combinations of two categorical variables.
Contingency table
Another name for a two-way table of counts for two categorical variables.
Joint frequency
An interior cell count in a two-way table for a specific row-and-column category combination.
Marginal frequency
A row total or column total in a two-way table (a total for one variable, ignoring the other).
Table total (n)
The grand total of all counts in a two-way table; the overall sample size.
Marginal distribution
The distribution (often as proportions/percentages) of one categorical variable, ignoring the other variable.
Conditional relative frequency
A proportion computed within a given row or column (within a condition) to compare groups for association.
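As a rough illustration of the two-way table cards above, here is a minimal Python sketch using made-up counts (the grade levels, study methods, and all numbers are hypothetical). It computes the table total, the marginal frequencies, one marginal distribution, and the conditional relative frequencies within each row.

```python
# Hypothetical two-way table: rows = grade level, columns = preferred study method.
# Each inner value is a joint frequency (an interior cell count).
table = {
    "Junior": {"Flashcards": 30, "Practice tests": 20},
    "Senior": {"Flashcards": 15, "Practice tests": 35},
}

# Table total (n): the sum of every joint frequency.
n = sum(sum(cells.values()) for cells in table.values())

# Marginal frequencies: row totals and column totals.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {}
for cells in table.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count

# Marginal distribution of study method (ignores grade level).
marginal_dist = {col: total / n for col, total in col_totals.items()}

# Conditional relative frequencies: proportions within each row (each row sums to 1).
conditional = {
    row: {col: count / row_totals[row] for col, count in cells.items()}
    for row, cells in table.items()
}

print("n =", n)
print("marginal distribution:", marginal_dist)
print("conditional (by grade):", conditional)
```

In these made-up counts, 60% of juniors but only 30% of seniors prefer flashcards. Conditional distributions that differ this much between groups suggest an association between the two variables.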
Segmented bar chart
A display for two categorical variables where each bar represents a group and segments show conditional relative frequencies (each bar totals 100%).
Mosaic plot
A display for two categorical variables that shows conditional proportions and can also reflect different group sizes via widths/areas.
Simpson’s paradox
A situation where an association seen in the combined data reverses direction or disappears within subgroups once the data are split by a lurking variable.
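A small sketch can make Simpson's paradox concrete. The counts below are invented for illustration (treatments "A" and "B" and the "easy"/"hard" case split are hypothetical); the case difficulty plays the role of the lurking variable.

```python
# Hypothetical (successes, total) counts for two treatments, split by case difficulty.
counts = {
    "easy cases": {"A": (9, 10), "B": (80, 100)},
    "hard cases": {"A": (30, 100), "B": (2, 10)},
}

# Success rate within each subgroup.
within = {
    group: {t: successes / total for t, (successes, total) in by_treatment.items()}
    for group, by_treatment in counts.items()
}

# Success rate after combining the subgroups (difficulty is ignored).
overall = {}
for t in ("A", "B"):
    successes = sum(counts[group][t][0] for group in counts)
    total = sum(counts[group][t][1] for group in counts)
    overall[t] = successes / total

for group, rates in within.items():
    print(group, {t: round(rate, 3) for t, rate in rates.items()})
print("combined", {t: round(rate, 3) for t, rate in overall.items()})
```

With these numbers, A has the higher success rate in both subgroups, yet B looks better in the combined data, because A was used mostly on hard cases and B mostly on easy ones.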
Bivariate quantitative data
A dataset containing two quantitative variables measured on the same individuals.
Scatterplot
A graph of paired quantitative data where each individual is plotted as a point (x, y) to visualize a relationship.
Direction (association)
Whether y tends to increase as x increases (positive), decrease as x increases (negative), or show no clear trend.
Form (scatterplot)
The overall shape of a relationship in a scatterplot (e.g., linear, curved).
Strength (scatterplot)
How closely points cluster around the form (e.g., weak, moderate, strong).
Outlier
An unusual point that does not fit the overall pattern; it can strongly affect correlation and regression.
Correlation (r)
A numerical measure of the direction and strength of a linear relationship between two quantitative variables.
Correlation properties (linear, unitless, not resistant)
Key facts about r: it measures only linear association, has no units, always lies between −1 and 1, and is not resistant (outliers can change it greatly).
Standardized value (z-score)
A value expressed in standard deviation units: z = (value − mean) / standard deviation.
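The z-score and correlation cards connect through the common textbook formula r = (1 / (n − 1)) · Σ(z_x · z_y). The sketch below applies it to a small made-up dataset (the x and y values are hypothetical) and also checks that r does not change when x is converted to different units.

```python
from statistics import mean, stdev

# Hypothetical paired data; keep each (x, y) pair together.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def z_scores(values):
    """Standardize: z = (value - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correlation(xs, ys):
    """r as the sum of products of z-scores, divided by n - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

r = correlation(x, y)

# Rescale x (for example inches to centimeters); r has no units, so it should not change.
x_rescaled = [v * 2.54 for v in x]
r_rescaled = correlation(x_rescaled, y)

print(round(r, 4), round(r_rescaled, 4))
```

For these values r is about 0.775, a fairly strong positive linear association, and rescaling x leaves it unchanged.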
Coefficient of determination (r^2)
For linear regression, the proportion of variation in y accounted for by the regression of y on x (between 0 and 1).
Least-squares regression line
The line that minimizes the sum of squared vertical residuals: Σ(y − ŷ)^2.
Regression equation (ŷ = a + bx)
The equation of a regression line giving predicted response ŷ from explanatory value x, with intercept a and slope b.
Predicted value (ŷ)
The model’s predicted value of the response variable y for a given x.
Slope (b)
In ŷ = a + bx, the predicted change in y for a 1-unit increase in x (with units of y per unit of x).
Intercept (a)
In ŷ = a + bx, the predicted value of y when x = 0; may or may not be meaningful in context.
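The regression cards above can be sketched with the standard closed-form least-squares solution, b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The data and the helper names `least_squares` and `predict` are hypothetical, chosen only for illustration.

```python
from statistics import mean

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def least_squares(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    b = sxy / sxx          # predicted change in y per 1-unit increase in x
    a = y_bar - b * x_bar  # predicted y when x = 0
    return a, b

a, b = least_squares(x, y)

def predict(x_value):
    """Predicted response y-hat for a given x (trust it only inside the observed x range)."""
    return a + b * x_value

print(f"y-hat = {a:.1f} + {b:.1f}x")
print("prediction at x = 3:", round(predict(3), 2))
```

Here the fitted line is ŷ = 2.2 + 0.6x, so each 1-unit increase in x goes with a predicted 0.6-unit increase in y. Predicting at x = 50 would be extrapolation, since the observed x-values only run from 1 to 5.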
Extrapolation
Using a regression model to predict outside the observed range of x; risky because the pattern may not continue.
Residual
The difference between an observed and predicted value: residual = y − ŷ (the vertical distance from a point to the regression line).
Residual sign (positive vs negative)
Positive residual: model underestimated (point above line). Negative residual: model overestimated (point below line).
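To see the residual sign convention in action, the sketch below assumes the line ŷ = 2.2 + 0.6x has already been fitted to a small made-up dataset (data, coefficients, and the helper `describe` are illustrative assumptions) and labels each residual.

```python
# Hypothetical data and an assumed, already-fitted least-squares line.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

predicted = [a + b * xi for xi in x]
# residual = observed y minus predicted y-hat
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

def describe(residual):
    """Translate the sign of a residual into words."""
    if residual > 0:
        return "underestimate (point above line)"
    if residual < 0:
        return "overestimate (point below line)"
    return "exact (point on line)"

for xi, res in zip(x, residuals):
    print(f"x = {xi}: residual = {res:+.1f} -> {describe(res)}")
```

For a least-squares line that includes an intercept, the residuals sum to zero (up to rounding), which is a quick sanity check on the arithmetic.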
Residual plot
A graph of residuals versus x (or versus ŷ) used to check whether a linear model is appropriate.
Curvature (residual plot pattern)
A systematic curved pattern in a residual plot, suggesting the relationship is not linear and a different model may be needed.
Changing spread (fan shape)
A residual-plot pattern where the spread of the residuals widens or narrows as x changes, meaning predictions are less precise for some x-values than for others.
Standard deviation of residuals (s)
A typical prediction error size in y-units: s = sqrt( Σ(y − ŷ)^2 / (n − 2) ).
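The formula for s can be worked through on a tiny example, along with r² computed as 1 − SSE/SST (valid for a least-squares line with an intercept). The data and the fitted coefficients below are assumptions carried over for illustration only.

```python
from math import sqrt
from statistics import mean

# Hypothetical data and an assumed least-squares fit y-hat = 2.2 + 0.6x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
n = len(x)

# Sum of squared residuals, the quantity that least squares minimizes.
sse = sum(e ** 2 for e in residuals)

# Standard deviation of residuals: typical prediction error, in y-units.
s = sqrt(sse / (n - 2))

# Total variation in y about its mean.
y_bar = mean(y)
sst = sum((yi - y_bar) ** 2 for yi in y)

# Coefficient of determination: share of variation in y accounted for by the line.
r_squared = 1 - sse / sst

print(f"s = {s:.3f}, r^2 = {r_squared:.2f}")
```

In this example s is about 0.894 y-units and r² is 0.60, so the line accounts for roughly 60% of the variation in y.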
Regression outlier
A point with an unusually large residual (far above or below the regression line compared to other points).
Influential point
A point whose removal would noticeably change the regression line (slope/intercept) and possibly r and r^2.
Leverage
A measure of how far an x-value is from the mean of x; high-leverage points have strong potential to affect the regression line.
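One way to check whether a point is influential is to fit the line with and without it. The sketch below does this for a made-up dataset whose last point has an extreme x-value. It also computes the standard simple-regression leverage value h = 1/n + (x − x̄)² / Σ(x − x̄)², which is more formal than the card's description and is included here as an illustration; the helper names are hypothetical.

```python
from statistics import mean

def slope(xs, ys):
    """Least-squares slope b = Sxy / Sxx."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    return sxy / sxx

def leverages(xs):
    """Leverage of each point in simple linear regression (depends only on x)."""
    n, x_bar = len(xs), mean(xs)
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in xs]

# Hypothetical data; the last point (15, 3) sits far from the other x-values.
x = [1, 2, 3, 4, 5, 15]
y = [2, 4, 5, 4, 5, 3]

slope_all = slope(x, y)
slope_without = slope(x[:-1], y[:-1])  # refit after removing the last point
h = leverages(x)

print(f"slope with the point:    {slope_all:.3f}")
print(f"slope without the point: {slope_without:.3f}")
print("leverages:", [round(v, 3) for v in h])
```

Removing the single point changes the slope from slightly negative to 0.6, so it is influential, and its leverage is by far the largest in the set.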
Transformation (to achieve linearity)
Changing variables (often using log/ln or power transforms) to make a curved relationship more nearly linear and improve a linear model.
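As a sketch of how a transformation can straighten a curve, the example below uses made-up data that grow exponentially (y doubles with each step in x) and compares the correlation before and after taking ln(y). The data and the helper `fit_line` are hypothetical.

```python
import math
from statistics import mean

def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    syy = sum((yi - y_bar) ** 2 for yi in ys)
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r

# Hypothetical exponential growth: y = 3 * 2**x.
x = [0, 1, 2, 3, 4]
y = [3, 6, 12, 24, 48]

# Fit on the original scale, then on the log scale.
_, _, r_raw = fit_line(x, y)
log_y = [math.log(v) for v in y]
a_log, b_log, r_log = fit_line(x, log_y)

# Back-transform: ln(y-hat) = a + b*x  means  y-hat = e**a * (e**b)**x.
initial_value = math.exp(a_log)
growth_factor = math.exp(b_log)

print(f"r on original scale: {r_raw:.3f}")
print(f"r after ln(y):       {r_log:.3f}")
print(f"back-transformed model: y-hat = {initial_value:.1f} * {growth_factor:.1f}**x")
```

On the original scale r is about 0.933 even though the pattern is clearly curved; after the log transform the points fall on a straight line. A residual plot of the transformed fit is still the better check that the linear model is appropriate.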
Lurking variable
A variable not included in the analysis that affects both variables being studied and may explain or distort an apparent association.
Confounding
When the effects of two variables on a response cannot be separated (common in observational comparisons).
Observational study
A study where researchers observe and record data without assigning treatments; it can reveal an association but cannot by itself establish causation.
Controlled experiment
A study where researchers impose treatments (often with random assignment); can support cause-and-effect conclusions if well designed.
Random assignment
Randomly assigning individuals to treatments; supports a cause-and-effect conclusion (for the participants).
Random sampling
Randomly selecting individuals from a population; supports generalizing results to the population.