Least-squares regression line
The line that minimizes the sum of squared residuals and summarizes the linear relationship between an explanatory variable x and a response variable y in a sample.
Explanatory variable (x)
The variable used to explain or predict changes in the response; typically placed on the horizontal axis in regression.
Response variable (y)
The outcome variable being predicted or explained by x; typically placed on the vertical axis in regression.
Sample slope (b1)
The slope of the least-squares regression line from a sample; estimates how the predicted y changes for a 1-unit increase in x.
Population slope (β1 or β)
The true slope parameter in the population regression model; represents how the population mean response changes with x.
Sample intercept (b0)
The intercept of the sample regression line; the predicted value of y when x = 0 (may not be meaningful if x=0 is outside the data’s context).
Population intercept (β0 or α)
The true intercept parameter in the population regression line; the population mean response when x = 0.
Population regression line
The population model for the mean response: μy = β0 + β1x (equivalently μy = α + βx).
Sample regression line
The fitted line from sample data: ŷ = b0 + b1x.
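As a sketch of how ŷ = b0 + b1x is obtained, the least-squares formulas can be applied to a small dataset (all numbers below are made up for illustration):

```python
# Least-squares fit: b1 = Sxy / Sxx, b0 = y-bar - b1 * x-bar.
# The data are invented for illustration only.

def fit_line(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                 # sample slope
    b0 = mean_y - b1 * mean_x      # sample intercept
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(x, y)            # roughly b0 = 0.05, b1 = 1.99
```

In practice a calculator or software (e.g. `scipy.stats.linregress`) does this; the point is that the slope and intercept come directly from the sample means and deviations.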
Slope inference
Using sample regression results to draw conclusions about the population slope parameter β1 (e.g., testing β1=0 or estimating β1 with a confidence interval).
Null hypothesis for slope
A statement about the population slope, most commonly H0: β1 = 0 (no linear relationship in the population).
Alternative hypothesis for slope
The competing claim about the population slope, such as Ha: β1 ≠ 0, Ha: β1 > 0, or Ha: β1 < 0 (chosen based on context).
Slope of 0 (flat population line)
A population slope of 0 means the population mean response does not change as x changes; no linear relationship is supported.
Linear relationship (regression context)
A relationship where the mean of y changes approximately linearly with x; required for valid linear regression slope inference.
Correlation–slope test equivalence (simple linear regression)
With one explanatory variable, testing for a linear relationship via regression is equivalent to testing whether the population correlation is 0, but regression questions should be phrased in slope terms.
Association
A relationship between variables where changes in one are related to changes in the other; regression slope inference primarily supports association in a population.
Causation
A cause-and-effect relationship; can be concluded from a significant slope only when the study design is a randomized experiment (within its scope).
Randomized experiment
A study where treatments are randomly assigned; supports cause-and-effect conclusions when conditions are met and results are significant.
Observational study
A study where variables are observed without random assignment; a significant slope supports association only, not causation.
Lurking variable
An unmeasured variable that may influence both x and y, potentially explaining an observed association in an observational study.
Sampling distribution of the slope
The distribution of sample slopes b1 that would be obtained from repeated samples (or repetitions of an experiment) from the same population.
Mean of the sampling distribution (μb)
The average of all possible sample slopes; when the inference conditions are met, this mean equals the true population slope β1, making b1 an unbiased estimator.
Standard deviation of the sampling distribution (σb)
The true spread of sample slopes b1 across repeated samples; typically unknown in practice.
Standard error of the slope (SEb1)
An estimate of the standard deviation of the sampling distribution of b1, computed from sample data; used for t inference about the slope.
t distribution (for slope inference)
The distribution used for slope tests/intervals because the true variability is unknown and must be estimated from sample residuals.
Residual
The vertical difference between an observed y value and its predicted value: y − ŷ.
Residual standard deviation (s)
A measure of typical prediction error around the fitted line: s = sqrt( Σ(y−ŷ)² / (n−2) ).
Degrees of freedom (df = n − 2)
The df used in regression slope t procedures; n−2 because both slope and intercept are estimated from the data.
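The residual, s, and df = n − 2 definitions above can be computed together for a small made-up dataset (the fitted line b0 = 0.05, b1 = 1.99 is just an illustrative example):

```python
# Residuals are y - y-hat; s divides their squared sum by df = n - 2
# because two parameters (slope and intercept) were estimated.
# All numbers are invented for illustration.

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = 0.05, 1.99                # fitted line for these data

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # y - y-hat
n = len(x)
s = (sum(e ** 2 for e in residuals) / (n - 2)) ** 0.5       # df = n - 2
```

Note that the residuals of a least-squares fit always sum to (essentially) zero; s measures their typical size, not their sum.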
t statistic for slope
Standardizes the difference between the sample slope and the hypothesized slope: t = (b1 − β1) / SEb1, usually with the hypothesized value β1 = 0.
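As a self-contained sketch, the t statistic can be computed from scratch for a small made-up dataset, using the standard formula SEb1 = s / √Sxx (the data and fitted line are illustrative only):

```python
# t statistic for H0: beta1 = 0, built from the sample pieces:
# s (residual standard deviation), Sxx, SE_b1, then t = b1 / SE_b1.
# All numbers are invented for illustration.

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = 0.05, 1.99                # fitted line for these data

n = len(x)
mean_x = sum(x) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
s = (sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)) ** 0.5
se_b1 = s / sxx ** 0.5             # standard error of the slope
t = (b1 - 0) / se_b1               # hypothesized slope beta1 = 0
```

A t this large (with df = n − 2 = 3) gives a tiny p-value, strong evidence of a linear relationship in these toy data.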
p-value (slope test)
The probability, assuming H0 is true, of observing a sample slope (or t statistic) at least as extreme as the one obtained, in the direction(s) of Ha.
Significance level (α)
The cutoff probability for deciding whether evidence is strong enough to reject H0 (e.g., α = 0.05).
Statistical significance
A result is statistically significant if the p-value is less than α, indicating evidence against H0 beyond random sampling variation.
Practical importance
Whether an effect is large enough to matter in context; statistical significance does not guarantee practical importance.
Confidence interval for the population slope
A range of plausible values for β1, typically computed as b1 ± t*SEb1, and interpreted as change in the population mean response per 1 unit of x.
Critical value (t*)
The t multiplier from the t distribution (with df = n−2) that matches the desired confidence level for an interval.
Margin of error (ME)
The amount added/subtracted in a confidence interval: ME = t* × SEb1.
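The interval b1 ± t*·SEb1 and its margin of error can be sketched with the same kind of made-up data; the critical value t* = 3.182 is the 95% table value for df = 3 (assumed from a standard t table):

```python
# 95% confidence interval for beta1: b1 +/- t* x SE_b1.
# Data and fitted line are invented for illustration.

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = 0.05, 1.99                # fitted line for these data

n = len(x)
mean_x = sum(x) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
s = (sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)) ** 0.5
se_b1 = s / sxx ** 0.5

t_star = 3.182                     # t critical value, df = n - 2 = 3, 95%
me = t_star * se_b1                # margin of error
ci = (b1 - me, b1 + me)            # plausible values for beta1
```

The interpretation stays in slope terms: we are 95% confident the population mean response changes by an amount in this interval per 1-unit increase in x.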
Mean response (μy)
The population average value of y at a given x; regression inference targets how μy changes with x, not individual outcomes.
Predicted value (ŷ)
The value of y predicted by the sample regression line for a given x.
Linearity condition
Condition that the relationship between x and the mean of y is approximately linear; checked with a scatterplot and/or residual plot (no curved pattern).
Independence condition
Condition that observations are independent; supported by random sampling or random assignment and by avoiding situations with dependence (e.g., related subjects).
10% condition
When sampling without replacement, independence is plausible if the sample size n is less than 10% of the population.
Time correlation
Dependence across observations collected over time (e.g., daily prices/temperatures) that can violate the independence assumption.
Normality of residuals condition
Condition that residuals are approximately normally distributed around the line; checked with a histogram or normal probability plot of residuals (not y itself).
Equal variance (constant spread) condition
Condition that the variability of residuals is roughly constant across x; checked by looking for no fanning/funneling in a residual plot.
Funnel (fan) pattern
A residual plot pattern where residual spread increases or decreases with x, indicating nonconstant variance (violating equal variance).
Influential point
A data point that strongly affects the fitted line (slope, SE, p-value) and can change conclusions; often associated with extreme x or large residuals.
High leverage point
A point with an extreme x-value compared to the rest of the data that can “pull” the regression line.
Outlier (large residual)
A point with an unusually large vertical deviation from the regression line; can distort regression results, especially if also high leverage.
Extrapolation
Using a regression model to predict for x-values far outside the observed range; predictions and inference are less trustworthy there.
r-squared (coefficient of determination)
The proportion of variation in y explained by the linear model with x; unitless and distinct from interpreting slope or establishing causation.
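The "proportion of variation explained" reading of r² can be made concrete as 1 − SSE/SST, again with made-up illustrative numbers:

```python
# r-squared = 1 - SSE/SST: the fraction of total variation in y
# accounted for by the fitted line. Data are invented for illustration.

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = 0.05, 1.99                # fitted line for these data

mean_y = sum(y) / len(y)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
sst = sum((yi - mean_y) ** 2 for yi in y)                      # total
r_squared = 1 - sse / sst
```

A value near 1 says the line explains most of the variation in y, but it says nothing by itself about the size of the slope or about causation.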