Topic 8 Linear Regression Flashcards
What is the goal of linear regression in statistics?
To explain variability in a dependent variable using a linear relationship with an independent variable.
True/False
Linear regression only applies to physical processes.
False - it applies to any pair of measurable variables, including random ones.
In the linear model Y = a + bX + Z, what is Z?
The error term, assumed to be independent of X and with zero mean.
The values of a and b are chosen to minimize the sum of _________.
squared residuals
What are the least-squares formulas for the estimates â and b̂ of the parameters a and b?
b̂ = Cov(X, Y) / Var(X)
â = Ȳ − b̂ X̄
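A minimal Python sketch of these two formulas, assuming sample moments normalised by 1/n (consistent with sstot = n * var(Y) elsewhere in this deck). The function name fit_line and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a_hat, b_hat) for Y = a + bX + Z."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample variance and covariance with 1/n normalisation (assumed convention).
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    b_hat = cov_xy / var_x          # slope: Cov(X, Y) / Var(X)
    a_hat = y_bar - b_hat * x_bar   # intercept: mean(Y) - b_hat * mean(X)
    return a_hat, b_hat


# Illustrative data only, not from the course material.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
a_hat, b_hat = fit_line(xs, ys)
print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")
```

The normalisation cancels in the slope, so using 1/(n - 1) for both moments would give the same b̂.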
True/False
The regression coefficient estimates â and b̂ are biased.
False - they are unbiased
What is the formula for Var(b̂)?
Var(b̂) = σ_Z² / (n · Var(X))
What is the formula for Var(â)?
Var(â) = (1/n + (X̄² / (n · Var(X)))) · σ_Z²
What does the coefficient of determination r² represent?
The proportion of total variance in Y explained by the linear model.
Which of the following is NOT a typical inference question in linear regression?
a) Is slope b ≠ 0?
b) What is the mean Y at X = x₀?
c) Is X normally distributed?
d) What is the predicted Y at X = x₀?
c) Is X normally distributed?
What is the estimate of the standard deviation of Z?
σ̂_Z = sqrt(Var̂(Z))
What is the unbiased estimate of the variance of Z (the error term)?
Var̂(Z) = n * (var(Y) - cov(X, Y)^2 / var(X)) / (n - 2)
What is the coefficient of determination (r-squared)?
r² = cov(X, Y)^2 / (var(X) * var(Y))
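As a rough check, the sketch below estimates the error variance directly from the residuals (ssres / (n - 2)) and computes r² in two ways: from the moment formula above and as 1 - ssres / sstot. It assumes 1/n-normalised moments; the helper name residual_variance_and_r2 and the data are hypothetical.

```python
def residual_variance_and_r2(xs, ys):
    """Return (var_z_hat, r2_from_moments, r2_from_sums_of_squares)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # 1/n-normalised sample moments (assumed convention).
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    b_hat = cov_xy / var_x
    a_hat = y_bar - b_hat * x_bar
    # Residual sum of squares around the fitted line.
    ssres = sum((y - (a_hat + b_hat * x)) ** 2 for x, y in zip(xs, ys))
    sstot = n * var_y
    var_z_hat = ssres / (n - 2)                  # unbiased error-variance estimate
    r2_moments = cov_xy ** 2 / (var_x * var_y)   # moment formula for r^2
    r2_sums = 1 - ssres / sstot                  # share of variance explained
    return var_z_hat, r2_moments, r2_sums


# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
var_z_hat, r2_moments, r2_sums = residual_variance_and_r2(xs, ys)
print(var_z_hat, r2_moments, r2_sums)
```

The two r² values should agree up to floating-point error, which is a quick way to confirm the "proportion of variance explained" reading.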
What is the confidence interval for the slope b when the error variance Var(Z) is known?
b̂ ± z_(α/2) * sqrt(Var(Z) / (n * var(X)))
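A short sketch of this interval using only the Python standard library. It assumes z_(α/2) means the upper α/2 quantile of the standard normal; the function name and the input values (b̂ = 0.6, Var(Z) = 0.8, n = 5, var(X) = 2.0) are illustrative, not course data.

```python
from math import sqrt
from statistics import NormalDist


def slope_ci_known_variance(b_hat, var_z, n, var_x, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for the slope b, Var(Z) known."""
    # Upper alpha/2 quantile of the standard normal (about 1.96 for alpha = 0.05).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(var_z / (n * var_x))
    return b_hat - half_width, b_hat + half_width


lo, hi = slope_ci_known_variance(b_hat=0.6, var_z=0.8, n=5, var_x=2.0)
print(f"95% CI for b: ({lo:.3f}, {hi:.3f})")
```

When Var(Z) is unknown, the same structure applies with Var̂(Z) in place of Var(Z) and a Student's t quantile with n - 2 degrees of freedom in place of z.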
What is the t-statistic used for the confidence interval of b when the error variance is unknown?
T = (b̂ - b) / sqrt(Var̂(Z) / (n * var(X)))
What is the confidence interval for the slope b using Student’s t-distribution?
b̂ ± t_(n-2, α/2) * sqrt(Var̂(Z) / (n * var(X)))
What is the t-statistic for the intercept a?
T = (â - a) / sqrt(Var̂(Z) * (1/n + mean(X)^2 / (n * var(X))))
How does the prediction interval differ from the mean response interval?
It includes an extra variance term for the individual prediction (+1 term), making it wider.
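What is the null hypothesis in a t-test on slope?
H₀: b = b*, where b* is the hypothesized slope (often b* = 0, i.e. no linear dependence of Y on X).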
What is the test used to assess linear correlation between X and Y?
Correlation test using the sample correlation coefficient c_xy.
What does the F-test evaluate in linear regression?
Whether the regression model explains a significant amount of variance in Y.
What does MATLAB's regress(Y, X) return?
a) Only slope
b) Only intercept
c) Regression stats including CIs
d) Histograms
c) Regression stats including CIs
What is the confidence interval for a mean response at X = x₀?
â + b̂ * x₀ ± t_(n-2, α/2) * sqrt(Var̂(Z) * (1/n + (x₀ - mean(X))^2 / (n * var(X))))
What is the prediction interval for an individual Y at X = x₀ (including Z)?
â + b̂ * x₀ ± t_(n-2, α/2) * sqrt(Var̂(Z) * (1/n + (x₀ - mean(X))^2 / (n * var(X)) + 1))
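To see how the extra "+ 1" term widens the interval, the sketch below computes both half-widths side by side. The critical value t_crit is passed in by the caller (here 3.182, the tabulated two-sided 95% value for 3 degrees of freedom); the function name and all numeric inputs are illustrative assumptions.

```python
from math import sqrt


def interval_half_widths(x0, n, x_bar, var_x, var_z_hat, t_crit):
    """Half-widths of the mean-response and prediction intervals at x0."""
    # Shared term: 1/n + (x0 - mean(X))^2 / (n * var(X)), with 1/n-normalised var(X).
    shared = 1 / n + (x0 - x_bar) ** 2 / (n * var_x)
    mean_hw = t_crit * sqrt(var_z_hat * shared)
    # The prediction interval adds 1 for the variance of the new error term Z.
    pred_hw = t_crit * sqrt(var_z_hat * (shared + 1))
    return mean_hw, pred_hw


mean_hw, pred_hw = interval_half_widths(
    x0=3.0, n=5, x_bar=3.0, var_x=2.0, var_z_hat=0.8, t_crit=3.182
)
print(f"mean response: +/- {mean_hw:.3f}, prediction: +/- {pred_hw:.3f}")
```

Both intervals are narrowest at x₀ = mean(X), and the prediction interval never shrinks below t * σ̂_Z no matter how large n becomes.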
What is the t-statistic for testing the slope (b = b*)?
T = (b̂ - b*) / sqrt(Var̂(Z) / (n * var(X)))
What is the V-statistic for testing correlation Corr(X,Y) = 0?
V = (sqrt(n - 3) / 2) * ln[((1 + c_xy)(1 - Corr(X, Y))) / ((1 - c_xy)(1 + Corr(X, Y)))]
What is the regression sum of squares (ssreg)?
ssreg = sum[(â + b̂ * xᵢ - (â + b̂ * mean(X)))^2] = n * var(â + b̂X)
What is the residual sum of squares (ssres)?
ssres = sum[(yᵢ - (â + b̂ * xᵢ))^2] = (n - 2) * Var̂(Z)
What is the total sum of squares (sstot)?
sstot = sum[(yᵢ - mean(Y))^2] = ssreg + ssres = n * var(Y)
What is the F-statistic for testing regression significance?
F = (ssreg / 1) / (ssres / (n - 2))
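The three sums of squares and the F-statistic can be checked numerically with a small sketch like the one below. The function name sums_of_squares and the data are hypothetical; the fitted line uses the least-squares estimates from earlier in the deck.

```python
def sums_of_squares(xs, ys):
    """Return (ssreg, ssres, sstot, f_stat) for a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    fitted = [a_hat + b_hat * x for x in xs]
    # a_hat + b_hat * mean(X) equals mean(Y), so deviations are taken from y_bar.
    ssreg = sum((f - y_bar) ** 2 for f in fitted)
    ssres = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sstot = sum((y - y_bar) ** 2 for y in ys)
    # 1 and n - 2 degrees of freedom for regression and residuals.
    f_stat = (ssreg / 1) / (ssres / (n - 2))
    return ssreg, ssres, sstot, f_stat


# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
ssreg, ssres, sstot, f_stat = sums_of_squares(xs, ys)
print(ssreg, ssres, sstot, f_stat)
```

In simple linear regression the F-statistic equals the square of the t-statistic for testing b = 0, so the two tests lead to the same conclusion.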