3 - Regressions Flashcards
What is the general form of a simple linear regression model?
Yi = β0 + β1 * Xi + εi, where Yi is the dependent variable, β0 is the intercept, β1 is the slope, Xi is the independent variable, and εi is the random error term.
What are the two components of the response variable Yi in a simple linear model?
Yi is composed of a deterministic component (β0 + β1 * Xi) and a random component (εi), where β0 is the intercept, β1 is the slope, Xi is the independent variable, and εi is the random error term.
What does the term E(Yi) represent in the context of linear regression?
E(Yi) = β0 + β1 * Xi, where E(Yi) is the expected value of Yi given Xi, β0 is the intercept, and β1 is the slope of the regression.
Define the variance Var(Yi) in a simple linear model.
Var(Yi) = Var(εi) = σ², the variance of the random error term εi. Since σ² is unknown in practice, it is estimated by σ̂² = RSS/(n-2), where RSS = Σ(Yi - Ŷi)² is the residual sum of squares.
What does the intercept β0 signify when X = 0?
β0 represents the mean value of Yi when the independent variable Xi equals zero, provided X = 0 is a meaningful value within the range of the data.
What method is used to estimate the parameters β0 and β1?
The least squares method is used to estimate the parameters β0 and β1, minimizing the sum of squared residuals.
What are the least square estimators of β0 and β1?
The least squares estimators are β̂1 = Σ(Xi - X̄) * (Yi - Ȳ) / Σ(Xi - X̄)² for the slope, and β̂0 = Ȳ - β̂1 * X̄ for the intercept, where X̄ is the mean of Xi and Ȳ is the mean of Yi.
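A minimal sketch of these formulas in NumPy; the sample data and variable names below are made up for illustration and are not part of the deck:

```python
import numpy as np

# Illustrative data (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# Slope: beta1_hat = sum((Xi - X_bar)(Yi - Y_bar)) / sum((Xi - X_bar)^2)
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# Intercept: beta0_hat = Y_bar - beta1_hat * X_bar
beta0_hat = y_bar - beta1_hat * x_bar

print(beta0_hat, beta1_hat)
```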
What is the Gauss-Markov theorem about?
The Gauss-Markov theorem states that under the assumptions of linearity, zero-mean errors, homoscedasticity, and uncorrelated errors, the least squares estimators of β0 and β1 are the best linear unbiased estimators (BLUE).
What is the formula for the coefficient of determination R²?
R² = 1 - (SSRes / SSTot), where SSRes is the residual sum of squares and SSTot is the total sum of squares.
In a simple linear regression model, what are residuals?
Residuals are the differences between the observed values Yi and the fitted values Ŷi, calculated as ei = Yi - Ŷi; they serve as estimates of the unobservable errors εi.
What is the formula for the least squares estimator of the slope?
β̂1 = (Σ(Xi - X̄) * (Yi - Ȳ)) / Σ(Xi - X̄)², where β̂1 is the slope estimator, Xi is the independent variable, X̄ is the mean of Xi, Yi is the dependent variable, and Ȳ is the mean of Yi.
What is the formula for the least squares estimator of the intercept?
β̂0 = Ȳ - β̂1 * X̄, where β̂0 is the intercept estimator, Ȳ is the mean of Yi, β̂1 is the slope estimator, and X̄ is the mean of Xi.
What is the formula for the fitted regression line?
Ŷi = β̂0 + β̂1 * Xi, where Ŷi is the predicted value of Yi, β̂0 is the estimated intercept, β̂1 is the estimated slope, and Xi is the independent variable.
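A short sketch of computing the fitted line, assuming NumPy and the same made-up data as above; np.polyfit with degree 1 returns the same least-squares coefficients (highest degree first):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# np.polyfit returns [slope, intercept] for a degree-1 fit
beta1_hat, beta0_hat = np.polyfit(x, y, 1)

# Fitted values: Y_hat_i = beta0_hat + beta1_hat * Xi
y_hat = beta0_hat + beta1_hat * x
print(y_hat)
```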
What is the formula for the residual sum of squares (SSRes)?
SSRes = Σ(Yi - Ŷi)², where SSRes is the residual sum of squares, Yi is the observed value, and Ŷi is the predicted value from the regression line.
What is the formula for the total sum of squares (SSTot)?
SSTot = Σ(Yi - Ȳ)², where SSTot is the total sum of squares, Yi is the observed value, and Ȳ is the mean of Yi.
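Putting SSRes, SSTot, and R² together in one hedged sketch (NumPy and the illustrative data are assumptions):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
beta1_hat, beta0_hat = np.polyfit(x, y, 1)
y_hat = beta0_hat + beta1_hat * x

residuals = y - y_hat                  # e_i = Yi - Y_hat_i
ss_res = np.sum(residuals ** 2)        # SSRes = sum((Yi - Y_hat_i)^2)
ss_tot = np.sum((y - y.mean()) ** 2)   # SSTot = sum((Yi - Y_bar)^2)

r_squared = 1 - ss_res / ss_tot        # R^2 = 1 - SSRes/SSTot
print(r_squared)
```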
What is the formula for the variance of the slope estimator?
Var(β̂1) = σ² / Σ(Xi - X̄)², where Var(β̂1) is the variance of the slope estimator and σ² is the variance of the errors; in practice σ² is replaced by its estimate σ̂² = RSS/(n-2), with RSS = Σ(Yi - Ŷi)² the residual sum of squares.
What is the formula for the mean squared error (MSE)?
MSE = SSRes / (n - 2), where MSE is the mean squared error, SSRes is the residual sum of squares, and n is the number of observations; the divisor n - 2 accounts for the two estimated parameters β̂0 and β̂1, which makes MSE an unbiased estimator of σ².
What is the formula for the confidence interval of the slope coefficient?
β̂1 ± t(1-α/2; n-2) * SE(β̂1),
where β̂1 is the estimated slope coefficient, t(1-α/2; n-2) is the critical t-value with n - 2 degrees of freedom, and SE(β̂1) = √(MSE / Σ(Xi - X̄)²) is the standard error of the slope estimator.
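A hedged sketch of the interval, assuming NumPy, SciPy, and the same illustrative data used earlier:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

beta1_hat, beta0_hat = np.polyfit(x, y, 1)
residuals = y - (beta0_hat + beta1_hat * x)

mse = np.sum(residuals ** 2) / (n - 2)                 # sigma_hat^2 = SSRes/(n-2)
se_beta1 = np.sqrt(mse / np.sum((x - x.mean()) ** 2))  # SE(beta1_hat)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)          # critical t-value, n-2 df
ci = (beta1_hat - t_crit * se_beta1, beta1_hat + t_crit * se_beta1)
print(ci)  # 95% confidence interval for the slope
```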
Explain the difference between the constant term and the random term in the simple linear model.
The constant (deterministic) component is β0 + β1 * Xi, which is fixed once Xi is known, where β0 is the intercept and β1 is the slope; the random component is εi, representing the error or unexplained variation in Yi.
How would you interpret the slope β1 in a simple linear regression model?
The slope β1 represents the change in the expected value of Yi for a one-unit increase in Xi; in a simple model with one predictor there are no other variables to hold constant.
Why is the assumption of independent and identically distributed (i.i.d.) errors important in regression analysis?
The assumption of i.i.d. errors ensures that each error term εi has the same variance σ² and that errors are independent, which is crucial for the validity of least squares estimators and hypothesis tests.
What does it mean when we say that the least squares estimators β̂0 and β̂1 are “unbiased”?
The estimators β̂0 and β̂1 are unbiased if their expected values E(β̂0) = β0 and E(β̂1) = β1, meaning that, on average, they correctly estimate the true parameters of the population.
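A quick Monte Carlo sketch of this idea; the true parameters, sample size, and replication count below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0_true, beta1_true, sigma = 1.0, 2.0, 1.0
x = np.linspace(0.0, 10.0, 50)

estimates = []
for _ in range(5000):
    # Simulate Yi = beta0 + beta1 * Xi + eps_i with i.i.d. normal errors
    y = beta0_true + beta1_true * x + rng.normal(0.0, sigma, size=x.size)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(b1)

# Unbiasedness: the average of beta1_hat over many samples should be
# close to the true slope beta1_true = 2.0
print(np.mean(estimates))
```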
How does the Gauss-Markov theorem support the use of least squares estimators in linear regression?
The Gauss-Markov theorem states that under the assumptions of linearity, homoscedasticity, and uncorrelated errors, the least squares estimators β̂0 and β̂1 are the best linear unbiased estimators (BLUE), having the minimum variance among all linear unbiased estimators.
Describe the role of residuals in determining the fit of a regression model.
Residuals, calculated as ei = Yi - Ŷi, represent the difference between observed values Yi and predicted values Ŷi and serve as estimates of the unobservable errors εi. Smaller residuals indicate a better fit of the regression model to the data.
How would you interpret a coefficient of determination R² value of 0.85 in a regression model?
An R² value of 0.85 means that 85% of the variance in the dependent variable Yi is explained by the independent variable Xi in the regression model, indicating a strong relationship.
In the context of regression, what is the purpose of partitioning the total sum of squares (SSTot) into SSReg and SSRes?
Partitioning SSTot into SSReg (explained variation) and SSRes (unexplained variation) helps in understanding how much of the total variation in Yi is explained by the regression model (SSReg) and how much remains unexplained (SSRes).
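A sketch verifying the decomposition numerically with the made-up data from earlier; for a least-squares fit that includes an intercept, SSTot = SSReg + SSRes holds up to floating-point error:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
beta1_hat, beta0_hat = np.polyfit(x, y, 1)
y_hat = beta0_hat + beta1_hat * x

ss_reg = np.sum((y_hat - y.mean()) ** 2)  # explained variation
ss_res = np.sum((y - y_hat) ** 2)         # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)      # total variation

print(np.isclose(ss_tot, ss_reg + ss_res))  # True
```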
How does the concept of homoscedasticity affect the interpretation of regression results?
Homoscedasticity means that the variance of the error terms εi is constant across all levels of Xi. If this assumption is violated, the standard errors of the coefficients β̂0 and β̂1 may be biased, affecting hypothesis tests and confidence intervals.
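A rough sketch of one informal check (not a formal test such as Breusch-Pagan), using simulated data whose error spread grows with Xi; if the absolute residuals correlate with Xi, constant error variance is doubtful:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 100)
# Simulated heteroscedastic data: error standard deviation grows with x
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5 * x)

beta1_hat, beta0_hat = np.polyfit(x, y, 1)
residuals = y - (beta0_hat + beta1_hat * x)

# Informal diagnostic: correlation between |residuals| and x.
# Near zero suggests constant spread; a clearly positive or negative value
# suggests the error variance changes with x.
print(np.corrcoef(np.abs(residuals), x)[0, 1])
```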