Test Prep Flashcards
What is the formula for the simple linear regression model?
y = β_0 + β_1*x_1 + ϵ
Where:
y: Dependent (response) variable
x_1: Independent (predictor) variable
β_0: Intercept
β_1: Slope (coefficient)
ϵ: Error term
What does the dependent variable (y) represent in simple linear regression?
y is the dependent variable (response). It’s the outcome we are trying to predict or explain, like house price in a real estate model.
What does the intercept (𝛽_0) mean in simple linear regression?
β_0 is the intercept. It’s the predicted value of y when x=0. It shows where the line crosses the y-axis. For example, it could represent the price of a house with zero square feet.
What does the slope (𝛽_1) mean in simple linear regression?
β_1 is the slope (coefficient). It shows how much y changes when x increases by 1 unit. For example, it might show how much house price increases for each extra square foot.
What is the error term (ϵ) in simple linear regression?
ϵ is the error term: the difference between the actual y and the value given by the regression line. It accounts for variation in y that isn’t explained by x. In a fitted model it is estimated by the residual, the actual y minus the predicted y.
What does the intercept tell us when the predictor variable (x) is zero?
It tells us the predicted value of y when x=0, essentially giving the baseline value of the response variable.
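Example:
The intercept and slope described above can be estimated from data. The sketch below assumes ordinary least squares, which these cards do not name explicitly, and uses made-up house data. The function name fit_line and the variable names are illustrative only.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx             # slope: change in y per one-unit change in x
    b0 = mean_y - b1 * mean_x  # intercept: predicted y when x = 0
    return b0, b1

# Made-up data: house size (hundreds of square feet) and price (thousands)
sizes = [10, 15, 20, 25, 30]
prices = [150, 200, 250, 300, 350]
b0, b1 = fit_line(sizes, prices)
print(f"intercept = {b0:.1f}, slope = {b1:.1f}")
```

With this data the points lie exactly on a line, so the estimated intercept is 50 and the estimated slope is 10: each extra unit of size adds 10 to the predicted price.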
What does covariance measure?
Covariance measures whether two variables tend to move in the same direction (positive) or in opposite directions (negative).
What does a positive covariance indicate?
A positive covariance indicates that the two variables increase or decrease together.
What does a negative covariance indicate?
A negative covariance indicates that as one variable increases, the other decreases.
Why is covariance difficult to interpret?
Covariance is hard to interpret because its value depends on the scale of the variables and can be any large positive or negative number.
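Example:
A small sketch of the scale problem, using made-up height and weight data and the sample covariance (dividing by n - 1, an assumption since the cards do not say which version is meant). Changing height from meters to centimeters multiplies the covariance by 100 even though the relationship is unchanged.

```python
def covariance(x, y):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# Made-up data
heights_m = [1.5, 1.6, 1.7, 1.8]
weights_kg = [50, 58, 66, 74]
heights_cm = [h * 100 for h in heights_m]  # same heights, different unit

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
print(cov_m, cov_cm)  # the second value is about 100 times the first
```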
What does correlation measure?
Correlation measures both the direction and the strength of the relationship between two variables.
What are the value boundaries of correlation?
Correlation is always between -1 and 1, with:
1 meaning a perfect positive relationship.
-1 meaning a perfect negative relationship.
0 meaning no linear relationship.
How is correlation different from covariance?
Correlation is standardized and bounded between -1 and 1, making it easier to interpret than covariance, which has no fixed scale.
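Example:
One way to see the standardization is to compute the Pearson correlation by hand: the covariance is divided by the spread of each variable, so units cancel. This is a sketch with made-up data; pearson_r is an illustrative name, not a library function.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: cross-deviations scaled by both spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data
heights_m = [1.5, 1.6, 1.7, 1.8]
weights_kg = [50, 60, 64, 74]
heights_cm = [h * 100 for h in heights_m]  # same heights, different unit

r_m = pearson_r(heights_m, weights_kg)
r_cm = pearson_r(heights_cm, weights_kg)
print(round(r_m, 3), round(r_cm, 3))  # same value in both units
```

Unlike covariance, the result stays the same when the unit of height changes, and it always falls between -1 and 1.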
What does R-squared measure in a regression model?
R-squared measures the proportion of the total variability in the outcome variable that is explained by the predictor variable(s) in the model.
What is the range of R-squared values?
R-squared values range from 0 to 1. A value of 0 means the model explains none of the variability, while 1 means the model explains all the variability.
How is R-squared related to residuals?
R-squared is calculated as
1 − (Sum of Squared Residuals (SSR) / Total Sum of Squares (TSS)). It reflects how much of the data’s variation is captured by the model compared to the residuals.
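Example:
A minimal sketch of the calculation, where SSR is the sum of squared residuals and TSS is the sum of squared deviations of y from its mean, so R-squared = 1 - SSR / TSS. The observed values are made up, and the predictions are assumed to come from a hypothetical fitted line ŷ = 1.9 * x with x = 1, 2, 3, 4.

```python
def r_squared(observed, predicted):
    """R-squared = 1 - SSR / TSS."""
    mean_y = sum(observed) / len(observed)
    # SSR: squared gaps between observed and predicted values
    ssr = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    # TSS: squared gaps between observed values and their mean
    tss = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ssr / tss

# Made-up data; predictions from the hypothetical line y_hat = 1.9 * x
observed = [2, 4, 5, 8]
predicted = [1.9, 3.8, 5.7, 7.6]
r2 = r_squared(observed, predicted)
print(round(r2, 3))
```

Here SSR is about 0.70 and TSS is 18.75, so R-squared is about 0.963: small residuals relative to the total variation give a value close to 1.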
What does a high R-squared value indicate about residuals?
A high R-squared value indicates that the residuals are small, meaning the model’s predictions are close to the actual values and the model fits the data well.
What does a low R-squared value indicate about residuals?
A low R-squared value indicates that the residuals are large, meaning the model’s predictions are far from the actual values and the model does not fit the data well.
What are predicted values in a regression model?
Predicted values (ŷ) are the values estimated by the regression model for the outcome variable based on the predictor variables.
What are observed values of y in a regression model?
Observed values (y) are the actual values of the outcome variable collected during data gathering.
How is a residual calculated in a regression model?
Residuals are calculated as:
Residual = Observed value (y) − Predicted value (ŷ)
What does a positive residual indicate?
A positive residual indicates that the model under-predicted the outcome (the actual value is higher than the predicted value).
What does a negative residual indicate?
A negative residual indicates that the model over-predicted the outcome (the actual value is lower than the predicted value).
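Example:
The sign rule can be checked on a few points. This sketch uses made-up observed values and predictions from a hypothetical fitted line; the label strings are illustrative.

```python
# Made-up observed values and predictions from a hypothetical fitted line
observed = [2, 4, 5, 8]
predicted = [1.9, 3.8, 5.7, 7.6]

# Residual = observed value - predicted value
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]

labels = []
for res in residuals:
    if res > 0:
        labels.append("under-predicted")  # actual value is higher
    elif res < 0:
        labels.append("over-predicted")   # actual value is lower
    else:
        labels.append("exact")

for y, y_hat, label in zip(observed, predicted, labels):
    print(y, y_hat, label)
```

Only the third point (observed 5, predicted 5.7) has a negative residual, so it is the only one the model over-predicted.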
What is the range of correlation coefficients?
Correlation coefficients range from -1 to 1.
What does a correlation coefficient of 1 indicate?
A correlation coefficient of 1 indicates a perfect positive correlation; as one variable increases, the other variable increases proportionally.
What does a correlation coefficient of -1 indicate?
A correlation coefficient of -1 indicates a perfect negative correlation; as one variable increases, the other variable decreases proportionally.
What does a correlation coefficient of 0 indicate?
A correlation coefficient of 0 indicates no linear relationship; the variables do not have a consistent pattern of moving together.
How do you interpret a positive correlation coefficient (e.g., r > 0)?
A positive correlation coefficient means that as one variable increases, the other variable also increases.
How do you interpret a negative correlation coefficient (e.g., r < 0)?
A negative correlation coefficient means that as one variable increases, the other variable decreases.
What does a correlation coefficient close to 0 indicate?
A correlation coefficient close to 0 indicates a weak or negligible relationship between the variables.
What does a correlation coefficient around 0.3 to 0.7 (or -0.3 to -0.7) suggest?
This suggests a moderate relationship between the variables, although the exact cutoffs are a matter of convention and vary by field.
What does a correlation coefficient close to 1 or -1 indicate?
This indicates a strong relationship; the variables move closely in sync with each other.
What should you remember about correlation in relation to causation?
Correlation does not imply causation; a high correlation doesn’t mean one variable causes the other.
What kind of relationships does the correlation coefficient capture?
The correlation coefficient captures linear relationships only; non-linear relationships are not well represented.
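Example:
A sketch of a relationship that is perfectly predictable but not linear. With y = x squared on values of x that are symmetric around zero, the Pearson correlation comes out as zero even though y is completely determined by x. The data and the name pearson_r are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # y depends entirely on x, but not linearly

r = pearson_r(x, y)
print(r)
```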
How can outliers affect the correlation coefficient?
Outliers can heavily influence the correlation, making it seem stronger or weaker than it actually is for most of the data.
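Example:
A sketch of how one extreme point can change the picture. The five base points are made up and only weakly related; adding a single point far from the rest pushes the correlation close to 1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data with a weak relationship
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 3]
r_without = pearson_r(x, y)

# Same data plus one extreme point at (20, 20)
r_with = pearson_r(x + [20], y + [20])

print(round(r_without, 2), round(r_with, 2))
```

The correlation for the original five points is about 0.14, but with the outlier included it rises to about 0.97, which says little about how the other five points behave.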
How do you interpret a positive slope coefficient (𝛽) in a regression model?
A positive slope coefficient means that as the predictor variable increases, the outcome variable is expected to increase as well. It shows the rate of increase in the outcome for every one-unit increase in the predictor.
How do you interpret a negative slope coefficient (𝛽) in a regression model?
A negative slope coefficient means that as the predictor variable increases, the outcome variable is expected to decrease. It shows the rate of decrease in the outcome for every one-unit increase in the predictor.
What does the magnitude of a slope coefficient tell you in a regression model?
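The magnitude of a slope coefficient indicates how much the outcome variable is expected to change for a one-unit increase in the predictor variable. Larger coefficients mean a bigger change in the outcome per unit, while smaller coefficients mean a smaller change. Because the magnitude depends on the units of the variables, it does not by itself show how strong the relationship is.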
How would you use the intercept and slope coefficients to make predictions in a regression model?
Use the intercept as the baseline value of the outcome when predictors are zero. Add the product of the slope coefficients and their corresponding predictor values to the intercept to make predictions about the outcome.
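Example:
The prediction rule above can be written as ŷ = intercept + sum of (slope * predictor value). The sketch below uses made-up coefficients for a hypothetical house price model with two predictors; the function name predict and all numbers are illustrative.

```python
def predict(intercept, slopes, values):
    """Predicted outcome: intercept plus each slope times its predictor value."""
    return intercept + sum(b * x for b, x in zip(slopes, values))

# Made-up coefficients: baseline 50, +10 per unit of size, +5 per bedroom
intercept = 50
slopes = [10, 5]

# One house: size 20, 3 bedrooms
y_hat = predict(intercept, slopes, [20, 3])
print(y_hat)  # 50 + 10*20 + 5*3 = 265

# When every predictor is zero, the prediction is just the intercept
baseline = predict(intercept, slopes, [0, 0])
```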
What does the t-distribution help with in regression analysis?
The t-distribution helps determine whether the estimates from the regression are statistically significant and reliable.
What is a t-statistic in regression?
The t-statistic tells us whether a variable in the model has a statistically detectable effect on what you’re trying to predict. It is the estimated coefficient divided by its standard error, so it compares how big the variable’s effect is to the variability of that estimate. A t-statistic far from zero means the effect is unlikely to be due to chance alone, while one close to zero means it might just be random noise.
Example:
If you’re analyzing how hours studied affects exam scores:
The t-statistic for the coefficient of hours studied helps determine if the relationship you observe is likely due to a true impact of studying hours on scores, or if it might be just a coincidence in your sample.
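A sketch of this calculation for the hours studied example, using made-up data. It assumes a simple linear regression fitted by ordinary least squares, where the standard error of the slope is the square root of (SSR / (n - 2)) divided by the sum of squared deviations of x. The function name slope_t_statistic is illustrative.

```python
import math

def slope_t_statistic(x, y):
    """Return (slope, standard error, t-statistic) for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Sum of squared residuals around the fitted line
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Residual variance uses n - 2 because two coefficients were estimated
    residual_variance = ssr / (n - 2)
    se_b1 = math.sqrt(residual_variance / sxx)
    return b1, se_b1, b1 / se_b1

# Made-up data: hours studied and exam scores
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 71]

b1, se_b1, t_stat = slope_t_statistic(hours, scores)
print(round(b1, 2), round(se_b1, 2), round(t_stat, 1))
```

For this data the slope is about 4.09 points per hour with a standard error of about 0.31, giving a t-statistic of roughly 13. A value that far from zero would then be compared against a t-distribution with n - 2 = 4 degrees of freedom.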