Linear Regression Flashcards
Linear Regression
Examine linear relationship between independent/predictor variables and a continuous dependent/outcome variable
Simple Linear Regression
1 independent/predictor variable
Multiple Linear Regression
> 1 independent/predictor variable
Fitted Line (best fit) Equation
Y = b0 + b1X
b0 = intercept
b1 = slope
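A minimal sketch of fitting this line by ordinary least squares; the data values, variable names, and use of numpy are illustrative assumptions, not part of the cards.

```python
import numpy as np

# Toy data (made-up values for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form OLS estimates: b1 = cov(x, y) / var(x); b0 = mean(y) - b1 * mean(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(f"Y = {b0:.2f} + {b1:.2f}X")
```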
Slope
The average amount of change in the outcome variable for each one-unit increase in the predictor variable
-Characterizes the relationship or marginal effect
Residual
The vertical distance between each data point and the trend line; it measures the relationship between each individual observation and the fitted line
Residual is also known as what?
Estimated Error (u)
Can the residual be negative and positive values?
YES.
Positive = above the trend line
Negative = below the trend line
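A sketch of computing residuals as the vertical distance y - yhat, reusing the same made-up data as above (an illustrative assumption); positive values sit above the fitted line, negative values below.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x       # predicted values on the fitted line

residuals = y - y_hat     # estimated error u: positive = above the line, negative = below
print(residuals)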
What does the Regression Equation mean?
The sum of the fitted line and the error term: Y = b0 + b1X + u
Sum of Squares Regression
How far the predicted values on the fitted line differ from the overall mean
Equivalent to SSB (between) in ANOVA
Sum of Squares Residual
Difference between the original data and the predicted values on the fitted line
Equivalent to SSW (within) in ANOVA
Residual and Regression Sum of Squares demonstrates what?
Variance
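A sketch of these sums of squares, reusing the same toy data (illustrative assumption); SS regression plus SS residual adds up to the total sum of squares, i.e. the variance being partitioned.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ss_regression = np.sum((y_hat - y.mean()) ** 2)  # predicted values vs overall mean (SSB in ANOVA)
ss_residual = np.sum((y - y_hat) ** 2)           # original data vs predicted values (SSW in ANOVA)
ss_total = np.sum((y - y.mean()) ** 2)

print(ss_regression + ss_residual, ss_total)     # the two parts sum to the total
```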
For a Regression calculation what must be done first?
ANOVA and F-Stat
- Confirm or deny significance prior to proceeding with the rest of the regression calculations
ANOVA and F-Stat state whether or not the regression model is significant, but what does it NOT tell us?
- Positive or Negative Relationship
- Extent of change in Dependent Variable based on Independent Variable
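A sketch of that first step, computing the regression F statistic and its p-value from the sums of squares; the toy data and the use of scipy are assumptions for illustration.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ss_reg = np.sum((y_hat - y.mean()) ** 2)
ss_res = np.sum((y - y_hat) ** 2)
df_model, df_resid = 1, len(y) - 2               # one predictor in a simple regression

f_stat = (ss_reg / df_model) / (ss_res / df_resid)
p_value = stats.f.sf(f_stat, df_model, df_resid)
# A significant F only says the model explains variance; it says nothing about
# the direction of the relationship or the size of the change in Y per unit X.
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```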
b1 the SLOPE is what?
UNSTANDARDIZED estimate of the slope coefficient
If the p-value for the t-stat in a regression is less than alpha, you would do what?
REJECT null hypothesis that b=0
The 95% CI for a regression cannot include what if you want to reject the null hypothesis?
ZERO
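A sketch of checking the slope's t-stat, p-value, and 95% CI with statsmodels (the library choice and toy data are assumptions); rejecting H0: b1 = 0 when p < alpha is equivalent to the CI excluding zero.

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

model = sm.OLS(y, sm.add_constant(x)).fit()
print(model.tvalues[1], model.pvalues[1])  # t-stat and p-value for the slope b1
print(model.conf_int()[1])                 # 95% CI for b1; must exclude 0 to reject the null
```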
b1 the Slope tells you the direction of the relationship (+/-), and extent of change in dependent variable based on independent variable, but it does NOT tell us what?
- Proportion of variance in dependent variable explained by independent variable
Coefficient of Determination
Proportion of the variance in the dependent variable explained by the independent variable
R^2 is the Coefficient of Determination (square of Pearson coefficient), and that R^2 value is interpreted how?
R^2 = variance in outcome explained by the exposure
1-R^2 = what?
Proportion of variance in outcome NOT explained by exposure
R^2 is bounded between 0 and 1; what do the values imply?
1 = excellent model fit
0 = no model fit
Adjusted R^2
- Always lower than R^2
- Adjusts for inherent increase in R^2 that occurs every time we add an independent variable to regression equation
- Preferable when comparing models and in multiple linear regression
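A sketch of R^2, its complement, and the adjusted R^2 penalty formula adj R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), where k is the number of predictors; the data and the use of statsmodels are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n, k = len(y), 1

model = sm.OLS(y, sm.add_constant(x)).fit()
r2 = model.rsquared
print(r2)                                      # variance in the outcome explained by the exposure
print(1 - r2)                                  # variance NOT explained
print(1 - (1 - r2) * (n - 1) / (n - k - 1))    # adjusted R^2; matches model.rsquared_adj
```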
Multiple Linear Regression
Examine linear relationships between two or more independent variables and a continuous dependent variable
What does Multiple Linear Regression consider?
- Accounts for Confounders
- Assesses for Moderator effects
Multiple Linear Regression does not look at a trend line but what?
Best Fit PLANE
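A sketch of a best-fit plane with two predictors, Y = b0 + b1X1 + b2X2; the simulated data, the variable roles, and the statsmodels usage are assumptions for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)                    # e.g. the exposure of interest
x2 = rng.normal(size=100)                    # e.g. a confounder being controlled for
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.5, size=100)

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()
print(model.params)   # b0, b1, b2: each slope is interpreted holding the other predictor constant
```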
The goal is to remove each and every possible source of bias; this allows for what?
Estimates that are unbiased, more efficient (i.e. lower variance), and CLOSER to the true population parameter
What type of outcome variable MUST be utilized in a linear regression?
CONTINUOUS
The predictor variables (exposure) do not have to be continuous
Continuous Regression Coefficient
All other control variables besides the one being analyzed are held constant and the 95% confidence interval does NOT include 0
Binary Regression Coefficient
Dummy variables are BINARY and take a value of 1 if a particular criterion is met and zero otherwise
The 95% confidence interval INCLUDES 0
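A sketch of a dummy (binary) predictor alongside a continuous control; the simulated data and the names ("treated", "age") are hypothetical. The dummy's coefficient is the adjusted difference between the two groups, and its 95% CI is checked against zero as above.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
treated = rng.integers(0, 2, size=100)       # dummy: 1 if the criterion is met, 0 otherwise
age = rng.normal(50, 10, size=100)           # continuous control variable
y = 5.0 + 3.0 * treated + 0.1 * age + rng.normal(size=100)

X = sm.add_constant(np.column_stack([treated, age]))
model = sm.OLS(y, X).fit()
print(model.params[1])                       # group difference, holding age constant
print(model.conf_int()[1])                   # does the 95% CI include 0?
```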
Is the Adjusted R^2 preferred or not?
YES, adjusted R^2 adds a penalty for incorporating extra control variables into the model
LOWER value due to the penalties
Adj R^2 < R^2
A large difference between the two values indicates the possibility of superfluous control variables
Manifestation of the Effect of a Moderator Variable
Should be tested; if statistically significant, it remains as an independent variable
How would you check to see if a Moderator Variable is statistically significant?
Two Way Interaction
1. A model without an interaction term forces the effects to be additive
2. If the effect of the exposure on the outcome changes/depends on another factor, the effects are NOT simply additive but ALSO MULTIPLICATIVE
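A sketch of testing a two-way interaction with a product term; the simulated data, the variable names, and the statsmodels formula interface are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "exposure": rng.normal(size=200),
    "moderator": rng.integers(0, 2, size=200),
})
df["outcome"] = (1.0 + 2.0 * df["exposure"] + 0.5 * df["moderator"]
                 + 1.5 * df["exposure"] * df["moderator"]
                 + rng.normal(size=200))

# "exposure * moderator" expands to exposure + moderator + their product term
model = smf.ols("outcome ~ exposure * moderator", data=df).fit()
print(model.pvalues["exposure:moderator"])   # significant -> keep the moderator in the model
```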
b0 and b1 are what type of estimates?
UNSTANDARDIZED estimates of the intercept and slope coefficients
What is SE?
Standard Error of the Coefficients
What is B?
STANDARDIZED Regression Coefficient in SD UNITS
Can you compare between unstandardized and standardized units?
NO
t = (b - b_H0) / SE
For a simple linear regression, the t-statistic is the square root of the ANOVA F statistic
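A sketch of the t formula above (with b_H0 = 0) and the simple-regression identity t^2 = F; the toy data and statsmodels are assumptions.

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

model = sm.OLS(y, sm.add_constant(x)).fit()
b1, se1 = model.params[1], model.bse[1]      # unstandardized b and its standard error

t_by_hand = (b1 - 0.0) / se1                 # t = (b - b_H0) / SE with b_H0 = 0
print(t_by_hand, model.tvalues[1])           # same value
print(t_by_hand ** 2, model.fvalue)          # t^2 equals the ANOVA F in a simple regression
```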
Unstandardized b
Relative predictions of unstandardized regression coefficients CANNOT be compared to each other
Standardized B
Permit direct comparison of regression coefficients to one another (i.e. which independent variable explained more variance in the dependent variable)
What are the Factors that influence appropriateness of Multiple Linear Regression?
- Homoscedasticity
- Multicollinearity
Homoscedasticity of Residuals
Variance is the same at ALL points along the regression line
If homoscedasticity is violated (residuals appear in a pattern along the fitted line), what type of test should be used instead?
Logistic Regression
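A sketch of one way to check homoscedasticity, the Breusch-Pagan test from statsmodels (a diagnostic chosen here for illustration, not named in the cards); the simulated data are assumptions.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

X = sm.add_constant(x)
model = sm.OLS(y, X).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, X)
print(lm_pvalue)   # small p-value = evidence that the residual variance changes along the line
```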
Multicollinearity
Strong correlations between predictor variables (r > 0.90); results in a TYPE II ERROR
If Multicollinearity is present, what test should be used instead?
Stepwise Regression
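A sketch of screening for multicollinearity with the pairwise correlation (the cards' r > 0.90 rule of thumb) plus the variance inflation factor, an extra diagnostic added here for illustration; the simulated data are assumptions.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=200)   # nearly a copy of x1

print(np.corrcoef(x1, x2)[0, 1])             # r > 0.90 -> multicollinearity concern

X = sm.add_constant(np.column_stack([x1, x2]))
for i in (1, 2):                             # skip the constant column
    print(variance_inflation_factor(X, i))   # large VIF flags the same problem
```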