Week 9 Flashcards
What is Regression Analysis?
The study of relationships between variables: how a dependent variable depends on one or more independent variables
What is a Causal Relationship?
A change in variable X causes variable Y to change; use randomized experiments or instrumental variables to establish it
What to do with outliers?
If they are clearly not part of the population of interest, delete them
If unsure, run the analysis with and without the outliers and present both
What are correlations?
Measures of the strength and direction of the linear relationship between pairs of variables; always between -1 and 1
What is the limitation of correlations?
They only capture linear relationships; a correlation close to zero may hide a strong non-linear relationship
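To make the limitation concrete, here is a minimal sketch in plain Python. The `pearson` helper and the toy data are made up for illustration. Y depends perfectly on X in both cases, but only the straight-line case gets a correlation near 1; the curved case comes out near zero.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * v + 1 for v in x]   # exact straight line
y_curved = [v ** 2 for v in x]      # exact parabola, symmetric around 0

r_linear = pearson(x, y_linear)     # close to 1
r_curved = pearson(x, y_curved)     # close to 0 despite the perfect relationship
print(r_linear, r_curved)
```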
What is a fitted value?
Predicted value of the dependent variable
What are “e”, Y, and Y hat?
Y = Actual Value
Y hat = Fitted Value
e = Residual (Y - Y hat)
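A small sketch of how Y, Y hat, and e relate, assuming a simple least squares line fitted to made-up data. The `fit_simple` helper and the numbers are hypothetical.

```python
def fit_simple(xs, ys):
    """Least squares intercept a and slope b for Predicted Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]                              # Y: actual values

a, b = fit_simple(x, y)
y_hat = [a + b * xi for xi in x]                 # Y hat: fitted values
e = [yi - fi for yi, fi in zip(y, y_hat)]        # e: residuals

for yi, fi, ei in zip(y, y_hat, e):
    print(f"Y={yi}  Y hat={fi:.2f}  e={ei:+.2f}")
```

With an intercept in the model, the residuals of a least squares fit sum to zero (up to rounding).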
What is the standard error of estimate?
Roughly the standard deviation of the residuals; indicates the typical size of prediction errors, so the smaller the better
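As a rough sketch, the standard error of estimate can be computed from the residuals as the square root of the sum of squared residuals divided by n - k - 1, where k is the number of independent variables. The function name and the residuals below are hypothetical.

```python
from math import sqrt

def standard_error_of_estimate(residuals, num_predictors):
    """sqrt(SSE / (n - k - 1)), with k = number of independent variables."""
    n = len(residuals)
    sse = sum(e ** 2 for e in residuals)
    return sqrt(sse / (n - num_predictors - 1))

# Made-up residuals from a simple regression (k = 1) on 5 observations
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
se = standard_error_of_estimate(residuals, num_predictors=1)
print(round(se, 3))
```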
What is R Square?
Fraction of the variation in the dependent variable that is explained by the regression; ranges from 0 to 1, and the closer to 1, the better the fit
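One way to see "fraction of variation explained" is to compute R Square as 1 minus unexplained variation over total variation. The sketch below uses made-up actual and fitted values; `r_squared` is a hypothetical helper.

```python
def r_squared(actual, fitted):
    """1 - SSE/SST: share of the variation in Y explained by the fitted values."""
    mean_y = sum(actual) / len(actual)
    sse = sum((y - f) ** 2 for y, f in zip(actual, fitted))   # unexplained
    sst = sum((y - mean_y) ** 2 for y in actual)              # total
    return 1 - sse / sst

actual = [2, 4, 5, 4, 5]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]   # assumed to come from a least squares line
r2 = r_squared(actual, fitted)
print(round(r2, 2))
```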
How does the interpretation of the slope coefficient differ between simple regression and multiple regression?
Simple regression: If the independent variable changes by 1 unit, the dependent variable is predicted to change by b units, where b is the slope
Multiple regression: If independent variable X1 changes by 1 unit, EVERYTHING ELSE CONSTANT, the dependent variable is predicted to change by b1
Formula for multiple regression: Predicted Y = a + b1X1 + b2X2 + …
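The "everything else constant" reading can be checked directly on the formula. In this sketch the intercept, coefficients, and X values are invented for illustration; raising X1 by 1 while holding X2 fixed moves the prediction by exactly b1.

```python
def predict(a, coefs, xs):
    """Predicted Y = a + b1*X1 + b2*X2 + ..."""
    return a + sum(b * x for b, x in zip(coefs, xs))

a = 10.0
coefs = [2.0, -0.5]                       # hypothetical b1, b2

base = predict(a, coefs, [4.0, 6.0])      # X1 = 4, X2 = 6
bumped = predict(a, coefs, [5.0, 6.0])    # X1 up by 1, X2 unchanged
change = bumped - base                    # equals b1

print(base, bumped, change)
```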
One flaw of R Square and how to fix it
R Square never decreases when independent variables are added, even useless ones, so use Adjusted R Square, which penalizes extra variables
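A sketch of the usual adjustment, Adjusted R Square = 1 - (1 - R Square)(n - 1)/(n - k - 1), with n observations and k independent variables. The R Square values below are invented to show a weak extra variable nudging R Square up while Adjusted R Square falls.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R Square for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical model: 30 observations, R Square 0.60 with 2 variables,
# rising only to 0.61 after a third variable is added.
before = adjusted_r_squared(0.60, n=30, k=2)
after = adjusted_r_squared(0.61, n=30, k=3)

print(round(before, 3), round(after, 3))  # after is lower than before
```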
How do we know if adding a variable to the regression model was successful?
Check adjusted R Square (should increase)
Check standard error of estimate (should decrease)
If adjusted R Square decreases and/or standard error of estimate increases, this means that the newly added variable is BAD
How do we handle categorical variables (male/female)?
Use a dummy variable coded 0/1 (no/yes)
If more than 2 categories, make a separate dummy variable for each category, but use only (number of categories - 1) of them; the omitted category is the reference
e.g. Q1, Q2, Q3, Q4 -> Q1 1/0, Q2 1/0, Q3 1/0 ONLY
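The quarter example can be sketched as follows. `make_dummies` is a hypothetical helper that treats the last listed category (Q4 here) as the reference, so a Q4 row is all zeros.

```python
def make_dummies(values, categories):
    """One 0/1 column per category except the last, which is the reference."""
    kept = categories[:-1]
    return [{f"is_{c}": int(v == c) for c in kept} for v in values]

quarters = ["Q1", "Q2", "Q3", "Q4", "Q1"]
rows = make_dummies(quarters, ["Q1", "Q2", "Q3", "Q4"])

for q, row in zip(quarters, rows):
    print(q, row)
```

Including a fourth dummy for Q4 alongside an intercept would make the columns perfectly collinear, which is why one category is left out.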
What does a general linear equation look like?
Predicted Y = a + b1X1 + b2X2 + …
Note: An X can be a transformed variable, e.g. Log(X2)
What is a nonlinear transformation?
A dependent or independent variable that is not original data from the dataset but is created by applying a nonlinear function to an original variable
E.g. the log or square of a variable; interaction variables, which are products of two original variables, are created in a similar way
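A minimal sketch of building such derived variables before running a regression. The column names (`price`, `ads`) and values are invented; the new columns are a log transformation and an interaction (product) term.

```python
from math import log

# Made-up dataset with two original variables
rows = [
    {"price": 10.0, "ads": 2.0},
    {"price": 20.0, "ads": 3.0},
    {"price": 40.0, "ads": 1.0},
]

for row in rows:
    row["log_price"] = log(row["price"])            # nonlinear transformation
    row["price_x_ads"] = row["price"] * row["ads"]  # interaction variable

for row in rows:
    print(row)
```

The regression is still linear in its coefficients; only the inputs have been transformed.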