W3 Multiple Linear Regression Flashcards
what is multiple linear regression
the linear relationship between the dependent variable y and 2 or more independent x variables
equation of line in multiple regression
Y = B0 + B1X1 + B2X2 + … + BkXk + E
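As a sketch of how the coefficients are estimated (hypothetical data, numpy least squares — not part of the original cards):

```python
import numpy as np

# Hypothetical data generated exactly from Y = 2 + 3*X1 - 1.5*X2 (no error term)
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y = 2 + 3 * X1 - 1.5 * X2

# Design matrix: a column of ones for B0, then the X variables
X = np.column_stack([np.ones_like(X1), X1, X2])
b0, b1, b2 = np.linalg.lstsq(X, Y, rcond=None)[0]
```

With noise-free data the fitted coefficients recover B0 = 2, B1 = 3, B2 = -1.5.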
what is r^2 the coefficient of multiple determination
tells us how much of the variation in Y is explained by the independent X variables
why is using adjusted r^2 more reliable
regular r^2 never decreases when a new x variable is added, even if the variable adds nothing; adjusted r^2 corrects for degrees of freedom, so it falls when a new variable adds little explanatory power
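A small sketch of the adjustment, using the standard formula adjusted r^2 = 1 - (1 - r^2)(n - 1)/(n - k - 1) and hypothetical numbers:

```python
def adjusted_r2(r2, n, k):
    # n = number of observations, k = number of independent variables
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical: a 3rd variable barely raises r^2, so adjusted r^2 falls
before = adjusted_r2(0.800, n=30, k=2)
after = adjusted_r2(0.801, n=30, k=3)
```

Here r^2 went up but adjusted r^2 went down — a sign the new variable isn't worth keeping.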
what is a residual error
the difference between the actual and predicted values of Y
should residuals be random or not
random
confidence interval for the population slope B
coefficient b +- critical t value * standard error of b
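A minimal sketch with hypothetical regression output (the slope, its standard error, and a t critical value looked up from a table):

```python
# Hypothetical regression output for one slope coefficient
b = 2.45        # estimated slope
se = 0.60       # standard error of the slope
t_crit = 2.048  # 95% critical t for df = 28, from a t table

lower = b - t_crit * se
upper = b + t_crit * se
```

The interval (lower, upper) is the 95% confidence interval for the population slope.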
how to know when to reject null hypothesis
if the F stat > critical F, reject the null
how to test contributions of a single variable
run the regression with all variables
run it again with all variables except the one we're testing
compare the two with a partial F test
why might you want to test contributions of a single variable
the variable may only look significant because it's getting a leg up from variables that already had an effect
what does the coefficient of partial determination tell us
how much of the variance is described by 1 variable when the others are held constant
when are dummy variables used
when data is categorical
what two values are used in place of numerical data as the Xs when the equation includes a dummy variable
1 for present
0 for absent
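A minimal sketch of encoding a hypothetical categorical variable as a 0/1 dummy:

```python
# Hypothetical categorical data: is the customer a member? 1 = present, 0 = absent
categories = ["member", "non-member", "member", "non-member"]
x_dummy = [1 if c == "member" else 0 for c in categories]
```

The dummy column is then used as an ordinary X variable in the regression.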
how to test interaction between independent variables
(ssr(all) - ssr(all except new variables)) / mse(all)
when is logistic regression used
when the Y variable is binary (a dummy variable), e.g.
prefer A or B
voted or didn’t vote
In what industry is logistic regression used
machine learning and AI
what is odds ratio
prob of event of interest / (1 - prob of event of interest)
what is estimated odds ratio
e raised to the estimated log odds: e^(b0 + b1X1 + … + bkXk)
what is estimated probability
estimated odds ratio / (1 + estimated odds ratio)
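The three logistic quantities chain together; a sketch with hypothetical fitted coefficients:

```python
import math

# Hypothetical fitted logistic model: ln(odds) = b0 + b1*X
b0, b1 = -1.5, 0.8
x = 2.0

ln_odds = b0 + b1 * x     # estimated log odds
odds = math.exp(ln_odds)  # estimated odds ratio = e^(estimated log odds)
prob = odds / (1 + odds)  # estimated probability
```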
when testing to see if a non-linear model should be used, how do we pick the model with the best fit
the one with the highest r^2
two types of transformations to turn non-linear models into linear ones
square root
log
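A minimal sketch of both transformations on hypothetical x values; each transformed column would be tried in the regression and the fits compared by r^2:

```python
import math

x = [1.0, 4.0, 9.0, 16.0]
x_sqrt = [math.sqrt(v) for v in x]  # square-root transformation
x_log = [math.log(v) for v in x]    # natural-log transformation
```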
what is the problem with collinearity in regression
you cannot hold one variable constant because of the close relationship between variables
what to do in the case of collinear variables
avoid regression or choose one to include
indications that collinearity has happened
incorrect signs on coefficients
large change in value of previous coefficient when new one is added
variance of model increases when new variable is added
how to lower probability of collinearity
remove unimportant variables
purpose of a partial f test
to see the level of contribution of variables
what is stepwise regression
adding variables 1 by 1
if r^2 goes up, keep the variable; otherwise don't
how to create stepwise regression graphs
insert > scatter >
how to add trend lines and equations of lines on scatter plots
right click on a point > add trend line and pick the type which gives the best r^2
if you make a squared or square-rooted version of a variable, do you still include the original version when selecting data for regression
yes, you do
if there is significant interaction between two variables, how can you make them into one variable to simplify the model
multiply them together and include this new variable in the regression analysis; check whether it improves the model significantly
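A minimal sketch of building the interaction term from two hypothetical variables:

```python
# Hypothetical variables; the interaction term is their elementwise product
x1 = [1.0, 2.0, 3.0]
x2 = [4.0, 5.0, 6.0]
x1_x2 = [a * b for a, b in zip(x1, x2)]  # new combined variable
```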
if you had to construct a 95% confidence interval of the slope, where would you find this
in the coefficients table from regression
how would you compute and interpret the coefficients of partial determination
- perform anova on the data with all independent variables
- perform anova again without the variable whose contribution we want to find
- find the absolute difference between the two regression sums of squares
- get the partial f stat by dividing that difference (step 3) by the MSE from the first anova
- get r^2: divide the step-3 difference by (SSE from the first anova + the step-3 difference)
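The steps above can be sketched with hypothetical ANOVA output (all sums of squares invented for illustration):

```python
# Hypothetical ANOVA output
ssr_all = 120.0      # regression SS, all variables
ssr_reduced = 100.0  # regression SS, without the variable of interest
sse_all = 30.0       # error SS, all variables
mse_all = 1.5        # error MS, all variables

ssr_contrib = ssr_all - ssr_reduced                 # the variable's contribution
partial_f = ssr_contrib / mse_all                   # partial F statistic
r2_partial = ssr_contrib / (sse_all + ssr_contrib)  # coeff. of partial determination
```

Here the one variable explains 40% of the variance left over after the other variables are held constant.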