stats-linear regression Flashcards
for statsmodels regression, a high p-value (e.g., > 0.05) suggests?
the coefficient is not significantly different from zero; the variable holds little predictive power
for statsmodels regression, a low p-value (e.g., < 0.05) suggests?
the coefficient is significantly different from zero; the variable holds predictive power
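A minimal sketch of reading these p-values from statsmodels (the data and variable names below are made up for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(size=100)  # y depends on x, so x should be significant

X = sm.add_constant(x)        # prepend the intercept column
results = sm.OLS(y, X).fit()
print(results.pvalues)        # one p-value per coefficient (intercept first)
```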
regression: sum of squares total
sum of ((dependent variable - dependent variable mean)^2). Measures the total variability of the dataset; dividing SST by n - 1 gives the sample variance of the dependent variable.
regression: sum of squares regression
sum of ((predicted value - mean of observed values)^2). Measures the variability explained by the regression line.
regression: sum of squares error
sum of ((observed value - predicted value)^2). Measures the variability left unexplained by the regression line: the "error".
regression: relationship among SST, SSE and SSR
SST = SSR + SSE. Total variability = explained variability + unexplained variability.
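A quick numeric check of the three definitions and the identity, on made-up data. Note that statsmodels' own attribute names invert this deck's convention: its `ssr` is the residual sum of squares (SSE here) and its `ess` is the explained sum of squares (SSR here).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(size=100)
results = sm.OLS(y, sm.add_constant(x)).fit()

y_hat = results.fittedvalues
sst = np.sum((y - y.mean()) ** 2)      # total variability
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variability
sse = np.sum((y - y_hat) ** 2)         # unexplained variability ("error")
print(np.isclose(sst, ssr + sse))      # True: SST = SSR + SSE
```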
OLS
ordinary least squares; estimates the regression coefficients by minimizing SSE
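For intuition, the SSE-minimizing coefficients have a closed form (the normal equations); a numpy-only sketch on made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(size=100)

X = np.column_stack([np.ones_like(x), x])  # intercept column + x
beta = np.linalg.solve(X.T @ X, X.T @ y)   # minimizes sum((y - X @ beta)**2)
print(beta)                                # roughly [2, 3]
```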
R-squared
SSR/SST: variability explained / total variability, in [0, 1]. A higher R-squared means the model explains more of the variability, though R-squared alone can be inflated by adding more variables (see adjusted R-squared).
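A sketch checking SSR/SST against the library's value (statsmodels' `ess` and `centered_tss` correspond to this deck's SSR and SST):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(size=100)
results = sm.OLS(y, sm.add_constant(x)).fit()

r2 = results.ess / results.centered_tss  # explained / total variability
print(np.isclose(r2, results.rsquared))  # True
```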
adjusted R-squared
1 - (1 - R^2)(n - 1)/(n - p - 1), where n = number of observations and p = number of predictors. It is below R-squared whenever the model has predictors, because it penalizes excessive use of variables.
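A sketch of the penalty formula, checked against statsmodels on made-up data that includes one pure-noise predictor:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y = 2.0 + 3.0 * x[:, 0] + rng.normal(size=100)   # second predictor is pure noise
results = sm.OLS(y, sm.add_constant(x)).fit()

n, p = results.nobs, results.df_model            # observations, predictors (no intercept)
adj_r2 = 1 - (1 - results.rsquared) * (n - 1) / (n - p - 1)
print(np.isclose(adj_r2, results.rsquared_adj))  # True
```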
when to drop an independent variable
When including it lowers the adjusted R-squared, when it lowers the F-statistic, or when its p-value is high (e.g., > 0.05).
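To see these signals in one place, a sketch that fits the model with and without a made-up noise variable and compares:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)                       # unrelated to y: a drop candidate
y = 2.0 + 3.0 * x1 + rng.normal(size=100)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
reduced = sm.OLS(y, sm.add_constant(x1)).fit()

print(full.pvalues[-1])                         # x2's p-value: expect it to be high
print(full.rsquared_adj, reduced.rsquared_adj)  # adjusted R-squared with vs. without x2
```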
F-statistics
tests the overall significance of the model, i.e. the null hypothesis that all slope coefficients (betas) are jointly zero. A higher F-statistic is stronger evidence against that null. Prob(F-statistic) is the p-value of the F-test and should be close to zero for a good model.
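The corresponding statsmodels result attributes, on made-up data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(size=100)
results = sm.OLS(y, sm.add_constant(x)).fit()

print(results.fvalue)    # F-statistic for H0: all slope coefficients are zero
print(results.f_pvalue)  # its p-value, shown as Prob(F-statistic) in the summary
```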