stats-linear regression Flashcards

1
Q

In a statsmodels regression, what does a high p-value (e.g., > 0.05) for a coefficient suggest?

A

The coefficient (the intercept or a variable's slope) is not significantly different from zero, so the corresponding variable holds little predictive power in this model.

2
Q

In a statsmodels regression, what does a low p-value (e.g., < 0.05) for a coefficient suggest?

A

The coefficient (the intercept or a variable's slope) is significantly different from zero, so the corresponding variable holds predictive power in this model.

3
Q

regression: sum of squares total

A

Sum of ((dependent variable - mean of dependent variable)^2) over all observations. Measures the total variability of the dataset; dividing it by n - 1 gives the sample variance.

4
Q

regression: sum of squares regression

A

Sum of ((predicted value - mean of dependent variable)^2). Measures the variability explained by the regression line.

5
Q

regression: sum of squares error

A

Sum of ((observed value - predicted value)^2). Measures the variability left unexplained by the regression line (the "error", also called the residual sum of squares).

6
Q

regression: relationship among SST, SSE and SSR

A

SST = SSR + SSE. Total variability = explained variability + unexplained variability. (This holds for an OLS fit that includes an intercept.)
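A small sketch that checks this identity numerically for a one-variable OLS fit, using plain Python. The data points are made up for illustration.

```python
# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Closed-form OLS estimates for one predictor plus an intercept.
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean
y_hat = [intercept + slope * xi for xi in x]

sst = sum((yi - y_mean) ** 2 for yi in y)               # total
ssr = sum((fi - y_mean) ** 2 for fi in y_hat)           # explained
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained

print(sst, ssr + sse)  # the two values agree up to rounding error
```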

7
Q

OLS

A

Ordinary least squares. Chooses the coefficients that minimize SSE.
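To illustrate the "minimize SSE" idea, the sketch below fits a one-variable line with the closed-form OLS formulas and then checks that nudging either coefficient makes SSE larger. The data and the helper name `sse_for` are made up for illustration.

```python
# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def sse_for(intercept, slope):
    """Sum of squared errors of the line intercept + slope * x (hypothetical helper)."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Closed-form OLS estimates for one predictor plus an intercept.
ols_slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
             / sum((xi - x_mean) ** 2 for xi in x))
ols_intercept = y_mean - ols_slope * x_mean
best_sse = sse_for(ols_intercept, ols_slope)

# Any nearby line has a larger SSE than the OLS line.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1), (0.1, -0.1)]
worse = [sse_for(ols_intercept + d0, ols_slope + d1) for d0, d1 in nudges]
print(best_sse, min(worse))
```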

8
Q

R-squared

A

SSR/SST: explained variability / total variability. Lies in [0, 1] for OLS with an intercept. A higher R-squared means the model explains more of the variability, although adding variables never lowers it.

9
Q

adjusted R-squared

A

Never greater than R-squared (and it can be negative), because it penalizes the use of additional variables that add little explanatory power.
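A short sketch of the standard adjustment formula, 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of independent variables. The function name and the example numbers are made up for illustration.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared for a model with an intercept (hypothetical helper)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Same R-squared and sample size, but more variables means a bigger penalty.
few_vars = adjusted_r_squared(0.80, n_obs=30, n_predictors=3)    # about 0.777
many_vars = adjusted_r_squared(0.80, n_obs=30, n_predictors=10)  # about 0.695

# With a weak fit and several variables, the adjusted value can go below zero.
weak_fit = adjusted_r_squared(0.05, n_obs=30, n_predictors=5)

print(few_vars, many_vars, weak_fit)
```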

10
Q

when to drop an independent variable

A

When including it lowers adjusted R-squared, lowers the F-statistic, and the p-value for that variable is high.

11
Q

F-statistic

A

Tests the overall significance of the model. The null hypothesis is that all slope coefficients (betas) equal 0. A higher F-statistic means stronger evidence against that null. Prob (F-statistic) is the p-value for the F-statistic and should be close to zero for a good model.
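As a rough sketch, the F-statistic can be computed from the sums of squares as (SSR / p) / (SSE / (n - p - 1)), with n observations and p independent variables. The example below uses made-up data with a single predictor and also shows the equivalent form in terms of R-squared. Turning F into Prob (F-statistic) needs the F distribution, which is left out here; statsmodels reports it as `f_pvalue` on a fitted OLS result.

```python
# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)   # number of observations
p = 1        # number of independent variables

x_mean = sum(x) / n
y_mean = sum(y) / n
slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
         / sum((xi - x_mean) ** 2 for xi in x))
intercept = y_mean - slope * x_mean
y_hat = [intercept + slope * xi for xi in x]

sst = sum((yi - y_mean) ** 2 for yi in y)
ssr = sum((fi - y_mean) ** 2 for fi in y_hat)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

# Explained variability per variable over unexplained variability per
# residual degree of freedom.
f_stat = (ssr / p) / (sse / (n - p - 1))

# Equivalent form written with R-squared.
r_squared = ssr / sst
f_from_r2 = (r_squared / p) / ((1 - r_squared) / (n - p - 1))

print(f_stat, f_from_r2)
```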
