stats2 Flashcards

1
Q

skew. Which module to use. What it is. How to interpret.

A

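stats.skew(arr) from scipy.stats. Measures the asymmetry of a distribution, which is often driven by outliers in one tail. Right skew = positive skew = longer right tail, typically mean > median. Left skew = negative skew = longer left tail, typically mean < median. As a rule of thumb, abs(skew) > 1 indicates substantial skewness.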
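
A minimal sketch of the skew check, assuming NumPy and SciPy are installed. The sample `arr` is made up for illustration: mostly small values plus one large outlier in the right tail.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: one large value stretches the right tail.
arr = np.array([1, 2, 2, 3, 3, 3, 4, 20])

skewness = stats.skew(arr)   # default is the biased (population) estimator
mean = arr.mean()
median = np.median(arr)

print(f"skew={skewness:.2f}, mean={mean:.2f}, median={median:.2f}")
```

Here the skew is positive and the mean sits above the median, which matches the right-skew reading on the card.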

2
Q

kurtosis. Which module to use. How to interpret.

A

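stats.kurtosis(arr) from scipy.stats, which returns excess kurtosis by default (a normal distribution scores 0). Indicates whether the distribution is more peaked and heavy-tailed, or flatter, than a normal distribution. As a rule of thumb, kurtosis > 1 indicates a distribution that is too peaked; kurtosis < -1 indicates a distribution that is too flat.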
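
A short sketch, assuming NumPy and SciPy, of what `stats.kurtosis` returns for two simulated samples. The seed and sample sizes are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(size=10_000)
uniform_sample = rng.uniform(size=10_000)

# fisher=True (the default) reports excess kurtosis: a normal sample is near 0.
excess_normal = stats.kurtosis(normal_sample)
# A uniform sample is flatter than normal, so its excess kurtosis is negative.
excess_uniform = stats.kurtosis(uniform_sample)
# fisher=False reports Pearson kurtosis, which is excess kurtosis + 3.
pearson_normal = stats.kurtosis(normal_sample, fisher=False)

print(excess_normal, excess_uniform, pearson_normal)
```

The thresholds of 1 and -1 on the card apply to the excess (default) value, not to the Pearson value.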

3
Q

variance formula and meaning

A

sum((y_i - mean)^2) / n for a population; divide by n - 1 instead for a sample. Measures how dispersed the dataset is around its mean, in squared units.
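
The formula can be checked against NumPy with a small made-up array. This is only a sketch; `y` is hypothetical data, and `ddof` is the NumPy parameter that switches between dividing by n and by n - 1.

```python
import numpy as np

# Hypothetical data with mean 5.
y = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Formula from the card: mean of squared deviations (divide by n).
manual_var = np.sum((y - y.mean()) ** 2) / len(y)

pop_var = np.var(y)             # ddof=0 by default: divide by n
sample_var = np.var(y, ddof=1)  # divide by n - 1

print(manual_var, pop_var, sample_var)
```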

4
Q

coefficient of variation formula. why use it (compared to std)?

A

std / mean. Used to compare the relative dispersion of two or more datasets, even when their units or scales differ. It is unitless, whereas std carries the unit of the data.
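
A small sketch of why the ratio is handy, assuming NumPy. The helper `coefficient_of_variation` and both arrays are hypothetical names and data introduced for illustration.

```python
import numpy as np

def coefficient_of_variation(x):
    """Population std divided by the mean (unitless)."""
    x = np.asarray(x, dtype=float)
    return np.std(x) / np.mean(x)

# Hypothetical measurements in different units.
heights_cm = np.array([160.0, 170.0, 180.0])
weights_kg = np.array([50.0, 70.0, 90.0])

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
# Changing the unit (cm -> m) rescales std and mean alike, so the ratio is unchanged.
cv_heights_m = coefficient_of_variation(heights_cm / 100.0)

print(cv_heights, cv_weights, cv_heights_m)
```

The two std values (in cm and kg) cannot be compared directly, but the two ratios can: in this made-up data the weights vary more relative to their mean.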

5
Q

covariance formula. how to interpret?

A

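sum((x_i - x_mean) * (y_i - y_mean)) / n (use n - 1 for a sample). Variables tend to move together if > 0 and in opposite directions if < 0. A value of 0 means no linear relationship, which does not by itself imply independence. The magnitude depends on the units of both variables, so it can only be interpreted alongside the std of the two arrays, or by looking at the correlation coefficient instead.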
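
A sketch of the formula with NumPy, using made-up arrays. The second pair (`x_sym`, `y_sq`) is a hypothetical example where one variable is fully determined by the other, yet the covariance is still 0 because the relationship is not linear.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Formula from the card: mean product of deviations (divide by n).
cov_manual = np.mean((x - x.mean()) * (y - y.mean()))
# np.cov divides by n - 1 unless ddof=0 is passed.
cov_numpy = np.cov(x, y, ddof=0)[0, 1]

# y_sq depends entirely on x_sym, but not linearly.
x_sym = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_sq = x_sym ** 2
cov_dependent = np.mean((x_sym - x_sym.mean()) * (y_sq - y_sq.mean()))

print(cov_manual, cov_numpy, cov_dependent)
```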

6
Q

correlation coefficient formula. how to interpret?

A

covariance / (std_x * std_y). Always in [-1, 1]. 1 means a perfect positive linear relationship: one variable perfectly explains the other. -1 means a perfect negative linear relationship. 0 means no linear relationship; the variables can still be dependent in a nonlinear way.
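
A sketch, assuming NumPy, that builds the coefficient from the formula and compares it with `np.corrcoef`. The arrays are made up for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Covariance and std must use the same divisor (n here) for the ratio to be correct.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
r_manual = cov_xy / (np.std(x) * np.std(y))

r_numpy = np.corrcoef(x, y)[0, 1]
# Rescaling a variable changes the covariance but not the correlation.
r_rescaled = np.corrcoef(x * 100.0, y)[0, 1]

print(r_manual, r_numpy, r_rescaled)
```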

7
Q

how to interpret logistic regression variable coefficient?

A

Everything else held constant, a k-unit increase in the independent variable multiplies the odds of the dependent variable by e^(k * coef_). E.g., if e^(k * coef_) = 5, the odds become 5 times as large.
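
The arithmetic can be sketched with the standard library alone. The coefficient value and the helper `odds_multiplier` are hypothetical, chosen so that a 2-unit increase gives a factor of about 5.

```python
import math

def odds_multiplier(coef, k=1.0):
    """Factor by which the odds change for a k-unit increase in the variable."""
    return math.exp(k * coef)

# Hypothetical fitted coefficient (change in log-odds per unit).
coef = 0.8047

factor_one_unit = odds_multiplier(coef)        # 1-unit increase
factor_two_units = odds_multiplier(coef, k=2)  # 2-unit increase
factor_negative = odds_multiplier(-coef)       # negative coefficient shrinks the odds

print(factor_one_unit, factor_two_units, factor_negative)
```

A coefficient of 0 gives a factor of 1 (no change in odds), and a negative coefficient gives a factor below 1.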

8
Q

acceptable range for pseudo r-squared for logistic regression

A

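0.2 to 0.4, a common rule of thumb for McFadden's pseudo R-squared (the one reported in the statsmodels Logit summary).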

9
Q

for logistic regression, LLnull and log-likelihood relation?

A

The model's log-likelihood should be much greater (closer to zero) than LL-Null. LL-Null is the log-likelihood of a model with no independent variables, only an intercept (a useless model).
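
One way to put a number on the gap between the two values is McFadden's pseudo R-squared, 1 - LL / LL-Null. The sketch below uses made-up log-likelihoods rather than a fitted model; `llf` and `llnull` mirror the attribute names statsmodels uses on its Logit results, but no model is fitted here.

```python
# Hypothetical values, as they might appear in a logistic regression summary.
llf = -80.0      # log-likelihood of the fitted model
llnull = -120.0  # log-likelihood of the intercept-only model

# McFadden's pseudo R-squared: 0 when the model is no better than the null,
# approaching 1 as the fitted log-likelihood approaches 0.
pseudo_r2 = 1.0 - llf / llnull

print(f"pseudo R-squared = {pseudo_r2:.3f}")
```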

10
Q

for logistic regression, which summary tells the significance of model?

A

LLR p-value. It should be close to zero. It is the p-value of the log-likelihood ratio test, which measures whether our model is statistically different from the LL-Null model.
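
A sketch of how such a p-value can be computed by hand, assuming SciPy. The log-likelihoods and the number of predictors (`df_model`) are made-up values; the statistic 2 * (LL - LL-Null) is compared against a chi-squared distribution with one degree of freedom per predictor.

```python
from scipy.stats import chi2

# Hypothetical values for illustration only.
llf = -80.0      # log-likelihood of the fitted model
llnull = -120.0  # log-likelihood of the intercept-only model
df_model = 3     # number of independent variables in the fitted model

# Likelihood ratio statistic: larger means the model improves more on the null.
llr_stat = 2.0 * (llf - llnull)
# Survival function = probability of a statistic at least this large under the null.
llr_pvalue = chi2.sf(llr_stat, df_model)

print(llr_stat, llr_pvalue)
```

A p-value near zero, as with these numbers, suggests the predictors jointly improve on the intercept-only model.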
