stats2 Flashcards
skew. Which module to use. What it is. How to interpret.
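scipy.stats.skew(arr). Measures the asymmetry of the distribution: whether one tail is longer than the other, e.g. because of outliers on one side. Right skew = positive skew, usually mean > median. Left skew = negative skew, usually mean < median. abs(skew) > 1 indicates substantial skewness.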
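A quick check of the rule on a small hypothetical sample with one large value. It assumes SciPy and NumPy are installed and uses `scipy.stats.skew` with its default `bias=True`.

```python
import numpy as np
from scipy import stats

# Hypothetical data: mostly small values plus one large outlier (long right tail)
arr = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 20])

skewness = stats.skew(arr)  # default bias=True (population moment estimator)
print(f"skew = {skewness:.2f}")
print(f"mean = {arr.mean():.2f}, median = {np.median(arr):.2f}")

if abs(skewness) > 1:
    direction = "right (positive)" if skewness > 0 else "left (negative)"
    print(f"substantial {direction} skew")
```

The outlier pulls the mean (4.7) above the median (3.0), which matches the positive skew.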
kurtosis. Which module to use. How to interpret.
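scipy.stats.kurtosis(arr). By default (fisher=True) it returns excess kurtosis, so a normal distribution scores 0. Measures whether the distribution is too peaked / heavy-tailed or too flat compared with a normal distribution. kurtosis > 1 indicates the distribution is too peaked. kurtosis < -1 indicates a distribution that is too flat.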
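A sketch comparing three simulated samples, assuming NumPy and SciPy. The choice of a t distribution with 3 degrees of freedom (heavy tails) and a uniform distribution (flat) is only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = {
    "normal": rng.normal(size=10_000),
    "t(3)": rng.standard_t(df=3, size=10_000),  # heavy tails -> high kurtosis
    "uniform": rng.uniform(size=10_000),        # no tails -> flat
}

# fisher=True (the default) returns excess kurtosis: a normal distribution scores about 0
results = {name: stats.kurtosis(s) for name, s in samples.items()}
for name, k in results.items():
    print(f"{name:8s} excess kurtosis = {k:6.2f}")
```

The uniform sample lands near -1.2 (too flat by the rule above), and the t(3) sample lands well above 1.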
variance formula and meaning
sum((y_i - mean)^2) / n. Measures how dispersed the dataset is around its mean. This is the population variance; the sample variance divides by n - 1 instead.
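The formula written out by hand and checked against NumPy. The data values are hypothetical. Note that `np.var` divides by n by default (`ddof=0`).

```python
import numpy as np

y = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

n = len(y)
mean = y.sum() / n
population_var = ((y - mean) ** 2).sum() / n    # the formula on this card
sample_var = ((y - mean) ** 2).sum() / (n - 1)  # unbiased sample estimate

print(population_var, np.var(y))         # both 4.0 (np.var defaults to ddof=0)
print(sample_var, np.var(y, ddof=1))     # both 32 / 7
```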
coefficient of variation formula. why use it (compared to std)?
std / mean. Used to compare the dispersion of two or more datasets, even when they have different units or very different means. It is unitless, whereas std has the same unit as the data.
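A small illustration with hypothetical height and weight data, assuming NumPy. `coefficient_of_variation` is a helper defined here for the example (population std, `ddof=0`).

```python
import numpy as np

def coefficient_of_variation(x):
    """std / mean (population std, ddof=0)."""
    return x.std() / x.mean()

# Hypothetical measurements of five people
heights_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weights_kg = np.array([50.0, 60.0, 70.0, 80.0, 90.0])

# Changing units changes std but not the CV
cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_cm / 100)

# Same std (about 14.14) for both, but weights vary more relative to their mean
cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(f"CV height = {cv_height:.3f}, CV weight = {cv_weight:.3f}")
```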
covariance formula. how to interpret?
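sum((x_i - x_mean) * (y_i - y_mean)) / n. Variables move together if > 0. Variables move in opposite directions if < 0. Variables have no linear relationship if = 0 (uncorrelated, but not necessarily independent). abs(covariance) depends on the scale of the data, so it can only be interpreted along with the std of the two arrays; alternatively, look at the correlation coefficient.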
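A sketch of the formula next to `np.cov`, assuming NumPy. `bias=True` makes `np.cov` divide by n like the formula above (its default divides by n - 1). The second pair is a hypothetical case where covariance is 0 even though y depends entirely on x.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

cov_manual = ((x - x.mean()) * (y - y.mean())).sum() / len(x)
cov_numpy = np.cov(x, y, bias=True)[0, 1]  # bias=True -> divide by n
print(cov_manual, cov_numpy)  # both 1.2: x and y tend to move together

# Zero covariance does not imply independence: y2 is a function of x2
x2 = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y2 = x2 ** 2
cov_nonlinear = ((x2 - x2.mean()) * (y2 - y2.mean())).sum() / len(x2)
print(cov_nonlinear)  # 0.0
```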
correlation coefficient formula. how to interpret?
covariance / (std_x * std_y). Always in [-1, 1]. 1 means a perfect positive linear relationship (one variable perfectly explains the other). -1 means a perfect negative linear relationship. 0 means no linear relationship; the variables may still be dependent in a non-linear way.
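The formula computed by hand and checked against `np.corrcoef`, assuming NumPy. `pearson` is a helper defined for this example. Covariance and both stds use the same divisor n, so the n cancels and the result matches NumPy.

```python
import numpy as np

def pearson(a, b):
    """covariance / (std_a * std_b), all with divisor n."""
    cov = ((a - a.mean()) * (b - b.mean())).mean()
    return cov / (a.std() * b.std())

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

r_manual = pearson(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]
print(f"{r_manual:.4f} {r_numpy:.4f}")  # about 0.77: fairly strong positive

# Exact linear relationships give the extremes
print(pearson(x, 3 * x + 1))  # 1.0
print(pearson(x, -2 * x))     # -1.0
```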
how to interpret logistic regression variable coefficient?
Everything else held constant, a k-unit increase in the independent variable multiplies the odds of the dependent variable by e^(k * coef_). E.g., if e^(k * coef_) = 5, we say the odds increase 5 times.
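A numeric check of the rule on a hypothetical one-variable model with log-odds = b0 + b1 * x. The coefficients are made up for illustration. The odds ratio for a k-unit increase comes out as e^(k * b1), whatever starting value of x you pick.

```python
import math

# Hypothetical fitted model: log-odds = b0 + b1 * x
b0, b1 = -1.0, 0.4

def odds(x):
    p = 1 / (1 + math.exp(-(b0 + b1 * x)))  # predicted probability
    return p / (1 - p)

k = 2
ratio = odds(3 + k) / odds(3)
print(f"odds ratio = {ratio:.4f}, e^(k*coef) = {math.exp(k * b1):.4f}")  # both about 2.2255
```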
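acceptable range for pseudo R-squared for logistic regression
0.2 to 0.4. This is a rule of thumb for McFadden's pseudo R-squared (the "Pseudo R-squ." that statsmodels reports); values in this range indicate a good fit.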
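McFadden's pseudo R-squared is 1 - LL / LLnull. A sketch with hypothetical log-likelihood values, such as the "Log-Likelihood" and "LL-Null" rows of a statsmodels Logit summary.

```python
# Hypothetical values read off a fitted model's summary
ll_model = -250.0  # log-likelihood of the fitted model
ll_null = -380.0   # log-likelihood of the intercept-only model

pseudo_r2 = 1 - ll_model / ll_null
print(f"McFadden pseudo R-squared = {pseudo_r2:.3f}")  # about 0.342, inside 0.2 to 0.4
```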
for logistic regression, how are LLnull and the log-likelihood related?
The log-likelihood should be much bigger than LLnull (closer to 0, since both are negative). LLnull is the log-likelihood of the dependent variable without any independent variable, i.e. an intercept-only model (a useless model).
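A sketch computing both quantities by hand for a binary outcome. The outcomes and the model's predicted probabilities are hypothetical. The null model predicts the base rate mean(y) for every observation.

```python
import math

y = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]
# Hypothetical predicted probabilities from some fitted logistic model
p_model = [0.1, 0.3, 0.8, 0.2, 0.7, 0.9, 0.6, 0.4, 0.8, 0.7]

def log_likelihood(y, p):
    """Bernoulli log-likelihood: sum of y*log(p) + (1 - y)*log(1 - p)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))

p_bar = sum(y) / len(y)                        # base rate, 0.6
ll_null = log_likelihood(y, [p_bar] * len(y))  # intercept-only model
ll_model = log_likelihood(y, p_model)
print(f"LLnull = {ll_null:.3f}, LL = {ll_model:.3f}")  # about -6.730 and -2.972
```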
for logistic regression, which summary statistic tells whether the model is significant?
LLR p-value. It should be close to zero. LLR is the log-likelihood ratio statistic, 2 * (log-likelihood - LLnull). Its p-value tests whether our model is statistically different from the LLnull (intercept-only) model.
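A sketch of the likelihood ratio test, assuming SciPy. The log-likelihoods and the number of predictors are hypothetical. Under the null hypothesis, LLR follows a chi-squared distribution with degrees of freedom equal to the number of predictors (excluding the intercept).

```python
from scipy import stats

ll_model = -250.0  # hypothetical log-likelihood of the fitted model
ll_null = -380.0   # hypothetical LLnull
df_model = 3       # hypothetical number of predictors, excluding the intercept

llr = 2 * (ll_model - ll_null)          # 260.0
p_value = stats.chi2.sf(llr, df_model)  # upper-tail probability
print(f"LLR = {llr}, p-value = {p_value:.3g}")  # p-value is essentially 0
```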