Basic Statistical Concepts Flashcards
Normalization
transforms data so that it more closely follows a normal distribution
Standardization
dividing a value by something to remove its effect
Ex: dividing a count by area or population size
QQ/quantile plot
Visualization to see if data is normally distributed
negative skew = points curve beneath the line
positive skew = points curve above the line
normal = points fall on the line
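A minimal sketch of how the points on a QQ plot can be computed, using only the Python standard library. The plotting position (i + 0.5) / n, the reference line y = mean + sd * x, and the sample data are assumptions made for illustration; plotting tools differ in these details.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Pair each sorted value with the standard-normal quantile for its rank."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i + 0.5) / n), v) for i, v in enumerate(ordered)]

def gaps_from_line(values):
    """Vertical distance of each point from the assumed line y = mean + sd * x."""
    mu, sd = mean(values), stdev(values)
    return [v - (mu + sd * q) for q, v in qq_points(values)]

# Hypothetical positively skewed sample: a few large values stretch the right tail.
right_skewed = [1, 1, 2, 2, 3, 4, 6, 10, 20]
gaps = gaps_from_line(right_skewed)
print([round(g, 2) for g in gaps])
```

For this sample the first and last points sit above the line and the middle point sits below it, which is the upward curve described for positive skew.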
r or coefficient of correlation
measures whether, and how strongly, 2 variables vary together (linear relationship)
Range for correlation coefficient and what is positive/negative/0?
-1 to 1
positive = as one variable goes up, the other goes up
negative = as one goes up, the other goes down
0 = no linear association
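A short sketch of Pearson's r computed by hand, to make the -1 to 1 range concrete. The function name and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))        # both go up: close to 1
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))        # one up, one down: close to -1
print(pearson_r([1, 2, 3, 4, 5], [2, 1, 0, 1, 2]))  # V shape: close to 0
```

The last case shows that r near 0 only rules out a straight-line relationship; the V-shaped data are clearly related, just not linearly.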
Standard deviation
measures how far data values are from the mean
little variation in values means small standard deviation
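A quick illustration with two made-up data sets that share the same mean (50). It assumes the population standard deviation (`statistics.pstdev`); the sample version (`statistics.stdev`) divides by n - 1 instead of n.

```python
from statistics import mean, pstdev

tight = [49, 50, 50, 51]    # values close to the mean
spread = [20, 40, 60, 80]   # values far from the mean

for name, data in [("tight", tight), ("spread", spread)]:
    print(name, "mean =", mean(data), "sd =", round(pstdev(data), 2))
```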
Analysis of variance (ANOVA)
Parametric test to see if there are significant differences between the means of 3+ groups (groups defined by a categorical variable)
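A rough sketch of the F statistic behind a one-way ANOVA: variation between group means divided by variation within groups. The function name and the three groups are hypothetical. Turning F into a p value needs the F distribution, which is not in the Python standard library, so that step is left out.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (lists of numbers)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f_stat = anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 2))  # larger F = group means differ more relative to the noise
```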
Covariance
Measures whether 2 variables vary together or not; dividing it by the product of the two standard deviations gives the correlation coefficient (r)
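A small sketch of how covariance and r relate: r is the covariance divided by the product of the two standard deviations. It assumes the sample versions (dividing by n - 1), and the data are made up.

```python
from statistics import mean, stdev

def sample_covariance(xs, ys):
    """Average product of paired deviations from the means (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

cov = sample_covariance(xs, ys)
r = cov / (stdev(xs) * stdev(ys))  # rescales covariance to the -1 to 1 range
print(round(cov, 3), round(r, 3))
```

Covariance depends on the units of the two variables, which is why r is usually easier to interpret.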
Kernel density (3 facts about it)
- removes statistical noise from data by smoothing it
- Often uses Gaussian weighting (closer points = more weight)
- good for showing generalized densities of points
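A minimal one-dimensional sketch of a kernel density estimate with Gaussian weighting. The function name, the points, and the bandwidth of 0.5 are assumptions for illustration; mapping software typically works in two dimensions and may use a different kernel.

```python
from math import exp, pi, sqrt

def gaussian_kde(points, bandwidth):
    """Return a function that estimates density at x from the given points."""
    def density(x):
        total = 0.0
        for p in points:
            u = (x - p) / bandwidth                    # scaled distance to the point
            total += exp(-0.5 * u * u) / sqrt(2 * pi)  # closer points add more weight
        return total / (len(points) * bandwidth)
    return density

points = [1.0, 1.2, 1.5, 4.0]   # a cluster near 1 plus one isolated point
density = gaussian_kde(points, bandwidth=0.5)

print(round(density(1.2), 3))   # inside the cluster: higher density
print(round(density(3.0), 3))   # between cluster and outlier: lower density
```

A larger bandwidth smooths more and gives a more generalized surface; a smaller one keeps more local detail.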
p value (3 facts)
- doesn’t tell you the size of a difference, only how strong the evidence for one is
- says whether a result is statistically significant (compared to a cutoff such as 0.05)
- used to decide whether or not to reject the null hypothesis
How to use a p value in a sentence to explain random chance and null hypothesis (hint: %)
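- If the null hypothesis is true, there is a ___% chance of seeing results at least this extreme by random chance
- Rejecting the null hypothesis on this evidence carries about a ___% risk of rejecting it falsely when it is actually true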
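One way to see the "random chance" idea is an exact permutation test: shuffle the group labels every possible way and count how often the difference in means is at least as large as the one observed. This is a sketch with made-up data and a hypothetical function name, not the only way to get a p value.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Try every way of relabelling which values belong to group A.
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

p = permutation_p_value([12, 14, 15, 16], [8, 9, 10, 11])
print(f"{p:.1%} of relabellings give a difference at least this large")
```

Here only 2 of the 70 possible relabellings are as extreme as the real grouping, so p is about 0.029, below the common 0.05 cutoff.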
Histogram
x-axis = bin (range of values)
y-axis = frequency in that bin
way to visualize frequency/distribution of data
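A small sketch of the counting behind a histogram of numeric data: each value is assigned to a bin (a range of values) and the bar height is the count in that bin. The function name, the bin width of 2, and the data are assumptions for illustration.

```python
def histogram_counts(values, bin_width, start=0):
    """Map each bin's left edge to the number of values falling in that bin."""
    counts = {}
    for v in values:
        bin_index = int((v - start) // bin_width)
        left_edge = start + bin_index * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))

values = [1, 2, 2, 3, 5, 6, 6, 7, 9]
counts = histogram_counts(values, bin_width=2)

# Text version of the plot: one row per bin, one '#' per value.
for left_edge, count in counts.items():
    print(f"{left_edge}-{left_edge + 2}: {'#' * count}")
```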
Z score meaning
Number of standard deviations away from the mean
Z score formula
(score - mean) / standard deviation
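The formula above as a short sketch. It assumes the population standard deviation (`statistics.pstdev`), and the scores are made up so that the mean is 5 and the standard deviation is 2.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Convert each value to its number of standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]
zs = z_scores(scores)
print(zs[0], zs[-1])  # the score 2 is 1.5 SD below the mean, the score 9 is 2 SD above
```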
Coefficient of determination (r-squared)
High (near 1) = good fit
Low (near 0) = poor fit
Proportion of the variance in y that is explained by x
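A sketch of r-squared for a straight-line fit: fit y = intercept + slope * x by least squares, then compare the leftover (residual) variation with the total variation in y. The function name and data are hypothetical.

```python
from statistics import mean

def r_squared(xs, ys):
    """Coefficient of determination for a simple least-squares line."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(r_squared(xs, ys), 3))  # about 0.6: x explains roughly 60% of the variance in y
```

For a simple straight-line fit like this one, r-squared equals the correlation coefficient r squared.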