6+7: Data analysis I & II: descriptive & inferential statistics Flashcards
6(7) basic statistics applied to levels of measurement to measure 2 properties
- nominal scales
- central tendency –> mode
- ordinal scales
- central tendency –> median
- dispersion –> range (but interquartile range in presence of extreme values!)
- interval & ratio scales
- central tendency –> mean
- dispersion –> variance & standard deviation
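A small sketch of which statistic fits which scale, using Python's standard `statistics` module. The data are hypothetical. The ordinal sample contains one extreme value (10) to show why the interquartile range is preferred over the range in that case. `statistics.quantiles` needs Python 3.8+.

```python
import statistics

# Hypothetical toy data, one sample per level of measurement
eye_colors = ["brown", "blue", "brown", "green", "brown"]   # nominal
satisfaction = [4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 10]           # ordinal codes (1-10 scale), one extreme value
reaction_ms = [310, 290, 335, 300, 315]                    # ratio scale (milliseconds)

# Nominal: categories can only be counted -> mode
mode_color = statistics.mode(eye_colors)

# Ordinal: values can be ranked -> median; range vs. interquartile range
median_sat = statistics.median(satisfaction)
range_sat = max(satisfaction) - min(satisfaction)    # inflated by the extreme value 10
q1, _, q3 = statistics.quantiles(satisfaction, n=4)   # quartile cut points
iqr_sat = q3 - q1                                    # barely affected by the extreme value

# Interval/ratio: distances are meaningful -> mean, variance, standard deviation
mean_rt = statistics.mean(reaction_ms)
var_rt = statistics.variance(reaction_ms)            # sample variance (n - 1 in the denominator)
sd_rt = statistics.stdev(reaction_ms)

print(mode_color, median_sat, range_sat, iqr_sat, mean_rt, var_rt, round(sd_rt, 2))
```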
statistical difference
- def=
- it is like…
- a diff could…
- A statistical difference is a function of the difference between means relative to the variability.
- Like a signal-to-noise ratio.
- a diff could be due to chance, esp. if variability is high
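The signal-to-noise idea can be sketched as follows. Both pairs of hypothetical groups have the same mean difference (2), but the noisier pair gives a much smaller ratio. The function name `signal_to_noise` is made up for illustration. What it computes is essentially the Welch t-statistic: mean difference divided by its standard error.

```python
import statistics

def signal_to_noise(group_a, group_b):
    """Difference between group means (signal) divided by the standard
    error of that difference (noise), using separate sample variances."""
    signal = statistics.mean(group_a) - statistics.mean(group_b)
    noise = (statistics.variance(group_a) / len(group_a)
             + statistics.variance(group_b) / len(group_b)) ** 0.5
    return signal / noise

# Hypothetical data: both pairs have mean 12 vs. mean 10
low_var_a, low_var_b = [11, 12, 12, 13], [9, 10, 10, 11]
high_var_a, high_var_b = [4, 10, 14, 20], [2, 8, 12, 18]

low_ratio = signal_to_noise(low_var_a, low_var_b)     # same difference, little noise
high_ratio = signal_to_noise(high_var_a, high_var_b)  # same difference, lots of noise
print(round(low_ratio, 2), round(high_ratio, 2))
```

With high variability, the same difference of 2 could easily be due to chance.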
t-test def=
+ 3 associated values
The t-test assesses whether the means of two groups are statistically different from each other
+ 3 associated values
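- t-value = ( avg_T - avg_C ) / SE( avg_T - avg_C ), where T = treatment group, C = control group, SE = standard error
- p-value = probability of obtaining a t-value at least as extreme as the observed one if only randomness is at work (i.e., if the null H is true)
- alpha level = significance level, often 0.05, used as threshold: reject the null H if p < alpha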
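The three values can be computed by hand and then compared against SciPy. This sketch assumes SciPy is installed and uses hypothetical scores. It is the classic pooled-variance (Student) t-test, which assumes equal variances in both groups. Welch's version drops that assumption.

```python
import statistics
from scipy import stats

# Hypothetical scores for a treatment (T) and a control (C) group
treatment = [23, 25, 28, 30, 26, 27, 29, 24]
control = [20, 22, 25, 21, 23, 24, 22, 21]
alpha = 0.05  # significance level used as decision threshold

# t-value: difference of means divided by the standard error of that difference
n_t, n_c = len(treatment), len(control)
pooled_var = ((n_t - 1) * statistics.variance(treatment)
              + (n_c - 1) * statistics.variance(control)) / (n_t + n_c - 2)
se_diff = (pooled_var * (1 / n_t + 1 / n_c)) ** 0.5
t_value = (statistics.mean(treatment) - statistics.mean(control)) / se_diff

# p-value: two-sided tail probability of the t distribution with n_t + n_c - 2 df
df = n_t + n_c - 2
p_value = 2 * stats.t.sf(abs(t_value), df)

# Cross-check with SciPy's independent-samples t-test (equal variances by default)
result = stats.ttest_ind(treatment, control)

reject_null = bool(p_value < alpha)
print(round(t_value, 3), round(p_value, 4), reject_null)
```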
null VS alternative hypothesis
null H is conservative (typically: no difference or no effect); alternative H represents a change compared to the current state of knowledge
Type I error
& alpha level
in hypothesis testing
= falsely rejecting a true null H
w probability alpha (= the alpha level, commonly set at 5%)
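A quick simulation makes the link between alpha and the Type I error rate visible. Many hypothetical experiments are run in which the null H is true, because both groups come from the same population. The share of wrongly rejected null Hs should land close to alpha. To stay stdlib-only, this sketch assumes a known sigma = 1 and uses a two-sample z-test instead of a t-test. The principle is the same.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, n=30, n_sims=4000, seed=1):
    """Fraction of simulated experiments (null H true) in which the null H
    is rejected at level alpha, using a z-test with known sigma = 1."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = (1 / n + 1 / n) ** 0.5  # standard error of the mean difference when sigma = 1
    rejections = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n)]  # both groups drawn from N(0, 1)
        b = [rng.gauss(0, 1) for _ in range(n)]
        z = (fmean(a) - fmean(b)) / se
        p = 2 * std_normal.cdf(-abs(z))          # two-sided p-value
        if p < alpha:
            rejections += 1                      # Type I error: a true null H rejected
    return rejections / n_sims

print(type_i_error_rate())  # should be close to 0.05
```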
- test statistic
- used for
- properties
- examples
- a statistic that reflects the ratio of systematic to unsystematic variation
- used for statistical tests
- it has a known distribution (under the null H), so that you can calculate the p-value
- t, F statistics
3 important test types and when to use them
numerical dep. variable VS numerical indep. variable => regression analysis
numerical dep. variable VS categorical indep. variable
=> t-test (2 groups) or ANOVA (more than 2 groups)
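The rule of thumb on this card can be written as a tiny decision function. `choose_test` is a hypothetical helper, and the string labels are arbitrary. It only covers the cases on the card, where the dependent variable is numerical.

```python
def choose_test(dep_type: str, indep_type: str, n_groups: int = 2) -> str:
    """Hypothetical helper: pick a test from the variable types on the card."""
    if dep_type != "numerical":
        raise ValueError("these tests assume a numerical dependent variable")
    if indep_type == "numerical":
        return "regression analysis"
    if indep_type == "categorical":
        if n_groups < 2:
            raise ValueError("need at least 2 groups to compare")
        return "t-test" if n_groups == 2 else "ANOVA"
    raise ValueError(f"unknown independent variable type: {indep_type!r}")

print(choose_test("numerical", "categorical", n_groups=3))
```

ANOVA also works with exactly 2 groups. In that case the one-way ANOVA F value equals the square of the pooled t-value, so both tests reach the same conclusion.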
ANOVA meaning
ANalysis Of VAriance
regression analysis w a linear model:
- model
- method
- = linear equation + probabilistic error term, e.g. y = β0 + β1·x + ε
- => minimize the distance between empirical measurements & predictions; typically, the sum of squared distances is minimized (least squares method)
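Here is a minimal sketch of simple linear regression fitted by least squares, with one predictor and the closed-form solution. The data (hours studied vs. exam score) are hypothetical. A useful property to check is that the residuals of a least-squares fit with an intercept sum to zero.

```python
from statistics import fmean

def least_squares_fit(x, y):
    """Fit y = b0 + b1 * x + error by ordinary least squares, i.e. choose
    b0, b1 that minimize the sum of squared residuals."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical data: hours studied (x) vs. exam score (y)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
b0, b1 = least_squares_fit(hours, score)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(hours, score)]
print(round(b0, 2), round(b1, 2))
```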
attention point w large datasets
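a large sample size tends to make every parameter statistically significant, even when its effect is tiny; to check for that, it is necessary to also consider the R squared value (share of variance explained by the model).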
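The following sketch simulates this attention point with 100,000 hypothetical observations and a very small true slope (0.03) buried in noise. The slope comes out highly significant, yet R squared stays tiny. To stay stdlib-only, the p-value uses the normal approximation to the t distribution. That approximation is an assumption of this sketch and is only adequate for large samples.

```python
import random
from statistics import NormalDist, fmean

def slope_p_and_r2(x, y):
    """OLS slope, its approximate two-sided p-value, and R squared."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    se_b1 = (sse / (n - 2) / sxx) ** 0.5
    t = b1 / se_b1
    p = 2 * NormalDist().cdf(-abs(t))  # normal approximation, large n only
    r2 = 1 - sse / syy                 # share of variance explained
    return b1, p, r2

rng = random.Random(42)
n = 100_000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.03 * xi + rng.gauss(0, 1) for xi in x]  # tiny true effect, lots of noise
b1, p, r2 = slope_p_and_r2(x, y)
print(f"slope={b1:.3f}  p={p:.2g}  R^2={r2:.4f}")
```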
standardizing variables =
what for?
subtract mean, then divide by stdev
to make them comparable (no units, mean 0, stdev 1)!
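A minimal z-score sketch follows. Heights in cm and weights in kg become directly comparable once each is standardized. The sample standard deviation is used here. Some texts divide by the population standard deviation instead, which changes the values slightly. The data are hypothetical.

```python
from statistics import fmean, stdev

def standardize(values):
    """z-scores: subtract the mean, then divide by the (sample) standard deviation."""
    m, s = fmean(values), stdev(values)
    return [(v - m) / s for v in values]

heights_cm = [160, 170, 180]
weights_kg = [55, 70, 85]
print(standardize(heights_cm), standardize(weights_kg))  # both on the same unit-free scale
```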
ANOVA compares 2 elements:
+ name of independent variables in ANOVA
- variation between the groups
- variation within the groups
+ independent variables in ANOVA are called factors
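The two elements ANOVA compares can be made concrete with a one-way ANOVA, which has a single factor. Between-group and within-group variation, each divided by its degrees of freedom, give the F statistic: systematic over unsystematic variation. This sketch assumes SciPy is installed, but only for the p-value and for a cross-check against `scipy.stats.f_oneway`. The function name `one_way_anova` and the data are hypothetical.

```python
from statistics import fmean
from scipy import stats

def one_way_anova(*groups):
    """Return (F, p) for a one-way ANOVA; each group is one level of the factor."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    group_means = [fmean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # variation between the groups: group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # variation within the groups: observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    f_value = (ss_between / (k - 1)) / (ss_within / (n - k))
    p_value = stats.f.sf(f_value, k - 1, n - k)  # upper tail of the F distribution
    return f_value, p_value

# Hypothetical data: one factor (e.g. fertilizer type) with 3 levels
a, b, c = [20, 22, 19, 21], [25, 27, 26, 24], [30, 29, 31, 32]
f_value, p_value = one_way_anova(a, b, c)
check = stats.f_oneway(a, b, c)
print(round(f_value, 2), p_value)
```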