Inferential Statistics Flashcards
Parametric assumptions
Normality
Homogeneity of Variance
Independence of observations (observations are not related unless the design is repeated measures)
Normality
sampling distribution of the mean
DV like working memory
take a group and get a sample mean
each time you take a sample you get a distribution with its own mean.
if you did this thousands of times, the distribution of all those sample means would be approximately normal
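A minimal simulation sketch of the idea above, using only the Python standard library. The skewed "population" of scores, the sample size of 30, and the number of draws are made-up assumptions for illustration.

```python
import random
import statistics

def sample_means(population, n, draws, seed=0):
    """Draw `draws` random samples of size n and return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]

# Hypothetical, strongly skewed population of scores (exponential, not normal).
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

means = sample_means(population, n=30, draws=5_000)
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# The sample means centre on the population mean,
# and their spread is close to SD / sqrt(n).
print(round(statistics.fmean(means), 3), round(pop_mean, 3))
print(round(statistics.stdev(means), 3), round(pop_sd / 30 ** 0.5, 3))
```

Even though the individual scores are skewed, a histogram of `means` would look close to a bell curve.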
test of normality
Kolmogorov-Smirnov test and Shapiro-Wilk test
some statisticians say these tests are a waste of time, but you should still look at your data to make sure there are no major departures from normality.
(skewness or kurtosis)
or just plot your data. Histogram or QQ plot
Histogram
frequency of scores on Y, individual scores on X
you can see the most frequently occurring score.
QQ plot
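a quantile is a cut point in the data (a percentile is one type of quantile): how much of the data fall at or below this value.
shows how much data are in the different regions compared with a normal distribution (you want the points to fall on a straight line; circles are data points; about 16% of a normal distribution is in each tail beyond 1 SD, with the remainder in the middle).
same information as the histogram, but here you check whether the points are on the line or not.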
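A rough sketch of how the points on a normal QQ plot can be computed by hand. The scores are made up, and the plotting position (i - 0.5) / n is one common convention among several, so treat it as an assumption.

```python
from statistics import NormalDist, fmean, pstdev

def qq_points(scores):
    """Return (theoretical z quantile, observed z score) pairs for a normal QQ plot."""
    ordered = sorted(scores)
    n = len(ordered)
    mean, sd = fmean(ordered), pstdev(ordered)
    std_normal = NormalDist()  # mean 0, SD 1
    points = []
    for i, score in enumerate(ordered, start=1):
        # (i - 0.5) / n keeps the probability strictly between 0 and 1.
        theoretical = std_normal.inv_cdf((i - 0.5) / n)
        points.append((theoretical, (score - mean) / sd))
    return points

scores = [4, 5, 5, 6, 6, 6, 7, 7, 8, 15]  # made-up scores with one high outlier
points = qq_points(scores)
for theoretical, observed in points:
    print(f"{theoretical:6.2f} {observed:6.2f}")
```

If the data were normal, the two columns would match closely (a straight line). Here the last point sits well above the line, which is the positive skew showing up.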
Homogeneity of Variance
homoscedasticity
do the different groups have the same variance
most tests are robust to violations as long as the group sizes are equal (check with Levene's test).
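A sketch of the statistic behind Levene's test (mean-centred version), assuming made-up groups. It returns only the W statistic. A p value would need an F distribution with (k - 1, N - k) degrees of freedom, which the standard library does not provide.

```python
from statistics import fmean

def levene_w(*groups):
    """Levene's W: a one-way ANOVA F computed on absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Z values: how far each score is from its own group mean.
    z = [[abs(x - fmean(g)) for x in g] for g in groups]
    z_means = [fmean(zg) for zg in z]
    z_grand = fmean([v for zg in z for v in zg])
    between = sum(len(zg) * (m - z_grand) ** 2 for zg, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zg, m in zip(z, z_means) for v in zg)
    return ((n_total - k) / (k - 1)) * between / within

young = [10, 11, 12, 13, 14]  # made-up scores, small spread
old = [4, 8, 12, 16, 20]      # same mean, much larger spread
w_unequal = levene_w(young, old)
w_equal = levene_w([1, 2, 3], [4, 5, 6])  # identical spread in both groups
print(round(w_unequal, 2), round(w_equal, 2))
```

A larger W means the groups differ more in spread. Groups with identical spread give W = 0.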
arcsine and rau
arcsine - transforms proportions; the result is in radians
RAU (rationalized arcsine units) - more math, but the result is back on a scale close to percent correct
could they have done worse than 0% if the scale extended that far? could someone at 100% have performed better than someone else at 100%?
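A sketch of both transforms. The RAU formula below is the rationalized arcsine transform as commonly attributed to Studebaker (1985); that attribution and the constants 146/pi and 23 are stated from memory and are not in the notes, so check them against the original source before use.

```python
import math

def arcsine_transform(proportion):
    """Classic arcsine square-root transform; result is in radians (0 to pi)."""
    return 2 * math.asin(math.sqrt(proportion))

def rau(correct, total):
    """Rationalized arcsine units for `correct` items out of `total`."""
    theta = (math.asin(math.sqrt(correct / (total + 1)))
             + math.asin(math.sqrt((correct + 1) / (total + 1))))
    return (146 / math.pi) * theta - 23

for correct in (0, 25, 50):
    print(correct, round(rau(correct, 50), 1))
```

Mid-range scores stay close to percent correct, while scores at floor and ceiling are stretched below 0 and above 100, which is the point of the two questions above.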
positive skew
tail is dragged out in positive direction
welch correction
for unequal variances in two-sample t-tests
default for t.test in R
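A minimal sketch of the Welch statistic and its adjusted degrees of freedom (Welch-Satterthwaite approximation), with made-up groups. It stops at t and df; looking up the p value needs a t table or a statistics package.

```python
from statistics import fmean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t test."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (fmean(a) - fmean(b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

group_a = [10, 11, 12, 13, 14]  # made-up scores, variance 2.5
group_b = [2, 6, 10, 14, 18]    # made-up scores, variance 40
t_welch, df_welch = welch_t(group_a, group_b)
print(round(t_welch, 3), round(df_welch, 2))
```

With unequal variances the df drops below the pooled value of n1 + n2 - 2 (8 here), which makes the test more conservative.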
non-parametric
doesn’t assume normality
logistic regression
powerful
using all data, not means
independent observations
two SNRs: if it is an independent-samples t test, each person can contribute to only one condition
if the same person did both, you can run a repeated-measures analysis
when one person performs two different conditions, the scores are related
can you predict SNR 1 from SNR 2? yes
group comparisons - t test
why use t test?
z is the standard normal (Gaussian) curve (mean 0, SD 1)
t is not the standard Gaussian: it has slightly heavier tails. we use it because it is better for small sample sizes, when we don't actually know the true population mean and SD
(with humans, we typically don't know the underlying distribution)
t test compares signal to noise
the effect divided by the random variation
single sample
one group vs baseline
paired samples t test
compares two matched (paired) sets of observations (a single-sample test on the paired differences)
independent samples
compares two separate groups of observations (pooled SD)
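The three t tests can be sketched as the same "effect divided by random variation" ratio with different denominators. The scores are made up, and the functions return only the t statistic (df is noted in each docstring).

```python
from statistics import fmean, stdev, variance

def one_sample_t(scores, baseline):
    """t = (mean - baseline) / (SD / sqrt(n)), df = n - 1."""
    n = len(scores)
    return (fmean(scores) - baseline) / (stdev(scores) / n ** 0.5)

def paired_t(first, second):
    """Paired test: a one-sample test of the differences against 0."""
    diffs = [x - y for x, y in zip(first, second)]
    return one_sample_t(diffs, 0)

def independent_t(a, b):
    """Student's t with a pooled variance, df = n_a + n_b - 2."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

snr_low = [52, 55, 60, 58, 50]   # made-up percent-correct scores
snr_high = [60, 61, 70, 65, 59]

t_paired = paired_t(snr_low, snr_high)       # same five people in both conditions
t_indep = independent_t(snr_low, snr_high)   # ten different people
print(round(t_paired, 2), round(t_indep, 2))
```

With the same numbers, the paired t is much larger in magnitude because person-to-person variability is removed from the denominator.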
increase effect size
low standard deviation
high difference between the means
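A small sketch using Cohen's d (mean difference divided by pooled SD) as the effect size. Cohen's d is an assumption here, since the card does not name a specific measure, and the scores are made up.

```python
from statistics import fmean, variance

def cohens_d(a, b):
    """Standardized mean difference using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / pooled_var ** 0.5

def shrink_spread(scores, factor):
    """Pull every score toward the group mean, keeping the mean unchanged."""
    m = fmean(scores)
    return [m + (x - m) * factor for x in scores]

high = [60, 61, 70, 65, 59]  # made-up scores
low = [52, 55, 60, 58, 50]

d_base = cohens_d(high, low)
d_bigger_gap = cohens_d([x + 4 for x in high], low)                    # larger mean difference
d_less_noise = cohens_d(shrink_spread(high, 0.5), shrink_spread(low, 0.5))  # half the SD
print(round(d_base, 2), round(d_bigger_gap, 2), round(d_less_noise, 2))
```

Both routes raise d: moving the means apart, or halving the standard deviation (which doubles d).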
error bars
confidence intervals (95%) - bars can overlap by up to about a quarter of their overall length and still show a statistical difference
SEM - need a gap between the bars of at least about half the overall length of an error bar to show an effect.
that's why we use *** to mark significance
confidence intervals
mean +/- the critical t value times the standard error
the effect is significant if the interval does not include 0
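A sketch of a 95% confidence interval for a mean. The critical t value is passed in by hand from a t table (2.776 for df = 4, two-tailed .05) because the standard library has no t distribution. The differences are made up.

```python
from statistics import fmean, stdev

def confidence_interval(scores, t_crit):
    """mean +/- t_crit * SEM, where t_crit is the table value for df = n - 1."""
    n = len(scores)
    mean = fmean(scores)
    margin = t_crit * stdev(scores) / n ** 0.5
    return mean - margin, mean + margin

diffs = [8, 6, 10, 7, 9]   # made-up paired differences
T_CRIT_DF4 = 2.776         # two-tailed .05 critical t for df = 4 (from a t table)
ci_low, ci_high = confidence_interval(diffs, T_CRIT_DF4)
print(round(ci_low, 2), round(ci_high, 2))
```

Both ends of this interval are above 0, so the mean difference would be called significant at the .05 level.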
ANOVA
analysis of variance, parametric statistic
mean difference relative to variability.
difference in the amount of variability between groups vs within groups
main effects - the effect of each IV on its own; interactions - how the IVs combine with one another
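A sketch of the one-way ANOVA F ratio (between-group variability divided by within-group variability), assuming made-up scores at three speech rates. It returns F only; the p value comes from an F table with (k - 1, N - k) degrees of freedom.

```python
from statistics import fmean

def one_way_f(*groups):
    """F = mean square between groups / mean square within groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

fast = [4, 5, 6]     # made-up scores for each level of the speech-rate factor
medium = [6, 7, 8]
slow = [8, 9, 10]
f_ratio = one_way_f(fast, medium, slow)
print(f_ratio)
```

F grows when the group means move apart or when scores within each group become less variable.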
ANOVA factors
One way
two way
three way
ANOVA Levels
each factor can have 2+ levels (age factor - young and old are the levels)
2x2x3
three factors (three IV)
young and old
college grad vs not
speech rate (fast med slow)
types of assignment
random (true exp)
non-random (quasi-experimental and non-experimental)
- intact groups, matched group design
p value
probability of obtaining your observed result (or a more extreme result) if it came from the distribution of your null hypothesis.
one tail or two
only use one tail if obtaining a result in the other direction is impossible or uninterpretable.
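One way to make "observed result or more extreme, under the null" concrete is an exact permutation test. This is a swapped-in technique, not the t test from the other cards, and the scores are made up. Under the null hypothesis the group labels are arbitrary, so every way of splitting the pooled scores is equally likely.

```python
from itertools import combinations
from statistics import fmean

def permutation_p(a, b):
    """Return (one-tailed p, two-tailed p) for the difference in means, a minus b."""
    pooled = a + b
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = fmean(a) - fmean(b)
    count = upper = two_sided = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / n_a - (total - sum_a) / n_b
        count += 1
        # Small tolerance so splits that tie the observed value are counted.
        upper += diff >= observed - 1e-12
        two_sided += abs(diff) >= abs(observed) - 1e-12
    return upper / count, two_sided / count

high = [60, 61, 70, 65, 59]  # made-up scores
low = [52, 55, 60, 58, 50]
p_one, p_two = permutation_p(high, low)
print(round(p_one, 4), round(p_two, 4))
```

The one-tailed p counts only splits at least as extreme in the predicted direction. The two-tailed p counts both directions, so with equal group sizes it is twice as large.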
significance
significance is not importance
a result is not more or less significant, just like you can't say you passed the exam more than Stacey did. that idea (how much) is effect size.
type I error vs type II
Type I (false positive): the first error was believing him (there was no wolf)
Type II (false negative): the second error was not believing him (there was a wolf)
pearson correlation (r)
relationship between two interval/ratio measures
ranges from -1 to 1
spearman rank-order correlation
strength and direction of association between two ranked (ordinal) variables
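A sketch of both correlations written out by hand, with made-up data. Spearman's rho is computed here as Pearson's r on the ranks, with tied values given the average of their ranks.

```python
from statistics import fmean

def pearson_r(x, y):
    """Pearson correlation: covariation divided by the product of the spreads."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """Rank from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # rises with x, but not in a straight line
r_value = pearson_r(x, y)
rho_value = spearman_rho(x, y)
print(round(r_value, 3), round(rho_value, 3))
```

Because y always increases with x, the rank-order correlation is perfect, while Pearson's r is slightly lower since the relationship is curved.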
chi-square and contingency coefficient
association between nominal variables
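A sketch of the chi-square statistic and the contingency coefficient C = sqrt(chi2 / (chi2 + N)) for a table of counts. The 2x2 counts are made up, and the p value (from a chi-square table) is left out.

```python
def chi_square(table):
    """Return (chi-square statistic, total N) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def contingency_coefficient(table):
    """Strength of association between two nominal variables."""
    chi2, n = chi_square(table)
    return (chi2 / (chi2 + n)) ** 0.5

# Made-up counts: rows are two groups, columns are pass / fail.
counts = [[30, 10],
          [10, 30]]
chi2_value, n_total = chi_square(counts)
c_value = contingency_coefficient(counts)
print(chi2_value, n_total, round(c_value, 3))
```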
regression
predictive value of association
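A minimal least-squares sketch of simple regression with one predictor. The variable names and scores are hypothetical, borrowed from the working memory example elsewhere in these cards.

```python
from statistics import fmean

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = fmean(x), fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return slope, intercept

memory_span = [2, 3, 4, 5, 6]   # made-up predictor (IV)
word_id = [5, 7, 8, 11, 14]     # made-up outcome (DV)

slope, intercept = fit_line(memory_span, word_id)
predicted = intercept + slope * 7  # predicted DV for a new span of 7
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

Correlation says how strongly two measures go together. Regression turns that association into a prediction rule.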
multiple regression
strongest combination of IVs that predicts a DV
what to include in a paper?
pearson
- r, pvalue
- some include df, descriptive stats for each variable, and r^2
type of analysis conducted (e.g., two-tailed Pearson) and the threshold for significance.
Working memory span and word identification were significantly positively correlated (r = 0.41, p < .05).
There was a significant positive correlation between working memory span (M = 3.34, SD = .10) and word identification (M = 9.43, SD = 1.03), r(23) = .41, p = .042.
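A quick consistency check for a reported correlation, using the r and df from the example sentences. It converts r to a t statistic with t = r * sqrt(df / (1 - r^2)), where df = n - 2, and compares it with the two-tailed .05 critical value for df = 23 (2.069, taken from a t table).

```python
def t_from_r(r, df):
    """Convert a Pearson r with df = n - 2 into the equivalent t statistic."""
    return r * (df / (1 - r ** 2)) ** 0.5

r, df = 0.41, 23          # values from the example report
T_CRIT_DF23 = 2.069       # two-tailed .05 critical t for df = 23 (from a t table)

t_value = t_from_r(r, df)
r_squared = r ** 2        # proportion of variance shared by the two measures
n = df + 2                # number of participants implied by the df

print(round(t_value, 2), round(r_squared, 3), n)
print(t_value > T_CRIT_DF23)
```

The t value is just over the critical value, so a correlation of this size with 25 participants is significant at the .05 level, but only narrowly.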