Exam 2 Flashcards
t distribution
- we use a sample standard deviation s instead of population standard deviation
- used when you don’t know population sd
- similar to normal but with “thicker” tails
- shape depends on sample size: df = n-1
- the bigger the sample size, the closer to normal
95% Confidence interval
x̄ - t_crit (s/√n) < μ < x̄ + t_crit (s/√n)
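The interval above can be sketched in a few lines of Python. This is only an illustration: the sample values are made up, and the critical value 2.262 is the two-tailed 95% entry for df = 9 read from a t table rather than computed.

```python
import math
import statistics

# Hypothetical sample of n = 10 measurements (illustrative data only).
sample = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9, 12.4, 12.2]

n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)   # sample standard deviation (n - 1 denominator)
df = n - 1

# Two-tailed critical value for 95% confidence with df = 9, taken from a t table.
t_crit = 2.262

# Margin of error: t_crit * (s / sqrt(n))
margin = t_crit * s / math.sqrt(n)
ci_low, ci_high = mean - margin, mean + margin

print(f"mean = {mean:.3f}, s = {s:.3f}, df = {df}")
print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

A different sample size needs a different t_crit, since the critical value depends on df = n - 1.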
conditions/assumptions of t test
- random sample
- observations should be independent of each other
- population should be approximately normal, or n > 25
steps to hypothesis testing
- hypothesize: state null and alternative
- prepare: set sig level at 0.05, select test, check assumptions, find critical value
- compare: compute the test statistic and compare it to the critical value, or examine the 95% CI
- interpret
p-value
the probability of observing a t-statistic at least as extreme as the one you calculated, assuming the null hypothesis is true
one sample t test
compares 1 sample mean to a comparison value of interest
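As a rough sketch of the one-sample case, the test statistic is t = (x̄ - μ0) / (s/√n) with df = n - 1. The data, the comparison value mu0, and the table value 2.365 (two-tailed, α = 0.05, df = 7) are assumptions chosen for illustration.

```python
import math
import statistics

sample = [5.4, 4.9, 5.8, 5.1, 5.6, 5.3, 5.7, 5.2]   # hypothetical data
mu0 = 5.0                                           # hypothetical comparison value

n = len(sample)
df = n - 1
standard_error = statistics.stdev(sample) / math.sqrt(n)

# t = (sample mean - comparison value) / standard error
t_stat = (statistics.mean(sample) - mu0) / standard_error

# Two-tailed critical value for alpha = 0.05 and df = 7, taken from a t table.
t_crit = 2.365
reject_null = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, df = {df}, reject null: {reject_null}")
```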
independent samples t-test
- compares samples from two different groups to see if the means are significantly different
paired (or dependent-samples) t-test
- compares one sample to another related sample to see if there are differences within pairs
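One way to see the paired test is as a one-sample t test on the within-pair differences, compared against 0. The sketch below assumes made-up before/after measurements on the same six subjects and uses the table value 2.571 (two-tailed, α = 0.05, df = 5).

```python
import math
import statistics

# Hypothetical paired measurements: same six subjects, before and after.
before = [140, 152, 138, 147, 160, 155]
after = [135, 150, 136, 140, 158, 149]

# Work with the within-pair differences only.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
df = n - 1   # df is based on the number of pairs

# One-sample t statistic on the differences, null value 0.
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

t_crit = 2.571   # two-tailed, alpha = 0.05, df = 5, taken from a t table
reject_null = abs(t_stat) > t_crit

print(f"mean difference = {statistics.mean(diffs):.2f}, t = {t_stat:.3f}, df = {df}")
```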
one-tailed test
directional
two-tailed test
non-directional
Chi-Squared Goodness of Fit
- tests whether the observed distribution of one categorical variable matches an expected distribution
test of independence
- tests whether there is an association between two categorical variables
conditions/assumptions of chi-squared
- random sample, independent observations
- sample size: every expected count should be at least 1
- at least 80% of expected counts should be at least 5
chi-square
χ² = Σ ((observed - expected)² / expected)
expected count (for chi squared)
(row total x column total)/grand total
df for chi-square of independence
df = (rows-1)(columns-1)
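The expected-count, chi-square, and df formulas for the test of independence fit together as in this small sketch. The 2x2 table of observed counts is invented for illustration, and 3.841 is the table critical value for α = 0.05 with df = 1.

```python
# Hypothetical 2x2 table of observed counts (rows = groups, columns = outcomes).
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# expected count = (row total x column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# chi-square = sum of (observed - expected)^2 / expected over every cell
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# df = (rows - 1)(columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

chi2_crit = 3.841   # alpha = 0.05, df = 1, taken from a chi-square table
reject_null = chi2 > chi2_crit

print(f"expected = {expected}")
print(f"chi-square = {chi2:.3f}, df = {df}, reject null: {reject_null}")
```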
df for chi-squared goodness of fit
df = #groups - 1
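A goodness-of-fit calculation follows the same pattern with a single row of counts. This sketch assumes five categories that are expected to be equally likely; the observed counts are made up, and 9.488 is the table critical value for α = 0.05 with df = 4.

```python
# Hypothetical observed counts in 5 categories.
observed = [18, 22, 20, 25, 15]

n = sum(observed)
k = len(observed)

# Assumption for this example: all categories are equally likely under the null.
expected = [n / k] * k

# chi-square = sum of (observed - expected)^2 / expected
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = number of groups - 1
df = k - 1

chi2_crit = 9.488   # alpha = 0.05, df = 4, taken from a chi-square table
reject_null = chi2 > chi2_crit

print(f"chi-square = {chi2:.2f}, df = {df}, reject null: {reject_null}")
```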
odds ratio
- odds of X happening in the presence of Y divided by the odds of X happening without Y
- (p1/(1-p1))/(p2/(1-p2))
relative risk
- probability of X happening in the presence of Y divided by the probability of X happening without Y
- (a/(a+b))/(c/(c+d))
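Both ratios can be computed from one 2x2 table. The cell labels here are an assumption consistent with the formulas above: a and b are the counts with and without X when Y is present, c and d are the counts with and without X when Y is absent. The counts themselves are invented.

```python
# Hypothetical 2x2 counts:
#                 X happens   X does not happen
# Y present          a               b
# Y absent           c               d
a, b, c, d = 20, 80, 10, 90

p1 = a / (a + b)   # probability of X when Y is present
p2 = c / (c + d)   # probability of X when Y is absent

relative_risk = p1 / p2
odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))

print(f"RR = {relative_risk:.3f}, OR = {odds_ratio:.3f}")
```

The odds ratio from these probabilities is algebraically the same as (a x d)/(b x c).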
when the disease is rare
OR ≈ RR
when the disease is common
OR >> RR
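A quick numeric check illustrates the rare versus common contrast. The probabilities below are invented; in both scenarios the relative risk is 2, and only the baseline frequency of the disease changes. The helper name rr_and_or is hypothetical.

```python
def rr_and_or(p1, p2):
    """Return (relative risk, odds ratio) for two outcome probabilities."""
    rr = p1 / p2
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
    return rr, odds_ratio

# Rare disease: probabilities of 0.2% versus 0.1%.
rare_rr, rare_or = rr_and_or(0.002, 0.001)

# Common disease: probabilities of 60% versus 30%.
common_rr, common_or = rr_and_or(0.6, 0.3)

print(f"rare:   RR = {rare_rr:.3f}, OR = {rare_or:.3f}")
print(f"common: RR = {common_rr:.3f}, OR = {common_or:.3f}")
```

With the rare disease the two ratios nearly coincide, while with the common disease the odds ratio is noticeably larger than the relative risk.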
options for violation of assumptions for OR or RR
- ignore the violation
- transform the data to avoid violations: apply a function y' = f(y); negatively skewed? try power transforms
- use a nonparametric method
- use a permutation test
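To make the permutation option concrete, here is a minimal sketch of an exact two-sided permutation test on the difference in group means. The two small groups are made-up data, and the difference in means is just one possible choice of test statistic. With 4 + 4 observations there are only 70 ways to split the pooled data, so every split can be enumerated.

```python
from itertools import combinations
from statistics import mean

# Hypothetical measurements from two independent groups.
group_a = [12.0, 15.5, 14.2, 16.8]
group_b = [9.1, 10.4, 11.7, 8.9]

pooled = group_a + group_b
n_a = len(group_a)
observed_diff = abs(mean(group_a) - mean(group_b))

total = 0
count_extreme = 0
# Reassign group labels in every possible way and recompute the statistic.
for idx in combinations(range(len(pooled)), n_a):
    chosen = set(idx)
    a = [pooled[i] for i in chosen]
    b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    total += 1
    # Small tolerance guards against floating-point rounding in the comparison.
    if abs(mean(a) - mean(b)) >= observed_diff - 1e-12:
        count_extreme += 1

# p-value: share of relabelings at least as extreme as the observed split.
p_value = count_extreme / total
print(f"{count_extreme} of {total} splits are as extreme, p = {p_value:.4f}")
```

For larger samples, full enumeration becomes impractical and a random subset of relabelings is usually drawn instead.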
detecting violations
- when sample sizes are very small (n < 15), violations are very difficult to detect
- in that case, use a nonparametric test
Wilcoxon-Mann-Whitney U test
- non-parametric alternative to independent samples t-test
- compares distributions of two groups
- hypothesize, rank all data (both groups pooled) from least to greatest, then sum the ranks of group 1 to get R1
U statistic
U1 = n1n2 + (n1(n1+1))/2 - R1
U2 = n1n2 - U1
U = Max (U1, U2)
- if U > Ucrit, reject the null hypothesis
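The ranking and U formulas can be walked through with a short sketch. The two groups are invented, and the ranking shortcut assumes there are no tied values (ties would need averaged ranks). The critical value is not computed here; it would come from a U table for the given n1 and n2.

```python
# Hypothetical data for two independent groups (no tied values).
group1 = [3.1, 4.5, 2.8, 5.0]
group2 = [6.2, 5.9, 7.1, 4.8, 6.6]

n1, n2 = len(group1), len(group2)

# Rank all observations together from least to greatest (rank 1 = smallest).
pooled = sorted(group1 + group2)
rank = {value: i + 1 for i, value in enumerate(pooled)}   # valid only without ties

# R1 = sum of the ranks that belong to group 1
R1 = sum(rank[value] for value in group1)

U1 = n1 * n2 + n1 * (n1 + 1) / 2 - R1
U2 = n1 * n2 - U1
U = max(U1, U2)

print(f"R1 = {R1}, U1 = {U1}, U2 = {U2}, U = {U}")
# Compare U with the critical value from a table for n1 = 4, n2 = 5.
```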