after midterm Flashcards
Normal distribution
- shows probability density
- takes 2 parameters: the mean and the standard deviation (or variance)
- common in nature and shows up a lot in sampling
- symmetrical
- about 2/3 of random draws are within one standard deviation of the mean
- about 95% of random draws are within 1.96 (~2) standard deviations of the mean
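A quick way to check the "2/3" and "95%" rules of thumb is to ask Python's standard library for the area under the curve. This is only a sketch; the helper name prob_within is made up for illustration.

```python
from statistics import NormalDist

def prob_within(k, mean=0.0, sd=1.0):
    """Probability that a normal draw lands within k standard deviations of the mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

within_one = prob_within(1)       # roughly 0.68, i.e. about 2/3
within_1_96 = prob_within(1.96)   # roughly 0.95
```

The answer does not depend on which mean and standard deviation are passed in, which is why the rule works for any normal distribution.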
standard normal distribution
- mean is zero
- standard deviation is one
- its table gives the probability of getting a random draw from a standard normal distribution greater than a given value
How to convert any other normal to standard normal
Z = (Y - mean) / standard deviation
Y = the value we are interested in
Z tells us how many standard deviations from the mean Y is
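As a worked example of the conversion, the sketch below standardizes one value and looks up the probability of a standard normal draw greater than Z. The numbers (Y = 130, mean 100, standard deviation 15) are invented for illustration.

```python
from statistics import NormalDist

def z_score(y, mean, sd):
    """Number of standard deviations that y lies from the mean."""
    return (y - mean) / sd

z = z_score(130, mean=100, sd=15)        # (130 - 100) / 15
p_greater = 1 - NormalDist().cdf(z)      # area of the standard normal above z
```

Here z comes out as 2, so the value is two standard deviations above the mean, and p_greater is a little over 2%.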
When are sample means normally distributed
if the variable itself is normally distributed
standard error of an estimate of a mean
the standard deviation of the distribution of sample means
Central limit theorem
The sum or mean of a large number of measurements randomly sampled from any population is approximately normally distributed, even if the variable itself doesn’t have a normal distribution
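A small simulation can make the central limit theorem and the standard error concrete. The sketch below assumes an exponential population (mean 1, standard deviation 1, strongly skewed); the sample size, number of repeats, and seed are arbitrary choices.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed so the sketch is repeatable
n = 50                   # measurements per sample (arbitrary)
reps = 2000              # number of samples drawn (arbitrary)

# Draw many samples from the skewed population and keep each sample's mean.
sample_means = [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

mean_of_means = mean(sample_means)   # should sit near the population mean of 1
sd_of_means = stdev(sample_means)    # standard deviation of the sample means
predicted_se = 1.0 / n ** 0.5        # population sd divided by sqrt(n)
```

The standard deviation of the simulated sample means should land close to predicted_se, and a histogram of sample_means would look roughly bell-shaped even though the population is not.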
compares a proportion for a single categorical variable to some hypothesized value of that proportion
Binomial test
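A minimal sketch of an exact binomial test, using only the standard library. The two-sided P-value here adds up every outcome that is no more likely than the observed one, which is one common convention (others double the smaller tail). The counts, 14 successes out of 18 with a hypothesized proportion of 0.5, are made up.

```python
from math import comb

def binomial_test(successes, n, p0):
    """Two-sided exact P-value: total probability of outcomes no more likely than the observed one."""
    def pmf(k):
        return comb(n, k) * p0 ** k * (1 - p0) ** (n - k)

    observed = pmf(successes)
    # The small relative tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-7))

p_value = binomial_test(14, 18, 0.5)   # about 0.031
```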
About a single categorical variable with more than 2 possible values, comparing observed frequencies to some distribution we hypothesize
chi-square goodness-of-fit test
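The test statistic for a goodness-of-fit test is the sum of (observed - expected)^2 / expected over the categories. The sketch below computes only the statistic and the degrees of freedom; the P-value would then come from a chi-square table or a statistics package. The counts and the equal expected proportions are invented.

```python
def chi_square_statistic(observed, expected_proportions):
    """Sum over categories of (observed - expected)^2 / expected."""
    total = sum(observed)
    expected = [total * p for p in expected_proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [10, 20, 30]                                   # hypothetical counts
chi2 = chi_square_statistic(observed, [1/3, 1/3, 1/3])    # expected count is 20 per category
df = len(observed) - 1                                    # categories minus 1, with no parameters estimated from the data
```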
Numerical data comparing a mean of a group to some hypothesized value of that mean
one-sample t-test
comparing two numerical groups with meaningful pairing, comparing the mean of the paired differences to some hypothesized value of that mean
paired t-test
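The one-sample and paired t-tests share the same arithmetic: t = (sample mean - hypothesized mean) / (s / sqrt(n)), with the paired version applied to the differences. A sketch with invented data follows; the function names are illustrative only, and the P-value would still need a t table with n - 1 degrees of freedom.

```python
from statistics import mean, stdev

def one_sample_t(values, mu0):
    """t statistic comparing the sample mean to the hypothesized mean mu0."""
    n = len(values)
    standard_error = stdev(values) / n ** 0.5
    return (mean(values) - mu0) / standard_error

def paired_t(first, second, mu0=0.0):
    """Paired t statistic: a one-sample t on the within-pair differences."""
    differences = [b - a for a, b in zip(first, second)]
    return one_sample_t(differences, mu0)

t_one = one_sample_t([5.1, 4.9, 5.6, 5.8, 6.0], mu0=5.0)    # df = 4
t_paired = paired_t([10, 12, 9, 11], [12, 14, 10, 15])      # df = 3
```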
compare two numerical groups to ask if they have the same mean
two-sample t-test
comparing the means of multiple groups (a numerical response variable and one categorical explanatory variable)
single factor ANOVA
asking if there is an association between two numerical variables and if so how strong is it
correlation - calculate r, the correlation coefficient
Can we predict Y from X, assuming linearity of the relationship?
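For reference, r can be computed by hand as the sum of products of deviations divided by the square root of the product of the two sums of squares. The data in this sketch are made up.

```python
from math import sqrt

def pearson_r(x, y):
    """Correlation coefficient r between two equal-length lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])   # about 0.77
```

r always falls between -1 and 1; values near 0 mean a weak linear association.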
linear regression
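A least-squares line can be sketched in a few lines: the slope is the sum of products of deviations divided by the sum of squared deviations of X, and the intercept makes the line pass through the two means. The data and the name fit_line are invented for illustration.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
predicted = intercept + slope * 6   # predicted Y when X = 6
```

With these numbers the slope is 0.6 and the intercept is 2.2. Predicting at X = 6 goes slightly beyond the range of the data, so it should be treated with caution.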
whether two categorical variables are independent or associated (i.e. comparing two or more groups to ask if they have the same proportion of some response variable), with a large amount of data
chi-square contingency analysis
whether two categorical variables are independent or associated (i.e. comparing two or more groups to ask if they have the same proportion of some response variable), with a small amount of data
Fisher's exact test
compare two numerical groups to ask if they have the same mean, allowing for variances to be different
Welch's t-test
Does a sample come from a population with a normal distribution?
Shapiro-Wilk test
whether data from a single sample have a particular median, when normality cannot be assumed
sign test, not very powerful (non-parametric)
do two groups have the same distribution? not assuming normality
Mann-Whitney U test
do multiple groups have the same distribution? not assuming normality
Kruskal-Wallis test
after you have rejected the null hypothesis of ANOVA, which pairs of groups have different means?
Tukey-Kramer test
two categorical explanatory variables with one numerical response variable. Asking: does the first affect the mean, does the second affect the mean, and is there an interaction between the two categorical explanatory variables?
two-factor ANOVA
two explanatory variables (one categorical, one numerical) and one response variable. Asking: does the categorical variable influence the response, does the numerical variable affect the response, and is there an interaction between the two explanatory variables in their effect on the response?
ANCOVA
relationship between two numerical variables without assumption of linearity
Spearman’s correlation
Trying to fit a parabola (quadratic function) to the relationship between numerical variables
Quadratic regression
comparing two groups to see if they have the same variance for a numerical variable
Levene's test
use a numerical variable to predict a binary response variable
Logistic regression
a method to test for an association between variables (hypothesis-testing approach), using the data we already have
permutation
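One way to picture a permutation test: shuffle the group labels many times and count how often the shuffled difference in means is at least as large as the observed one. The sketch below assumes two groups and a difference-in-means statistic; the data, the number of permutations, and the seed are arbitrary.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Approximate two-sided P-value for a difference in means, by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)   # random reassignment of values to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

p_value = permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
```

Because the two made-up groups do not overlap at all, very few shuffles match the observed difference and the P-value comes out small.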
a method that uses resampling on a computer to get a confidence interval for an estimate, using the data we already have
bootstrapping
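A bare-bones bootstrap for the mean could look like the sketch below: resample the data with replacement, recompute the mean each time, and take the middle 95% of those means as the interval (the percentile method, one of several ways to build a bootstrap interval). The data, resample count, and seed are invented.

```python
import random
from statistics import mean

def bootstrap_ci(data, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample has the same size as the data and is drawn with replacement.
    estimates = sorted(mean(rng.choices(data, k=n)) for _ in range(n_resamples))
    tail = (1 - confidence) / 2
    lower = estimates[int(tail * n_resamples)]
    upper = estimates[int((1 - tail) * n_resamples) - 1]
    return lower, upper

data = [2, 4, 4, 5, 7, 9, 10, 12]   # hypothetical measurements
lower, upper = bootstrap_ci(data)
```

The same function could be pointed at a median or any other estimate by swapping out mean.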