stats Flashcards
assumption of independence
a core assumption across all inferential tests is that the observations in your sample are independent of each other
measurements for each sample subject are in no way influenced by or related to the measurements of other subjects
pseudoreplication (false independence) occurs when this assumption isn’t met
example: you culture bacteria in triplicate to calculate growth rate
- calculate the growth rate of each flask, then take the mean across flasks (the flasks, not the individual readings, are the independent units), as sketched below
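A minimal sketch in Python, using hypothetical per-flask growth rates; the point is that n = 3 (one value per flask), not the number of raw readings:

```python
import numpy as np

# one growth rate per flask (hypothetical values, per hour) --
# the flasks are the independent units, so n = 3
flask_rates = np.array([0.42, 0.45, 0.40])

mean_rate = flask_rates.mean()
sem = flask_rates.std(ddof=1) / np.sqrt(len(flask_rates))
print(f"growth rate = {mean_rate:.3f} +/- {sem:.3f} per hour (n = 3)")
```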
p value
the probability under the assumption of no effect or no difference (null hypothesis), of obtaining a result equal to or more extreme than what was actually observed
by convention, a difference is called significant when p < 0.05 (5%)
effect size
effect size: the degree to which the treatment shifted the observations
when n is small, even a real effect may not reach significance
when n is larger, the same effect becomes significant
effect sizes are much easier to interpret than p values because they are reported in the units of the thing we are measuring
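A small simulation sketch (hypothetical distributions) showing that the same true effect can be non-significant at small n and significant at large n:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
effect = 2.0  # true mean difference, in the units of the measurement

for n in (5, 50, 500):
    control = rng.normal(10, 5, n)
    treatment = rng.normal(10 + effect, 5, n)  # shifted by the same effect
    t, p = stats.ttest_ind(treatment, control)
    print(f"n={n:3d}  observed effect={treatment.mean() - control.mean():5.2f}  p={p:.4f}")
```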
categorical predictors
levels are qualitatively different
we estimate the effect per level
mean value per level
examples:
species
sex
chemical
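For instance (hypothetical data), estimating the mean value per level of a categorical predictor is just a grouped mean:

```python
import pandas as pd

# hypothetical measurements with one categorical predictor (species)
df = pd.DataFrame({
    "species": ["A", "A", "B", "B", "C", "C"],
    "mass":    [2.1, 2.3, 3.8, 4.0, 1.2, 1.1],
})
print(df.groupby("species")["mass"].mean())  # one estimated mean per level
```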
continuous predictors
levels are numerical
we estimate coefficients of a continuous function (e.g. line, surface)
response variables are usually continuous
slope and intercept
examples:
concentration
mass
time
t-test
single categorical predictor with 2 levels
estimate the mean for the control (wild type)
estimate the mean difference between control and treatment (knockout − wild type)
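A minimal sketch with scipy (hypothetical knockout/wild-type measurements):

```python
import numpy as np
from scipy import stats

wild_type = np.array([5.1, 4.8, 5.3, 5.0, 4.9])  # control
knock_out = np.array([4.2, 4.0, 4.4, 4.1, 4.3])  # treatment

diff = knock_out.mean() - wild_type.mean()  # the coefficient: knockout - wild type
t, p = stats.ttest_ind(knock_out, wild_type)
print(f"mean difference = {diff:.2f}, t = {t:.2f}, p = {p:.4f}")
```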
linear regression
single continuous predictor
estimate the mean yield when fertilizer = 0 (control; the intercept)
and the rate at which yield increases with fertilizer (the slope)
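A sketch of the fertilizer example with scipy.stats.linregress (hypothetical dose and yield numbers):

```python
import numpy as np
from scipy import stats

fertilizer = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical doses
crop_yield = np.array([2.1, 2.9, 3.8, 4.2, 5.1, 5.8])  # hypothetical yields

fit = stats.linregress(fertilizer, crop_yield)
print(f"intercept (mean yield at fertilizer = 0): {fit.intercept:.2f}")
print(f"slope (yield increase per unit fertilizer): {fit.slope:.2f}")
```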
one-way ANOVA
single categorical predictor, 5 levels
estimate the mean for the control (level a)
and the differences between the control and each treatment mean (b, c, d, e)
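A sketch with five simulated groups (hypothetical means), showing the overall F-test plus the treatment-minus-control differences:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# five levels: control a plus treatments b..e (hypothetical group means)
groups = {level: rng.normal(mu, 1.0, 8)
          for level, mu in zip("abcde", [10, 10, 11, 12, 10])}

F, p = stats.f_oneway(*groups.values())  # one-way ANOVA across all 5 levels
print(f"F = {F:.2f}, p = {p:.4f}")

control_mean = groups["a"].mean()
for level in "bcde":
    print(f"{level} - a = {groups[level].mean() - control_mean:+.2f}")
```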
multiple regression
single categorical predictor interacting with a single continuous predictor
estimate the y-intercept of species 1
the difference in y-intercept of species 2
the slope of species 1
the difference in slope of species 2
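A sketch with statsmodels (simulated two-species data); the formula species * x expands into exactly the four coefficients listed above:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
x = np.tile(np.linspace(0, 10, 20), 2)
species = np.repeat(["sp1", "sp2"], 20)
# hypothetical truth: species 2 has a higher intercept and a steeper slope
y = np.where(species == "sp1", 1 + 0.5 * x, 3 + 1.2 * x) + rng.normal(0, 0.5, 40)
df = pd.DataFrame({"species": species, "x": x, "y": y})

# coefficients: intercept of sp1, intercept shift for sp2,
# slope for sp1, slope shift for sp2 (the interaction)
model = smf.ols("y ~ species * x", data=df).fit()
print(model.params)
```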
confronting the model with data
- estimate coefficients that prescribe the numerical relationship between predictors and response (parameter estimation) and test whether they could in reality be zero (t-test)
- estimate whether the model explains more variation in the data than expected by chance (ANOVA)
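A sketch of both steps with statsmodels (simulated data): the coefficient table carries the per-coefficient t-tests, and anova_lm asks whether the model explains more variation than chance:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({"x": np.linspace(0, 10, 30)})
df["y"] = 2.0 + 0.8 * df["x"] + rng.normal(0, 1, 30)  # hypothetical data

fit = smf.ols("y ~ x", data=df).fit()
print(fit.summary().tables[1])   # per-coefficient estimates and t-tests
print(sm.stats.anova_lm(fit))    # variation explained vs. chance (ANOVA)
```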
null hypothesis
the hypothesis that the coefficients are all zero (no effect, no difference)
the point of statistical analysis is to use evidence (data) to reject (or fail to reject) the null hypothesis
errors
deviations from the expected value
the expected value is the value that minimizes the deviations (also called errors or residuals)
error sum of squares
the best fit: minimizing error sums of squares
what value of coefficient gives the smallest residuals?
method of ‘least squares’
error = observed − expected
error sum of squares (ESS) = sum (error^2)
sum of errors = zero, by definition
setting b0 = mean(observed) as the expected value gives the smallest ESS, as checked below
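A small numerical check (hypothetical sample) that the mean is the value of b0 minimizing the error sum of squares:

```python
import numpy as np

observed = np.array([3.0, 5.0, 4.0, 6.0, 2.0])  # hypothetical sample

def ess(b0):
    errors = observed - b0       # error = observed - expected
    return np.sum(errors ** 2)   # ESS = sum(error^2)

candidates = np.linspace(0.0, 8.0, 801)
best = candidates[np.argmin([ess(b) for b in candidates])]
print(f"b0 minimizing ESS = {best:.2f}; mean(observed) = {observed.mean():.2f}")
```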
sample coefficient isn’t the ‘true’ parameter
every sample coefficient we estimate comes with a distribution, not a single number
specifically a t-distribution
width of the curve is defined by the standard error (SE)
the fewer independent points (degrees of freedom) we have, the fuzzier our estimate
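One way to see the fuzziness, sketched with scipy: the half-width of a 95% interval (the 97.5% t quantile) shrinks as degrees of freedom grow:

```python
from scipy import stats

# half-width multiplier of a 95% confidence interval, by degrees of freedom
for df in (2, 5, 10, 30, 1000):
    print(f"df = {df:4d}  t_crit = {stats.t.ppf(0.975, df):.2f}")
# few independent points -> fat-tailed t-distribution -> fuzzier estimate
```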
t value
t = coefficient / standard error
the bigger the t value, the bigger the coefficient relative to its standard error, and the stronger the evidence that the estimate reflects a real (non-zero) effect
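A minimal sketch (hypothetical coefficient, standard error, and degrees of freedom) turning a t value into a two-sided p value:

```python
from scipy import stats

coef, se, dof = 1.8, 0.6, 18     # hypothetical estimate, SE, and df
t = coef / se                    # t value = coefficient / standard error
p = 2 * stats.t.sf(abs(t), dof)  # two-sided p value from the t-distribution
print(f"t = {t:.2f}, p = {p:.4f}")
```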