BIO 330 Flashcards
sampling error imposes
imprecision (accuracy intact)
caused by chance
sampling bias imposes
inaccuracy (precision intact)
accurate sample
unbiased
precise sample
low sampling error
good sample
accurate
precise
random
large
2 types of data
numerical
categorical
numerical data
continuous
discrete
categorical data
nominal
ordinal
types of variable
response
explanatory
response variable
dependent
outcome
Y
explanatory variable
independent
predictor
x
subsamples treated as true replicate
pseudoreplication
subsamples are useful for
increasing precision of estimate for individual samples (multiple samples from same site averaged)
contingency table
explanatory- columns
response- rows
totals of columns and rows
2 data descriptions
central tendency
width
central tendency
mean
median
mode
width (spread)
range standard deviation variance coefficient of variation IQR
effect of outliers on mean
shifts mean towards outliers- sensitive to extremes
median doesn’t shift
sample variance s^2 =
Σ( Y_i - Ybar )^2 / (n - 1)
coefficient of variation CV =
100% ( s / Ybar )
high CV
more variability
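A quick stdlib sketch of these spread measures, using hypothetical measurements (the values are made up for illustration):

```python
import statistics as st

data = [4.2, 5.1, 3.8, 6.0, 4.9]   # hypothetical measurements
mean = st.mean(data)
variance = st.variance(data)        # s^2, with the n-1 denominator
s = st.stdev(data)                  # sample standard deviation
cv = 100 * s / mean                 # coefficient of variation, in %

print(mean, variance, cv)
```

`statistics.variance` and `statistics.stdev` already use the n-1 denominator, matching the sample variance card above.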
skewed box plot
left skewed- more data in ‘bottom’- first quartile
right skewed- more data in ‘top’- 3rd quartile
when/why random sample
uniform study area
removes bias in sample selection
when/why systematic sample
detect patterns along gradient- fixed intervals along transect/belt
using quadrats
more better
stop when mean/variance stabilize (asymptote)
what does changing n do to sampling distribution
reduces spread (narrows graph)- increases precision
standard error of estimate SE_Ybar =
s / sqrt(n)
SD vs. SE
SD- spread of distribution/deviation from mean
SE- precisions of an estimate (ex. mean)
95% CI ~=
+/- 2SE
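A minimal sketch of the SE and the rough ±2SE confidence interval, with hypothetical data:

```python
import math
import statistics as st

sample = [12.1, 9.8, 11.4, 10.6, 13.0, 10.2, 11.9, 9.5]  # hypothetical
n = len(sample)
ybar = st.mean(sample)
se = st.stdev(sample) / math.sqrt(n)   # SE = s / sqrt(n)

# rough 95% CI: mean plus/minus about 2 standard errors
ci_low, ci_high = ybar - 2 * se, ybar + 2 * se
print(f"{ybar:.2f} ({ci_low:.2f}, {ci_high:.2f})")
```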
kurtosis
leptokurtic- sharper peak (+)
platykurtic- rounder peak (-)
mesokurtic- normal (0)
Normal distribution, 1SD
~2/3 of the area under the curve (2SD = 95%)
random trial
process/experiment with ≥2 possible outcomes whose occurrence cannot be predicted
sample space
all possible outcomes
event
any subset of the sample space (≥1 outcome)
mutually exclusive events
P[A and B] = 0
mutually exclusive addition rule
P[7 U 11] = P[7] + P[11]
general addition rule
P[AUB] = P[A] + P[B] - P[A and B]
multiplication rule
independent events
P[A and B] = P[A] x P[B]
conditional probability
P[A | B] = P[A and B] / P[B]
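The addition and conditional-probability rules can be checked by brute force over a small sample space; this sketch uses two dice (the 7-and-11 example from the cards above):

```python
from itertools import product

# sample space: all 36 equally likely outcomes of rolling two dice
space = list(product(range(1, 7), repeat=2))

def count(event):
    return sum(1 for o in space if event(o))

seven = lambda o: sum(o) == 7
eleven = lambda o: sum(o) == 11

# mutually exclusive addition rule: P[7 or 11] = P[7] + P[11]
p_7_or_11 = count(lambda o: seven(o) or eleven(o)) / len(space)

# conditional probability: P[sum = 7 | first die = 3]
first3 = lambda o: o[0] == 3
p_cond = count(lambda o: seven(o) and first3(o)) / count(first3)

print(p_7_or_11, p_cond)
```

Counting outcomes and dividing at the end avoids floating-point mismatches when comparing the two sides of the addition rule.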
collection of individual easily available to researcher
sample of convenience
random sample
every unit has equal opportunity, selection of units is independent, minimizes bias, possible to measure sampling error
problem with sample of convenience
assume unbiased/independent- no guarantee
volunteer bias
health conscious, low income, ill, more time, angry, less prudish
frequency distribution
describes # of times each value of a variable occurs in sample
probability distribution
distribution of variable in whole population
absolute frequency
# of times value is observed
relative frequency
proportion of individuals which have that value
experimental studies can
determine cause and effect
*cause
observational studies can
only point to cause
*correlations
quantifying precision
smaller range of values (spread)
determining accuracy
usually can’t- don’t know true value
nominal categorical data with 2 choices
binomial
why aim for numerical data
it can be converted to categorical if need be
species richness
discrete (count)
rates
continuous
large sample
less affected by chance
lower sampling error
lower bias
rounding
round to one decimal place more than measurement (in calculations)
proportions
p^ = # of observations in category of interest/ total # of observations in all categories
sum of squares
it is squared so that each value is +, so they don’t cancel each other out
n-1 to account for population bias
CV used for
relative measures- comparing data sets
sampling distribution
probability distribution of all values for an estimate that we might obtain when we sample a population, centred at true µ
values outside of CI
implausible
how many quadrats to use
till cumulative number of observations asymptotes
law of total probability
P[A] = Σ P[B_i]·P[A | B_i]
for all B_i ‘s
null distribution
sampling distribution of the test statistic if sampling were repeated many times and the test statistics graphed, assuming Ho is true
Type I error
P[Reject Ho | Ho true] = alpha
reject null
P-value < alpha
Type II error
P[do not reject Ho | Ho false]
Power
P[Reject Ho | Ho false]
increases with large n
decreases P[Type II E]
test statistic
used to evaluate whether data are reasonably expected under Ho
p-value
probability of getting data as extreme or more, given Ho is true
statistically significant
data differ from H_o
not necessarily important- depends on magnitude of difference and n
why not reduce alpha
would decrease P[Type I] but increase P[Type II]
continuous probability
P[Y = y] =
0
sampling without replacement
ex. drawing cards
(1/52)·(1/51)·(1/50)
Bayes Theorem
P[A | B] = P[B | A]·P[A] / P[B]
P-value > alpha
do not reject Ho
data are consistent with Ho
meaning of ‘z’ in standardization
how many sd’s Y is from µ
standardization for sample mean, t =
( Ybar - µ ) / ( s / sqrt(n) )
CI on µ
Ybar ± SE·t_crit
SE = SE of Ybar = s / sqrt(n)
t_crit = t_alpha(1 or 2 sided), degrees of freedom
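A sketch of a t-based CI on µ with hypothetical data; the critical value is hard-coded from a t table since the stdlib has no t distribution (t_0.05(2) with df = 9 is 2.262):

```python
import math
import statistics as st

sample = [5.2, 4.8, 6.1, 5.5, 4.9, 5.8, 5.0, 5.6, 6.0, 4.7]  # hypothetical
n = len(sample)
ybar = st.mean(sample)
se = st.stdev(sample) / math.sqrt(n)     # SE of Ybar

t_crit = 2.262                           # t_0.05(2), df = n - 1 = 9 (table)
low, high = ybar - t_crit * se, ybar + t_crit * se
print(f"95% CI: {low:.2f} < mu < {high:.2f}")
```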
1 sample t-test
compares sample mean from normal pop. to population µ proposed by Ho
why n-1 account for sampling error
last value is not free to vary if mean is a specified value
1 sample t-test assumptions
data are a random sample
variable is normally distributed in pop.
paired t-test assumptions
pairs are a random sample from pop.
paired differences are normally distributed in the pop.
how to tell whether to reject with t-test
if test statistic is further into tails than critical t then reject
2 sample design compares
treatment vs. control
2 sample t-test assumptions
both samples are random samples
variable is normally distributed in each group
standard deviations in the two groups approximately equal
degrees of freedom
1 sample t-test: n - 1
paired t-test: n - 1
2 sample t-test: n1 + n2 - 2
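The 2-sample t statistic and its df can be sketched by hand with the pooled variance; the two groups here are hypothetical:

```python
import math
import statistics as st

g1 = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5]   # hypothetical treatment
g2 = [12.9, 13.4, 12.1, 13.8, 12.6, 13.2]   # hypothetical control
n1, n2 = len(g1), len(g2)
df = n1 + n2 - 2                             # 2-sample t-test df

# pooled variance: each group's variance weighted by its df
sp2 = ((n1 - 1) * st.variance(g1) + (n2 - 1) * st.variance(g2)) / df
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t = (st.mean(g1) - st.mean(g2)) / se         # compare to t_crit at df
print(df, t)
```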
confounding variables
mask/distort causal relationships btw measured variables
problem w/ observational studies
impossible to isolate the effect of 1 variable
experimental artifacts
bias resulting from experiment, unnatural conditions
problem w/ experimental studies
should try to mimic natural environment
minimum study design requirements
knowledge of initial/natural conditions via preliminary data to ID hypotheses and confounding variables
controls to reduce bias
replication to reduce sampling error
study design process
develop clear statement of research question
list possible outcomes
develop experimental plan
check for design problems
developing a clear statement of research question
ID question, Ho, Ha
choose factors, response variable
what is being tested? will the experiment actually test this?
list possible outcome of experiment
ID sample space
explain how each outcome supports/refutes Ho
consider external risk factors
develop experimental plan
based on step 1
outline different experimental designs
check literature for existing/accepted designs
develop experimental plan based on step 2
what kind of data will you have- aim for numerical
what type of statistical test will you use
minimize bias in experimental plan
control group
randomization
blinding
minimize sampling error in experimental plan
replication
balance
blocking
types of controls
positive
negative
positive control
treatment that should produce obvious, strong effect
ensuring experiment design doesn’t block effect
negative control
subjects go through all same steps but do not receive treatment- no effect
maintaining power with controls
add controls w/o reducing sample size- too many control samples use up resources and reduce power
placebo effect
improvement in condition from psychological effect
randomization
breaks correlation btw explanatory variable and confounding variables (averages effects of confounding variables)
blinding
conceals from subjects/researchers which treatment was received
prevent conscious/unconscious changes in behaviour
single blind or double blind
better chance of IDing treatment effect if
sample error/noise is minimized
replication =
smaller SE, tighter CI
spatial autocorrelation
each sample is correlated w/ nearby samples- not independent (unless testing differences in that population)
temporal autocorrelation
measurement at one pt in time is directly correlated w/ the one before/after it
balance =
small SE, narrow CI
blocking
accounts for extraneous variation by putting experimental units that are similar into ‘blocks’
only concerned w/ differences within block- differences btw blocks don’t matter
lowers noise
factorial design
most powerful study design
study multiple treatments and their interactions
equal replication of all combinations of treatment
checking for pseudoreplication
check degrees of freedom, very large- problem
overestimate = easier to reject Ho- pretending we have more power than we do
determining sample size, plan for
precision, power, data loss
determining sample size, wanting precision
want a narrow CI
n ~ 8(sigma/uncertainty)^2
uncertainty is 1/2 CI
determining sample size, wanting power
detecting an effect/difference- plan for probability of rejecting a false Ho
n ~ 16(sigma/D)^2
D is min. effect size you want to detect
power is 0.8
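Both planning formulas in one sketch; sigma, the uncertainty, and D are hypothetical planning numbers you would take from a pilot study:

```python
import math

sigma = 4.0            # hypothetical guess at population SD (pilot data)

# planning for precision: uncertainty = half the desired CI width
uncertainty = 2.0
n_precision = 8 * (sigma / uncertainty) ** 2   # n ~ 8(sigma/uncertainty)^2

# planning for power 0.8: D = smallest effect size worth detecting
D = 3.0
n_power = 16 * (sigma / D) ** 2                # n ~ 16(sigma/D)^2

print(math.ceil(n_precision), math.ceil(n_power))
```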
ethics
avoid trivial experiment
collaborate to streamline efforts
substitute models for live animals when possible
keep encounters brief to reduce stress
most important in experimental study design
check common design problems
sample size (precision,power,data loss)
get a second opinion
most important in observational study design
keep track of confounding variables
good skewness range for normality
[-1,1]
normal quantile plot
QQ plot
compares data w/ standardized value, should follow a straight line
right skew in QQ plot
above line (more positive data)
Shapiro-Wilk test
works like Hypothesis test, Ho: data normal
estimate pop mean and SD using sample data, tests match to normal distribution with same mean and SD
p-value < alpha, reject Ho (don’t want to reject)
testing normality
Histogram
QQ plot
Shapiro-Wilk
normality tests sensitive
especially to outliers, over-rejection rate
sensitive to sample size
large n = more power
testing equal variances
Levene’s test
Levene’s test
Ho: sigma1 = sigma2
difference btw each data point and mean, test difference btw groups in the means of these differences
p-value < alpha reject (don’t want to reject)
how to handle violations of test assumptions
ignore it
transform data
use nonparametric test
use permutation test
when to ignore normality
CLT- n > 30 means are ~normally distributed
depends on data set though
can’t ignore normality and compare one set skewed left with one skewed right
when to ignore equal variances
n large, n1 ~ n2
3 fold difference in SD usually ok
if can’t ignore violation of equal variances
Welch’s t-test- computes SE and df differently
most common transformations
log, arcsine, square-root
log- only if all data > 0
nonparametrics
assume less about underlying distributions
usually based on rank data
Ho: ranks are same btw groups
sign test (instead of t test)
sign test
compares median to median in Ho
each data pt- record whether above (+) or below (-) the Ho median
if Ho is true in sign test
half data will be above Ho, half will be below
sign test p-value
use binomial distribution– probability of getting your measurement if Ho true, compare to alpha
binomial
P[Y ≤ y] = Σ (n choose i) p^i (1-p)^(n-i), summed from i = 0 to y
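The binomial pmf and its cumulative sum can be written directly from the formula; the coin-flip numbers are just an illustration:

```python
from math import comb

def binom_pmf(y, n, p):
    """P[Y = y] = (n choose y) p^y (1-p)^(n-y)"""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

def binom_cdf(y, n, p):
    """P[Y <= y]: sum the pmf from 0 up to y"""
    return sum(binom_pmf(i, n, p) for i in range(y + 1))

# e.g. 10 fair-coin flips
print(binom_pmf(5, 10, 0.5))    # probability of exactly 5 heads
print(binom_cdf(10, 10, 0.5))   # summing over every outcome gives 1
```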
Mann-Whitney U-test
compare 2 groups using ranks
doesn’t assume normality
assumes distributions are same shape
rank all data from both groups together, sum ranks for individual groups
Mann-Whitney U-test equation
U1 = n1n2 + [n1(n1+1)/2] - R1
U2 = n1n2 - U1
interpreting Mann-Whitney U-test
choose larger of U1, U2 (test statistics)- compare to critical U from U distribution (table E)
note that Ucrit = U_alpha,(2 sided), n1, n2
used n1, n2 not DF
U < Ucrit: do not reject Ho (2 groups not statistically different)
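A sketch of the U calculation from pooled ranks, with tiny hypothetical groups (ties get the average of the tied ranks, as on the Spearman cards later):

```python
def mann_whitney_u(g1, g2):
    """Return the larger of U1, U2 from the rank-sum formula."""
    pooled = sorted(g1 + g2)
    rank = {}
    i = 0
    while i < len(pooled):          # walk runs of tied values
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        avg = (i + 1 + j) / 2       # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            rank[pooled[k]] = avg
        i = j
    n1, n2 = len(g1), len(g2)
    r1 = sum(rank[v] for v in g1)   # rank sum for group 1
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1
    return max(u1, u2)              # compare to critical U (table E)

# hypothetical, completely separated groups: U hits its maximum n1*n2
u = mann_whitney_u([1.1, 2.3, 2.9], [3.4, 4.2, 5.0])
print(u)
```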
why Mann-Whitney doesn’t use DF
not looking at estimating mean/variance, just comparing the shapes
problem with non-parametrics
low power- P[Type II] higher– especially with low n
ranking data = major info loss
avoid use
Type I not altered
comparing > 2 groups
ANOVA - analysis of variance
Ho: µ1 = µ2 = µ3 = µ4….
why use ANOVA
multiple t-tests to compare >2 groups increase Type I error- more tests = higher chance of falling within alpha
P[Type I]
1 - ( 1 - alpha ) ^N
N is number of t-tests you do
ex. 5 groups- 10 unique tests- P[TI] = 0.4
ANOVA tests
is there more variation btw groups than can be attributed to chance- breaks it down into: total variation, btw group variation, within group variation
maintains P[TI] = alpha
between-group variation
effect of interest (signal)
within-group variation
sampling error (noise)
2x2 ANOVA design
take 2 different variables- look at all combinations and test for the effect of each variable and their interaction
2 variables w/controls = 8 options
Hypothesis test steps
State Ho, Ha
calculate test statistic
determine critical value of null distribution (or P-value)
compare tests statistic to critical value (or P-value to sig. level)
evaluate Ho using alpha
why use alpha = 0.05
balances Type I error and Type II error
why are Type I and II errors conceptual
we don’t know whether or not Ho is actually true
paired t-test is a type of
blocking
where does pseudoreplication happen/become a problem
data analysis stage, doesn’t happen at data collection stage (subsamples)
ANOVA maintains
P[Type I Error] = alpha
ANOVA, Y bar
grand mean, main horizontal line, test for differences between grand mean and group means
ANOVA, Ho: F-ratio =
~1
ANOVA, if Ho is true, MSerror
= MSgroups; same variation within and btw groups
ANOVA, MSgroup > MSerror
more variation between groups than within
ANOVA, test statistic
F-distribution; critical value = F_0.05(1), MSgroup df, MSerror df
compare critical value to F-ratio
this is a one-sided distribution- we are checking whether the F-ratio is strictly bigger than the critical value
ANOVA, F-ratio > F-critical
Reject Ho- at least one group mean is different from the others
ANOVA, quantifying variation resulting from “treatment effect”
R^2 = SSgroups/SStotal
R^2 [0,1]
ANOVA, high R^2
more of the variation can be explained by the treatment, usually want at least 0.5
ANOVA, R^2 = 0.43
43% of total variation is explained by differences in treatment
ANOVA, R^2 = low values
noisy data
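The whole ANOVA breakdown (SSgroups, SSerror, F-ratio, R^2) in one stdlib sketch with hypothetical groups; note SStotal = SSgroups + SSerror:

```python
import statistics as st

groups = {                          # hypothetical data, k = 3 groups
    "control": [4.1, 3.8, 4.5, 4.0],
    "low":     [5.2, 5.6, 4.9, 5.3],
    "high":    [6.8, 7.1, 6.5, 7.0],
}
all_data = [y for g in groups.values() for y in g]
grand_mean = st.mean(all_data)
N, k = len(all_data), len(groups)

# between-group variation (signal) and within-group variation (noise)
ss_groups = sum(len(g) * (st.mean(g) - grand_mean) ** 2
                for g in groups.values())
ss_error = sum((y - st.mean(g)) ** 2 for g in groups.values() for y in g)

ms_groups = ss_groups / (k - 1)
ms_error = ss_error / (N - k)
f_ratio = ms_groups / ms_error          # compare to F_0.05(1), k-1, N-k
r2 = ss_groups / (ss_groups + ss_error) # SSgroups / SStotal
print(f_ratio, r2)
```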
ANOVA assumptions
Random samples from populations
Variable is normally distributed in each k population
Equal variance in all k populations
ANOVA unmet assumptions
large n, similar variances– ignore
variances very different– transform
non-parametric– Kruskal-Wallis
ANOVA, which group(s) were different
Planned or Unplanned comparison of means
Planned comparisons of means (ANOVA)
comparison between means planned during study design, before data is obtained; for comparing ONE group w/ control (only 2 means); not common
Unplanned comparisons of means (ANOVA)
comparisons to determine differences between all pairs of mean; more common; controls Type I error
Planned comparison calculations (ANOVA)
like a 2-sample t-test
test statistic: t = (Ybar1 - Ybar2) / SE
SE = sqrt[ MSerror (1/n1 + 1/n2) ]
note that we use error mean square instead of pooled variance (as in a normal t-test)
df = N - k
t critical = t_0.05(2), df
Unplanned comparison of means (ANOVA)
Tukey-Kramer
why do you need to know what kind of data you have
determines what kind of statistical test you can do
left skew
mean < median
skew ‘pulls’ mean in direction of skew
C.I. notation
95% CI: a < µ < b (units)
accept null hypothesis
NEVER!!!
only REJECT or FAIL TO REJECT
why do we choose alpha = 0.05
it balances TIE and TIIE which are actually conceptual, since we don’t know if Ho is actually true or not
standard error or estimate
standard deviation of its sampling distribution; measures precision of the estimate
SD vs. SE
SD- SPREAD of a distribution, deviation from mean
SE- PRECISION of an estimate; SD of sampling distribution
test statistics
used to evaluate whether the data is reasonably expected under the Ho
P-value
probability of getting the data, or something more unusual, given Ho is true
reject Ho if
p-value ≤ alpha
less than OR equal to
ex. reject for p = 0.049 and also for p = 0.05
Steps in hypothesis testing
- State Ho and Ha
- Calculate test statistic
- Determine critical value or P-value
- Compare test statistic to critical value
- Evaluate Ho using sig. level (and interpret)
Type I error
Reject Ho, given Ho true
Type II error
Do not reject Ho, given Ho is false
If we reduce alpha
P[Type I] decreases, P[Type II] increases
Experimental design steps
- Develop clear statement of research question
- List possible outcomes
- Develop experimental plan
- Check for design problems
How to minimize bias
control group, randomization, blinding
How to minimize sampling error
replication- large n lowers noise
balance- lowers noise
blocking
to avoid pseudoreplication
check df- if it's huge, something is wrong
Tukey-Kramer
for 3 means: three Ybars, three Ho’s; Q distribution; 3-row table w/ group i, group j, difference in means, SE, test statistic, critical q, outcome (reject/do not)
Q-distribution
symmetrical, uses larger critical values to restrict Type I error; more difficult to reject null
Tukey-Kramer test statistic
q = ( Ybar_i - Ybar_j ) / SE
SE = sqrt[ MSerror (1/n_i + 1/n_j) ]
Tukey-Kramer testing
test statistic, q-value
critical value, q_α,k,N-k
k = # groups
N = total # observations
Tukey-Kramer assumptions
random samples
data normally distributed in each group
equal variances in all groups
2 Factor ANOVA
2 Factors = 3 Ho’s: difference in 1 factor, difference in 2nd factor, difference in interaction
If interaction is significant
do not conclude that a factor alone has no effect- main effects can't be interpreted in isolation
Interaction plots
y-axis: response variable
x-axis: one of 2 main factors
legend for: other of 2 main factors (different symbols or colors)
2 lines
interpreting interaction plot, interaction
lines parallel: no significance in interaction
interpreting interaction plot, b (data not on x-axis)
take average along each line and compare the 2 on the y-axis, if they are not close then they are significant
interpreting interaction plot, a (data on x-axis)
x-axis: take average between the 2 dots (for each level of a), compare on y-axis, if they are not close they are significant
control groups in an observational/experimental study will
reduce bias
will not affect sampling error
correlation ≠
causation
correlation
“r”- comparing 2 numerical variables, [-1,1], no units, always linear
quantify strength and direction of LINEAR relationship (+/-)
how to calculate correlation
r = signal/noise
signal = Σ(Xi - Xbar)(Yi - Ybar)- the x and y deviations multiplied for every point before summing
noise = sqrt[ Σ(Xi - Xbar)^2 · Σ(Yi - Ybar)^2 ]
correlation Ho
no correlation between inbreeding and number of pups surviving their first winter (ρ = 0)
determining correlation
test statistic: t = r / SE_r
SE_r = sqrt[ (1 - r^2) / (n - 2) ]
df = n - 2
critical: t_alpha(2), df
compare statistic w/ critical
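The r calculation and its t test in one sketch; the paired data are hypothetical:

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical paired data
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# r = signal / noise
signal = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
noise = math.sqrt(sum((xi - xbar) ** 2 for xi in x)
                  * sum((yi - ybar) ** 2 for yi in y))
r = signal / noise

se_r = math.sqrt((1 - r ** 2) / (n - 2))
t = r / se_r                          # compare to t_0.05(2), df = n - 2
print(r, t)
```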
df
n - number of parameters you estimate
correlation- you estimate 2
mann whitney- 0 parameters
stating correlation results
be careful not to interpret– no causation!
understanding r
easy to understand because of lack of units; however, it can trick you into thinking it is comparable across studies- the range of the variables must be similar to compare across studies
Attenuation bias
if x or y are measured with error, r will be lower; with increasing error, r is underestimated; avoided by taking means of subsamples
correlation and significance
statistically sig. relationships can be weak, moderate, strong
sig.– probability, if Ho is true
correlation– direction, strength of linear relationship
weak, moderate, strong correlation
r = ±0.2 –weak r = ±0.5 – moderate r = ±0.8 – strong
correlation assumptions
bivariate normality- x and y are normal
relationship is linear
dealing with assumption violations (correlation)
histograms
transformations in one or both variables
remove outlier
outlier removal
–need justification (i.e. data error)
–carefully consider if variation is natural
–conduct analyses w/ and w/o outlier to assess effect of removal
natural variation, outliers
is your n big enough to detect if that is natural variation in the data
if outlier removal has no effect
may as well leave it in!
non-parametric Correlation
Spearman’s rank correlation; strength and direction of linear association btw ranks of 2 variables; useful for outlier data
Spearman’s rank correlation assumptions
random sampling
linear relationship between ranks
Spearman’s rank correlation
r_s: same structure as Pearson’s correlation but based on ranks
r_s = Σ(Ri-Rbar)(Si-Sbar) / sqrt[ Σ(Ri-Rbar)^2 · Σ(Si-Sbar)^2 ]
conducting Spearmans
rank x and y values separately; each data point will have 2 ranks; sum ranks for each variable; n = # data pts.; divide each rank sum by n to get Rbar and Sbar; calculate r_s (statistic); calculate critical r_s(0.05,df)
if 2 points have same rank (Spearman)
average of that rank and skip rank before/after; w/o any ties, the 2 values on the bottom of r_s equation will be the same
Spearman hypothesis
ρ_s = 0, correlation = 0
Spearman df
df = n because no estimations are being made in ranking
linear regression
–relationship between x and y described by a line
–line can predict y from x
–line indicates rate of change of y with x
Y = a + bX
correlation vs. regression
regression assumes x,y relationship can be described by a line that predicts y from x
corr. - is there a relationship
reg. - can we predict y from x
perfect correlation
r = 1, all points are exactly on the line– regression line fitted to that ‘line’ could be the exact same line for a non-perfect correlation
rounding mean results
DO NOT; 4.5 puppies is a valid answer
best line of fit
minimizes SS = least squares regression; smaller sum of square deviations
used for evaluating fit of the line to the data
residuals
residuals
difference between actual Y value and predicted values for Y (the line); measure scatter above/below the line
calculating linear regression
calculate slope using the b formula; find a: a = Ybar - bXbar; plug in to Ybar = a + bXbar; rewrite as Y = a + bX; rewrite using words
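The least-squares steps above as a sketch with hypothetical data; note the fitted line passes through (Xbar, Ybar):

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical predictor
y = [2.0, 4.1, 5.9, 8.2, 9.8]   # hypothetical response
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# slope b = sum[(Xi-Xbar)(Yi-Ybar)] / sum[(Xi-Xbar)^2], then a from Ybar = a + b*Xbar
b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
     / sum((xi - xbar) ** 2 for xi in x))
a = ybar - b * xbar

# fit quality: MSresidual = sum of squared residuals / (n - 2)
y_hat = [a + b * xi for xi in x]
ms_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / (n - 2)
print(a, b, ms_residual)
```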
Yhat
predicted value- if you are trying to predict a y value after equation has been solved
why do we solve linear regression with Xbar, Ybar
line of fit always goes through Xbar, Ybar
how good is line of fit
MSresiduals = Σ(Yi - Yhat)^2 / n-2
which is SSresidual / n-2
quantifies fit of line- smaller is better
Prediction confidence, linear regression
precision of predicted mean Y for a given X
precision of predicted single Y for a given X
Precision of predicted mean Y for a given X, linear regression
narrowest near mean of X, and flare outward from there; confidence band– most confident in prediction about the mean
precision of predicted single Y for a given X, linear regression
much wider because predicting a single Y from X is more uncertain than predicting the mean Y for that X
extrapolating linear regression
DO NOT extrapolate beyond data, can’t assume relationship continues to be linear
linear regression Ho
Slope is zero (β = 0), number of dees cannot be predicted from predator mass
linear regression Ha
slope is not zero (β ≠ 0), number of dees can be predicted from predator mass (2 sided)
Hypothesis testing of linear regression
testing about the slope:
–t-test approach
–ANOVA approach
Putting linear regression into words
Dee rate = 3.4 - 1.04(predator mass)
Number of dees decreases by about 1 per kilo of predator mass increase
testing about the slope, t-test approach
test statistic: t = (b - β_o) / SE_b
SE_b = sqrt[ MSresidual / Σ(Xi - Xbar)^2 ]
MSresidual = Σ(Yi - Yhat)^2 / (n - 2)
critical t = t_alpha(2), df
df = n - 2
compare statistic, critical
testing about the slope, ANOVA approach
source of variation: regression, residual, total
sum of squares, df, mean squares, F-ratio
calculating testing about the slope, ANOVA approach
SSregression = Σ(Yhat_i - Ybar)^2, df = 1
SSresidual = Σ(Yi - Yhat_i)^2, df = n - 2
MSregression = SSreg / df
MSresidual = SSres / df
F-ratio = MSreg / MSres
SStotal = Σ(Yi - Ybar)^2, df total = n - 1
interpreting ANOVA approach to linear regression
If Ho is true, MSreg. = MSres
% of variation in Y explained by X
R^2 = SSreg/SStotal
a% of variation in Y can be predicted by X
Outliers, linear regression
create non-normal Y-value distribution, violate assumption of equal variance in Y, strong effect on slope and intercept; try not to transform data
linear regression assumptions
linear relationship
normality of Y at each X
variance of Y same for every X
random sampling of Y’s
detecting non-linearity
look at the scatter plot, look at residual plot
checking residuals
should be symmetric above/below zero
should be more points close line (0) than far
equal variance at all values of x
non-linear regression
when relationship is not linear, transformations don’t work, many options- aim for simplicity
quadratic curves
Y = a + bX + cX^2
when c is negative, curve is humped
when c is positive, curve is u shaped
multiple explanatory variables
improve detection of treatment effects
investigate effects of ≥2 treatments + interactions
adjust for confounding variables when comparing ≥2 groups
GLM
general linear model; multiple explanatory variables can be included (even categorical); response variable (Y) = linear model + error
least-squares regression GLM
Y = a + bX error = residuals
single-factor ANOVA GLM
Y = µ + A error = variability within groups µ = grand mean
GLM hypotheses
Ho: response = constant; response is same among treatments
Ha: response = constant + explanatory variable
constant
constant = intercept or grand mean
variable
variable = variable x coefficient
ANOVA results, GLM
source of variation: Companion, Residual, Total
SS, df, MS, F, P
ANOVA, GLM F-ratio
MScomp. / MSres.
ANOVA, GLM R^2
R^2 = SScom. / SStot.
% of variation that is explained
ANOVA, GLM, reject Ho
Model with treatment variable fits the data better than the null model but only 25% of the variation is explained
Multiple explanatory variables, goals
improve detection of treatment effects
adjust for effects of confounding variables
investigate multiple variables and their interaction
design feature for improving detection of treatment effects
blocking
design feature for adjusting for effects of confounding variables
covariates
design feature for investigating multiple variables and their interaction
factorial design
experiments with blocking
account for extraneous variation by putting experimental units into blocks that share common features
ex. instead of comparing randomly dispersed diversity, look at response variable within a block
GLM, blocking
Ho: mean prey diversity is same in every fish abundance treatment
Ho: Diversity = grand mean + block
Ha: mean prey diversity is not the same in every fish abundance treatment
Ha: diversity = grand mean + block + fish abundance
ANOVA, GLM, blocking
source of var.: block, abundance, residual, total
SS, df, MS, F, P
Blocking Ho
Ho: mean prey diversity is the same in each block
Ha: mean prey diversity is not the same in each block
Block R^2 = SSblock / SStotal
Abundance + block R^2 =
(SSabun. + SSblock) / SStotal
block as a variable
block is an explanatory variable even if we are not inherently interested in its effect b/c it contributes to variation
covariates
reduce confounding variables, reduce bias
ANCOVA, GLM
Response = constant + explanatory + covariate
ANCOVA hypotheses
Ho:No interaction between caste and body mass
Response = constant + exp. + covariate
Ha: Interaction between caste and body mass
Response = cons. + exp + cov. + explanatory*covariate
ANCOVA hypotheses graphs
Ho: parallel
Ha: not parallel
effect is measured as the vertical difference between the two lines
Testing ANCOVA
are the slopes equal
if not significant, drop interaction term and run model again
df of interaction =
df_covariate * df_explanatory
Factorial design
multiple explanatory variables
fully factorial- every level of every variable and interaction is studied
Factorial GLM statements
Ha: algal cover = grand mean + herbivory + height + herbivory*height
Ho: a.c. = G.M. + Herb. + Height
GLM null hypotheses
do not include interaction statements
always one term different from alternative
GLM degrees of freedom
explanatory: df = levels of treatment - 1
interaction: df = df_exp1 * df_exp2
df always total to grand n - 1
Factorial GLM hypotheses graphs
Ho: no interaction = parallel lines
Ha: interaction = non parallel, maybe crossing lines
Probability of independent events
P[X] = P[A]P[B]P[C]*….
if multiple ways to arrive at P[X] then add them up, or use Binomial (if conditions met)
Binomial distribution
probability distribution for # of successes in a fixed n of independent trials
Binomial conditions
independent
probability of success is same for each trial
2 possible outcomes- success/failure
proportion equations
p^ = X/n
SE_p^ = sqrt[ p^(1 - p^) / (n - 1) ]
Binomial test, testing proportions
whether relative frequency of successes in a population matches null expectation
Ho: p = p_o
law of large numbers
higher n = better estimate of p (or any estimate for that matter), lower SE
binomial testing proportions calculations
test statistic = observed number of successes
null expectation = null ‘p’ * number of ‘trials’ (weighted by trials)
steps in finding binomial p-value
use null ‘p’ in binomial to calculate probability of the observed successes + anything more extreme; multiply by 2 (2-sided test)- this is the p-value; not comparing to a critical value; compare to alpha
binomial, p < 0.001
reject Ho- p^ is significantly different from the p proposed under the proportional model in Ho
95% CI for a population parameter
p' = ( X + 2 ) / ( n + 4 )
p' ± Z·sqrt[ p'(1 - p') / (n + 4) ]
Z = 1.96 for 95% CI
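The adjusted ("+2/+4") CI for a proportion as a sketch; X and n are hypothetical counts:

```python
import math

X, n = 18, 50                   # hypothetical: 18 successes in 50 trials
p_prime = (X + 2) / (n + 4)     # adjusted proportion
margin = 1.96 * math.sqrt(p_prime * (1 - p_prime) / (n + 4))
low, high = p_prime - margin, p_prime + margin
print(f"95% CI: {low:.3f} < p < {high:.3f}")
```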
> 2 possible categories
X^2 goodness-of-fit test
compare frequency data w/ >2 possible outcomes to frequencies expected from probability model in Ho
Bar graphs
categorical data
space between bars
X^2 example (days)
Ho: # of births is the same on each day
births on Monday is proportional to # of Mondays in the year
X^2
test statistic measures discrepancy btw observed (data) and expected (Ho) frequencies
X^2 calculations
find E for each group, then X^2 for each group; sum them = test statistic; compare to critical value
E = n·p
X^2 = Σ (O - E)^2 / E
df = # categories - 1
critical X^2_alpha,df
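A goodness-of-fit sketch for the births-per-weekday example; the observed counts are hypothetical:

```python
# hypothetical: 350 births across 7 weekdays, Ho: equal proportions
observed = [53, 47, 61, 44, 56, 48, 41]
n = sum(observed)
expected = [n * (1 / 7)] * 7            # E = n * p under Ho

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                  # categories - 1
print(chi2, df)                         # compare chi2 to critical X^2_0.05,df
```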
Sampling distribution for Ho, binomial
Histogram- sampling distribution for all possible values for X^2
black line- theoretical X^2 probability distribution
higher X^2 values
observed farther from expected
X^2, why -1 in df
using n to calculate expected value- restricts data
X^2 reject Ho
data do not fit a proportional model, births are not equally distributed through the week
X^2 goodness-of-fit assumptions
random sample
no category has expected frequency < 1
no more than 20% of the categories have expected frequencies < 5
Poisson distribution
describes probability of success in a block of time or space, when successes happen independently and with equal probability
distribution of points in space
clumped
random
dispersed
Poisson, P[X successes] =
P[X successes] = e^(-µ) · µ^X / X!
µ = mean # of independent successes
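The Poisson pmf written straight from the formula; µ here is a hypothetical mean count per block of time or space:

```python
import math

def poisson_pmf(x, mu):
    """P[X successes] = e^(-mu) * mu^x / x!"""
    return math.exp(-mu) * mu ** x / math.factorial(x)

mu = 2.0                          # hypothetical mean successes per block
probs = [poisson_pmf(x, mu) for x in range(20)]
print(sum(probs))                 # nearly all of the probability mass
```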
Poisson hypotheses
Ho: number of extinctions per time interval has a Poisson distribution
Ha: number of extinctions do not follow a Poisson distribution
calculate a mean from a frequency table
µ = [ (value1·freq1) + (value2·freq2) + (value3·freq3) + … ] / n
hypothesis testing, poisson
calculate probability of success (expected value) for each level; calculate X^2 for each level, sum them; compare to critical value
df = # categories - 1
determining if data are clumped or dispersed
s^2 =
[ Σ (Xi - µ)^2 * (obs. frequency)] / (n–1)
clumped: s^2 > µ
dispersed: s^2 < µ
X^2 used for
proportional
binomial
poisson
rejecting Ho, binomial
probability of success is not same in all trials or trials are not independent
rejecting Ho, poisson
successes are not independent, probability of success is not constant over time or space
contingency analysis
whether one variable depends on (is contingent on) the other in a contingency table
explanatory variable in columns
response variable in rows
each subject appears in the table once
contingency Ho
no relationship between variables, variables independent
associating categorical variables
test for association between ≥2 categorical variables
are categorical variables independent
odds ratio
X^2 contingency test
odds ratio
to measure magnitude of association between 2 variables when each has only 2 categories
odds: O^ = p^ / (1 – p^)
odds ratio: OR = O1^ / O2^
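The odds-ratio formulas above, sketched in Python for a 2x2 table; the cell counts in the test comment are invented:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table:
              outcome   no outcome
    group 1      a           b
    group 2      c           d
    odds_i = p_i / (1 - p_i); OR = odds1 / odds2 (algebraically = a*d / (b*c)).
    """
    p1 = a / (a + b)
    p2 = c / (c + d)
    odds1 = p1 / (1 - p1)
    odds2 = p2 / (1 - p2)
    return odds1 / odds2
```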
X^2 contingency test
to test whether the 2 variables are independent; to test association between 2 categorical variables; need expected frequencies for each cell under Ho
OR =
OR=1 : odds same for both groups
OR>1 : odds higher in 1st group- associated with increased risk
expected frequencies, X^2 contingency
P[A ∩ B] =
(row total / grand total)(column total / grand total)
E = P[A ∩ B] * grand total
calculating X^2 contingency
X^2 = Σ (O–E)^2 / E = test stat
df = (#rows–1)(#columns–1)
compare to critical value
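The contingency X^2 calculation (expected frequency = row total x column total / grand total, summed over all cells) can be sketched as:

```python
def chi_square_contingency(table):
    """table: list of rows of observed counts.
    E_ij = row_total_i * col_total_j / grand_total
    X^2 = sum over cells of (O - E)^2 / E; df = (#rows - 1)(#cols - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand
            x2 += (obs - exp) ** 2 / exp
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df
```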
rejecting Ho, contingency
Reject Ho that A and B are independent; P[A] is contingent upon B
X^2 contingency test assumptions
random sample
no cells can have expected frequency <5
if X^2 contingency test assumptions not met
≥2 rows/columns can be combined for larger expected frequencies
to test independence of 2 categorical variables when expected frequencies are low
Fisher’s exact test
Fisher’s exact test
gives exact p-value for a test of association in a 2x2 table
Fisher’s exact test assumptions
random samples
Fisher’s Ho
states of A and B are independent
conduct Fisher’s
–list all possible 2x2 tables w/ results as or more extreme than observed table
–p-value is sum of the Pr of all extreme tables under Ho of independence
–assess null
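The enumeration described above can be sketched for a 2x2 table using the hypergeometric probability of each table with the same margins (this two-sided version sums all tables as or less probable than the observed one; the example counts are the classic tea-tasting values, used here only as an illustration):

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Enumerates every table with the same row/column totals and sums the
    probabilities of those as or more extreme (<= observed probability)."""
    r1, r2 = a + b, c + d          # row totals
    c1 = a + c                     # first column total
    n = r1 + r2                    # grand total
    denom = comb(n, c1)

    def prob(x):  # probability of the table whose top-left cell is x
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs + 1e-12)
```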
Computer-Intensive methods
rely on cheap, fast computation
hypothesis testing- simulation, permutation (randomization)
standard errors, CI- bootstrapping
hypothesis testing, simulation
–simulates sampling process many times- generate null distribution from simulated data
–creates a ‘population’ w/ parameter values specified by Ho
–used commonly when null distr. unknown
simulation to generate null distribution
- create and sample imaginary population w/ parameter values as specified by Ho
- calculate test statistic on simulated sample
- repeat 1&2 large number of times
- gather all simulated test statistic values to form null distr.
- compare test statistic from data to null distr. to approx. p-value and assess Ho
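The simulation steps above, sketched for a goodness-of-fit X^2 statistic; the sample size, proportions, and replicate count are arbitrary illustration choices:

```python
import random

def simulate_null_x2(n, proportions, reps=10000, seed=1):
    """Repeatedly sample n individuals from an imaginary population with the
    category probabilities specified by Ho; return the simulated X^2 values
    (the generated null distribution)."""
    rng = random.Random(seed)
    k = len(proportions)
    expected = [n * p for p in proportions]
    null = []
    for _ in range(reps):
        draws = rng.choices(range(k), weights=proportions, k=n)
        counts = [draws.count(i) for i in range(k)]
        null.append(sum((o - e) ** 2 / e for o, e in zip(counts, expected)))
    return null

def approx_p_value(observed_x2, null):
    """P-value ~ fraction of simulated X^2 values >= the observed X^2."""
    return sum(v >= observed_x2 for v in null) / len(null)
```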
generated null distribution
P-value ~ fraction of simulated X^2 values ≥ observed X^2
if none ≥ observed, P < 1/(# of simulations) (e.g. P < 0.0001 with 10,000 simulations)
Permutation tests (Randomization test)
test hypotheses of association between 2 variables; randomization done w/o replacement; needs ‘parameter’ for association btw 2 variables
Permutation test used when
assumption of other methods are not met or null distribution is unknown
Permutation steps
- Create permuted data set w/ response variable randomly shuffled w/o replacement
- calculate measure of association for permuted sample
- repeat 1&2 large number of times
- Gather all permuted values of test statistic to form null distribution
- Determine approximate P-value and assess Ho
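A minimal permutation-test sketch, using the difference in group means as the measure of association (the choice of statistic and the seed are assumptions for illustration):

```python
import random

def permutation_test(group_a, group_b, reps=5000, seed=1):
    """Two-sided permutation test for a difference in means.
    Shuffles the pooled values (without replacement) to reassign group labels,
    building the null distribution of the mean difference."""
    rng = random.Random(seed)
    mean = lambda xs: sum(xs) / len(xs)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)                      # random reassignment of labels
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            count += 1
    return count / reps                          # approximate P-value
```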
Bootstrapping
calculate SE or CI for parameter estimate
useful if no formula or if distribution unknown
randomly ‘resamples’ from the data with replacement to estimate SE or CI
ex. median
bootstrapping steps
- random sample w/ replacement- 1st bootstrap sample
- calculate estimate using bootstrap sample
- repeat many times
- calculate bootstrap SE
* only sampling from original sample values
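The bootstrap steps above, sketched for the SE of the median (estimator, replicate count, and data are illustrative assumptions):

```python
import random
import statistics

def bootstrap_se(data, estimator=statistics.median, reps=2000, seed=1):
    """Bootstrap SE: resample the data with replacement (same n), recompute
    the estimate each time, then take the SD of the bootstrap estimates."""
    rng = random.Random(seed)
    n = len(data)
    estimates = [estimator([rng.choice(data) for _ in range(n)])
                 for _ in range(reps)]
    return statistics.stdev(estimates)
```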
simulation
mimics repeated sampling under Ho
permutation
randomly reassigns observed values for one of two variables
bootstrapping
used to calculate SE by resampling from the data set
Jack-knifing
leave-one-out method for calculating SE
Jack-knifing
gives same result every time (unlike bootstrapping)
recomputes the estimate n times, each time leaving out a different single observation (so each estimate uses n–1 values), then calculates the SE from those n estimates
statistical significance
observed difference (effect) is not likely due to random chance
practical significance
is the difference (effect) large enough to be important or of value in a practical sense
effect size
ES– degree or strength of effect
ex. magnitude of relationship btw 2 variables
3 ways to quantify
3 ways to quantify ES
standardized mean difference
correlation
odds-ratio
standardized mean difference
Cohen’s d
can find statistical significance
with a large n even when the effect size is small; the same effect may not be significant at lower n
Quantifying ES
2% difference btw population and sample means
difficult to interpret mean differences w/o accounting for variance (s^2)
Cohen standardized ES w/ variance
Cohen’s d
simplest measure of ES
(difference btw means) / Sp, where Sp = pooled standard deviation
standardizes, puts all results on same scale (makes meta-analysis possible)
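Cohen's d as defined above, sketched in Python with the pooled SD computed from the two sample variances (the sample values in the test are invented):

```python
import math

def cohens_d(sample1, sample2):
    """Cohen's d = (mean1 - mean2) / Sp, where Sp is the pooled SD."""
    n1, n2 = len(sample1), len(sample2)
    m1 = sum(sample1) / n1
    m2 = sum(sample2) / n2
    var1 = sum((x - m1) ** 2 for x in sample1) / (n1 - 1)
    var2 = sum((x - m2) ** 2 for x in sample2) / (n2 - 1)
    sp = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (m1 - m2) / sp
```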
Meta-analysis
analysis of analyses
synthesis of multiple studies on a topic that gives an overall conclusion; increases sig. of individual studies (larger n)
black line = 1-1 line - no difference, no more, no less
steps in meta-analysis
define the question to create one large study- general or specific
review literature to collect all studies- exhaustively
compute effect sizes and mean ES across all studies
look for effects of study quality
literature search
beware of ‘garbage in, garbage out’, publication bias, file-drawer problem
publication bias
bias- studies that weren’t published- lower n, insignificant, low effect
garbage in, garbage out
justify why studies are not included, what is considered poor science?
file-drawer problem
studies that are not published- grad thesis, government research
look for effects of study quality, Meta-analysis
do differences in n or methodology matter
- correlation btw n and ES?
- difference in observ. and exp. studies?
- base meta-analysis on higher quality studies
pros of Meta-analysis
tells overall strength & variability of effect
can increase statistical power, reduce Type II error
can reveal publication bias
can reveal associations btw study type and study outcome
cons/challenges of meta-analysis
assumes studies are directly comparable and unbiased samples
limited to accessible studies including necessary summary data
may have higher Type I error if publication bias is present
what do we get out of the statistical process
a probability statement
this process is called Frequentist statistics, most commonly used
What does frequentist statistics do
- answers probability statements about the data, given that the null is true
- infer properties of a population using samples
- doesn’t tell if null is true, not proof of anything
- useful, but must understand so not overinterpreted
frequentist statistics developed
Cohen, 1994; Null Hypothesis Significance Testing
why use frequentist statistics
appears to be objective and exact
readily available and easily used
everyone else uses it
scientists are taught to use it
supervisors & journals require it
limits of frequentist statistics
–provides binary info only: significant or not
–does not provide means for assessing relative strength of support for alternate hypotheses
–failing to reject Ho does not mean Ho is true
–does not answer real question
does not provide means for assessing relative strength of support for alternate hypotheses
ex. conclude the slope of the line is not 0, how strong is the evidence that the slope is 0.4 vs 0.5
real question
whether scientific hypothesis is true or false
- treatment has an effect (however small)
- if so, then Ho of no effect is false, but we are unable to show that Ho is false (or true)
- we can only show the probability of getting the data, if Ho is true
question we CAN answer
about the data, not the hypothesis- given that Ho is true, how likely are these data (not: given the data, how likely is Ho to be true)
more limitations for frequentist stats
whether a result is significant depends on n, ES, alpha
significant does not always mean important
larger n, ES, alpha
increase likelihood of rejecting Ho- getting significant result
significant does not necessarily mean important
effects can be tiny and still statistically significant
focus on p-values and Ho rejection
distracts from the real goal- deciding whether data support scientific hypotheses and are practically/biologically important
mostly we should be interested in
size/strength/direction of an effect
Bayesian statistics
incorporates prior beliefs or knowledge of parameter values into analyses to constrain the population estimate
frequentists vs. bayesian example
100 coin flips give 95 heads, what is the probability that the next flip will be a head?
freq. - 50%
bay. - 95%