part one Flashcards
what are the two types of variability
- intrinsic (natural system)
- extrinsic (measurement error)
when do you define the population
the population must be defined before the sampling process begins, as it dictates how you sample
how are frequency curves characterised
by 2 key parameters: location (eg mean, median, mode) and dispersion (spread, eg variance)
why are the parameters of the frequency curve important
we can never know the true population parameters, therefore we must infer them from samples
what is mu (μ)
population mean
what is x̄ (x bar)
sample mean
what is σ
population standard deviation
what is s
sample standard deviation
what is SEM
standard error of the mean: a measure of the variability of the sample mean
what are the 6 main steps of the logical framework ?
- observations
- models
- hypothesis
- null hypothesis
- experiment and sampling
- interpretation and results
what is the next step after you retain the null
you refute model and hypothesis. therefore you go back to observations and find out what was missing
what is the next step after you reject the null
you retain the model and hypothesis. you don't stop: you ask why is this the case, ie what are the mechanisms that make this model true?
what are the 2 types of observations
- casual (personally seen in nature, with no prior knowledge)
- previously quantified in the literature
what types of phrases must be used when making casual observations
it appears, it seems, it looks like
ie not certain
what is a model
the reason behind observations used to explain process
how do you state model from a casual observation
it is correct because it happens in nature in this location where i saw it
how do you state a model from a quantified observation
literature behind process eg this is because…
what is a hypothesis
what you predict if the model is true.
use the structure… if I do this then i will observe this/i will expect this
what is the difference between mensurative and manipulative
mensurative experiments are observational, they do not change the experiment.
manipulative experiments change system to understand patterns (you need literature first)
what is the point of a null hypothesis and what is this approach called
falsificationist approach. the hypothesis can never be proven because the whole population can't be measured. instead you set up a null and try to disprove it; what survives repeated attempts at falsification is provisionally accepted as true.
limited by its design, a mensurative study can only give certain interpretations. what are they
correlative not causational
it doesn't let you understand cause and effect or mechanisms. merely descriptive/qualitative
what is required for appropriate manipulative studies
appropriate controls and adequate prior biological knowledge of the system
what is the difference between precision and accuracy
precision is a measure of spread (precise = narrow, imprecise = wide). you can assess it using the standard error of the mean
accuracy is a measure of how close the sample mean is to the population mean (usually you cannot test accuracy)
SEM
s/sqrt n
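the SEM formula can be checked with a short Python sketch (the sample values are made up for illustration):

```python
import math
import statistics

# hypothetical sample measurements (illustrative values only)
sample = [4.2, 5.1, 3.8, 4.9, 5.3, 4.4]

s = statistics.stdev(sample)   # sample standard deviation, s
n = len(sample)
sem = s / math.sqrt(n)         # SEM = s / sqrt(n)
print(round(sem, 3))           # → 0.236
```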
when should you use random sampling
when information is not known about the population
when should you use stratified sampling
when you know information about the population and want to best represent it. this increases precision and accuracy
is random sampling always representative
no
by chance it can be or it can not be. therefore preliminary tests can be performed with lots of replicates to decide how to sample representatively
assumptions that must be accounted for PRE sampling
independence
randomness
these are KEY
assumptions that are analysed POST sampling
homogeneity of variances
normality of residuals
how to ensure independent data
replicates need to be independent of each other (eg separated through space); look for possible relationships between replicates
what is pseudoreplication
'replicates' that are non-independent of each other and therefore not true replicates, as you are not accounting for relationships between individuals. this increases type 1 error
what is confounding
when you reject the null (ie your hypothesis is supported) however this is not because your model is correct rather you have not accounted for other factors/variables that cause this relationship
how to mitigate confounding effects
by performing a manipulative study where you can control the variables
why do we perform statistical tests
as we are taking a sample of the population that is subject to error, we can only make probabilistic statements rather than absolute statements. statistics allows us to quantify that uncertainty
what are the three components of a statistical test
a null hypothesis
a test statistic
rejection region and critical value
what is the logical null
everything not included in hypothesis (eg equal or opposite)
what is the statistical null
there is no difference between groups
what is the t-test
testing the difference between 2 means
when do you use 2 tailed t test
when there is no direction in your hypothesis (ie there is no specified direction for the proposed difference)
when do you use a 1 tailed t test
when you have a directional hypothesis (eg this pop is greater than this pop)
what is a type 1 error
when the null is true however you reject it
what is a type 2 error
when the null is false however you support it
how can you control type 1 error
critical value (eg alpha = 0.05)
why are the rejection regions smaller for 2 tailed t tests
the total probability is always alpha (eg 0.05), so for a 2 tailed test you halve alpha between the two tails (eg 0.05/2 in each tail)
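the halving of alpha can be seen by computing critical values with Python's standard library; the standard normal is used here as a large-sample stand-in for the t distribution:

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal, a large-sample approximation to t

one_tailed = z.inv_cdf(1 - alpha)      # one rejection region of size alpha
two_tailed = z.inv_cdf(1 - alpha / 2)  # alpha split across both tails

print(round(one_tailed, 3), round(two_tailed, 2))  # → 1.645 1.96
```

the two tailed critical value is larger, so each rejection region is smaller.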
why is the assumption of homogeneity important
if variances are not equal then the rejection regions will not be comparable across groups; this increases type 1 error
to reduce: large samples and balanced n
can be fixed by transforming
what is a residual
difference between data point and predicted value (ie mean)
why is normality usually not important
the central limit theorem ensures sample means are approximately normal, therefore in a large enough sample it is not necessary
can be fixed post sampling by transforming. normality is only important in really skewed/non-normal data.
what is anova
analysis of variance. compares variation between groups (2 or more) with variation within those groups
why not use a t tests on more than 2 groups
increases the probability of type 1 error (for k independent tests the familywise error rate rises to about 1 - (1 - 0.05)^k, ie well above 0.05). corrections can be used but these increase type 2 error and reduce power
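a minimal sketch of why repeated t tests inflate type 1 error, assuming the tests are independent:

```python
alpha = 0.05
for k in (1, 3, 10):
    # P(at least one false rejection across k independent tests at alpha = 0.05)
    familywise = 1 - (1 - alpha) ** k
    print(k, round(familywise, 3))
# → 1 0.05
# → 3 0.143
# → 10 0.401
```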
what is a factor
ie treatment/group.
you have factors and levels within factor
what is the linear model in descriptive terms
mean + effect of factor + noise
what is the null testing in an anova
that there is no effect ie the levels of a factor dont differ. therefore the MS ratio = 1
what is the alt testing in an anova
that there is an effect, ie the levels of a factor differ. the MS ratio is >1
are covariates of an anova categorical or continuous? why is this unique?
categorical, ie factors. it is still a linear model
when conducting a test in R for homogeneity, how do you read the output ?
the null is that there is no difference (ie the variances are homogeneous, which is what we want). therefore if NOT significant, the variances are not different and therefore homogeneous
what is a one way anova?
1 factor with multiple levels (comparing between levels)
what is a 2 way anova
2 factors with multiple levels (comparing between factors and between levels)
when conducting a t test or anova and you get a very small p value what does this mean?
this does not measure the magnitude of the effect, just that it is indeed significant. to assess the magnitude, look at the data not the p value
what is a post hoc test?
tests conducted after getting a significant p value in an anova (with more than 2 levels) to determine which levels differ
how to read output of SNK post hoc test in R
look at the ranks given to each level to know which ones are being compared. then look at the comparisons (ie 2-1) and check for stars to see which are significant. to see which is bigger, look at the rank means.
what is a correlation
testing for a relationship/association between 2 random variables
what is a regression
testing whether a response variable is driven by an explanatory variable
ie prediction
what is the difference between correlation and regression
a correlation must be conducted first to establish that there is an association/pattern between the variables and to test the strength of that relationship. once this is established you can test for prediction (ie whether one predicts the other)
are covariates of regression/correlation categorical or continuous.
continuous. it is a Linear model
what is needed when sampling for correlation/regression
each unit in the population has a value for each variable
what is the statistic used for a linear correlation
r = Pearson's correlation coefficient. ranges from -1 to +1 (ie perfect negative to perfect positive). 0 = no relationship
what is the r/ cor value
the correlation r value. not the same as r2. measures the strength of the relationship, -1 to +1. 0 = no relationship.
besides random sampling and independence what are the other assumptions for linear correlation
normally distributed variables, and relationships between variables are linear
if a regression is shown between 2 variables, what are your limits with prediction
you can predict new values of Y (response) from new values of X (explanatory), however this is ONLY within your sampled range; you can't extrapolate beyond it.
in the linear model, what is the slope
the relationship. if slope is at 0 there is no relationship.
what is the pearsons r2
tests the precision of prediction: how much of the variation in y is explained by x (closer to 1 = stronger). if below 0.5, a lot of the variance is not explained purely by the relationship with x
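r and r2 can be computed by hand from their definitions; the paired values below are invented for illustration:

```python
import math

# hypothetical paired measurements (x = explanatory, y = response)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # cross products
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
r2 = r ** 2                      # proportion of variation in y explained by x
print(round(r, 4), round(r2, 4))  # → 0.9987 0.9973
```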
what is the null for a regression
the slope: either the slope is 0 (no relationship) or it is in the direction opposite to your alternative.
what are the steps to completeing a regression test
- plot a scatter plot to check the relationship looks linear
- perform the anova and look at the p value
- if significant, look at r squared to test strength
what is a big assumption for linear regression
fixed X.
X is assumed to be measured without error. all measurement has error, so in practice the error must be small relative to the scale of measurement (ie measuring in cm with mm-level error is okay)
what is the difference between correlation and causation
correlation is an association between 2 variables; just because there is a relationship it doesn't mean one causes the other.
causation means one variable is caused by the other.
how do you determine causality
you can only disprove nulls, therefore you can never prove causality
however, to infer causality you need to perform manipulative experiments
what is a scale consideration when creating experiments
you usually work with smaller scales, will the same relationship be found at larger scales?
what is a procedural control
another level of a factor to account for experimental artefacts, ie an effect you created with your experiment that previously wasn't in the system (confounding). you need a treatment, a control and a procedural control
categorical covariates (aka explanatory variables) can be defined as..
factors:
fixed vs random
crossed vs nested
can you have an interaction with a 1 way anova
no. an interaction means that the levels of one factor depend on the levels of another factor. you can only have interactions in a 2 or more way anova
what is a nested design, give an example
levels of factor 1 are nested within levels of factor 2
a common one is location:
ie treatments for factor 1 are separated between sites
what is a crossed design
all levels of factor 1 are present within the other factor
ie all treatments are present within each site
in crossed or nested can you find interactions
only in crossed designs can you find interactions
what is an interaction
levels of one factor are dependent on the levels of another factor
ie whether the levels of factor 1 are significant will depend on which site (level) of factor 2 they are in. this means there is inconsistency through space.
in an anova table how do you know if there is an interaction?
look at the bottom
factor1:factor2 and the pvalue
what do you do if there is an interaction
post hoc tests. eg an SNK test will look at each level of factor 2 and compare the levels of factor 1 within it (eg each site and the levels within that site); if they are significant then look at the means
say you dont have homogenous variances but you perform a test anyway when is this an issue ?
if you don't get a significant effect then there is no issue; however if you get a significant effect then you need to transform and test again, because non-homogeneity increases type 1 error (saying there is an effect when there isn't)
will you need to do post hoc if you do a 1 factor anova with no interaction?
yes, if there are more than 2 levels to know which levels are different.
what happens if there is no significant interaction
look at the main effects, ie differences in your response variable between the levels of each factor
what is a mixed model
a combination of fixed factors and random factors
what is a random factor
fixed: treatment, specific.
random: general, representative
example:
fixed sites: you care about each site
random sites: you test for spatial consistency
why is the difference between fixed and random factors important
it will change how the mean square is estimated.
fixed cares about means between levels, whereas random cares only about variability between sites
same for the null: are you looking at means (fixed) or variances (random)
what is the extent of the inference for fixed vs random factors
you cannot extrapolate/generalise from fixed factors. what you get from your experiment is specific to your factors.
for random factors it is more general and inferences can be applied to other spp/sites etc.
why have mixed models?
avoid confounding (spatial or temporal), avoid non-independence, test for consistency
can you pool random sites together to increase sample sizes?
no. this increases type 1 error
are random factors continuous or categorical
always categorical
are chi tests for categorical or continuous variables?
categorical
what are the two ways chi tests can be used
- goodness of fit: whether the sample matches the expected population
- contingency or association test: a test for independence
how to calculate the degrees of freedom for a chi test
(rows - 1)*(columns -1)
how do you generate expected cell counts for chi tests
using the null hypothesis and the total samples
what is the chi test statistic formula
[sum of] (observed - expected)² / expected
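a worked goodness-of-fit example in Python (the counts are invented; note that df here is categories − 1, since the rows × columns formula applies to contingency tables):

```python
# hypothetical counts: 60 observations over 3 categories, null = equal proportions
observed = [28, 18, 14]
total = sum(observed)
expected = [total / len(observed)] * len(observed)  # expected counts under the null

# chi-square statistic: sum of (observed - expected)^2 / expected
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1   # goodness of fit: df = number of categories - 1
print(round(chi2, 2), df)  # → 5.2 2
```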
what is a key assumption of the chi test goofness of fit
no more than 20% of expected frequencies should be smaller than 5. there can be transformations/pooling if so
how is a test for independence similar and different from a correlation
both test for associations between random variables, however chi is for categorical variables while correlation is for continuous
what is the null for a chi 2 independence test
no association
are there post hoc tests for chi 2
no, only way to know is to plot the data on graphs
what are parametric tests
they make assumptions about the parameters (ie means, variances) of a population's distribution between groups/treatments
eg t test ANOVA linear regression
what are non parametric tests
distribution free: they do not estimate parameters, ie rank based tests
what is a rank based non parametric test
no assumption about the underlying distribution, therefore good when you have very non-normal data, big outliers you want to keep, or the responses are already ranks
assumptions of non parametric tests
independence between samples and homogeneity of variance
how to perform a non parametric rank test
rank all observations from low to high, ignoring groups
randomise the ranks many times to build a probability distribution, then see where the real data falls in that distribution
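the procedure can be sketched as a randomisation test in Python (the group sizes, ranks and rank-sum statistic are illustrative choices, not a specific named test):

```python
import random

random.seed(1)  # reproducible shuffles

# hypothetical pooled ranks already assigned low to high
group_a = [1, 2, 4, 5]   # ranks falling in group A
group_b = [3, 6, 7, 8]   # ranks falling in group B
observed = sum(group_b) - sum(group_a)   # observed rank-sum difference

pooled = group_a + group_b
n_perm = 5000
count_extreme = 0
for _ in range(n_perm):
    random.shuffle(pooled)                     # randomise ranks across groups
    diff = sum(pooled[4:]) - sum(pooled[:4])   # same statistic on shuffled ranks
    if abs(diff) >= abs(observed):
        count_extreme += 1

# proportion of randomised outcomes at least as extreme as the real data
p_value = count_extreme / n_perm
print(p_value)
```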
what is better, a parametric test or a non parametric test
always use a parametric test if you can, it is more powerful. try transforming the data first if it is non-normal. best to use a non parametric test if your data are already ranks
what is the mann whitney wilcoxon test
a non parametric test to compare 1 factor with 2 levels (similar to a t test)
when to use kruskal wallis test
an extension of MWW for a factor with more than 2 levels (similar to an anova)
when to use spearman rank correlation
for non linear but monotonic relationships between continuous variables
when ranking data what do you do with tied observations with the same value
average of the ranks
what is rho (ρ)
the rank coefficient for spearman's correlation. a measure of the strength of the relationship, -1 to +1.
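tied-rank averaging and rho can be sketched together; the helper names below are mine, not from any library:

```python
import math

def ranks_with_ties(values):
    # assign ranks from low to high; tied values share the average of their ranks
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend over the run of tied values
        avg = (i + j) / 2 + 1           # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    # Pearson correlation applied to the ranks = Spearman's rho
    rx, ry = ranks_with_ties(x), ranks_with_ties(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) *
                    sum((b - my) ** 2 for b in ry))
    return num / den

print(ranks_with_ties([3, 1, 4, 1, 5]))  # → [3.0, 1.5, 4.0, 1.5, 5.0]
```

the two tied values (both 1) occupy rank positions 1 and 2, so each gets 1.5.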
to get independent data when sampling, what is the best method
randomly take 1 sample per treatment within each block
or take all samples for a treatment in the same block and use only their mean
you cannot use all samples individually as that is pseudoreplication