Part 1 Flashcards
Observation –> … –> … –> …
question, hypothesis, prediction
Observation: Gammarus occurs almost entirely under stones (rather than in the open stream)
Question: … … Gammarus spend most of its time under stones?
why does
Hypothesis - an … proposed to account for observed facts - there is often more than one hypothesis generated
e.g.
Gammarus occurs under stones because:
- need to shelter from current
- their food gets trapped and accumulates under stones
- they are subject to predation by visually hunting fish and need to remain out of sight
explanation
Predictions - what you would … … … if the hypothesis was true - should be testable and ideally unique to hypothesis it is based on
e.g. shelter hypothesis - a greater proportion of Gammarus should be found in the open in streams with slow flow (or slower flowing areas of a stream)
predation hypothesis - Gammarus should aggregate under stones more in streams where fish are present than where they are not
expect to see
Hypotheses are … or not …, but rarely …
rejected, rejected, proved
- just because one hypothesis is supported doesn’t mean there isn’t another underlying explanation - we can’t think of all possible hypotheses - but with the right evidence we can be sure that a particular hypothesis is false
Cycle of proposing hypotheses and then seeking evidence potentially capable of falsifying them is the scientific process often termed …
falsificationism
A variable is…
any characteristic that can be measured or experimentally controlled on different items or objects
- numeric or non-numeric (e.g. colour)
A set of related variables is known as a … …
data set
Numeric variables can be categorised as belonging to … or … scales
interval, ratio
Categorical variables can be characterised as … or …
nominal, ordinal
Nominal variables…
arise when observations are recorded as categories that have no natural ordering relative to one another, e.g. marital status, sex, colour morph
Ordinal variables…
occur when observations can be assigned some meaningful order, but where the exact ‘distance’ between items is not fixed, or even known, e.g. degree of aggressiveness sorted into the categories: initiates attack (3), aggressive display (2), ignores (1), retreats (0).
Rank orderings are also a type of ordinal data (e.g. place in a race - 1st 2nd 3rd etc.)
- can say something about relationship between categories: larger score = more aggressive response, greater score = slower runner. But cannot say aggressiveness score of 2 is twice as aggressive as a score of 1
Interval scale variables take values on a … numerical scale, but where the scale starts at an … point. e.g. … on a … scale but not on a … scale
consistent, arbitrary, temperature, celsius, Kelvin
- can say difference between 60 and 70 degrees C is the same as that between -20 and -10, but cannot say 60 degrees C is double the temperature of 30 degrees C
Ratio scale variables have a true … and a known consistent mathematical relationship between any points on the measurement scale, e.g. … scale for temperature
zero, kelvin
- on Kelvin scale 60K is double the temperature of 30K
Can meaningfully … or … with interval scales, but cannot meaningfully …, as you can with ratio scales
add, subtract, multiply
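A quick check of the interval vs ratio distinction, as a minimal Python sketch. The helper name `celsius_to_kelvin` is made up for illustration; the temperatures are the ones used in the cards above.

```python
# Interval vs ratio scales: Celsius has an arbitrary zero, Kelvin has a true zero.
def celsius_to_kelvin(c):
    return c + 273.15

# Differences mean the same thing on both scales (interval property).
diff_c = 70 - 60
diff_k = celsius_to_kelvin(70) - celsius_to_kelvin(60)

# Ratios are only meaningful once the scale has a true zero.
naive_ratio = 60 / 30                                        # 2.0, but not physically meaningful
true_ratio = celsius_to_kelvin(60) / celsius_to_kelvin(30)   # about 1.10

print(diff_c, round(diff_k, 2), naive_ratio, round(true_ratio, 2))
```

So 60 degrees C is only about 10% hotter than 30 degrees C in absolute terms, even though the Celsius numbers differ by a factor of two.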
In general … variables are the best suited to statistical analysis
ratio
Accuracy is…
how close a measurement is to the true value
Precision is…
how repeatable a measure is, irrespective of whether it is close to the true value
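To make the accuracy/precision distinction concrete, here is a small Python sketch. The two sets of measurements and the true value are invented for illustration.

```python
import statistics

true_value = 10.0

# Hypothetical repeated measurements of the same object
accurate_imprecise = [8.5, 11.6, 9.2, 10.9, 9.8]     # centred on the truth, but scattered
precise_inaccurate = [12.1, 12.0, 12.2, 12.1, 12.1]  # tightly clustered, but offset

def error_and_spread(measurements):
    """Return (mean error relative to the true value, standard deviation)."""
    mean_error = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return mean_error, spread

err_a, spread_a = error_and_spread(accurate_imprecise)
err_b, spread_b = error_and_spread(precise_inaccurate)
print(err_a, spread_a)
print(err_b, spread_b)
```

The first set has a mean error near zero (accurate) but a large spread (imprecise); the second has a small spread (precise) but a consistent offset from the true value (inaccurate).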
The number of … … we use suggests something about the precision of the result. A value of 12.4 actually measured with the same precision as 12.735 should properly be written …
significant figures, 12.400
Usually the worst form of error is …, a … lack of accuracy
bias, systematic (the data are not just inaccurate but all tend to deviate from the true measurements in the same direction)
E.g.s of bias:
- …-… sampling
- … of biological material
- … by the process of investigation (e.g. adrenaline increased by process of sampling adrenaline in blood)
- … bias
non-random (selective sampling techniques), conditioning, interference, investigator
What does ‘population’ mean in statistics?
Any group of items that share certain attributes or properties
The goal of statistics is to learn something about … by … data collected from them
populations, analysing
Statistical populations are defined by the …
investigator
What is a population parameter?
A numeric quantity that describes a particular aspect of the variables in the populations (describes a feature of the distribution of variables in the population) - e.g. population mean, variance, correlation
The sample chosen must be as … as possible of the whole population
representative
A point estimate is useless on its own, as estimates are always derived from a … … of the wider population. They must be accompanied by a value of ….
limited sample, uncertainty
The chance variation that arises in different estimates using different random samples is known as … …
sampling error (or sampling variation)
The sampling distribution is the distribution we expect a particular estimate to follow
yes
sample size is often denoted as “…”
n
Sampling error is … as sample size is …
reduced, increased
The standard error of an estimate is the … … of its … …
standard deviation, sampling distribution
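The last few cards can be checked by simulation. This is a rough Python sketch under stated assumptions: the population is invented (10,000 values, roughly normal), and `standard_error_of_mean` is a made-up helper that repeatedly draws random samples and takes the standard deviation of their means.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 body lengths (mm), roughly normal
population = [random.gauss(10.0, 2.0) for _ in range(10_000)]

def standard_error_of_mean(n, n_samples=2000):
    """Standard deviation of the sampling distribution of the mean for samples of size n."""
    means = [statistics.mean(random.sample(population, n)) for _ in range(n_samples)]
    return statistics.stdev(means)

se_small = standard_error_of_mean(10)    # small samples: estimates vary a lot
se_large = standard_error_of_mean(100)   # larger samples: estimates vary less
print(se_small, se_large)
```

The standard error for n = 100 comes out clearly smaller than for n = 10, which is the "sampling error is reduced as sample size is increased" card in action.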
R doesn’t like …
percentages (use decimals e.g. 0.4 to represent 40%)
… statistics works by asking “what would have happened if we were to repeat an experiment or collection exercise many times, assuming that the … remains the same each time”
Frequentist, population
then working out how likely a particular result is based on the distribution of data
The two most important ideas in frequentist statistics are …-… and … …
p-values, statistical significance
Sampling with replacement: each artificial sample is called a … …
bootstrapped sample
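A minimal Python sketch of sampling with replacement. The data values and the helper name `bootstrap_means` are assumptions for illustration, and using the spread of the bootstrapped means as a standard error estimate is one common use of the idea rather than something stated on the card.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical sample of 8 measurements
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4]

def bootstrap_means(data, n_boot=5000):
    """Means of n_boot bootstrapped samples, each drawn with replacement and the same size as data."""
    return [statistics.mean(random.choices(data, k=len(data))) for _ in range(n_boot)]

boot = bootstrap_means(sample)
boot_se = statistics.stdev(boot)  # spread of the bootstrapped means
print(boot_se)
```

Each call to `random.choices` builds one bootstrapped sample: some original values appear more than once and others not at all.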
If a probability (p) value is less than the chosen … … we say the result is statistically significant
significance level
The process of assigning random labels is called …
permutation
The p-value is the … of obtaining a test statistic equal to or ‘more extreme’ than the … value, assuming the … hypothesis is true
probability, estimated, null
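Permutation and the p-value definition fit together as in the Python sketch below. The plant sizes are invented, `permutation_p_value` is a made-up helper, and the test statistic (absolute difference in group means, two-sided) is an assumption chosen for illustration.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical sizes for two plant morphs
purple = [12.1, 13.4, 11.8, 14.2, 13.0, 12.7]
green = [10.9, 11.5, 12.0, 10.4, 11.1, 11.8]

observed = statistics.mean(purple) - statistics.mean(green)

def permutation_p_value(a, b, n_perm=10_000):
    """Share of random relabellings whose mean difference is at least as extreme as the one seen."""
    pooled = a + b
    obs = abs(statistics.mean(a) - statistics.mean(b))
    extreme = 0
    for _ in range(n_perm):
        random.shuffle(pooled)  # assign random group labels (null hypothesis: labels don't matter)
        diff = statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        if abs(diff) >= obs:
            extreme += 1
    return extreme / n_perm

p = permutation_p_value(purple, green)
print(observed, p)
```

Shuffling the labels simulates what the difference would look like if the null hypothesis were true; the p-value is just the fraction of shuffles that match or beat the real difference.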
All frequentist statistical tests work by specifying a … … and then evaluating the observed data to see if they … from the … … in a way that is inconsistent with … variation
null hypothesis, deviate, null hypothesis, sampling
H0 is the … hypothesis and H1 is the … (or …) hypothesis
null, test, alternative
The alternative hypothesis is essentially a statement of the effect we are … … …
expecting to see (e.g. purple and green plants differ in their mean size)
… the null hypothesis is not … the alternative hypothesis
rejecting, proving
Large p value means observed result is quite likely if the null hypothesis is …
true (i.e. due to sampling variation)
- cannot reject the null hypothesis (not the same as accepting that the null hypothesis is true)
Do not confuse … significance with … significance
statistical, biological
- a result may be statistically significant but biologically trivial, e.g. a difference in pH between open water (7.1) and beds of submerged vegetation (6.9) may be statistically significant, but it is a very small effect and almost certainly of no importance to the invertebrates.
The significance of a result depends on a combination of three things:
- The size of the true effect in the …
- The … of the data
- The … size
population, variability, sample
We must always evaluate the … of an analysis to determine whether or not we trust it
assumptions
In conceptual terms, the statistical models we use describe data in terms of a … component and a … component
systematic, random
observed data = systematic component + random component
The normal distribution is completely described by its … (a measure of “central …”) and its … … (a measure of dispersion)
mean, tendency, standard deviation
If a variable is normally distributed, then about … of its values will fall inside an interval that is … standard deviations wide
95%, four
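The "four standard deviations wide" figure can be checked with Python's standard library. The mean and standard deviation below are arbitrary illustrative values, and the interval is assumed to be centred on the mean (mean ± 2 SD).

```python
from statistics import NormalDist

# Hypothetical normal variable: mean 50, standard deviation 5
dist = NormalDist(mu=50, sigma=5)

# Interval four standard deviations wide, centred on the mean (mean ± 2 SD)
lower = dist.mean - 2 * dist.stdev
upper = dist.mean + 2 * dist.stdev

share_inside = dist.cdf(upper) - dist.cdf(lower)  # roughly 0.954
print(lower, upper, share_inside)
```

The exact share is a little over 95%; the interval that holds exactly 95% is mean ± 1.96 SD.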
The variable name on the left of the ~ must be the variable whose…
mean we want to compare.
The variable on the right must be the indicator variable that says which group each observation belongs to.
Correlations are statistical measures that quantify an … between two … variables
association, numeric
two-sample t-test - compares a numeric variable between the groups of a categorical variable
A correlation quantifies, via a … …, the degree to which an association tends to a certain pattern
correlation coefficient
If there is no relationship between the variables, the correlation coefficient will be …. The closer to … the value, the weaker the relationship. A perfect correlation will be either … or …, depending on the direction.
zero, zero, +1, -1
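A small Python sketch of the three cases on this card. It assumes Pearson's product-moment coefficient (the cards don't say which coefficient is meant); `pearson_r` and the data are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # perfect positive association
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # perfect negative association
r_none = pearson_r(x, [3, 1, 4, 1, 3])   # no linear association
print(r_pos, r_neg, r_none)
```

Note that a coefficient of zero only rules out a straight-line association; the third data set still has a pattern, just not a linear one.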
A regression (not a correlation) allows us to make…
predictions about the value of one variable from the value of a second variable
- as a line is fitted through the data
A simple linear regression allows us to predict how one variable (… …) responds to another (… …), using a straight-line relationship
response variable, predictor variable
How do we find line of best fit?
Line with lowest residual sum of squares
residuals are the vertical distances between the observed points and the line of best fit
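The "lowest residual sum of squares" idea, sketched in Python. The helper names and data are invented; the slope and intercept use the usual closed-form least-squares formulas.

```python
def fit_line(x, y):
    """Slope and intercept of the least-squares straight line through (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residual_ss(x, y, slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

# Hypothetical predictor (x) and response (y) values
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
best = residual_ss(x, y, slope, intercept)
nudged_slope = residual_ss(x, y, slope + 0.1, intercept)
nudged_intercept = residual_ss(x, y, slope, intercept + 0.1)
print(slope, intercept, best, nudged_slope, nudged_intercept)
```

Nudging either the slope or the intercept away from the fitted values makes the residual sum of squares larger, which is what "line of best fit" means here.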
Response variable on … axis, predictor variable on … axis
y, x
Regression model: … variable on the left of the ~, … variable on the right
response, predictor
Larger F values indicate a stronger relationship between…
x and y
ANOVA:
- Measure total variation using sum of squares of deviations from the … …, … variation (within group variation = sum of squares of deviations from individual group means), and between-group variation (sum of squares of deviation of … from the … …)
- Convert to measures of variability that don’t scale with sample size and number of groups (using … … …) - each of 3 sums of squares has different d.f. value
- total d.f, treatment d.f., error d.f.
Then calculate mean square = sum of squares/ degrees of freedom
grand mean, residual, means, grand mean, degrees of freedom
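The ANOVA bookkeeping above, worked through as a Python sketch. The function name `one_way_anova` and the three groups are assumptions for illustration; the groups are deliberately unequal in size. The between-group sum of squares weights each group mean's squared deviation by the group's sample size, and F is taken as the ratio of the treatment mean square to the error mean square.

```python
def one_way_anova(groups):
    """Sums of squares, degrees of freedom, mean squares and F for a one-way layout."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n

    # Total: deviations of every observation from the grand mean
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Residual (within-group): deviations from each group's own mean
    ss_error = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    # Between-group: deviations of group means from the grand mean
    ss_treat = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    df_total, df_treat, df_error = n - 1, k - 1, n - k
    ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error
    return {
        "ss_total": ss_total, "ss_treat": ss_treat, "ss_error": ss_error,
        "df_total": df_total, "df_treat": df_treat, "df_error": df_error,
        "ms_treat": ms_treat, "ms_error": ms_error,
        "F": ms_treat / ms_error,
    }

# Hypothetical data: three treatments with unequal replication
result = one_way_anova([[4, 5, 6], [7, 8, 9, 8], [1, 2, 3]])
print(result)
```

The total sum of squares splits exactly into the treatment and error parts, and the degrees of freedom split the same way (9 = 2 + 7 here).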
Squaring negative deviations leads to…
a positive number
The important message is that ANOVA works by making just one comparison: the … variation and the … variation
treatment, error
One-way anova does not require … …
equal replication - it will work even where sample sizes differ between treatments
An experimental factor is a controlled variable whose levels are…
set by the experimenter
An ANOVA p-value lower than 0.05 suggests that…
at least one of the treatments is having an effect - global test of significance as it doesn’t tell us anything about which means are different
Find standard error stuff in…
one-way anova section
Left skew - … data
Right skew - … data
square, log
Independence: value of measurement from one object is not…
affected by the values of other objects
Pseudoreplication is an … increase in the … … (and hence d.f.) caused by using …-… data
artificial, sample size, non-independent
To carry out a t-test on paired data we have to:
- Find the mean … of all the pairs
- evaluate whether this is significantly different from ….
This is actually an application of the …-… …-…
difference, zero, one-sample t-test
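The paired t-test steps above, sketched in Python. The before/after values are invented; the t statistic is the mean difference divided by its standard error, with n - 1 degrees of freedom, which is the standard one-sample formula.

```python
import math
import statistics

# Hypothetical paired measurements on the same five individuals
before = [10, 12, 9, 11, 13]
after = [12, 13, 11, 12, 16]

# Step 1: reduce each pair to a single difference
differences = [a - b for a, b in zip(after, before)]

# Step 2: one-sample t-test of the differences against zero
mean_diff = statistics.mean(differences)
se_diff = statistics.stdev(differences) / math.sqrt(len(differences))
t_stat = mean_diff / se_diff
df = len(differences) - 1
print(mean_diff, se_diff, t_stat, df)
```

Once the pairs are collapsed to differences, the original two columns are no longer needed; everything is computed from the single column of differences.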
In paired t-tests there is no need for the original data to be drawn from a … …. It is the differences between pairs that must be
normal distribution
What does RCBD stand for?
Randomised Complete Block Design - each block sees each treatment exactly once
… what you can; … what you cannot
block, randomise
The only thing that distinguishes ANOVA and regressions is the…
type of predictor variable they accommodate (categorical vs numerical)
ANCOVA: residuals generated for: 1. Separate means vs grand mean 2. Common slope vs separate means 3. Separate slopes vs common slope (interaction)
yes
The word “treatment” should be used for … rather than … studies
experimental, observational
chi-squared must be carried out on the actual … not … or …, or the … of data
counts, percentages, proportions, means
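Why counts matter, as a Python sketch. It assumes a goodness-of-fit setting with hypothetical expected proportions; `chi_squared` is a made-up helper that applies the usual sum of (observed - expected)² / expected.

```python
def chi_squared(observed_counts, expected_proportions):
    """Chi-squared goodness-of-fit statistic computed from raw counts."""
    total = sum(observed_counts)
    expected = [p * total for p in expected_proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))

expected_props = [0.25, 0.5, 0.25]  # hypothetical expected proportions

# Same observed percentages (30%, 50%, 20%), different sample sizes
chi_n100 = chi_squared([30, 50, 20], expected_props)    # 100 individuals
chi_n200 = chi_squared([60, 100, 40], expected_props)   # 200 individuals
print(chi_n100, chi_n200)
```

Both data sets have identical percentages, yet the statistic doubles when the sample size doubles. Feeding in percentages or proportions would throw that information away.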
A non-parametric test is just a catch-all term that applies to any test which doesn’t assume the data are…
drawn from a specific distribution
Chi-squared tests are …-…
non-parametric - as they make weak assumptions about the frequency data
non-parametric test calculations are done using the … … of the data
rank order
Paired t-test:
Distribution of … does not need to be normal! Only distribution of … does!
samples, differences
if the differences are not normally distributed - can use the Wilcoxon signed-rank test
Mann-Whitney U null hypothesis: … are the same
medians (looking for differing central tendency)
- significant p-value means medians are likely to be different
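How a rank-based test uses rank order rather than the raw values, as a Python sketch. The helper names and data are invented, and the U formula (rank sum minus n(n + 1)/2) is the standard textbook one rather than something given on the cards. Only the statistic is computed here, not its p-value.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """U statistics for both groups, computed from the ranks of the pooled data."""
    ranks = average_ranks(a + b)
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b

# Hypothetical measurements from two groups
group_a = [1.1, 2.3, 3.0]
group_b = [2.9, 4.2, 5.5, 6.1]
u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)
```

Replacing 6.1 with 600 would leave both U values unchanged, because only the ordering of the observations enters the calculation.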