stats Flashcards
tables
enables the reader to quickly and easily find the actual data values
a disadvantage of tables
do not lend themselves to composing a mental picture of the trends occurring
graphs
show trends or patterns that are easily visualized
line graphs are useful for
demonstrating principles
independent variable
the horizontal axis
dependent variable
the vertical axis
figures
graphs, maps, photographs, and other illustrations
frequency distribution
summarize data
range
interval between the largest and smallest values
what classes are frequency distributions divided into
classes of equal size, with the frequency recorded for each class
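A minimal sketch of the range and equal-size classes described above. The data values and the choice of three classes are made up for illustration, and the text bars are only a rough stand-in for a histogram.

```python
# Hypothetical observations (made up for illustration).
values = [2, 3, 5, 7, 8, 8, 9, 12, 13, 15, 18, 20]


def frequency_distribution(data, n_classes):
    """Split data into n_classes equal-width classes and count each one."""
    lowest, highest = min(data), max(data)
    data_range = highest - lowest      # interval between largest and smallest values
    width = data_range / n_classes     # every class has the same size
    counts = [0] * n_classes
    for x in data:
        # The largest value would fall one past the end, so clamp it into the last class.
        index = min(int((x - lowest) / width), n_classes - 1)
        counts[index] += 1
    classes = [(lowest + i * width, lowest + (i + 1) * width) for i in range(n_classes)]
    return data_range, classes, counts


data_range, classes, counts = frequency_distribution(values, 3)
for (low, high), count in zip(classes, counts):
    # One '#' per observation in the class: a crude text histogram.
    print(f"{low:5.1f} to {high:5.1f}: {'#' * count}")
```

This sketch assumes the data are not all identical, since a range of zero would give a class width of zero.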
histograms
useful diagrams to plot frequency distributions
special type of bar graph in which each vertical bar represents an interval of values
vertical or horizontal lines are allowed
partial horizontal
regression
minimizes the subjectivity in determining relationships between variables
plot B as a function of A
B will be on the y
A will be on the x
exception for A going on the y
depth, height, altitude
bar graphs
used when you wish to summarize quantities in different categories
stacked bar graph
sections of each bar are stacked on top of each other
mean
the average of all observations
Xi
value of the ith observation
N
total number of observations
variance
measure of variability or spread in the data
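A short sketch of the mean and variance using the Xi and N notation from the cards above. The observations are hypothetical. The cards do not say whether the variance divides by N or by N - 1, so both versions are shown and the choice is left open.

```python
import statistics

observations = [4.0, 7.0, 6.0, 3.0, 5.0]   # hypothetical Xi values
N = len(observations)                      # total number of observations

mean = sum(observations) / N               # the average of all observations

# Squared distance of each Xi from the mean.
squared_deviations = [(x_i - mean) ** 2 for x_i in observations]

# Assumption: a sample variance divides by N - 1, a population variance by N.
sample_variance = sum(squared_deviations) / (N - 1)
population_variance = sum(squared_deviations) / N

# The standard library gives the same answers.
print(mean, statistics.mean(observations))
print(sample_variance, statistics.variance(observations))
print(population_variance, statistics.pvariance(observations))
```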
formulation of hypothesis steps
- form the null and alternative hypotheses
- decide which statistical test to use
- calculate the test statistic and degrees of freedom
- find the critical value from a statistics table
- reject or fail to reject the null
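The steps above can be sketched for one case, a difference between two means. This is an illustration under stated assumptions, not the deck's own procedure: the two groups are made-up numbers, the test is a pooled-variance (Student's) t test, and the critical value 2.306 is the usual two-tailed table entry for 8 degrees of freedom at the 0.05 level.

```python
import math
import statistics

# Hypothetical measurements for two independent groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.5, 4.0, 4.8, 4.6]

# Step 1: null = the two population means are equal; alternative = they differ.
# Step 2: two groups, continuous response, so a t test is chosen here.

# Step 3: test statistic and degrees of freedom (pooled variance assumed).
n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
df = n_a + n_b - 2
pooled_variance = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
t_calc = (mean_a - mean_b) / standard_error

# Step 4: critical value copied from a t table (two-tailed, alpha = 0.05, df = 8).
T_CRIT = 2.306

# Step 5: reject the null only if |t_calc| exceeds the critical value.
reject_null = abs(t_calc) > T_CRIT
print(f"t = {t_calc:.3f}, df = {df}, reject null: {reject_null}")
```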
differences between two means
t test
difference between several means
ANOVA
test of correlation
correlation
test of association
chi square
test of independence
chi square
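As a rough sketch of a chi square test of independence, the block below works through a 2 x 2 table by hand. The counts are invented, no continuity correction is applied, and 3.841 is the usual table value for 1 degree of freedom at the 0.05 level.

```python
# Hypothetical counts: rows are two categories of one variable,
# columns are two categories of the other.
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Count expected in this cell if the two variables were independent.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

CHI_CRIT = 3.841  # table value for df = 1, alpha = 0.05
reject_null = chi_square > CHI_CRIT
print(f"chi square = {chi_square:.3f}, df = {df}, reject null: {reject_null}")
```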
discrete
cannot be represented by fractions; values fall into separate classes, categories, or whole-number counts
classes or categories, for example gender vs. major
continuous
can be represented by numbers and/or fractions
both independent and dependent variable are discrete
chi square
independent is discrete, dependent is continuous
t test
both are continuous
correlation
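The three cards above amount to a small lookup table. The function below is a hypothetical helper written for this deck, not part of any library, and it covers only the combinations the cards list.

```python
def choose_test(independent, dependent):
    """Return the test the cards suggest for the given variable types.

    Each argument is either "discrete" or "continuous".
    """
    choices = {
        ("discrete", "discrete"): "chi square",
        ("discrete", "continuous"): "t test",
        ("continuous", "continuous"): "correlation",
    }
    try:
        return choices[(independent, dependent)]
    except KeyError:
        # The deck gives no card for other combinations, so none is guessed here.
        raise ValueError(f"no card covers {independent} -> {dependent}")


print(choose_test("discrete", "continuous"))
```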
Test statistics usually are very large
when your null hypothesis is very wrong, and usually very small when your null hypothesis is correct.
p-value
the probability that your results or more extreme results than yours could have occurred due to chance even when the null hypothesis is actually true
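One way to see "your results or more extreme results" is an exact calculation for coin flips. The scenario is hypothetical: the null hypothesis is a fair coin, the result is 9 heads in 10 flips, and only the upper tail is counted (a one-tailed p-value).

```python
from math import comb


def upper_tail_p_value(n_trials, n_successes, p_success=0.5):
    """Probability of n_successes or more in n_trials if the null is true."""
    return sum(
        comb(n_trials, k) * p_success**k * (1 - p_success) ** (n_trials - k)
        for k in range(n_successes, n_trials + 1)
    )


# 9 heads or 10 heads out of 10 flips of a fair coin.
p_value = upper_tail_p_value(10, 9)
print(f"p = {p_value:.4f}")
```

Here the p-value is 11/1024, because only 10 + 1 of the 1024 equally likely flip sequences contain nine or more heads.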
tcalc > tcrit
reject the null
When you say that you reject the null hypothesis at p < 0.05, you are essentially saying
“There is a less than 5% chance that I could have gotten these results if the null hypothesis were true, so I would rather conclude that the null is not true than accept such an unlikely outcome.”
Paired t-test
the two samples are paired or dependent because they contain the same subjects. Conversely, an independent samples t test contains different subjects in the two samples
t test
used when the data of two samples are statistically independent
inductive reasoning
explanations/hypotheses/theories
deductive reasoning
predictions
high values tend to go
down
low values tend to
go up
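These two cards describe what the deck later calls regression to the mean. A small simulation can make it visible. Everything here is an assumption for illustration: each score is a stable ability plus independent luck, both drawn from normal distributions, with a fixed seed so the run is repeatable.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed so the sketch is repeatable
n_students = 5000

# Each score = stable ability + independent luck on the day (both hypothetical).
ability = [rng.gauss(50, 10) for _ in range(n_students)]
test_1 = [a + rng.gauss(0, 10) for a in ability]
test_2 = [a + rng.gauss(0, 10) for a in ability]

# Rank students using the first test only, then take the extremes.
order = sorted(range(n_students), key=lambda i: test_1[i])
tenth = n_students // 10
bottom, top = order[:tenth], order[-tenth:]

top_first = statistics.mean([test_1[i] for i in top])
top_second = statistics.mean([test_2[i] for i in top])
bottom_first = statistics.mean([test_1[i] for i in bottom])
bottom_second = statistics.mean([test_2[i] for i in bottom])

print(f"top 10%:    first {top_first:.1f}, second {top_second:.1f}")
print(f"bottom 10%: first {bottom_first:.1f}, second {bottom_second:.1f}")
```

The top group scores lower on the second test and the bottom group scores higher, even though nothing about the students changed.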
flaws in statistical thinking
stats and probability are not intuitive
we tend to jump to conclusions
we tend to be overconfident
we see patterns in random data
we don’t realize that coincidences are common
we find it hard to combine probabilities (e.g. the Monty Hall problem)
we are fooled by regression to the mean
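The Monty Hall problem mentioned in the list is a good check on intuition. The sketch below enumerates the standard version of the puzzle, assuming three doors, one car, and a host who always opens a goat door that the player did not pick.

```python
from fractions import Fraction
from itertools import product

doors = (0, 1, 2)
cases = 0
stay_wins = 0
switch_wins = 0

# Every combination of car position and first pick is equally likely.
for car, pick in product(doors, doors):
    cases += 1
    if pick == car:
        # First pick was right: staying wins, switching loses.
        stay_wins += 1
    else:
        # First pick was a goat: the host must open the other goat door,
        # so the remaining closed door hides the car and switching wins.
        switch_wins += 1

p_stay = Fraction(stay_wins, cases)
p_switch = Fraction(switch_wins, cases)
print(f"stay: {p_stay}, switch: {p_switch}")
```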
replication
is the repetition of an experimental condition so that the variability associated with the phenomenon can be estimated.
statistical sample
The group of replicated measurements that is used to help estimate natural variability
population
the whole group of individuals from which a subset (a statistical sample) is selected
random sample
all individuals in a population should have an equal probability of being selected so that the proportions sampled can help us estimate the probability that similar samples would occur in the future
sampling bias
occurs when something about the sampling process causes a particular type of individual in the population to be more likely to be sampled
Sampling error
It is the amount by which samples will differ due to chance
Systematic sampling
techniques attempt to overcome this problem by “using information about the population” to choose a more representative sample
Sample size determination
is the act of choosing the number of observations or replicates to include in a statistical sample
Power
the power of a statistical test is the probability that the test will detect: 1) a pattern in the population if the pattern truly exists or 2) the effect of a specific condition on the population if the effect truly exists
effect size
the strength of a pattern or effect; a strong pattern means a large effect size
Pseudoreplication
includes experimental designs in which treatments are not replicated (though samples may be) or replicates are not statistically independent resulting in an inflation of the reported number of samples or replicates.
bivariate
dealing with two variables, usually an independent variable and a dependent variable
Multivariate statistics
will deal with more than two variables (e.g. three or more dependent variables, or any combination of multiple independent and dependent variables)
community ecology
the study of populations of two or more different species occupying the same geographical area at a particular time
Classification
the placement of species and/or sampled locations into groups
used to distinguish different kinds of communities from each other if there appear to be clear distinctions
ordination
the arrangement or ‘ordering’ of species and/or sample locations along environmental gradients
can be more useful when there are not clearly distinct kinds of communities because they grade into one another with fuzzy boundaries
community data matrix
has taxa (usually species) as rows and samples as columns (Table 1) or vice versa
Principal Components Analysis
takes your cloud of data points, and rotates it such that the maximum variability is visible. Another way of saying this is that it identifies your most important differences.
Detrended Correspondence Analysis
DCA only represents the patterns of dependent variables (species abundance) but does not directly compare the species abundances to the possible independent variables that cause them
We could use the DCA to make hypotheses about the causes of the species distributions
triplot
It is called a triplot because it simultaneously displays three pieces of information: samples as points, species as points, and environmental variables as lines.
Nonparametric statistics
include several different statistical methods in which the data are not assumed to come from prescribed models that are ‘custom fit’ to the data by a small number of parameters
parametric statistics
use general model descriptions associated with one or more numerical parameters, which can be adjusted to allow the models to be applied to a variety of data sets, e.g. the normal distribution model, the Poisson distribution model, and the binomial distribution model
permutations
These permutations keep the actual data intact, but randomly associate the environmental data with the species data
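A simplified sketch of the permutation idea: the data values stay intact, but the environmental values are randomly re-associated with the species values, and the observed relationship is compared with the shuffled ones. This uses a single environmental variable and a plain correlation rather than a full ordination, and all numbers and variable names are hypothetical.

```python
import math
import random

soil_moisture = [10, 15, 20, 25, 30, 35, 40, 45]   # hypothetical environmental data
abundance = [2, 3, 5, 4, 7, 8, 8, 11]              # hypothetical species counts


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from the definition."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


observed_r = pearson_r(soil_moisture, abundance)

rng = random.Random(0)       # fixed seed so the sketch is repeatable
n_permutations = 999
at_least_as_extreme = 0
shuffled = list(soil_moisture)
for _ in range(n_permutations):
    rng.shuffle(shuffled)    # break the pairing, keep the values themselves
    if abs(pearson_r(shuffled, abundance)) >= abs(observed_r):
        at_least_as_extreme += 1

# Count the observed arrangement as one of the possible arrangements.
p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
print(f"observed r = {observed_r:.3f}, permutation p = {p_value:.3f}")
```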
Direct Gradient Analysis (DGA)
DGA is best coupled with an ordination (multivariate) technique like CCA.
canonical correspondence analysis
If we directly include environmental variables as independent variables we are changing our DCA into a CCA
centroid
basically the center of a cloud of points
PCA 1
a ‘best fit’ line for the cloud of points
eigenvalue
they are ranked from the highest to the lowest
These are related to the amount of variation explained by each axis
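For a cloud of points in two dimensions, the centroid, the first PCA axis, and the eigenvalues can all be worked out by hand. The sketch below does this for five made-up points using the covariance matrix. It is an illustration of the idea only, and the formula used for the axis direction assumes the two variables have a nonzero covariance.

```python
import math

# Hypothetical cloud of (x, y) points.
points = [(2.0, 1.0), (3.0, 2.0), (4.0, 3.5), (5.0, 4.0), (6.0, 5.5)]
n = len(points)

# Centroid: the center of the cloud.
centroid = (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)
centered = [(x - centroid[0], y - centroid[1]) for x, y in points]

# Entries of the 2 x 2 covariance matrix.
s_xx = sum(dx * dx for dx, _ in centered) / (n - 1)
s_yy = sum(dy * dy for _, dy in centered) / (n - 1)
s_xy = sum(dx * dy for dx, dy in centered) / (n - 1)

# Eigenvalues of a symmetric 2 x 2 matrix, ranked from highest to lowest.
trace = s_xx + s_yy
det = s_xx * s_yy - s_xy**2
root = math.sqrt(trace**2 - 4 * det)
eigenvalues = [(trace + root) / 2, (trace - root) / 2]

# Share of the total variation explained by each axis.
explained = [value / trace for value in eigenvalues]

# Direction of PCA 1: the eigenvector for the largest eigenvalue.
vx, vy = s_xy, eigenvalues[0] - s_xx
length = math.hypot(vx, vy)
pc1_direction = (vx / length, vy / length)

print(f"centroid = {centroid}")
print(f"eigenvalues = {eigenvalues}")
print(f"variation explained = {explained}")
print(f"PCA 1 direction = {pc1_direction}")
```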
95% confidence interval
is a range of values that has a 95% chance of containing the true single value that you are trying to estimate.
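A minimal sketch of a 95% confidence interval for a mean, assuming a small sample from a roughly normal population. The sample values are hypothetical, and 2.776 is the usual two-tailed t table value for 4 degrees of freedom.

```python
import math
import statistics

sample = [4.0, 7.0, 6.0, 3.0, 5.0]   # hypothetical measurements
n = len(sample)

sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

T_CRIT = 2.776   # t table value for df = n - 1 = 4, 95% two-tailed

margin = T_CRIT * standard_error
lower, upper = sample_mean - margin, sample_mean + margin
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```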
Poisson distribution
The random distribution of numbers of sightings
binomial distribution
two parameters
used when you are distinguishing between two (and only two) possible outcomes
intervals overlap
not different
The standard statistical technique to detect a nonrandom relationship between two continuous variables
correlation
species richness is a
discrete variable