Statistics Flashcards
Sample of convenience
A collection of individuals that happen to be available at the time
Sampling error
The chance difference between an estimate and the population parameter being estimated
Bias
A systematic discrepancy (tending in a certain direction) between an estimate and the true population characteristic
Error
A random difference (not tending in any direction) between an estimate and the true population characteristic
(Larger, normal, small) samples on average will have smaller sampling error
Larger
Increase the number of individuals in your sample
Decrease sampling error
Ensure random sampling
Reduce sampling bias
Variance
Average squared deviation from the mean
Coefficient of variation
Expresses how big the standard deviation is in relation to the mean
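A minimal Python sketch of the variance and coefficient of variation cards. The values in `lengths` are made up for illustration, and the sketch assumes the sample variance (n - 1 denominator), which differs slightly from a literal average of the squared deviations.

```python
import statistics

# Hypothetical measurements, for illustration only.
lengths = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(lengths)
variance = statistics.variance(lengths)  # sample variance, n - 1 denominator
sd = statistics.stdev(lengths)           # square root of the sample variance

# Coefficient of variation: standard deviation as a percentage of the mean.
cv = 100 * sd / mean

print(f"mean={mean:.2f} variance={variance:.3f} sd={sd:.3f} CV={cv:.1f}%")
```

Because the CV is unitless, it can be used to compare variability between measurements on different scales.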
Variation in sample means decreases with ____
Increased sample size
Standard error
The standard deviation of a sampling distribution (predicts the sampling error of the estimate)
The standard error of an estimate of a mean is the standard deviation of the distribution of sample means.
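The two ideas above (the spread of sample means shrinks with sample size, and the standard error is the standard deviation of those means) can be sketched with a small simulation. This assumes a hypothetical normal population with mean 10 and standard deviation 2; the function names are illustrative only.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable


def standard_error_of_mean(sample):
    """Estimate the standard error of the mean from one sample as s / sqrt(n)."""
    return statistics.stdev(sample) / len(sample) ** 0.5


def sd_of_sample_means(n, replicates=2000, mu=10.0, sigma=2.0):
    """Standard deviation of many sample means drawn from Normal(mu, sigma)."""
    means = [
        statistics.mean([random.gauss(mu, sigma) for _ in range(n)])
        for _ in range(replicates)
    ]
    return statistics.stdev(means)


sd_means_n25 = sd_of_sample_means(25)    # theory: 2 / sqrt(25) = 0.4
sd_means_n100 = sd_of_sample_means(100)  # theory: 2 / sqrt(100) = 0.2
print(sd_means_n25, sd_means_n100)
```

Quadrupling the sample size roughly halves the standard error, since it scales with 1 / sqrt(n).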
95% Confidence Interval
Provides a plausible range for the parameter (about 95% of all 95% confidence intervals calculated from random samples will include the true population parameter, e.g. the population mean)
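The coverage statement can be checked with a rough simulation. This sketch assumes a hypothetical normal population (mean 50, standard deviation 8), samples of 100, and the normal approximation (mean ± 1.96 SE) in place of the t-distribution; none of these choices come from the cards.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

Z_95 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96


def confidence_interval(sample, z=Z_95):
    """Approximate 95% confidence interval for the mean: mean +/- z * SE."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - z * se, mean + z * se


TRUE_MEAN = 50.0   # hypothetical population mean
replicates = 2000
hits = 0
for _ in range(replicates):
    sample = [random.gauss(TRUE_MEAN, 8.0) for _ in range(100)]
    low, high = confidence_interval(sample)
    if low <= TRUE_MEAN <= high:
        hits += 1

coverage = hits / replicates  # fraction of intervals that captured the true mean
print(f"coverage = {coverage:.3f}")
```

The 95% describes the long-run behaviour of the procedure; any single interval either contains the parameter or does not.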
Pseudoreplication
Error that occurs when individual measurements are not independent but are treated as though they are
Test statistic
A number calculated to represent the match between a set of data and the null hypothesis
P-value
The probability of obtaining data as extreme as those observed, or more extreme, if the null hypothesis were true
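As an illustration, a P-value can be computed exactly for a binomial test. The data below (14 successes out of 18 trials, null proportion 0.5) are hypothetical, and doubling the smaller tail is one common convention for a two-sided test, assumed here.

```python
from math import comb


def binomial_p_value(successes, n, p_null=0.5):
    """Two-sided exact binomial P-value (twice the smaller tail, capped at 1)."""

    def pmf(k):
        return comb(n, k) * p_null ** k * (1 - p_null) ** (n - k)

    upper_tail = sum(pmf(k) for k in range(successes, n + 1))  # Pr[X >= observed]
    lower_tail = sum(pmf(k) for k in range(0, successes + 1))  # Pr[X <= observed]
    return min(1.0, 2 * min(upper_tail, lower_tail))


# Hypothetical data: 14 successes in 18 trials, null hypothesis p = 0.5.
p_value = binomial_p_value(14, 18)
print(f"P = {p_value:.4f}")
```

Here the test statistic is the observed number of successes, and the tails are summed over its null distribution.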
Type I Error
- Rejecting a true null hypothesis
- Pr[Type I Error] = α
- Does not depend on sample size
Type II Error
- Not rejecting a false null hypothesis
- Pr[Type II Error] = β
- β decreases as sample size increases
- The smaller β is, the more power a test has
Power
The probability that a test rejects a false null hypothesis (Power = 1 - β)
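A sketch of how power grows with sample size, under strong simplifying assumptions that are not in the cards: a two-sided one-sample z-test with known population standard deviation. The function name and the numbers are illustrative only.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test.

    effect: true difference between the population mean and the null value
    sigma:  known population standard deviation (an assumption of the z-test)
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5 / sigma  # true effect in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return (1 - z.cdf(critical - shift)) + z.cdf(-critical - shift)


power_small = z_test_power(effect=0.5, sigma=1.0, n=10)
power_large = z_test_power(effect=0.5, sigma=1.0, n=40)
type_ii_large = 1 - power_large  # Type II error rate for the larger sample
print(power_small, power_large, type_ii_large)
```

When the true effect is zero the function returns alpha, which matches the card stating that the Type I error rate does not depend on sample size.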
Poisson distribution
Describes the probability that a certain number of events occur in a block of time or space, when those events happen independently of each other and occur with equal probability at every point in space/time
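For reference, the Poisson probability of observing exactly k events when the mean number of events per block is mu is e^(-mu) * mu^k / k!. A minimal sketch, with a hypothetical mean of 2 events per block:

```python
import math


def poisson_pmf(k, mu):
    """Probability of exactly k events when the mean count per block is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


mu = 2.0  # hypothetical mean number of events per block of time or space
probabilities = [poisson_pmf(k, mu) for k in range(40)]
total = sum(probabilities)  # should be very close to 1

for k in range(5):
    print(k, round(probabilities[k], 4))
```

Comparing observed counts with these expected probabilities is one way to check whether events look clumped or more evenly spaced than random.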
Central limit theorem
The sum or mean of a large number of measurements randomly sampled from any population is approximately normally distributed
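The theorem can be illustrated by averaging draws from a strongly skewed population. This sketch assumes an exponential population with mean 1 and samples of size 50, both chosen only for illustration; if the sample means are close to normal, about 95% of them should fall within 1.96 standard errors of the population mean.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

n = 50             # measurements per sample
replicates = 2000  # number of samples drawn

# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed.
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(replicates)
]

expected_se = 1.0 / n ** 0.5  # population SD divided by sqrt(n)
fraction_within = sum(
    abs(m - 1.0) <= 1.96 * expected_se for m in sample_means
) / replicates

print(statistics.mean(sample_means), fraction_within)
```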
Null distribution for a test statistic
The probability distribution of the test statistic's possible outcomes when a random sample is taken from a hypothetical population in which the null hypothesis is true
Paired design
- Data from two groups are paired
- Each member of a pair shares much in common with the other, except for the tested categorical variable
- Accounts for extraneous variation
- Analysis is based on the mean of the within-pair differences
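A paired analysis reduces each pair to a single difference and then works with the mean of those differences. The sketch below computes a paired t statistic on made-up before/after measurements; the data and variable names are hypothetical.

```python
import statistics

# Hypothetical paired measurements on the same five individuals.
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [12.0, 13.0, 11.0, 12.0, 16.0]

# One difference per pair; extraneous variation among individuals cancels out.
differences = [a - b for a, b in zip(after, before)]

mean_diff = statistics.mean(differences)
se_diff = statistics.stdev(differences) / len(differences) ** 0.5

# Paired t statistic for the null hypothesis that the mean difference is zero.
t_statistic = mean_diff / se_diff
degrees_of_freedom = len(differences) - 1

print(f"mean difference={mean_diff:.2f} t={t_statistic:.3f} df={degrees_of_freedom}")
```

The sample size is the number of pairs, not the number of individual measurements.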
Transformations require:
1) Same transformation applied to each individual/group
2) One-to-one correspondence with original value (no ambiguity)
3) Monotonic (order stays the same)
Goals of experiments
1) Eliminate bias
2) Reduce sampling error
Features that reduce bias
1) Controls
2) Random assignment of treatments (averages out the effects of confounding variables)
3) Blinding/anonymizing
How to reduce sampling error
Increase signal to noise ratio
Lower “noise” by increasing sample size and reducing variation within groups (all other factors as equal as possible)
Design features to reduce sampling error
1) Replication: carry out study on multiple independent objects
2) Balance: nearly equal sample sizes in each treatment
3) Blocking: Grouping experimental units and applying different treatments within each group (accounts for extraneous variables)
4) Extreme treatments: stronger treatments produce larger effects that are easier to detect
Matching
Pair individuals in treatment group with control individuals with similar values for confounding variables (reduces bias by limiting confounding and reduces sampling error analogous to blocking)
r^2
Describes the proportion of variation in one variable that can be predicted from the other variable (the proportion of variance in Y that can be predicted by the regression line)
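The two descriptions on this card (squared correlation, and proportion of variance in Y predicted by the regression line) give the same number for a least-squares line. A small sketch on made-up data, with an illustrative function name:

```python
def r_squared(x, y):
    """Return r^2 computed two ways: as the squared correlation coefficient and
    as 1 - SS_residual / SS_total for the least-squares regression line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)

    r = sxy / (sxx * syy) ** 0.5

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

    return r ** 2, 1 - ss_residual / syy


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

from_correlation, from_regression = r_squared(x, y)
print(from_correlation, from_regression)
```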
Attenuation
The estimated correlation will be closer to zero (weaker) if X or Y is measured with error
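Attenuation can be seen in a quick simulation. The sketch assumes a hypothetical linear relationship between X and Y and adds independent random measurement error to X; all parameter values are illustrative.

```python
import random

random.seed(4)  # fixed seed so the sketch is repeatable


def correlation(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


n = 2000
true_x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [value + random.gauss(0.0, 0.5) for value in true_x]

# The same X values, now recorded with independent measurement error.
measured_x = [value + random.gauss(0.0, 1.0) for value in true_x]

r_true = correlation(true_x, y)          # theory: about 0.89
r_measured = correlation(measured_x, y)  # theory: about 0.63
print(r_true, r_measured)
```

Measurement error in Y has the same weakening effect on the estimated correlation.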