Module 1 & 2 Flashcards
Introduction to Statistics and Experimental Design
What is a sample?
A subset of individuals from a population of interest
What is a population?
Set of all subjects relevant to the scientific hypothesis under examination
What is a statistic?
A value calculated from a sample, used to estimate population parameters
What is a parameter?
A true measurement that describes a population
What is a statistical hypothesis?
A claim regarding a population parameter
What is sampling error?
The difference between an estimate and the true population parameter that arises purely by chance
What are the characteristics of a good sample?
1) It is a random sample
2) It is precise
3) It is unbiased
What does precision refer to?
The spread of values for an estimate due to sampling error
What is the relationship between sample size and precision?
Higher sample size = higher precision and lower sampling error
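The link between sample size and precision can be checked with a small simulation. This is only a sketch: the population (normal, mean 50, SD 10), the seeds, and the helper name spread_of_sample_means are all made up for illustration.

```python
import random
import statistics

def spread_of_sample_means(population, n, draws=2000, seed=1):
    """Standard deviation of `draws` sample means, each from a random sample of size n."""
    rng = random.Random(seed)
    # rng.choices samples with replacement, which is fine for a large population.
    means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]
    return statistics.stdev(means)

# Hypothetical population of 10,000 values (assumed normal, mean 50, SD 10).
pop_rng = random.Random(0)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

small = spread_of_sample_means(population, n=5)
large = spread_of_sample_means(population, n=100)
print(f"n=5:   spread of sample means = {small:.2f}")
print(f"n=100: spread of sample means = {large:.2f}")
```

The estimates from the larger samples should cluster much more tightly around the population mean, which is what higher precision means here.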
What does bias refer to?
Systematic discrepancy between estimates from multiple samples and the true population parameter
What are 2 types of non-random samples?
1) Sample of convenience
2) Volunteer sample
What are 2 types of studies?
1) Experimental, where treatments are assigned
2) Observational, where treatments are not assigned by the researcher
What can be a problem for observational studies?
Confounding variables - variables that influence the outcome but are not accounted for
What are the 2 types of variables?
1) Qualitative/Categorical (membership)
2) Quantitative (magnitude)
What are the measurement scales for qualitative data?
Nominal and ordinal
What are the 2 types of quantitative data?
Continuous and discrete
What are the measurement scales for quantitative data?
Ratio and interval
What is descriptive statistics?
Quantities that summarize and describe the sample data
What are the components of descriptive statistics?
1) Shape
2) Spread
3) Location
4) Frequency distribution
What is frequency distribution?
Describes the number of times a particular value of a variable occurs in a sample (can be absolute or relative)
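A minimal sketch of absolute and relative frequencies, using an invented sample (eggs per nest) purely for illustration:

```python
from collections import Counter

# Hypothetical sample: number of eggs counted in each of 10 nests.
eggs = [2, 3, 3, 4, 2, 3, 5, 3, 4, 2]

absolute = Counter(eggs)  # absolute frequency: how many times each value occurs
n = len(eggs)
relative = {value: count / n for value, count in absolute.items()}  # proportions

for value in sorted(absolute):
    print(value, absolute[value], relative[value])
```

The relative frequencies sum to 1, so they can be compared across samples of different sizes.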
What are 2 ways we can depict frequency distributions?
Bar graphs or histograms
When is it best to use a histogram?
When we are looking at the frequency distribution of a numerical data set
When is it best to use a bar graph?
When we are looking at categorical data sets
What are the 3 types of distribution?
1) Frequency distribution
2) Probability distribution
3) Sampling distribution
What is a probability distribution?
A distribution that depicts the probabilities associated with different values of a specific variable
What is a sampling distribution?
A probability distribution of values of an estimate we could obtain from sampling a population
What are the 3 measurements of location?
Mean, median, mode
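As a quick illustration with made-up data, Python's standard statistics module computes all three. Note how the single large value pulls the mean above the median:

```python
import statistics

# Hypothetical data with one large value (10).
data = [2, 3, 3, 4, 10]

mean = statistics.mean(data)      # 4.4 (sum 22 divided by 5)
median = statistics.median(data)  # 3   (middle value after sorting)
mode = statistics.mode(data)      # 3   (most frequent value)

print(mean, median, mode)
```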
How can we describe the spread of data?
1) Skew (strictly a measure of shape rather than spread)
2) Range (max-min)
3) Variance
4) Standard deviation
What is standard deviation (SD)?
It is a measure of spread - the square root of the variance, describing the typical distance of observed values from the mean
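A short worked example, assuming the usual sample formulas (dividing by n - 1) and invented data:

```python
import math
import statistics

data = [4, 6, 8, 10, 12]  # hypothetical observations
n = len(data)
mean = sum(data) / n  # 8.0

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in data) / (n - 1)  # 40 / 4 = 10.0

# Standard deviation: square root of the variance, in the same units as the data.
sd = math.sqrt(variance)

print(variance, round(sd, 3))
```

statistics.variance(data) and statistics.stdev(data) give the same results without the manual arithmetic.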
What is estimation?
Process of inferring a population parameter using sample data
What is uncertainty and how do we quantify uncertainty?
Error of an estimate
1) Standard error (of the mean)
2) Confidence interval
What is standard error (SE)?
Describes the standard deviation of the sampling distribution of an estimate (e.g., the mean)
How do we obtain a smaller standard error?
Bigger sample size > narrower sampling distribution > more precise estimate
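For the mean, the standard error is usually estimated as the sample SD divided by the square root of n. A minimal sketch (the function name and data are illustrative, not from the course material):

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

sample = [4, 6, 8, 10, 12]  # hypothetical observations
se = standard_error(sample)
print(round(se, 3))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error when the SD stays the same.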
What is confidence interval (CI)?
Range of values, calculated from the data, that is likely to contain the population parameter
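One common way to build an approximate 95% CI for a mean is mean ± critical value × SE. The sketch below uses a normal critical value (about 1.96) as a simplifying assumption. For small samples a t critical value would give a somewhat wider interval. The function name approx_ci and the data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def approx_ci(sample, level=0.95):
    """Normal-approximation confidence interval for the mean (assumption: n is large enough)."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return mean - z * se, mean + z * se

low, high = approx_ci([4, 6, 8, 10, 12])  # hypothetical observations
print(round(low, 2), round(high, 2))
```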
What are the steps of hypothesis testing?
1) State the hypotheses
2) Calculate test statistic
3) Determine p-value
4) Appropriate conclusion
What is a test statistic?
A value calculated from the data and used to determine how compatible the observed data are with the results expected under the null hypothesis
What is the p-value?
Probability of obtaining a result as extreme or more extreme than the observed, assuming the null hypothesis were true (obtained from the null distribution)
What is the null distribution?
Sampling distribution for a test statistic under the assumption that the null hypothesis is true
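To make the null distribution and the p-value concrete, here is a sketch of an exact binomial test. The data (14 successes out of 18 trials) and the null value p = 0.5 are invented. The "at least as far from the expected count" rule is one of several two-sided conventions, and it is only clearly appropriate here because the null distribution is symmetric.

```python
from math import comb

def binomial_null(n, p=0.5):
    """Null distribution: P(X = k) for k = 0..n if the null success probability is p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def two_sided_p(observed, n, p=0.5):
    """Total null probability of outcomes at least as far from the expected count as observed."""
    null = binomial_null(n, p)
    expected = n * p
    distance = abs(observed - expected)
    return sum(prob for k, prob in enumerate(null) if abs(k - expected) >= distance)

p_value = two_sided_p(14, 18)  # hypothetical: 14 successes in 18 trials
print(round(p_value, 4))  # 0.0309
```

With a significance level of 0.05, a p-value of about 0.031 would lead to rejecting the null hypothesis.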
What should a conclusion include?
Test used, test statistic value, df, p-value (and sample size)
What is type I error?
Rejecting a true null hypothesis (false positive)
What determines the probability of committing a type I error?
The significance level, which sets the criterion for rejecting the null hypothesis
What is a type II error?
Failing to reject a false null hypothesis (false negative)
What determines the probability of committing a type II error?
Power - the probability of a type II error equals 1 - power, so higher power means a lower risk of a type II error
What is power?
Extent to which a test can correctly detect a real effect when there is one
What determines power?
1) Size of the effect (bigger = more easily detectable)
2) Significance level (increase = more powerful)
3) Measurement error
4) Sample size (bigger = more powerful)
What can power analysis be used for?
To determine how big a sample should be to attain a desired power level
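As a rough sketch of a power analysis, the function below uses a standard normal-approximation formula for comparing two group means with a two-sided test: n per group = 2 × ((z for alpha/2 + z for power) × SD / effect)². The function name, defaults, and inputs are assumptions for illustration. An analysis based on the t distribution would give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for the significance level
    z_power = z(power)          # quantile corresponding to the desired power
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)

print(n_per_group(effect=0.5, sd=1.0))  # 63
print(n_per_group(effect=1.0, sd=1.0))  # 16
```

The two calls show the effect-size point from the list above: halving the effect to be detected roughly quadruples the required sample size.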
What is a disadvantage of experimental studies?
Experimental artifacts which introduce bias through unintended consequences of experimental procedures
What are 3 ways to reduce bias?
1) Control groups
2) Randomization
3) Blinding
What are 3 ways to reduce the effect of sampling errors?
1) Replication
2) Balance
3) Blocking
What is a control group?
Group of experimental units that do not receive the treatment of interest but are kept under the same exact conditions as the treated experimental units
What is randomization?
Random assignment of treatments to units in an experimental study, breaking associations between possible confounding variables and the treatment
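A minimal sketch of random assignment, assuming two treatments and eight hypothetical plots. The function name randomize and the unit labels are illustrative.

```python
import random

def randomize(units, treatments, seed=None):
    """Shuffle the units, then deal treatments out in turn so group sizes stay equal."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

plots = [f"plot{i}" for i in range(1, 9)]  # hypothetical experimental units
assignment = randomize(plots, ["control", "treatment"], seed=42)

for unit in plots:
    print(unit, assignment[unit])
```

Dealing treatments out after shuffling also keeps the design balanced, with four plots per group in this example.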
What is blinding?
Process of concealing information about the control/treatment group assignment (single or double blind)
How would having all identical experimental units affect sampling error/bias?
Sampling error is reduced
What is replication?
Applying the same treatment to multiple, independent experimental subjects
What is pseudoreplication?
Treating non-independent measurements as independent replicates - the assumption of independence among units receiving the same treatment is violated
What is balance?
All treatments have equal sample size
What is blocking?
Grouping of experimental units that have similar properties (repeating the same experiment to account for spatial/temporal differences)
What are the benefits/disadvantages of using extreme treatments?
1) Treatment effects are easier to detect when they are large (increased power)
2) The effects of a treatment do not always scale linearly with the magnitude
When do we use a scatterplot?
When both variables are numerical
When do we use a boxplot?
When we want to depict a continuous variable in terms of its distribution OR when Y is a continuous variable and X is categorical
What do we use a QQ plot for?
To assess normality
What are the patterns we can see on a QQ plot?
1) Linear = normal
2) Exponential upward curve = right skew
3) Exponential downward curve = left skew
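The points on a normal QQ plot can be computed by pairing each sorted observation with the standard-normal quantile expected at its rank. The sketch below assumes the plotting position (i - 0.5) / n, which is one of several conventions in use. The function name qq_points and the data are hypothetical.

```python
from statistics import NormalDist

def qq_points(sample):
    """Return (theoretical normal quantile, observed value) pairs for a QQ plot."""
    ordered = sorted(sample)
    n = len(ordered)
    # Assumed plotting position: (i - 0.5) / n for the i-th smallest value.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

points = qq_points([5.1, 4.8, 6.3, 5.5, 5.0])  # hypothetical measurements
for theo, obs in points:
    print(round(theo, 2), obs)
```

Plotting theoretical quantiles on the x-axis against observed values on the y-axis gives the patterns listed above.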