Statistics Flashcards
A symmetric distribution with equal mean, median, and mode
Follows the 68-95-99.7 rule: about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean
Normal distribution
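A minimal sketch of the 68-95-99.7 rule, assuming Python's standard-library `statistics.NormalDist`. The helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal used for illustration; the rule holds for any mean and SD.
dist = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return dist.cdf(k) - dist.cdf(-k)

for k in (1, 2, 3):
    # Prints roughly 68%, 95%, and 99.7% for k = 1, 2, 3.
    print(k, f"{share_within(k):.1%}")
```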
A normal distribution with a mean of 0 and a standard deviation of 1
Standard normal distribution
Asymmetric distribution. Named either left-skewed or right-skewed by what side the tail is on
For right-skewed distributions, the mean is typically greater than the median.
For left-skewed distributions, the median is typically greater than the mean.
Skewed distribution
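A small illustration of how the tail pulls the mean away from the median. The two data sets are invented for this example.

```python
from statistics import mean, median

# Made-up samples: one long tail to the right, one long tail to the left.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
left_skewed = [-20, -4, -3, -3, -3, -2, -2, -1]

# The extreme value in the tail drags the mean toward it; the median barely moves.
print(mean(right_skewed), median(right_skewed))  # mean is above the median
print(mean(left_skewed), median(left_skewed))    # mean is below the median
```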
IQR = (third quartile) - (first quartile)
Interquartile range
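A sketch of the IQR calculation using the standard-library `statistics.quantiles`. The data are invented, and the `inclusive` method is only one of several quartile conventions, so other tools may give slightly different quartiles for the same data.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up sample

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1  # IQR = (third quartile) - (first quartile)
print(q1, q3, iqr)
```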
Line inside the box represents the median
Upper and lower quartiles (75% and 25% of the data lie below them, respectively) indicated by the upper and lower edges of the box
Maximum and minimum data points represented by the ends of the whiskers (some plots draw outliers as separate points beyond the whiskers)
95% confidence interval of the median indicated by the notches. If the notches of two boxes do not overlap → statistically significant difference between the medians is suggested
Box-and-whisker plot
The factor that is being changed or manipulated
Independent variable
The outcome that is being measured
Dependent variable
Variable that influences both the independent and dependent variables. Confounds the ability to determine causality
Example: Tobacco smoking confounds the relationship between chewing tobacco and mortality
Confounding variable
Variable that influences the strength of a relationship between two other variables
Example: Amount smoked influences the relationship between cigarette smoking and mortality
Moderating variable
Variable that explains the relationship between two other variables
Example: Increased cancer risk helps explain relationship between smoking and mortality
Mediating variable
The r value, describing the linear relationship between two variables
Ranges from -1 to 1 and describes the direction and strength of an association
Example: -0.30 is a negative and moderate association, whereas +0.90 is a positive and strong association
Correlation coefficient
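A minimal sketch of the Pearson r calculation, written out by hand so each step is visible. The function name `pearson_r` and the sample data are illustrative assumptions.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y divided by their spreads."""
    mx, my = mean(xs), mean(ys)
    co_variation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)

# A perfectly increasing line gives r close to +1, a decreasing one close to -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```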
Study variables that are not directly measurable (e.g. depression, happiness) are defined in a way that allows them to be measured
Variable operationalization
p value definition
Describes the probability of finding a difference at least as large as the one observed when, in reality, there is no difference between the groups (null hypothesis is true)
Statistical significance threshold for the study is most often set at α = 0.05 (5%)
This means if p < 0.05, then we reject the null hypothesis; if p ≥ 0.05, then we fail to reject the null hypothesis
Requires categorical variables
Compares null hypothesis vs alternative hypothesis, looking to see if the two distributions of categorical data differ
Chi-square
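A sketch of the chi-square statistic for a 2x2 table, comparing observed counts with the counts expected if the two categorical variables were unrelated. The table values are invented for illustration.

```python
# Hypothetical 2x2 table of counts: rows are groups, columns are outcomes.
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Expected count under the null hypothesis of no association.
        expected = row_totals[i] * col_totals[j] / total
        chi_square += (count - expected) ** 2 / expected

degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)
print(round(chi_square, 2), degrees_of_freedom)
```

With 1 degree of freedom, the critical value at α = 0.05 is 3.841, so a statistic above that leads to rejecting the null hypothesis.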
Requires continuous variables
Compares the mean values of continuous variables of 2 groups
T-test
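A sketch of the t statistic for two independent groups. This uses the Welch (unequal variances) form as an assumption; the card does not say which variant is meant. The data are invented, and the standard library has no t distribution, so the p-value lookup is left out.

```python
from math import sqrt
from statistics import mean, variance

# Made-up measurements for two groups.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

# Standard error of the difference in means (Welch form, sample variances).
standard_error = sqrt(variance(group_a) / len(group_a)
                      + variance(group_b) / len(group_b))

# Difference in means expressed in units of its standard error.
t_statistic = (mean(group_a) - mean(group_b)) / standard_error
print(t_statistic)
```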
Requires continuous variables
Similar to t-test, but can be used for 3 or more groups
ANOVA
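A sketch of the one-way ANOVA F statistic, which compares the variation between group means with the variation within groups. The three groups are invented for illustration.

```python
from statistics import mean

# Made-up measurements for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

all_values = [x for g in groups for x in g]
grand_mean = mean(all_values)

# Variation of the group means around the grand mean, weighted by group size.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Variation of each value around its own group mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_statistic = (ss_between / df_between) / (ss_within / df_within)
print(f_statistic)
```

A large F means the group means differ more than the scatter within groups would suggest.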
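A threshold (α) for the probability of a type 1 error that a study is willing to accept, usually set at α = 0.05. A result is statistically significant when p < α, meaning it would be unlikely to arise from random chance alone if the null hypothesis were true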
Statistical significance
When the null hypothesis is incorrectly rejected, in other words a false positive
Type 1 error
When a false null hypothesis is not rejected, in other words a false negative
Type 2 error
States that no significant difference or relationship exists between study variables
Null hypothesis
Factors that can cause the result of a study to differ systematically from the true result
Bias
Bias introduced by the selection process of including subjects in a study (e.g. study population is not representative of the whole population)
Selection bias
Study subjects behave differently when they know they’re being studied
Hawthorne effect
Bias in survey studies where people answer in a way considered socially desirable and acceptable
Social desirability bias
Improvement in symptoms following administration of an inactive substance or sham procedure
Placebo effect
Often related to a person’s belief that a treatment will work
Comparison between the placebo group and the treatment group is used to determine the true benefit of a given treatment (where the patients are unaware of whether they received the treatment or the placebo, known as blinding)
Certain individuals in a population have a greater chance of being selected for a study than other individuals, resulting in a sample that does not accurately reflect the population
Sampling bias
Tendency of a person to answer questions on a survey untruthfully or misleadingly
Response bias