Statistics Flashcards
meta-analysis def
single analysis of all existing analyses
never interpreted as primary research
steps of meta-analysis
specify the question
search for evidence
judge the quality of the evidence
display the findings
combine results when possible
conclude with a summary of the evidence
evidence searching
PubMed, NIH, etc.
search unpublished findings
search “gray” data/publications
funnel plot axes
Odds Ratio scattered against standard error
shows publication bias
reasons for association between factor and disease
bias in the sampling of subjects
bias in the measurement of the factor
confounding by another factor
chance
transposition of cause and effect
actually causal
qualitative variables
“nominal” (eg, gender, hair color, city)
amenable to categorical analysis only
quantitative variables
continuous variability (often, but not necessarily, with a normal distribution)
categorical (ordinal)
- dichotomous if only two categories
- ordinal
dichotomous data
only two categories possible
can exist without hierarchy
- male/female
can exist with hierarchy
ordinal data
categorical data with more than two possible states and an existing hierarchy, e.g. grades A, B, C, D, F
education level
cancer stage I,II,III,IV
1-year age categories
continuous data
any value between two other values possible
weight, blood pressure, IQ, etc.
Likert scales
numbered scale running from 1 (least favorite) to 6 (most favorite)
1 2 3 4 5 6
tracks like continuous data
measures of central tendency
mode - most commonly observed value
median - middle observation in a data set arranged from lowest to highest
mean - the arithmetic average (sum of observations/the number of observations)
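The three measures above can be sketched with Python's standard `statistics` module. The sample values are made up purely for illustration.

```python
from statistics import mean, median, mode

# Hypothetical observations (illustrative values only)
observations = [2, 3, 3, 5, 7, 10]

sample_mode = mode(observations)      # most commonly observed value
sample_median = median(observations)  # middle of the sorted data (mean of the two middle values when n is even)
sample_mean = mean(observations)      # sum of observations / number of observations

print("mode:", sample_mode)
print("median:", sample_median)
print("mean:", sample_mean)
```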
measures of spread
range - highest value minus lowest value
variance - a standardized measure of the sum of the squared differences between each value and the mean value
standard deviation - the square root of the variance, which has special properties when describing a Normal Distribution curve
variance definition
sum (x-mean)^2 / (n-1)
standard deviation definition
square root of variance
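A minimal sketch of the range, variance, and standard deviation definitions, computed by hand and checked against the standard `statistics` module. The data values are hypothetical, and the n-1 denominator assumes the values are a sample rather than a whole population.

```python
from math import isclose, sqrt
from statistics import mean, stdev, variance

# Hypothetical sample (illustrative values only)
values = [2, 4, 4, 4, 5, 5, 7, 9]

sample_mean = mean(values)

# range: highest value minus lowest value
value_range = max(values) - min(values)

# variance: sum of squared differences from the mean, divided by (n - 1)
manual_variance = sum((x - sample_mean) ** 2 for x in values) / (len(values) - 1)

# standard deviation: square root of the variance
manual_sd = sqrt(manual_variance)

# The library functions use the same (n - 1) sample formulas
assert isclose(manual_variance, variance(values))
assert isclose(manual_sd, stdev(values))

print("range:", value_range)
print("variance:", round(manual_variance, 3))
print("standard deviation:", round(manual_sd, 3))
```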
frequency distribution
conceptually identical to a pile of anything
the center of the pile is the central tendency
the total width of the pile is the range
the amount of scatter is the variance
normal distribution is common in nature
mean = median
the % of subjects with various values can be estimated by the standard deviation
normal distribution characteristics
mean = median
approximately 68% of values lie within +/- one standard deviation of the mean
approximately 95% of values lie within +/- two standard deviations of the mean
approximately 99.7% of values lie within +/- three standard deviations of the mean
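One way to check these approximate percentages is to ask the standard normal curve directly. The sketch below uses `statistics.NormalDist` (Python 3.8+); the helper name `fraction_within` is a made-up name for illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def fraction_within(k):
    """Fraction of values lying within +/- k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/- {k} SD: {fraction_within(k):.4f}")
```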
meaning of two standard deviations
about 95% of values inside, only about 5% outside
methods to control for confounding
matching
stratification in analysis
adjustment in analysis
- Direct adjustment (eg, age-adjusted rates)
- Multivariate analysis
Age adjustment
Direct method
- Uses the population distribution of a reference population
- recompute crude rates against standard to get adjusted rates
Indirect method
- compare observed number of cases to an expected number of cases in a population after generating expected number using the age-specific disease rates of a reference population
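The two methods can be sketched with a small worked example. Every number below is hypothetical, the two age bands are an arbitrary choice, and the dictionary keys are illustrative names only.

```python
# Hypothetical age strata: study counts, a standard population for the
# direct method, and reference age-specific rates for the indirect method.
strata = [
    {"age": "under 65", "cases": 20, "population": 10_000,
     "standard_population": 50_000, "reference_rate": 0.001},
    {"age": "65 and over", "cases": 90, "population": 3_000,
     "standard_population": 10_000, "reference_rate": 0.020},
]

observed_cases = sum(s["cases"] for s in strata)
crude_rate = observed_cases / sum(s["population"] for s in strata)

# Direct method: apply the study's age-specific rates to the standard
# population's age distribution.
cases_in_standard = sum(
    (s["cases"] / s["population"]) * s["standard_population"] for s in strata
)
direct_adjusted_rate = cases_in_standard / sum(
    s["standard_population"] for s in strata
)

# Indirect method: apply the reference age-specific rates to the study
# population to get expected cases, then compare observed to expected.
expected_cases = sum(s["reference_rate"] * s["population"] for s in strata)
observed_to_expected = observed_cases / expected_cases

print("crude rate:", round(crude_rate, 5))
print("direct age-adjusted rate:", round(direct_adjusted_rate, 5))
print("expected cases:", round(expected_cases, 1))
print("observed/expected ratio:", round(observed_to_expected, 3))
```

The observed/expected ratio from the indirect method is often reported as a standardized morbidity or mortality ratio.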
multivariate adjustment
multiple linear regression (beta for slope)
When outcome variable is continuous
multiple logistic regression (Odds Ratio for RR)
When outcome variable is dichotomous
Proportional hazards analysis (Hazard Ratio for RR)
When outcome variable is dichotomous and person-time is an independent variable
Inferential Statistics
used to describe findings in a study
also used to make yes/no decision on likelihood of chance occurrence
Null hypothesis
assume no effect
take measurements
disprove (reject) the null hypothesis
Alternative hypothesis
there is an effect
Type I error
finding a difference when there isn’t one
Type II error
finding no difference when there is one
real difference and tested difference
correctly reject null hypothesis
no real difference and no tested difference
correctly accept (fail to reject) null hypothesis
Alpha level
tolerance for making a type I error
set at 0.05, means willing to accept a 5% chance of being wrong (reject H0 when you shouldn’t)
Investigator sets the alpha level a-priori
p-value
probability of obtaining results at least as extreme as those observed if chance alone (the null hypothesis) were at work
compared against the alpha level, the accepted risk of making a type I error
if the p-value is equal to or less than the alpha level, the researcher rejects the null hypothesis and concludes the results are “statistically significant” (yes/no decision)
Beta level
tolerance for making a type II error
set at 0.20 means willing to accept a 20% chance of being wrong (accept H0 when you shouldn’t)
Investigator sets the beta level a-priori
especially important in studies with small sample sizes
beta power
Power = 1 - beta
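To make the relationship between alpha, beta, and power concrete, here is a rough sketch of the power of a two-sided, two-sample z-test. It assumes a known common standard deviation and equal group sizes, which are simplifications; the function name `z_test_power` and all input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(true_difference, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Assumes a known common standard deviation and equal group sizes.
    """
    z = NormalDist()
    standard_error = sd * sqrt(2 / n_per_group)
    z_critical = z.inv_cdf(1 - alpha / 2)
    shift = abs(true_difference) / standard_error
    # Probability that the test statistic lands beyond either critical value
    return z.cdf(shift - z_critical) + z.cdf(-shift - z_critical)

# Hypothetical study: true difference of 5 units, SD of 10, 64 subjects per group
power = z_test_power(true_difference=5, sd=10, n_per_group=64)
beta = 1 - power

print("power:", round(power, 3))
print("beta:", round(beta, 3))
```

With the same hypothetical effect and a smaller sample, the function returns lower power, which is why beta matters most when sample sizes are small.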
Confidence intervals
The confidence interval gives the range of values within which the true, population effect size can be expected to fall
A 95% confidence interval corresponds to an alpha level of 0.05; there is less than a 5% chance that the true population effect size lies outside the bounds of this interval
Note that a narrow confidence interval provides more confidence that the true population effect size is close to what you found in your study
confidence intervals and statistical significance
if the confidence interval for a mean or percent difference includes 0, the difference is not statistically significant
If the confidence interval for a relative risk includes one, the difference in risk is not statistically significant
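As an illustration of the relative risk rule, the sketch below computes a 95% confidence interval for a relative risk using the common log-based approximation and checks whether it includes one. The counts and the function names `relative_risk_ci` and `includes_value` are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def relative_risk_ci(exposed_cases, exposed_total,
                     unexposed_cases, unexposed_total, confidence=0.95):
    """Relative risk with an approximate confidence interval (log method)."""
    rr = (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)
    se_log_rr = sqrt(1 / exposed_cases - 1 / exposed_total
                     + 1 / unexposed_cases - 1 / unexposed_total)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper

def includes_value(lower, upper, value):
    """True if the interval contains the 'no effect' value."""
    return lower <= value <= upper

# Hypothetical cohort A: 30/100 exposed vs 15/100 unexposed develop disease
rr_a, lower_a, upper_a = relative_risk_ci(30, 100, 15, 100)
# Hypothetical cohort B: 12/100 exposed vs 10/100 unexposed develop disease
rr_b, lower_b, upper_b = relative_risk_ci(12, 100, 10, 100)

for label, rr, lo, hi in (("A", rr_a, lower_a, upper_a),
                          ("B", rr_b, lower_b, upper_b)):
    verdict = "not significant" if includes_value(lo, hi, 1.0) else "significant"
    print(f"cohort {label}: RR={rr:.2f}, 95% CI {lo:.2f} to {hi:.2f} ({verdict})")
```

The same check applies to a difference in means or percentages, with 0 rather than 1 as the "no effect" value.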
statistical vs clinical significance
clinical significance relates to the degree of importance of a finding in providing care for patients
A study with a large sample size might find a statistically significant difference that doesn’t matter clinically
A study with a small sample size might have a result that is not statistically significant, but if it were, it would have huge clinical significance
t-test
uses the difference in means, the standard deviations, and the sample sizes to compute a t statistic that tests whether two group means differ
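A hedged sketch of how the difference in means, the standard deviations, and the sample sizes combine in a two-sample t-test. It assumes the pooled (equal-variance) form of the test; the function name `pooled_t_statistic` and the summary numbers are hypothetical.

```python
from math import sqrt

def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic and degrees of freedom, assuming equal variances."""
    degrees_of_freedom = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / degrees_of_freedom
    # Standard error of the difference between the two means
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / standard_error
    return t, degrees_of_freedom

# Hypothetical summary data: systolic blood pressure in two groups of 25
t_value, df = pooled_t_statistic(mean1=130, sd1=10, n1=25,
                                 mean2=124, sd2=10, n2=25)

print("t statistic:", round(t_value, 3))
print("degrees of freedom:", df)
```

The t statistic is then compared with a t distribution on the stated degrees of freedom to obtain a p-value. A larger difference in means, a smaller standard deviation, or a larger sample size each makes t larger.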
chi-squared statistic
comes from categorical (not continuous) data
- alive/not alive
- respond/didn’t respond
- diseased/not diseased
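For a two-by-two table of such categories, the chi-squared statistic can be sketched as the sum of (observed - expected)^2 / expected over the four cells. This is the Pearson form without a continuity correction; the function name `chi_squared_2x2` and the counts are hypothetical.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    No continuity correction is applied.
    """
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    column_totals = [a + c, b + d]
    total = a + b + c + d

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * column_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic

# Hypothetical trial: treated 30 responded / 20 didn't; control 20 responded / 30 didn't
chi_squared = chi_squared_2x2(30, 20, 20, 30)
print("chi-squared:", chi_squared)
```

With one degree of freedom, a statistic above roughly 3.84 corresponds to p < 0.05.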
OR
Odds Ratio
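A minimal sketch of the calculation from a two-by-two table: the odds of disease among the exposed divided by the odds among the unexposed. The function name `odds_ratio` and the counts are hypothetical.

```python
def odds_ratio(exposed_cases, exposed_noncases,
               unexposed_cases, unexposed_noncases):
    """Odds of disease in the exposed divided by odds of disease in the unexposed."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed

# Hypothetical table: 30 diseased / 70 not among exposed, 10 diseased / 90 not among unexposed
example_or = odds_ratio(30, 70, 10, 90)
print("odds ratio:", round(example_or, 3))
```

An odds ratio of 1 means the odds are the same in both groups.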
What affects the chi-squared statistic
???