Statistics Flashcards
Pearson coefficient
Measures the strength and direction of the linear relationship between two continuous variables - ie linear correlation
0 = no linear relationship
0 to 1 = positive linear relationship; -1 to 0 = negative linear relationship
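A minimal sketch of computing Pearson's r directly from its definition (covariance divided by the product of the standard deviations); the function name and data values are invented for illustration:

```python
# Pearson's r: covariance of x and y divided by the product of their
# standard deviations. Data values are illustrative only.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly linear -> 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # perfectly inverse -> -1.0
```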
Kappa coefficient
Cohen’s kappa coefficient is a statistic that is used to measure inter-rater reliability for qualitative items. It is generally thought to be a more robust measure than simple percent agreement calculation, as κ takes into account the possibility of the agreement occurring by chance.
Linear regression test
Looks at cause-and-effect relationships
Estimates the effect of one CONTINUOUS variable on another; tries to determine a specific mathematical equation to describe the relationship (line of best fit)
Simple: one continuous IV and one continuous DV, eg effect of income on longevity
Multiple: 2 or more continuous IVs and one continuous DV, eg effect of income and mins of exercise per day on longevity
Logistic regression: continuous IV and binary DV, eg what is the effect of drug dosage on survival
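A sketch of fitting the line of best fit for simple linear regression by ordinary least squares (slope = covariance of x and y over the variance of x); the function name and data points are invented for illustration:

```python
# Ordinary least squares for simple linear regression (one continuous IV,
# one continuous DV): slope = cov(x, y) / var(x), intercept = ybar - slope * xbar.
def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return slope, intercept

# Points lying exactly on y = 2x + 1, so the fit recovers slope 2, intercept 1.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # -> (2.0, 1.0)
```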
What do ANOVA and T tests have in common
Parametric
Compare differences between group means
Test the effect of a categorical variable on a quantitative DV
ANOVA: one (or more) categorical IV with 3+ groups, one quantitative DV
MANOVA: one or more categorical IVs and 2+ DVs. What is the effect of flower species on petal length, petal width, and stem length?
Repeated measures ANOVA compares the same group at various time points
Correlation tests
Check whether variables are related without hypothesizing a cause-and-effect relationship. If you know one, can you predict the other?
eg Pearson's r
2 continuous variables, eg how are latitude and temperature related
Spearman's r: 2 ranked/ordinal variables
Chi squared test
Chi square test of independence: Test if 2 categorical variables are related to each other
Is the species of flower related to petal size
Are there more sporting injuries in basketball compared to netball? (compare proportions of people who are injured)
Chi square goodness of fit test: tests whether observed frequencies are significantly different from what was expected (equal frequencies/proportions). The null hypothesis would be that there is no difference in proportions across categories
Fisher's exact test: like chi squared, but used if the expected value is <5 in one or more cells of the data set
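A sketch of the chi-square statistic of independence for a 2x2 table, computing each expected count from the row and column totals; the function name and counts are invented for illustration:

```python
# Chi-square statistic for a 2x2 table [[a, b], [c, d]]:
# sum over cells of (observed - expected)^2 / expected, where
# expected = (row total * column total) / grand total.
def chi_square_2x2(a, b, c, d):
    obs = [[a, b], [c, d]]
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (obs[i][j] - expected) ** 2 / expected
    return stat

# Identical proportions in both groups -> statistic of 0 (no association).
print(chi_square_2x2(10, 10, 10, 10))  # -> 0.0
```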
Kruskal Wallis test
non parametric version of ANOVA
3+ categories + one quantitative outcome variable
Wilcoxon signed rank test
non parametric version of the paired t test
Mann-Whitney U test
non parametric version of the independent t test
Bonferroni correction
Post hoc test. The Bonferroni correction is a multiple-comparison correction used when several dependent or independent statistical tests are being performed simultaneously
If there are more than 2 groups in a variable and the null hypothesis is rejected with the first statistical test, pairwise tests with a Bonferroni correction are needed to figure out which groups are significantly different from each other. A Bonferroni correction is when you divide your original significance level (usually .05) by the number of tests you're performing
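The corrected threshold can be computed in one line; for k groups the number of pairwise comparisons is k choose 2. The function name is invented for illustration:

```python
# Bonferroni-corrected alpha: the original significance level divided by
# the number of pairwise comparisons between k groups (k choose 2).
def bonferroni_alpha(alpha, k_groups):
    n_tests = k_groups * (k_groups - 1) // 2   # all pairwise comparisons
    return alpha / n_tests

# 3 groups -> 3 pairwise tests, so each is judged at 0.05 / 3.
print(bonferroni_alpha(0.05, 3))
```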
Absolute risk
the number of events in a group, divided by the number of people in that group
ARR (absolute risk reduction, aka attributable risk, risk difference)
Absolute risk in control group - absolute risk in treatment group
relative risk
absolute risk in treatment/ absolute risk in control
relative risk reduction
Risk difference / absolute risk in control
(ARC – ART) / ARC
= 1 - relative risk
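A worked example tying the risk measures above together; the trial numbers are invented (20/100 events in the control arm, 10/100 on treatment):

```python
# Worked example of the risk measures, with invented trial counts.
events_ctrl, n_ctrl = 20, 100
events_trt, n_trt = 10, 100

arc = events_ctrl / n_ctrl   # absolute risk, control    = 0.2
art = events_trt / n_trt     # absolute risk, treatment  = 0.1
arr = arc - art              # absolute risk reduction   = 0.1
rr = art / arc               # relative risk             = 0.5
rrr = arr / arc              # relative risk reduction   = 0.5 (= 1 - rr)

print(arc, art, arr, rr, rrr)
```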
odds ratio
odds of outcome WITH exposure / odds of outcome WITHOUT exposure
odds = probability of outcome occurring / probability of outcome not occurring
= cross product = AD/BC
odds that a case was exposed / odds that a control was exposed
= (A/C) / (B/D)
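A quick check that the cross product AD/BC and the ratio of odds (A/C)/(B/D) agree; the function name and counts are invented for illustration:

```python
# Odds ratio from a 2x2 table via the cross product AD/BC, where
# A = exposed cases, B = exposed controls, C = unexposed cases,
# D = unexposed controls. Counts are invented.
def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

a, b, c, d = 20, 10, 5, 40
print(odds_ratio(a, b, c, d))   # -> 16.0
print((a / c) / (b / d))        # same answer via the ratio of odds -> 16.0
```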
Prevalence
PREVALENCE = all cases / total population
Prevalence depends on: incidence, recovery rate, and death rate (ie influenced by both the rate at which new cases are occurring and the average duration of the disease)
Prevalence = (Incidence Rate) x (Average Duration of Disease)
Point prevalence- at a specific moment in time
Period prevalence- over a specific period of time
incidence
INCIDENCE = new cases per time period/population at risk
Population at risk = total population who can get the disease- those who already have the disease
Incidence reflects the rate at which new cases of disease are being added to the population (and becoming prevalent cases).
Incidence rate: new cases in a certain period of time
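A small worked example of the prevalence and incidence definitions above; all counts are invented:

```python
# Prevalence and incidence from the definitions above; counts are invented.
population = 10_000
existing_cases = 500
new_cases_this_year = 200

prevalence = existing_cases / population      # all cases / total population
at_risk = population - existing_cases         # exclude those who already have the disease
incidence = new_cases_this_year / at_risk     # new cases / population at risk

print(prevalence)   # -> 0.05
print(incidence)
```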
standard deviation
measures variation/dispersion of a dataset relative to the mean; in a normal distribution, 68-95-99.7% of values fall within 1, 2, and 3 SDs of the mean
confidence interval
The 95% confidence interval is a range of values that you can be 95% confident contains the true mean of the population.
To calculate the confidence interval, start by computing the mean and standard error of the sample.
The narrower the interval (upper and lower values), the more precise is our estimate.
As a general rule, as sample size increases the confidence interval should become narrower.
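A sketch of a 95% CI for a mean (mean ± 1.96 × standard error, using the normal approximation); the function name and sample values are invented:

```python
# 95% CI for a mean: mean +/- 1.96 * SE, where SE = sample SD / sqrt(n).
# Sample values are invented.
def ci95(values):
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    se = sd / n ** 0.5
    return mean - 1.96 * se, mean + 1.96 * se

lo, hi = ci95([8, 10, 12, 10])
print(lo, hi)   # interval centred on the sample mean of 10
```

Repeating the same data with a larger n shrinks the SE, so the interval narrows, matching the rule above.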
Case control study
A case control study looks at those who have the disease, then looks backwards to see if they had the past exposure in question, so it is better for rare diseases
- Efficient in design for study of RARE diseases
- Requires fewer subjects than other studies
- Best design for diseases with long latent periods
- Can evaluate multiple possible/potential exposures
Type 1 error
False positive (incorrectly rejects null hypothesis )
Pr(type 1 error) = alpha
The alpha level (α) is the p-value below which you reject the null hypothesis. A p-value of 0.05 indicates that you are willing to accept a 5% chance that you are wrong when you reject the null hypothesis.
Can reduce the risk of a type 1 error by using a lower p-value threshold, eg p = 0.01 means a 1% chance of a type 1 error
Type 2 error
False negative (fails to reject null hypothesis)
ie saying no effect when there is
· The probability of making a type II error = Beta (β), and this is related to the power of the statistical test (power = 1- β). You can decrease your risk of committing a type II error by increasing the power of the test.
· Power is increased by increasing sample size
internal validity
Internal validity: the extent to which you can be confident that a cause-and-effect relationship established in a study cannot be explained by other factors
eg study design, minimal systematic bias, allocation concealment, randomization, blinding, appropriate comparator, intention to treat
external validity
External validity: is the validity of applying the conclusions of a scientific study outside the context of that study. In other words, it is the extent to which the results of a study can be generalized to and across other situations, people, stimuli, and times
Left skew/negative skew
tail on the left; mean - median - mode (mode = peak), so mean < median < mode
right skew/positive skew
mode (peak) - median - mean; tail on the right, so mode < median < mean
sensitivity
ability to detect disease
a sensitive test, when negative, rules disease out
true positives / all those with disease
specificity
ability to detect those without disease
a specific test, when positive, will rule a disease in
true negatives / all those without disease
Positive predictive value
likelihood of having disease when the test is positive
Negative predictive value
likelihood of not having disease when test is negative
Positive Likelihood Ratio
how much more likely a positive test result is in a patient with the disease than in one without it
sensitivity / (1 - specificity)
Negative Likelihood Ratio
how much more likely a negative test result is in a patient with the disease than in one without it
(1 - sensitivity) / specificity
Number needed to treat
1/ absolute risk reduction
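A short example of NNT = 1 / ARR, using the same invented risks as the risk cards above (20% event rate in controls vs 10% on treatment):

```python
# NNT = 1 / absolute risk reduction. Risks are invented: 20% event rate
# in controls vs 10% on treatment -> ARR 0.1 -> treat 10 patients to
# prevent one event. NNT is conventionally rounded up to a whole patient.
import math

def nnt(arc, art):
    return math.ceil(1 / (arc - art))

print(nnt(0.2, 0.1))   # -> 10
```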
Clinical trials
Preclinical
· In vitro/ animal
Phase 0/ Pilot
· Preliminary pharmacokinetics/pharmacodynamic data
· Micro dosing /subtherapeutic dosing
· Very small
Phase I
· Safety
· Dosage, side effects
· Further PK/PD information
· Small groups (<100)
· Healthy volunteers
Phase II
· Safety and Efficacy
· Dose requirements/dose response
· Larger groups, several hundred (100-300)
· Case series/ small RCT
Phase III
· Efficacy compared to current standard treatment
· Several hundred to thousands (300-5000)
· Individuals with disease
· >1 RCT usually needed
Phase IV
· Surveillance, continued pharmaco-vigilance/post marketing surveillance
· Cost efficacy
· Longer term / rare effects
· After marketing
· Effectiveness in general population
Standard error
Standard error = measures the amount of variability in the sample mean; it indicates how closely the population mean is likely to be estimated by the sample mean
Bias best avoided by
Randomisation
Blinding
intention to treat analysis
confounding best avoided by
randomisation
matching on variables eg sex, age
magnitude of effect in various studies
· Case control = odds ratio
· Cohort = relative risk
· RCT
o Absolute risk difference
o Relative risk difference
NNT
Pre test probability
Prevalence
Those with disease / population
when does the OR approximate the RR
low prevalence condition
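A numerical check of this rule: with a rare outcome the OR and RR nearly coincide, while with a common outcome they diverge. The function name and cohort counts are invented for illustration:

```python
# OR vs RR from cohort counts: a/b = outcome yes/no in the exposed group,
# c/d = outcome yes/no in the unexposed group. Counts are invented.
def rr_and_or(a, b, c, d):
    rr = (a / (a + b)) / (c / (c + d))   # relative risk
    o_r = (a * d) / (b * c)              # odds ratio
    return rr, o_r

print(rr_and_or(2, 998, 1, 999))      # rare outcome: OR ~ RR (~2.0)
print(rr_and_or(400, 600, 200, 800))  # common outcome: OR and RR diverge
```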