Statistics Flashcards by Dasha Soldatov

Pearson coefficient

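Measures the strength and direction of the linear relationship between two continuous variables, ie linear correlation
r = 0: no linear relationship
0 < r ≤ 1: positive linear relationship; -1 ≤ r < 0: negative linear relationship (r = ±1 is a perfect linear relationship)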
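
A minimal sketch of how the coefficient can be computed by hand, to make the sign and range concrete. The function name `pearson_r` and the study-hours data are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: hours studied against two made-up outcomes
hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]     # rises with hours: perfect positive relationship
score_down = [10, 8, 6, 4, 2]   # falls with hours: perfect negative relationship

r_up = pearson_r(hours, score_up)
r_down = pearson_r(hours, score_down)
```

Here `r_up` comes out at +1 and `r_down` at -1; real data would normally fall somewhere in between.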

Kappa coefficient

Cohen’s kappa coefficient is a statistic that is used to measure inter-rater reliability for qualitative items. It is generally thought to be a more robust measure than simple percent agreement calculation, as κ takes into account the possibility of the agreement occurring by chance.
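
The chance correction can be sketched as κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohens_kappa` and the two raters' labels below are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    # Observed agreement: proportion of items given the same label
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over labels
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings of 10 items by two raters
rater_a = ["y", "y", "y", "y", "y", "n", "n", "n", "n", "n"]
rater_b = ["y", "y", "y", "y", "n", "n", "n", "n", "n", "y"]

kappa = cohens_kappa(rater_a, rater_b)
```

In this made-up example the raters agree on 80% of items, but half of that agreement would be expected by chance, so kappa is about 0.6.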

Linear regression test

Models a hypothesised cause-and-effect relationship (the regression itself does not prove causation)
Estimates the effect of one CONTINUOUS variable on another. Tries to determine a specific mathematical equation to describe the relationship (line of best fit)
Simple: one continuous IV and one continuous DV, eg effect of income on longevity
Multiple: 2 or more continuous IVs and one continuous DV, eg effect of income and minutes of exercise per day on longevity
Logistic regression: continuous IV and binary DV, eg what is the effect of drug dosage on survival
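
A minimal sketch of simple linear regression (one IV, one DV) using the ordinary least squares formulas for the line of best fit. The function name `fit_line` and the data are hypothetical; multiple and logistic regression need more machinery and are not shown.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared x-deviations
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on the line y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
```

The slope is the estimated change in the DV for a one-unit increase in the IV.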

What do ANOVA and T tests have in common

Parametric
Compare differences between group means
Test the effect of a categorical variable on a quantitative DV
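T test: one categorical IV with 2 groups, one DV
ANOVA: one or more categorical IVs (typically 3+ groups), one DV
MANOVA: one or more categorical IVs and 2+ DVs, eg what is the effect of flower species on petal length, petal width, and stem length?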
Repeated measures ANOVA compares the same group at various time points

Correlation tests

Check whether variables are related without hypothesizing a cause-and-effect relationship, ie if you know one, can you predict the other

Pearson's r: 2 continuous variables, eg how are latitude and temperature related
Spearman's rho: 2 ranked/ordinal variables

Chi squared test

Chi square test of independence: tests whether 2 categorical variables are related to each other
eg is the species of flower related to petal size (grouped into categories)?
eg are there more sporting injuries in basketball than in netball (compare proportions of people who are injured)?
Chi square goodness of fit test: tests whether observed frequencies are significantly different from what was expected (eg equal frequencies/proportions). The null hypothesis is that the observed proportions in each category match the expected proportions
Fisher's exact test: like chi squared, but used when the expected count is <5 in one or more cells of the table
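
A rough sketch of the test of independence: expected counts come from the row and column totals, and the statistic sums (observed - expected)² / expected over all cells. The function name and the injury counts are hypothetical, and no p-value is computed here (that needs the chi-squared distribution).

```python
def chi_square_independence(table):
    """Chi-squared statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Hypothetical counts: rows = sport, columns = (injured, not injured)
table = [
    [30, 70],  # basketball
    [20, 80],  # netball
]

statistic, dof = chi_square_independence(table)
```

With 1 degree of freedom the critical value at a significance level of .05 is about 3.84, so a statistic of roughly 2.67 would not reject the null hypothesis in this made-up example.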

Kruskal Wallis test

Non parametric version of ANOVA
3+ categories and one quantitative outcome variable

Wilcoxon signed rank test

non parametric version of paired t test

Mann Whitney U test

Non parametric version of independent t test

Bonferroni correction

Multiple-comparison correction, often applied to post hoc tests. Used when several dependent or independent statistical tests are being performed simultaneously
If there are more than 2 groups in a variable and the null hypothesis is rejected with the first statistical test (eg ANOVA), pairwise post hoc tests are needed to figure out which groups are significantly different from each other, and the Bonferroni correction keeps the overall false positive rate under control. A Bonferroni correction is when you divide your original significance level (usually .05) by the number of tests you're performing
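
As a small illustration of the division step, the sketch below compares each p-value with the corrected threshold. The function name `bonferroni` and the three p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the corrected threshold and which tests remain significant."""
    # Divide the original significance level by the number of tests
    threshold = alpha / len(p_values)
    significant = [p <= threshold for p in p_values]
    return threshold, significant

# Hypothetical p-values from three pairwise comparisons (A vs B, A vs C, B vs C)
p_values = [0.01, 0.02, 0.04]

threshold, significant = bonferroni(p_values)
```

All three p-values are below .05, but only the first is below the corrected threshold of .05 / 3 ≈ .0167.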

Absolute risk

the number of events in a group, divided by the number of people in that group

ARR (absolute risk reduction, aka attributable risk, risk difference)

Absolute risk in control group - absolute risk in treatment group

relative risk

Absolute risk in treatment group / absolute risk in control group

relative risk reduction

Risk difference/ absolute risk in control
(ARC - ART) / ARC, where ARC = absolute risk in control group and ART = absolute risk in treatment group

relative risk reduction

1- relative risk
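
The risk measures on these cards can be tied together in one short sketch, which also shows that the two relative risk reduction formulas agree. The function name and the trial counts are hypothetical.

```python
def risk_measures(events_control, n_control, events_treatment, n_treatment):
    """Absolute and relative risk measures from event counts in two groups."""
    arc = events_control / n_control        # absolute risk in control group
    art = events_treatment / n_treatment    # absolute risk in treatment group
    arr = arc - art                         # absolute risk reduction (risk difference)
    rr = art / arc                          # relative risk
    rrr = arr / arc                         # relative risk reduction
    return {"ARC": arc, "ART": art, "ARR": arr, "RR": rr, "RRR": rrr}

# Hypothetical trial: 20/100 events in control, 10/100 events in treatment
m = risk_measures(20, 100, 10, 100)
```

Here ARC = 0.2, ART = 0.1, ARR = 0.1, RR = 0.5 and RRR = 0.5, which matches 1 - RR.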

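Odds ratio

Odds = WITH/WITHOUT

= probability of outcome occurring / probability of outcome not occurring

Odds ratio = odds that a case was exposed / odds that a control was exposed
= (A/C) / (B/D)
= cross product = AD/BC
(A = exposed cases, B = exposed controls, C = unexposed cases, D = unexposed controls)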
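
A minimal sketch of the cross product, assuming the cell layout implied by the formulas above: A = exposed cases, B = exposed controls, C = unexposed cases, D = unexposed controls. The function name and the counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table, computed as the cross product AD/BC."""
    return (a * d) / (b * c)

# Hypothetical case control data
a, b = 30, 10   # exposed: cases, controls
c, d = 70, 90   # unexposed: cases, controls

or_value = odds_ratio(a, b, c, d)

# Same quantity written as a ratio of odds of exposure
odds_case_exposed = a / c
odds_control_exposed = b / d
or_from_odds = odds_case_exposed / odds_control_exposed
```

Both routes give about 3.86 for these counts, meaning the odds of exposure are nearly four times higher among cases than among controls.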

Prevalence

PREVALENCE = all cases/total population
Prevalence depends on: incidence, recovery rate, and death rate (ie influenced by both the rate at which new cases are occurring and the average duration of the disease)
Prevalence ≈ (Incidence Rate) x (Average Duration of Disease), an approximation that holds when the disease is in a steady state and prevalence is low
Point prevalence- at a specific moment in time
Period prevalence- over a specific period of time

incidence

INCIDENCE = new cases per time period/population at risk
Population at risk = total population who can get the disease minus those who already have the disease
Incidence reflects the rate at which new cases of disease are being added to the population (and becoming prevalent cases).
Incidence rate: new cases per population at risk in a certain period of time

standard deviation

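Measures variation/dispersion of a dataset relative to the mean
68-95-99.7 rule: in a normal distribution, about 68%, 95% and 99.7% of values lie within 1, 2 and 3 standard deviations of the mean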

confidence interval

The 95% confidence interval is a range of values that you can be 95% confident contains the true mean of the population.
To calculate the confidence interval, start by computing the mean and standard error of the sample. For a large sample, 95% CI ≈ mean ± 1.96 x standard error.
The narrower the interval (upper and lower values), the more precise our estimate.
As a general rule, as the sample size increases, the confidence interval should become narrower.
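
A rough sketch of the calculation, assuming the normal approximation (mean ± 1.96 x standard error); for small samples a t-distribution multiplier would normally be used instead. The function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def confidence_interval_95(sample):
    """Approximate 95% CI for the mean using the normal multiplier 1.96."""
    m = mean(sample)
    # Standard error of the mean = sample SD / sqrt(n)
    se = stdev(sample) / sqrt(len(sample))
    margin = 1.96 * se
    return m - margin, m + margin

# Hypothetical sample, then the same values repeated to mimic a larger sample
small_sample = [4, 6, 8, 10, 12]
large_sample = small_sample * 4

low_small, high_small = confidence_interval_95(small_sample)
low_large, high_large = confidence_interval_95(large_sample)
```

Both intervals are centred on the same mean of 8, but the interval from the larger sample is narrower because the standard error shrinks as n grows.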

Case control study

Case control study looks at those who have the disease, and then looks backwards to see if they had the past exposure in question, so it is better for rare diseases
- Efficient in design for study of RARE diseases
- Requires fewer subjects than other studies
- Best design for diseases with long latent periods
- Can evaluate multiple possible/potential exposures

Type 1 error

False positive (incorrectly rejects a true null hypothesis)
Probability of type 1 error = alpha (α)
The alpha level (α) is the threshold below which a p-value leads you to reject the null hypothesis. An alpha of 0.05 means that you are willing to accept a 5% chance of rejecting the null hypothesis when it is actually true.
Can reduce the risk of a type 1 error by using a lower alpha, eg α = 0.01 means a 1% chance of a type 1 error

Type 2 error

False negative (fails to reject the null hypothesis when it is false)
ie saying no effect when there is one
· The probability of making a type II error = Beta (β), and this is related to the power of the statistical test (power = 1- β). You can decrease your risk of committing a type II error by increasing the power of the test.
· Power is increased by increasing sample size

internal validity

Internal validity: the extent to which you can be confident that a cause-and-effect relationship established in a study cannot be explained by other factors.

Improved by eg study design, minimal systematic bias, allocation concealment, randomisation, blinding, appropriate comparator, intention to treat analysis

external validity

External validity: the validity of applying the conclusions of a scientific study outside the context of that study. In other words, it is the extent to which the results of a study can be generalized to and across other situations, people, stimuli, and times.

Left skew/negative skew

Tail on the left: mean < median < mode (mode = most frequent value = peak)

Right skew/positive skew

Tail on the right: mode (peak) < median < mean

sensitivity

Ability to detect disease
A sensitive test, when negative, rules disease out
= true positives / all those with disease

specificity

Ability to detect those without disease
A specific test, when positive, rules disease in
= true negatives / all those without disease

Positive predictive value

Likelihood of having disease when test is positive
= true positives / all positive tests

Negative predictive value

Likelihood of not having disease when test is negative
= true negatives / all negative tests

Positive Likelihood Ratio

How much more likely a positive test is in a patient with disease than in a patient without disease
= sensitivity / (1 - specificity)

Negative Likelihood Ratio

How much more likely a negative test is in a patient with disease than in a patient without disease
= (1 - sensitivity) / specificity
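
The six diagnostic test measures on the cards above all come from the same 2x2 table, which the sketch below makes explicit. The function name and the counts (tp, fp, fn, tn) are hypothetical.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Diagnostic accuracy measures from true/false positive/negative counts."""
    sensitivity = tp / (tp + fn)   # true positives / all those with disease
    specificity = tn / (tn + fp)   # true negatives / all those without disease
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),     # true positives / all positive tests
        "npv": tn / (tn + fn),     # true negatives / all negative tests
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }

# Hypothetical screening results: 100 people with disease, 200 without
d = diagnostic_measures(tp=90, fp=20, fn=10, tn=180)
```

With these counts sensitivity and specificity are both 0.9, PPV is about 0.82, NPV about 0.95, the positive likelihood ratio about 9 and the negative likelihood ratio about 0.11. Unlike sensitivity and specificity, PPV and NPV would change if the proportion of people with disease changed.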

Number needed to treat

1/ absolute risk reduction
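
A short sketch of the calculation, using exact fractions so that rounding up to a whole patient (the usual convention) is not disturbed by floating-point error. The function name and the trial counts are hypothetical.

```python
from fractions import Fraction
from math import ceil

def number_needed_to_treat(events_control, n_control, events_treatment, n_treatment):
    """NNT = 1 / absolute risk reduction, rounded up to a whole patient."""
    arr = Fraction(events_control, n_control) - Fraction(events_treatment, n_treatment)
    if arr <= 0:
        raise ValueError("treatment does not reduce risk, so NNT is not defined here")
    return ceil(1 / arr)

# Hypothetical trial: risk falls from 20/100 to 15/100, so ARR = 0.05
nnt_a = number_needed_to_treat(20, 100, 15, 100)

# Hypothetical trial: risk falls from 30/200 to 21/200, so ARR = 0.045
nnt_b = number_needed_to_treat(30, 200, 21, 200)
```

The first example gives an NNT of 20; the second gives 1 / 0.045 ≈ 22.2, rounded up to 23.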

Clinical trials

Preclinical
· In vitro/animal
Phase 0/Pilot
· Preliminary pharmacokinetic/pharmacodynamic data
· Micro dosing/subtherapeutic dosing
· Very small
Phase I
· Safety
· Dosage, side effects
· Further PK/PD information
· Small groups (<100)
· Healthy volunteers
Phase II
· Safety and efficacy
· Dose requirements/dose response
· Larger groups, several hundred (100-300)
· Case series/small RCT
Phase III
· Efficacy compared to current standard treatment
· Several hundred to thousands (300-5000)
· Individuals with disease
· >1 RCT usually needed
Phase IV
· Surveillance, continued pharmacovigilance/post marketing surveillance
· Cost efficacy
· Longer term/rare effects
· After marketing
· Effectiveness in general population

Standard error

Standard error measures the amount of variability in the sample mean; it indicates how closely the sample mean is likely to estimate the population mean
= standard deviation / √n

Bias best avoided by

Randomisation
Blinding
Intention to treat analysis

confounding best avoided by

Randomisation
Matching on variables eg sex, age

magnitude of effect in various studies

· Case control = odds ratio
· Cohort = relative risk
· RCT
o Absolute risk difference
o Relative risk difference
o NNT

Pre test probability

Prevalence
= those with disease / population

when does the OR approximate the RR

Low prevalence condition (rare outcome): when few people have the outcome, the odds are close to the risk, so OR ≈ RR
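
The approximation can be checked numerically with a sketch that computes both measures from the same cohort-style counts, once for a rare outcome and once for a common one. The function name and all counts are hypothetical.

```python
def rr_and_or(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Relative risk and odds ratio from event counts in two groups."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    # Odds = with outcome / without outcome
    odds_exposed = events_exposed / (n_exposed - events_exposed)
    odds_unexposed = events_unexposed / (n_unexposed - events_unexposed)
    return risk_exposed / risk_unexposed, odds_exposed / odds_unexposed

# Rare outcome: 2/1000 exposed vs 1/1000 unexposed
rare_rr, rare_or = rr_and_or(2, 1000, 1, 1000)

# Common outcome: 400/1000 exposed vs 200/1000 unexposed
common_rr, common_or = rr_and_or(400, 1000, 200, 1000)
```

In both scenarios the RR is 2. For the rare outcome the OR is about 2.002, almost identical, while for the common outcome it is about 2.67 and overstates the RR.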