Statistics and critical appraisal Flashcards
Internal and external validity
Internal - was the study done right? Do the results accurately reflect the truth?
External - does the same thing happen elsewhere? Is this study applicable to real life?
Efficacy and effectiveness
Efficacy - impact of an intervention under ideal conditions
Effectiveness - impact of an intervention under clinical/real life conditions
Berkson bias
Sample is drawn from a hospital setting, so it is not representative of the target population (in rate or severity of the condition)
Diagnostic purity bias
Comorbidity excluded, so complexity of target population not represented
Neyman bias
Time gap between exposure and sample selection, meaning some exposed individuals are no longer available for study (eg due to death)
Membership bias
Particular group is targeted for study, which is not representative (eg in a particular organisation)
Historical control bias
Subjects and controls chosen over time, so affected by changes in social definitions, treatment modalities etc.
Performance bias
Subjects behave differently because they know which group they are in. Controlled for by blinding and randomisation.
Ascertainment/interviewer bias
Researcher not blinded, which affects recording of results
Recall bias
Subjects mis-remember past
Response bias
Subjects answer questions in the way they think the researcher wants them to answer
Attrition bias
Bias due to subjects leaving the study at different rates in different groups in the study (eg due to side-effects)
Hawthorne effect
Subjects alter their behaviour, as they know they are being observed
Pygmalion (Rosenthal) effect
Subjects perform to meet expectations set by others (usually positively). Known as placebo effect in medical settings.
Inter-rater reliability
Agreement between different assessors at the same time (do different people agree with each other?)
Intra-rater reliability
Agreement between the results from one assessor at different times, whilst assessing the same material (does one person agree with herself?)
Test-retest reliability
Agreement between results of a test, and the results of the same test repeated at a later date.
Alternative form reliability
Agreement between the results of different versions of the same test
Split half reliability
Reliability of a test that is divided in two, with each half assessing the same material (do all parts of the test contribute equally?)
Cohen’s statistic (k)
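Measures agreement between raters in tests measuring categorical variables, corrected for chance. If agreement is no more than expected by chance, k=0; perfect agreement gives k=1. By convention, k≥0.7 is taken to indicate good agreement (this is a threshold for agreement, not a test of statistical significance).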
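Worked example for Cohen's statistic. This is a minimal sketch, assuming two raters who each assign one categorical label to the same items; the function name and the ratings are hypothetical and only illustrate the usual formula k = (observed agreement - chance agreement) / (1 - chance agreement).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of ten cases by two assessors.
a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
kappa = cohens_kappa(a, b)  # observed 0.80, chance 0.52, kappa about 0.58
```

Here the raters agree on 8 of 10 cases, but kappa is only about 0.58 because a good part of that agreement would be expected by chance.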
Cronbach’s alpha
Measures internal consistency (agreement between items) when using complicated tests with several parts or measuring several variables.
Intraclass correlation coefficient
Measures agreement between raters or repeated measurements. For use with quantitative variables.
Predictive validity
Ability of a test to predict something it should theoretically be able to predict (eg predicting employment whilst at school)
Concurrent validity
Ability of a test to distinguish between 2 groups that it should theoretically be able to distinguish between (eg angina vs gastritis)
Convergent validity
The extent to which a test agrees with other tests it should theoretically be similar to (eg different types of thermometer)
Discriminant validity
The extent to which a test differs from a test it should theoretically be different from (eg exam results and SJT test)
Face validity
The extent to which the test, on superficial consideration, measures what it is supposed to measure.
Content validity
The extent to which the test measures variables that are related to that which should be measured by the test.
Construct validity
The extent to which the test measures a theoretical concept by a specific measuring device or procedure.
Incremental validity
The extent to which the test provides a significant improvement when added to another approach (eg ultrasound in estimating foetal age, compared with dates alone)
Incidence
Number of new cases in a period of time / population size
Usually expressed per year
Categorical data
Data that has no numerical value and cannot be measured on a scale (eg. marital status, dead/alive).
Nominal data
Categorical data that does not have an order (eg male/female)
Ordinal data
Categorical data in which there is an order (eg social class)
Quantitative data
Data with a numerical value
Discrete data
Data that can take only distinct, countable values (usually whole numbers), with nothing in between (eg number of children)
Continuous data
Data that can have any value within the range of possible values (eg height, age)
Descriptive statistics for categorical data
Mode and frequency
Descriptive statistics for non-normally distributed quantitative data
Median and range or IQR
Descriptive statistics for normally distributed data
Mean and standard deviation
Proportion of data that falls within 1SD of the mean
68%
Proportion of data that falls within 2SD of mean
95%
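The 68% and 95% figures can be checked against the standard normal distribution. A short sketch using Python's standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

within_1sd = z.cdf(1) - z.cdf(-1)            # about 0.683
within_2sd = z.cdf(2) - z.cdf(-2)            # about 0.954
within_196sd = z.cdf(1.96) - z.cdf(-1.96)    # about 0.950
```

The "95% within 2SD" rule is a rounding: exactly 95% of a normal distribution lies within about 1.96 SD of the mean, which is where the 1.96 in the confidence interval formula comes from.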
Variance
Sum of the squared differences between each value and the mean, divided by (n-1).
Standard deviation
Square root of variance
Standard error
Estimate of the standard deviation that would be obtained from the means of a large number of samples drawn from the same population.
SE = SD/sqrt(n)
Confidence interval
95% CI = mean +/- 1.96 x standard error
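Worked example for variance, standard deviation, standard error and confidence interval. A minimal sketch on a hypothetical sample of five values; it uses the 1.96 multiplier from the card, which is a normal approximation (for a sample this small a t-distribution multiplier would normally be preferred).

```python
import math
import statistics

sample = [4.0, 6.0, 8.0, 10.0, 12.0]  # hypothetical measurements
n = len(sample)

mean = statistics.mean(sample)                              # 8.0
# Sum of squared differences from the mean, divided by (n - 1).
variance = sum((x - mean) ** 2 for x in sample) / (n - 1)   # 40 / 4 = 10.0
sd = math.sqrt(variance)                                    # about 3.16
se = sd / math.sqrt(n)                                      # about 1.41
ci_low = mean - 1.96 * se                                   # about 5.23
ci_high = mean + 1.96 * se                                  # about 10.77
```

The hand-computed variance matches `statistics.variance(sample)`, which also divides by (n-1).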
Positive skew
long tail to the right
Negative skew
long tail to the left
Type 1 error
Null hypothesis is rejected when it is in fact true (false positive). The probability of this error is represented as alpha.
Type 2 error
Null hypothesis is not rejected when it is in fact false (false negative). The probability of this error is represented as beta.
Convention for power
0.8
Information needed for power calculation
Power (1-beta)
Level of significance (alpha)
Underlying population event rate
Size of treatment effect
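The four inputs above can be combined into a sample size estimate. The sketch below uses one common normal-approximation formula for comparing two proportions (two-sided test, equal group sizes); this formula, the function name and the event rates are assumptions for illustration and are not taken from the cards.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate subjects needed per group to compare two proportions."""
    effect = p_control - p_treatment  # size of treatment effect
    if effect == 0:
        raise ValueError("treatment effect must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    variance_sum = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / effect ** 2)


# Hypothetical trial: control event rate 30%, expected treated event rate 20%.
n_per_group = sample_size_per_group(0.30, 0.20)  # 291 with this formula
```

A smaller treatment effect, a stricter alpha or a higher power each increase the number of subjects required.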
Statistical test for categorical, unpaired data
Chi-squared test
Statistical test for categorical, paired data
McNemar’s test
Statistical test for categorical, unpaired data with small sample set
Fisher’s exact test
Statistical tests for non-normal data comparing one sample with a hypothetical sample
Sign test or Wilcoxon’s signed rank test
Statistical test for non-normal data comparing two groups of unpaired data
Mann-Whitney U test
Statistical test for non-normal data comparing two groups of paired data
Wilcoxon’s matched pairs test
Statistical test for non-normal data comparing more than 2 groups of unpaired data
Kruskal-Wallis ANOVA
Statistical test for non-normal data comparing more than 2 groups of paired data
Friedman’s test
Statistical test for normal data comparing one sample with a hypothetical sample
One-sample t test
Statistical test for normal data comparing 2 groups of data
Student’s t-test (paired or unpaired)
Statistical test for normal data comparing more than 2 groups of data
ANOVA
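The test-selection cards above can be summarised as a lookup. This is only a sketch that transcribes the cards; the function name, argument names and key layout are hypothetical.

```python
# Keys are (distribution, number of groups, paired?), with groups as 1, 2 or "3+".
TESTS = {
    ("non-normal", 1, False): "Sign test or Wilcoxon's signed rank test",
    ("non-normal", 2, False): "Mann-Whitney U test",
    ("non-normal", 2, True): "Wilcoxon's matched pairs test",
    ("non-normal", "3+", False): "Kruskal-Wallis ANOVA",
    ("non-normal", "3+", True): "Friedman's test",
    ("normal", 1, False): "One-sample t test",
    ("normal", 2, False): "Student's t-test (unpaired)",
    ("normal", 2, True): "Student's t-test (paired)",
    ("normal", "3+", False): "ANOVA",
    ("normal", "3+", True): "ANOVA",  # the card does not distinguish paired data
}


def choose_test(data_type, groups=2, paired=False, small_sample=False):
    """Return the test named on the cards for this kind of data."""
    if data_type == "categorical":
        if paired:
            return "McNemar's test"
        return "Fisher's exact test" if small_sample else "Chi-squared test"
    key = (data_type, groups if groups in (1, 2) else "3+", paired)
    if key not in TESTS:
        raise ValueError(f"no card covers {key}")
    return TESTS[key]


example = choose_test("non-normal", groups=2, paired=False)  # "Mann-Whitney U test"
```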
Experimental event rate
events/total exposed
Control event rate
events/total control
Absolute risk reduction
CER-EER
Lies between -1 and +1
Relative risk
EER/CER
Risk in exposed/risk not-exposed
Relative risk reduction
(CER-EER)/CER
Proportional change in risk in exposure group
Number needed to treat
1/(CER-EER)
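Worked example for the event rate cards above. A minimal sketch with hypothetical trial counts; the function and key names are illustrative.

```python
import math


def risk_measures(events_exposed, total_exposed, events_control, total_control):
    """EER, CER, ARR, RR, RRR and NNT from the counts in each arm."""
    eer = events_exposed / total_exposed    # experimental event rate
    cer = events_control / total_control    # control event rate
    arr = cer - eer                         # absolute risk reduction
    return {
        "EER": eer,
        "CER": cer,
        "ARR": arr,
        "RR": eer / cer,                    # relative risk
        "RRR": arr / cer,                   # relative risk reduction
        "NNT": 1 / arr if arr != 0 else math.inf,
    }


# Hypothetical trial: 10/100 events on treatment, 20/100 events on control.
m = risk_measures(10, 100, 20, 100)
# EER 0.1, CER 0.2, ARR 0.1, RR 0.5, RRR 0.5, NNT 10
```

In practice the NNT is usually rounded up to the next whole patient. A negative ARR means the intervention increased the event rate, and 1/ARR is then read as a number needed to harm.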
Risk
Outcome/total possibilities
Odds
Outcome/Not outcome
Odds ratio
Odds exposed/Odds not exposed
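Worked example contrasting risk and odds. A minimal sketch with hypothetical counts; the function names are illustrative.

```python
def risk(with_outcome, without_outcome):
    """Outcome / total possibilities."""
    return with_outcome / (with_outcome + without_outcome)


def odds(with_outcome, without_outcome):
    """Outcome / not outcome."""
    return with_outcome / without_outcome


# Hypothetical counts: exposed 10 with outcome, 90 without;
# not exposed 20 with outcome, 80 without.
risk_exposed = risk(10, 90)                   # 0.10
odds_exposed = odds(10, 90)                   # about 0.111
odds_unexposed = odds(20, 80)                 # 0.25
odds_ratio = odds_exposed / odds_unexposed    # about 0.44
```

With the same counts the relative risk would be 0.10/0.20 = 0.5, so the odds ratio and relative risk are close but not identical; they converge when the outcome is rare.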
Face validity
Does the test seem to test what it claims to?
Content validity
Does a test test all of what it claims to? (Eg all symptoms of depression)
Criterion validity
Does a test test what it claims to better than the alternative test?
Construct validity
How well does a test examine the construct it claims to? Does it agree with other tests which test this construct, and differ from tests which measure different constructs?
CONSORT
Consolidated Standards of Reporting Trials - RCTs
TREND
Transparent Reporting of Evaluations with Non-randomized Designs
QUOROM
Quality of Reporting of Meta-analyses
PRISMA
Preferred Reporting Items for Systematic Reviews and Meta-Analyses
MOOSE
Meta-analysis Of Observational Studies in Epidemiology
STROBE
Strengthening the Reporting of Observational Studies in Epidemiology
SQUIRE
Standards for QUality Improvement Reporting Excellence
STARD
STAndards for the Reporting of Diagnostic accuracy
MIAME
Minimum information about a microarray experiment
COREQ
Consolidated criteria for reporting qualitative research
Sensitivity
Proportion of subjects with the disorder picked up by positive test.
True positives/all subjects with the disorder: a/(a+c), where a = true positives and c = false negatives
Specificity
Proportion of subjects without the disorder that have a negative test
True negatives/all subjects without the disorder: d/(b+d), where d = true negatives and b = false positives
Positive predictive value
Proportion of people who have a positive result on the test who DO have the disorder
True positives/total positive results: a/(a+b)
Negative predictive value
Proportion of people with a negative result on the test who DO NOT have the disorder
True negatives/total negative results: d/(c+d)
Likelihood ratio for a positive test
sensitivity/(1-specificity)
Likelihood ratio for a negative test
(1-sensitivity)/specificity
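Worked example for the diagnostic accuracy cards above. A minimal sketch, assuming the usual 2x2 table lettering implied by the formulas (a = true positives, b = false positives, c = false negatives, d = true negatives); the counts are hypothetical.

```python
def diagnostic_accuracy(a, b, c, d):
    """Accuracy measures from a 2x2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical study: 100 subjects with the disorder, 200 without.
result = diagnostic_accuracy(a=90, b=30, c=10, d=170)
# sensitivity 0.90, specificity 0.85, PPV 0.75, NPV about 0.94,
# LR+ about 6.0, LR- about 0.12
```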
Pre-test probability
Probability that a person will have the disorder BEFORE test is done = prevalence
Pre-test odds
Odds that a person will have the disorder BEFORE test is done.
pre-test probability/(1-pre test probability)
Post-test odds
Odds that a person will have the disorder AFTER test is positive
pre-test odds x likelihood ratio for a positive result
Post-test probability of a positive test
Probability that the person will have the disorder AFTER positive result
post-test odds/(1+post-test odds)
Same as positive predictive value
Post-test probability of a negative result
Probability that the person will have the disorder AFTER negative result
1-NPV
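Worked example for the pre-test and post-test cards above. A minimal sketch that converts probability to odds, multiplies the pre-test odds by the likelihood ratio, and converts back; the function name and numbers are hypothetical (prevalence 1/3, sensitivity 0.90, specificity 0.85).

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Probability of the disorder after a test result with the given likelihood ratio."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


prevalence = 1 / 3               # pre-test probability
lr_positive = 0.90 / (1 - 0.85)  # sensitivity / (1 - specificity), about 6.0
lr_negative = (1 - 0.90) / 0.85  # (1 - sensitivity) / specificity, about 0.12

after_positive = post_test_probability(prevalence, lr_positive)  # about 0.75
after_negative = post_test_probability(prevalence, lr_negative)  # about 0.056
```

With these numbers the result after a positive test equals the positive predictive value (0.75) and the result after a negative test equals 1-NPV (about 0.056), which holds when the pre-test probability is the prevalence in the population the test was evaluated in.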
Cost-effectiveness analysis
Compares a number of interventions by relating costs to a single clinical measure of effectiveness (e.g. symptom reduction, improvement in activities of daily living).
Cost-benefit analysis
All the costs and benefits of each intervention are measured in terms of money, to establish which intervention has the greatest net benefit.
It requires that all the consequences of an intervention, such as life-years saved, treatment side-effects, symptom relief, disability, pain and discomfort, are allocated a monetary value.
Cost-utility analysis
A form of cost-effectiveness analysis (CEA) in which health benefits / outcomes are measured in broader, more generic ways, enabling comparisons between treatments for different diseases and conditions.
Multidimensional health outcomes are measured by a single preference- or utility-based index such as the QALY. Can compare treatments for different conditions.
Cost-minimisation analysis
An economic evaluation in which the consequences of competing interventions are the same, so only the inputs (that is, costs) are taken into consideration. The aim is to decide the least costly way of achieving the same outcome.