Biostats 2 Flashcards
Internal validity
extent to which a piece of evidence supports a claim about cause and effect, within the context of a particular study
That is, cause precedes the effect and they happen together, with little possibility for confounders – within the population
External validity
applies well to a population outside of the study
Least important of mean, median, mode?
Mode
quantifies the amount of variability, or spread, around the mean of the measurements.
Variance (σ²)
a measure of variation of scores about the mean
more commonly used
standard deviation
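A minimal sketch of how variance and standard deviation relate, using Python's standard `statistics` module. The measurements are hypothetical values chosen only so the arithmetic is easy to follow, and the population (not sample) formulas are assumed.

```python
import statistics

# Hypothetical measurements, chosen only for illustration.
measurements = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.fmean(measurements)           # arithmetic mean
variance = statistics.pvariance(measurements)   # population variance (sigma squared)
std_dev = statistics.pstdev(measurements)       # square root of the variance

print(f"mean={mean}, variance={variance}, standard deviation={std_dev}")
```

The standard deviation is more commonly reported because it is in the same units as the measurements, whereas the variance is in squared units.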
The statement that establishes a relationship between variables being assessed
Alternative hypothesis (Ha or H1)
The statement of no difference or no relationship between the variables
Null hypothesis (H0)
A ___ error is made if we reject the null hypothesis when the null hypothesis is true.
Type I
A ___ error is made if we fail to reject the null hypothesis when the null hypothesis is false.
Type II
More important than p value – a better determination of significance
Any statistic is simply an estimate of the true value of that statistic
Confidence interval (CI)
A 95% CI is commonly read as: we can be 95% confident that the “true” value lies within the CI range
Narrower CI is better
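As a rough sketch of why a narrower CI reflects a more precise estimate, the snippet below computes a 95% CI for a mean using the normal approximation (mean ± z × standard error). The function name `mean_confidence_interval` and the sample values are hypothetical, and the larger sample is just the small one repeated, which is artificial but keeps the spread the same.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `sample`."""
    n = len(sample)
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # z is about 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error

# Hypothetical data; the large sample repeats the small one ten times.
small_sample = [4.8, 5.1, 5.0, 5.3, 4.9, 5.2]
large_sample = small_sample * 10

small_low, small_high = mean_confidence_interval(small_sample)
large_low, large_high = mean_confidence_interval(large_sample)

print(f"n={len(small_sample)}: {small_low:.3f} to {small_high:.3f}")
print(f"n={len(large_sample)}: {large_low:.3f} to {large_high:.3f}")
```

With the same spread, the larger sample gives the narrower interval. For small samples a t-based interval would normally be preferred over this normal approximation.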
Odds ratio of 1
no association
A screening test is used to separate from a large group of apparently well persons those who have a high probability of having the disease, so that they may be given a diagnostic work up, and if diseased can be treated.
What are the conditions for a screening?
The target disease is an important cause of mortality and morbidity.
A proven and acceptable test exists to detect individuals at an early stage of disease.
There is a treatment available to prevent mortality and morbidity once positives have been identified.
Sensitivity
ability of test to correctly identify those who have the disease
Specificity
ability of the test to correctly identify those who DO NOT have the disease
Tends to rule OUT the disease
High Sensitivity means low probability of false negative
Sensitivity
Screening test’s ability to identify presence of disease
A test with high sensitivity will not miss many patients
who have the disease
A highly useful test when NEGATIVE
Sensitivity
Tends to rule IN the disease
High Specificity means low probability of false positive
Specificity
Screening test’s ability to truly identify absence of disease
That is, how likely is a person without the disease to get a negative test result?
A highly useful test when it is POSITIVE
Specificity
A highly ___ test is most useful to the clinician when it is NEGATIVE
sensitive
A highly ___ test is most useful to the clinician when it is POSITIVE
specific
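A small sketch of how sensitivity and specificity fall out of a 2×2 table of test results against true disease status. The counts and the function names are hypothetical, chosen only for illustration.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of diseased people the test correctly flags as positive."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Proportion of non-diseased people the test correctly reports as negative."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical screening results: 100 diseased and 200 non-diseased people.
tp, fn = 90, 10    # diseased: test positive / test negative
tn, fp = 160, 40   # non-diseased: test negative / test positive

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Here few diseased people are missed (10 false negatives), so a negative result is informative; the 40 false positives are what limit how much a positive result can be trusted.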
Allows us to calculate the net sensitivity and net specificity of using both tests in sequence. After completing both tests there is a net loss in sensitivity and a net gain in specificity.
Sequential (Two-Stage) Testing
proportion of patients with a positive test who actually HAVE the disease
POPULATION related
(e.g., HIV prevalence in suburban city in US vs. HIV prevalence in sub-Saharan Africa)
Positive Predictive Value (PPV)
With low prevalence (% of population) of disease:
Lower PPV
False positives increase
Less reliable positive test result
proportion of patients with a negative test who DO NOT HAVE the disease
Negative Predictive Value (NPV)
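To illustrate why predictive values depend on the population, here is a sketch that derives PPV and NPV from sensitivity, specificity, and prevalence. The function name and all numbers are hypothetical; the same test is applied to a low-prevalence and a higher-prevalence setting.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sens * prevalence
    false_neg = (1 - sens) * prevalence
    true_neg = spec * (1 - prevalence)
    false_pos = (1 - spec) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Same hypothetical test (90% sensitive, 90% specific) in two populations.
ppv_low, npv_low = predictive_values(0.90, 0.90, prevalence=0.01)
ppv_high, npv_high = predictive_values(0.90, 0.90, prevalence=0.20)

print(f"prevalence 1%:  PPV={ppv_low:.3f}, NPV={npv_low:.3f}")
print(f"prevalence 20%: PPV={ppv_high:.3f}, NPV={npv_high:.3f}")
```

At 1% prevalence most positive results are false positives (PPV is about 8%), even though the test itself has not changed.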
the rate of new occurrences of a disease in a population over a period of time
Incidence
Obtained from cohort studies
Must follow a cohort through time
the number of existing cases at one particular time
Prevalence
Obtained from cross-sectional studies
No time line, only a snap shot
Relationship between incidence and prevalence
slide 33
Allows the researcher to explore the relationship between two continuous variables
Regression analysis
A method of predicting change in the dependent variable by changing one or more independent variables
What % of variation in the dependent variable can be explained by a change in the independent variable
Regression analysis
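A minimal sketch of simple linear regression by least squares, including the proportion of variation explained (R²). The function name and the data points are hypothetical; only one independent variable is assumed.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; also returns R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Share of the variation in y explained by x.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical paired observations (independent variable x, dependent variable y).
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

slope, intercept, r_squared = simple_linear_regression(x_values, y_values)
print(f"y = {intercept:.1f} + {slope:.1f}x, R^2 = {r_squared:.2f}")
```

In this example each one-unit increase in x predicts a 0.6 increase in y, and x explains 60% of the variation in y.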
Two broad data types
Categorical
Continuous
Categorical
Nominal
Ordinal
Continuous
Interval
Ratio
named categories with no implied order
Nominal
sequenced or ranked data
Ordinal
E.g., smallest to largest, lightest to heaviest, easiest to most difficult
intervals along the scale are equal to one another (e.g., integers)
Interval
Continuous Data
characterized by the presence of absolute zero on the scale
Most precise
Ratio
Continuous Data
In ____ screening, a less expensive, less invasive, or less uncomfortable test is generally performed first, and those who screen positive are recalled for further testing with a more expensive, more invasive, or more uncomfortable test, which may have greater sensitivity and specificity. It is hoped that bringing back for further testing only those who screen positive will reduce the problem of false positives.
sequential or two-stage
Summarizes the same kind of information as sensitivity and specificity and can be used to calculate the probability of disease in a low prevalence setting
Likelihood ratio (LR)
provides indication of the test’s discriminatory power
Predictive values are lower with a low prevalence
can be defined for the entire range of test result values
Likelihood ratio (LR)
is the proportion of diseased people with a negative test result (1-sensitivity) divided by the proportion of non-diseased people with negative test results (specificity)
negative LR (LR-)
How good the test is at “Ruling out” disease
The smaller the better (Desirable: 0.2 or less)
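A sketch of the likelihood ratio calculation. The negative LR follows the definition above; the positive LR (sensitivity divided by 1 minus specificity) and the conversion from pre-test to post-test probability through odds are standard companions added here for context. Function names and numbers are hypothetical.

```python
def negative_lr(sens, spec):
    """(1 - sensitivity) / specificity: how well a negative result rules out disease."""
    return (1 - sens) / spec

def positive_lr(sens, spec):
    """sensitivity / (1 - specificity): how well a positive result rules in disease."""
    return sens / (1 - spec)

def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, multiply by the LR, and convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Hypothetical test: 90% sensitive, 80% specific; 10% pre-test probability.
lr_neg = negative_lr(0.90, 0.80)
lr_pos = positive_lr(0.90, 0.80)
after_negative = post_test_probability(0.10, lr_neg)

print(f"LR-={lr_neg:.3f}, LR+={lr_pos:.1f}, P(disease | negative)={after_negative:.3f}")
```

An LR- of 0.125 is below the 0.2 threshold mentioned above, and a negative result drops the probability of disease from 10% to roughly 1.4%.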
Loss in net sensitivity, and gain in net specificity
Sequential (Two-stage) Testing
Net gain in sensitivity, and net loss in specificity.
Patient is considered positive if they test positive on any test or both.
Patient is considered negative if they test negative on all the tests performed
Simultaneous Testing
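The trade-off between sequential and simultaneous testing can be sketched numerically. This assumes the two tests err independently of each other given true disease status, which real tests may not; the function names and test characteristics are hypothetical.

```python
def sequential_net(sens_a, spec_a, sens_b, spec_b):
    """Positive only if BOTH tests are positive (test B is given to test-A positives)."""
    net_sens = sens_a * sens_b
    net_spec = spec_a + spec_b - spec_a * spec_b
    return net_sens, net_spec

def simultaneous_net(sens_a, spec_a, sens_b, spec_b):
    """Positive if EITHER test is positive; negative only if both are negative."""
    net_sens = sens_a + sens_b - sens_a * sens_b
    net_spec = spec_a * spec_b
    return net_sens, net_spec

# Hypothetical tests: A is 80% sensitive / 60% specific, B is 90% / 90%.
seq_sens, seq_spec = sequential_net(0.80, 0.60, 0.90, 0.90)
sim_sens, sim_spec = simultaneous_net(0.80, 0.60, 0.90, 0.90)

print(f"sequential:   net sensitivity={seq_sens:.2f}, net specificity={seq_spec:.2f}")
print(f"simultaneous: net sensitivity={sim_sens:.2f}, net specificity={sim_spec:.2f}")
```

Sequential testing ends up less sensitive but more specific than either test alone, and simultaneous testing does the opposite.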
one of the most common ways to examine relationships between two or more categorical variables
Chi-square
tests the null hypothesis that the variables are independent of each other, that there is no relationship between the two variables
chi-square of independence
does not give any information about the strength of the relationship.
Chi-square statistic
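A sketch of how the chi-square statistic is computed from a contingency table: expected counts come from the row and column totals under independence, and the statistic sums (observed − expected)² / expected over all cells. The function name and the counts are hypothetical, and no continuity correction is applied.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical 2x2 tables: rows = exposed / unexposed, columns = outcome yes / no.
associated = [[30, 10], [20, 40]]
independent = [[10, 20], [20, 40]]

chi_associated = chi_square_statistic(associated)
chi_independent = chi_square_statistic(independent)
print(f"associated table: {chi_associated:.2f}, independent table: {chi_independent:.2f}")
```

For a 2×2 table (1 degree of freedom) a statistic above about 3.84 corresponds to p < 0.05. The statistic grows with sample size, which is why it does not indicate the strength of the relationship.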
Computed the same way as the chi-square test for independence, but instead tests the hypothesis that the distribution of some variable is the same in all populations
Chi-square test for equality of proportion
Is used to test the hypothesis that the distribution of a categorical variable within a population follows a specific pattern of proportions
Chi-square test of goodness of fit
Fisher’s exact test
A non-parametric test, similar to the chi-square tests, but can be used with small or sparsely distributed data sets
a type of chi-square test used when the data comes from paired samples.
McNemar’s Test for Matched Pairs
Measures the strength of association between an exposure and disease
the effect of one intervention vs. another
= (AD)/(BC)
Odds ratio (OR)
If exposure does not affect (either cause or protect from) disease, the OR is ____
If the exposure is ____ to the disease, the OR > 1
If the exposure is ___ against the disease, the OR < 1
~ 1
related
protective
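A short sketch of the (AD)/(BC) calculation. The cell layout is an assumption, since the cards do not spell it out: A = exposed with disease, B = exposed without disease, C = unexposed with disease, D = unexposed without disease. The counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a*d)/(b*c) for a 2x2 table.

    Assumed layout: a = exposed with disease, b = exposed without disease,
    c = unexposed with disease, d = unexposed without disease.
    """
    return (a * d) / (b * c)

# Hypothetical tables illustrating the three cases.
or_no_effect = odds_ratio(20, 80, 20, 80)    # exposure unrelated to disease
or_related = odds_ratio(30, 70, 10, 90)      # exposure related to disease
or_protective = odds_ratio(10, 90, 30, 70)   # exposure protective against disease

print(f"{or_no_effect:.2f}, {or_related:.2f}, {or_protective:.2f}")
```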
used to estimate the probability of a dichotomous outcome from a single or multiple predictor variables
Logistic Regression
a single outcome (or set of outcomes) from an experiment
Event
The proportion of subjects in a study group in whom the event is observed. Usually seen as a %.
Rate
A measure of how often a particular event (such as response to a drug, adverse event or death) occurs within the scientific control group of an experiment
Control Event Rate (CER) %
A measure of how often a particular event (such as response to a drug, adverse event or death) occurs within the experimental group of an experiment
Experimental Event Rate (EER) %
Basic risk statements express the likelihood that a particular event will occur within a particular population
Identifies what in our environment can lead to beneficial or adverse medical outcomes
Relative risk
SAME AS RISK RATIO
measures the magnitude of an association between an exposed and non-exposed (control) group.
calculated using cumulative incidence data to measure the probability of developing disease
Relative risk (same as Risk Ratio)
Must have incidence information to calculate
Cohort or clinical trials are conducted over time
The percentage difference in outcome between control (C) and experimental (E) groups
Relative risk reduction
RRR= (CER-EER)/CER
Not a good way to compare outcomes
Does not report the baseline risk of outcome
A measure such as percent reduction in mortality is often selected because it gives a more optimistic view of the effectiveness of a preventive measure.
Makes insignificant findings appear significant
Relative risk reduction
The actual reduction in events in the treated group (EER)
The arithmetic difference in outcomes between treatment and control groups
The “true difference” between the experimental and control intervention
Absolute risk reduction
ARR = CER - EER
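To show why RRR alone can mislead, this sketch applies the RRR and ARR formulas above to two hypothetical trials with very different baseline risks. The function name and event rates are made up for illustration.

```python
def risk_reductions(cer, eer):
    """Return (ARR, RRR) from control and experimental event rates (as proportions)."""
    arr = cer - eer      # absolute risk reduction
    rrr = arr / cer      # relative risk reduction
    return arr, rrr

# Hypothetical trials: a common outcome and a rare outcome.
arr_common, rrr_common = risk_reductions(cer=0.20, eer=0.15)
arr_rare, rrr_rare = risk_reductions(cer=0.002, eer=0.0015)

print(f"common outcome: ARR={arr_common:.4f}, RRR={rrr_common:.0%}")
print(f"rare outcome:   ARR={arr_rare:.4f}, RRR={rrr_rare:.0%}")
```

Both trials report the same 25% relative risk reduction, but the absolute benefit differs a hundredfold (5 per 100 versus 5 per 10,000), because RRR hides the baseline risk.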
Odds ratios are an ___ estimate of risk,
indirect
not a direct measure of risk
In a case-control study, only the odds ratio can be calculated as a measure of association, whereas in a ___ study, either the relative risk or odds ratio can be calculated.
cohort
Odds ratios calculated in a case-control study are a good approximation of relative risk in the population when the following conditions are met:
When cases studied are representative, with regard to history of exposure, of all people with the disease in the population from which the cases were drawn.
When the controls studied are representative, with regard to history of exposure, of all people without the disease in the population from which the controls were drawn.
When the disease being studied does not occur frequently.
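The last condition (a rare disease) can be checked numerically. The sketch below compares the odds ratio with the relative risk in two hypothetical cohorts that share the same relative risk; the function names, counts, and the cell layout (a = exposed with disease, b = exposed without, c = unexposed with, d = unexposed without) are assumptions for illustration.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of disease in the exposed divided by odds in the unexposed."""
    return (a * d) / (b * c)

# Hypothetical cohorts of 1,000 exposed and 1,000 unexposed people each.
rare = (2, 998, 1, 999)        # disease affects about 0.1% to 0.2% of people
common = (400, 600, 200, 800)  # disease affects 20% to 40% of people

rr_rare, or_rare = relative_risk(*rare), odds_ratio(*rare)
rr_common, or_common = relative_risk(*common), odds_ratio(*common)

print(f"rare disease:   RR={rr_rare:.3f}, OR={or_rare:.3f}")
print(f"common disease: RR={rr_common:.3f}, OR={or_common:.3f}")
```

When the disease is rare the two measures nearly coincide (2.000 versus 2.002); when it is common the odds ratio overstates the relative risk (2.667 versus 2.000).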
The number of patients who need to receive the new intervention instead of the standard alternative in order for one additional patient to benefit
Number needed to treat
Expresses the likelihood of the treatment to benefit an individual patient
There is NO absolute value for NNT that defines whether something is effective or not.
NNTs for treatments are usually low because we expect large effects in small numbers of people
Number needed to treat
NNTs for very effective treatments are usually in the range of 2 to 4
Rule of thumb:
NNT 10 or less for therapy
NNT 20 or less for prevention
Larger NNTs can still be useful where few patients are affected in large populations
Use for prophylactic measures
Example: Aspirin given after a myocardial infarction has an NNT of 40 to prevent one death at five weeks
When an experimental treatment is detrimental, the term number needed to harm (NNH) is often used.
The equations and approach are similar to those described above, except that NNH will have a negative absolute risk reduction
Number needed to harm
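NNT is conventionally computed as 1 / ARR and rounded up to a whole patient, with NNH as the same calculation when the ARR is negative. A minimal sketch follows; the function name and event rates are hypothetical (the second example simply picks rates that give an NNT of 40 and is not the actual trial data), and exact fractions are used to avoid floating-point rounding surprises.

```python
import math
from fractions import Fraction

def number_needed(cer, eer):
    """Return ("NNT", n) when treatment helps or ("NNH", n) when it harms."""
    arr = cer - eer                      # absolute risk reduction
    if arr == 0:
        raise ValueError("no difference between groups; NNT is undefined")
    label = "NNT" if arr > 0 else "NNH"  # a negative ARR means net harm
    return label, math.ceil(1 / abs(arr))

# Hypothetical event rates, expressed as exact fractions.
therapy = number_needed(Fraction(20, 100), Fraction(15, 100))      # ARR = 5%
prevention = number_needed(Fraction(100, 1000), Fraction(75, 1000))  # ARR = 2.5%
harmful = number_needed(Fraction(10, 100), Fraction(12, 100))      # ARR = -2%

print(therapy, prevention, harmful)
```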
Generally is used to analyze continuous data
Compares the means and standard deviations of two populations
Data must be normally distributed
Computes a p-value to test the null hypothesis
T-test
Assesses whether a difference between two groups’ averages is unlikely to have occurred because of random chance in sample selection. A difference is more likely to be meaningful and “real” if:
The difference between the averages is large.
The sample size is large.
Responses are consistently close to the average values and not widely spread out (the standard deviation is low).
T-test
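A sketch of the t statistic that underlies the test, showing how the spread of the responses changes the result. The unequal-variance (Welch) form is assumed here; the cards do not say which variant is meant. The function name and data are hypothetical, and converting t to a p-value (which needs the t distribution) is left out.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """t statistic: difference in means divided by its standard error."""
    mean_difference = statistics.fmean(group_a) - statistics.fmean(group_b)
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return mean_difference / standard_error

# Hypothetical groups: both pairs differ by 1.0 in their means.
tight_a = [5.1, 4.9, 5.0, 5.2, 4.8]    # responses close to the average
tight_b = [4.1, 3.9, 4.0, 4.2, 3.8]
spread_a = [7.0, 3.0, 5.0, 6.0, 4.0]   # same means, widely spread responses
spread_b = [6.0, 2.0, 4.0, 5.0, 3.0]

t_tight = welch_t(tight_a, tight_b)
t_spread = welch_t(spread_a, spread_b)
print(f"low SD: t={t_tight:.1f}, high SD: t={t_spread:.1f}")
```

The same difference in means gives a much larger t (about 10 versus about 1) when the standard deviation is low. A larger difference or a larger sample size would raise t in the same way.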
Independent variable
the variable whose effect you’re interested in (the predictor or exposure)
Prevalence
cross-sectional
prevalence = disease burden