Biostats All Flashcards
observation
also called a record; a row in a table of data. It represents one person
variable
a column in a table of data. It contains information about one characteristic of each person (e.g., race, gender, date of birth)
quantitative/continuous variables: examples and definitions
- ratio scale: an interval variable with a true zero point (height, BP, duration of illness, number of children)
- interval: values on a scale of equally spaced units with no true zero point (date of birth, temperature in °C or °F)
qualitative/categorical variables: examples and definitions
- nominal: values with no numerical ranking (place of residence). These can be dichotomous variables (alive/dead, smoker/non-smoker)
- ordinal: values that can be ranked, but the intervals between them are not necessarily equal (stage of cancer, education level, BMI category)
properties of frequency distributions are
- central location (where the distribution has its peak)
- spread (how widely it is dispersed on both sides of the peak)
- shape (is it symmetrical on both sides of the peak)
how do you describe the central location
mean
median
mode
how do you describe spread
range
interquartile range
standard deviation
when is a graph positively skewed
when its peak (central location) is to the left and its long tail extends to the right (aka skewed to the right); the mean is usually pulled above the median
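A quick way to see right skew is to compare the mean and the median. The sketch below uses Python's standard `statistics` module on a small made-up sample with one large value in the right tail; the data are purely illustrative.

```python
import statistics

# Hypothetical right-skewed sample: mostly small values, one value far out in the right tail
data = [1, 2, 2, 3, 3, 3, 4, 5, 20]

mean = statistics.mean(data)      # pulled toward the long tail
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

print(f"mean={mean:.2f} median={median} mode={mode}")
# In a positively skewed distribution the mean typically exceeds the median
print("mean > median:", mean > median)
```

The single value of 20 drags the mean up to about 4.78, while the median and mode stay at 3.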
what is the IQR
it represents the central 50% of the distribution, from the 25th to the 75th percentile (IQR = Q3 − Q1)
how to calculate standard deviation
Calculate the arithmetic mean.
Subtract the mean from each observation.
Square each difference.
Sum the squared differences.
Divide the sum of the squared differences by n − 1 (this is the sample variance).
Take the square root of the value obtained.
The result is the standard deviation.
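The steps above translate directly into a short function. This is a sketch on made-up data; the helper name `sample_sd` is illustrative, and the result is cross-checked against `statistics.stdev`, which also divides by n − 1. The intermediate `variance` also shows that SD is the square root of the variance.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation, following the steps listed above."""
    n = len(values)
    mean = sum(values) / n                     # 1. arithmetic mean
    deviations = [x - mean for x in values]    # 2. subtract the mean from each observation
    squared = [d ** 2 for d in deviations]     # 3. square each difference
    total = sum(squared)                       # 4. sum the squared differences
    variance = total / (n - 1)                 # 5. divide by n - 1 (sample variance)
    return math.sqrt(variance)                 # 6. square root gives the SD

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example data
sd = sample_sd(data)
print(round(sd, 3))
# Cross-check against the standard library
print(math.isclose(sd, statistics.stdev(data)))
```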
define range
The range of a set of data is the difference between its largest (maximum) value and its smallest (minimum) value.
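The range, and the IQR defined earlier, can be computed with the standard library. This sketch uses illustrative data; note that `method="inclusive"` is only one of several percentile conventions, so other software may report slightly different quartiles.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # illustrative data

data_range = max(data) - min(data)   # largest value minus smallest value

# Quartiles (Q1, median, Q3); the "inclusive" method is an assumed convention
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # spread of the middle 50% of the data

print(data_range, q1, q3, iqr)
```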
define probability
a measure of how likely it is that an event occurs (a value from 0 to 1)
define odds
ratio of the probability of having an event to the probability of not having an event: P/(1 − P)
relationship between probability and odds
probability and odds are more alike the lower the absolute P (risk); for rare events, odds ≈ probability
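Converting between probability and odds makes the relationship above concrete: at P = 0.01 the two are almost identical, while at P = 0.5 the odds are already 1. The function names here are illustrative.

```python
def prob_to_odds(p):
    """Odds = P / (1 - P)."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Inverse conversion: P = odds / (1 + odds)."""
    return odds / (1 + odds)

for p in (0.01, 0.05, 0.2, 0.5):
    print(f"P={p:<5} odds={prob_to_odds(p):.4f}")
```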
how to calculate risk and odds from a table
risk: events / all people in the group (events + non-events)
odds: events / non-events
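Using hypothetical counts for one group, the two formulas above differ only in the denominator:

```python
# Hypothetical counts for one group: 20 people had the event, 80 did not
events = 20
non_events = 80

risk = events / (events + non_events)  # events / everyone in the group
odds = events / non_events             # events / non-events

print(f"risk = {risk:.2f}, odds = {odds:.2f}")  # risk = 0.20, odds = 0.25
```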
proportion
a ratio in which the denominator includes the numerator
ratio
a number that expresses the relative size of two other numbers; the numerator is not included in the denominator
rate
occurrence of events over a specific time interval, i.e. a measure of the frequency of some phenomenon of interest per unit of time
prevalence
existing cases of a disease in a given population at a specific time (usually expressed as a proportion of that population)
incidence
number of new cases of a disease during a period / disease-free population at risk at the beginning of the period, i.e. the proportion of a population that acquires the disease during that period
incidence rate
new cases / total person-time at risk during observation
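The difference between incidence (a proportion) and incidence rate (per person-time) shows up in a small hypothetical cohort. The person-time figure is an assumption: it is less than 200 × 2 = 400 years because people who become cases (or drop out) stop contributing time at risk.

```python
# Hypothetical cohort: 200 disease-free people followed for up to 2 years
population_at_risk = 200
new_cases = 10
person_years = 380.0  # assumed total time at risk contributed by everyone

cumulative_incidence = new_cases / population_at_risk  # proportion (risk) over the 2 years
incidence_rate = new_cases / person_years              # cases per person-year

print(f"incidence (risk) = {cumulative_incidence:.3f} over 2 years")
print(f"incidence rate   = {incidence_rate * 1000:.1f} per 1000 person-years")
```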
prevalence tells you
probability of having the disease → burden
incidence tells you
probability of developing the disease → risk
risk ratio
risk in group 1 (group of interest) / risk in group 2 (comparison group)
rate ratio
compares the incidence rates (or mortality rates) of 2 groups: rate in group 1 / rate in group 2
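Both ratios follow the same pattern, group of interest over comparison group; only the denominator of each measure changes (people vs person-time). The function names and numbers below are hypothetical.

```python
def risk_ratio(cases_1, n_1, cases_2, n_2):
    """Risk in group 1 (group of interest) / risk in group 2 (comparison group)."""
    return (cases_1 / n_1) / (cases_2 / n_2)

def rate_ratio(cases_1, person_time_1, cases_2, person_time_2):
    """Incidence rate in group 1 / incidence rate in group 2."""
    return (cases_1 / person_time_1) / (cases_2 / person_time_2)

# Hypothetical numbers
print(round(risk_ratio(30, 100, 10, 100), 2))
print(round(rate_ratio(30, 450.0, 10, 500.0), 2))
```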
in a case-control study what can you measure
the odds ratio
in a prospective study (like cohort or randomized) what can you calculate
risk ratio, rate ratio, odds ratio
with IQR use
median
with standard deviation use
mean
standard deviation and variance
SD is the square root of variance
standard error of the mean is used to
calculate the confidence interval around a sample mean (SE = SD / √n)
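A sketch of how the SE feeds into a 95% confidence interval, on made-up measurements. It uses the normal-approximation multiplier 1.96; for a sample this small, a t multiplier (about 2.26 with 9 degrees of freedom) would be more appropriate.

```python
import math
import statistics

data = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9, 5.4, 5.0]  # made-up measurements

n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)   # sample SD (divides by n - 1)
se = sd / math.sqrt(n)        # standard error of the mean

# Normal-approximation 95% CI: mean +/- 1.96 * SE
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(f"mean={mean:.2f}, SE={se:.3f}, 95% CI=({lower:.2f}, {upper:.2f})")
```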
list the hierarchy of evidence, from least to most
case report → case series → ecological studies → cross-sectional studies → case-control studies → cohort studies → randomised controlled trials
define clinical trial
a prospective study comparing the value of an intervention against a control. An investigator ASSIGNS which people get the drug (treatment group) and which get a placebo (comparison group)
simple randomised trial
patients are randomised to two treatments without considering their characteristics. It is simple and useful when prognostic factors are unknown
stratified randomised design
used when prognostic factors are known: patients are grouped into prognostic categories (strata), and within each stratum they are randomly assigned to treatments
crossover design. Advantages? When to use? Disadvantages?
patients serve as their own controls: e.g., give them the treatment for 6 weeks, then no treatment for 6 weeks, and compare the two periods.
- WHEN: use only for chronic diseases
- ADV: each patient is their own control, so comparisons are not affected by differences between patients
- DIS: potential carryover effects of the drug
factorial design
used to ask two or more questions in the same clinical trial.
e.g., 2 treatments are studied for their effect on response, each given at 2 levels (a 2 × 2 factorial design with 4 groups).
what do you consider when choosing a sample size
funding, ethics, eligibility criteria.
- must include an adequate number of individuals
- consider the anticipated difference between the groups, the background rate of the outcome, and the probability of making statistical errors
- type I and type II error rates (α and β)
- the smallest difference between treatments that is worth detecting
- the variance of the outcome
- Smaller anticipated differences between treatment and comparison groups require larger sample sizes
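One common normal-approximation formula for comparing two means is n per group = 2(z₁₋α/₂ + z₁₋β)²σ²/Δ². The sketch below assumes a two-sided test, equal group sizes, and a known SD; the function name is illustrative. Halving the anticipated difference Δ roughly quadruples the required sample size.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means
    (two-sided test, equal groups, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)                            # round up to whole people

# Hypothetical outcome with SD = 10 units
for delta in (10, 5, 2.5):
    print(f"difference {delta}: {n_per_group(delta, sd=10)} per group")
```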
define randomization. Why would you do it? How can you randomise? Benefit of masking and blinding? Issues with follow up?
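assigning or ordering things via a random process, to remove or reduce bias.
- methods: coin toss, table of random numbers, stratified block randomization
- masking/blinding: participants and/or investigators do not know who receives which treatment, which reduces bias in how outcomes are reported and assessed
- follow-up issues: dropouts, loss to follow-up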
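The randomization methods above can be sketched with Python's `random` module. Simple randomization is the software equivalent of a coin toss; permuted-block randomization keeps the arms balanced after every block; stratified block randomization runs a separate block sequence within each stratum. The block size of 4, the seed, and the strata are assumptions for illustration.

```python
import random

def simple_randomisation(n_patients, rng):
    """Each patient independently assigned to A or B, like a coin toss."""
    return [rng.choice("AB") for _ in range(n_patients)]

def block_randomisation(n_patients, rng, block_size=4):
    """Permuted blocks: each block holds equal numbers of A and B,
    so group sizes stay balanced as patients are enrolled."""
    allocation = []
    while len(allocation) < n_patients:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_patients]

rng = random.Random(42)  # fixed seed so the list can be reproduced and audited
print("simple:", "".join(simple_randomisation(12, rng)))
print("blocks:", "".join(block_randomisation(12, rng)))

# Stratified block randomisation: a separate block sequence per stratum
strata = {"age<60": 8, "age>=60": 8}  # hypothetical strata and sizes
for stratum, size in strata.items():
    print(stratum, "".join(block_randomisation(size, rng)))
```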
compliance and non-compliance: why, and consequences
non-compliance: failure to follow the requirements of the protocol
- reasons: toxic reactions to treatment, waning interest, desire to seek other therapies
- consequences: the observed difference between treatment and comparison groups is smaller than the true difference
- how to prevent: keep the study regimen simple, enrol motivated people, make sure they know what they are required to do, contact them frequently throughout the study
define interim analysis
analysis comparing intervention groups at any time before the formal completion of the trial. Used to stop the trial early if patients are being put at unnecessary risk
intention to treat vs per-protocol analysis
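ITT reflects real-life use: results use data from all randomised subjects, analysed in the groups they were assigned to. Advantage: preserves randomization. Disadvantage: does not determine the maximum potential effectiveness of a treatment
PPA: results use data only from subjects who followed the protocol. Advantage: evaluates the maximum benefit of a treatment. Disadvantage: excluding non-compliers can break randomization and introduce bias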
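The two analyses differ only in which records are kept, as this toy example shows. The trial records are invented. ITT keeps everyone in their randomised arm; per-protocol drops those who did not follow the protocol.

```python
# Hypothetical trial records: (assigned arm, followed protocol?, recovered?)
patients = [
    ("drug", True, True), ("drug", True, True), ("drug", True, False),
    ("drug", False, False), ("drug", False, False),
    ("placebo", True, True), ("placebo", True, False), ("placebo", True, False),
    ("placebo", False, False), ("placebo", True, False),
]

def recovery_rate(rows):
    """Proportion of rows with recovered == True."""
    return sum(recovered for _, _, recovered in rows) / len(rows)

for arm in ("drug", "placebo"):
    itt = [p for p in patients if p[0] == arm]  # ITT: everyone as randomised
    pp = [p for p in itt if p[1]]               # per-protocol: compliers only
    print(f"{arm:8} ITT={recovery_rate(itt):.2f}  PP={recovery_rate(pp):.2f}")
```

In this made-up data the drug-placebo difference is 0.20 under ITT but about 0.42 per-protocol, illustrating why PPA estimates the maximum benefit while ITT reflects real-life use.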
main types of clinical trials
- prevention trials
- screening trials
- diagnostic trials
- treatment trials
- QOL trials
- compassionate use trials
what is phase 4 of clinical trials, and why do we do them
also called post-marketing studies; done after a drug is on the market to get more information (e.g., long-term side effects, as with thalidomide)
difference between descriptive and analytical studies
descriptive: who/where/when
analytical: why; either observational (cohort, case-control) or experimental. Descriptive studies are used to generate a hypothesis, which is then tested with analytical studies
ecological study (population level; a descriptive study)
examines rates of disease in relation to a factor measured at the population level (an aggregate, environmental, or global measure).
They are quick, cheap and easy to understand.
cross-sectional study (individual level; a descriptive study)
takes a snapshot of a population at one point in time and measures disease prevalence in relation to exposure. Used for public health planning and etiological research.
They are cheap and generalisable, but cannot establish temporal sequence.
how to do a cohort study. What would you calculate from it? When do you use it? How do you choose a cohort?
take a population, sample people without the disease, find out who was exposed and who was not, and follow them to see who develops the disease and who doesn't.
- calculate measures of frequency: incidence (risk), incidence rate, attack rate (outbreaks)
- WHEN: to find out whether an exposure is associated with an outcome, by estimating the risk of the outcome in the exposed and unexposed cohorts
- choosing a cohort: members must be alive, be at risk of the outcome, and be free of the outcome at the start of the study
types of cohort studies. Absolute measures and relative measures in cohort studies? Why are rate and risk ratio sometimes different? Which one do we trust?
prospective (study starts before the disease occurs), retrospective (study starts after the disease has occurred), or a combination of both
- absolute: incidence difference
- relative: rate ratio, risk ratio
- they differ when follow-up times are not equal between the 2 groups. In that case do NOT trust the risk ratio; use the rate ratio
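A hypothetical cohort makes the point about unequal follow-up concrete. Both groups have 20 cases per 100 people, so the risk ratio is 1.0. However, the unexposed group was followed for twice as long, so the rate ratio, which accounts for person-time, is 2.0.

```python
# Hypothetical cohort with unequal follow-up between groups
exposed = {"cases": 20, "people": 100, "person_years": 200.0}
unexposed = {"cases": 20, "people": 100, "person_years": 400.0}

def risk(g):
    return g["cases"] / g["people"]          # ignores time at risk

def rate(g):
    return g["cases"] / g["person_years"]    # cases per person-year

risk_ratio = risk(exposed) / risk(unexposed)
rate_ratio = rate(exposed) / rate(unexposed)

print(f"risk ratio = {risk_ratio:.1f}, rate ratio = {rate_ratio:.1f}")
```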
advantages of a cohort study. Disadvantages
- ADV: temporal relationship can be inferred, disease incidence can be measured directly, rare exposures can be examined, multiple outcomes can be studied, less vulnerable to bias
- DIS: long, expensive, inefficient for rare outcomes, multiple exposures are difficult to assess, not suitable for diseases with long latency, exposure may change over time
how to perform a case-control study. WHEN to use? ADV? DIS? What can you calculate here?
choose cases and then controls (from the population that gave rise to the cases), then find out who was exposed and who wasn't (e.g., a questionnaire on frequency of exposure). These studies are retrospective.
- WHEN: exposure data are expensive, the disease has a long latent period, the disease is rare, little is known about the disease, the population is dynamic
- ADV: cheap, easy, quick, multiple exposures can be examined, rare diseases and diseases with long latency can be studied
- DIS: prone to bias, incidence cannot be estimated directly, temporal relationship unclear, multiple outcomes cannot be studied, inefficient for rare exposures
- odds ratio = ad/bc (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls)
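With a hypothetical 2 × 2 table, the odds ratio can be computed as the ratio of exposure odds, cases vs controls. This is algebraically identical to the cross-product ad/bc.

```python
# Hypothetical case-control 2 x 2 table
#                cases    controls
# exposed        a=40     b=60
# unexposed      c=20     d=80
a, b, c, d = 40, 60, 20, 80

odds_exposure_cases = a / c      # odds that cases were exposed
odds_exposure_controls = b / d   # odds that controls were exposed
odds_ratio = odds_exposure_cases / odds_exposure_controls

# Same value as the cross-product formula ad/bc
assert abs(odds_ratio - (a * d) / (b * c)) < 1e-12
print(f"OR = {odds_ratio:.2f}")  # OR = 2.67
```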
Types of case-control studies
they differ in how the controls are selected:
- general population controls
- hospital controls
- special controls (friends, spouses, siblings)
if a measure of association (risk/odds ratio) is >1 what does it mean? <1?
- >1: positive association
- <1: inverse/protective association
- =1: no (neutral) association
statistical inference
when you measure properties of a sample (e.g., mean and SD) and use these values to infer the properties of the entire population
steps for hypothesis testing
1- state the null and alternative hypotheses
2- calculate the test statistic
3- specify the significance level
4- determine the P value
5- make the statistical inference
if the P value is >0.05 (5%), what do we do
we fail to reject the null hypothesis (this does not prove that the null hypothesis is true)
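The five steps can be walked through with a one-sample z-test. This is a sketch on hypothetical summary numbers that assumes the population SD is known. When it is not known, the one-sample t test listed further down would be used instead; a z-test is chosen here only because the standard library provides the normal distribution.

```python
from statistics import NormalDist

# 1- H0: population mean = 100; H1: population mean != 100 (two-sided)
mu_0 = 100
# Hypothetical sample summary; sigma assumed known
sample_mean, sigma, n = 103.0, 15.0, 100

# 2- test statistic
z = (sample_mean - mu_0) / (sigma / n ** 0.5)

# 3- significance level
alpha = 0.05

# 4- two-sided P value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 5- statistical inference
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, P = {p_value:.4f}: {decision}")
```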
alpha and beta errors and power of study
alpha (type I error): probability of a false positive result; usually 5%, the significance level
beta (type II error): probability of a false negative result; usually 10-20%
power of study: 1 − β (usually 80-90%)
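Power can be approximated for a two-sample comparison of means with the normal approximation: power ≈ Φ(Δ / (σ√(2/n)) − z₁₋α/₂), ignoring the negligible opposite tail. The difference, SD, and group sizes below are hypothetical. Power rises, and β falls, as the sample size grows.

```python
import math
from statistics import NormalDist

def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided, two-sample comparison
    of means, using the normal approximation and ignoring the far tail."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se_diff = sd * math.sqrt(2 / n_per_group)  # SE of the difference in means
    return NormalDist().cdf(delta / se_diff - z_alpha)

# Hypothetical: true difference 5 units, SD 10
for n in (20, 63, 100):
    power = approx_power(5, 10, n)
    print(f"n={n:3} per group: power={power:.2f}, beta={1 - power:.2f}")
```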
how to choose a test based on the study (continuous variables, e.g., height/weight/BMI)
- one-sample t test: tests the null hypothesis that the mean of a population equals a constant value (non-parametric version: sign test)
- two-sample t test: compares outcomes of 2 independent samples (non-parametric version: Mann-Whitney U test)
- paired t test: compares 2 non-independent (paired) samples (non-parametric version: Wilcoxon signed-rank test)
(one-sample: compare a group with a constant value; two-sample and paired: compare means between 2 groups)
tests to use for categorical data (a comparison of proportions, e.g., mortality rates)
use the chi-squared test
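A minimal Pearson chi-squared test for a 2 × 2 table, without continuity correction, is sketched below. With 1 degree of freedom, the chi-squared statistic is the square of a standard normal variable, so the P value can be obtained from `NormalDist`. The counts are hypothetical. Libraries such as SciPy (`scipy.stats.chi2_contingency`) apply Yates' correction to 2 × 2 tables by default, so their results can differ slightly.

```python
import math
from statistics import NormalDist

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for a 2 x 2 table
    [[a, b], [c, d]]. Returns (statistic, P value) with 1 degree of freedom."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    total = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, chi-squared equals Z squared, so P = 2 * (1 - Phi(sqrt(chi2)))
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    return chi2, p_value

# Hypothetical deaths / survivors: group 1 = 40 / 60, group 2 = 20 / 80
chi2, p = chi_squared_2x2(40, 60, 20, 80)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```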
correlation. What tests to use?
explores the association between two continuous variables. Can be positive or negative, strong or weak.
- if data follow a normal distribution: Pearson's correlation
- if data are not normal: Spearman's rank correlation
Both coefficients take values from −1 to +1
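Spearman's rank correlation is Pearson's correlation computed on the ranks of the data, which is why it captures any monotonic relationship, not just linear ones. This is a self-contained sketch with illustrative data and helper names. Tied values receive their average rank.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation = Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear
print(f"Pearson r = {pearson(x, y):.3f}, Spearman rho = {spearman(x, y):.3f}")
```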
confidence interval
gives a measure of the precision of a result from a sample. A 95% CI gives the range of values which we can be 95% confident includes the true value
The P value only measures the strength of evidence against the null hypothesis
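As an example of a CI expressing precision, here is a normal-approximation (Wald) 95% CI for a proportion, using hypothetical counts. The Wald interval is only one of several methods and is less reliable for small samples or proportions near 0 or 1. A larger n gives a smaller SE and therefore a narrower, more precise interval.

```python
import math

# Hypothetical sample: 40 of 200 patients died
deaths, n = 40, 200
p_hat = deaths / n
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of a proportion

# Wald 95% CI: p_hat +/- 1.96 * SE
lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"{p_hat:.2f} (95% CI {lower:.3f} to {upper:.3f})")
```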
bias
any systematic error in the design or the conduct of an epidemiological study resulting in a conclusion which is different from the truth
random error
reflects the amount of variability in the data due to chance
main types of bias
- selection bias: arises from how participants are chosen for the study, so results may not apply to the population outside the study (e.g., healthy worker effect)
- information bias: happens when researchers are unable to collect accurate data (e.g., recall bias, measurement bias)
- confounding: when the effects of two exposures have not been separated, and the analysis concludes that the effect is due to one variable rather than the other
how to fix bias
- randomization
- restriction
- matching: people in the case group and control group are matched on characteristics (e.g., age, sex)
- stratification: dividing subjects into subgroups (strata) by a characteristic and analysing each stratum separately
- statistical modeling
how to calculate sensitivity
the probability that the test is positive in people who have the disease: TP / (TP + FN)
how to calculate specificity
the probability that the test is negative in people who do not have the disease: TN / (TN + FP)
positive and negative predictive values
Positive: how likely is someone with a positive test result to actually have the characteristic? TP / (TP + FP)
Negative: how likely is someone with a negative test result to actually not have the characteristic? TN / (TN + FN)
note that sensitivity and specificity are characteristics of a test and do not change with prevalence, whereas positive and negative predictive values change according to the prevalence of the disease
how do we use GRADE
it is a system for rating the quality (certainty) of a body of evidence
odds ratio
- in cohort studies: odds of disease in the exposed group / odds of disease in the unexposed group (looking at occurrence)
- in case-control studies: odds that the cases were exposed / odds that the controls were exposed (looking at exposure)
relative risk in cohort studies is
risk of disease in exposed / risk of disease in unexposed
meta analysis
statistically combines results from all relevant studies to increase the power to detect a small but clinically significant effect
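One common way to pool studies is the inverse-variance fixed-effect method, sketched below on hypothetical log risk ratios. This is an assumed choice for illustration; random-effects models are the usual alternative when studies differ substantially. Because weights add up, the pooled SE is smaller than that of any single study, and this is where the extra statistical power comes from.

```python
import math

def fixed_effect_pool(estimates, standard_errors):
    """Inverse-variance fixed-effect pooling of study estimates
    (e.g., log risk ratios). Returns (pooled estimate, pooled SE)."""
    weights = [1 / se ** 2 for se in standard_errors]  # more precise studies weigh more
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical log risk ratios and SEs from three small trials
log_rr = [0.20, 0.35, 0.10]
ses = [0.15, 0.20, 0.25]
pooled, pooled_se = fixed_effect_pool(log_rr, ses)
print(f"pooled RR = {math.exp(pooled):.2f}, pooled SE = {pooled_se:.3f} "
      f"(smallest single-study SE = {min(ses)})")
```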
narrative review
conducted by experts; prone to bias
systematic review
minimises bias; conducted by researchers using a systematic, predefined method