Biostatistics Flashcards
What is biostatistics/ the purpose
The collection and analysis of data (so statistics), except specifically related to understanding the effects of a drug or medical procedure on people and animals
It's used to understand medical and pharmacy journals and helps us answer clinical questions from patients and providers. Ex: on a question we should be able to determine if a drug is appropriate for a patient based on whether they meet the exclusion criteria for a study, or by considering relative risk
What is a study manuscript
A description of the research completed with the results
What is peer review
When a researcher sends their manuscript to a journal and the editor sends it to experts in the field, who assess the research design, the methods, the value of the results, the conclusion, how well it’s written, and whether it is appropriate/fitting for the journal. Reviewers recommend whether to accept it (usually with revisions) or to reject it.
List the steps to publication
Research Question
Design the Study
Enroll the Subjects
Collect the Data
Analyze the Data
Publish the Data
What is continuous data + two types
Data (usually numerical) that has a logical order with values that continuously increase or decrease by the same/a measurable amount
Two types are ratio data and interval data
What is ratio data
continuous type of data with an equal difference between the values and there IS a meaningful zero. ex: age, height, BP, weight - ex: zero blood pressure is meaningful because the pt would be dead
What is interval data
continuous type of data with equal difference between values but there is NO MEANINGFUL ZERO
ex: Celsius and Fahrenheit scales - a temp of zero doesn’t mean no temperature, so the zero is not meaningful; it just means it’s cold
What is discrete data and the two types
Categorical data
Two types: nominal and ordinal data
What is nominal data
It’s yes/no data. Data that goes into arbitrary categories (names) like male vs female, ethnicity, marital status, mortality
What is ordinal data
It is ranked and in logical order, such as a pain scale or NYHA functional class, but the categories do not increase by the same amount (a pain of 4 is higher than a 2, but that doesn’t mean it is twice the amount)
What are the measures of central tendency and when are they preferred for which data types
Mean (preferred for continuous data that is normally distributed)
Median (preferred for ordinal data or continuous data that is skewed)
Mode (nominal)
Describe standard deviation
How spread out the data is around the mean, reported as the mean +/- a certain amount (the SD). For normally distributed data:
68% of the data will fall within 1 SD of the mean
95% of the data will fall within 2 SD of the mean
99.7% of the data will fall within 3 SD of the mean
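The 68 / 95 / 99.7 figures can be checked with a minimal sketch using Python's standard-library statistics.NormalDist. The helper name share_within is made up for illustration, and the sketch assumes the data is perfectly normal.

```python
from statistics import NormalDist


def share_within(k_sd: float) -> float:
    """Fraction of a normal distribution lying within k_sd SDs of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k_sd) - z.cdf(-k_sd)


for k in (1, 2, 3):
    # Prints the share as a percentage; the flashcard values are rounded versions
    print(f"within {k} SD: {share_within(k):.2%}")
```

The exact shares are slightly different from the rounded numbers on the card, which is why "2 SD" and "95%" only approximately match.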
What is the range
The highest value - the lowest value
What is the mode
The value that occurs most frequently
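A quick sketch of mean, median, mode, and range using Python's standard-library statistics module. The values are hypothetical hospital stays in days, invented only for illustration.

```python
import statistics

# Hypothetical lengths of stay in days (not from any study)
days = [2, 3, 3, 4, 5, 5, 5, 7]

mean = statistics.mean(days)         # sum of values / number of values
median = statistics.median(days)     # middle value (average of the two middle values here)
mode = statistics.mode(days)         # value that occurs most frequently
value_range = max(days) - min(days)  # highest value - lowest value

print(mean, median, mode, value_range)
```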
What is a gaussian or “normal” distribution vs skewed data
It’s a bell curve that is normal and usually seen in continuous data with large sample sizes. The curve is symmetrical.
68% of the values fall within 1 SD of the mean and 95% of the values fall within 2 SD of the mean. You can use mean, median, or mode to describe your middle (they are about the same in a normal distribution), but the mean is preferred.
You lack normal distribution or have “skewed data” when the sample size is small or there are outliers in the data - when there is a small number of values, an outlier has a large impact on the mean. In these cases the median is a better indicator of central tendency. The graph skews in the direction of the outliers. Median is used to describe the middle for ordinal data too.
Negative skew = left skew
Positive skew = right skew
(skew refers to the tail of the data not the hump)
Distortion of central tendency can be fixed by collecting more values
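To see why the median holds up better than the mean in skewed data, here is a small sketch with made-up numbers. The lists small_sample and with_outlier are hypothetical.

```python
import statistics

# Hypothetical small sample, then the same sample with one high outlier
small_sample = [4, 5, 5, 6, 6]
with_outlier = small_sample + [40]

# The single outlier drags the mean upward (positive / right skew)...
mean_before = statistics.mean(small_sample)
mean_after = statistics.mean(with_outlier)

# ...while the median barely moves
median_before = statistics.median(small_sample)
median_after = statistics.median(with_outlier)

print(mean_before, mean_after)
print(median_before, median_after)
```

With only six values, one outlier roughly doubles the mean while the median shifts by half a unit.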
independent variable
Changed /manipulated by researcher
dependent variable
outcome
Null hypothesis
States that there is no statistically significant difference between groups. It’s written as H0 or “H-naught”
- The null hypothesis is what the researcher tries to disprove and the alternative hypothesis (Ha) is what they’ve made up, are testing, and trying to prove is acceptable as true.
what is an alpha level and what does it mean in relation to the p value
Represents the maximum acceptable margin of error, aka the tails on a normally distributed bell curve - The alpha level is determined by the investigator and usually set at 5% or 0.05. It can be set lower (which is stricter), but that requires more subjects, more treatment effect, and more data - aka more money.
If the alpha level is set at 5% and the p-value is < 0.05, we can reject the null hypothesis: the result is deemed statistically significant and the alternative hypothesis can be accepted
If the p-value is greater than or equal to alpha (p ≥ 0.05), the study failed to reject the null hypothesis and the result is not statistically significant
If alternative hypothesis is accepted, phrasing that can be used:
This means you reject the null hypothesis or fail to accept it
If null hypothesis is accepted, phrasing that can be used:
This means you accept the null hypothesis or fail to reject it
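The decision rule can be written as a tiny function. This is only a sketch; the name decide and the wording of the returned strings are invented for illustration.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value with the pre-set alpha level."""
    if p_value < alpha:
        return "reject the null hypothesis (statistically significant)"
    # p equal to alpha counts as NOT significant
    return "fail to reject the null hypothesis (not statistically significant)"


print(decide(0.03))               # p < alpha
print(decide(0.05))               # p == alpha
print(decide(0.03, alpha=0.01))   # stricter alpha
```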
The alpha level corresponds to the area in the tails of a normally distributed graph
If 95% of values are within about 2 SD of the mean, this corresponds to an alpha of 5%
If 99% of values are within about 2.6 SD of the mean, this corresponds to an alpha of 1% (the 99.7% within 3 SD corresponds to an alpha of 0.3%)
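The link between alpha and the tails can be computed directly with statistics.NormalDist. This is a sketch assuming a two-sided test on a perfectly normal curve; the helper names two_sided_cutoff and alpha_for_cutoff are hypothetical.

```python
from statistics import NormalDist


def two_sided_cutoff(alpha: float) -> float:
    """Number of SDs from the mean that leaves alpha split across both tails."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def alpha_for_cutoff(k_sd: float) -> float:
    """Total area in both tails beyond +/- k_sd standard deviations."""
    return 2 * (1 - NormalDist().cdf(k_sd))


for a in (0.05, 0.01):
    print(f"alpha {a}: cutoff at about {two_sided_cutoff(a):.2f} SD")

for k in (2, 3):
    print(f"{k} SD: tail area about {alpha_for_cutoff(k):.4f}")
```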
Formula for confidence interval (CI)
CI = 1- alpha
CI can be expressed as a range of values (ex: 95% CI 6%-34%)
This means that you are 95% confident that the true value of ____ for the general population lies somewhere between 6% and 34%. The narrower the range, the more precise; the wider, the less precise
if alpha is 0.05 and the p-value is greater than or equal to 0.05, what does this mean about the significance and the CI
the result is not statistically significant, and we cannot be 95% confident in the conclusion
if alpha is 0.05 and the p-value is less than 0.05, what does this mean about the significance and the CI
the confidence level is 95%, so the result is statistically significant and we are 95% confident the conclusion is correct; there is less than a 5% chance it’s not
if alpha is 0.01 and the p-value is less than 0.01, what does this mean about the significance and the CI
the confidence level is 99%, so the result is statistically significant and we are 99% confident the conclusion is correct. There is less than a 1% chance that it is not correct.
if alpha is 0.001 and the p-value is less than 0.001, what does this mean about the significance and the CI
the confidence level is 99.9%, so the result is statistically significant and we are 99.9% confident the conclusion is correct; there is less than a 0.1% chance that it’s not.
if on the exam they don’t provide p value but they need to you to determine if something is stat significant, how would you do it? **
The result is stat significant if the 95% CI range does not include zero (with difference data , ex: difference in FEV1 between roflumilast and placebo)
Key: look for subtraction of values or the word “difference”
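ex: Difference (95% CI) = 38 (18 to 58) is stat significant because the range does not include zero, vs 0.313 (-0.26 to 0.89), which is not stat significant because the range includes zero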
When comparing ratio data, how do we determine if the value is stat significant if we don’t have a pvalue
We need to check if the 95% CI range includes 1. If it does, the result is not stat sig, because a ratio of 1 means the two groups are too similar to tell apart. If it doesn’t, the result is stat sig.
This is true whether the ratio is a relative risk, odds ratio, or hazard ratio. For difference data, check for zero instead of 1.
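Both CI checks come down to asking whether the interval contains the "no effect" value (0 for a difference, 1 for a ratio). A minimal sketch follows; the function name ci_is_significant is made up, the first two intervals are the difference examples from these cards, and the ratio intervals are hypothetical.

```python
def ci_is_significant(lower: float, upper: float, null_value: float) -> bool:
    """True when the CI excludes the 'no effect' value (0 for differences, 1 for ratios)."""
    return not (lower <= null_value <= upper)


# Difference data: no-effect value is 0
print(ci_is_significant(18, 58, null_value=0))        # interval excludes 0
print(ci_is_significant(-0.26, 0.89, null_value=0))   # interval includes 0

# Ratio data (RR, OR, HR): no-effect value is 1; hypothetical intervals
print(ci_is_significant(0.65, 0.90, null_value=1))    # interval excludes 1
print(ci_is_significant(0.80, 1.15, null_value=1))    # interval includes 1
```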
Type 1 error meaning and equation to determine type 1 risk
When the null hypothesis was rejected in error (and alt. hypothesis accepted) when it should have been accepted (and alt. hypothesis rejected). When the results said there was a difference between the values and there actually wasn’t
The probability of making a type 1 error is the same as the alpha value:
probability of type 1 error = alpha
ex: alpha is 0.05, study result is reported with p<0.05, so it is stat. sig. and the probability of a type 1 error is <5%
Type two error and how do we know the probability of making a type two error
When the null hypothesis is accepted in error, when it should have been rejected. When there is a diff. between the two groups but the stats made it seem like there was not.
Beta is the probability of making a type 2 error.
Beta is set by investigators when designing the study, just like alpha is. Usually set at 0.1 or 0.2 (means the risk of type II error is 10% or 20%)
The risk of type two errors increases if the sample size is too small, but we can decrease the risk by using a power analysis to determine the sample size needed to detect a true difference between groups. Power is the probability of avoiding a type two error.
What is study power
The probability of avoiding a type two error aka the probability that a test will reject the null hypothesis correctly
Power= 1-Beta (which is the probability of a type two error)
The higher the power, the lower the risk of type 2 error.
Power is determined by the number of outcome values collected, the difference in outcome rates between the groups, and the significance (alpha level)
ex: if the beta is 0.2, then there is a 20% chance of missing a true difference and making a type 2 error, and the study power shows there is an 80% chance of avoiding a type 2 error. Similarly, if beta is 0.1, there is a 10% chance of a type 2 error/accepting the null in error, and a 90% chance of avoiding that mistake. We can always decrease the beta and increase the study power by increasing the sample size
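The power formula is a one-liner. A sketch, with the function name study_power chosen only for illustration:

```python
def study_power(beta: float) -> float:
    """Power = 1 - beta, where beta is the probability of a type 2 error."""
    if not 0 <= beta <= 1:
        raise ValueError("beta is a probability and must be between 0 and 1")
    return 1 - beta


for beta in (0.2, 0.1):
    print(f"beta {beta}: power {study_power(beta):.0%}")
```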
What is risk
the probability of an event occurring when a drug is given (can also determine risk of no intervention/placebo)
Risk = number of subjects with unfavorable event/ total number of subjects in group
What is the relative risk equation, and how do you interpret the value
The ratio of the risk in the exposed (treatment) group to the risk in the control group
RR = risk in the exposed group / risk in the control group, expressed as a % or decimal
The answer means that the patients treated with the drug (or independent variable) are ___% AS LIKELY to have progression of disease as placebo-treated patients
RR of 1 or 100% means there is no difference in risk of outcome between the placebo and the intervention group so the intervention had no effect
RR of > 1 or 100% means the intervention group has a higher risk of outcome than the placebo so increased risk of the endpoint
RR of < 1 or 100% means that the intervention group has a lower risk of the outcome than the placebo group, so decreased risk of the endpoint
ex: RR of 0.5 or 50% means the intervention group is 50% less likely to have the outcome than the control. RR of 1.5 or 150% means the intervention group is 50% more likely to have the outcome than the control
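Risk and relative risk can be worked through with a short sketch. The event counts are hypothetical (26 of 200 treated patients vs 48 of 200 control patients with the outcome), and the function names are invented for illustration.

```python
def risk(events: int, total: int) -> float:
    """Risk = subjects with the unfavorable event / total subjects in the group."""
    return events / total


def relative_risk(risk_treatment: float, risk_control: float) -> float:
    """RR = risk in the treatment (exposed) group / risk in the control group."""
    return risk_treatment / risk_control


# Hypothetical trial counts, not from any real study
risk_treatment = risk(26, 200)   # 13%
risk_control = risk(48, 200)     # 24%
rr = relative_risk(risk_treatment, risk_control)

print(f"RR = {rr:.2f}: treated patients are {rr:.0%} as likely to have the outcome")
```

An RR below 1 here means the treatment group had a lower risk of the outcome than the control group.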
What is RRR (relative risk reduction) and how is it used
We calculate relative risk reduction in order to understand further (after calculating risk and relative risk) how much the risk of the outcome is reduced in the treatment group compared to the control group
RRR = (% risk in control group - % risk in treatment group)/(% risk in control group) can use percent or decimal
OR RRR= 1- RR (decimal form only)
The answer tells us that the treatment group is ___% LESS LIKELY to have the determined unwanted outcome
Relationship between relative risk and relative risk reduction
RR = treatment “as likely” to cause the unwanted outcome as control
RRR = treatment “less likely” to cause unwanted outcome
RR+RRR = 100%
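Both RRR formulas give the same answer, and RR + RRR adds up to 1 (100%). A sketch with hypothetical risks of 24% in the control group and 13% in the treatment group; the function names are made up.

```python
import math


def rrr_from_risks(risk_control: float, risk_treatment: float) -> float:
    """RRR = (risk in control - risk in treatment) / risk in control."""
    return (risk_control - risk_treatment) / risk_control


def rrr_from_rr(rr: float) -> float:
    """RRR = 1 - RR (decimal form only)."""
    return 1 - rr


# Hypothetical risks in decimal form
risk_control, risk_treatment = 0.24, 0.13
rr = risk_treatment / risk_control

rrr_a = rrr_from_risks(risk_control, risk_treatment)
rrr_b = rrr_from_rr(rr)

print(f"RRR = {rrr_a:.0%}: treated patients are that much LESS LIKELY to have the outcome")
print(math.isclose(rrr_a, rrr_b), math.isclose(rr + rrr_a, 1.0))
```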
What are two key reportable points we want to make sure we see in a study as clinicians
RRR and ARR, because together they give us a clearer picture of the risk reduction and also the incidence rate (if the relative risk reduction is high but the absolute risk reduction is low, the true value of the drug in real-life patients is minimal)
ARR
absolute risk reduction:
= %risk in control group - % risk in treatment group
because it’s expressed as a percentage it can be viewed as out of 100.
Means that “for every 100 patients, _____ fewer patients wi
What clinical question can be answered by using number needed to treat vs. number needed to harm and what is the formula for both
“How many patients need to receive the drug for one patient to get benefit (NNT) or for one person to experience harm (NNH)?” This helps us understand the patient’s individual risk of taking the drug.
NNT = number of patients who need to be treated over a certain period of time (e.g. length of the study) in order for one patient to see a benefit (e.g. avoid HF progression)
NNT = 1/ (%risk in control - %risk in treatment) aka 1/ARR (decimal). Always round up for NNT no matter if 9.1 or 9.9 in order to avoid overestimating benefit
NNH = number of patients who need to be treated over certain period of time (e.g. length of study) in order for one patient to be harmed. (same formula as NNT)
always round down so that we don’t underestimate the risk of using the drug no matter if 9.9 or 9.1.
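The rounding rules are easy to get wrong, so here is a sketch of ARR, NNT, and NNH. It uses fractions.Fraction so that an ARR of exactly 10% gives an NNT of exactly 10 instead of being pushed to 11 by floating-point error. All counts are hypothetical and the function names are invented.

```python
import math
from fractions import Fraction


def nnt(control_events: int, control_total: int,
        treated_events: int, treated_total: int) -> int:
    """NNT = 1 / ARR, always rounded UP so benefit is not overestimated."""
    arr = Fraction(control_events, control_total) - Fraction(treated_events, treated_total)
    if arr <= 0:
        raise ValueError("no risk reduction in the treatment group")
    return math.ceil(1 / arr)


def nnh(control_events: int, control_total: int,
        treated_events: int, treated_total: int) -> int:
    """NNH = 1 / (risk in treatment - risk in control), always rounded DOWN."""
    increase = Fraction(treated_events, treated_total) - Fraction(control_events, control_total)
    if increase <= 0:
        raise ValueError("no risk increase in the treatment group")
    return math.floor(1 / increase)


# Hypothetical benefit outcome: 48/200 events on control vs 26/200 on treatment (ARR = 11%)
print(nnt(48, 200, 26, 200))   # 1 / 0.11 is about 9.09, rounded up

# Hypothetical harm outcome: 10/200 events on control vs 16/200 on treatment (increase = 3%)
print(nnh(10, 200, 16, 200))   # 1 / 0.03 is about 33.3, rounded down
```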
What does an odds ratio tell us
The odds of an unfavorable event associated with a treatment or intervention. Used mostly in case-control studies (because in case-control studies you cannot use the relative risk calculation), but also in cohort and cross-sectional studies
it calculates the odds of an outcome occurring with an exposure compared to it occurring without the exposure
ex: OR = 1.23 means the drug/exposure group has 23% higher odds of the unfavorable event than the no drug/no exposure group
what is a case control study
A study that enrolls patients who already have a clinical outcome or disease. The patients’ medical charts are reviewed retrospectively to look for possible exposures that increased the risk of them getting that disease or outcome
Odds ratio formula
OR = [(exposure present & outcome present) * (exposure absent & outcome absent)] / [(exposure absent & outcome present) * (exposure present & outcome absent)]
remember exposure and outcome, and then remember the numerator is present-present times absent-absent and the denominator is absent-present times present-absent
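The odds ratio formula can be sketched from the four cells of a 2x2 table. The counts are hypothetical and the parameter names are invented to mirror the present/absent wording above.

```python
def odds_ratio(exposed_with_outcome: int, exposed_without_outcome: int,
               unexposed_with_outcome: int, unexposed_without_outcome: int) -> float:
    """OR = (present-present * absent-absent) / (absent-present * present-absent)."""
    numerator = exposed_with_outcome * unexposed_without_outcome
    denominator = unexposed_with_outcome * exposed_without_outcome
    return numerator / denominator


# Hypothetical 2x2 table:
#              outcome present   outcome absent
# exposed            30                70
# unexposed          15                85
value = odds_ratio(30, 70, 15, 85)
print(f"OR = {value:.2f}")
```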
what is the hazard rate and hazard ratio and how are they used
in an analysis of death or disease progression (survival analysis) we use hazard rate instead of “risk” because the stakes are higher so the terminology is more intense.
Hazard rate is the rate at which an unfavorable event occurs within a short period of time. The hazard ratio uses the same form of formula as relative risk (RR)
Hazard rate = number of unfavorable events in group/total members of group
Hazard ratio = hazard rate in treatment group/hazard rate in control group
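A sketch of the two formulas as written on these cards, with hypothetical counts and made-up function names. Note this is the simplified events / total version; it is an assumption of this sketch that a single fixed follow-up period applies to everyone, whereas published survival analyses typically estimate hazards with time-to-event methods.

```python
def hazard_rate(events: int, total: int) -> float:
    """Simplified hazard rate: unfavorable events in group / total members of group."""
    return events / total


def hazard_ratio(rate_treatment: float, rate_control: float) -> float:
    """HR = hazard rate in treatment group / hazard rate in control group."""
    return rate_treatment / rate_control


# Hypothetical counts: 18 deaths of 150 on treatment, 30 deaths of 150 on control
hr = hazard_ratio(hazard_rate(18, 150), hazard_rate(30, 150))
print(f"HR = {hr:.2f}")
```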
how do we interpret the OR and the HR (odds ratio and hazard ratio)
OR or HR = 1; this means that there is no difference in rate of unfavorable outcome/primary endpoint between the treatment vs control group. Ex: OR or HR of 1 means there is an equal amount of death occurring in treatment and control group
OR or HR < 1; this means that there is a lower rate of the unfavorable event in the treatment group than the control group. ex: HR or OR of 0.5 means there are 50% fewer deaths in the treatment group than in the control group
OR or HR >1; this means there is a higher rate of unfavorable event (outcome/primary endpoint) in the treatment vs control group. ex: OR or HR of 2 so there are 2x as many deaths that occur with the treatment than with control