Biostatistics Flashcards
What is statistics?
Statistics involves the collection and analysis of all types of data
What is biostatistical analysis or biostatistics?
When statistics are used to understand the effects of a drug or medical procedure on people and animals, the statistical analysis is called biostatistical analysis or biostatistics
What is the path to publication for the classic type of research study?
Begin with a research question, design the study, enroll the subjects, collect the data, analyze the data and publish
What happens after a study manuscript is written?
A study manuscript can be submitted for publication in a professional, peer-reviewed journal. The editor of the journal selects potential publications and sends them to experts in the topic area for peer review
What is the intention of peer review?
Peer review is intended to assess the research design and methods, the value of the results and conclusions to the field of study, how well the manuscript is written and whether it is appropriate for the readership of the journal
What is the potential impact of a peer reviewed study manuscript?
The reviewers make a recommendation to the editor to either accept the article (usually with revisions) or reject it. Data that contradicts a previous recommendation, or presents new information, can change treatment guidelines
Describe the organization of a published clinical trial
A published clinical trial begins with an abstract that provides a brief summary of the article. The introduction to the study comes next, which includes background information, such as disease history and prevalence, and the research hypothesis. This is followed by the study methods, which describe the variables and outcomes, and the statistical methods used to analyze the data. The results section includes figures, tables and graphs. A reader needs to interpret basic statistics and common graphs in order to understand the study results. The researchers conclude the article with an interpretation of the results and the implications for current practice
What is continuous data?
Continuous data has a logical order with values that continuously increase (or decrease) by the same amount. Data is provided by some type of measurement which has unlimited options (theoretically) of continuous values
What are the two types of continuous data and what is the difference between them?
The two types of continuous data are interval data and ratio data. The difference between them is that interval data has no meaningful zero (zero does not equal none) and ratio data has a meaningful zero (zero equals none)
What is discrete (categorical) data and what are the two types?
The two types of discrete data, nominal and ordinal, have categories, and are sometimes called categorical data. Data fits into a limited number of categories
What is nominal data?
With nominal data, subjects are sorted into arbitrary categories and order of categories does not matter
What is ordinal data?
Ordinal data is ranked and has a logical order. Ordinal scale categories do not increase by the same amount
What are descriptive statistics and what are the typical descriptive values?
Descriptive statistics provide simple summaries of the data. The typical descriptive values are called the measures of central tendency, and include the mean, the median and the mode
What is the mean?
The mean is the average value and is calculated by adding up the values and dividing the sum by the number of values. The mean is preferred for continuous data that is normally distributed
What is the median?
The median is the value in the middle when the values are arranged from lowest to highest. When there are two center values (as with an even number of values), take the average of the two center values. The median is preferred for ordinal data or continuous data that is skewed (not normally distributed)
What is the mode?
The mode is the value that occurs most frequently. The mode is preferred for nominal data
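*Example: a minimal Python sketch of the three measures above, using the standard-library statistics module. The readings are made-up values for illustration, not data from any study.

```python
from statistics import mean, median, mode

# Hypothetical systolic blood pressure readings (mmHg), invented for illustration
readings = [118, 120, 120, 124, 130, 142]

sample_mean = mean(readings)      # sum of the values divided by the number of values
sample_median = median(readings)  # even count, so the two center values are averaged
sample_mode = mode(readings)      # the value that occurs most frequently

print(f"mean={sample_mean:.1f}, median={sample_median}, mode={sample_mode}")
```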
What are the two common methods of describing the variability?
Range and standard deviation
What is the range?
The range is the difference between the highest and lowest values
What is the standard deviation?
Standard deviation (SD) indicates how spread out the data is, and to what degree the data is dispersed away from the mean. Data with a large number of values close to the mean has a smaller SD. Data that is highly dispersed has a larger SD
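*Example: a short Python sketch comparing the range and SD of two made-up data sets that share the same mean. The helper name value_range is hypothetical, and the use of the sample SD (dividing by n - 1) is an assumption, since the cards do not say which form is meant.

```python
from statistics import stdev

# Two hypothetical data sets with the same mean (100) but different spread
clustered = [98, 99, 100, 101, 102]
dispersed = [80, 90, 100, 110, 120]

def value_range(data):
    """Difference between the highest and lowest values."""
    return max(data) - min(data)

for name, data in (("clustered", clustered), ("dispersed", dispersed)):
    # stdev() is the sample SD (divides by n - 1); that choice is an assumption here
    print(name, "range:", value_range(data), "SD:", round(stdev(data), 2))
```

The clustered set should show both a smaller range and a smaller SD than the dispersed set.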
What do large sample sets of continuous data tend to form?
A Gaussian or “normal” (bell-shaped) distribution
Describe the curve of the Gaussian distribution when the distribution of data is normal.
When the distribution of data is normal, the curve is symmetrical (even on both sides), with most of the values closer to the middle. Half of the values are on the left side of the curve, and half of the values are on the right side. A small number of values are in the tails.
Describe the data when it follows a normal (Gaussian) distribution.
- The mean, median and mode are the same value and are at the center point of the curve
- 68% of the values fall within 1 SD of the mean and 95% of the values fall within 2 SDs of the mean
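*Example: the 68% and 95% figures above can be checked with a small Python sketch using statistics.NormalDist from the standard library. It uses a standard normal curve (mean 0, SD 1) as an assumption. The proportions are the same for any normal distribution.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, SD 1 (chosen for convenience)
std_normal = NormalDist(mu=0, sigma=1)

# Share of values between -1 SD and +1 SD, and between -2 SD and +2 SD
within_1_sd = std_normal.cdf(1) - std_normal.cdf(-1)
within_2_sd = std_normal.cdf(2) - std_normal.cdf(-2)

print(f"within 1 SD: {within_1_sd:.1%}, within 2 SDs: {within_2_sd:.1%}")
```

The exact shares are slightly above the rounded 68% and 95% used on the card.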
Describe how the curve of normally distributed data changes based on the spread (or range) of the data
The curve gets taller and skinnier as the range of data narrows. The curve gets shorter and wider as the range of data widens (or is more spread out).
What happens when the data is skewed?
Data that are skewed do not have the characteristics of a normal distribution; the curve is not symmetrical, 68% of the values do not fall within 1 SD from the mean and the mean, median and mode are not the same value
*This usually occurs when the number of values (sample size) is small and/or there are outliers in the data
What is an outlier?
An outlier is an extreme value, either very low or very high, compared to the norm
*When there are a small number of values, an outlier has a large impact on the mean and the data becomes skewed
What is the best measure of central tendency when there are outliers?
In this case, the median is a better measure of central tendency
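*Example: a minimal Python sketch, using made-up values, of how a single outlier in a small sample pulls the mean while leaving the median unchanged.

```python
from statistics import mean, median

# Hypothetical small sample, with and without one extreme high value
without_outlier = [4, 5, 5, 6, 7]
with_outlier = [4, 5, 5, 6, 70]

# The mean shifts toward the outlier; the median stays at the middle value
print("mean:", mean(without_outlier), "->", mean(with_outlier))
print("median:", median(without_outlier), "->", median(with_outlier))
```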
Describe what it means when the data is skewed to the left or the right.
Data is skewed towards outliers. When there are more low values in a data set and the outliers are the high values, data is skewed to the right (positive skew). When there are more high values in the data set and the outliers are the low values, the data is skewed to the left (negative skew)
What is a variable?
A variable in a study is any data point or characteristic that can be measured or counted
What is an independent variable and dependent variable?
An independent variable is changed (manipulated) by the researcher in order to determine whether it has an effect on the dependent variable. The dependent variable is the outcome that is measured to see whether it is affected
How does a trial show that a product is significantly better than the current treatment or placebo?
To show significance, the trial needs to demonstrate that the null hypothesis is not true and should be rejected, and the alternative hypothesis can be accepted
What is a null hypothesis?
A null hypothesis states that there is no statistically significant difference between groups. The null hypothesis is what the researcher tries to disprove or reject
What is the alternative hypothesis?
The alternative hypothesis states that there is a statistically significant difference between the groups. The alternative hypothesis is what the researcher hopes to prove or accept
What is the alpha level?
Alpha is the maximum permissible error margin. Alpha is the threshold for rejecting the null hypothesis
What is the most common alpha in medical research?
In medical research, alpha is commonly set at 5% (or 0.05)
If a smaller alpha value is chosen, what is required?
A smaller alpha value can be chosen, but this requires more data, more subjects (which means more expense) and/or a larger treatment effect
What is the p-value and how is it important?
The p-value is compared to alpha. If alpha is set at 0.05, and the p-value is less than 0.05, the null hypothesis is rejected, and the result is termed statistically significant. If the p-value is greater than or equal to alpha (p ≥ 0.05), the study has failed to reject the null hypothesis, and the result is not statistically significant
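*Example: the decision rule above, written as a small Python sketch. The function name is_significant is hypothetical, and the p-values are made up for illustration.

```python
def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Reject the null hypothesis only when the p-value is below alpha."""
    return p_value < alpha

# Hypothetical p-values checked against the common alpha of 0.05
for p in (0.03, 0.05, 0.20):
    verdict = "reject the null" if is_significant(p) else "fail to reject the null"
    print(f"p={p}: {verdict}")
```

A p-value exactly equal to alpha does not reach significance under this rule.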
What is a confidence interval?
A confidence interval (CI) provides the same information about significance as the p-value, plus the precision of the result
What is the relationship between confidence interval and alpha?
CI = 1 - α
*If alpha is 0.05, the study reports 95% CIs; an alpha of 0.01 corresponds to a 99% CI
Describe how the values in the CI range are used to determine whether significance has been reached.
When comparing difference data, the result is statistically significant if the CI range does not include 0. When comparing ratio data, the result is statistically significant if the CI range does not include one
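*Example: a Python sketch of the rule above. The function name ci_is_significant and the interval values are hypothetical. The no-effect value is 0 for difference data and 1 for ratio data, as stated on the card.

```python
def ci_is_significant(lower: float, upper: float, no_effect_value: float) -> bool:
    """Significant when the CI range does not include the no-effect value."""
    return not (lower <= no_effect_value <= upper)

# Difference data (e.g. a difference in means): the no-effect value is 0
print(ci_is_significant(-1.2, 3.4, no_effect_value=0))   # range includes 0
print(ci_is_significant(0.8, 3.4, no_effect_value=0))    # range excludes 0

# Ratio data (e.g. RR, OR, HR): the no-effect value is 1
print(ci_is_significant(0.55, 0.90, no_effect_value=1))  # range excludes 1
print(ci_is_significant(0.70, 1.30, no_effect_value=1))  # range includes 1
```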
What can a confidence interval tell?
A 95% CI indicates that you are 95% confident that the true value lies somewhere within the range given
What do narrow and wide CI ranges imply?
A narrow CI range implies high precision and a wide CI range implies poor precision
Describe type I errors
Type I errors are false positives where the alternative hypothesis was accepted and the null hypothesis was rejected in error
How is the probability or risk of making a type I error determined?
The probability, or risk, of making a type I error is determined by alpha and it relates to the confidence interval
*When alpha is 0.05 and a study result is reported with p<0.05, it is statistically significant and the probability of a type I error is <5%
Describe type II errors
Type II errors (denoted as beta) are false negatives where the null hypothesis is accepted when it should have been rejected
*Beta is set by the investigators during the design of a study and it is typically set at 0.1 or 0.2, meaning the risk of a type II error is 10% or 20%
When does the risk of type II errors increase and how can you decrease this risk?
The risk of type II error increases if the sample size is too small. To decrease this risk, a power analysis is performed to determine the sample size needed to detect a true difference between groups
What is power and how is it calculated?
Power is the probability that a test will reject the null hypothesis correctly
Power = 1 - β
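*Example: the power formula as a tiny Python sketch, applied to the two typical beta values mentioned earlier (0.1 and 0.2). The function name power_from_beta is hypothetical.

```python
def power_from_beta(beta: float) -> float:
    """Power = 1 - beta, where beta is the risk of a type II error."""
    return 1 - beta

for beta in (0.1, 0.2):
    print(f"beta={beta} -> power={power_from_beta(beta):.0%}")
```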
What does it mean when power increases?
As the power increases, the chance of a type II error decreases
How is power determined?
Power is determined by the number of outcome values collected, the difference in outcome rates between the groups and the significance (alpha) level
What is risk?
Risk refers to the probability of an event (how likely it is to occur) when an intervention is given. The risk without the intervention is measured in the placebo (or control) group
What is relative risk?
The relative risk (RR) is the risk in the exposed (treatment) group divided by the risk in the control group
How is risk calculated?
Risk = number of subjects in group with an unfavorable event/total number of subjects in group
How is relative risk calculated?
RR = risk in treatment group/risk in control group
What does it mean when RR = 1 (or 100%)?
Implies no difference in risk of the outcome between the groups
What does it mean when RR > 1 (or 100%)?
Implies greater risk of the outcome in the treatment group
What does it mean when RR < 1 (or 100%)?
Implies lower risk (reduced risk) of the outcome in the treatment group
What does the relative risk calculation determine?
The RR calculation determines whether there is less risk (RR < 1) or more risk (RR > 1)
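*Example: the risk and RR formulas above, sketched in Python with made-up counts (20 of 200 treated subjects and 40 of 200 control subjects with an unfavorable event). The function names are hypothetical.

```python
def risk(events: int, total: int) -> float:
    """Subjects in the group with an unfavorable event / total subjects in the group."""
    return events / total

def relative_risk(treatment_risk: float, control_risk: float) -> float:
    """RR = risk in treatment group / risk in control group."""
    return treatment_risk / control_risk

def interpret_rr(rr: float) -> str:
    if rr < 1:
        return "lower risk in the treatment group"
    if rr > 1:
        return "greater risk in the treatment group"
    return "no difference in risk between the groups"

# Hypothetical trial counts, invented for illustration
treatment_risk = risk(20, 200)
control_risk = risk(40, 200)
rr = relative_risk(treatment_risk, control_risk)
print(f"RR = {rr:.2f}: {interpret_rr(rr)}")
```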
What does the relative risk reduction (RRR) indicate?
The relative risk reduction (RRR) is calculated after the RR and indicates how much the risk is reduced in the treatment group compared to the control group
What is the RRR formula?
RRR = (% risk in control group - % risk in treatment group)/(% risk in control group)
RRR = 1 - RR (must use decimal form of RR)
What do the RR and RRR provide?
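*Example: a Python sketch showing that the two forms of the RRR formula agree. The risks (20% in the control group, 10% in the treatment group) are made up, and the function names are hypothetical.

```python
def rrr_from_risks(control_risk: float, treatment_risk: float) -> float:
    """RRR = (risk in control - risk in treatment) / risk in control."""
    return (control_risk - treatment_risk) / control_risk

def rrr_from_rr(rr: float) -> float:
    """RRR = 1 - RR, with RR in decimal form."""
    return 1 - rr

# Hypothetical risks, invented for illustration
control_risk, treatment_risk = 0.20, 0.10
rr = treatment_risk / control_risk

print(f"from risks: {rrr_from_risks(control_risk, treatment_risk):.0%}")
print(f"from RR:    {rrr_from_rr(rr):.0%}")
```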
The RR and RRR provide relative (proportional) differences in risk between the treatment group and the control group (no meaning in terms of absolute risk)
Why is absolute risk reduction more useful than RR and RRR?
Absolute risk reduction is more useful because it includes the reduction in risk and the incidence rate of the outcome
What is the ARR formula?
ARR = (% risk in control group) - (% risk in treatment group)
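*Example: a Python sketch of why the ARR adds information. Two made-up trials have the same relative reduction but very different absolute reductions, because the outcome is common in one and rare in the other. The function names and risks are hypothetical.

```python
def arr(control_risk: float, treatment_risk: float) -> float:
    """ARR = risk in control group - risk in treatment group."""
    return control_risk - treatment_risk

def rrr(control_risk: float, treatment_risk: float) -> float:
    """RRR = ARR / risk in control group."""
    return arr(control_risk, treatment_risk) / control_risk

# Hypothetical (control risk, treatment risk) pairs, invented for illustration
trials = {"common outcome": (0.20, 0.10), "rare outcome": (0.02, 0.01)}

for label, (control, treatment) in trials.items():
    print(f"{label}: RRR={rrr(control, treatment):.0%}, ARR={arr(control, treatment):.0%}")
```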
What is the additional benefit of calculating the ARR?
An additional benefit of calculating the ARR is to be able to use the inverse of the ARR to determine the number needed to treat (NNT) and number needed to harm (NNH)
What is number needed to treat (NNT)?
NNT is the number of patients who need to be treated for a certain period of time in order for one patient to benefit
What is the NNT formula?
NNT = 1/(risk in control group - risk in treatment group)
NNT = 1/ARR
What is number needed to harm (NNH)?
NNH is the number of patients who need to be treated for a certain period of time in order for one patient to experience harm
*NNH is calculated with the same formula as NNT
What are the rounding rules for NNT and NNH?
For NNT, anything greater than a whole number is rounded up to the next whole number (avoids overstating the potential benefit of an intervention). For NNH, anything greater than a whole number is rounded down to the nearest whole number (avoids understating the potential harm of an intervention)
*The absolute value of the ARR is used with NNH
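*Example: a Python sketch of the NNT and NNH formulas with the rounding rules above. The function names and risks are hypothetical. Risks are passed as fractions.Fraction values (an implementation choice, not part of the formula) so that the rounding is not disturbed by floating-point error. The sketch assumes the two risks differ, so the ARR is not zero.

```python
import math
from fractions import Fraction

def nnt(control_risk: Fraction, treatment_risk: Fraction) -> int:
    """NNT = 1 / ARR, rounded up to the next whole number."""
    arr = control_risk - treatment_risk
    return math.ceil(1 / arr)

def nnh(control_risk: Fraction, treatment_risk: Fraction) -> int:
    """NNH = 1 / |ARR|, rounded down to the nearest whole number."""
    arr = abs(control_risk - treatment_risk)
    return math.floor(1 / arr)

# Hypothetical benefit: event risk falls from 20% (control) to 13% (treatment)
print("NNT:", nnt(Fraction(20, 100), Fraction(13, 100)))

# Hypothetical harm: adverse event risk rises from 5% (control) to 12% (treatment)
print("NNH:", nnh(Fraction(5, 100), Fraction(12, 100)))
```

Both cases have an absolute difference of 7%, so 1/ARR is the same, but the NNT rounds up and the NNH rounds down.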
What is the definition of odds?
Odds are the probability that an event will occur versus the probability that it will not occur
What is the odds ratio and when is it used?
In case control studies, the odds ratio is used to estimate the risk of unfavorable events associated with a treatment or intervention
How are case control studies designed?
Case control studies enroll patients who have a clinical outcome or disease that has already occurred. The patient medical charts are reviewed retrospectively to search for possible exposures that increased the risk of the clinical outcome or disease
How is the odds ratio used in case control studies?
The odds ratio (OR) is used to calculate the odds of an outcome occurring with an exposure, compared to the odds of the outcome occurring without the exposure
What is the OR formula?
OR = (number that have the outcome with exposure x number without the outcome without exposure)/(number without the outcome with exposure x number that have the outcome without the exposure)
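*Example: the OR formula above as a Python sketch over a 2x2 table. The cell labels a, b, c and d and the counts are assumptions made for illustration, and the sketch assumes no cell in the denominator is zero.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """
    a: have the outcome, with exposure
    b: without the outcome, with exposure
    c: have the outcome, without exposure
    d: without the outcome, without exposure
    OR = (a x d) / (b x c)
    """
    return (a * d) / (b * c)

# Hypothetical case control counts, invented for illustration
estimate = odds_ratio(a=30, b=70, c=10, d=90)
print(f"OR = {estimate:.2f}")
```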
What is a hazard rate and when is it used?
In a survival analysis, a hazard rate is used. A hazard rate is the rate at which an unfavorable event occurs within a short period of time
What is the hazard ratio?
The hazard ratio (HR) is the ratio between the hazard rate in the treatment group and the hazard rate in the control group
What is the HR formula?
HR = hazard rate in the treatment group/hazard rate in the control group
What does it mean when OR or HR = 1?
The event rate is the same in the treatment and control arms. There is no advantage to treatment
What does it mean when OR or HR > 1?
The event rate in the treatment group is higher than the event rate in the control group