drug lit exam 1 - bio stats Flashcards
variables:
Determine if a variable is nominal, ordinal, interval, or ratio
Recognize dichotomous endpoints
Variables
Variable:
- Anything that can be observed or measured in a clinical experiment
Dependent Variable:
- The outcome of interest
- What should change as a result of the researcher’s intervention
Independent Variable
- The researcher’s intervention
- What is being manipulated
Types of Variables
Discrete Data
- Can only be whole numbers
- Example: you can’t have 2.13 children
Continuous Data
- Can take any value, within a defined range
- Example: you can divide BP mmHg into tenths of a mmHg, hundredths, even thousandths!
Another way to think about variable types…
Nominal
- Different categories, in no particular order
Ordinal
- Ordered categories, where the distance between categories cannot be considered equal
Interval
- Equal distances between values, but the zero point is arbitrary (a value of zero does not mean the quantity is absent)
Ratio
- Equal distances between values, with a meaningful zero point
Variable Examples
Nominal
- No category is “higher” or “better” than others
- Every study participant in a sample will be placed into one of the categories
Also referred to as dichotomous when there are 2 options
Examples
- Medical diagnoses (“Diabetes”; “No diabetes”)
- Race or Nationality (“Asian”; “African”; “European”)
- Age groups (“< 18 years”; “18-44 years”; “> 44 years”) (note: these groups have a natural order, so they are more accurately treated as ordinal)
Variable Examples
Ordinal
- There is ordering of these values, but the distance between values is not equal
Examples:
- Excellent/Satisfactory/Unsatisfactory
- Likert Scales (strongly agree, agree, neutral, disagree, strongly disagree)
- Cancer Stages I – IV
- The order of finishing a race
Variable Examples
Interval
- Ordering of values and equal distance between values
- The zero point isn’t meaningful, and therefore can be changed
Example
- Temperature in °C or °F (0° does not mean “no temperature”)
Variable Examples
Ratio
- Ordering of values and equal distance between values
- The zero is meaningful
Examples
- Weight (kg/lbs)
- Height (cm/inches)
- Blood pressure (mm Hg)
Variable Type & Assumptions
Nominal
Named categories
Ordinal
Same as nominal plus ordered categories
Interval
Same as ordinal plus equal intervals
Ratio
Same as interval plus meaningful zero
Learning objective for descriptive stats
Given a mean and standard deviation of a normally distributed sample, calculate the range that 95% of the data points fall between
Two categories of statistics
1) Descriptive Statistics
- Used for presenting, organizing, summarizing data
- Can summarize your data set with just a few key numbers
What you need to know about:
Mean
Median
Mode
Interquartile range
Standard deviation
Two categories of statistics
2) Inferential Statistics
Used to generalize data from a sample to a larger population
Used to identify “statistically significant” differences
Examples
Student’s t-test
Chi Squared
ANOVA
Understanding when and how to use these statistics won’t be a focus for the biostatistics lectures in this course
Measures of Central Tendency
Mean:
The average value
Sum all values and divide by number of values (N)
AKA the “typical value”
Only okay to use to describe interval and ratio data!
Affected by outliers
If a study reports the mean for ordinal data, critique that as bad statistics!
Median:
The middle value
The 50th percentile
Arrange all values from smallest to largest and pick the middle number
Used to describe ordinal data (interval and ratio are okay too)
Mode:
The most frequently occurring value or category
Used to describe nominal data (interval, ordinal, and ratio are ok too)
Quick Quiz! Calculate the mean, median, and mode for this dataset
7, 4, 2, 4, 8
Mean = (7+4+2+4+8) / 5 (n) = 25/5 = 5
Median = middle value: 2, 4, 4, 7, 8 = 4
Mode = 4 (most frequently occurring)
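The quick quiz arithmetic can be checked with Python's standard `statistics` module. This is only a sketch for verifying the hand calculation; the variable names are illustrative.

```python
import statistics

values = [7, 4, 2, 4, 8]  # dataset from the quick quiz

mean_value = statistics.mean(values)      # sum of all values / number of values
median_value = statistics.median(values)  # middle value after sorting: 2, 4, 4, 7, 8
mode_value = statistics.mode(values)      # most frequently occurring value

print(mean_value, median_value, mode_value)  # 5 4 4
```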
Which data set has the largest mean?
none of the above
Measures of Dispersion
How closely the data cluster around the measure of central tendency
Range:
The difference between the highest and lowest value
Measures the variability of the data
Advantage: simple to calculate and understand
Disadvantage: affected by outliers
Interquartile Range
The interval between the 25th and 75th percentiles
The middle 50% of values
A measure of variability
Directly related to the median
Advantage: Not affected by outliers
Standard Deviation (SD)
Very common estimate of data variability
Estimates the scatter of data points about the sample mean
Often necessary when running inferential statistics
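For normally distributed data:
About 68% of values fall within 1 SD of the mean
About 95% fall within 2 SD
About 99.7% fall within 3 SD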
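A minimal sketch of the SD rule, assuming the data are normally distributed. The helper names `central_range` and `fraction_within` and the example mean and SD are hypothetical, chosen only to illustrate the calculation of mean +/- 2 SD.

```python
from statistics import NormalDist

def central_range(mean, sd, n_sd=2):
    """Return (lower, upper) bounds of mean +/- n_sd standard deviations."""
    return mean - n_sd * sd, mean + n_sd * sd

def fraction_within(n_sd):
    """Fraction of a normal distribution lying within n_sd SDs of the mean."""
    std_normal = NormalDist()  # standard normal: mean 0, SD 1
    return std_normal.cdf(n_sd) - std_normal.cdf(-n_sd)

# Hypothetical values, not taken from any study in these notes
lower, upper = central_range(mean=50, sd=5)
print(lower, upper)  # 40 60

for k in (1, 2, 3):
    print(k, "SD:", round(fraction_within(k), 3))
```

The printed fractions come out close to 0.68, 0.95, and 0.997, which is where the rounded percentages come from.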
statistically significant
p value less than 0.05
Learning objectives for p value, alpha, and type I error
Interpret a p value, with respect to the alpha of the study and risk of type I error
Given clinical study results, identify any statistically significant differences between groups
Clinical studies
Assume that an intervention (e.g., medication) has an effect
Clinical study is an experiment to see what the effect of the intervention is:
Based on the results of that one experiment, can we apply those results to the overall patient population??
Hypothesis testing:
Null hypothesis
Alternative hypothesis
Hypothesis Testing
Research Hypothesis (H1)
- The treatment (or intervention) has an effect on the experimental group
- Effect seen by comparing the experimental group to the control group
Null Hypothesis (H0)
- Any difference seen between the experimental and control groups is due to chance alone
- The intervention does not have an effect
- The hypothesis we are actually doing statistics on
The role of chance
Imagine you want to see if the true chance of getting heads on any single toss = 0.5 (50%)
Toss coin 10 times
Would you expect to see exactly 5 heads and 5 tails?
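To put a number on the coin-toss question, here is a small sketch using the binomial formula, assuming a fair coin and independent tosses. The function name `prob_heads` is illustrative.

```python
from math import comb

def prob_heads(k, n=10, p=0.5):
    """Binomial probability of exactly k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Chance of exactly 5 heads in 10 tosses of a fair coin
print(round(prob_heads(5), 3))  # 0.246

# Chance of 4, 5, or 6 heads
print(round(sum(prob_heads(k) for k in (4, 5, 6)), 3))  # 0.656
```

Even with a perfectly fair coin, an exact 5/5 split happens only about a quarter of the time, so some difference from the expected result is normal and is due to chance alone.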
Type I error
Inappropriately concluding that there is a true difference between 2 study groups when the difference is due to chance alone
A “false positive”
Components of a study that are prone to type I error
Subgroup analysis (especially post hoc)
Secondary endpoints
Alpha (α)
The acceptable probability that the difference between study groups is due to chance alone
Accepted by researchers at the start of the study (before the experiment happens)
Usually 0.05 (5%) in clinical research
- The difference between study groups has less than a 5% probability of occurring due to chance alone
P value
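The probability, calculated at the end of the study (after the experiment happens), of seeing a difference at least as large as the one observed if chance alone were at work (i.e., if the null hypothesis were true); often paraphrased as the probability that the difference between 2 study groups is due to chance alone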
When P < alpha, the difference seen is statistically significant
- (The probability that the difference seen is completely random is < 5%)
Sometimes referred to as a positive study result
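The decision rule (compare each p value to the alpha chosen before the study) can be sketched as follows. The endpoint names and p values are hypothetical, made up purely for illustration.

```python
def is_statistically_significant(p_value, alpha=0.05):
    """A result is statistically significant when p is below alpha."""
    return p_value < alpha

# Hypothetical endpoints and p values (not from any study in these notes)
results = {"endpoint A": 0.03, "endpoint B": 0.79, "endpoint C": 0.001}

significant = [name for name, p in results.items()
               if is_statistically_significant(p)]
print(significant)  # ['endpoint A', 'endpoint C']
```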
Learning objectives for beta type II error
Explain how various factors can affect the power of a study
Demonstrate understanding of the relationship of beta to power and type II error
Preclass online lecture Part 3: P values, alpha, and type I error
Focused on the importance of making sure that a study doesn’t inappropriately conclude that there is a true difference between study groups when the difference seen is due to chance alone
Remember the type of error?
- Type I error
What about an error when there IS a true difference between study groups but the study fails to detect that difference?
- Type II error
Type II error
The researchers aren’t able to find a difference between the experimental and control groups, but a difference really exists
Failing to show statistical significance, when there really is a true difference between study groups
Falsely concluding that any difference seen between study groups is due to chance alone
A “false negative” result
Beta (β)
The probability of making a type II error
Determined during study design phase
Can be anything, often between 5-20% (β = 0.05 - 0.20)
Study “power”
The chance that if a true difference exists, it will be successfully detected
The probability of not making a type II error
Power = 1 – β memorize this equation!
Determined during study design phase by doing a power calculation
What things can affect the power of a study?
Sample size (n)
- As sample size increases, power increases
The effect size/event rate
- As effect size or event rate becomes larger, power increases
The duration of the study
- As study duration increases, power increases
A type II error may occur
- If a study fails to enroll enough patients
- If the researchers overestimate the size of the treatment effect
- If the study is terminated early
Any of these things can result in a study being underpowered
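A rough sketch of how sample size and effect size change power. It assumes a two-sided comparison of two event rates using a common normal approximation, which is a simplification and not necessarily the method used in any particular trial. The function name `approx_power` and all the rates below are hypothetical. Study duration is not modeled here.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p_control, p_treatment, n_per_group, alpha=0.05):
    """Approximate power to detect a difference between two event rates.

    Normal approximation that ignores the negligible opposite tail.
    """
    z = NormalDist()                      # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)     # about 1.96 when alpha = 0.05
    se = sqrt(p_control * (1 - p_control) / n_per_group
              + p_treatment * (1 - p_treatment) / n_per_group)
    effect = abs(p_control - p_treatment)
    return z.cdf(effect / se - z_crit)

# Hypothetical event rates: 30% with control, 20% with treatment
for n in (100, 200, 400):
    print(n, round(approx_power(0.30, 0.20, n), 2))

# Larger true effect with the same sample size
print(round(approx_power(0.30, 0.15, 100), 2))
```

With the other inputs held fixed, the printed power rises as the number of patients per group rises, and it also rises when the true difference between groups is larger. This is why under-enrollment or an overestimated treatment effect can leave a study underpowered.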
Age (years)
A. Dichotomous – means two options
B. Continuous – can be a whole or decimal number
answer: B. Continuous – can be a whole or decimal number
Experiencing one or more hospitalization(s)
A. Dichotomous – two options
B. Continuous – decimal or whole #
answer: A. Dichotomous – two options
the patient either experienced one or more hospitalizations or did not, so the answer is yes or no, which makes it dichotomous
Time (hours)
A. Dichotomous
B. Continuous
answer: B. Continuous
Type of variable: Stage of cancer (0, 1, 2, 3, 4)
A. Nominal
B. Ordinal
C. Interval
D. Ratio
answer: B. Ordinal - still categories, but in a particular order, with no set amount of spacing between the different stages
the stages are not evenly spaced points on a number line
other examples: class of heart failure, order of finishing a race
Type of variable: duration of diabetes (mean # years)
A. Nominal
B. Ordinal
C. Interval
D. Ratio
nominal and ordinal data are categories (e.g., has diabetes / does not have diabetes)
but this is the mean # of years, which is a number, so we can rule out A and B
interval data have an arbitrary zero, while ratio data have a true zero
here zero is a true zero: a duration of 0 years means the patient has never had diabetes
answer: D. Ratio
tip: data reported as a mean and SD are usually interval or ratio; data reported as an n (count) are usually nominal or ordinal
Which data set has the largest mean?
A. A
B. B
C. C
D. A = B = C
answer: D. A = B = C
Which data set has the largest standard deviation?
A. A
B. B
C. C
D. A = B = C
answer: C. C
In the data set below, which is the most appropriate measure of dispersion? (Example: days of hospitalization in a sample of 8 patients.) In other words, which measure better describes the spread of the patients in the data?
1,2,2,3,3,4,5,90
A. Range
B. Interquartile range
C. Both are appropriate
answer: B. Interquartile range
there is an outlier (90) that would distort the range, so we use the IQR, which is not affected by it
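A quick check of why the outlier matters, using the 8 hospitalization values from the question. Note that quartile conventions differ between textbooks and software; this sketch assumes the default method of Python's `statistics.quantiles`, so the exact IQR may differ slightly from a hand calculation.

```python
import statistics

days = [1, 2, 2, 3, 3, 4, 5, 90]  # days of hospitalization from the question

data_range = max(days) - min(days)            # highest minus lowest value
q1, q2, q3 = statistics.quantiles(days, n=4)  # 25th, 50th, 75th percentiles
iqr = q3 - q1                                 # spread of the middle 50%

print("range:", data_range)  # range: 89
print("IQR:", iqr)

# Same data without the outlier, for comparison
range_without_outlier = max(days[:-1]) - min(days[:-1])
print("range without 90:", range_without_outlier)  # range without 90: 4
```

The single value of 90 stretches the range from 4 to 89, while the IQR stays small because it only looks at the middle half of the data.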
Given the following data, what are the lower and upper fasting glucose values that 95% of the patient sample would fall between?
A. 80, 120
B. 90, 110
C. 97, 103
need mean +/- 2 SD to cover about 95% of values
answer: A. 80, 120
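A study finding that is statistically significant will always be clinically significant (clinically significant: the result means something to the clinician that they can use in practice)
True
False
answer: False
a difference can be statistically significant yet too small to matter clinically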
Which of the following endpoints were statistically significant (α = 0.05)?
A. Death
B. Cardiac arrest
C. Stroke
D. Hospitalization
E. Both B and D
answer: E. Both B and D
because both of their p-values are less than 0.05
for death: it is not statistically significant because the p-value is greater than 0.05
for cardiac arrest: the rhythm control group had a higher chance of having cardiac arrest; it is statistically significant because the p-value is less than 0.05
for stroke: the rhythm control group had a higher chance of having a stroke; it is not statistically significant because the p-value is greater than 0.05. There is a 79% probability that the difference we are seeing is due to chance alone
for hospitalization: the rhythm control group had a higher chance of having a hospitalization; it is statistically significant because the p-value is less than 0.05. There is a 0.001% probability that the difference we are seeing is due to chance alone
What is the “alpha” of a study?
A. Acceptable limit to the probability of making a type I error
B. Acceptable limit to the probability of making a type II error
C. Risk of false positive
D. Risk of false negative
E. Both A and C
F. Both B and C
answer: E. Both A and C
usual alpha: 0.05
alpha is the accepted risk of a type I error, and a type I error is a false positive
Which of the following describes a clinical situation consistent with type II error (false negative)?
A. A patient is diagnosed with cancer but does not really have cancer
B. A patient is diagnosed as cancer-free, but really does have cancer
C. A patient is diagnosed with cancer and really does have cancer
D. A patient is diagnosed as cancer-free and does not really have cancer
answer: B. A patient is diagnosed as cancer-free, but really does have cancer
type II error: false negative, so not finding something that is actually there
A would be a type I error, which is a false positive
If a study has the power of 85% to detect a difference, what is the probability of type II error?
A. 85%
B. 0.85%
C. 15%
D. Unable to calculate
answer: C. 15%
power = 1 - beta, so beta = 1 - power = 1 - 0.85 = 0.15 (15%)
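The relationship between power and beta is a one-line calculation. A tiny sketch, with illustrative function names:

```python
def beta_from_power(power):
    """Probability of a type II error, given study power (as a fraction)."""
    return 1 - power

def power_from_beta(beta):
    """Study power, given the probability of a type II error."""
    return 1 - beta

print(round(beta_from_power(0.85), 2))  # 0.15
print(round(power_from_beta(0.20), 2))  # 0.8
```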
What assumptions did the researchers make when doing their power calculation?
The study was designed to have a power of 89% to detect a 15% reduction in the rate of primary endpoint (death/MI/stroke) for patients in the intensive-therapy group, as compared to the standard-therapy group, assuming a rate of 2.9% per year in the standard-therapy group and a planned follow-up of 5.6 years.
A. 10,000 patients would be enrolled
B. Rate of death of 2.9%/yr in standard-therapy group
C. Patients would be followed for 5.6 years
D. Both B and C
answer: D. Both B and C
pay attention to the last question :)
A is not the answer because the passage says nothing about the number of patients enrolled
power of 89% what is the
What can you conclude about the results for the outcomes of death, MI, and stroke? (α = 0.05)
A. Rate of death was significantly greater in intensive-therapy group
B. Rate of MI was significantly greater in the intensive-therapy group
C. Rate of stroke was significantly greater in the intensive-therapy group
D. Both A and C
the rate of death was higher in the intensive-therapy group than in the standard-therapy group, but the opposite is true for MI
answer: A. The rate of death was significantly greater in the intensive-therapy group
If everything else remains constant, what will happen as the sample size increases?
A. The power of the study increases
B. The study’s alpha increases
C. The difference in the primary endpoint between the two groups increases
D. The standard deviation increases
answer: A. The power of the study increases
“We calculated a sample size (n = 200) sufficient to detect a 20% difference between the two groups’ cure rates with 80% power and α=0.05.” The study enrolled 189 patients and detected an 18% difference between the two groups’ cure rates. Which one of the following is true?
A. If p < 0.05, the results ARE statistically significant
B. If p < 0.05, the results are NOT statistically significant because the study was underpowered