BIOSTATS concepts Flashcards
The type of data has a logical order with VALUES that continuously increase (or decrease) by the SAME amount
example: heart rate of 120BPM is twice as fast as a HR of 60BPM
continuous data
What are the two types of continuous data?
interval data and ratio data
What is the difference between interval and ratio data?
interval data has NO meaningful zero
example: Celsius temperature has no meaningful zero (0 °C does not mean no temperature)
What is an example of ratio data?
heart rate: ratio data has a meaningful zero
HR of 0BPM is cardiac arrest, zero equals none
What are the two types of categorical/discrete data?
nominal and ordinal
*these are categories!
in this type of data, subjects are sorted into arbitrary categories (names) such as male and female. “yes or no” data
nominal
name=nominal
This type of data comes from the word order - this data is ranked and has a logical order
example: pain scale (a score of 2 does not mean half as much pain as a score of 4)
these categories do NOT increase by the same amount
ordinal data
Data is provided by some type of measurement which has unlimited options (theoretically) of continuous values
continuous data
Data fits into a limited number of categories
discrete/categorical data
Examples: age, height, weight, time, blood pressure
ratio data
(continuous data)
ordered, equal
Example: temperature scales
interval data (continuous data)
ordered, equal
Example: gender, ethnicity, marital status, mortality
nominal data
no set order
Example: NYHA functional Class I-IV, pain scale 0-10
ordinal data
ordered, ranked
the average value
mean
what type of data is the mean preferred for?
continuous data that is normally distributed
the value in the middle when the values are arranged from lowest to highest
median
preferred for ordinal data or continuous data that is SKEWED
The value that occurs most frequently
mode
what measure of central tendency is preferred for nominal data?
mode
what measure of central tendency is preferred for ordinal data or continuous data that is SKEWED?
median
the difference between the highest and lowest values
range
indicates how spread out the data is
standard deviation
Large sample sets of what type of data form a bell curve?
continuous
what does the distribution of data that is normal/bell shape look like?
symmetrical (even on both sides) with most of the values closer to the middle
half of the values are on the left side of the curve
half of the values are on the right side
with small number of values on the tails
When data is normally distributed, what do the mean, median, and mode look like?
the same!
68% of the values fall within 1 SD of the mean and 95% of the values fall within 2SDs of the mean
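The 68% / 95% rule can be checked against a standard normal curve. A minimal sketch using Python's standard library; the helper name `share_within` is made up for illustration:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
z = NormalDist(mu=0.0, sigma=1.0)

def share_within(k: float) -> float:
    """Fraction of values that fall within k SDs of the mean."""
    return z.cdf(k) - z.cdf(-k)

within_1_sd = share_within(1)  # roughly 0.68
within_2_sd = share_within(2)  # roughly 0.95
print(f"{within_1_sd:.1%} within 1 SD, {within_2_sd:.1%} within 2 SDs")
```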
When the data narrows what happens to the curve?
the curve gets taller and skinnier
When does skewed data normally happen?
sample size is small and there are outliers
right skew - more low values
left skew - more high values
an outlier has a large impact on the (median mean mode)?
mean
in this case, median is a BETTER measure of central tendency
The distortion of central tendency caused by outliers is decreased by..
collecting more values
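A quick way to see why the median holds up better than the mean when there is an outlier. The length-of-stay values below are hypothetical, chosen only for illustration:

```python
from statistics import mean, median

# Hypothetical lengths of stay in days; 45 is the outlier.
stays = [2, 3, 3, 4, 5]
stays_with_outlier = stays + [45]

mean_before, median_before = mean(stays), median(stays)
mean_after = mean(stays_with_outlier)
median_after = median(stays_with_outlier)

# The mean jumps from 3.4 to about 10.3; the median only moves from 3 to 3.5.
print(f"mean: {mean_before:.1f} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```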
In a study, does a researcher want to accept the null or the alternate hypothesis?
the alternate!
null=no statistically significant difference
___ is the threshold for rejecting the null hypothesis
alpha
a maximum permissible error margin (commonly set at 5% or 0.05)
What does alpha correspond to when the data has a normal distribution?
the values in the tails
what value is compared to alpha?
the p-value
if the alpha is set at 0.05 and the p value is less than alpha (P<0.05) the null hypothesis is rejected
_______ provides the same information about significance as the p-value, plus the precision of the result
confidence interval
alpha and the CI in a study will correlate with each other
if alpha is 0.05, the study reports __% CIs
95%
alpha = 0.01, CI=99%
Comparing difference data, when is it significant?
when the CI doesn’t cross ZERO
comparing ratio data (relative risk, odds ratio, hazard ratio) is significant when..
the CI doesn’t cross ONE
The CI for a relative risk crosses one. Is it significant?
no!
The CI for an odds ratio crosses zero. Is it significant?
yes! it doesn’t cross one
The CI for a hazard ratio crosses one. Is it significant?
no!
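The zero-versus-one rule can be written as a small check. This is a sketch only; the function name `ci_is_significant` and the ratio interval in the example are hypothetical:

```python
def ci_is_significant(lower: float, upper: float, kind: str) -> bool:
    """Return True when the CI excludes the no-effect value.

    kind is 'difference' (no-effect value 0) or
    'ratio' (RR, OR, HR; no-effect value 1).
    """
    no_effect = {"difference": 0.0, "ratio": 1.0}[kind]
    # Significant only when the no-effect value lies outside the interval.
    return not (lower <= no_effect <= upper)

# Difference data: CI 0.06 to 0.35 does not cross zero.
print(ci_is_significant(0.06, 0.35, "difference"))  # True
# Ratio data: a made-up CI of 0.85 to 1.20 crosses one.
print(ci_is_significant(0.85, 1.20, "ratio"))       # False
```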
What does a narrow CI range imply?
HIGH precision
wide CI range = poor precision
The CI indicates that you are 95% confident that the true value of the ARR for the general population lies somewhere in the RANGE (95% CI 0.06-0.35)
What is a type-I error?
false positive
the alternate hypothesis was accepted and the null hypothesis was rejected in ERROR
the probability or risk of making a type I error is determined by
alpha
and it relates to the confidence interval
CI= 1-alpha
When alpha is 0.05 and a study is reported with a P<0.05, what is the probability of a type I error occurring?
<5%
1-0.05 = 0.95 = 95% CI
What is a type II error?
false negative! this one sucks!!!!!!!!
when the null hypothesis is ACCEPTED when it should have been REJECTED :(
___ is set by the investigators during the design of a study, and it is typically set at 0.1 or 0.2, meaning the risk of a type II error is 10% or 20%
BETA!
beta is related to POWER
The risk of a type II error increases when…
the sample size is too small
to decrease this risk, a power analysis is performed to determine the sample size needed to detect a true difference between groups
____ is the probability that a test will REJECT the null hypothesis correctly / to avoid a type II error
power
power = 1 - beta
As the power increases, the chance of a type II error inc or dec?
decreases
____ is the ratio of risk in the exposed group (treatment) divided by risk in the control group
relative risk (RR)
RR= risk in tx group/ risk in control group
RR = 1 (or 100%) implies
no difference in risk of the outcome between groups
RR >1 implies
greater risk of the outcome in the treatment group
more risk
RR <1 implies
lower risk of the outcome in the treatment group
less risk
lets say RR=0.57, what does this mean?
patients treated with the treatment were 57% AS LIKELY to have progression of disease as placebo-treated patients
57% of the placebo risk (NOT a 57% reduced risk; the reduction is 1 - 0.57 = 43%)
RR=AS LIKELY
lets say RR = 1.5 what does this mean?
indicates a 50% greater (increased) risk in the treatment group
What is calculated after the RR to indicate how much the risk is reduced in the treatment group?
relative risk reduction
how do you calculate RRR?
(%risk in control group - %risk in treatment group)/ (%risk in control group)
or, equivalently:
1-RR
lets say the RRR is 43% what does this mean?
patients in the treatment group are 43% LESS LIKELY to have HF progression than placebo-treated patients
RRR=LESS LIKELY
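Worked sketch of RR and RRR. The risks below (16% in the treatment group, 28% in the control group) are assumed values, not stated in the cards, picked because they reproduce the RR of 0.57 and RRR of 43% used above:

```python
def relative_risk(risk_treatment: float, risk_control: float) -> float:
    """RR = risk in treatment group / risk in control group."""
    return risk_treatment / risk_control

def relative_risk_reduction(risk_treatment: float, risk_control: float) -> float:
    """RRR = (control risk - treatment risk) / control risk, same as 1 - RR."""
    return (risk_control - risk_treatment) / risk_control

# Assumed risks, expressed as decimals.
risk_tx, risk_ctrl = 0.16, 0.28

rr = relative_risk(risk_tx, risk_ctrl)
rrr = relative_risk_reduction(risk_tx, risk_ctrl)
print(f"RR = {rr:.2f} (57% AS LIKELY), RRR = {rrr:.0%} (43% LESS LIKELY)")
```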
_____ is more useful than RR and RRR because it includes the reduction in risk AND the incidence rate of the outcome
absolute risk reduction
ARR calculation
(% risk in control group - % risk in treatment group) = ARR
lets say the ARR is 12% what does this mean?
12 out of 100 patients benefit from the treatment
____ is the number of patients who need to be treated for a certain period of time in order for ONE patient to benefit
NNT
*rounded UP
NNT =
1/ARR (ARR expressed as a decimal, e.g. 12% = 0.12)
Lets say NNT =9, what does this mean?
for every 9 patients who receive treatment for one year, HF progression is prevented in one patient
____ is the number of patients who need to be treated for a certain period of time in order for ONE patient to experience harm
NNH
*rounded DOWN
NNH=90 what does this mean?
one additional case of major bleeding is expected to occur for every 90 patients taking clopidogrel instead of placebo
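Sketch of the ARR, NNT, and NNH arithmetic, including the rounding directions. The risks (28% control, 16% treatment) and the absolute risk increase of 1.1% are assumed values chosen to land on the ARR of 12%, NNT of 9, and NNH of 90 from the cards:

```python
import math

def absolute_risk_reduction(risk_control: float, risk_treatment: float) -> float:
    """ARR = risk in control group - risk in treatment group (decimals)."""
    return risk_control - risk_treatment

def number_needed_to_treat(arr: float) -> int:
    """NNT = 1 / ARR, always rounded UP."""
    return math.ceil(1 / arr)

def number_needed_to_harm(absolute_risk_increase: float) -> int:
    """NNH = 1 / absolute risk increase, always rounded DOWN."""
    return math.floor(1 / absolute_risk_increase)

arr = absolute_risk_reduction(0.28, 0.16)  # assumed risks, about 0.12
nnt = number_needed_to_treat(arr)          # 1 / 0.12 = 8.33, rounds up
nnh = number_needed_to_harm(0.011)         # 1 / 0.011 = 90.9, rounds down
print(f"ARR = {arr:.0%}, NNT = {nnt}, NNH = {nnh}")
```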
Which studies are not suitable for relative risk calculations?
case control studies – odds ratio!
In order to estimate the risks associated with a treatment or some type of intervention in a CASE CONTROL study, ___ is calculated instead
odds ratio
Odds ratio can be used in what studies
most commonly case control
but also cohort and cross sectional
OR =
AD/BC
A- # that have the outcome, with exposure
B- # without the outcome, with exposure
C- # that have the outcome, without exposure
D- # without the outcome, without the exposure
*prob set up the chart pg 220
OR= 1.23 what does this mean?
treatment is associated with a 23% INCREASED risk of falls with fractures
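Sketch of the AD/BC calculation using the A-D cells defined above. The counts in the example are hypothetical:

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = AD / BC.

    a: outcome, with exposure      b: no outcome, with exposure
    c: outcome, without exposure   d: no outcome, without exposure
    """
    return (a * d) / (b * c)

# Hypothetical 2x2 counts.
example_or = odds_ratio(a=30, b=70, c=20, d=80)
print(f"OR = {example_or:.2f}")  # (30 * 80) / (70 * 20)

# Identical groups give OR = 1: the event rate is the same.
no_difference = odds_ratio(a=10, b=90, c=10, d=90)
```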
When do you use hazard ratio instead?
in a survival analysis, analysis of death or disease progression
rate at which an unfavorable event occurs within a SHORT period of time
HR =
hazard rate in treatment group / hazard rate in control group
*use primary endpoint
HR =1 what does this mean
there is no benefit to CVD risk when adding treatment to therapy
OR or HR = 1
the event rate is the same, no advantage of treatment
OR or HR >1
the event rate in the treatment group is HIGHER
HR of 2 for an outcome of death indicates that there are twice as many deaths in the treatment group
OR or HR <1
the event rate in the treatment group is LOWER
______ combines multiple individual endpoints into one measurement
composite endpoint
For continuous data that is normally distributed _____ methods are appropriate for the statistical test
parametric
if not normally distributed = nonparametric
This is a parametric method used when the endpoint has continuous data and the data is normally distributed.
T-test
When the data from a single sample group is compared with known data from the general population what test is performed
ONE SAMPLE T-test
if a single sample group is used for a pre- and post-measurement (ex: the patient serves as their own control), what test is appropriate?
paired t-test
this test is used when the study has TWO independent samples
the treatment and the control groups
for example: a study comparing the reduction in A1C values between metformin and placebo would use this
Student's t-test
this test is used to test for statistical significance when using continuous data with THREE or more samples or groups
ANOVA / F-test
Types of statistical tests for continuous data
t tests
ANOVA
What tests are used for discrete (categorical) data aka nominal or ordinal data?
chi-square test or Fisher’s exact test
ex: if a study assesses the difference between two groups in mortality (nominal data) or pain scores based on pain scale (ordinal data) –> chi square test
One group numerical/continuous data, parametric test
one-sample T test
one group has before and after measures, parametric test, numerical/continuous data
dependent/paired t-test
two groups (treatment and control groups) for numerical continuous data, parametric test
independent/unpaired Student's t-test
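The test-selection cards can be summarized as a lookup. This is a study-aid sketch only: the function name `choose_test` and its arguments are made up, and the continuous branch assumes normally distributed data (parametric tests):

```python
def choose_test(data_type: str, groups: int, paired: bool = False) -> str:
    """Pick a test per the cards: data_type is 'continuous' or 'categorical'."""
    if data_type == "categorical":
        # Nominal or ordinal data.
        return "chi-square or Fisher's exact test"
    if data_type != "continuous":
        raise ValueError("data_type must be 'continuous' or 'categorical'")
    if groups == 1:
        # One group: before/after measures are paired, otherwise one-sample.
        return "paired t-test" if paired else "one-sample t-test"
    if groups == 2:
        return "independent (unpaired) Student's t-test"
    if groups >= 3:
        return "ANOVA (F-test)"
    raise ValueError("groups must be at least 1")

print(choose_test("continuous", groups=2))   # metformin vs placebo A1C example
print(choose_test("categorical", groups=2))  # mortality example
```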
measurements of dose and time are both _____ data
continuous
________ is a statistical technique that is used to determine if one variable (such as days hospitalized) changes, or is related to another variable (such as incidence of hospital acquired infection)
correlation
correlation does/does not prove a causal relationship
does not
________ is used to describe the relationship between a dependent variable and one or more independent variables or how much the value of the dependent variable changes when the independent variable changes
regression
linear = continuous data; logistic = categorical data; Cox = categorical data in a survival analysis
Sensitivity is the true
positive
describe how effectively a test identifies patients WITH the condition
specificity is the true
negative
city = negative
describe how effectively a test identifies patients WITHOUT the condition
Sensitivity =
A/(A+C) x 100
# of true positives / total # of people WITH the condition (A = true positives, C = false negatives)
specificity =
D/(B+D) x 100
# of true negatives / total # of people WITHOUT the condition (D = true negatives, B = false positives)
sensitivity of 28% means
only 28% of patients with the condition will have a positive result
the test is negative in 72% of people with the disease (missed diagnosis)
specificity of 87% means
the test is negative in 87% of patients without the disease, but 13% of patients without the disease can test positive (incorrect diagnosis)
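Sketch of both calculations. The counts are hypothetical, chosen to reproduce the 28% sensitivity and 87% specificity above (100 people with the condition, 100 without):

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Percent of people WITH the condition who test positive."""
    return 100 * true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """Percent of people WITHOUT the condition who test negative."""
    return 100 * true_neg / (true_neg + false_pos)

# Hypothetical counts.
sens = sensitivity(true_pos=28, false_neg=72)  # 72 missed diagnoses
spec = specificity(true_neg=87, false_pos=13)  # 13 incorrect diagnoses
print(f"sensitivity = {sens:.0f}%, specificity = {spec:.0f}%")
```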
________ analysis includes data for ALL patients originally allocated to each treatment group (active and control) even IF the patient did NOT complete the trial according to study protocol
Intention to treat
______ analysis is conducted for the subset of the population that completed the trial according to protocol
per protocol
These types of trials attempt to demonstrate that the new treatment has roughly the SAME effect as the old treatment
equivalence
these trials attempt to demonstrate that the new treatment is no worse than the current standard based on the delta margin
non-inferiority
delta = the minimal difference in effect between the two groups that is considered clinically acceptable based on previous research