Intro to Biostats Flashcards
a research perspective which states there will be no difference between the comparison groups
null hypothesis H0
statistical perspectives that can be taken by the researcher in their alternative hypothesis
superior
noninferior
equal
alpha error
type I error
rejecting the null hypothesis when it is actually true
false positive
beta error
type II error
failing to reject the null hypothesis when it is actually false
false negative
define power
statistical ability of a study to detect a true difference when it exists
“accuracy”
the ______ the sample size, the greater the ability to _________ .
larger
detect a true statistical difference
increase in power
the smaller the difference between groups that must be shown to be statistically different, the greater the _________ needed.
sample size
when determining sample size, you should anticipate ______?
dropouts and losses to follow-up
so oversample at the beginning to compensate
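A minimal sketch of the two sample-size ideas above, assuming a two-group comparison of means and the usual normal-approximation formula. The function names and the example numbers (difference, SD, 20% dropout) are made up for illustration and are not from the cards.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect a mean difference `delta`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided alpha
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

def inflate_for_dropout(n, dropout_rate):
    """Enroll enough subjects so that about n remain after expected dropouts."""
    return math.ceil(n / (1 - dropout_rate))

n_large_diff = n_per_group(delta=5, sd=10)      # larger difference to detect
n_small_diff = n_per_group(delta=2.5, sd=10)    # smaller difference to detect
enroll = inflate_for_dropout(n_large_diff, dropout_rate=0.20)
print(n_large_diff, n_small_diff, enroll)
```

Halving the difference to detect roughly quadruples the required sample size in this approximation.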
bell curve percentages based on standard deviation
within ±1 SD of the mean = 68%
within ±2 SD = 95%
within ±3 SD = 99.7%
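These percentages can be checked against the standard normal distribution. A short sketch using Python's `statistics.NormalDist`; the helper name `coverage` is just for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

def coverage(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {coverage(k):.1%}")
```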
probability value = ?
p value
the probability (alpha) threshold is selected before or after the study starts?
before
if the p value is lower than the alpha value, then we say?
alpha value = 5%, 1%, etc.
we say it is statistically significant
relate p value to a statistically significant test
the p value is lower than alpha
so we reject the null (not accept)
relate a p value less than the alpha of 5%, and the risk of type I error
p value is lower, we reject the null
therefore, the risk of experiencing a type I error is acceptably low = less than 5%
at 95% confidence and p value of 0.005, what is the risk of error?
0.5% risk of being wrong (a type I error)
relate a p value of 0.01% and 3 groups
there is at least one significant difference between the 3 groups
typically between control and the most extreme group
should baseline data be statistically different between groups or not?
should show no statistical difference
this shows that the groups start out comparable, so a difference in the final results can be attributed to the intervention
a p value of 0.91, what is your chance of being wrong?
not significant: p = 0.91 is far above alpha, so we fail to reject the null
by the rule of thumb used in these cards, claiming a difference here carries a 91% chance of being wrong (type I error)
when do we want p values to not be statistically different?
1. when comparing baseline characteristics at the start
2. when using Levene’s test
3 primary levels of variables (data types)
nominal
ordinal
interval/ratio
3 key attributes of data measurement
order/magnitude
consistency of scale (equal distance)
rational absolute zero
nominal
no order
no consistency of scale
simply words/labels w/ no quantitative characteristics
any question that only has 2 categories is always what type of data?
nominal
ordinal
has order
no consistency of scale
ex. pain scale, stress levels, happiness ratings
disagree, somewhat disagree, neutral, etc.
interval/ratio data
has order
has consistency of scale
ratio has absolute zero
interval data
arbitrary zero value
0 does not mean absence
temperature
ratio data
has an absolute zero
0 = absence
ex. 0 heartbeat = dead
after data is collected, we can appropriately go _____ in specificity/detail of data measurement levels, but never ____ .
go down
but never up
in terms of nominal, ordinal, interval, ratio
measures of central tendency and dispersion/spread
central tendency: mean, median, mode
outliers
min/max and range
IQR
difference between variance and standard deviation
variance is the average squared deviation of the values from the mean
standard deviation is the square root of the variance, so it is in the same units as the data
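A small sketch of how the two relate, using the sample variance and sample SD from Python's `statistics` module. The data values are made up for illustration.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example data

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance: squared units
sd = statistics.stdev(values)           # sample SD: same units as the data

print(mean, round(variance, 3), round(sd, 3))
# The SD is the square root of the variance.
print(math.isclose(sd, math.sqrt(variance)))
```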
relate bell graph to percentiles
broken into four 25% sections (quartiles) around the median (= 50th percentile)
IQR
interquartile range
Q3 - Q1 = IQR
75th percentile - 25th percentile = IQR
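A quick sketch of the IQR as the distance between the 75th and 25th percentiles, using `statistics.quantiles` with its default method. The data are made up, and other software may place the quartiles slightly differently.

```python
import statistics

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]  # made-up example data

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```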
statistical tests used on normally distributed data are called?
parametric tests
positively skewed
tail pointing to the right/positive direction
mean > median
negatively skewed
tail pointing to the left/negative direction
mean < median
if the data is not skewed then how are the mean and median related?
they should be the same/ almost the same
what are the 3 ways to tell if the data is skewed?
- are the mean and median the same?
- what does the graph look like?
- what is the skew value?
skewness value
if data is not skewed it will be as close to zero as possible
can have pos./neg. values
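A rough sketch of a skewness value, assuming the simple population (moment-based) formula; statistical packages often apply a small-sample correction, so their numbers can differ slightly. The data sets are made up for illustration.

```python
import statistics

def skewness(values):
    """Population skewness: third central moment divided by SD cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]  # one large value pulls the tail right
symmetric = [1, 2, 3, 4, 5]

for name, data in (("right tail", right_tail), ("symmetric", symmetric)):
    print(name, statistics.mean(data), statistics.median(data), round(skewness(data), 2))
```

In the right-tailed set the mean sits above the median and the skew value is positive; in the symmetric set the mean equals the median and the skew value is zero.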
kurtosis
a measure of extent to which the data clusters about the mean
normal distribution: (excess) kurtosis = 0
positive kurtosis
= higher clustering about the mean
negative kurtosis
= less clustering about the mean
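A minimal sketch of excess kurtosis, assuming the population (moment-based) formula with 3 subtracted so a normal distribution scores 0. The two data sets are made up to show the sign of the value.

```python
def excess_kurtosis(values):
    """Fourth central moment / variance squared, minus 3 (normal = 0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3

peaked = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]  # most values sit exactly at the mean
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # values spread evenly, no peak

print(round(excess_kurtosis(peaked), 2))  # positive: more clustering about the mean
print(round(excess_kurtosis(flat), 2))    # negative: less clustering about the mean
```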
discrete vs. continuous data
discrete data takes whole-number values (counts), whereas continuous data can take decimal values
required assumptions for interval/ratio data
- normally distributed
- equal variances
- randomly derived and independent
levene’s test
tells us whether the variances of interval/ratio data are equal between groups (it does not test normality)
what if interval data is not normally distributed?
just use a non-parametric test
or transform the data (e.g., a log transformation) so it is closer to normal
variables required when interpreting a p value
- is it significant
- who was higher/lower
- by how much?
include all three, no specific order
______ must be equal in order to pick an interval test.
variances
levene’s test is used to assess whether ______ are equal between ?
variances between all groups
before running Levene’s test, what do you need, and why?
a null hypothesis stating there is no difference in variances between groups
we want the p value to come back not significant, which supports equal variances
if the variances are equal, the data can be analyzed with an interval (parametric) test
number of siblings is an example of _____ data
interval/ratio data (discrete; 0 siblings is a true zero, so strictly ratio)
define confidence interval
an interval around the estimate (e.g., the observed difference, OR, or RR) within which we are X% confident (e.g., 95%) the true value lies
a CI that includes both reduced and increased risk
is not significant, because a significant result cannot point in both directions
when interpreting a CI for OR/RR, and the range contains 1.0
is not significant
if range crosses 1.0 then it is not significant
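A hedged sketch of this rule, assuming a 2x2 table and the common log-based (Woolf) approximation for a 95% CI around an odds ratio. The function names and cell counts are made up for illustration.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """a, b = exposed with/without event; c, d = unexposed with/without event."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

def crosses_null(lower, upper, null_value=1.0):
    """True when the interval contains the 'no effect' value."""
    return lower <= null_value <= upper

weak = odds_ratio_ci(15, 85, 10, 90)    # CI spans 1.0: not significant
strong = odds_ratio_ci(30, 70, 10, 90)  # CI stays above 1.0: significant
for odds_ratio, lower, upper in (weak, strong):
    print(f"OR={odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}, "
          f"crosses 1.0: {crosses_null(lower, upper)}")
```

For CIs on actual data values (such as a difference in means), the same check applies with `null_value=0.0`.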
when interpreting CI for actual data values, and crosses zero
it is not significant
does statistical significance actually confer meaningful, _____ significance?
clinical significance
4 key questions to selecting the correct statistical test
- what data level is being recorded?
- what type of comparison/assessment is desired?
- how many groups being compared?
- is the data independent or related/paired?
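For group comparisons, these questions can be treated as keys into a lookup table. The sketch below uses the test pairings listed elsewhere in this deck; the function name and table layout are made up, and correlation, survival, and regression tests are left out.

```python
# Keys: (data level, "2" or "3+" groups, paired?)
GROUP_COMPARISON_TESTS = {
    ("nominal", "2", False): "Pearson's chi-square test",
    ("nominal", "3+", False): "chi-square test of independence",
    ("nominal", "2", True): "McNemar test",
    ("nominal", "3+", True): "Cochran test",
    ("ordinal", "2", False): "Mann-Whitney test",
    ("ordinal", "3+", False): "Kruskal-Wallis test",
    ("ordinal", "2", True): "Wilcoxon signed rank test",
    ("ordinal", "3+", True): "Friedman test",
    ("interval", "2", False): "Student t-test",
    ("interval", "3+", False): "ANOVA",
    ("interval", "2", True): "paired t-test",
    ("interval", "3+", True): "repeated measures ANOVA",
}

def choose_test(level, n_groups, paired):
    """Pick a group-comparison test from data level, group count, and pairing."""
    if n_groups < 2:
        raise ValueError("need at least 2 groups to compare")
    groups = "2" if n_groups == 2 else "3+"
    return GROUP_COMPARISON_TESTS[(level, groups, paired)]

print(choose_test("ordinal", 3, paired=False))
print(choose_test("interval", 2, paired=True))
```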
correlation tests
provides quantitative measure of strength & direction of a relationship between variables
ranges from -1 to +1
refers to line graphs/ slope
nominal correlation test
contingency coefficient
ordinal correlation test
spearman correlation
interval correlation test
pearson correlation
running a partial correlation
to control for confounding
contingency coefficient
for nominal correlation testing
spearman correlation
for ordinal correlation testing
pearson correlation
for interval correlation testing
what do correlation tests tell you?
tell you the relationship between two variables
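A small sketch of the Pearson and Spearman coefficients written out by hand, assuming Spearman is computed as Pearson on average ranks. The function names and data are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson r: strength and direction of a linear relationship (-1 to +1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rho: Pearson r computed on the ranks (ordinal information only)."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # always increasing, but not a straight line
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Because `y` always rises with `x`, the rank-based coefficient is a perfect +1 while Pearson's falls slightly short of it.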
gender is _____ data
nominal
a test to determine event-occurrence or time to event
survival test
what does survival describe
the lack of the “event” occurring
survival test
compares the proportion of events over time, or time to events
between groups
nominal survival test
log-rank test
ordinal survival test
cox-proportional hazards test
interval survival test
Kaplan-meier test
log-rank test
nominal survival test
cox-proportional hazards test
ordinal survival test
Kaplan-meier test
interval survival test
Kaplan-meier curve
a graphical representation of a survival test
all tests can provide this
(even tho interval survival test has this same name)
see changes over time
survival test
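A minimal sketch of how the points on a Kaplan-Meier curve are computed, assuming the standard product-limit estimate. The function name and the follow-up data are made up; `False` marks a censored subject (left the study without the event).

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time an event occurred."""
    survival = 1.0
    at_risk = len(times)
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if deaths > 0:
            survival *= 1 - deaths / at_risk  # chance of "surviving" this time point
            curve.append((t, survival))
        at_risk -= leaving
    return curve

times = [2, 3, 3, 5, 8, 8, 10]                         # made-up follow-up times
events = [True, True, False, True, False, True, False]  # True = event occurred

for t, s in kaplan_meier(times, events):
    print(f"t={t}: survival={s:.3f}")
```

Plotting these points as a step function gives the curve: survival starts at 100% and drops each time an event occurs.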
testing for outcome predictions or associations
regression testing
regression
provides a measure of the relationship between variables and allows prediction of the dependent (outcome) variable when the independent variable is known
can use several variables to increase prediction
regression tests also calculate ?
OR for a measure of association
nominal regression test
logistic regression
predict whether you do or do not get something
nominal regression
only two options
if the outcome variable (dependent variable) is ordinal data type
then choose ordinal regression test
ordinal regression test
multinomial logistic regression
if the outcome variable is of the interval data type
then use interval regression test
ex. predicting actual gpa number
interval regression test
linear regression
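A rough sketch of simple linear regression for an interval outcome, assuming one predictor and an ordinary least-squares fit. The study-hours and GPA numbers are made up and deliberately fall on a perfect line.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return slope, intercept

study_hours = [1, 2, 3, 4]    # independent variable (made up)
gpa = [2.0, 2.5, 3.0, 3.5]    # dependent/outcome variable (made up)

slope, intercept = fit_line(study_hours, gpa)
predicted = intercept + slope * 5  # predict the outcome for a known input
print(slope, intercept, predicted)
```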
logistic regression
nominal regression test
multinomial logistic regression
ordinal regression test
linear regression
interval regression test
want to predict the likelihood of some outcome
regression testing
what do you evaluate to determine what type of regression test to run?
only what data type the outcome variable is
univariate
unadjusted OR
ordinal data - 2 groups of independent data
mann-whitney test
ordinal data - 3+ groups of independent data
Kruskal-wallis test
ordinal independent data
- both tests compare the median values between groups
- also used for non-parametric interval data
- if a 3+ group comparison is significant, must do a post-hoc test to determine differences
3+ group comparison that is significant - ordinal independent data
then you must do a post-hoc test to determine what the differences are
ordinal data - 2 groups of paired/related data
Wilcoxon signed rank test
ordinal data - 3+ groups of paired/related data
friedman test
ordinal paired/related data
- both tests compare median values
- can also be used for non-parametric interval data
- if a 3+ group comparison is significant, must do a post-hoc test
key words for paired/related data
pre vs post
before vs after
baseline vs end
ordinal data - post-hoc tests for 3 or more group comparisons
Student-Newman-Keuls test
Dunnett test
Dunn test
Student-Newman-Keuls test
compares all pairwise comparisons possible
all groups must be equal in size
post-hoc test for ordinal data
Dunnett test
compares all pairwise comparisons against a single control
all groups must be equal in size
post-hoc test for ordinal data
the other two tests find all comparisons possible - this is everything vs one specific thing
dunn test
compares all pairwise comparisons possible
useful when all groups are not of equal size
post-hoc test for ordinal data
ordinal post-hoc test: when groups are not equal in size
dunn test
ordinal post-hoc test: to find all comparisons possible
Student-Newman-Keuls test
dunn test
interval data: 2 groups of independent data
student t-test
interval data: 3+ groups of independent data
analysis of variance - ANOVA
interval data - independent data
- tests compare the group means of the dependent variable
- must do a post-hoc test when a 3+ group comparison is significant
interval data: 3+ groups of independent data w/ confounders
analysis of Co-variance - ANCOVA
ANCOVA
compares the means of all groups against a dependent variable while also controlling for the co-variance of confounders
for interval independent data w/ confounders
how many groups can an ANOVA analyze?
any number of groups
interval data: 2 groups of paired/related data
paired t-test
compares the mean values between groups that are related
interval data: 3+ groups of paired/related data
repeated measures ANOVA
compares the means of all groups of related data against a dependent variable
must do post-hoc if significance is found
interval data: post-hoc tests for 3+ group comparisons
Student-Newman-Keuls test
Dunnett test
dunn test
same explanations as ordinal data
plus
tukey/scheffe tests
Bonferroni correction
tukey/scheffe test
interval post-hoc test
compares all pairwise comparisons possible
groups must be equal in size
tukey vs. scheffe tests
Tukey – more conservative than the Student-Newman-Keuls test
scheffe – less affected by violations in normality and homogeneity of variance
*most conservative
Bonferroni correction
adjusts the p value for # of comparisons being made
very conservative
interval data post-hoc test
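A short sketch of the Bonferroni idea, assuming the common form where alpha is divided by the number of comparisons (equivalent to multiplying each p value by that number). The helper names are made up for illustration.

```python
import math

def n_pairwise(n_groups):
    """Number of possible pairwise comparisons among n_groups groups."""
    return math.comb(n_groups, 2)

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_comparisons

comparisons = n_pairwise(4)                       # 4 groups: 6 pairwise comparisons
threshold = bonferroni_alpha(0.05, comparisons)   # each comparison must beat this
print(comparisons, round(threshold, 4))
```

The threshold shrinks quickly as groups are added, which is why the correction is described as very conservative.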
validation/assessment committee
kappa statistic
kappa interpretation
kappa statistic
a correlation test
shows relationships between evaluators
kappa interpretation
+1 = the observers perfectly classify everyone the same way
0 = no relationship between observers' classifications above what is expected by chance
-1 = observers classify everyone exactly the opposite of each other
kappa K
value can be + or -
meaning good agreement or poor agreement
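A minimal sketch of the kappa calculation, assuming two observers and Cohen's formula (observed agreement corrected for agreement expected by chance). The function name and ratings are made up to reproduce the +1, 0, and -1 cases.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both observers pick the same label independently.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / n ** 2
    return (observed - expected) / (1 - expected)

first = ["y", "y", "n", "n"]
print(cohen_kappa(first, ["y", "y", "n", "n"]))  # identical classifications
print(cohen_kappa(first, ["y", "n", "y", "n"]))  # agreement no better than chance
print(cohen_kappa(first, ["n", "n", "y", "y"]))  # exactly opposite classifications
```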
kappa test significance
to determine whether the decisions of the observers are consistent across multiple observers
are their classifications good?
2 groups of independent nominal data
pearson’s Chi-square test
3 or more groups of independent nominal data
chi-square test of independence
nominal data: 2 or more groups w/ expected cell count of <5
fisher’s exact test
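A hedged sketch of the expected-cell-count check, assuming the usual formula (row total x column total / grand total). The function names and tables are made up for illustration.

```python
def expected_counts(table):
    """Expected count for each cell if rows and columns were unrelated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def suggest_nominal_test(table):
    """Fisher's exact test if any expected cell count is below 5."""
    smallest = min(min(row) for row in expected_counts(table))
    return "Fisher's exact test" if smallest < 5 else "chi-square test"

small_study = [[8, 2], [1, 5]]      # made-up counts, 16 subjects
large_study = [[30, 20], [20, 30]]  # made-up counts, 100 subjects

print(expected_counts(small_study), suggest_nominal_test(small_study))
print(expected_counts(large_study), suggest_nominal_test(large_study))
```

Note that the rule looks at expected counts, not the observed counts in the table.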
nominal data - 3 or more groups of independent data – what to do after getting a significant result
must run post-hoc to determine which groups are different
running multiple chi-square tests is not acceptable
Bonferroni test of inequality
2 groups of related nominal data
McNemar test
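A rough sketch of the McNemar statistic, assuming the uncorrected form that uses only the discordant pairs (subjects whose answer changed between the two measurements). The function name and counts are made up for illustration.

```python
import math

def mcnemar(b, c):
    """b = pairs that changed yes->no, c = pairs that changed no->yes."""
    chi2 = (b - c) ** 2 / (b + c)
    # Upper-tail p value for a chi-square statistic with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = mcnemar(b=15, c=5)
print(chi2, round(p_value, 4), p_value < 0.05)
```

Pairs that gave the same answer both times do not enter the calculation, which is how the test accounts for the data being paired.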
3 or more groups of related nominal data
Cochran’s Q test
same as chi squared but mathematically factors in concept of paired data
then Bonferroni test of inequality for post-hoc testing
key words for paired/related data
pre vs post
before vs after
baseline vs end
‘prediction’ is a key word for what type of assessment desired?
regression testing
‘event-occurrence’ or ‘time to event’ are key phrases for what type of desired assessment?
survival testing