Class 29-37: Intro To Biostats Flashcards
How do researchers choose to accept or reject the null hypothesis?
Statistical analysis
Based on p and confidence intervals.
3 primary levels for variables based on answers:
Nominal
Ordinal
Interval
2 key attributes of data measurement:
Magnitude
Consistency of scale.
Each attribute can be assessed w/ a “yes” or “no” response to the inquiry of:
“Does it have it?”
No magnitude and no consistency
Dichotomous/binary and NON ranked
Simply labeled variables w/o quantitative characteristics (dichotomous)
Nominal
Has magnitude but no consistency of scale
Ranked categories
Pain scales, severity of disease
Ordinal
Has magnitude and consistency of scale
Anything drawn w/ exact values
Interval / ratio
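The two yes/no questions above can be written as a tiny decision rule. This is only a study sketch; the function name `data_level` and its arguments are made up for illustration.

```python
def data_level(has_magnitude: bool, has_consistent_scale: bool) -> str:
    """Classify a variable using the two yes/no attributes from the cards."""
    if not has_magnitude:
        # No magnitude (and so no consistent scale): labels only.
        return "nominal"
    if not has_consistent_scale:
        # Ranked categories without fixed distances between them.
        return "ordinal"
    # Magnitude plus a consistent scale.
    return "interval/ratio"


print(data_level(False, False))  # e.g. male vs. female
print(data_level(True, False))   # e.g. a pain scale
print(data_level(True, True))    # e.g. an exact measured value
```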
Nominal and ordinal attributes are considered:
Interval attributes are considered:
Discrete; continuous
Which statistical tests are selected based on the level of data being compared?
ALL statistical tests
After data is collected, we can appropriately go _____ in specificity/detail of data measurement (levels), but we can never go _____
Down; up
Non-comparative. Simple description of various elements of the study’s data
Descriptive statistics.
Measures of central tendency and dispersion
Mode/ median/ mean
Outliers
Minimum / maximum / range
Interquartile range= Q3-Q1
Know how to calculate these b/c it will be free points on the test.
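A minimal sketch of these calculations using Python's standard `statistics` module. The ages are hypothetical practice data, not from the course. Quartile conventions differ slightly between textbooks and software, so a hand-calculated IQR may not match this output exactly.

```python
import statistics

# Hypothetical sample: ages of 9 patients, with one outlier (85).
ages = [21, 23, 23, 25, 27, 30, 34, 41, 85]

mean = statistics.mean(ages)
median = statistics.median(ages)
mode = statistics.mode(ages)

# Range: the difference between the maximum and the minimum.
data_range = max(ages) - min(ages)

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1  # Interquartile range = Q3 - Q1

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={data_range} Q1={q1} Q3={q3} IQR={iqr}")
```

Note how the single outlier pulls the mean above the median while the mode stays put.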
The average of the squared differences between each individual measurement value and the group's mean
Variance- from the mean
The larger the number= the more variable
Square root of variance value (restores units of mean)
Standard deviation (SD)
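The two definitions above, worked step by step on hypothetical values. This sketch divides by n (the plain "average" the card describes); most software reports the sample variance, which divides by n - 1 instead.

```python
import math

# Hypothetical measurements, chosen so the arithmetic is easy to follow.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)

# Squared difference between each value and the group's mean.
squared_diffs = [(v - mean) ** 2 for v in values]

# Variance: the average of those squared differences (dividing by n).
variance = sum(squared_diffs) / len(values)

# SD: the square root of the variance, back in the units of the mean.
sd = math.sqrt(variance)

print(mean, variance, sd)
```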
Normally distributed= (one word)
Symmetrical
What graph is found when mean / median / mode are equal or near-equal:
Normally distributed
Interval data must be:
Normally distributed.
Stats tests useful for _____________________ data are called parametric tests
Normally-distributed
SD in normal distribution:
68% = ±1 SD (center of the bell curve)
95% = ±2 SD
99.7% = ±3 SD
Parametric tests
Fixed mean / median / mode
Positively skewed graph
Asymmetrical distribution w/ one tail longer than the other
A distribution is skewed anytime the median differs from the mean (here, the mean is higher than the median).
Mean > median
Negatively skewed graph:
Asymmetrical distribution w/ one tail longer than the other, extending to the left.
Distribution is skewed anytime the median differs from the mean (when mean is lower than median)
Mean < median
Mean = 15.15 Median= 3 Mode= 0
Mean > median
Positively skewed
Outliers do not change:
The mode
Range is:
subtraction btw min and max.
The difference btw the 2 values.
Mean 5.05
Median: 7.0
Mode: 7
Skewed?
Mean < median
Skewed to the left
Negatively skewed.
Mean: 44.31
Median: 47.0
Mode: 49
Skewed?
Mean < median
Negatively skewed.
Shift to the left
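The three practice cards above all apply the same rule: compare the mean to the median. A small sketch of that rule, using the means and medians from those cards (the function name is made up):

```python
def skew_direction(mean: float, median: float) -> str:
    """Apply the mean-vs-median rule of thumb from the cards."""
    if mean > median:
        return "positively skewed"  # tail extends to the right
    if mean < median:
        return "negatively skewed"  # tail extends to the left
    return "symmetrical"


print(skew_direction(15.15, 3))    # first practice card
print(skew_direction(5.05, 7.0))   # second practice card
print(skew_direction(44.31, 47.0)) # third practice card
```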
What Age extremes represent 95% of data?
Mean + (2xSD)
Mean - (2xSD)
A measure of the asymmetry of a distribution
Perfectly normal distribution is symmetric and has value of 0
Skewness
Want to be closer to 0
A measure of the extent to which observations cluster around the mean. For a normal distribution, the value is 0
+= more cluster
-=less cluster
Kurtosis
Want to be closer to 0
Positive kurtosis =
More clustered
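For practice, skewness and kurtosis can be computed from central moments. This is a sketch using the simple moment-based formulas, with kurtosis reported as "excess" kurtosis so that a normal distribution scores 0, as on the cards. Statistical packages often apply small-sample corrections, so their output may differ slightly. The data are hypothetical.

```python
def central_moment(values, k):
    """Average of (value - mean) raised to the k-th power."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    # Third central moment scaled by the variance to the power 1.5.
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    # Fourth central moment scaled by variance squared, minus 3.
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]      # hypothetical, perfectly symmetric
right_tailed = [1, 1, 1, 2, 10]  # hypothetical, long right tail

print(skewness(symmetric))     # 0 for symmetric data
print(skewness(right_tailed))  # positive: tail to the right
print(excess_kurtosis(right_tailed))
```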
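What is the age range of the middle 68%? 95%? 99.7%?
68% = mean ± 1 SD
95% = mean ± 2 SD
99.7% = mean ± 3 SD
For each range, add the SD to the mean that many times for the upper limit and subtract it that many times for the lower limit (e.g., 3 times each way for 3 SD).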
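A quick sketch of the mean ± SD arithmetic. The mean age of 50 and SD of 10 are hypothetical numbers picked only to make the ranges easy to check by hand.

```python
def sd_range(mean, sd, k):
    """Return (lower, upper) limits for mean minus/plus k standard deviations."""
    return (mean - k * sd, mean + k * sd)


mean_age, sd_age = 50, 10  # hypothetical values

print("middle 68%:  ", sd_range(mean_age, sd_age, 1))
print("middle 95%:  ", sd_range(mean_age, sd_age, 2))
print("middle 99.7%:", sd_range(mean_age, sd_age, 3))
```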
Which of mean, mode, median could adequately represent the measure of central tendency for the gender variable?
Mode; it describes the most frequent gender (the category w/ the highest count)
Which measure of central tendency is the most descriptive for nominal data?
Mode is the most descriptive measure for nominal data
For what data type are mean, median, and mode ALL defensible?
Interval
Median and mode are defensible for what data type?
Ordinal
Req’d assumptions of interval data for proper parametric test:
(3)
- Normally distributed: look at mean, median, mode; look at graph; look at skewness values
- Equal variances: run statistical test (Levene's test)
- Randomly derived and independent
* if these are not met, then interval / parametric tests CAN'T be used
Common statistical test for parametric test/interval data:
Compares equal variances
Levene’s test
How to handle data that’s not normally distributed:
- could use ordinal family stat. test (non-parametric tests)
- could throw out outliers
- transform data to a standardized value (Z score or log); this can create a normal distribution. [issue: how do you log BP?]
Segars-picks ordinal typically
*TQ
What should you always do for data interpretation?
Run descriptive statistics and graphs
Researchers will either accept or reject this perspective, based on:
Statistical analysis
Rejecting the null hypothesis when you shouldn’t have!
Type 1 error (alpha)
Not rejecting the null when you should have
Type II error (beta)
The ability of a study design to detect a true difference if one exists btw group-comparisons….
The level of accuracy in correctly accepting/rejecting the null
Power (1-ß)
ß=beta=type 2 error
The larger the sample size, the greater the likelihood (ability) of detecting a difference if one truly exists.
Increase in power
Sample size.
Most researchers accept up to _____ type 2 (beta) error rate, so most researchers want an 80% chance of finding differences
What about alpha error rate?
20% for the BETA rate!!
5% for type 1 (alpha) rate
The number one issue in power that ppl worry about?
Sample size!
3 common characteristics for sample size determination:
- Minimum difference btw groups deemed “significant”
- Expected variation of measurement.
- Alpha (type 1) and beta (type 2) error rates and confidence interval.
- Add in anticipated drop-outs or loss to follow-up (f/u)
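To see how the listed ingredients combine, here is a rough sketch of one common textbook formula for comparing two group means (normal approximation). The cards do not give a formula, so this choice is an assumption, and the function name, parameters, and example numbers are all hypothetical. Real studies use dedicated power software.

```python
import math
from statistics import NormalDist


def sample_size_per_group(min_difference, sd, alpha=0.05, beta=0.20,
                          dropout_rate=0.0):
    """Approximate n per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # type 1 error (two-sided)
    z_beta = NormalDist().inv_cdf(1 - beta)        # power = 1 - beta
    n = 2 * ((z_alpha + z_beta) * sd / min_difference) ** 2
    # Inflate for anticipated drop-outs or loss to follow-up.
    return math.ceil(n / (1 - dropout_rate))


# Hypothetical: detect a 5-unit difference when the SD is 10.
print(sample_size_per_group(5, 10))
print(sample_size_per_group(5, 10, dropout_rate=0.10))
```

Smaller differences, more variation, or stricter error rates all push the required sample size up.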
Statistical tests determine the possible error rate or chance in compared difference or relationship btw variables
P values
Or type 1 error rate
Or alpha error rate
How do p value and type 1 error rate differ?
They don't; they're treated as the same thing (strictly, alpha is the preset cutoff and the p value is calculated from the data)
% chance of making type 1 error.
By convention, accept about 5% risk of being wrong
P value
If P value is < 5% (or whichever is preset value), can you reject the null?
Yes! This is statistically significant.
Authors that pick a 1% preset p value are _____ likely to reject the null
LESS
If the p value is set higher (~10%), it is ________ to find group differences
Easier
But they have a higher risk of being WRONG
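The effect of the preset cutoff can be seen with one hypothetical result. In this sketch the p value of 0.03 is invented, and `reject_null` is just an illustrative name.

```python
def reject_null(p_value, preset=0.05):
    """Reject the null only when p falls below the preset cutoff."""
    return p_value < preset


p = 0.03  # hypothetical study result

print(reject_null(p, preset=0.05))  # conventional 5% cutoff
print(reject_null(p, preset=0.01))  # stricter: less likely to reject
print(reject_null(p, preset=0.10))  # looser: easier to find a difference
```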
Define probability value:
Probability of making type 1 error
Know how to explain the statistical significance of a p value in own words:
The P value is the probability or chance of making a type I error if rejecting the null hypothesis.
What 2 areas do you love to see high p values?
Table 1 characteristics
And
Levene's test: this shows that the study's randomization process worked!
Most common selections are 90,95,99%
A range, calculated at an a priori percentage of confidence, within which the real difference or relationship statistically resides.
Based on: variation in sample and sample size
Confidence interval (CI)
Journals are moving away from solely reporting p values; or showing them at all.
Now people are using CI
Confidence interval.
Single best guess at relationship or difference btw groups
The value of 0 represents no difference
Range represents SD
Point estimate
If CI crosses 1.0 (for a ratio) or 0.0 (for an absolute difference):
No difference
Sameness
NOT SIGNIFICANT = p>0.05
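The "does it cross?" check is easy to sketch. The intervals below are hypothetical; the null value is 1.0 for ratios and 0.0 for absolute differences, as on the card.

```python
def ci_is_significant(lower, upper, null_value):
    """A CI that contains the null value is NOT statistically significant."""
    crosses_null = lower <= null_value <= upper
    return not crosses_null


# Hypothetical ratio (e.g. relative risk or odds ratio): null value is 1.0.
print(ci_is_significant(0.70, 0.95, null_value=1.0))
print(ci_is_significant(0.80, 1.20, null_value=1.0))

# Hypothetical absolute difference: null value is 0.0.
print(ci_is_significant(-2.0, 3.5, null_value=0.0))
```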
Slide 63: bivalirudin study. Interpret major bleeding CIs:
Compared to unfractionated heparin, bivalirudin patients were 34% less likely to have a major bleed,
and it can be said w/ 95% confidence that the real group difference is btw a 51% and 10% reduced risk,
which is statistically significant.
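The percentages in this interpretation come from subtracting a ratio from 1. The ratio values below (0.66, with limits 0.49 and 0.90) are back-calculated from the card's 34%, 51%, and 10%; they are an assumption, not numbers quoted from the slide.

```python
def percent_risk_reduction(relative_risk):
    """Convert a ratio below 1.0 into a percent reduction in risk."""
    return round((1 - relative_risk) * 100, 1)


# Assumed values, inferred from the percentages on the card.
point_estimate, ci_lower, ci_upper = 0.66, 0.49, 0.90

print(percent_risk_reduction(point_estimate))  # point estimate
print(percent_risk_reduction(ci_lower))        # largest plausible reduction
print(percent_risk_reduction(ci_upper))        # smallest plausible reduction

# The interval does not cross 1.0, so the result is statistically significant.
print(not (ci_lower <= 1.0 <= ci_upper))
```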
Looking at forest plots, you dont want CI to cross:
1 on the x axis
Important consideration:
Does statistical significance actually confer meaningful, clinical significance?
4 key questions for selecting the correct statistical test:
- What data level is being recorded?**most important
- What type of comparison/assessment is desired?
- How many groups are being compared?
- Is the data independent or related (paired)?
- What data level is being recorded?
Does data have magnitude? (Y/n)
Does data have a fixed interval? (Y/n)
-the remaining 3 questions get you around the other portions of each individual sheet
- What type of comparison/assessment is desired?
Correlation test
Provides a quantitative measure of the strength and direction of a relationship btw variables
Correlation
Values range from -1 to +1
A correlation that controls for confounding variables
Partial correlation
Are 2 things related and how are they related?
Correlation
Positive correlation:
+1.0
One goes up; the other does as well
Negative correlation
-1.0
One goes up; the other goes down
Zero correlation:
0
No correlation whatsoever
Perfect correlation:
45 degree angle up or down
Artificially fitted and slanted so that the difference btw line and each individual measurement is the smallest distance possible.
Like an average of all of the dots
Correlation line
Nominal correlation:
Contingency coefficient
Ordinal correlation:
Spearman correlation
Interval correlation:
Pearson correlation- “is there a linear relationship?”
P > 0.05 for a Pearson correlation just means there is no linear correlation; there may still be non-linear correlations present!
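A small sketch of why Pearson only detects linear relationships. It computes the coefficient by hand on hypothetical data: a perfectly straight line and a perfect U-shaped curve. The curve has an obvious relationship, yet its Pearson value is 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)


xs = [-2, -1, 0, 1, 2]            # hypothetical
line = [2 * x + 1 for x in xs]    # perfectly linear
curve = [x ** 2 for x in xs]      # perfectly related, but not linear

print(pearson_r(xs, line))   # perfect positive correlation
print(pearson_r(xs, curve))  # no linear correlation
```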
All correlations can be run as ___________________ to control for confounding
Partial correlation
NOT only used to see if ppl survive.
Frequency of occurrence or change in category over time
Help us understand when time is important=changes over time.
Survival test
Compares the proportion of, or time-to, event occurrences btw groups
Visual image of changes over time.
Commonly represented by:
Kaplan-Meier curve
Nominal survival test:
Log-rank test
Ordinal survival test:
Cox-proportional hazards test
Interval survival test:
Kaplan-Meier test
*nominal, ordinal, and interval can all be represented by a Kaplan-Meier curve
Outcome prediction/ association of the groups:
Allows us to generate an association
Or
To take a bunch of variables and try to predict an outcome.
Regression test
Nominal regression test:
Logistic regression
Ordinal regression test:
Multinomial logistic regression
Interval regression test:
Linear regression
You can obtain an ____ from a regression
OR (odds ratio)
Data that is derived independent of their group. Groups are treated and followed independently from other groups
Independent data
2 groups of independent data of nominal data:
Males vs. females
Pearson's chi-squared test (χ²)
3 groups (or more) of independent data of nominal data:
Chi-squared test of independence (χ²)
2 or more groups w/ an expected cell count of <5 event occurrences
Fisher's exact test
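"Expected cell count" is calculated from the row and column totals, not read off the table directly. A minimal sketch of that check, using hypothetical 2x2 counts and made-up function names:

```python
def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def any_expected_below_5(table):
    """True when at least one expected cell count is under 5."""
    return any(e < 5 for row in expected_counts(table) for e in row)


small_study = [[8, 2], [1, 9]]      # hypothetical: events vs. no events
large_study = [[30, 20], [20, 30]]  # hypothetical

print(expected_counts(small_study))
print(any_expected_below_5(small_study))  # True: consider Fisher's exact test
print(any_expected_below_5(large_study))  # False: chi-squared is reasonable
```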
Greater than or equal to 3 groups of independent data-nominal data, for statistically significant findings (P<0.05) in 3 or more comparisons, one must perform subsequent analysis:
Post-hoc testing: to determine which groups are different
Post hoc testing for nominal data:
Adjusts the p value for # of comparisons being made; very conservative.
Divide the preset p value by the # of comparisons being made
0.05/3 (comparisons, as for 3 groups) = 0.01667 is the new p value cutoff
Bonferroni correction: test of inequality
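The Bonferroni arithmetic from the card, sketched in code. Counting pairwise comparisons with `math.comb` is an added convenience here, not something the card states; with 3 groups it gives the same 3 comparisons used in the card's example.

```python
from math import comb


def pairwise_comparisons(n_groups):
    """Number of distinct group-vs-group pairs."""
    return comb(n_groups, 2)


def bonferroni_cutoff(preset_p, n_comparisons):
    """Divide the preset p value by the number of comparisons."""
    return preset_p / n_comparisons


n = pairwise_comparisons(3)
print(n)                                    # 3 comparisons for 3 groups
print(round(bonferroni_cutoff(0.05, n), 5)) # the card's 0.01667

# With 4 groups there are 6 pairs, so the cutoff gets stricter.
print(round(bonferroni_cutoff(0.05, pairwise_comparisons(4)), 5))
```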
2 groups of paired/related data nominal data:
McNemar test
≥ 3 groups of paired/related data:
Cochran's Q test
Paired data = related data
From the same person.
He will use before and after / pre-post-/ baseline end/ as the terminology..
McNemar test (2 groups related data)
Cochran's Q (≥ 3 groups of paired/related data)
2 groups of independent data ordinal:
Mann-Whitney test
≥ 3 groups of independent data ordinal:
Kruskal-Wallis test
2 groups of paired/related data ordinal:
Wilcoxon signed rank test
≥ 3 groups of paired/related data ordinal:
Friedman test
Post hoc tests for 3 or more group comparisons:
See where the various differences are in between the groups.
Not needed if initially NOT stat. Significant!
Student-Newman-Keuls test
Dunnett test
Dunn test
Post hoc test compares all pairwise comparisons possible
All groups must be equal in size
Ordinal data
Student-Newman-Keuls test
Post hoc test for 3 or more groups that compares each group against a single control
All groups must be equal in size
Ordinal data
Dunnett test
Useful post hoc test for comparing groups of unequal size: ordinal
Dunn test
2 groups of independent data-interval data
Student t-test
≥ 3 groups of independent data-interval
ANOVA: analysis of variance-powerful enough to handle any number of groups
MANOVA: multivariate analysis of variance
Must perform a ______ if statistically significant and ≥ 3 groups
Post hoc
≥ 3 groups of independent data w/ confounders in interval data
ANCOVA: analysis of covariance
MANCOVA: multivariate analysis of covariance
2 groups of paired/related interval data
Paired t-test
Data that is related
≥ 3 groups of paired/related interval data
Repeated measures ANOVA
Repeated measures MANOVA
≥ 3 groups of paired/related data w/ confounders in interval data.
Repeated measures ANCOVA or MANCOVA
Can use Student-Newman-Keuls, Dunnett, or Dunn test for post hoc testing. What are 2 others for interval data?
Tukey or Scheffé test: compares all pairwise comparisons possible. All groups must be equal in size.
- Tukey: more conservative than Student-Newman-Keuls.
- Scheffé: MOST conservative.
Bonferroni correction: adjusts p values for # of comparisons made
Agreement btw evaluators (consistency of decisions, determinations)
Looking for consistency of agreement
Kappa statistic
Difference in radiologist reading films
-1 to +1
+=good agreement
-=poor agreement
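A sketch of how a kappa value is calculated, assuming the two-rater version (Cohen's kappa): observed agreement is compared with the agreement expected by chance. The counts are hypothetical readings of 100 films by two radiologists.

```python
def cohens_kappa(table):
    """Kappa from a square table: rows = rater A, columns = rater B."""
    total = sum(sum(row) for row in table)
    # Observed agreement: cases on the diagonal (both raters agree).
    observed = sum(table[i][i] for i in range(len(table))) / total
    # Chance agreement: from each rater's overall category totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts:        B: abnormal  B: normal
films = [[40, 10],   # A: abnormal
         [10, 40]]   # A: normal

print(round(cohens_kappa(films), 2))        # 80% observed vs. 50% by chance
print(cohens_kappa([[50, 0], [0, 50]]))     # perfect agreement
```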
Review: 4 key questions to ask:
- What data level is being recorded?
- What type of comparison/assessment is desired?
- How many groups?
- Is the data independent or related (paired)?
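For review, the group-comparison cards in this set can be collapsed into a lookup keyed by three of the four questions (data level, number of groups, independent vs. paired). This is only a study aid built from the answers on the cards; it leaves out correlation, survival, and regression tests, confounder-adjusted versions, and the Fisher's exact exception. The function and key names are made up.

```python
# (data level, number of groups, relationship) -> test named on the cards
TESTS = {
    ("nominal", "2", "independent"): "Pearson's chi-squared test",
    ("nominal", "3+", "independent"): "Chi-squared test of independence",
    ("nominal", "2", "paired"): "McNemar test",
    ("nominal", "3+", "paired"): "Cochran's Q test",
    ("ordinal", "2", "independent"): "Mann-Whitney test",
    ("ordinal", "3+", "independent"): "Kruskal-Wallis test",
    ("ordinal", "2", "paired"): "Wilcoxon signed rank test",
    ("ordinal", "3+", "paired"): "Friedman test",
    ("interval", "2", "independent"): "Student t-test",
    ("interval", "3+", "independent"): "ANOVA",
    ("interval", "2", "paired"): "Paired t-test",
    ("interval", "3+", "paired"): "Repeated measures ANOVA",
}


def choose_test(level, n_groups, paired):
    """Look up the group-comparison test for the given answers."""
    if n_groups < 2:
        raise ValueError("a group comparison needs at least 2 groups")
    groups = "2" if n_groups == 2 else "3+"
    relationship = "paired" if paired else "independent"
    return TESTS[(level, groups, relationship)]


print(choose_test("ordinal", 2, paired=False))
print(choose_test("interval", 4, paired=True))
```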