Stats Flashcards
Sample
a subset or portion of the full population (representatives)
When is a sample useful
When studying the complete population is not feasible
What is commonly utilized to draw samples
random processes
Study measurements
- measurements are collected on desired variables
- comparisons are made (statistical analyses)
- Inferences will be made about the sample-derived measurements and their comparisons
(inferences will also be made to the full population of similar subjects (generalizability))
Null hypothesis
perspective which states there will be no (true) difference between groups
- most conservative and commonly utilized
- Various statistical-perspectives can be taken by the researcher
- superiority
- Noninferiority
- Equivalency
Alternative hypothesis
Perspective which states there will be a (true) difference between the groups being compared
what are the 2 key attributes of data measurements (variables)
- Magnitude (or Dimensionality)
- Consistency of scale (or Fixed interval)
- equal, measurable spacing between units
Study population
the final group of individuals selected for a study
What are the 3 key levels and attributes of measurements
- Nominal
- Ordinal
- Interval/Ratio
Explain Nominal data
- Dichotomous/Binary; non-ranked Named categories
- No magnitude/No consistency of scale / No Rational Zero
- Nominal variables are simply labeled variables without quantitative characteristics
_______ variables are simply labeled variables without quantitative characteristics
Nominal
Explain Ordinal Data
- Ranked categories; non-equal distance
- Yes Magnitude/no consistency of scale/ no Rational zero
Interval/Ratio Data
- order and magnitude and equal intervals-of-scale (units)
- Yes magnitude/ Yes consistency of scale/ no or yes rational zero (no-interval; yes-ratio)
Ex. - Number of living siblings and personal age (in years)
After data is collected, we can appropriately go _________ in specificity/detail of data measurements (levels), but never go ____
down, up
What are the two discrete data types
nominal and ordinal
What kind of data is continuous
interval
Mean/Median/mode are only useful for _____ data
continuous
Average of the squared differences between each individual measurement value and the group’s mean
Variance
Square root of variance value (restores units of mean)
Standard Deviation
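The two cards above can be checked with a short calculation. This is a minimal sketch using the sample (n - 1) denominator; the function names and the `ages` values are made up for illustration.

```python
import math

def sample_variance(values):
    """Average squared deviation from the mean, using the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """Square root of the variance, which restores the units of the mean."""
    return math.sqrt(sample_variance(values))

ages = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements, in years
print("variance (years squared):", sample_variance(ages))
print("standard deviation (years):", sample_sd(ages))
```

The variance is in squared units (years squared here), which is why the standard deviation is usually the value reported alongside the mean.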
Stat tests useful for normally-distributed data are called ________ tests
Parametric
Asymmetrical distribution with the right “tail” longer than the left
Positively skewed
A distribution is skewed anytime the _____ differs from the _____
median differs from the mean
When mean is _____ than median it is _____ skew
higher, positive
Asymmetrical distribution with the left “tail” longer than the right
Negatively skewed
Distribution is skewed anytime the _____ differs from the _____
median differs from the mean
When mean is _____ than the median it is a negative skew
lower
A measure of the asymmetry of a distribution
Skewness
The perfectly-normal distribution is symmetric and has a skewness value of
0
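The skewness cards can be illustrated numerically. This is a sketch that assumes one common definition of skewness (the third central moment divided by the second central moment raised to the power 1.5); the data sets are invented.

```python
import statistics

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (one common definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]     # hypothetical, long right tail
left_tailed = [-x for x in right_tailed]     # mirror image, long left tail
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_tailed), ("left", left_tailed), ("symmetric", symmetric)]:
    print(name, "mean:", statistics.mean(data), "median:", statistics.median(data),
          "skewness:", skewness(data))
```

In the right-tailed set the mean sits above the median and the skewness is positive; the mirrored set reverses both signs.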
A measurement of the extent to which observations cluster around the mean. For a normal distribution, the value of the kurtosis statistic is 0
Kurtosis
Positive kurtosis means ______ clustered
More
Negative kurtosis means _____ clustered
less
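A rough sketch of the kurtosis statistic, assuming the "excess" form (fourth central moment divided by the squared second central moment, minus 3) so that a normal distribution scores 0. The two data sets are invented to show one positive and one negative value.

```python
def excess_kurtosis(values):
    """m4 / m2**2 - 3, so a normal distribution would score 0."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

peaked = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]  # most values sit on the mean
flat = [1, 2, 3, 4, 5]                    # values spread evenly

print("peaked:", excess_kurtosis(peaked))  # positive: more clustered
print("flat:", excess_kurtosis(flat))      # negative: less clustered
```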
What are the required assumptions of interval data (for proper selection of a parametric test)
- Normally-distributed
- Equal variances
- Randomly-derived and independent
Test used to see if there is equal variances between groups
Levene’s test
How to handle interval data that is not normally distributed
- Use of a statistical test that does not require the data to be normally-distributed (non-parametric tests)
- Or transform data to a standardized value (z-score or log)
- hoping that the transformation allows data to be normally-distributed
The ability of a study design, its methodology, and the selected test statistic to detect a true difference if one truly exists between group comparisons (analogous to sensitivity in screening)
Power (1-Beta)
The larger the ______, the greater the likelihood (ability) of detecting a difference if one truly exists (increases power)
Sample size
sample size determination
- Minimum difference between groups deemed significant
- The smaller the difference between groups necessary to be considered “significant” (important), the greater number needed (“N”)
- Expected variation of measurement (known or estimated)
- Alpha (Type 1) and Beta (Type 2) Error Rates (power)
- add in anticipated drop-outs or loss to follow-ups
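The inputs listed above can be combined in a sample size calculation. The sketch below assumes a widely used normal-approximation formula for comparing two means, n per group = 2 * ((z_alpha + z_beta) * SD / difference)^2; the formula choice, function name, and example numbers are assumptions for illustration, not taken from the cards.

```python
import math
from statistics import NormalDist

def n_per_group(min_difference, sd, alpha=0.05, power=0.80, dropout=0.0):
    """Approximate subjects needed per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # Type 1 error rate (two-sided)
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - Beta
    n = 2 * ((z_alpha + z_beta) * sd / min_difference) ** 2
    return math.ceil(n / (1 - dropout))            # inflate for anticipated drop-outs

base = n_per_group(min_difference=5, sd=10)
smaller_difference = n_per_group(min_difference=2.5, sd=10)
with_dropouts = n_per_group(min_difference=5, sd=10, dropout=0.10)
print(base, smaller_difference, with_dropouts)
```

Halving the minimum difference roughly quadruples the required N, which matches the card: the smaller the difference deemed significant, the greater the number needed.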
P value
probability of observing a result at least as extreme as the one found, due to chance alone (i.e., if the null hypothesis were true)
The P value threshold (alpha) is selected by investigators before
the study starts (a priori)
Customarily the pre-selected p value is _____. Meaning what?
- customarily 5% (0.05)
- The risk of experiencing a Type 1 error is acceptably low (less than 5%)
Type I error (alpha error)
- False positive
- Rejecting the null hypothesis when it is actually true, and you should not have rejected it
Type II error (beta error)
- False negative
- Not rejecting the null hypothesis when it is actually false, and you should have rejected it
What are 5 different ways to interpret a pre-set p value
- the probability of making a type 1 error if the null hypothesis is rejected
- The probability of erroneously claiming a difference between groups when one does not really exist
- The probability of the outcome of the group’s differences occurring by chance
- The probability of obtaining group differences as great or greater if the groups were actually the same/equal
- The probability of obtaining a test statistic as high/higher if the groups were actually the same/equal
What test do you want to have a high p value and why
Levene’s test
- because you don’t want there to be a statistical significance among the test groups
What are the most common selections for confidence intervals
90%, 95%, and 99%
The confidence interval’s high and low values are calculated at an a priori percentage of confidence that statistically the real (yet unknown) difference or relationship _____
resides
Confidence intervals are based on
-Variation in sample (V/SD) and Sample size (N)
If the CI crosses 1.0 (for ratios: OR/RR/HR) or 0.0 (for other comparisons, e.g., interval variables), then the data is
not significant
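The CI cards can be sketched for a difference in means. This example assumes a normal (z) approximation rather than a t distribution, and all blood pressure values are invented; it shows how variation and sample size feed the interval and how to check whether it crosses 0.0.

```python
import math
from statistics import NormalDist, mean, stdev

def diff_ci(group_a, group_b, confidence=0.95):
    """Approximate CI for mean(group_a) - mean(group_b), using a z multiplier."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = mean(group_a) - mean(group_b)
    # Standard error built from each group's variation (SD) and size (N)
    se = math.sqrt(stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b))
    return diff - z * se, diff + z * se

def crosses(ci, null_value):
    low, high = ci
    return low <= null_value <= high

treated = [120, 118, 125, 122, 119, 121, 124, 117]  # hypothetical readings
control = [130, 128, 135, 131, 129, 133, 127, 132]
similar = [121, 119, 124, 123, 118, 122, 125, 116]

print(diff_ci(treated, control), "crosses 0:", crosses(diff_ci(treated, control), 0.0))
print(diff_ci(treated, similar), "crosses 0:", crosses(diff_ci(treated, similar), 0.0))
```

For ratio measures (OR/RR/HR) the same check would use 1.0 as the null value instead of 0.0.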
when reviewing a study it is important to ask
Does “statistical” significance confer meaningful “clinical” significance
What are the 4 key questions in selecting the correct statistical test
- 1) What type of data is being collected/evaluated
- 2) What type of comparison/assessment is desired
- 3) How many groups are being compared
- 4) Is the data independent or related (paired)
When asking yourself what type of data is being collected/evaluated you further ask
- does the data have magnitude?
- Does the data have a fixed, measurable interval along the entire scale?
_____ provides a quantitative measure of the strength and direction of a relationship between variables
Correlation (r)
Value range for correlation (r)
from -1.0 to +1.0
A correlation that controls for confounding variables is a
partial correlation
What is the nominal correlation test
Contingency coefficient
What is the ordinal correlation test
Spearman correlation
What is the interval correlation test
Pearson correlation
All correlation tests can be run as a ______ to control for confounding
partial correlation
p>0.05 for a pearson correlation just means there is no _____ correlation; there may still be a _____ correlation present
linear, non-linear
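A small sketch of why a Pearson r near 0 only rules out a linear relationship. The function below computes Pearson's r from its standard definition; the data are invented so that one relationship is a perfect line and the other is a perfect curve.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]  # straight-line relationship
curved = [x ** 2 for x in xs]     # y depends entirely on x, but not linearly

print("linear r:", pearson_r(xs, linear))
print("curved r:", pearson_r(xs, curved))
```

The curved data are completely determined by x, yet r comes out at 0, so plotting the data is worth doing before concluding that no relationship exists.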
What is the nominal proportion of events (survival) test
Log-Rank
Type of test that compares the proportion of, or time-to, event occurrences between groups
Survival tests (proportion of events)
survival tests are commonly represented by a
Kaplan-Meier curve
What is the name of ordinal survival test
Cox-Proportional Hazards test
What is the name of the interval survival test
Kaplan-Meier test
Type of test that provides a measure of the relationship between variables by allowing the prediction about the dependent, or outcome, variable (DV) knowing the value/category of independent variables (IV’s)
Regressions
Regression tests are able to calculate ____ for a measure of Association
OR
Nominal regression test
Logistic regression
Ordinal regression test
Multinomial logistic regression
Interval regression test
linear regression
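For the interval case, prediction of the DV from an IV can be sketched with ordinary least squares on a single predictor. The function names and the dose/response numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one independent variable."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Predicted value of the dependent variable for a given IV value."""
    return intercept + slope * x

dose = [1, 2, 3, 4]      # hypothetical independent variable (IV)
response = [3, 5, 7, 9]  # hypothetical dependent variable (DV)

slope, intercept = fit_line(dose, response)
print("slope:", slope, "intercept:", intercept)
print("predicted DV at IV = 5:", predict(5, slope, intercept))
```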
If the type of comparison/assessment is frequencies/counts/proportions then you
must ask questions 3 and 4
- 3) how many groups are being compared
- 4) is the data independent or related (paired)
What are some buzz words for paired data
- pre vs. post
- before vs. after
- beginning vs. end
- Baseline vs. end
What is the test for interval data with greater than or equal to 3 groups of independent data
ANOVA or MANOVA
What is the type of test used for 2 groups of Nominal independent data
(Pearson’s) Chi-square test (x^2)
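As a rough illustration of the chi-square test for 2 groups of nominal data, the sketch below works through a 2x2 table by hand. The counts are invented, no continuity correction is applied, and the p-value shortcut via `math.erfc` is valid only for 1 degree of freedom (a 2x2 table).

```python
import math

def chi_square_2x2(table):
    """Return (chi-square statistic, p-value for df = 1, expected counts)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # count expected if groups were equal
            expected.append(e)
            chi2 += (observed - e) ** 2 / e
    p_value = math.erfc(math.sqrt(chi2 / 2))       # upper tail of chi-square, df = 1 only
    return chi2, p_value, expected

# Hypothetical counts: rows = treatment / control, columns = event / no event
table = [[30, 20], [15, 35]]
chi2, p_value, expected = chi_square_2x2(table)
print("chi-square:", chi2, "p:", p_value)
print("expected counts:", expected)
```

Returning the expected counts makes it easy to inspect them before trusting the result, since the test's assumptions concern expected rather than observed cell counts.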
What is the type of test used for greater than or equal to 3 groups of independent nominal data
chi-square test of independence (X^2) or Fisher’s exact test
What is the type of test used for greater than or equal to 2 groups of nominal data with cell count
Fisher’s exact test
What are the assumptions of chi-squared tests
- usual chi-square (binomial) distribution for nominal data
- no cell with expected count
For a statistically significant finding in a study of nominal independent data with 3 or more comparisons, one must perform _________ to determine which groups are different. Multiple chi-squared tests are never acceptable: why? And what test is normally used?
- subsequent analysis (post-hoc testing)
- Multiple chi-squared tests are never acceptable because the risk of a type 1 error increases with each additional test (almost guaranteed after 4-5 tests)
- Bonferroni test of Inequality (Bonferroni correction)
- adjusts the p value for # of comparisons being made
- very conservative
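The arithmetic behind the Bonferroni correction can be sketched as follows. The familywise error formula assumes the tests are independent, which is a simplification; the function names are made up for illustration.

```python
import math

def pairwise_comparisons(n_groups):
    """Number of distinct group pairs that post-hoc testing would compare."""
    return math.comb(n_groups, 2)

def familywise_error(alpha, n_tests):
    """Chance of at least one Type 1 error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Stricter per-comparison threshold after the Bonferroni correction."""
    return alpha / n_tests

def bonferroni_adjusted_p(p_value, n_tests):
    """Equivalent view: inflate each p value and compare it to the original alpha."""
    return min(1.0, p_value * n_tests)

n_tests = pairwise_comparisons(3)  # 3 groups -> 3 pairwise comparisons
print("comparisons:", n_tests)
print("uncorrected familywise error:", familywise_error(0.05, n_tests))
print("per-comparison alpha:", bonferroni_alpha(0.05, n_tests))
```

With 3 groups the uncorrected risk of at least one false positive is already well above 5%, and the corrected threshold drops to about 0.017 per comparison, which is why the method is described as very conservative.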
2 groups of paired/related nominal data use what test
McNemar test
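A minimal sketch of the McNemar statistic for paired yes/no data. It uses only the discordant pairs (subjects whose answer changed), applies no continuity correction, and the counts are invented.

```python
import math

def mcnemar(changed_yes_to_no, changed_no_to_yes):
    """McNemar chi-square (df = 1) from the two discordant pair counts."""
    b, c = changed_yes_to_no, changed_no_to_yes
    statistic = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(statistic / 2))  # chi-square upper tail, df = 1
    return statistic, p_value

# Hypothetical pre vs. post data: 15 subjects switched one way, 5 the other
statistic, p_value = mcnemar(15, 5)
print("statistic:", statistic, "p:", p_value)
```

Subjects who gave the same answer both times do not enter the calculation, which is what distinguishes this paired test from the chi-square test for independent groups.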
greater than or equal to 3 groups of paired/related nominal data use what test
Cochran’s Q test (note that if statistically significant, then the Bonferroni test of inequality is used)
test for 2 groups of independent ordinal data
Mann-Whitney test
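The Mann-Whitney U statistic can be sketched by counting, across every pairing of one subject from each group, how often one group's value beats the other's. The pain scores are invented and no p-value is computed here.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, U for group_b); ties count as half a win."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b

drug = [2, 3, 3, 4]     # hypothetical ordinal pain scores
placebo = [5, 6, 6, 7]

print(mann_whitney_u(drug, placebo))
```

Only the ordering of the values matters, not the distance between them, which is why the test suits ordinal data.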
Test for greater than or equal to 3 groups of ordinal data
Kruskal-Wallis test
Mann-Whitney and Kruskal-Wallis tests both compare the _____ values between groups
median
test for 2 groups of paired/related ordinal data
Wilcoxon signed rank test
Test for greater than or equal to 3 groups of paired/related ordinal data
Friedman Test
Both the Wilcoxon signed rank test and Friedman test compare the ____ values between groups
median
What are the post-hoc tests for 3 or more groups of ordinal data? explain each
- Student-Newman-Keuls test
- Compares all pairwise comparisons possible
- all groups must be equal size
- Dunnett test
- Compares all pairwise comparisons against a single control
- All groups must be equal in size
- Dunn test
- compares all pairwise comparisons possible
- Useful when all groups are not of equal size
which of the post-hoc tests for ordinal data does not require groups to be equal in size
Dunn test
test for 2 groups of independent interval data
Student t-test
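A sketch of the Student t statistic for two independent groups, assuming equal variances (the pooled form). It stops at the statistic and degrees of freedom, which would then be compared against a t table; the data are invented.

```python
import math
from statistics import mean, variance

def student_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df

group_a = [5, 7, 9]  # hypothetical interval measurements
group_b = [1, 3, 5]

t_stat, df = student_t(group_a, group_b)
print("t:", t_stat, "df:", df)
```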
Test for 3 or more groups of independent interval data
- Analysis of Variance (ANOVA) (note an ANOVA can truly handle 2 or more groups)
- Multivariate Analysis of Variance (MANOVA)
Both ANOVA and Student t-test compare the ____ of all groups (along with intra- and inter-group variations) against a single Dependent variable
means
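The role of intra- and inter-group variation in an ANOVA can be sketched with the one-way F statistic: variation between the group means divided by variation within the groups. The data are invented and no p-value is computed.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df between groups, df within groups) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Inter-group variation: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Intra-group variation: how far each value sits from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical measurements, 3 groups
f_stat, df_between, df_within = one_way_anova_f(groups)
print("F:", f_stat, "df:", (df_between, df_within))
```

A large F says the group means differ by more than the spread inside the groups would explain, but not which groups differ; that is the job of post-hoc testing.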
MANOVA compares the means of all groups against ______
multiple Dependent variables
Test for greater than or equal to 3 groups of independent interval data with confounders
- Analysis of Co-Variance (ANCOVA)
- Multivariate analysis of Co-Variance (MANCOVA)
an ANCOVA test compares the means of all groups (along with intra and inter-group variations) against a single DV while also
controlling for the co-variance of confounders
A MANCOVA test compares the means of all groups (along with intra and inter-group variations) against multiple DVs while also
controlling for the co-variance of confounders
Test for 2 groups of paired/related interval data
Paired t-test
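A minimal sketch of the paired t statistic, which is computed on the within-subject differences rather than on the two sets of raw values. The before/after values are invented; the result would be compared against a t table with n - 1 degrees of freedom.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """t statistic and degrees of freedom computed on within-subject differences."""
    differences = [b - a for b, a in zip(before, after)]
    n = len(differences)
    se = stdev(differences) / math.sqrt(n)
    return mean(differences) / se, n - 1

before = [10, 12, 14]  # hypothetical baseline measurements
after = [8, 8, 8]      # same subjects at the end of the study

t_stat, df = paired_t(before, after)
print("t:", t_stat, "df:", df)
```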
When you see mean think what kind of data
interval
test(s) for greater than or equal to 3 groups of paired/related interval data
- repeated measures ANOVA
- Repeated measures MANOVA
Test for greater than or equal to 3 groups of Paired/related interval data with confounders
- repeated measures ANCOVA
- Repeated measures MANCOVA
What are the post-hoc tests for 3 or more groups of interval data? explain them
- Student-Newman-Keuls test
- Compares all pairwise comparisons possible
- all groups must be equal in size
- Dunnett test
- Compares pairwise comparisons against a single control
- all groups must be equal in size
- Dunn test
- Compares all pairwise comparisons possible
- Useful when groups are not of equal size
- Tukey and Scheffe tests
- compares all pairwise comparisons possible
- all groups must be of equal size
- Tukey test- slightly more conservative than the Student-Newman-Keuls test
- Scheffe test- Less affected by violations in normality and homogeneity of variances-most conservative
- Bonferroni correction
- adjusts the p value for # of comparisons being made
- very conservative
if you see predict then think what kind of test
Regression
agreement between evaluators (consistency of “decisions”, “determinations”)
kappa statistic
Kappa interpretations
+1= the observers perfectly “classify” everyone exactly the same way
0= there is no relationship at all between the observers’ classifications, above the agreement that would be expected by chance
-1= the observers “classify” everyone exactly the opposite of each other
+1= good agreement
-1= poor agreement
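The three kappa landmarks above can be reproduced with a short sketch. It assumes Cohen's kappa for two raters, kappa = (observed agreement - chance agreement) / (1 - chance agreement); the ratings are invented.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters beyond what chance alone would produce."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's category proportions, summed
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

rater_1 = ["yes", "yes", "no", "no"]          # hypothetical classifications
same = ["yes", "yes", "no", "no"]
opposite = ["no", "no", "yes", "yes"]
chance_level = ["yes", "no", "yes", "no"]

print("identical:", cohen_kappa(rater_1, same))
print("opposite:", cohen_kappa(rater_1, opposite))
print("chance level:", cohen_kappa(rater_1, chance_level))
```

In the chance-level case the raters agree on half the subjects, but that is exactly what chance would produce here, so kappa is 0 even though raw agreement is 50%.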