Final Flashcards
MDC is a mathematical multiple of ___
SEM
______ often used to construct and evaluate scales/questionnaires
Internal consistency
Logistic =
Use of categorical variables
DV is categorical
Ex: success vs non-success
Which correlation coefficient?
1 ordinal and 1 ratio/interval
Spearman’s rho
Kappa interpretation
Basically same as ICC
Depends on weights used;
Exactly same as ICC when weights squared
< 0.4 poor-fair
0.4-0.6 moderate
0.6-0.8 substantial
0.8-1.0 excellent
1-way ANOVA
Parametric
3 or more independent groups
1 IV with 3 or more levels
Logistic regression
Trying to predict a dichotomous variable
Diagnosis (have vs doesn’t have condition)
Outcome of treatment (success vs non-success)
Assumptions of regression analysis
Linear relationship = approximation of “true” line in population
For every X there is a normal distribution of Y (sample data include random samplings from these distributions in Y)
Homogeneity of variance
DV = continuous measure
Discrete (nominal/ordinal) reliability coefficients
Percent agreement
Kappa - better
Regression is a __ statistic
Linear relationship =
Parametric
Linear relationship = approximation of “true” line in population
For every X there is a normal distribution of Y (sample data includes random samplings)
Homogeneity of variance
Logistic regression’s primary outcome ____
OR (odds ratio)
Null value is 1 (not 0)
Which correlation coefficient?
All nominal dichotomy
Phi coefficient
Linear =
Use of continuous variables
DV is continuous
Ex: does age predict BP
Which correlation coefficient?
1 nominal dichotomy, 1 ratio/interval
Point biserial
Interpretation of Relative Risk and Odds Ratio scores
RR or OR = 1
Null value
No association between exposure and disease
RR or OR > 1
Positive association
Exposure considered harmful
RR or OR < 1
Negative association
Exposure is protective
Kruskal–Wallis ANOVA
Dependent variable Ordinal
Cannot assume normal distribution
3 or more independent groups
1 IV with 3 or more levels
Coefficient of determination
Square of correlation coefficient
Done bc more directly interpretable
“The % of variance in y that is explained - or accounted for- by x”
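A minimal sketch of the coefficient of determination, assuming made-up age and blood pressure values (not course data). `pearson_r` is an illustrative helper name; it computes r by hand, and squaring it gives the share of variance in y accounted for by x.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation for paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired scores: age (years) and systolic BP (mmHg)
age = [30, 40, 50, 60, 70]
bp = [118, 125, 130, 138, 142]

r = pearson_r(age, bp)
r_squared = r ** 2  # coefficient of determination
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```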
Reliability is tied to the concept of
Measurement error
ICC estimate based on ___ will always be substantially higher than estimate based on ____
Average measures always higher than single measures
ANOVA of regression
Test hypothesis that predictive relationship occurred by chance
If b (slope) = 0, line is horizontal = no relationship
If p < than alpha, reject null and conclude predictive relationship is significant
Paired t-tests
Parametric
1 group
1 IV with 2 levels
ICC interpretation showing “good reliability”
ICC > 0.75
ICC Model 1
Each subject measured by different set of raters; randomly chosen
Rarely used in clinical research
A reliable measure can be expected to
Repeat the same score on 2 different occasions provided that the characteristic of interest does not change
Interpretation of correlation coefficients
0.00-0.25 = little to no relationship
0.26-0.50 = fair relationship
0.51-0.75 = moderate to good
0.76-1.00 = good to excellent
These values are NOT strict cutoff points. Depends on type of research.
Most predictors are ___ scale, but can also use ___. But not ____.
Most predictors are continuous scale
Can also be dichotomous or ordinal scale
But NOT multi category nominal (ie race)
ANOVA
Number of IV and DV
IV : more than 1
DV: 1
Rxx (reliability coefficient) will be bigger when
True variance is larger
Nonparametric tests are ___% of parametric tests with regard to power
65-95% as powerful as parametric equivalent
Reliability coefficient (rxx) ranges ___ meaning
Range 0-1
0 = no reliability
1 = perfect reliability
Multiple linear regression
More than 1 predictor in the model
Y = a + b1X1 + b2X2
a = regression constant
b1X1 = 1st regression coefficient x 1st predictor
b2X2 = 2nd regression coefficient x 2nd predictor
Note- there can be more than 2
Hierarchical Linear Modeling (HLM)
Linear mixed modeling
For use when data is “nested” within groups
(Students nested within classrooms,
Patients nested within clinics,
Occasions nested within subjects)
Treats each subject like a regression line
Analyzes “trajectory” of each subject in each group
Standardized Beta Weights
Helpful to know relative contribution of each predictor variable
Impossible to tell with raw regression coefficients (ie b1 may be in years, b2 in lbs.)
Raw coefficients transformed into unitless beta weights
Accuracy of prediction
Correlations only applicable for ___ of scores. Correlations quantify strength of ____ only.
Pairs of scores
Linear relationships only - based on equation for a straight line.
MANOVA
Number of IV and DV
IV = more than 1
DV = more than 1
MANOVA is for analyzing >1 DV simultaneously
Nonparametric stats are based on…
Comparisons of ranks of scores
Comparisons of counts (yes/no) or “signs” of scores
Phi coefficient
Both variables dichotomous
Ex: gender and group
Worthless scatter plot
Does NOT work with non-dichotomous nominal
Similar to chi-square test (will give same p-value)
But phi gives strength of relationship
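A small sketch of the phi coefficient for a 2x2 table, assuming hypothetical cell counts. It uses the standard 2x2 formula and also shows the link to chi-square (chi-square = n x phi squared, without continuity correction), which is why the two give the same p-value.

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table with rows (a, b) and (c, d)."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: rows = gender, columns = group
a, b, c, d = 20, 10, 10, 20
n = a + b + c + d

phi = phi_coefficient(a, b, c, d)
chi_square = n * phi ** 2  # uncorrected chi-square for the same table
print(f"phi = {phi:.3f}, chi-square = {chi_square:.3f}")
```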
Both ____ and ___ give single indicators of reliability that capture strength of a relationship plus agreement in a single value
ICC and Kappa
Problem with correlation coefficient (Pearson’s r)
Assess relationship, not agreement
Only 2 raters or occasions can be compared
___ gives “unstandardized” estimate of reliability (ie units of measurement)
SEM
Cronbach’s alpha represents correlation ____
Among items and correlation of each individual item with the total score
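A rough sketch of Cronbach's alpha, assuming the usual variance-based formula alpha = k/(k-1) x (1 - sum of item variances / variance of total scores). The questionnaire scores are invented and `cronbach_alpha` is an illustrative name.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, same respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical 3-item questionnaire answered by 5 respondents
items = [
    [3, 4, 2, 5, 4],
    [3, 5, 2, 4, 4],
    [2, 4, 3, 5, 3],
]
alpha = cronbach_alpha(items)
print(f"Cronbach's alpha = {alpha:.2f}")
```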
Spearman Rank (rho) correlation coefficient (rs)
Nonparametric analog of Pearson’s r
1 continuous, 1 ordinal variable OR 2 ordinal variables
Analysis of residuals to test assumptions
Plot residuals on ___-axis
Predicted values on ___-axis
Residuals on y-axis
Predicted values on x-axis
Looking for symmetry
The amount of change in a variable that must be achieved to reflect a true change/difference
MDC minimal detectable difference/change
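A minimal sketch relating ICC, SEM, and MDC, assuming the commonly used formulas SEM = SD x sqrt(1 - ICC) and MDC95 = 1.96 x sqrt(2) x SEM. The SD and ICC values are hypothetical.

```python
from math import sqrt

def sem(sd, icc):
    """Standard error of measurement, in the units of the measure."""
    return sd * sqrt(1 - icc)

def mdc95(sem_value):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * sqrt(2) * sem_value

# Hypothetical values: SD of scores = 10 units, ICC = 0.91
sem_value = sem(10, 0.91)
print(f"SEM = {sem_value:.2f}, MDC95 = {mdc95(sem_value):.2f}")
```

A change smaller than the MDC cannot be distinguished from measurement error under these assumptions.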
Point biserial correlation (r pb)
1 variable dichotomous, 1 variable continuous
Does NOT work with non-dichotomous nominal (ie age and race)
Computationally same as Pearson’s r
Results same as t-test
Ex: gender vs height
CV is unit-less, so helpful comparing ____
Variability between 2 distributions on different scales
Logistic regression
DV=
Predictors =
DV = dichotomous
Predictors (IV) = continuous, ordinal or dichotomous
We use __ to predict ___ In linear regression
X (IV) to predict Y (DV)
3 types of stepwise procedures
Forward: start with no predictors, then add
Backward: start with all predictors, then remove
Stepwise: start with no predictors, then add but can also remove
___ is stability of repeated measures over time. Is basically the same as test-retest reliability
Response stability
Kappa can be used on __ data
Nominal and ordinal
Adjusted R^2
Chance corrected R^2
Adjusted down for having more predictor variables
Accuracy of prediction
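A short sketch of adjusted R^2, assuming the standard formula 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the sample size and k the number of predictors. The numbers are hypothetical; the point is that the same R^2 is adjusted down further as predictors are added.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fit on n subjects."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical model: R^2 = 0.50 with 21 subjects
for k in (1, 2, 5):
    print(f"{k} predictors: adjusted R^2 = {adjusted_r2(0.50, 21, k):.3f}")
```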
The % of variance in y that is explained (or accounted for) by x
Coefficient of determination
Multicollinearity
When Xs in model are substantially correlated with each other
Creates problems with interpretations of b weights
Select independent predictors: not highly correlated w/ each other but highly correlated w/ dependent (predicted) value
Non-parametric:
IV Level of measurement
DV level of measurement
Question
IV: nominal
DV: ordinal
Q: ranks different?
Regression line of best fit
Error from line = residual
Residuals are squared to eliminate sign and penalize for worse errors
Line with least squared errors = line of best fit
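A minimal least-squares sketch for Y = a + bX, assuming invented data points. `fit_line` is an illustrative helper; it returns the intercept and slope that minimize the sum of squared residuals.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    a = my - b * mx
    return a, b

# Hypothetical scores
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # error from the line
sse = sum(e ** 2 for e in residuals)  # squared so signs do not cancel
print(f"Y = {a:.2f} + {b:.2f}X, sum of squared errors = {sse:.3f}")
```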
_____ uses relationships (correlation) as a basis for prediction
Regression analysis
Cautions with interpretations of correlation
Agreement
Causation
Extreme outliers (can create inflated correlation with only a few extreme data points)
Limits in range of scores (can’t generalize beyond range of scores in sample)
Low correlation may be due to limited range.
Bias =
Mean difference
Which correlation coefficient?
All data ratio/interval
Pearson r
LOA
Limits of agreements
Range includes ~95% of differences
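A small sketch of bias and limits of agreement, assuming the usual Bland-Altman approach (bias = mean difference, limits = bias +/- 1.96 x SD of the differences). The two sets of measurements are hypothetical.

```python
from statistics import mean, stdev

def limits_of_agreement(m1, m2):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(m1, m2)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical scores from two raters on the same 5 subjects
rater_a = [10, 12, 11, 14, 13]
rater_b = [9, 12, 10, 15, 12]

bias, lower, upper = limits_of_agreement(rater_a, rater_b)
print(f"bias = {bias:.2f}, LOA = {lower:.2f} to {upper:.2f}")
```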
Case-control and cohort studies are of ____ design, and intend to study ___. Generally IV and DV are ___ variables
Exploratory design
Intended to study risk factors (assoc between disease and exposure)
Both IV and DV dichotomous
Continuous (interval/ratio) reliability coefficients
Pearson correlation (r)
Intraclass correlation coefficient (ICC) - better
Outliers effect on regression line
Outliers/deviant scores have large effect on regression line
Which correlation coefficient?
1 nominal dichotomy, 1 ordinal
Rank biserial
MANOVA: ___ DV, ___ groups
2 or more DV
3 or more groups
MANOVA combines multiple DVs into 1 “combo DV”
Cohen’s kappa coefficients used for _____
Categorical scale scores
Weighted kappa best for ____. Weights can be ___ and ____.
Can choose to make “penalty” ____ for ___
Best for ordinal data
Weights can be arbitrary, symmetric or asymmetric
Penalty worse for larger disagreements
Interpreting relative risk/odds ratios
RR < 1 suggests protective
RR > 1 suggests harmful (positive association)
RR = 1 null/ no association
If 95% CI includes 1 = not significant
If 95% CI excludes 1 = significant
Chi-square:
P-value > 0.05 association not significant
P-value < 0.05 association significant
Epidemiology generally uses ___ designs with ___ variables
Observational design
Dichotomous variables (disease or no disease/ exposed or unexposed)
Significance of coefficient: p-value and CI
Null hypothesis: the correlation between variable X and variable Y is not significantly different from zero. Ho: r=0
Very sensitive to sample size
Trivial coefficients (r=0.1 to 0.2) are often statistically significant if sample is large enough
2 related scores
Parametric and Nonparametric tests
Parametric: paired t-test
Nonparametric: Wilcoxon signed-ranks test (T)
Sign test
Percent agreement is simply ____. Calculate by…
How often raters agree
Divide number of agreements by total of all possible agreements
Correlation
Number of IV and DV
IV = 1 DV = 1
Rank biserial correlation (r rb)
1 variable dichotomous (nominal), other variable ordinal
Computationally about same as Spearman’s rank
Ex: gender vs MMT
Results same as Mann-Whitney U-test
ICC interpretation p-value tests whether
Point estimate is statistically different from 0
Stated in terms of variance, reliability =
True score variability
_____________________________
(True score variability + error variability)
ICC model 3
Each subject measured by same rater(s);
Raters are only ones of interest
Most common for intra-rater reliability
Can be for inter-rater reliability if study raters only ones of interest
Most common correlation coefficient
Pearson product-moment correlation coefficient (r)
ICC give ______ estimate of reliability (ie no units) and often reported in conjunction with
“Standardized”
SEM
Relative Risk
RR= incidence of disease in exposed individuals/ incidence of disease among unexposed individuals
Used in cohort studies
Quantify strength of association between exposure and disease
2x2 table
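A minimal sketch of relative risk from a 2x2 table, assuming hypothetical cohort counts. `relative_risk` is an illustrative name; it divides incidence in the exposed by incidence in the unexposed.

```python
def relative_risk(exposed_disease, exposed_healthy, unexposed_disease, unexposed_healthy):
    """RR = incidence among exposed / incidence among unexposed."""
    incidence_exposed = exposed_disease / (exposed_disease + exposed_healthy)
    incidence_unexposed = unexposed_disease / (unexposed_disease + unexposed_healthy)
    return incidence_exposed / incidence_unexposed

# Hypothetical cohort: 100 exposed (30 develop disease), 100 unexposed (10 develop disease)
rr = relative_risk(30, 70, 10, 90)
print(f"RR = {rr:.2f}")  # above 1 suggests the exposure is harmful
```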
The first number in ICC type is __ the second number is ___
Model
Form
ANOVA:
IV Level of measurement
DV level of measurement
Question
IV: nominal
DV: continuous
Q: difference between means?
Linear regression
Number of IV and DV
IV = 1 DV = 1
Correlation coefficient (R) for regression
Rough indicator of goodness of fit for regression line
Same as correlation coefficient (r)
Accuracy of prediction
Visual modeling of both direct and indirect relationships.
Can analyze both direct and indirect relationships between….
Path analysis
Can analyze both direct and indirect relationships between
1 or more exogenous variables (IV)
1 or more endogenous variables (DV)
ANOVA: ____ DV, ____ groups
1 DV, 3 or more groups
ICC forms
2nd number in parentheses represents number of observations used to obtain reliability estimate
SEM (std error of measurement) is _____ measure of reliability.
It is ______
Absolute
Standard deviation of the distribution of theoretical multiple measurements
It is derived mathematically from the ICC: SEM = SD x sqrt(1 - ICC)
Odds ratio
OR= odds of exposure among cases (w/ disease) / odds of exposure among controls (w/o disease)
Used in case-control studies
Quantify strength of association between exposure and disease
2x2 table
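A rough sketch of the odds ratio from a case-control 2x2 table, assuming hypothetical counts. The 95% CI uses the common log-based approximation (standard error of ln OR = sqrt(1/a + 1/b + 1/c + 1/d)); a CI that excludes 1 indicates a significant association.

```python
from math import exp, log, sqrt

def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    Returns (OR, lower 95% limit, upper 95% limit)."""
    or_value = (a / c) / (b / d)  # odds of exposure in cases / in controls
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(or_value) - 1.96 * se_log_or)
    upper = exp(log(or_value) + 1.96 * se_log_or)
    return or_value, lower, upper

# Hypothetical study: 100 cases (40 exposed), 100 controls (20 exposed)
or_value, lower, upper = odds_ratio(40, 20, 60, 80)
print(f"OR = {or_value:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```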
Regression:
IV Level of measurement
DV level of measurement
Question
IV: continuous
DV: continuous
Q: strength of prediction?
Observed score is
True score +/- error
ICC interpretation that is “best for clinical measurements”
ICC > 0.90
T-tests: number of IV and DV
IV = 1 DV = 1
Correlation coefficients:
Sign indicates ____.
(+) (-)
____ means higher coefficient
Direction
+ 1.00 = perfect line: graphed bottom L to top R
- 1.00 = perfect line: graphed top L to bottom R
Tighter grouping means higher coefficient
Reliability for categorical scales based on ______. Agreements are ___ and disagreements are ___.
Frequency table
Agreements on diagonal
Disagreements are all others
ICC model 2
Each subject measured by same raters; raters randomly chosen and representative of rater population
Results generalize
Most common for inter-rater reliability or test-retest reliability
Cohort studies. Subjects selected based on ____. Usually ___, but can be ____. Examine ___. Doesn’t work well for ___.
Subjects selected based on exposure or not.
Usually prospective, but can be retrospective
Examine if different incidence of disease
Doesn’t work well for rare conditions
Nonparametric tests are unable to be performed on…
Complex designs like 2x3
Stepwise procedures in multiple regression models
Criteria set to retain or reject predictors
Predictor with highest partial correlation entered first
Others added/removed in sequence depending on criteria
Should result in model with greatest parsimony and least multicollinearity
ICC (intraclass correlation coefficients) used for
Continuous scale scores
But can be used for ordinal data if intervals “assumed” to be equivalent (like a pain scale)
Pro and con of MANOVA
Pros:
Gets around multiplicity problem (increased type 1 error risk)
Can be more powerful if DVs related
Cons:
“Combo DV” is not directly interpretable
If statistically significant, must follow up with post-hoc ANOVAs
Regression coefficient (B)
Value/slope in linear equation
Rate of change in Y for each unit change of X
Accuracy of prediction
Ratio of std deviation to mean, expressed as a percentage
CV coefficient of variation
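A small sketch of the coefficient of variation, assuming invented height and weight scores. Because CV is unitless, it allows variability to be compared across measures on different scales.

```python
from statistics import mean, stdev

def coefficient_of_variation(scores):
    """CV = (standard deviation / mean) x 100, expressed as a percentage."""
    return stdev(scores) / mean(scores) * 100

# Hypothetical samples on different scales
height_cm = [160, 165, 170, 175, 180]
weight_kg = [55, 62, 70, 81, 95]

print(f"height CV = {coefficient_of_variation(height_cm):.1f}%")
print(f"weight CV = {coefficient_of_variation(weight_kg):.1f}%")
```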
Covariance means
As one changes, the other also changes
ICC Form 1
Only 1 observation per subject per rater (or rating)
Problem with percent agreement
Does not account for agreement due to chance
Tends to overestimate reliability
Multiple linear regression
Number of IV And DV
IV = more than 1 DV = 1
2 independent groups
Parametric and Nonparametric tests
Parametric: unpaired t-test
Nonparametric: Mann-Whitney U test
Correlation:
IV Level of measurement
DV level of measurement
Question
IV: continuous
DV: continuous
Q: strength of association?
Unpaired t-test
Parametric
2 independent groups
1 IV with 2 levels
Mann-Whitney U test
Dependent variable Ordinal
Cannot assume normal distribution
2 independent groups
1 IV with 2 levels
3 or more related scores
Parametric and Nonparametric tests
Parametric: 1-way repeated measures analysis of variance (F)
Nonparametric: Friedman 2-way analysis of variance by ranks (x^2r)
Kappa coefficient is proportion of agreement ____
Between raters after chance agreement has been removed
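A minimal sketch of percent agreement and unweighted Cohen's kappa from a frequency table, assuming hypothetical counts for two raters. Agreements sit on the diagonal; kappa = (observed agreement - chance agreement) / (1 - chance agreement).

```python
def cohen_kappa(table):
    """table[i][j] = subjects placed in category i by rater 1 and j by rater 2.
    Returns (proportion of observed agreement, kappa)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return p_observed, (p_observed - p_chance) / (1 - p_chance)

# Hypothetical ratings of 50 subjects into 2 categories
table = [
    [20, 5],
    [10, 15],
]
p_observed, kappa = cohen_kappa(table)
print(f"percent agreement = {p_observed:.0%}, kappa = {kappa:.2f}")
```

In this made-up table the raters agree 70% of the time, yet kappa is much lower once chance agreement is removed, which illustrates why percent agreement tends to overestimate reliability.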
Receiver operating characteristics (ROC) used to
Find cut off scores (dichotomous data)
__ types of ICC depending on
6 ICC types
Depends on:
Purpose of study
Design of study
Type of measurements
In odds ratio/case-control studies, subjects are selected based on ____, so can’t determine ___
Selected based on whether they have disease or not,
Can’t determine rate of incidence
Wilcoxon signed-ranks test
Dependent variable Ordinal
Cannot assume normal distribution
1 group
1 IV with 2 levels
____ designs are aimed at finding relationships
Exploratory designs
Ex: case-control, cohort, predictive, methodological validity, historical, secondary analysis
Logistic regression
Number of IV and DV
IV = more than 1 DV = 1
ICC interpretation showing “poor to moderate reliability”
ICC < 0.75
Case-Control studies. Subjects selected based on ___. Controls selected from ___. Examine if ____. Works especially well for __.
Subjects selected based on whether or not they have the disorder
Controls should be from same population as cases
Examine if exposure different between cases and controls
Works especially well for very rare conditions
Typically retrospective
Aimed at studying determinants of disease, injury, or dysfunction in populations (risk)
Epidemiology
Recommended that Cronbach’s alpha be between __
0.70 to 0.90
Correlation coefficients _____ and vary between ___ and ____.
Quantify linear relationships
0 and +/- 1.00
Multicollinearity data
Correlation table
Want correlations between each predictor and the DV to be high and significant,
And correlations among the predictors to be low and nonsignificant
Causation statements come from ____
Controlled experiments (RCTs)
T-test:
IV Level of measurement
DV level of measurement
Question
IV : nominal
DV: continuous
Q: difference between means?
Method of simplifying and organizing large sets of variables into fewer abstract components
Factor analysis
Pearson product-moment correlation coefficient applicable when variables are ___ or ___.
Interval or ratio (continuous)
Extent to which a measurement is free from error
Reliability
Linear regression, X is __ and ___ is __
X = IV = “predictor” variable
Y = DV = criterion variable
X and Y are correlated
Correlation does NOT
Assess differences or agreement
ICCs do
Nonparametric tests require a ___ sample size compared to parametric
Larger
A large number of predictors require ___, rule of thumb is __.
Too many predictors or too few subjects, becomes susceptible to ___
Very large sample size
10-15 people per predictor in model
Too many predictors or too few subjects - susceptible to model overfit (chance, type 1 error)
Cronbach’s alpha can help eliminate ____
Items from tests/questionnaires that are not homogeneous to the set or are not contributing unique info
Which correlation coefficient?
All ordinal
Spearman’s rho
Simple linear Regression model based on
Line that best fits data
Slope of line equation Y = a + bX
b is slope of line
a is y-intercept
Y= DV and X=IV
The slope (b) is the regression coefficient
3 or more independent groups
Parametric and Nonparametric tests
Parametric: 1-way analysis of variance (F)
Nonparametric: Kruskal–Wallis analysis of variance by ranks (H or x^2)