Week 9 - Multivariate Stats Flashcards
what is univariate data?
analysis of a single variable, e.g., descriptive measures of central tendency such as the mean
what is bivariate data
analysis of two variables to assess the empiric relationship between them (this may extend to comparing 3+ treatment groups on the levels of the IV, as in ANOVA designs, including the 2-way ANOVA)
what is multivariate data
analysis of three or more variables to assess the empiric relationship between them (this may include predictive modeling such as MLR, plus extensions of ANOVA such as ANCOVA, MANOVA, and MANCOVA)
what is multivariate analysis (MVA)
Statistical procedures for analyzing inter-relationships among three or more variables
what are the pros of multivariate analysis
allows you to identify, quantify, and disentangle complex relationships
what are the cons of multivariate analysis
time-consuming, computationally formidable, and sophisticated, and the results are not easily understood
what makes multivariate modeling computationally formidable?
- Not an easy task; usually requires that a data set conform to complex assumptions and requirements
- Extensive preprocessing
- Depending on what’s involved, may require costly statistical packages, protected time, and advanced skill sets that require previous training and direct experience
- Missing or minimizing any of the required assumptions or preprocessing steps, or failing to perform post-hoc comparison analyses and confirmation/validation, will result in incorrect models that produce false and unreliable results
- Complete understanding of the technique/procedure and of the statistical package involved is an absolute requirement; otherwise findings can be disastrously misinterpreted
how do you determine which multivariate analysis technique to use
- Depends on several factors
- Size, quality, structure and the nature of available data
- What questions are you trying to answer (what is the task)?
- Urgency of the task
- Nature of any design, data corrections, imputations, weighting required
- Availability of computational time, resources, and content experts
what is the basic approach to model building
- define the research problem, objective and multivariate techniques used
- interpret the model
what is simple linear regression
-Predicts one DV from one IV (predicts the values of one variable based on the values of a second variable)
-Estimates a straight-line fit to the data that minimizes deviations from the line (ordinary least squares minimizes the sum of squared deviations)
-R value (in simple regression, equal to the absolute value of r between X and Y)
-R² = how much variance in the DV is accounted for by the IV
Y′ = a + bX
Y′ = predicted value of variable Y (DV)
a = intercept constant
b = regression coefficient (slope of the line)
X = actual value of variable X (IV)
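A minimal sketch of the least-squares fit for Y′ = a + bX, using only the Python standard library. It assumes ordinary least squares, which minimizes the sum of squared deviations from the line: b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and a = Ȳ − bX̄. The height and weight values are hypothetical, for illustration only.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Least-squares estimates of a (intercept) and b (slope) for Y' = a + bX."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares of X, in deviation-score form
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx              # regression coefficient (slope of the line)
    a = y_bar - b * x_bar      # intercept constant
    return a, b

# Hypothetical height (in) and weight (lb) values
height = [60, 62, 65, 68, 70, 72]
weight = [115, 120, 140, 155, 165, 180]
a, b = fit_simple_regression(height, weight)
predicted = a + b * 66   # Y' (predicted weight) for a person 66 in tall
```

Because the line passes through (X̄, Ȳ), the deviations of the observed Y values from the line sum to zero.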
what is an example of simple linear regression
Can height (X) predict weight (Y)?
give a graphical solution for simple linear regression
- The regression line (plot) is different from r, the correlation coefficient
- r expresses how variation in 1 variable is associated with variation in another
- R² tells us the proportion of variance in Y (DV) that is accounted for by X
- The line is the regression solution equation for the X and Y values
- The stronger the r between X and Y:
- the better the prediction from X
- the greater the % variance (R²) explained in Y (DV)
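In simple regression (one IV), R² equals the square of r. A small sketch with hypothetical height/weight data that computes r directly and computes R² from the fitted line's residuals (1 − unexplained variation / total variation), so the two values can be compared:

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient r (strength and direction)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def r_squared(x, y):
    """Proportion of variance in Y accounted for by the least-squares line on X."""
    x_bar, y_bar = mean(x), mean(y)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
    a = y_bar - b * x_bar
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)                      # total variation in Y
    return 1 - ss_res / ss_tot

# Hypothetical data
height = [60, 62, 65, 68, 70, 72]
weight = [115, 120, 140, 155, 165, 180]
r = pearson_r(height, weight)
print(round(r, 3), round(r ** 2, 3), round(r_squared(height, weight), 3))
```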
what is multiple linear regression
- Predicts one DV from more than one IV
- The DV must be interval or ratio; IVs are usually interval or ratio (categorical IVs can be entered as dummy-coded variables)
- R value (the multiple correlation coefficient)
- R² = proportion of variance in DV accounted for by the simultaneous impact of all IVs
- A method of predicting a continuous dependent variable based on two or more independent (predictor) variables
what is the key difference between linear regression and multiple linear regression
linear regression has a single predictor whereas multiple linear regression has multiple predictors
what is an example of multiple linear regression
Can height (X1) and max jump height (X2) predict weight (Y)?
what is the multiple linear regression equation with 2 predictor variables
Y′ = a + b1X1 + b2X2
Y′ = predicted value of variable Y (dependent variable)
a = intercept constant
b1 = regression coefficient for variable X1
X1 = actual value of variable X1
b2 = regression coefficient for variable X2
X2 = actual value of variable X2
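A sketch of how a, b1, and b2 can be estimated by least squares for the two-predictor equation, solving the 2 × 2 normal equations in deviation-score form (standard library only). The height, jump-height, and weight values are hypothetical; statistical packages do the same estimation with general matrix routines.

```python
from statistics import mean

def fit_two_predictor_regression(x1, x2, y):
    """Least-squares a, b1, b2 for Y' = a + b1*X1 + b2*X2."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    # Sums of squares and cross-products of the deviation scores
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    det = s11 * s22 - s12 ** 2   # zero if X1 and X2 are perfectly collinear
    # Solve s11*b1 + s12*b2 = s1y and s12*b1 + s22*b2 = s2y (Cramer's rule)
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

def r_squared(y, y_pred):
    """Proportion of variance in Y accounted for by the predictions."""
    y_bar = mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical height (in), max jump height (in), and weight (lb)
height = [60, 62, 65, 68, 70, 72]
jump = [14, 18, 15, 20, 17, 22]
weight = [115, 120, 140, 155, 165, 180]
a, b1, b2 = fit_two_predictor_regression(height, jump, weight)
predicted = [a + b1 * h + b2 * j for h, j in zip(height, jump)]
r2 = r_squared(weight, predicted)
```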
multiple linear regression testing for overall model equation
F-statistic (but t’s can be used, depends on RQ)
H0 : β1 = β2 = β3 = 0
HA : At least one βj ≠ 0 (for j = 1, 2, 3)
tests of regression coefficients (b)
t-tests
H0 : β1 = 0
HA : β1 ≠ 0
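The overall F can be computed from R², the sample size n, and the number of predictors k: F = (R²/k) / ((1 − R²)/(n − k − 1)), with df = (k, n − k − 1). Each coefficient's t is b divided by its standard error, with df = n − k − 1. A sketch of just these two formulas; the R², n, b, and SE values are hypothetical, and p-values would still come from the F and t distributions (e.g., `scipy.stats.f.sf` and `scipy.stats.t.sf`, if SciPy is available).

```python
def overall_f(r2, n, k):
    """Overall model F for H0: beta_1 = ... = beta_k = 0, df = (k, n - k - 1)."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

def coefficient_t(b, se_b):
    """t for H0: beta_j = 0 (df = n - k - 1 in the full model)."""
    return b / se_b

# Hypothetical: 3 predictors, n = 60, R² = 0.35; one slope b = 1.8 with SE = 0.6
f_stat = overall_f(0.35, 60, 3)
t_stat = coefficient_t(1.8, 0.6)
```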
what is multiple correlation coefficient
With 2 or more IVs, the index is the multiple correlation coefficient R (0.0 to 1.0) instead of r (the correlation coefficient)
Shows strength, not direction
R² = proportion of variance in Y accounted for by the combined, simultaneous influence of all predictors
R cannot be less than the highest bivariate correlation (in absolute value) between the DV and any single IV
what are the types of multiple linear regression
- simultaneous
- hierarchical
- elimination (backward deletion)
- stepwise (forward)
define simultaneous as a MLR type
All predictors added to the equation all at once up front
Also called ‘direct entry’, ‘forced entry’
define hierarchical as a MLR type
Predictors are added in a series of steps based on prior theoretical knowledge
Also called ‘forward blocks’, ‘block-wise’
define elimination (backward deletion) in MLR type
Reverse of the forward stepwise; place all IV predictors in up front and remove them 1 at a time if they do not contribute to overall equation; evaluate fit at each step
define stepwise (forward) as type of MLR
Sequentially add predictors based on ordering the IVs according to their predictive power; evaluate fit at each step
Depending on the application: Forward Stepwise may be controversial because it places statistical thresholds before the theoretical importance of the predictors
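The forward-entry logic can be sketched in plain Python. Assumptions: the entry rule is a minimum R² gain (`min_gain`, a hypothetical threshold of 0.01), whereas statistical packages typically use F-to-enter or p-value criteria; the data and IV names are made up. Backward deletion would run the same idea in reverse, starting with all IVs and removing the weakest one per step.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def r_squared(y, columns):
    """R² of an OLS model with an intercept and the given predictor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    coef = solve(xtx, xty)   # normal equations (X'X) b = X'y
    y_bar = sum(y) / n
    ss_res = sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2 for row, yi in zip(X, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def forward_stepwise(y, candidates, min_gain=0.01):
    """Add the IV that raises R² most; stop when the best gain < min_gain."""
    selected, current_r2 = [], 0.0
    while len(selected) < len(candidates):
        trials = {
            name: r_squared(y, [candidates[s] for s in selected] + [col])
            for name, col in candidates.items() if name not in selected
        }
        best = max(trials, key=trials.get)
        if trials[best] - current_r2 < min_gain:
            break   # no remaining IV adds enough predictive power
        selected.append(best)
        current_r2 = trials[best]
    return selected, current_r2

# Hypothetical data: y depends on x1 only; x2 and x3 are unrelated
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
x3 = [2, 7, 1, 8, 2, 8, 1, 8]
noise = [0.1, -0.2, 0.15, -0.05, 0.0, 0.1, -0.1, 0.05]
y = [3 + 2 * v + e for v, e in zip(x1, noise)]
chosen, final_r2 = forward_stepwise(y, {"x1": x1, "x2": x2, "x3": x3})
```

Note that the order of entry is driven purely by the data, which is the reason forward stepwise can conflict with theory-driven (hierarchical) entry.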
how do you interpret MLR results
- Difficult to extract meaning from raw scores (constant a, regression coefficient b) or from scores with different units of measurement
- Address this problem by using standard scores (i.e. z scores) and standardized coefficients (β weights)
- Now everything is standardized (all in the same metric), and you can interpret the relative contribution of each predictor to the model equation
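For a two-predictor model, the β weights can be computed from the three bivariate correlations: β1 = (rY1 − rY2·r12) / (1 − r12²) and β2 = (rY2 − rY1·r12) / (1 − r12²). Equivalently, each β = b × (SD of X / SD of Y). A sketch with hypothetical data showing both routes; the function names are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    return sxy / sqrt(sum((u - mx) ** 2 for u in x) * sum((v - my) ** 2 for v in y))

def beta_weights(x1, x2, y):
    """Standardized (beta) weights for a two-predictor model, from correlations."""
    r_y1, r_y2, r_12 = pearson_r(x1, y), pearson_r(x2, y), pearson_r(x1, x2)
    denom = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom

def beta_from_b(b, x, y):
    """Convert a raw regression coefficient b into a beta weight."""
    return b * stdev(x) / stdev(y)

# Hypothetical height, jump height, and weight
height = [60, 62, 65, 68, 70, 72]
jump = [14, 18, 15, 20, 17, 22]
weight = [115, 120, 140, 155, 165, 180]
beta_height, beta_jump = beta_weights(height, jump, weight)
```

Because both betas are in SD units, their magnitudes can be compared directly, which raw b values in inches and pounds do not allow.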
what results from an inadequate sample size in MLR
- type II error (insufficient power)
- erratic (unstable) β weights
You are an older adult provider and wish to predict the short-term memory level in patients with dementia by using a short-term memory tool or instrument. Your model includes age, number of medications, thyroid levels, vitamin D levels, number of hours of sleep and average minutes of physical activity per day. Which statistical test would be most appropriate to utilize in this scenario?
Options: ANCOVA; Logistic regression; Multiple regression; MANOVA
Multiple regression
what is logistic regression
Predicts the likelihood of something happening: figuring out which group (DV category) a person will end up in based on a multitude of variables of any type, i.e., how likely it is that an event will happen given the different variables
what is an example of a binary (binomial) LR for a DV with 2 categories
Weather: Sunny or Rainy
what is an example of multinomial LR for a DV with 3+ categories
Example: Robust, Prefrailty, Frailty
what is odds ratio in logistic regression
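The odds of doing something equal the probability of doing it divided by the probability of not doing it; the odds ratio (OR) compares the odds in one group with the odds in another (OR = odds in group 1 / odds in group 2). In LR, e raised to a predictor's coefficient (e^b) is the OR for a one-unit increase in that predictor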
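A small sketch of the odds and odds-ratio arithmetic. The 40% vs 20% outcome rates are hypothetical, chosen only to show that an OR (here 0.667 / 0.25 ≈ 2.67) is not the same as the ratio of probabilities (0.40 / 0.20 = 2).

```python
from math import exp

def odds(p):
    """Odds = P(event) / P(no event)."""
    return p / (1 - p)

def odds_ratio(p_group1, p_group2):
    """OR comparing the odds of the event in group 1 with the odds in group 2."""
    return odds(p_group1) / odds(p_group2)

def or_from_coefficient(b):
    """In logistic regression, e^b is the OR for a one-unit increase in that IV."""
    return exp(b)

# Hypothetical: outcome occurs in 40% of smokers and 20% of non-smokers
smoking_or = odds_ratio(0.40, 0.20)
```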
explain DV and IV in logistic regression
DV must be nominal level - usually dichotomous
IVs - can be categorical, interval, or ratio (continuous), and can include interaction terms; various levels of measurement
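A sketch of how a binary logistic regression can be fit by maximum likelihood with Newton-Raphson, using one IV for brevity. Assumptions: the DV is coded 1/0, the "risk score" IV and all values are hypothetical, and real packages handle multiple IVs, convergence checks, and standard errors that are omitted here.

```python
from math import exp

def sigmoid(z):
    """Logistic function: converts log-odds into a probability between 0 and 1."""
    return 1 / (1 + exp(-z))

def fit_logistic(x, y, iterations=25):
    """Maximum-likelihood fit of log-odds(Y = 1) = a + b*x by Newton-Raphson.
    y must be coded 0/1 (nominal, dichotomous DV)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(a + b * xi) for xi in x]
        # Gradient of the log-likelihood with respect to a and b
        g_a = sum(yi - pi for yi, pi in zip(y, p))
        g_b = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix [[h_aa, h_ab], [h_ab, h_bb]] (negative Hessian)
        w = [pi * (1 - pi) for pi in p]
        h_aa = sum(w)
        h_ab = sum(wi * xi for wi, xi in zip(w, x))
        h_bb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h_aa * h_bb - h_ab ** 2
        # Newton step: add (information matrix)^-1 times the gradient
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Hypothetical IV (risk score 1-10) and DV (1 = developed Type 2 DM, 0 = did not)
risk_score = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
developed_dm = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
intercept, slope = fit_logistic(risk_score, developed_dm)
prob_at_6 = sigmoid(intercept + slope * 6)   # predicted probability for a score of 6
odds_ratio_per_point = exp(slope)            # OR for a one-point increase in the score
```

The output is a probability of group membership rather than a predicted score, which is why LR suits a nominal DV.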
what are examples of multinomial LR for DV
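DV = frailty (3 levels: Robust, Prefrailty, Frailty)
Multinomial
Predicting prefrailty (PF) and frailty (F), e.g., each relative to Robust as the reference category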
what are examples of multinomial LR for IV
IVs = multiple: gender, age category, SES status, smoking status, ETOH use, BMI, comorbidities
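In the common baseline-category form of multinomial LR, the model gives one log-odds per non-reference category (e.g., Prefrailty vs Robust, Frailty vs Robust), and these convert to probabilities that sum to 1. A sketch assuming Robust is the reference category; the log-odds values stand in for hypothetical coefficients multiplied by one patient's IV values.

```python
from math import exp

def category_probabilities(logits, reference="Robust"):
    """Baseline-category (multinomial) logits -> probabilities.
    logits maps each non-reference category to log(P(category) / P(reference))."""
    denom = 1 + sum(exp(z) for z in logits.values())
    probs = {reference: 1 / denom}
    for category, z in logits.items():
        probs[category] = exp(z) / denom
    return probs

# Hypothetical log-odds for one patient
patient = category_probabilities({"Prefrailty": 0.4, "Frailty": -0.7})
```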
A researcher wishes to predict whether adolescents with pre-diabetes will become type 2 diabetics. The variables utilized to predict type 2 diabetes mellitus (Type 2 DM) are current weight, fasting blood glucose, grams of carbohydrates consumed per day, family member with Type 2 DM and minutes of exercise per day. Which statistical test would be the most appropriate to use?
Options: ANCOVA; Logistic regression; Multiple linear regression; MANOVA
Logistic regression
what is ANCOVA
- Analysis of Covariance (CoV)
- Extension of ANOVA that removes the effect of extraneous variables (CoVs) before testing whether mean group differences are statistically significant
- 2 or more groups; DV (and CoVs) at the interval or ratio level
- Tells you how an IV acts on the DV while removing effects of CoVs
- F-statistic test w/adjusted group means (omnibus)
- Significant group F-test means researchers can reject the H0 that adjusted group means are equal
- Post-hoc tests are required to determine where differences are located
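The "adjusted group means" that the ANCOVA F-test compares can be sketched as: adjusted mean = group mean of Y − bw × (group mean of CoV − grand mean of CoV), where bw is the pooled within-group slope of Y on the CoV. The groups, ages (CoV), and SBP values below are hypothetical, and the F-test itself is not shown.

```python
from statistics import mean

def adjusted_means(groups):
    """groups: dict of name -> (covariate values, DV values).
    Returns covariate-adjusted group means of the DV."""
    grand_x = mean([xi for xs, _ in groups.values() for xi in xs])
    sxy = sxx = 0.0
    for xs, ys in groups.values():
        mx, my = mean(xs), mean(ys)
        # Within-group cross-products and sums of squares, pooled across groups
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
        sxx += sum((xi - mx) ** 2 for xi in xs)
    b_within = sxy / sxx
    return {name: mean(ys) - b_within * (mean(xs) - grand_x)
            for name, (xs, ys) in groups.items()}

# Hypothetical: age (CoV) and SBP (DV) for two of the groups
sbp_groups = {
    "Low Dose BID": ([52, 60, 67, 71], [142, 148, 150, 156]),
    "Placebo": ([45, 55, 63, 70], [150, 155, 161, 166]),
}
adjusted = adjusted_means(sbp_groups)
```

A group whose members are older than average gets its mean adjusted downward (when the slope is positive), so groups are compared as if they had the same average CoV.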
ANCOVA Example: Does a new drug work for HTN?
RQ: Controlling for X (CoV), what is the effect of 3 treatment groups (vs. a placebo control) on SBP (DV)? Groups: Low Dose BID; Moderate Dose BID; High Dose QD; Control group (Placebo)
Performing an ANOVA on DV SBP may tell you if the treatment works
ANCOVA would control for additional factors that may influence the outcome such as: family life, job status, drug use
what is MANOVA
Tests for significant differences in two or more groups on 2 or more (interval or ratio level) DV outcomes simultaneously (e.g., SBP and DBP)
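One common MANOVA test statistic is Wilks' lambda, Λ = det(W) / det(T), where W is the within-groups and T the total sums-of-squares-and-cross-products (SSCP) matrix for the DVs. Values near 0 suggest the groups differ on the combined DVs; values near 1 suggest they do not. (Pillai's trace, Hotelling's trace, and Roy's largest root are alternatives.) A sketch for 2 DVs with hypothetical tumor-size and PSA values; converting Λ to an F and p-value is omitted.

```python
from statistics import mean

def sscp(rows, center):
    """2x2 sums of squares and cross-products around a mean vector."""
    s11 = sum((r[0] - center[0]) ** 2 for r in rows)
    s22 = sum((r[1] - center[1]) ** 2 for r in rows)
    s12 = sum((r[0] - center[0]) * (r[1] - center[1]) for r in rows)
    return s11, s12, s22

def det2(s):
    s11, s12, s22 = s
    return s11 * s22 - s12 ** 2

def wilks_lambda(groups):
    """groups: list of groups, each a list of (dv1, dv2) observations."""
    all_rows = [r for g in groups for r in g]
    grand = (mean(r[0] for r in all_rows), mean(r[1] for r in all_rows))
    total = sscp(all_rows, grand)
    within = [0.0, 0.0, 0.0]
    for g in groups:
        center = (mean(r[0] for r in g), mean(r[1] for r in g))
        for i, v in enumerate(sscp(g, center)):
            within[i] += v
    return det2(within) / det2(total)

# Hypothetical (tumor size cm, PSA ng/mL) per patient
radiation = [(3.1, 8.2), (2.8, 7.5), (3.5, 9.0), (3.0, 8.8)]
radiation_hormone = [(2.2, 4.1), (2.5, 5.0), (2.0, 3.6), (2.6, 4.4)]
lam = wilks_lambda([radiation, radiation_hormone])
```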
what is MANCOVA
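Extension of MANOVA that adds 1+ CoVs: tests group differences on 2+ DVs simultaneously while controlling for the CoVs; more power than separate 1-way ANCOVAs; can be 1-way or 2-way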
what is a 1-way MANCOVA
1 IV w/2+ levels (independent groups), 2+ DVs (interval, ratio); 1+ CoVs (interval, ratio)
what is a 2-way MANCOVA
2 IVs, each w/2+ levels (independent groups); 2+ DVs (interval, ratio); 1+ CoVs (interval, ratio)
MANOVA Example: Prostate Cancer
RQ: What is the effect of radiation only compared to radiation plus hormone therapy treatment on tumor size and blood PSA concentration?
The H0 states that the mean tumor size is equal across all treatment groups and that the mean blood PSA concentration is also equal across the treatment groups.
You are conducting a study comparing mean depression scores among widowed men with varying degrees of social support and would like to control for age. What statistical test would be the most appropriate to use?
Options: ANOVA; ANCOVA; Logistic regression; Simple linear regression
ANCOVA
You notice that many of your older obstetrical patients are developing multiple cardiac risk factors after the post-partum period. You wish to test the impact of a new behavioral intervention, compared to standard caloric restriction, on BMI, resting heart rate, blood pressure, and serum cholesterol levels. Which statistical test would be the most appropriate to use?
Options: ANCOVA; Logistic regression; Multiple regression; MANOVA
MANOVA
Capillary blood glucose levels were studied in 4 groups of patients with diabetes who had varying social support systems. Fasting glucose levels were obtained for each group member using a calibrated glucose meter. Subsequently, the mean glucose levels for the groups were compared.
Options: Student’s (Independent) t test; Paired (Dependent) t test; Kruskal-Wallis; Pearson’s r; ANOVA
ANOVA
RATIONALE: The dependent variable is glucose level, which is ratio-level data, and 4 independent groups are being compared, each measured one time only.
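The one-way ANOVA F compares variation between the group means with variation within the groups: F = MSbetween / MSwithin, with df = (k − 1, N − k) for k groups and N observations. A sketch with hypothetical fasting glucose values (mg/dL) for the 4 groups; the p-value step is omitted.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, (df_between, df_within)) for a list of independent groups."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

# Hypothetical fasting glucose (mg/dL) in 4 social-support groups
glucose = [[110, 125, 118, 130], [102, 98, 115, 108],
           [140, 132, 128, 150], [120, 117, 125, 111]]
f_value, df = one_way_anova_f(glucose)
```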
You notice that many of your older obstetrical patients are developing multiple cardiac risk factors after the post-partum period. You wish to test the impact of a new behavioral intervention, compared to standard caloric restriction, on BMI, resting heart rate, blood pressure, and serum cholesterol levels. Which statistical test would be the most appropriate to use?
Options: ANCOVA; Logistic Regression; Multiple Linear Regression; MANOVA
MANOVA
RATIONALE:
2 or more groups on the IV: new behavioral intervention vs. standard caloric restriction;
Multiple DV outcomes simultaneously: BMI, resting heart rate, blood pressure, and serum cholesterol levels.
You are conducting a study comparing mean depression scores among community dwelling older adults with varying degrees of social support and would like to control for age. What statistical test would be the most appropriate to use?
Options: ANOVA; ANCOVA; Logistic Regression; Simple Linear Regression
ANCOVA
RATIONALE:
Mean depression scores (interval) DV; and varying groups of social support (categorical) IV;
Controlling for age (CoV), (ratio).
A researcher hypothesized that there would be no difference in respiratory rates for patients who self-administered respiratory treatments compared to respiratory treatments administered by a health care provider. Which would be the most appropriate statistical test?
Options: Paired (dependent) t test; Pearson’s r; Mann-Whitney U; Independent (Student’s) t test; None of the above
Independent (Student’s) t test
RATIONALE:
DV is respiratory rates, which is ratio level data; only 2 groups are being compared at 1 time point.
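A sketch of the independent (Student's) t statistic with a pooled variance, which assumes roughly equal variances in the two groups (Welch's t test is the usual alternative when they differ). The respiratory rates (breaths/min) are hypothetical; the p-value step is omitted.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group1, group2):
    """Pooled-variance Student's t for two independent groups; returns (t, df)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))   # standard error of the mean difference
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2

# Hypothetical respiratory rates (breaths/min)
self_administered = [18, 20, 17, 19, 21, 18]
provider_administered = [16, 18, 17, 15, 19, 16]
t_value, df = independent_t(self_administered, provider_administered)
```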
You are an older adult HCP and wish to predict the short-term memory level in patients with dementia by using a short-term memory tool or instrument. Your model includes age, number of medications, thyroid levels, vitamin D levels, number of hours of sleep and average minutes of physical activity per day. Which statistical test would be most appropriate to utilize in this scenario?
Options: ANCOVA; Logistic Regression; Multiple Regression; MANCOVA; None of the above
Multiple Regression
RATIONALE:
Note the term “predict” (also can be association, predictive association);
Predict the DV of short-term memory level (interval data) by using multiple IVs that are interval or ratio level.
A researcher wishes to predict whether adolescents with pre-diabetes will become type 2 diabetics. The variables utilized to predict type 2 diabetes mellitus (Type 2 DM) are current weight, fasting blood glucose, grams of carbohydrates consumed per day, family member with Type 2 DM and minutes of exercise per day. Which statistical test would be the most appropriate to use?
Options: ANCOVA; Logistic Regression; Multiple Linear Regression; MANOVA
Logistic Regression
RATIONALE:
Goal is for the prediction of a nominal level DV, Type 2 DM presence.
Multiple IVs of varying levels of measurement.
An investigation was undertaken to determine if there was a relationship between the quantity of coronary artery blockage (mm) and fasting serum cholesterol (mg/dL) levels in obese Caucasian males undergoing cardiac catheterization during January of 2018. The results indicate a strong positive correlation, r = .88, p = .001. What statistical test would you suspect was used to analyze these data?
a.) Spearman rho
b.) Independent (Student’s) t-test
c.) Mann-Whitney U test
d.) Pearson’s r
e.) RM-ANOVA
d.) Pearson’s r
RATIONALE:
To ‘determine’ the relationship between the variables: cholesterol level (ratio level data) and quantity of coronary artery blockage (ratio level data)