Week 9 - Multivariate Stats Flashcards

1
Q

what is univariate data?

A

analysis of a single variable at a time, e.g. descriptive measures of central tendency such as the mean

2
Q

what is bivariate data

A

analysis of two variables to assess the empirical relationship between them (this may extend to comparing 3+ treatment groups, or to designs such as the 2-way ANOVA)

3
Q

what is multivariate data

A

analysis of three or more variables to assess the empirical relationships among them (this may include predictive modeling such as MLR, plus ANOVA extensions such as ANCOVA, MANOVA, and MANCOVA)

4
Q

what is multivariate analysis (MVA)

A

Statistical procedures for analyzing inter-relationships among three or more variables

5
Q

what are the pros of multivariate analysis

A

allows you to identify, quantify, and disentangle complex relationships

6
Q

what are the cons of multivariate analysis

A

time-consuming, computationally formidable, and sophisticated, and therefore not easily understood

7
Q

what makes multivariate modeling computationally formidable?

A
  • Not an easy task: usually requires that a data set conform to complex assumptions and requirements
  • Extensive preprocessing
  • Depending on what’s involved, requires costly statistical packages, protected time, and advanced skill sets that require previous training and direct experience
  • Missing or minimizing any of the required assumptions or preprocessing steps, or failing to perform post-hoc comparison analyses and confirmation/validation, will result in incorrect models that produce false and unreliable results
  • Complete understanding of the technique/procedure and of the statistical package involved is an absolute requirement; otherwise findings can be disastrously misinterpreted
8
Q

how do you determine which multivariate analysis technique to use

A
  • Depends on several factors:
  • Size, quality, structure, and nature of the available data
  • What questions are you trying to answer (what is the task)?
  • Urgency of the task
  • Nature of any design adjustments, data corrections, imputation, or weighting required
  • Availability of computational time, resources, and content experts
9
Q

what is the basic approach to model building

A
  • Define the research problem, the objectives, and the multivariate technique(s) to be used
  • interpret the model
10
Q

what is simple linear regression

A

-Predicts the value of one variable (DV) from the value of a second variable (IV)
-Estimates a straight-line fit to the data that minimizes the (squared) deviations from the line
-R value: strength of the relationship between X and Y
-R² = proportion of variance in the DV accounted for by the IV
Y′ = a + bX
Y′ = predicted value of variable Y (DV)
a = intercept constant
b = regression coefficient (slope of the line)
X = actual value of variable X (IV)
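A minimal Python sketch of the least-squares fit Y′ = a + bX, assuming plain lists of paired X and Y values. The function name and the height/weight numbers are hypothetical, for illustration only. It computes b from the deviation sums, then a from the means, then R² as 1 minus the residual sum of squares over the total sum of squares.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of Y' = a + bX; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of X and cross-products of X and Y deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # regression coefficient (slope)
    a = mean_y - b * mean_x    # intercept constant
    # R^2 = 1 - SS_residual / SS_total
    predicted = [a + b * xi for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot

# Hypothetical height (in) and weight (lb) values, for illustration only
height = [60, 62, 65, 68, 70, 72]
weight = [115, 120, 135, 150, 160, 172]
a, b, r2 = simple_linear_regression(height, weight)
print(f"Y' = {a:.2f} + {b:.2f}X, R^2 = {r2:.3f}")
```

With a single predictor, the R² computed this way equals the square of Pearson's r between X and Y.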

11
Q

what is an example of simple linear regression

A

Can height (X) predict weight (Y)?

12
Q

give a graphical solution for simple linear regression

A
  • The regression line (plot) is different from r, the correlation coefficient
  • r expresses how variation in 1 variable is associated with variation in another
  • R² tells us the proportion of variance in Y (DV) that is accounted for by X
  • The line is the regression solution equation for the X and Y values
  • The stronger the r between X and Y, the better the prediction from X and the greater the % variance (R²) explained in Y (DV)
13
Q

what is multiple linear regression

A
  • Predicts one DV from more than one IV
  • The DV must be interval or ratio; IVs are typically interval or ratio (categorical IVs can be entered as dummy-coded variables)
  • R value (the multiple correlation coefficient)
  • R² = proportion of variance in the DV accounted for by the simultaneous impact of all IVs
  • A method of predicting a continuous dependent variable based on two or more independent (predictor) variables
14
Q

what is the key difference between linear regression and multiple linear regression

A

linear regression has a single predictor whereas multiple linear regression has multiple predictors

15
Q

what is an example of multiple linear regression

A

Can height (X1) and max jump height (X2) predict weight (Y)?

16
Q

what is the multiple linear regression equation with 2 predictor variables

A
Y′ = a + b1X1 + b2X2
Where Y′ = predicted value of variable Y (dependent variable)
a = intercept constant
b1 = regression coefficient for variable X1
X1 = actual value of variable X1
b2 = regression coefficient for variable X2
X2 = actual value of variable X2
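A minimal Python sketch of how a, b1, and b2 can be estimated by least squares for the two-predictor equation, by solving the two normal equations in deviation form. The function name and the height, jump-height, and weight values are hypothetical, for illustration only; a statistics package would normally do this fit and also report standard errors.

```python
def mlr_two_predictors(x1, x2, y):
    """Least-squares estimates for Y' = a + b1*X1 + b2*X2; returns (a, b1, b2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Deviation sums of squares and cross-products
    s11 = sum((v - m1) ** 2 for v in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (v - my) for u, v in zip(x1, y))
    s2y = sum((u - m2) * (v - my) for u, v in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("X1 and X2 are perfectly collinear")
    # Solve s11*b1 + s12*b2 = s1y and s12*b1 + s22*b2 = s2y (Cramer's rule)
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

# Hypothetical data: height (X1), max jump height (X2), weight (Y)
height = [60, 62, 65, 68, 70, 72]
jump = [15, 18, 16, 20, 22, 21]
weight = [115, 120, 135, 150, 160, 172]
a, b1, b2 = mlr_two_predictors(height, jump, weight)
print(f"Y' = {a:.2f} + {b1:.2f}X1 + {b2:.2f}X2")
```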
17
Q

multiple linear regression testing for overall model equation

A

F-statistic (t-tests can also be used, depending on the RQ)
H0: β1 = β2 = β3 = 0 (example with 3 predictors)
HA: At least one βj ≠ 0 (for j = 1, 2, 3)
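When the model's R², the sample size n, and the number of predictors k are known, the overall F can be computed as F = (R²/k) / ((1 − R²)/(n − k − 1)), with k and n − k − 1 degrees of freedom. The sketch below is illustrative and uses hypothetical numbers; the p-value itself would come from an F table or a statistics package, which this sketch does not call.

```python
def overall_f_test(r_squared, n, k):
    """F-statistic for H0: all betas = 0, computed from the model R^2."""
    df_model = k            # numerator df = number of predictors
    df_resid = n - k - 1    # denominator df
    f = (r_squared / df_model) / ((1 - r_squared) / df_resid)
    return f, df_model, df_resid

# Hypothetical example: 3 predictors, 40 participants, R^2 = .30
f, df1, df2 = overall_f_test(0.30, n=40, k=3)
print(f"F({df1}, {df2}) = {f:.2f}")
```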

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

tests of regression coefficients (b)

A

t-tests
H0 : β1 = 0
HA : β1 ≠ 0
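For a single predictor, the t statistic for the coefficient is the estimate divided by its standard error, t = b / SE(b), with n − 2 degrees of freedom. A minimal sketch assuming one predictor only (with several predictors the standard errors come from the full model, which this sketch does not compute); the data values are hypothetical.

```python
import math

def slope_t_test(x, y):
    """t-test of H0: beta1 = 0 for a one-predictor regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual (error) sum of squares around the fitted line
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    return b, se_b, b / se_b, n - 2         # df = n - 2

# Hypothetical data
b, se_b, t, df = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"b = {b:.2f}, SE = {se_b:.3f}, t({df}) = {t:.2f}")
```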

19
Q

what is multiple correlation coefficient

A

With 2 or more IVs, instead of r (the correlation coefficient) the index is the multiple correlation coefficient R (ranges from .0 to 1.0)
R shows strength, not direction
R² = proportion of variance in Y accounted for by the combined, simultaneous influence of all predictors
R cannot be less than the highest bivariate correlation (absolute value) between the DV and any single IV
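For the two-predictor case, R² can be computed directly from the three bivariate correlations with the standard formula R² = (r_y1² + r_y2² − 2·r_y1·r_y2·r_12) / (1 − r_12²). The sketch below uses hypothetical correlation values and checks the property on this card, that R is never smaller than the largest single |r| between the DV and an IV.

```python
import math

def multiple_r(r_y1, r_y2, r_12):
    """Multiple R and R^2 for two predictors, from bivariate correlations."""
    if abs(r_12) >= 1:
        raise ValueError("predictors are perfectly correlated")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared), r_squared

# Hypothetical correlations: DV with X1, DV with X2, and X1 with X2
R, R2 = multiple_r(0.60, 0.50, 0.30)
print(f"R = {R:.3f}, R^2 = {R2:.3f}")
assert R >= max(abs(0.60), abs(0.50))  # R is not below the largest bivariate |r|
```

When the predictors are uncorrelated (r_12 = 0), R² reduces to the sum of the squared bivariate correlations.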

20
Q

what are the types of multiple linear regression

A
  • simultaneous
  • hierarchical
  • elimination (backward deletion)
  • stepwise (forward)
21
Q

define simultaneous as a MLR type

A

All predictors are entered into the equation at once, up front

Also called ‘direct entry’, ‘forced entry’

22
Q

define hierarchical as a MLR type

A

Predictors are added in a series of steps based on prior theoretical knowledge
Also called ‘forward blocks’, ‘block-wise’

23
Q

define elimination (backward deletion) in MLR type

A

Reverse of forward stepwise: all IV predictors are entered up front and then removed 1 at a time if they do not contribute to the overall equation; evaluate fit at each step

24
Q

define stepwise (forward) as type of MLR

A

Sequentially add predictors based on ordering the IVs according to their predictive power; evaluate fit at each step
Depending on the application, forward stepwise may be controversial because it places statistical thresholds ahead of the theoretical importance of the predictors

25
Q

how do you interpret MLR results

A
  • Difficult to extract meaning from raw-score coefficients (constant a, regression coefficient b) or from variables with different units of measurement
  • Address this problem by using standard scores (i.e. z scores) and standardized coefficients (β weights)
  • Now everything is standardized (all in the same metric) and you can interpret the relative contributions of the predictors to the model equation
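A minimal sketch of the standardization step, assuming sample standard deviations: z scores rescale each variable to mean 0 and SD 1, and a raw coefficient b converts to a β weight as β = b × SD(X) / SD(Y). The function names are hypothetical, and the raw coefficient in the usage lines is a made-up value, not a fitted result.

```python
import statistics

def z_scores(values):
    """Convert raw scores to z scores (mean 0, SD 1) using the sample SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

def beta_weight(b, x, y):
    """Standardized coefficient: raw slope b rescaled by SD(X) / SD(Y)."""
    return b * statistics.stdev(x) / statistics.stdev(y)

# Hypothetical height (in) and weight (lb) values
height = [60, 62, 65, 68, 70, 72]
weight = [115, 120, 135, 150, 160, 172]
b_height = 4.8   # hypothetical raw coefficient (lb per inch)
print(f"beta = {beta_weight(b_height, height, weight):.3f}")
print([round(z, 2) for z in z_scores(height)])
```

Because each β is expressed in SD units, β weights from predictors measured in inches, minutes, or mg/dL can be compared directly.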
26
Q

what results from an inadequate sample size in MLR

A
  • Type II error
  • Erratic (unstable) β weights

27
Q
You are an older adult provider and wish to predict the short-term memory level in patients with dementia by using a short-term memory tool or instrument. Your model includes age, number of medications, thyroid levels, vitamin D levels, number of hours of sleep and average minutes of physical activity per day.  Which statistical test would be most appropriate to utilize in this scenario?
ANCOVA
Logistic regression
Multiple regression
MANOVA
A

Multiple regression

28
Q

what is logistic regression

A

Predicts the likelihood of something happening: trying to figure out which group a person will end up in based on a multitude of variables of any type; how likely is it that an event will happen, based on the different variables?
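In logistic regression the model is linear on the log-odds (logit) scale, and the predicted probability comes from the logistic function p = 1 / (1 + e^(−(a + b1X1 + ... + bkXk))). The sketch below assumes the coefficients have already been estimated; the coefficient values and the 0.5 cut-off used to assign a group are hypothetical choices for illustration, not fitted results.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Logistic function applied to the linear predictor a + sum(b_i * x_i)."""
    logit = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1 / (1 + math.exp(-logit))

# Hypothetical coefficients for two predictors (not estimated from real data)
a = -4.0
b = [0.05, 0.8]
p = predicted_probability(a, b, [40, 2])
group = "event" if p >= 0.5 else "no event"   # hypothetical 0.5 cut-off
print(f"p = {p:.3f} -> predicted group: {group}")
```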

29
Q

what is an example of a binary (binomial) LR for DVs with 2 categories

A

Weather: Sunny or Rainy

30
Q

what is an example of multinomial LR for DVs with 3+ categories

A

Example: Robust, Prefrailty, Frailty

31
Q

what is odds ratio in logistic regression

A

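The odds of doing something equal the probability of doing it divided by the probability of not doing it; the odds ratio compares the odds in one group with the odds in another (odds in group 1 divided by odds in group 2)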
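A minimal sketch of odds and the odds ratio, assuming counts from a simple 2x2 table. The counts and the coefficient value are hypothetical. The last lines illustrate that, in logistic regression, exponentiating a coefficient b gives the odds ratio for a one-unit increase in that predictor.

```python
import math

def odds(p):
    """Odds = P(event) / P(no event)."""
    return p / (1 - p)

def odds_ratio(events_1, non_events_1, events_2, non_events_2):
    """Odds in group 1 divided by odds in group 2, from 2x2 table counts."""
    return (events_1 / non_events_1) / (events_2 / non_events_2)

# Hypothetical counts: 20 of 100 in group 1 vs 10 of 100 in group 2 have the outcome
print(odds(0.20))                   # odds for a probability of .20
print(odds_ratio(20, 80, 10, 90))   # OR comparing the two groups

# Hypothetical logistic regression coefficient b = 0.81
print(math.exp(0.81))               # OR per one-unit increase in the predictor
```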

32
Q

explain DV and IV in logistic regression

A

DV must be nominal level, usually dichotomous

IV: can be categorical or continuous (interval, ratio), and can include interaction terms

33
Q

what are examples of multinomial LR for DV

A

DV = frailty (3 levels: robust, prefrailty, frailty)
Multinomial
Predicting PF (prefrailty) and F (frailty)

34
Q

what are examples of multinomial LR for IV

A
IV = multiple
Gender
Age category
SES status
Smoking status
ETOH use
BMI 
Comorbidities
35
Q
A researcher wishes to predict whether adolescents with pre-diabetes will become type 2 diabetics.  The variables utilized to predict type 2 diabetes mellitus (Type 2 DM) are current weight, fasting blood glucose, grams of carbohydrates consumed per day, family member with Type 2 DM and minutes of exercise per day.  Which statistical test would be the most appropriate to use?
ANCOVA
Logistic regression
Multiple linear regression
MANOVA
A

Logistic regression

36
Q

what is ANCOVA

A
  • Analysis of Covariance (CoV = covariate)
  • Extension of ANOVA that removes the effect of extraneous variables (CoVs) before testing whether mean group differences are statistically significant
  • 2 or more groups; DV and CoVs at the interval or ratio level
  • Tells you how an IV acts on the DV while removing the effects of CoVs
  • F-statistic test with adjusted group means (omnibus)
  • A significant group F-test means researchers can reject the H0 that the adjusted group means are equal
  • Post-hoc tests are required to determine where the differences are located
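The adjusted group means that the ANCOVA F-test compares can be illustrated with the standard covariate adjustment: each group's DV mean is shifted by the pooled within-group slope times the distance between that group's covariate mean and the grand covariate mean. A minimal sketch with one covariate and hypothetical (age, SBP) data; it shows only the adjustment, not the F-test or post-hoc tests, and it assumes the covariate slope is the same in every group.

```python
def ancova_adjusted_means(groups):
    """Covariate-adjusted DV means for each group.

    groups: dict mapping group name -> list of (covariate, outcome) pairs.
    Returns (adjusted_means, pooled_within_group_slope).
    """
    all_x = [x for pairs in groups.values() for x, _ in pairs]
    grand_mean_x = sum(all_x) / len(all_x)
    sxx_within = 0.0
    sxy_within = 0.0
    group_means = {}
    for name, pairs in groups.items():
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        # Accumulate within-group deviation sums, pooled across groups
        sxx_within += sum((x - mean_x) ** 2 for x in xs)
        sxy_within += sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        group_means[name] = (mean_x, mean_y)
    b_within = sxy_within / sxx_within
    adjusted = {
        name: mean_y - b_within * (mean_x - grand_mean_x)
        for name, (mean_x, mean_y) in group_means.items()
    }
    return adjusted, b_within

# Hypothetical (age, SBP) pairs for three groups
data = {
    "placebo": [(50, 150), (55, 154), (60, 160)],
    "low dose": [(45, 140), (52, 145), (58, 150)],
    "high dose": [(60, 138), (65, 142), (70, 147)],
}
adjusted, slope = ancova_adjusted_means(data)
for name, m in adjusted.items():
    print(f"{name}: adjusted mean SBP = {m:.1f}")
```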
37
Q

ANCOVA Example: Does a new drug work for HTN?

A
RQ: Controlling for X (CoV), what is the effect of 3 treatment groups, compared with a placebo control, on SBP (DV)?
Low Dose BID
Moderate Dose BID
High Dose QD
Control group (Placebo)

Performing an ANOVA on the DV (SBP) may tell you whether the treatment works

ANCOVA would control for additional factors that may influence the outcome such as: family life, job status, drug use

38
Q

what is MANOVA

A

Tests for significant differences between two or more groups on 2 or more (interval or ratio level) DV outcomes simultaneously (e.g. SBP and DBP)

39
Q

what is MANCOVA

A

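MANOVA with 1 or more covariates (CoVs): tests group differences on 2+ DVs simultaneously while controlling for the CoVs; more power than separate 1-way ANCOVAs; can be 1-way or 2-way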

40
Q

what is a 1-way MANCOVA

A

1 IV w/2+ levels (independent groups), 2+ DVs (interval, ratio); 1+ CoVs (interval, ratio)

41
Q

what is a 2-way MANCOVA

A

2 IVs, each w/2+ levels (independent groups); 2+ DVs (interval, ratio); 1+ CoVs (interval, ratio)

42
Q

MANOVA Example: Prostate Cancer

A

RQ: What is the effect of radiation only compared to radiation plus hormone therapy treatment on tumor size and blood PSA concentration?
The H0 states that the mean tumor size is equal for all treatment groups, and the blood PSA concentration is also equal across the treatment groups.

43
Q
You are conducting a study comparing mean depression scores among widowed men with varying degrees of social support and would like to control for age.  What statistical test would be the most appropriate to use?
ANOVA
ANCOVA
Logistic regression
Simple linear regression
A

ANCOVA

44
Q
You notice that many of your older obstetrical patients are developing multiple cardiac risk factors after the post-partum period.  You wish to test the impact of a new behavioral intervention, compared to standard caloric restriction, on BMI, resting heart rate, blood pressure, and serum cholesterol levels.  Which statistical test would be the most appropriate to use?
ANCOVA
Logistic regression
Multiple regression
MANOVA
A

MANOVA

45
Q
Capillary blood glucose levels were studied in 4 groups of patients with diabetes who had varying social support systems. Fasting glucose levels were obtained for each group member using a calibrated glucose meter. Subsequently, the mean glucose levels for the groups were compared. 
Student’s (Independent) t test
Paired (Dependent) t test
Kruskal Wallis
Pearson’s r
ANOVA
A

ANOVA
RATIONALE: Dependent variable is glucose level, which is ratio level data, and 4 independent groups are compared, each measured at one time point only.
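A minimal sketch of the one-way ANOVA F-statistic for independent groups: F is the between-groups mean square divided by the within-groups mean square, with k − 1 and N − k degrees of freedom. The glucose values are hypothetical, and the p-value would come from an F table or a statistics package.

```python
def one_way_anova(*groups):
    """Return (F, df_between, df_within) for k independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-groups: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups: spread of scores around their own group mean
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical fasting glucose (mg/dL) for 4 social-support groups
f, df_b, df_w = one_way_anova(
    [110, 118, 125, 121],
    [105, 112, 109, 115],
    [130, 128, 140, 135],
    [100, 98, 107, 104],
)
print(f"F({df_b}, {df_w}) = {f:.2f}")
```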

46
Q
You notice that many of your older obstetrical patients are developing multiple cardiac risk factors after the post-partum period. You wish to test the impact of a new behavioral intervention, compared to standard caloric restriction, on BMI, resting heart rate, blood pressure, and serum cholesterol levels.  Which statistical test would be the most appropriate to use?
ANCOVA
Logistic Regression
Multiple Linear Regression
MANOVA
A

MANOVA
RATIONALE:
Two or more groups (levels of the IV): new behavioral intervention vs. standard caloric restriction;
Multiple DV outcomes measured simultaneously: BMI, resting heart rate, blood pressure, and serum cholesterol levels.

47
Q
You are conducting a study comparing mean depression scores among community dwelling older adults with varying degrees of social support and would like to control for age. What statistical test would be the most appropriate to use?
ANOVA
ANCOVA
Logistic Regression
Simple Linear Regression
A

ANCOVA
RATIONALE:
Mean depression scores (interval) DV; and varying groups of social support (categorical) IV;
Controlling for age (CoV, ratio level).

48
Q
A researcher hypothesized that there would be no difference in respiratory rates for patients who self-administered respiratory treatments compared to respiratory treatments administered by a health care provider. Which would be the most appropriate statistical test?
Paired (dependent) t test
Pearson’s r 
Mann Whitney U
Independent (Student’s) t test
None of the above
A

Independent (Student’s) t test
RATIONALE:
DV is respiratory rates, which is ratio level data; only 2 groups are being compared at 1 time point.
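A minimal sketch of Student's independent t-test, assuming equal variances in the two groups: the pooled variance weights each sample variance by its degrees of freedom, and t is the mean difference divided by its standard error, with n1 + n2 − 2 degrees of freedom. The respiratory-rate values are hypothetical.

```python
import math
import statistics

def independent_t_test(group1, group2):
    """Student's t for two independent groups, assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / se_diff
    return t, n1 + n2 - 2

# Hypothetical respiratory rates (breaths/min)
self_administered = [16, 18, 17, 15, 19, 18]
provider_administered = [17, 16, 18, 17, 16, 15]
t, df = independent_t_test(self_administered, provider_administered)
print(f"t({df}) = {t:.2f}")
```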

49
Q
You are an older adult HCP and wish to predict the short-term memory level in patients with dementia by using a short-term memory tool or instrument. Your model includes age, number of medications, thyroid levels, vitamin D levels, number of hours of sleep and average minutes of physical activity per day. Which statistical test would be most appropriate to utilize in this scenario?
ANCOVA
Logistic Regression 
Multiple Regression
MANCOVA
None of the above
A

Multiple Regression
RATIONALE:
Note the term “predict” (also can be association, predictive association);
Predict DV of short-term memory level (interval data) by using multiple IVs that are interval, ratio level.

50
Q
A researcher wishes to predict whether adolescents with pre-diabetes will become type 2 diabetics. The variables utilized to predict type 2 diabetes mellitus (Type 2 DM) are current weight, fasting blood glucose, grams of carbohydrates consumed per day, family member with Type 2 DM and minutes of exercise per day. Which statistical test would be the most appropriate to use?
ANCOVA
Logistic Regression
Multiple Linear Regression
MANOVA
A

Logistic Regression
RATIONALE:
Goal is to predict a nominal-level DV: presence of Type 2 DM.
Multiple IVs of varying levels of measurement.

51
Q

An investigation was undertaken to determine if there was a relationship between the quantity of coronary artery blockage (mm) and fasting serum cholesterol (mg/dL) levels in obese Caucasian males undergoing cardiac catheterization during January of 2018. The results indicate a strong positive correlation, r = .88, p = .001. What statistical test would you suspect was used to analyze these data?

a.) Spearman rho
b.) Independent t-test, student’s t-test
c.) Mann-Whitney U test
d.) Pearson’s r
e.) RM-ANOVA

A

d.) Pearson’s r
RATIONALE:
To ‘determine’ the relationship between the variables: cholesterol level (ratio level data) and quantity of coronary artery blockage (ratio level data)
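A minimal sketch of Pearson's r for two ratio-level variables: the sum of cross-products of deviations divided by the square root of the product of the two sums of squares. The cholesterol and blockage values are hypothetical. On Python 3.10+, `statistics.correlation(x, y)` from the standard library computes the same coefficient.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between paired lists x and y."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical fasting cholesterol (mg/dL) and coronary blockage (mm)
cholesterol = [180, 210, 240, 260, 290, 310]
blockage = [1.2, 1.8, 2.1, 2.9, 3.4, 3.6]
print(f"r = {pearson_r(cholesterol, blockage):.2f}")
```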