Week 9 - Multivariate Stats Flashcards
what is univariate data?
analysis of single variables, i.e. descriptive measures of central tendency, such as the mean
what is bivariate data
analysis of two variables to assess the empiric relationship between them (this may include up to looking at 3+ treatment groups on 2 different levels of the IV such as the 2-way ANOVA)
what is multivariate data
analysis of three or more variables to assess the empiric relationship between them (this may include various types of predictive modeling such as MLR plus multivariate ANOVAs such as ANCOVA, MANOVA, MANCOVA)
what is multivariate analysis (MVA)
Statistical procedures for analyzing inter-relationships among three or more variables
what are the pros of multivariate analysis
allows you to identify, quantify and de-tangle complex relationships
what are the cons of multivariate analysis
time consuming, computationally formidable and sophisticated…and not easily understood
what makes multivariate modeling computationally formidable?
- Not an easy task, usually requires that a data set conforms to complex assumptions and requirements
- Extensive preprocessing
- Depending on what’s involved—requires costly statistical packages, protected time and advanced skill sets which requires previous training and direct experience
- Missing or minimizing any of the required assumptions or pre-processing steps, or failure to perform post-hoc comparison analyses and confirmation/validation; etc will result in incorrect models that produce false and unreliable results;
- Complete understanding of the technique/procedure and use of the statistical package involved is an absolute requirement—otherwise findings can be disastrously misinterpreted. *
how do you determine which multivariate analysis technique to use
- Depends on several factors
- Size, quality, structure and the nature of available data
- What questions are you trying to answer (what is the task)?
- Urgency of the task
- Nature of any design, data corrections, imputations, weighting required
- Availability of computational time, resources, and content experts
what is the basic approach to model building
- define the research problem, objective and multivariate techniques used
- interpret the model
what is simple linear regression
-Predicts one DV from one IV
-Straight-line fit to data that minimizes deviations from the line
-R value
-R2 = how much variance in the DV is accounted for by the IV
-predict the values of one variable based on values of a second variable
-Estimates a straight-line fit to the data that minimize deviations from the line
Y′ = a + bX
Y′ = predicted value of variable Y (DV)
a = intercept constant
b = regression coefficient (slope of the line)
what is an example of simple linear regression
Can height (X) predict weight (Y)?
give a graphical solution for simple linear regression
- This plot is different from r, the correlation coefficient
- r expresses how variation in 1 variable is associated with another
- R2 tells us proportion of variance in Y(DV) that is accounted for by X
- Line is the regression solution equation for X and Y values
- The stronger the r that exists b/t X and Y,
- Better prediction from X;
- Greater % variance (R2) explained in Y (DV)
what is multiple linear regression
- Predicts one DV from more than one IV
- These variables have to be interval or ratio
- R value (the multiple correlation coefficient)
- R2 = proportion of variance in DV accounted for by simultaneous impact of all IVs
- A method of predicting a continuous dependent variable based on two or more independent (predictor) variables
what is the key difference between linear regression and multiple linear regression
linear regression has a single predictor whereas multiple linear regression has multiple predictors
what is an example of multiple linear regression
Can height (X1) and max jump height (X2) predict weight (Y)?
what is the multiple linear regression equation with 2 predictor variables
Y′ = a + b1X1 + b2X2 Where Y′ = predicted value of variable Y (dependent variable) a = intercept constant b1 = regression coefficient for variable X1 X1 = actual value of variable X1 b2 = regression coefficient for variable X2 X2 = actual value of variable X2
multiple linear regression testing for overall model equation
F-statistic (but t’s can be used, depends on RQ)
H0 : β1 = β2 = β3 = 0
HA : At least one βj ≠ 0 (for j = 1, 2, 3)
tests of regression coefficients (b)
t-tests
H0 : β1 = 0
HA : β1 ≠ 0
what is multiple correlation coefficient
With 2 or more IVs, instead of r (correlation coefficient) the index is the multiple correlation coefficient R (.0-1.0)
Shows strength not direction
R2 = proportion of variance in Y accounted for by combined simultaneous influence of all predictors
Cannot be less than the highest bivariate correlation b/t DV and IV
what are the types of multiple linear regression
- simultaneous
- hierarchical
- elimination (backward deletion)
- stepwise (forward)