Lecture 8 (MULTIVARIATE REGRESSION ANALYSIS) Flashcards
MULTIPLE REGRESSION ANALYSIS
Regression analysis with two or more independent variables or with at least one nonlinear predictor.
PROBABILISTIC MULTIPLE REGRESSION MODEL
y = β0 + β1x1 + β2x2 + β3x3 + … + βkxk + ε
where:
y = the value of the dependent (response) variable
β0 = the regression constant
β1 = the partial regression coefficient of independent variable 1
β2 = the partial regression coefficient of independent variable 2
βk = the partial regression coefficient of independent variable k
k = the number of independent variables
ε = the error of prediction
What is the dependent variable y sometimes referred to as?
The response variable
What does the partial regression coefficient of an independent variable, βi represent?
It represents the increase that will occur in the value of y from a one-unit increase in that independent variable, with all other variables held constant.
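This interpretation can be checked numerically. A minimal sketch, with hypothetical coefficients and x-values chosen purely for illustration:

```python
# With x2 held constant, raising x1 by one unit changes the
# prediction by exactly b1. All numbers here are hypothetical.
b0, b1, b2 = 1.0, 0.8, 1.1
x1, x2 = 3.0, 4.0

before = b0 + b1 * x1 + b2 * x2
after = b0 + b1 * (x1 + 1) + b2 * x2
print(after - before)  # b1 = 0.8, up to floating-point rounding
```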
ESTIMATED REGRESSION MODEL
y-hat = b0 + b1x1 + b2x2 + b3x3 + … + bkxk
where:
y-hat = predicted value of y
b0 = estimate of regression constant
b1 = estimate of regression coefficient 1
b2 = estimate of regression coefficient 2
b3 = estimate of regression coefficient 3
bk = estimate of regression coefficient k
k = number of independent variables
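As a sketch of how these estimates are obtained in practice, here is a least squares fit with NumPy; the observations are hypothetical, invented purely for illustration:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable 1
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])   # independent variable 2
y = np.array([3.1, 3.9, 7.2, 7.8, 10.5])   # dependent (response) variable

# Design matrix with a leading column of ones for the constant b0.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares estimates (b0, b1, b2).
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)      # estimated coefficients
print(X @ b)  # predicted values, y-hat
```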
MULTIPLE REGRESSION MODEL WITH TWO INDEPENDENT VARIABLES (FIRST ORDER)
The simplest multiple regression model is one constructed with two independent variables, where the highest power of either variable is 1 (first order regression model).
POPULATION MODEL: y = β0 + β1x1 + β2x2 + ε
ESTIMATED MODEL: y-hat = b0 + b1x1 + b2x2
RESPONSE PLANE FOR FIRST ORDER TWO PREDICTOR MULTIPLE REGRESSION MODEL
In multiple regression analysis, the resulting model produces a response surface.
In the multiple regression model shown here with two independent first order variables, the response surface is a response plane.
The plane is fit into a 3D space with axes (x1, x2, y).
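A short sketch of the plane itself, evaluating y-hat over a grid of (x1, x2) points; the coefficients are hypothetical:

```python
import numpy as np

# Hypothetical fitted coefficients, chosen for illustration.
b0, b1, b2 = 1.0, 0.8, 1.1

# Grid of (x1, x2) values; y-hat over the grid is a flat plane
# in (x1, x2, y) space because both predictors enter to power 1.
g1, g2 = np.meshgrid(np.linspace(0, 4, 5), np.linspace(0, 4, 5))
y_hat = b0 + b1 * g1 + b2 * g2
print(y_hat)
```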
DETERMINING THE MULTIPLE REGRESSION EQUATION
The simple regression equations for determining the sample slope and intercept given in earlier material are the result of using methods of calculus to minimise the sum of squares of error for the regression model.
The formulas are established to meet an objective of minimising the sum of squares of error for the model.
The regression analysis shown here is referred to as least squares analysis. Methods of calculus are applied, resulting in k+1 equations with k+1 unknowns for multiple regression analyses with k independent variables.
LEAST SQUARES EQUATIONS FOR k = 2
b0n + b1Σx1 + b2Σx2 = Σy
b0Σx1 + b1Σx1^2 + b2Σx1x2 = Σx1y
b0Σx2 + b1Σx1x2 + b2Σx2^2 = Σx2y
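A sketch that builds these three equations from hypothetical data and solves them simultaneously for (b0, b1, b2):

```python
import numpy as np

# Hypothetical observations, invented for illustration.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 10.5])
n = len(y)

# Left-hand-side coefficients and right-hand sides, row for row
# as in the three normal equations above.
A = np.array([
    [n,        x1.sum(),      x2.sum()],
    [x1.sum(), (x1**2).sum(), (x1*x2).sum()],
    [x2.sum(), (x1*x2).sum(), (x2**2).sum()],
])
rhs = np.array([y.sum(), (x1*y).sum(), (x2*y).sum()])

b0, b1, b2 = np.linalg.solve(A, rhs)
print(b0, b1, b2)  # same estimates as a least squares fit on this data
```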
TESTING THE OVERALL MULTIPLE REGRESSION MODEL
H0 : β1 = β2 = β3 = … = βk = 0
Ha : At least one of the regression coefficients ≠ 0
SIGNIFICANCE TESTS FOR INDIVIDUAL REGRESSION COEFFICIENTS
H0 : β1 = 0  Ha : β1 ≠ 0
H0 : β2 = 0  Ha : β2 ≠ 0
H0 : β3 = 0  Ha : β3 ≠ 0
…
H0 : βk = 0  Ha : βk ≠ 0
What does it mean if you fail to reject the null hypothesis of a regression model?
It means the regression model has no significant predictability for the dependent variable.
A rejection of the null hypothesis indicates that at least one of the independent variables is adding significant predictability for y.
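In practice both sets of hypotheses are usually tested with software. A minimal sketch with statsmodels (assumed installed); the observations are hypothetical:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical observations, invented for illustration.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 10.5])

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

print(fit.fvalue, fit.f_pvalue)  # overall test (H0: all βi = 0)
print(fit.tvalues, fit.pvalues)  # individual tests (H0: βi = 0)
```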
F-VALUE
F = MSR (MS - Regression) / MSE (MS - Residual)
where MSR = SSR / k and MSE = SSE / (n - k - 1), with n observations and k independent variables.
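A hand computation of this ratio, as a sketch on hypothetical data (k = 2 and the observations below are illustrative assumptions):

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 10.5])

X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

k, n = 2, len(y)
ssr = ((y_hat - y.mean()) ** 2).sum()  # regression sum of squares
sse = ((y - y_hat) ** 2).sum()         # residual (error) sum of squares
msr = ssr / k                          # MS - Regression
mse = sse / (n - k - 1)                # MS - Residual
print(msr / mse)                       # the F-value
```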
How are residuals calculated in multiple regression analysis?
First, a predicted value, y-hat, is determined by entering the value of each independent variable for a given observation into the multiple regression equation and solving for y-hat. The residual is then the difference between the actual and predicted values, y − y-hat.
What are residuals useful for?
Helpful in locating outliers.
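A sketch covering both cards: computing residuals from a fitted model and flagging candidate outliers (the data and the two-standard-deviation cutoff are illustrative assumptions):

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 10.5])

X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ b  # e_i = y_i - y-hat_i
print(residuals)

# Unusually large residuals flag candidate outliers.
print(np.abs(residuals) > 2 * residuals.std())
```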