Ch 7 Notes Flashcards
7.1 Linear Regression Model
Captures relationship between two or more variables
Predicts an outcome of a target variable based on several input variables
Make assessments and robust predictions by determining which of the relationships matter the most and which we can ignore
Regression Analysis
7.1 Linear Regression Model
Formulate a mathematical model that relates the outcome of a target variable, called the response variable, to one or more input variables, called predictor variables
Use information on the predictor variables to describe and/or predict changes in the response variable
Predictor Variables
7.1 Linear Regression Model
Known to perform well for making predictions
Fail to establish a cause-and-effect relationship between the variables because of the nonexperimental nature of most applications
May appear to search for causality when it basically detects correlation
Causality: can only be established through randomized experiments and/or advanced statistical models
Regression Model
7.1 Linear Regression Model
Cannot expect to predict its exact (unique) value
If value is uniquely determined by the values of the predictor variables, the relationship between the variables is deterministic
Relationship between this variable and predictor variable is stochastic, due to the omission of relevant factors that influence this variable
Response Variable
7.1 Linear Regression Model
How to develop a mathematical model that captures the relationship between the response variable y and the k predictor variables x_1, x_2, ⋯, x_k.
*Must account for randomness that is a part of real life
- Start with a deterministic component that approximates the relationship
- Then add a random error term to it, making the relationship stochastic
- Use economic theory, intuition, and statistical measures to determine which predictor variables might best explain the response variable
7.1 Linear Regression Model
Assumes meaningful numeric values
Numerical Variable
7.1 Linear Regression Model
Reflect categories
Categorical Variable
7.1 Linear Regression Model
Used to describe two categories of a categorical variable, denoted d
Indicator or binary variable
d = 1 for one of the categories
d = 0 for the other(s)
The category with d = 0 is called the reference or benchmark category
All comparisons are made relative to this category
Dummy Variable
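Example (a minimal sketch, not from the notes): one way to build a dummy variable d from a categorical variable. The helper name make_dummy and the season data are hypothetical, chosen only for illustration.

```python
def make_dummy(values, category):
    """Return d = 1 where the value equals `category`, else d = 0."""
    return [1 if v == category else 0 for v in values]

# Hypothetical categorical data with two categories.
season = ["summer", "winter", "winter", "summer"]

# "summer" gets d = 1, so "winter" (d = 0) is the reference category.
d = make_dummy(season, "summer")
print(d)
```

Because "winter" is coded 0, any coefficient on d would be read as the difference for "summer" relative to "winter".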
7.1 Linear Regression Model
**Uses only 1 predictor variable
y = β_0 + β_1 x_1 + ε
β_0 is the unknown intercept
β_1 is the unknown slope
β_0 + β_1 x_1 is the deterministic component
ε is the stochastic component or random error term
**The expected value of y for a given value of x_1 lies on a straight line: E(y) = β_0 + β_1 x_1.
Simple Linear Regression Model
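Example (a minimal sketch, not from the notes): simulating data from a simple linear regression model to see the deterministic and stochastic components separately. The parameter values, the normal distribution for the error term, and the function name are assumptions made for illustration.

```python
import random

def simulate_simple_regression(beta0, beta1, xs, sigma, seed=0):
    """Generate y = beta0 + beta1 * x + epsilon for each x in xs."""
    rng = random.Random(seed)
    ys = []
    for x in xs:
        deterministic = beta0 + beta1 * x   # E(y) for this x
        epsilon = rng.gauss(0.0, sigma)     # random error term (assumed normal)
        ys.append(deterministic + epsilon)
    return ys

xs = [1, 2, 3, 4, 5]
ys_noisy = simulate_simple_regression(2.0, 0.5, xs, sigma=1.0)
ys_exact = simulate_simple_regression(2.0, 0.5, xs, sigma=0.0)
print(ys_exact)   # points that lie exactly on the line
print(ys_noisy)   # points scattered around the line
```

With sigma set to 0 the error term vanishes and the relationship becomes deterministic; any positive sigma makes it stochastic.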
7.1 Linear Regression Model
β_1 > 0
Positive Linear Relationship
7.1 Linear Regression Model
β_1 < 0
Negative Linear Relationship
7.1 Linear Regression Model
β_1 = 0
No Linear Relationship
7.1 Linear Regression Model
Model with more than one predictor
y = β_0 + β_1 x_1 + β_2 x_2 + … + β_k x_k + ε
Replace x with d for a dummy variable.
The population parameters β_0, β_1, β_2, …, β_k are unknown and must be estimated using sample data.
Sample data: n observations of y, x_1, x_2, …, x_k
Use the sample data to obtain b_0, b_1, b_2, …, b_k, which are estimates of β_0, β_1, β_2, …, β_k.
Multiple Linear Regression Model
7.1 Linear Regression Model
Predicted value of the response variable given specified values of the predictor variables
ŷ (read as y-hat)
7.1 Linear Regression Model
e = y − ŷ
Difference between observed and predicted values
Residual
7.1 Linear Regression Model
SSE = Σ(y − ŷ)^2 = Σe^2
Chooses the sample regression equation by minimizing the error sum of squares
Has desirable properties if certain assumptions hold
Gives an equation "closest" to the data
Ordinary Least Squares (OLS)
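Example (a minimal sketch, not from the notes): OLS for a single predictor, using the standard closed-form estimates b_1 = Sxy / Sxx and b_0 = ȳ − b_1 x̄. These formulas are not stated in the notes, and the data values and function names are hypothetical. The last lines check that moving the coefficients away from the OLS estimates increases SSE.

```python
def ols_simple(xs, ys):
    """Return (b0, b1) that minimize SSE for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(xs, ys, b0, b1):
    """Error sum of squares: sum of squared residuals e = y - y-hat."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical sample data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_simple(xs, ys)
best = sse(xs, ys, b0, b1)

# Any other line gives a larger SSE than the OLS line.
print(best < sse(xs, ys, b0 + 0.1, b1))
print(best < sse(xs, ys, b0, b1 - 0.1))
```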
7.1 Linear Regression Model
b_0 is the estimate of β_0
b_j is the estimate of β_j
Estimated regression coefficients
7.1 Linear Regression Model
Predicted value ŷ when each predictor variable assumes a value of 0
Not always meaningful
b_0 is the estimate of β_0
7.1 Linear Regression Model
Change in the predicted value of the response given a unit increase in x_j, holding all other predictor variables constant
Partial influence of x_j on ŷ
b_j is the estimate of β_j
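Example (a minimal sketch, not from the notes): reading b_0 and b_j off a sample regression equation with two predictors. The coefficient values and the helper name predict are hypothetical.

```python
def predict(b, x):
    """y-hat = b[0] + b[1] * x[0] + ... + b[k] * x[k-1]."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))

# Hypothetical estimates b_0, b_1, b_2.
b = [10.0, 2.0, -3.0]

at_zero = predict(b, [0.0, 0.0])   # every predictor equal to 0
base = predict(b, [4.0, 1.0])
bumped = predict(b, [5.0, 1.0])    # x_1 up by one unit, x_2 held constant

print(at_zero)        # equals b_0
print(bumped - base)  # equals b_1
```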
7.1 Linear Regression Model
Subject to sampling variability
Will change if we use a different sample to estimate the regression model
A prediction interval is always wider than the corresponding confidence interval
Predictions
7.1 Linear Regression Model
point estimate ± margin of error
Use a confidence interval as the interval estimate for the mean (expected value) of y
Use a prediction interval as the interval estimate for the individual values of y
We use the same point estimate for constructing both
Interval Estimate
7.1 Linear Regression Model
ŷ^0 ± t_(α/2,df) se(ŷ^0)
ŷ^0 = b_0 + b_1 x_1^0 + b_2 x_2^0 + ⋯ + b_k x_k^0
df = n − k − 1
One way to obtain ŷ^0 and se(ŷ^0) is to estimate a modified regression model
y is the response variable
Explanatory variables are defined as x_j^* = x_j − x_j^0
The resulting estimate of the intercept and its standard error are ŷ^0 and se(ŷ^0)
For specific values x_1^0, x_2^0, ⋯, x_k^0, the 100(1−α)% confidence interval for the expected value of y
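Example (a minimal sketch, not from the notes): the modified-regression approach for one predictor. Shifting x by x^0 makes the intercept of the refitted model equal to ŷ^0, and its standard error equal to se(ŷ^0). The intercept standard error formula s_e √(1/n + x̄^2 / Sxx) is the standard one for simple regression and is an assumption here, as are the data values. The critical value 3.182 is t_(0.025, 3) taken from a t table, supplied by hand.

```python
import math

def fit_simple(xs, ys):
    """OLS for one predictor; returns (b0, b1, se_b0, s_e)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))   # df = n - k - 1 with k = 1
    se_b0 = s_e * math.sqrt(1 / n + x_bar ** 2 / sxx)
    return b0, b1, se_b0, s_e

def confidence_interval(xs, ys, x0, t_crit):
    """Interval for E(y) at x = x0 via the shifted predictor x - x0."""
    shifted = [x - x0 for x in xs]
    # The intercept of the shifted model is y-hat at x0.
    y_hat0, _, se_y_hat0, _ = fit_simple(shifted, ys)
    margin = t_crit * se_y_hat0
    return y_hat0 - margin, y_hat0 + margin

# Hypothetical sample data: n = 5, k = 1, so df = 3.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

ci_low, ci_high = confidence_interval(xs, ys, x0=4, t_crit=3.182)
print(ci_low, ci_high)
```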
7.1 Linear Regression Model
ŷ^0 ± t_(α/2,df) √((se(ŷ^0))^2 + s_e^2)
ŷ^0 = b_0 + b_1 x_1^0 + b_2 x_2^0 + ⋯ + b_k x_k^0
df = n − k − 1
The prediction interval is wider than the confidence interval
The prediction interval incorporates the variability of the random error term
Higher variability makes it more difficult to predict accurately, necessitating a wider interval
For specific values x_1^0, x_2^0, ⋯, x_k^0, the 100(1−α)% prediction interval for the individual value of y
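Example (a minimal sketch, not from the notes): comparing the two interval estimates built around the same point estimate. All input numbers are hypothetical; 2.048 is t_(0.025, 28) from a t table. Setting s_e to 0 in the function gives the confidence interval, and passing the standard error of the estimate gives the prediction interval.

```python
import math

def interval(y_hat0, t_crit, se_y_hat0, s_e=0.0):
    """Return (low, high) for y-hat0 +/- t * sqrt(se^2 + s_e^2).

    s_e = 0 gives the confidence interval for E(y);
    s_e > 0 adds the error-term variability for an individual y.
    """
    margin = t_crit * math.sqrt(se_y_hat0 ** 2 + s_e ** 2)
    return y_hat0 - margin, y_hat0 + margin

# Hypothetical values: point estimate, se(y-hat0), and s_e.
ci = interval(50.0, 2.048, 2.0)
pi = interval(50.0, 2.048, 2.0, s_e=5.0)

print(ci)
print(pi)
print((pi[1] - pi[0]) > (ci[1] - ci[0]))   # prediction interval is wider
```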