Regression Analysis Flashcards
What is regression analysis?
Regression analysis uses a mathematical model to predict a variable (y) based on values of other variables (x1, x2, … xk). It is the process of finding the mathematical model that relates y to a set of independent variables and best fits the data.
What is a dependent variable?
A dependent variable (y), also called the response variable, is the variable to be modelled and/or predicted.
What is an independent variable?
Independent variables (x1, x2, … xk) are the variables used to predict the response variable.
What is the equation for a probabilistic model?
y = E(y) + ε
where E(y) = the mean of y (i.e., the expected value of y)
and ε = the random error
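The following Python/NumPy sketch illustrates what the equation says: hold x fixed, simulate many values of y = E(y) + ε, and the average of those y values lands close to E(y). The deterministic part E(y) = 2 + 3x and the normal error with mean 0 and standard deviation 1 are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical deterministic component: E(y) = 2 + 3x (coefficients chosen for illustration).
def expected_y(x):
    return 2.0 + 3.0 * x

rng = np.random.default_rng(seed=0)

x = 4.0                                             # hold x fixed
eps = rng.normal(loc=0.0, scale=1.0, size=10_000)   # assumed random error: mean 0, sd 1
y = expected_y(x) + eps                             # probabilistic model: y = E(y) + ε

print(f"E(y) at x = {x}: {expected_y(x):.2f}")
print(f"mean of simulated y: {y.mean():.2f}")       # near E(y), because ε averages out to about 0
```

Individual y values scatter around E(y); only their mean is pinned down by the deterministic part.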
A probabilistic model is based on the theory of probability or the fact that _______________ plays a role in predicting future events.
randomness
What is the difference between a probabilistic model and a deterministic model?
A probabilistic model is based on the fact that randomness plays a role in predicting future events.
A deterministic model has no random component: it assumes y can be predicted exactly from the values of x, with no random error.
There are 7 major steps in regression analysis.
What are they?
(Hint: H, C, U, E, U, V, I)
- Hypothesize the form of the model for E(y), the expected value of y.
- Collect the sample data.
- Estimate the unknown parameters in the model using the sample data.
- Specify the probability distribution of ε (random error) and estimate any unknown parameters.
- Statistically check model adequacy.
- Check the validity of the assumptions about the random error, and modify the model if necessary.
- Use the model for prediction and estimation.
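A minimal Python/NumPy sketch of the seven steps for a straight-line model is shown below. It is illustrative only: the data are simulated (true intercept 1, slope 2 and error standard deviation 0.5 are assumptions), the errors are assumed normal, and R² stands in for the formal adequacy tests a full analysis would run.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Step 1 (hypothesize): assume a straight-line model E(y) = b0 + b1*x.
# Step 2 (collect): simulated data stand in for a real sample (true b0 = 1, b1 = 2, sd of ε = 0.5 are assumptions).
n = 50
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=n)

# Step 3 (estimate): least-squares estimates of b0 and b1.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step 4 (specify ε): assume ε ~ Normal(0, σ²) and estimate σ² by s² = SSE / (n - 2).
residuals = y - X @ beta_hat
sse = np.sum(residuals ** 2)
s2 = sse / (n - 2)

# Step 5 (check adequacy): coefficient of determination R².
sst = np.sum((y - y.mean()) ** 2)
r2 = 1 - sse / sst

# Step 6 (check assumptions): with an intercept, least-squares residuals sum to 0;
# in practice you would also plot the residuals against x and against the fitted values.
print(f"mean residual: {residuals.mean():.2e}")

# Step 7 (use): predict y at a new value of x.
x_new = 5.0
y_hat_new = beta_hat[0] + beta_hat[1] * x_new
print(f"beta_hat = {beta_hat}, s^2 = {s2:.3f}, R^2 = {r2:.3f}, y_hat(5) = {y_hat_new:.2f}")
```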
There are 6 steps in regression for a probabilistic model [y = E(y) + ε].
What are they?
(Hint: H, C, E, P, C, P)
- Hypothesize the form of the model for E(y), the expected value of y.
- Collect the sample data.
- Estimate the unknown parameters in the model.
- Specify the probability distribution of ε.
- Statistically check model adequacy.
- Use the model for prediction and estimation.
There are two types of regression data.
What are they?
Observational: where values of x are uncontrolled.
Experimental: where values of x are controlled via a designed experiment.
What is the difference between simple linear regression and multiple regression?
Simple Linear Regression involves a single independent variable.
Multiple Regression involves two or more independent variables.
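In code, the difference shows up in the design matrix: simple linear regression uses an intercept column plus one x column, while multiple regression adds a column for each further independent variable. This Python/NumPy sketch uses simulated data from a hypothetical model E(y) = 1 + 2x1 - 0.5x2 and fits both versions by least squares.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 100
x1 = rng.uniform(0, 5, size=n)
x2 = rng.uniform(0, 5, size=n)
# Hypothetical true model with two independent variables: E(y) = 1 + 2*x1 - 0.5*x2
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0.0, 0.3, size=n)

# Simple linear regression: an intercept column plus one independent variable (x1).
X_simple = np.column_stack([np.ones(n), x1])
b_simple, *_ = np.linalg.lstsq(X_simple, y, rcond=None)

# Multiple regression: an intercept column plus two independent variables (x1 and x2).
X_multiple = np.column_stack([np.ones(n), x1, x2])
b_multiple, *_ = np.linalg.lstsq(X_multiple, y, rcond=None)

print("simple:  ", np.round(b_simple, 2))
print("multiple:", np.round(b_multiple, 2))
```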
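How must this equation be modified to become a prediction equation?
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
Replace E(y) with ŷ, and replace each unknown parameter β with its estimate β̂ computed from the sample data:
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_1 x_2 + \hat{\beta}_4 x_1^2 + \hat{\beta}_5 x_2^2$
where ŷ is the predicted value of y and β̂0, …, β̂5 are the estimated parameters.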
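A Python/NumPy sketch of building and using this second-order prediction equation is given below. The "true" β values, the input ranges and the error standard deviation are hypothetical and serve only to generate sample data; the β̂ values are then estimated by ordinary least squares, and `design_matrix` and `predict` are illustrative helper names, not part of any library.

```python
import numpy as np

def design_matrix(x1, x2):
    # Columns match the model terms: 1, x1, x2, x1*x2, x1^2, x2^2
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

rng = np.random.default_rng(seed=3)
n = 200
x1 = rng.uniform(-2, 2, size=n)
x2 = rng.uniform(-2, 2, size=n)
beta_true = np.array([1.0, 0.5, -1.0, 0.8, 0.3, -0.2])   # hypothetical β0..β5
y = design_matrix(x1, x2) @ beta_true + rng.normal(0.0, 0.2, size=n)

# Estimate β̂0..β̂5 from the sample by least squares.
beta_hat, *_ = np.linalg.lstsq(design_matrix(x1, x2), y, rcond=None)

# Prediction equation: ŷ = β̂0 + β̂1 x1 + β̂2 x2 + β̂3 x1 x2 + β̂4 x1² + β̂5 x2²
def predict(x1_new, x2_new):
    return design_matrix(np.atleast_1d(x1_new), np.atleast_1d(x2_new)) @ beta_hat

print(np.round(beta_hat, 2))
print(predict(1.0, 0.5))
```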
What is missing from the equation if it is to be used as a probabilistic model?
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
A probabilistic model must include the random error term ε. Because y = E(y) + ε, the left-hand side also changes from E(y) to y:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2 + \varepsilon$
In the following equation for the deterministic model shown, what do β0, β1, β2, β3, β4, β5 represent?
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
β0, β1, β2, β3, β4, β5 are unknown constants (the model parameters) whose values must be estimated from the sample data.
In the following equation for the deterministic model shown, what does E(y) represent?
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
E(y) represents the mean percentage price increase for a given set of values of x1 and x2.
What is the purpose of collecting sample data for regression analysis?
The purpose of collecting sample data is to estimate the unknown parameters of a regression model (the β’s).
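To see the sample data doing this job, the Python/NumPy sketch below draws samples of increasing size from a straight-line model with hypothetical parameters β0 = 3 and β1 = 1.5 (and an assumed error standard deviation of 2), then estimates the β's by least squares. `estimate_from_sample` is an illustrative helper, not a library function.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
beta_true = np.array([3.0, 1.5])   # hypothetical β0, β1 for E(y) = β0 + β1*x

def estimate_from_sample(n):
    """Draw a sample of size n from y = β0 + β1*x + ε and return the least-squares β̂."""
    x = rng.uniform(0, 10, size=n)
    y = beta_true[0] + beta_true[1] * x + rng.normal(0.0, 2.0, size=n)
    X = np.column_stack([np.ones(n), x])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

# The estimates tend to settle closer to the true β's as the sample grows.
for n in (10, 100, 10_000):
    print(n, np.round(estimate_from_sample(n), 3))
```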