Module 4 Flashcards
What are the two primary purposes for using regression?
- Studying the magnitude and structure of a relationship between two variables.
- Forecasting a variable based on its relationship with another variable.
What is the structure of the single variable linear regression line?
y^=a+bx.
What is the expected value of y, the dependent variable, for a given value of x?
y^
What is the independent variable, the variable we are using to help us predict or better understand the dependent variable?
x
What is the y-intercept, the point at which the regression line intersects the vertical axis? This is the value of y^ when the independent variable, x, is set equal to 0.
a
What is the slope, the average change in the dependent variable y as the independent variable x increases by one unit?
b
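As an illustration of where a and b come from, here is a minimal Python sketch that fits the line by ordinary least squares, the method Excel's regression tool uses. The function name fit_line and the five data points are made up for this example and are not from the course.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b


# Hypothetical data, chosen only to keep the arithmetic simple
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
print(f"y_hat = {a:.2f} + {b:.2f}x")  # prints: y_hat = 2.20 + 0.60x
```

Here b = 0.60 means y rises by 0.6 on average for each one-unit increase in x, and a = 2.20 is the value of y^ at x = 0.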
The true relationship between two variables is described by the equation y = α + βx + ε, where ε is the ______ y - y^.
error term
The idealized equation that describes the _________ is y^=α+βx.
true regression line
We determine a _________ by entering the desired value of x into the regression equation.
point forecast
We must be extremely cautious about using regression to forecast for values outside of the historically observed range of the _________ variable (x-values).
independent
Instead of predicting a single point, we can construct a __________, an interval around the point forecast that is likely to contain, for example, the actual selling price of a house of a given size.
prediction interval
The ______ of a prediction interval varies based on the standard deviation of the regression (the standard error of the regression), the desired level of confidence, and the location of the x-value of interest in relation to the historical values of the independent variable.
width
It is important to evaluate several metrics in order to determine whether a __________________ model is a good fit for a data set, rather than looking at a single metric in isolation.
single variable linear regression
_______ measures the percent of total variation in the dependent variable, y, that is explained by the regression line.
R²
R² = Regression Sum of Squares / ____________
Total Sum of Squares
___ ≤ R² ≤ ____
0; 1
For a single variable linear regression, R² is equal to the _______ of the correlation coefficient.
square
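The two R² cards above can be checked numerically. This is a rough Python sketch on made-up data (the variable names and the five points are assumptions, not course material): it computes R² as Regression Sum of Squares / Total Sum of Squares and compares it with the squared correlation coefficient.

```python
import math

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least-squares line y_hat = a + b*x
b = s_xy / s_xx
a = y_bar - b * x_bar
predictions = [a + b * x for x in xs]

# Variation explained by the line vs. total variation in y
regression_ss = sum((p - y_bar) ** 2 for p in predictions)
total_ss = s_yy
r_squared = regression_ss / total_ss

# Correlation coefficient between x and y
r = s_xy / math.sqrt(s_xx * s_yy)

print(round(r_squared, 4), round(r ** 2, 4))  # prints: 0.6 0.6
```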
In addition to analyzing R², we must test whether the relationship between the dependent and independent variable is significant and whether the linear model is a good fit for the data. We do this by analyzing the _______ (or confidence interval) associated with the independent variable and the regression’s ________.
p-value; residual plot
The _______ of the independent variable is the result of the hypothesis test that tests whether there is a significant _______ relationship; that is, it tests whether the slope of the regression line is zero, H0: β = 0 and Ha: β ≠ 0.
p-value; linear
If the coefficient’s p-value is less than _____, we reject the null hypothesis and conclude that we have sufficient evidence to be ______ confident that there is a significant linear relationship between the dependent and independent variables.
0.05; 95%
Note that the p-value and R² provide different information. A _________ can be significant (have a low p-value) but not explain a large percentage of the variation (not have a high R²).
linear relationship
A ___________ associated with an independent variable’s coefficient indicates the likely range for that coefficient.
confidence interval
If the 95% confidence interval does not contain _____, we can be _____ confident that there is a significant linear relationship between the variables.
zero; 95%
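To make the "does the interval contain zero" check concrete, here is a hedged Python sketch. It uses the standard textbook formula for the slope's standard error, which the flashcards do not state, on made-up data. The critical value 3.182 is the two-sided 95% t value for 3 degrees of freedom, taken from a t-table and hard-coded as an assumption.

```python
import math

# Hypothetical data (n = 5, so n - 2 = 3 degrees of freedom)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = s_xy / s_xx
a = y_bar - b * x_bar

# Standard error of the regression, then of the slope
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(sse / (n - 2))
se_b = s / math.sqrt(s_xx)

# Assumed: t critical value for 95% confidence, 3 degrees of freedom
T_CRIT = 3.182

lower = b - T_CRIT * se_b
upper = b + T_CRIT * se_b
contains_zero = lower <= 0 <= upper

print(f"95% CI for slope: ({lower:.3f}, {upper:.3f})")
print("contains zero:", contains_zero)
```

With these five points the interval runs from roughly -0.30 to 1.50 and contains zero, so we could not claim a significant linear relationship at the 95% level even though R² is 0.6. This is a small-sample illustration of the point that R² and significance provide different information.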
__________ can provide important insights into whether a linear model is a good fit.
Residual plots
Each observation in a data set has a residual equal to the historically observed value _____ the regression’s predicted value, that is, ______.
minus; ε = y - y^
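The residual definition above can be sketched in a few lines of Python. The data are made up, and the coefficients 2.2 and 0.6 are the least-squares values for that made-up data; the (x, residual) pairs are what a residual plot would display.

```python
# Hypothetical data and its least-squares line y_hat = 2.2 + 0.6x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

# Residual = observed value minus predicted value
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.1f}")

mean_residual = sum(residuals) / len(residuals)
print(f"mean residual = {abs(mean_residual):.1f}")
```

For a least-squares line with an intercept the residuals always average to zero, so the mean alone says little. The residual plot matters because it shows whether the residuals display a pattern or a changing spread across x.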
Linear regression models assume that the regression’s residuals follow a normal distribution with a mean of _____ and _____ variance.
zero; fixed
We can also perform regression analyses using qualitative, or categorical, variables. To do so, we must convert the data to ____________.
dummy (0, 1) variables
A dummy variable is equal to ____ when the variable of interest fits a certain criterion.
1
- For example, a dummy variable for “Female” would equal 1 for all female observations and 0 for male observations.
Where do you go to add the best-fit line to a scatter plot in Excel?
The Insert menu
What is a convenient function for calculating point forecasts?
=SUMPRODUCT(array1, [array2], [array3],…)
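For readers who want to see what SUMPRODUCT does with a regression, here is a rough Python equivalent. The helper name sumproduct, the coefficients, and the x-value are all hypothetical. The idea is to pair the coefficients [a, b] with the inputs [1, x], so that the sum of products equals a + bx.

```python
import math


def sumproduct(*arrays):
    """Multiply corresponding entries of equal-length arrays and sum the products."""
    if len({len(arr) for arr in arrays}) != 1:
        raise ValueError("all arrays must have the same length")
    return sum(math.prod(values) for values in zip(*arrays))


# Hypothetical regression coefficients [a, b]
coefficients = [2.2, 0.6]
# 1 multiplies the intercept; 3.5 is the x-value we want a forecast for
inputs = [1, 3.5]

forecast = sumproduct(coefficients, inputs)
print(f"point forecast: {forecast:.2f}")  # prints: point forecast: 4.30
```

The x-value 3.5 sits inside the range of the made-up data used elsewhere in these sketches (1 to 5), in line with the caution about forecasting outside the observed range.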
Where do you go to create a regression output table in Excel?
The Data Analysis tool
Which Excel function can you use to create dummy variables for a regression model?
=IF(logical_test,[value_if_true],[value_if_false])
→ Returns value_if_true if the specified condition is met, and returns value_if_false if the condition is not met.
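The IF formula above has a direct analogue in Python's conditional expression. This small sketch (the function name to_dummy and the sample values are hypothetical) converts a categorical column into a 0/1 dummy variable.

```python
def to_dummy(values, target):
    """Return 1 where a value matches the target category, else 0."""
    return [1 if value == target else 0 for value in values]


# Hypothetical categorical column
genders = ["Female", "Male", "Female", "Male"]

female = to_dummy(genders, "Female")
print(female)  # prints: [1, 0, 1, 0]
```

The resulting column of 0s and 1s can then be used as an independent variable in the regression.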