Single Variable Linear Regression Flashcards
We use regression analysis for two primary purposes:
Studying the magnitude and structure of the relationship between two variables.
Forecasting a variable based on its relationship with another variable.
The structure of the single variable linear regression line is
ŷ = a + bx
ŷ is the predicted value of the dependent variable, y.
x is the independent variable (i.e. it is plotted on the x-axis), the variable we are using to help us predict or better understand the dependent variable.
a is the y-intercept, the point at which the regression line intersects the vertical axis.
b is the slope, the average change in the dependent variable y as the independent variable x increases by one.
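As a rough illustration of how a and b can be obtained, the sketch below fits a least-squares line in plain Python. The data values and the helper name fit_line are hypothetical, chosen only for this example; least squares is the usual fitting method, though the cards above do not name one.

```python
# Hypothetical data: house sizes (square feet) and selling prices (thousands of dollars).
sizes = [1000, 1500, 2000, 2500, 3000]
prices = [200, 270, 330, 410, 470]

def fit_line(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = y_mean - b * x_mean
    return a, b

a, b = fit_line(sizes, prices)
print(f"y_hat = {a:.2f} + {b:.4f} * x")
```

With this made-up data, the slope b is the average change in price (in thousands of dollars) per additional square foot.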
What is a prediction interval?
An interval around the point forecast that is likely to contain the actual value of the dependent variable, for example the actual selling price of a house of a given size. The center of the prediction interval is the point forecast.
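For reference, a common textbook form of the prediction interval at a new value x_0 is sketched below. It is not taken from the cards above and assumes the usual regression conditions (independent, normally distributed errors with constant variance). Here n is the number of observations, s is the standard error of the regression, and t is the critical value of the t distribution with n - 2 degrees of freedom.

```latex
\hat{y}_0 \;\pm\; t_{\alpha/2,\,n-2}\; s\,
\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n-2}}
```

The interval is centered on the point forecast and widens as x_0 moves away from the mean of x.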
R² measures …
The percent of total variation in the dependent variable, y, that is explained by the regression line.
What does R² equal in single variable linear regression?
For a single variable linear regression, R² is equal to the square of the correlation coefficient.
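The equality can be checked numerically. The sketch below uses hypothetical data. It computes R² as the explained share of total variation, computes the correlation coefficient separately, and compares the two.

```python
import math

# Hypothetical data: house sizes (square feet) and selling prices (thousands of dollars).
x = [1000, 1500, 2000, 2500, 3000]
y = [200, 270, 330, 410, 470]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
syy = sum((yi - y_mean) ** 2 for yi in y)  # total variation in y
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

# Least-squares line and its predictions.
b = sxy / sxx
a = y_mean - b * x_mean
y_hat = [a + b * xi for xi in x]

# R-squared: share of the total variation explained by the line.
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
r_squared = 1 - sse / syy

# Correlation coefficient between x and y.
r = sxy / math.sqrt(sxx * syy)

print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```

The two printed values should agree up to floating-point rounding.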
How do we test whether the relationship between the dependent and independent variable is significant and whether the linear model is a good fit for the data?
We do this by analyzing the p-value (or confidence interval) associated with the independent variable and the regression’s residual plot.
If the coefficient’s p-value is less than 0.05, we reject the null hypothesis that the slope is zero and conclude, at the 95% confidence level, that there is a significant linear relationship between the dependent and independent variables.
What is the p-value?
The p-value of an independent variable is the result of the hypothesis test that tests whether there is a significant linear relationship; that is, it tests the null hypothesis that the slope of the regression line is zero.
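A minimal sketch of the test statistic behind this p-value, using hypothetical data: the slope divided by its standard error. The p-value itself comes from comparing this statistic with a t distribution with n - 2 degrees of freedom. The Python standard library has no t distribution, so the sketch stops at the statistic.

```python
import math

# Hypothetical data: house sizes (square feet) and selling prices (thousands of dollars).
x = [1000, 1500, 2000, 2500, 3000]
y = [200, 270, 330, 410, 470]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

# Least-squares slope and intercept.
b = sxy / sxx
a = y_mean - b * x_mean

# Standard error of the regression: residual variation with n - 2 degrees of freedom.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope, and the t statistic for the null hypothesis "slope = 0".
se_b = s / math.sqrt(sxx)
t_stat = b / se_b

print(f"slope = {b:.4f}, SE = {se_b:.5f}, t = {t_stat:.2f}")
```

A t statistic far from zero corresponds to a small p-value, which is evidence against a slope of zero.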
p-value vs. R²?
The p-value and R² provide different information. A linear relationship can be significant (have a low p-value) but not explain a large percentage of the variation (not have a high R²).
What does a low p-value show?
That the linear relationship is statistically significant.
What does a low R² show?
That the regression line does not explain a large percentage of the variation in the dependent variable.
Example: with a lower R² (0.70), a smaller portion of the variation in y is explained by the regression line than with a higher R².
What does a confidence interval indicate?
A confidence interval associated with an independent variable’s coefficient indicates the likely range for that coefficient.
When do we use dummy variables?
To perform regression analyses using qualitative, or categorical, variables. To do so, we must convert data to dummy (0, 1) variables. After that, we can proceed as we would with any other regression analysis.
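As a small illustration, the sketch below converts a hypothetical two-category variable into a 0/1 dummy variable. The variable names and values are made up for the example.

```python
# Hypothetical categorical data: whether each house has a garage.
garage = ["yes", "no", "yes", "no", "no"]

# Convert the two categories to a 0/1 dummy variable (1 = has a garage).
has_garage = [1 if g == "yes" else 0 for g in garage]

print(has_garage)
```

The resulting 0/1 list can then be used as the independent variable x in the regression. A variable with more than two categories is usually encoded with one fewer dummy variables than it has categories.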
What do we use scatter plots for?
For visualizing a relationship between two variables.
What does the correlation coefficient measure?
It is a value between -1 and 1 that measures the strength and direction (positive or negative) of the linear relationship between two variables.
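A minimal sketch of the calculation, with a hypothetical helper named correlation and made-up data chosen to show both ends of the range:

```python
import math

def correlation(x, y):
    """Return the correlation coefficient between two equal-length lists."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# y rises exactly in step with x: result is close to +1.
print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))

# y falls exactly in step with x: result is close to -1.
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))
```

Values near zero would indicate little or no linear relationship.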
What do we use to predict a value?
For example, we may want to predict the price of a house on the basis of its size. How much can we expect to pay for a 1,200 square foot home?
Point forecast: plug the value of x into the regression equation.
ŷ = a + bx
ŷ - predicted value of the dependent variable (e.g. selling price)
a - intercept (the value of ŷ when x is 0)
b - slope (average change in y when x increases by 1)
x - independent variable (the one we are using to help us predict the dependent variable)
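The house example can be sketched as follows. The intercept and slope here are hypothetical values assumed for illustration, not estimates from real data, and point_forecast is a made-up helper name.

```python
# Hypothetical coefficients: price in thousands of dollars, size in square feet.
a = 64.0   # intercept (assumed for illustration)
b = 0.136  # slope: average change in price per additional square foot (assumed)

def point_forecast(x, intercept=a, slope=b):
    """Return the point forecast y_hat = intercept + slope * x."""
    return intercept + slope * x

forecast = point_forecast(1200)
print(f"Point forecast for a 1,200 square foot home: {forecast:.1f} thousand dollars")
```

The forecast is a single best guess; a prediction interval around it conveys how far the actual price is likely to fall from that guess.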