Single Variable Linear Regression Flashcards

1
Q

We use regression analysis for two primary purposes:

A

Studying the magnitude and structure of the relationship between two variables.
Forecasting a variable based on its relationship with another variable.

2
Q

The structure of the single variable linear regression line is:

A

ŷ = a + bx

ŷ is the predicted (expected) value of the dependent variable, y.
x is the independent variable (i.e., it is plotted on the x-axis), the variable we are using to help us predict or better understand the dependent variable.
a is the y-intercept, the point at which the regression line intersects the vertical axis.
b is the slope, the average change in the dependent variable y as the independent variable x increases by one unit.
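Assuming the usual least-squares fit, the slope and intercept can be computed directly from the data: b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. The following is a minimal Python sketch. The house sizes and prices are made-up numbers, and `fit_line` is a hypothetical helper name, not part of any course material or library.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the least-squares line passes through (x_bar, y_bar)
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data: house size (sq ft) and selling price (thousands of dollars)
sizes = [1000, 1500, 2000, 2500]
prices = [200, 260, 340, 380]
a, b = fit_line(sizes, prices)
print(f"y-hat = {a:.1f} + {b:.3f}x")  # y-hat = 78.0 + 0.124x
```

With these made-up numbers, each additional square foot is associated with an average increase of 0.124 thousand dollars ($124) in selling price.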

3
Q

What is a prediction interval?

A

An interval around the point forecast that is likely to contain the actual value of the dependent variable for a given value of the independent variable (for example, the actual selling price of a house of a given size). The center of the prediction interval is the point forecast.

4
Q

R² measures…

A

The percentage of the total variation in the dependent variable, y, that is explained by the regression line.

5
Q

What does R² equal in single variable linear regression?

A

For a single variable linear regression, R² is equal to the square of the correlation coefficient.

6
Q

How do we test whether the relationship between the dependent and independent variable is significant and whether the linear model is a good fit for the data?

A

We do this by analyzing the p-value (or confidence interval) associated with the independent variable's coefficient, which tests whether the relationship is significant, and the regression's residual plot, which shows whether the linear model is a good fit.

If the coefficient's p-value is less than 0.05, we reject the null hypothesis that the slope is zero and conclude that we have sufficient evidence, at the 95% confidence level, that there is a significant linear relationship between the dependent and independent variables.

7
Q

What is the p-value of an independent variable?

A

The p-value of an independent variable is the result of the hypothesis test of whether there is a significant linear relationship; that is, it tests the null hypothesis that the slope of the regression line is zero.

8
Q

p-value vs. R²?

A

The p-value and R² provide different information. A linear relationship can be significant (have a low p-value) but not explain a large percentage of the variation (not have a high R²).

9
Q

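What does a low p-value show?

A

That the linear relationship between the dependent and independent variables is significant (for example, a p-value below 0.05 indicates significance at the 5% level).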

10
Q

What does a low R² show?

A

That the regression line does not explain a large percentage of the variation in the dependent variable.

Lower R² (0.70): a smaller portion of the variation in y is explained by the regression line than in the previous graph.

11
Q

What does a confidence interval indicate?

A

A confidence interval associated with an independent variable’s coefficient indicates the likely range for that coefficient.

12
Q

When do we use dummy variables?

A

To perform regression analyses using qualitative, or categorical, variables. To do so, we must convert data to dummy (0, 1) variables. After that, we can proceed as we would with any other regression analysis.

13
Q

What do we use scatter plots for?

A

For visualizing the relationship between two variables.

14
Q

What does the correlation coefficient measure?

A

It is a value between -1 and 1 that measures the strength and direction (positive or negative) of the linear relationship between two variables.
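The correlation coefficient can be computed with the Pearson formula r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). A minimal Python sketch with made-up data; `pearson_r` is a hypothetical helper name (Python 3.10+ also provides `statistics.correlation`).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: a perfect positive and a perfect negative linear relationship
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

The sign of r gives the direction of the relationship and its distance from 0 gives the strength; real data almost always falls strictly between the two extremes shown here.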

15
Q

What do we use to predict a value?

For example, we may want to predict the price of a house on the basis of its size. How much can we expect to pay for a 1,200 square foot home?

A

A point forecast:

ŷ = a + bx

ŷ - predicted value of the dependent variable (e.g., selling price)

a - intercept (the value of ŷ when x is 0)

b - slope (the average change in ŷ when x increases by 1)

x - independent variable (the one we are using to help us predict the dependent variable, e.g., house size)

16
Q

How do you forecast with Excel?

A

Another quick way to forecast is to use Excel’s FORECAST function:
=FORECAST(x, known_y’s, known_x’s)
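For comparison, a Python sketch of what this kind of forecast computes: fit the least-squares line to the known values, then return a + b·x for the new x. The function name `forecast` and the data are hypothetical; only the argument order mirrors Excel's FORECAST(x, known_y's, known_x's). The example answers the 1,200 square foot question from the point-forecast card using made-up prices.

```python
def forecast(x, known_ys, known_xs):
    """Least-squares point forecast at x; argument order mirrors Excel's FORECAST."""
    n = len(known_xs)
    if n != len(known_ys) or n < 2:
        raise ValueError("need equal-length known values with at least 2 points")
    x_bar = sum(known_xs) / n
    y_bar = sum(known_ys) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(known_xs, known_ys))
         / sum((xi - x_bar) ** 2 for xi in known_xs))
    a = y_bar - b * x_bar
    return a + b * x


sizes = [1000, 1500, 2000, 2500]  # hypothetical house sizes (sq ft)
prices = [200, 260, 340, 380]     # hypothetical prices (thousands of dollars)
print(f"{forecast(1200, prices, sizes):.1f}")  # 226.8
```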

17
Q

How do you calculate a prediction interval in Excel?

A

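First find the critical t-value:
=T.INV.2T(probability, degrees_freedom)

probability is 1 - confidence level, so for a 95% prediction interval, we would enter 0.05.

degrees_freedom is the number of degrees of freedom, in this case n - 2, where n is the sample size.

T.INV.2T returns the two-tailed critical value of the t-distribution, which sets how many standard errors the interval extends on either side of the point forecast.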
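A Python sketch of a prediction interval for a single new observation, using the standard textbook formula ŷ₀ ± t·s·√(1 + 1/n + (x₀ - x̄)² / Σ(x - x̄)²), where s = √(RSS / (n - 2)) is the standard error of the regression. Some treatments use the simpler approximation ŷ₀ ± t·s; the extra square-root term here widens the interval as x₀ moves away from x̄. `t_crit` is assumed to come from =T.INV.2T(0.05, n-2) or a t-table; 4.303 is the two-tailed 5% value for 2 degrees of freedom. The data and the helper name are hypothetical.

```python
import math


def prediction_interval(xs, ys, x0, t_crit):
    """Return (lower, point, upper) for a new observation at x0."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need equal-length data with at least 3 points (n - 2 df)")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    point = a + b * x0  # the point forecast is the center of the interval
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))  # standard error of the regression
    margin = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return point - margin, point, point + margin


sizes = [1000, 1500, 2000, 2500]  # hypothetical house sizes (sq ft)
prices = [200, 260, 340, 380]     # hypothetical prices (thousands of dollars)
# 4.303 is approx. T.INV.2T(0.05, 2): 95% two-tailed critical value for n - 2 = 2 df
low, point, high = prediction_interval(sizes, prices, 1200, 4.303)
print(f"{point:.1f} (95% PI: {low:.1f} to {high:.1f})")
```

Passing a larger critical value (a higher confidence level) makes the interval wider, which is the behavior described on the next card.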

18
Q

What happens to the prediction interval when the confidence level increases?

A

The width of the prediction interval increases, because a higher confidence level requires a larger critical t-value.

19
Q

Forecasting in Excel

A

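=SUMPRODUCT(array1, [array2], [array3], …)

SUMPRODUCT multiplies the corresponding entries of the arrays and returns the sum of those products. To forecast, multiply the regression coefficients by the corresponding values of the variables, pairing the intercept with a value of 1.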
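A Python sketch of the same calculation: `sumproduct` is a hypothetical helper that multiplies corresponding entries and sums the products, and the coefficients are made-up values for the intercept and slope. Pairing the intercept with 1 turns the sum of products into a + b·x.

```python
import math


def sumproduct(*arrays):
    """Multiply corresponding entries of equal-length arrays and sum the products."""
    if not arrays or len({len(a) for a in arrays}) != 1:
        raise ValueError("need one or more arrays of the same length")
    return sum(math.prod(values) for values in zip(*arrays))


coefficients = [78.0, 0.124]  # hypothetical intercept and slope
inputs = [1, 1200]            # 1 pairs with the intercept; 1200 sq ft pairs with the slope
print(round(sumproduct(coefficients, inputs), 1))  # 226.8
```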

20
Q

Residual Sum of Squares =

A

The Residual Sum of Squares is the amount of variation that is left unexplained by the regression line, that is, the sum of the squared differences between the predicted and observed values.

21
Q

Total Sum of Squares =

A

The Total Sum of Squares is the total variation in y, that is, the sum of the squared differences between the observed values of y and the mean of y. That is exactly what the graph shows. It is closely related to the variance of y: dividing the Total Sum of Squares by n - 1 gives the sample variance.

22
Q

What does R² measure?

A

R² measures how closely a regression line fits a data set. It is defined as the percentage of the total variation in the dependent variable that is explained by the regression line.

23
Q

Earlier in this module, we found that the correlation coefficient between house size and selling price is 0.86. What is the R² of the best fit line that describes the relationship between selling price and house size?

A

0.74
Remember that for a single variable linear regression, R² is the square of the correlation coefficient. Here, the correlation coefficient is 0.86, so R² = 0.86² ≈ 0.74.

24
Q

What are Quantitative (numerical) variables?

A

Variables that can be counted or measured and that are naturally represented as numbers.

25
Q

What are Qualitative (categorical) variables?

A

Variables that can be sorted or grouped into categories. Qualitative variables must be transformed into dummy variables before they can be used in a regression analysis.

26
Q

The regression output table is divided into three main parts:

A

The Regression Statistics table, the ANOVA table, and the Regression Coefficients table.

27
Q

AS X INCREASES, Y INCREASES

What Slope is it?

A

Positive Slope

28
Q

AS X INCREASES, Y DOES NOT CHANGE.

What Slope is it?

A

Zero Slope

29
Q

AS X INCREASES, Y DECREASES.

What Slope is it?

A

Negative Slope

30
Q

Given the general regression equation ŷ = a + bx, which of the following describes ŷ?

A

All three describe ŷ:
The expected value of y
The dependent variable
The value we are trying to predict

31
Q

When analyzing a residual plot, which of the following indicates that a linear model is a good fit?

A

Random spread of residuals around the x-axis

A linear model is a good fit if the residuals are spread randomly above and below the x-axis.

32
Q

What property of the 95% confidence interval for a regression line's slope indicates that the linear relationship is NOT significant at the 5% level?

A

The confidence interval contains zero. For example, the range between -9.85 and 5.26 contains zero, which indicates that the linear relationship is not significant at the 5% level.
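A Python sketch of the slope's confidence interval b ± t·SE(b), where SE(b) = s / √Σ(x - x̄)² and s = √(RSS / (n - 2)) is the standard error of the regression. As with the prediction interval, `t_crit` is assumed to come from =T.INV.2T(0.05, n-2); the data and the helper name are hypothetical. For a two-sided test, a 95% interval that excludes zero corresponds to a p-value below 0.05.

```python
import math


def slope_confidence_interval(xs, ys, t_crit):
    """Return (lower, upper) for the slope b +/- t_crit * SE(b)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need equal-length data with at least 3 points (n - 2 df)")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))  # standard error of the regression
    se_b = s / math.sqrt(sxx)     # standard error of the slope
    return b - t_crit * se_b, b + t_crit * se_b


# Hypothetical data; 4.303 is approx. T.INV.2T(0.05, 2) for n - 2 = 2 df
sizes, prices = [1000, 1500, 2000, 2500], [200, 260, 340, 380]
lo, hi = slope_confidence_interval(sizes, prices, 4.303)
print(f"slope 95% CI: {lo:.4f} to {hi:.4f}, contains zero: {lo <= 0 <= hi}")

# Made-up data with no linear trend: the interval contains zero
lo2, hi2 = slope_confidence_interval([1, 2, 3, 4], [5, 3, 6, 4], 4.303)
print(f"slope 95% CI: {lo2:.4f} to {hi2:.4f}, contains zero: {lo2 <= 0 <= hi2}")
```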