HBX- BA - 4 Flashcards

1
Q

Regression Analysis

A

One of the most powerful and commonly used statistical tools. Has two primary purposes:

  • Used to identify a mathematical relationship between a dependent variable and one or more independent variables.
  • Regression analysis can be used to forecast the behavior of the dependent variable and/or to better understand the nature of the relationship between the dependent and the independent variable(s).

What is the regression line??

  • the regression line is the line that minimizes the dispersion of points around that line, and we measure the accuracy of the regression line by measuring that dispersion.
  • (on the graph below) We attribute the difference between the actual data points and the values predicted by the regression line either to relationships between selling price and variables other than house size or to chance alone.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Linear Regression

A

A specific form of regression analysis that examines the linear relationship between a dependent variable and one or more independent variables. Linear regression analysis identifies the “best fit line,” the line that minimizes the sum of squared error terms between the observed values in the sample and the predicted values that lie on the regression line. This best-fit line is called the regression line.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Single variable linear regression

A

Single variable linear regression can be seen as an extension of hypothesis testing. We have learned to use hypothesis tests to determine whether or not there is a significant relationship between two variables. Single variable linear regression seeks to identify a linear relationship between two variables

A single variable regression line can be described by the equation:

ŷ = a + bx

  • ŷ is the expected value of y, the dependent variable, for a given value of x.
  • a is the y-intercept of the line the point at which the regression line intersects the vertical axis. This is the value of ŷ when the independent variable, x, is set equal to 0.
  • b is the slope, the average change in the dependent variable y as the independent variable x increases by one.
  • x is the independent variable, the variable we are using to help us predict or better understand the dependent variable.

Single Variable Linear Regression analysis is used to identify the best fit line between two variables. This analysis builds on two previous concepts we have used to study relationships between two variables:

  • Scatter plots, which are useful for visualizing a relationship between two variables.
  • The correlation coefficient, a value between -1 and 1 that measures the strength and direction (positive or negative) of the linear relationship between two variables.

As we learned in earlier in the course, we typically use Greek letters (like σ) to refer to the “true” parameters associated with a population and Latin letters (like s) to refer to the estimates of those parameters we calculate from sample data. Similarly, we refer to the best fit line we obtain from our sample data as ŷ =a+bx to distinguish it from ŷ =α+βx, the idealized equation that represents the “true” best fit line.

Because the best fit line does not perfectly fit even the population data, we add an error term, ε, to the true equation: y=α+βx+ε.
The error term is the difference between the actual value of y and the expected value of y. That is, ε=y−ŷ.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

REVIEW- How to:

  • Make a Scatter Plot in Excel
  • Find the Correlation Coefficient in Excel
A

Create a Scatter Plot in Excel

  • Insert -> Scatter -> Scatter With Only Markers
  • Input Y range (Ex: C1:C11)
  • Input X range (Ex: B1:B11)
  • Check the “Labels in First Box”

correlation coefficient
A measure of the strength of a linear relationship between two variables. The correlation coefficient can range from -1 to +1. A correlation coefficient of -1 indicates a perfect negative linear relationship between two variables, whereas a correlation coefficient of +1 indicates a perfect positive linear relationship. A correlation coefficient of 0 indicates that no linear relationship exists between two variables, though it is possible that a non-linear relationship exists between the two variables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q
A

250

Pick two points on the x-axis—let’s say 1,000 and 2,000—and see what the corresponding points are on the y-axis. According to the regression line, the expected selling price of a 1,000 square foot house is approximately $250,000, and for a 2,000 square foot house is approximately $500,000. Therefore, as house size increases by 1,000 square feet, price increases, on average, by approximately $250,000. To find the average change in price as house size increases by one square foot, we divide $250,000 by 1,000. We find that as house size increases by one square foot, price increases, on average, by approximately $250.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q
A

Option D

Since there is no obvious linear relationship between the variables, a line that is almost horizontal is most accurate. The line is positioned close to the average y-value.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

How to Add the Best Fit Line to a Scatter Plot

A

Create a Scatter Plot in Excel

  • Insert -> Scatter -> Scatter With Only Markers
    • Input Y range (Ex: C1:C11)
    • Input X range (Ex: B1:B11)
    • Check the “Labels in First Box”
  • Insert -> Chart Tools ->Layout ->Trendline
    • Check the Display Equation box to display the equation of the best fit line
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Given the regression equation,
Selling Price= 13,490.45 + 255.36(HouseSize),
which of the following values represents the average change in selling price as house size increases by one square foot?

A

255.36

255.36 dollars/square foot is the line’s slope, which is equal to the average change in selling price as house size increases by one square foot.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Given the regression equation, Selling Price = 13,490.45 + 255.36(HouseSize),
what value represents the value of HouseSizeHouseSize at which the regression line intersects the horizontal axis?

A

- 52.83 square feet

The regression line intersects the horizontal axis when Selling Price = $0, that is, when House Size = -52.83 square feet. 13,490.45+ 255.36*(-52.83)=$0.00 (actually, -52.82914, which rounds to -52.83).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Given the regression equation, SellingPrice=13,490.45+255.36(HouseSize), which of the following values represents the value of SellingPriceSellingPrice at which the regression line intersects the vertical axis?

A

$13,490.45

13,490.45 is the y-intercept, the value at which the regression line intersects the y-axis. This happens when House Size = 0, giving the equation: Selling Price = 13,490.45+255.36*0 = 13,490.45

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Given the general regression equation, ŷ =a+bx, which of the following describes ŷ? Select all that apply.

  • The expected value of y
  • The expected value of x
  • The independent variable
  • The dependent variable
  • The value we are trying to predict
  • The intercept
A
  • The expected value of y
  • The dependent variable
  • The value we are trying to predict
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

How to Forcast in Excel

A

Using the equation ŷ =a+bx

  • Plug in the numbers! (See photo below)

Use Excel’s FORECAST function: (we didn’t use this in the module, but it was mentioned)

=FORECAST(x, known_y’s, known_x’s)

  • x is the data point for which you want to predict a value.
  • known_y’s is the dependent array or range of data.
  • known_x’s is the independent array or range of data.
  • In order to use this function we must have the original data. This approach also gives us a point forecast, but does not provide other helpful values that Excel’s regression tool produces.

WHATEVER THE CASE, MAKE SURE THE PREDICTION IS WITHIN THE RANGE OF HISTORICAL DATA OR IT IS NOT A GOOD FORECAST.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Prediction Interval

A

Rather than predicting just a single point, we construct an interval, or range, around the point forecast.

A prediction interval is a range of values constructed around a point forecast. The center of the prediction interval is the point forecast, that is, the expected value of y for a specified value of x. The range of the interval extends above and below the point forecast. The width of the interval is based on the standard error of the regression and the desired level of confidence in the prediction.

  • The center of the prediction interval is the point forecast– in the case below, about $525,000. The standard error of the regression– in this case, about $151,000– is a reasonable but conservative estimate of the forecast’s standard deviation. The standard error of the regression is easily found in a regression output table.
  • we have to choose a level of confidence for our prediction interval. A 95% prediction interval would run about two standard deviations above and below the point forecast. To forecast the price of a 2,000-square-foot home, the 95% prediction interval would be about $525,000 plus or minus two times $151,000.
  • With this, we are able to say that we are 95% confident that the actual selling price will fall within the prediction interval.​
  • Since there is greater uncertainty when we forecast further from the mean of the independent variable, we can infer that the prediction interval should be wider as we move away from the average house size. So although the standard error is a reasonable estimate on which to base our range, the actual calculation is more complicated. As we move towards and then beyond the edges of the historical data, the width of the distribution around the point forecast increases. In this case, a 95% prediction interval for the selling price of a 7,000-square-foot home would be much wider than that for a 2,000-square-foot home.
    (4. 3.2)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

The best point forecast for the selling price of a 2,500 square foot house is the expected selling price of a 2,500 square foot home, approximately 13,490 + 255.36(2,500) = $652,000. Given that the standard error of the regression is about $151,000, which of the following would give the BEST estimate for the prediction interval for a 2,500 square foot home with approximately 95% confidence?

A

$652,000 ± 2($151,000)

A prediction interval is centered at a point forecast, in this case $652,000. The standard error of the regression is multiplied by 2 since we wish to estimate the prediction interval at the 95% confidence level. Note that we are using 2 to approximate the z-value for a 95% prediction interval. The actual z-value corresponding to 95% (for sufficiently large samples) is 1.96.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

The standard error of the regression is….

A

is a reasonable but conservative estimate of the forecast’s standard deviation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How would the width of the actual prediction interval (at a 95% confidence level) for a 3,000 square foot home differ from the width of the actual prediction interval (at a 95% confidence level) for a 2,000 square foot home, given that the average home size is approximately 1,750 square feet?

A

The width of the actual prediction interval for a 3,000 square foot home would be larger than the width of the prediction interval for a 2,000 square foot home.

Because 3,000 square feet is further from the mean house size (1,750 square feet) than 2,000 square feet, the actual prediction interval at 3,000 square feet will be wider.

The width of the actual prediction interval is based on both the standard error of the regression and the distance from the mean; the actual prediction interval gets wider as the value of the independent variable moves further from the mean of the independent variable.

The image below compares prediction intervals created using both of the methods we have discussed. The red dashed lines show the actual prediction intervals for different house sizes. The blue dashed lines represent our method of estimating the prediction interval using the standard error of the regression. Note that the actual prediction intervals widen as house size moves further from the mean whereas the estimate prediction intervals, do not. They are parallel to the regression line.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q
A

Yes

3,500 lies well within the range of our historical housing data, so we can feel relatively comfortable with this prediction.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Given the regression equation, SellingPrice = 13,490.45 + 255.36(HouseSize), what do you expect the selling price of a 425 square foot home to be?

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What is the vertical distance between a data point and the line?

A

The Residual Error.
This error is the difference between the observed value and the line’s prediction for the dependent variable. This difference may be due to other factors that influence selling price or just a plain chance. Collectively, the residuals for all the data points measure how accurately a line fits a data set.

Variation unexplained by the regression line!

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

The Sum of Squared Errors, or the Residual Sum of Squares

A

The amount of variation that is not explained by the regression line. The residual sum of squares is equal to the sum of the squared residuals, that is, the sum of the squared differences between the observed values of the dependent variable and the predicted values of the dependent variable. To calculate the residual sum of squares, subtract the regression sum of squares from the total sum of squares.

For this, we take the square of each distance and then add all of those squared terms together.

A regression line is formally defined as the line that minimizes the sum of squared errors.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Total Sum of Squares

A

This is basically the sum of squared errors for the mean price line (the straight line in the graph below)

Having this mean price lines gives us a benchmark telling us how much more we know about our study if we have data about than if we do not.

Formal Definition: The variance of the dependent variable, that is, the sum of squared differences between the observed values of the dependent variable and the mean of the dependent variable. The total sum of squares is equal to the regression sum of squares plus the residual sum of squares.

22
Q

Regression Sum of Squares

A

regression sum of squares

The amount of variation that is explained by the regression line. To calculate the regression sum of squares, subtract the residual sum of squares from the total sum of squares.

23
Q
A

Residual Sum of Squares

The Residual Sum of Squares is the amount of variation that is left unexplained by the regression line, that is, the sum of the squared differences between the predicted and observed values. That is exactly what this graph shows.

24
Q
A

Total Sum of Squares

The Total Sum of Squares is the variance of y, that is, the total variation in y. The Total Sum of Squares equals the sum of the squared differences between the observed values of y and the mean of y. That is exactly what the graph shows.

25
Q
A

R-squared) measures how closely a regression line fits a data set. It is a standardized measure of the regression line’s explanatory power. It is defined as the percentage of total variation in the dependent variable, y, that is explained by the regression line.

In single variable linear regression, i.e. a regression model that has only one independent variable, is the square of the correlation coefficient between the independent and dependent variables.

R2 = (Correlation Coefficient)2

Equivalently, the correlation coefficient is the positive or negative square root of R2. The sign is determined by whether there is a positive or negative relationship between the two variables.

Correlation Coefficient (R) = ±√R2

EXCEL: Adding R-squared to a Scatter Plot

  • We can quickly find the for a single variable linear regression by creating a scatter plot of the two variables.
  • When we select Trendline and check the Display Equation box to display the equation of the best fit line, we can also check the Display R-squared Value box to display the value on the scatter plot.

R-squared can only take on values between 0 and 1. Let’s look at those two extremes.

  • When R-squared equals 0, the regression line explains none of the variation in the dependent variable.
  • When R-squared equals 1, the line explains all of the variation in the dependent variable. The regression line fits the data perfectly.

As always, it’s critical that we consider the problem we’re trying to solve and its context before performing or evaluating a regression analysis.

  • Although we may wish for an R-squared close to one, there are many contexts for which this is not realistic.*
  • LOW R-squared values are expected and accepted sometimes! KNOW YOUR FIELD/QUESTION*
  • In fields such as human behavior, lower R-squared values are both expected and accepted because human behavior is difficult to predict. Suppose we wish to know how well a person’s tenth grade reading speed predicts his or her lifetime salary. There may be a relationship between the two variables, but we would not expect reading speed to be a very good predictor of salary. An R-squared of 0.1 might be considered high in this case.*
26
Q

Another Way to Calculate R2

A
27
Q

Earlier in this module, we found that the correlation coefficient between house size and selling price is 0.86. What is the R2 of the best fit line that describes the relationship between selling price and house size?

A

0.74

Remember that for a single variable linear regression, R2 is the square of the correlation coefficient. Here, the correlation coefficient is 0.86, so R2=0.862=0.74.

28
Q
A

0.80

The independent variable explains a lot of the variation in the dependent variable, but not quite all of it. In total, the data points are close to the best fit line, but they do not lie on it. Thus, an R2 of 0.80 seems like a good estimate.

29
Q

the p-value

A

the likelihood that we would select a sample at least as extreme as the one we observed if the null hypothesis were true

30
Q

What are two ways to test whether the slope of the best fit line equals zero?

And why would we need to do this?

A

We need to do this because: if the 95% confidence interval for the slope does not include zero, we can be 95% confident that the true value of the slope is not zero and thus that a significant relationship exists between the variables.

  1. Check whether the confidence interval for the line’s slope contains zero
    Remember that the coefficients of the regression line are just estimates of the true linear relationship between the dependent and independent variables. A coefficient’s lower 95% and upper 95% values give us the lower and upper bounds of the 95% confidence interval for that coefficient. Recall that if the best fit regression line has a slope of zero, then the regression line is just a flat line equal to the mean of the dependent variable, indicating that that there is no linear relationship between the two variables. Thus, if the 95% confidence interval for the slope does not include zero, we can be 95% confident that the true value of the slope is not zero and thus that a significant relationship exists between the variables.
    1. In the photo below, we can say we are 95% confident that the true slope of the regression line describing the relationship between selling price and house size is between 196.10 and 314.63. Because this range does not include the value zero, we can be 95% confident that there is a significant linear relationship between the variables.
  2. Check whether the p-value is greater than or equal to 0.05
    As we noted earlier, regression analysis builds on hypothesis testing. In fact, a single variable linear regression analysis is equivalent to the hypothesis test,
    Recall that the p-value for a hypothesis test is the likelihood that we would select a sample at least as extreme as the one we observed if the null hypothesis were true. The p-value associated with a regression coefficient is the likelihood of choosing a sample at least as extreme as the sample we used to derive the regression equation if the slope of the true regression line is actually zero, or equivalently, if there is no linear relationship between the two variables.
    In the photo below, Since the p-value for house size, 0.0000, is less than 0.05, we reject the null hypothesis that the slope is zero and can be confident that there is a significant linear relationship between selling price and house size. (We can ignore the p-value of the intercept coefficient because the y-intercept is just a constant. It does not represent an independent variable and thus provides no information about the significance of the relationship between two variables.)
    Recall that a significance level of 5% corresponds to a confidence level of 95%, so checking whether a regression coefficient’s p-value is less than 5% is equivalent to checking whether the coefficient’s 95% confidence interval contains zero. Both approaches test whether or not we can be 95% confident that that there is significant linear relationship between the variables.
31
Q

Which of the following 95% confidence intervals for a regression line’s slope indicates that the linear relationship is not significant at the 5% level? Select all that apply.

  • -11.89; -2.17
  • 25.11; 44.37
  • -20.00; 5.00
  • -0.36; 0.55
A
  • -11.89; -2.17
    Remember that the 95% confidence interval of the slope must contain zero to indicate that the linear relationship is not significant at the 5% level. -11.89 and -2.17 are both negative, so this range does not contain zero.
  • 25.11; 44.37
    Remember that the 95% confidence interval of the slope must contain zero to indicate that the linear relationship is not significant at the 5% level. 25.11 and 44.37 are both positive, so this range does not contain zero.
  • -20.00; 5.00
    The range between -20.00 and 5.00 contains zero, which indicates that the linear relationship is not significant at the 5% level. Note that another option is also correct.
  • -0.36; 0.55
    The range between -0.36 and 0.55 contains zero, which indicates that the linear relationship is not significant at the 5% level. Note that another option is also correct.
32
Q
A

Yes

Since the p-value of the independent variable, 0.0000, is less than 0.05, we can be 95% confident that there is a significant linear relationship between gross box office and home video units. We could also note that (19.58; 22.95), the 95% confidence interval for the slope, does not contain zero.

33
Q
A

Close to 1

The data points are very close to the line, so our regression line must explain a very large proportion of the variation in yy. This would indicate a very high R2.

34
Q
A

Less than 0.05

A p-value less than 0.05 indicates that we can be 95% confident that the true slope is not zero, that is, that there is a significant linear relationship between the two variables. This graph provides strong evidence that there is a significant linear relationship between the two variables.

35
Q
A

Smaller

A smaller R2 means that less variation in the dependent variable yy is explained by the regression line. Compared with the previous graph, the data points here are more dispersed around the regression line, indicating that less of the variation is explained by the regression line.

36
Q
A

Less than 0.05

Even though the regression line has a smaller R2 than the previous regression, there is still clearly a strong linear relationship between the variables. A p-value less than 0.05 indicates that we can be 95% confident that the true slope is not zero, i.e., that there is a significant linear relationship.

37
Q

What would you say about the p-values and R2 values of the graphs below.

A

The 1st graph has:

  • High R2 (0.99): A large portion of the variation in y is explained by the regression line.
  • Low p-value (0.0000): There is a significant linear relationship between the dependent and independent variables.

The 2nd graph has:

  • Lower R2 (0.70): A smaller portion of the variation in y is explained by the regression line than in the previous graph.
  • Low p-value (0.0000): There is a significant linear relationship between the dependent and independent variables, even though the R2 is lower than in the previous graph.
38
Q

Residual Plots & How to create them

A

The residual plot is a scatter plot with residuals on the y-axis and the independent variable on the x-axis. The plot graphically represents the residual (the difference between the observed value and predicted value of the dependent variable) for each observation. Examining residual plots can provide significant insight into the relationships among variables and the validity of the assumptions underlying regression models.

How to Create One:

  • We first take each observed data point and measure its residual– the vertical distance from that point to the regression line.
  • Then we graph each residual against the independent variable to form the residual plot.
    • If there is a linear relationship between the dependent and independent variables and the assumptions underlying regression analysis hold, we should not see any systematic pattern in the residual plot. The residual should be spread randomly above and below the horizontal axis. Specifically, based on the assumptions underlying linear regression, the distribution of the residual should follow a normal distribution with mean zero and a fixed variance.
    • If we do see a pattern in the residual plot, then it’s possible that other factors may be influencing the dependent variable or that the linear model may not be the best fit for the data. For example, if the residuals appear to have a curved shape, there may be a nonlinear relationship between the dependent and independent variables.
    • If the residuals become larger as we move along the x-axis, we may be encountering a phenomenon known as heteroscedasticity. In heteroscedastic relationships, the variance changes systematically as the independent variable changes. This violates the assumption that the error terms follow a normal distribution with fixed variance.
39
Q
A
40
Q

Heteroskedasticity

A

A characteristic of the distribution of the residuals (error terms) in a regression. The error terms are heteroskedastic if the size of the error terms depends systematically upon the value(s) of the independent variable(s). Examining residual plots for patterns is useful for identifying heteroskedasticity (for example, if the error terms grow larger as the value of the independent variable grows larger, a classic funnel shape may be visible in the residual plot). Inferences drawn from a regression analysis with heteroskedastic error terms are suspect.

41
Q

What several metrics are important to evaluate in order to determine whether a single variable linear regression model is a good fit for a data set??

And why do we need to do this?

A
  • R2 - measures the percent of total variation in the dependent variable, y, that is explained by the regression line.
  • Analyze the p-value - we must test whether the relationship between the dependent and independent variable is significant and whether the linear model is a good fit for the data.
    • Note that the p-value and R2 provide different information. A linear relationship can be significant (have a low p-value) but not explain a large percentage of the variation (not have a high .)
  • Check the confidence intervals associated with an independent variable’s coefficient indicates the likely range for that coefficient.
    If the 95% confidence interval does not contain zero, we can be 95% confident that there is a significant linear relationship between the variables.
  • Residual plots can provide insights into whether a linear model is a good fit.
42
Q

A p-value to test the significance of a linear relationship between two variables was calculated to be 0.0210. What can we conclude? Select all that apply.

  • We can be 90% confident that there is a significant linear relationship between the two variables.
  • We can be 95% confident that there is a significant linear relationship between the two variables.
  • We can be 98% confident that there is a significant linear relationship between the two variables.
  • We can be 99% confident that there is a significant linear relationship between the two variables.
A
  • We can be 90% confident that there is a significant linear relationship between the two variables.
    • Since the p-value, 0.0210, is less than 1-0.90=0.10, we can be 90% confident that there is a significant linear relationship between the two variables. Note another option is also correct.
  • We can be 95% confident that there is a significant linear relationship between the two variables.
    • Since the p-value, 0.0210, is less than 1-0.95=0.05, we can be 95% confident that there is a significant linear relationship between the two variables. Note another option is also correct.
  • We can be 98% confident that there is a significant linear relationship between the two variables.
    • Since the p-value, 0.0210, is greater than 1-0.98=0.02, we cannot be 98% confident that there is a significant linear relationship between the two variables.
  • We can be 99% confident that there is a significant linear relationship between the two variables.
    • Since the p-value, 0.0210, is greater than 1-0.99=0.01, we cannot be 99% confident that there is a significant linear relationship between the two variables.
43
Q

The linear relationship between two variables can be statistically significant but not explain a large percentage of the variation between the two variables. This would correspond to which pair of R^2 and p-value?

  • Low R-squared, Low p-value
  • Low R-squared, High p-value
  • High R-squared, Low p-value
  • High R-squared, High p-value
A

Low R-squared, Low p-value

A low R-squared and low p-value indicates that the independent variable explains little variation in the dependent variable and the linear relationship between the two variables is significant.

44
Q

How to Perform a Regression Analysis in Excel (also- with dummy variables)

A
  • From the Data menu, select Data Analysis -> Regression.
  • Enter the appropriate Input Y Range and Input X Range:
    • The Input Y Range is the dependent variable, in this case selling price. The data are in column C with its label, C1:C31.
    • The Input X Range is the independent variable, in this case house size. To ensure the independent variable is labeled correctly in the output table, enter the data with its label in column B, B1:B31.
  • Check the Labels box.
  • Check the Residuals and Residual Plots boxes to ensure we see the relevant residual information.

*With Dummy Variables, you make the exact same steps, just make sure that the equation is properly in the table for the dummy variable before you enter it into the analysis.

*We interpret a dummy variable’s coefficient in the same way we interpret a coefficient for a quantitative independent variable.

*The regression analysis gives us more information than the hypothesis test alone would. Rather than simply calculating the p-value, rejecting the null hypothesis and concluding that there is a significant linear relationship, the regression results provide the direction and magnitude of this relationship.

45
Q
A

The expected selling price of homes in school districts where students have low SAT scores is B15+B16*0=B15=$389,376. You must link directly to the values in order to obtain the correct answer.

46
Q
A

The average selling price of homes, given they are located in school districts where students have low SAT scores can be calculated as AVERAGEIF(B2:B31,0,C2:C31)=$389,376.

47
Q
A

The expected selling price of homes in school districts where students have average SAT scores above 1700 is B15+B16*1=B15+B16=$809,100. You must link directly to the values in order to obtain the correct answer.

48
Q

What are the three main parts to the regression output table?

A

THREE MAIN PARTS TO THE TABLE

  • the Regression Statistics table,
  • the ANOVA table, and
  • the Regression Coefficients table.

*Review more tables

49
Q
A

“Stock of Corn at Start of Year” is the independent variable, and “Corn Acreage Planted” is the dependent variable. The beginning stock of corn at the start of the year will be used to predict the number of acres of corn that are planted.

50
Q
A

80.36%

R2 is the amount of variation in home video units that is explained by this model. 80.36% of the variation in home video units can be explained by the relationship with gross box office sales.

51
Q
A

The expected number of home video units that will be sold is B15+B16*360=7,074 thousand. You must link directly to the values in order to obtain the correct answer.