Ch. 9 Regression Flashcards

1
Q

What is Simple regression?

A

The study of the linear relationship between two variables. Regression is about characterizing linear relationships.

2
Q

What do you use the independent variable for in Regression?

A

We're going to use it to predict something. X is the predictor variable, or independent variable.

3
Q

What do you use the Dependent variable for in Regression?

A

Y is the target variable. We use the independent variables to predict Y.

4
Q

Which variable is plotted on the vertical axis in a scatter diagram?

A

The dependent variable is plotted on the Y-axis (vertical axis)

5
Q

What is the range of the correlation coefficient, and what symbol represents it?

A

The symbol is r, and it ranges from -1 to 1.

6
Q

If there is Perfect negative correlation - what is r?

A

r = -1

7
Q

If there is Perfect positive correlation - what is r?

A

r = 1

8
Q

No linear relationship - what is r?

A

r = 0

9
Q

Positive Correlation example

A

As the floor space of a home goes up, so does the price of the home.

As X goes up Y goes up

10
Q

Negative Correlation example

A

As interest rates tend to go up, new housing starts tend to go down because homes are more expensive.

As X goes up Y goes down

11
Q

What is the sign of the slope when the correlation coefficient (r) is positive?

A

Positive

12
Q

What does a correlation coefficient of -0.2 suggest?

A

Weak negative association

13
Q

Which Greek symbol represents the population correlation coefficient?

A

ρ (rho)

Note: the value .05 is typical for the significance level (alpha), not for ρ.

14
Q

The slope and the correlation coefficient should have the same sign. True or false?

A

True: the slope has the same sign (+ or -) as the correlation coefficient.

15
Q

What is the regression Model formula?

A

Y = b0 + b1 * X + e

Y, dependent variable
b0, Y-intercept: the value of Y when X equals zero (population)
b1, slope of the regression line (population)
X, independent variable
e, error term (difference between actual and estimated Y)

The estimated value drops the error term: Yr = b0 + b1 * X

16
Q

What is the value of the residual for X = 5 and Y = -19 given regression equation Yr = -1 -3 * X?

A
Yr = -1 - 3 * 5 = -16
residual = Y - Yr = -19 - (-16) = -3
17
Q

What is the value of the residual for X = 5 and Y = -20, given the regression equation Yr = -2 - 3 * X?

A
Yr = -2 - 3 * 5 = -17
residual = Y - Yr = -20 - (-17) = -3
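
A minimal Python sketch of the residual arithmetic on this card, assuming the fitted line Yr = -2 - 3 * X. The function names `predict` and `residual` are illustrative only and not part of the course material.

```python
def predict(x, b0=-2.0, b1=-3.0):
    """Estimated value Yr = b0 + b1 * X (defaults match this card's line)."""
    return b0 + b1 * x


def residual(y_actual, x, b0=-2.0, b1=-3.0):
    """Residual = actual Y minus estimated Yr."""
    return y_actual - predict(x, b0, b1)


print(predict(5))        # -17.0
print(residual(-20, 5))  # -3.0
```

Passing a different intercept and slope reuses the same two steps for any other fitted line.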
18
Q

What is the symbol for the coefficient of determination, and what does it do in regression?

A

R2 (R-squared)
R2 ranges from 0 to 1.
It is a very simple statistic for judging how well the model performs.
The larger the R2, the better the fit.

19
Q

Coefficient of Determination (R2) Formula

What is the value of the coefficient of determination if SSE = 100 and SST =1000?

A

COD = (SST-SSE)/SST = (1000-100)/1000 = 0.90
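
The same formula as a short Python sketch. The function name `r_squared` is an assumption made for illustration; the inputs are the SSE and SST values from the question.

```python
def r_squared(sse, sst):
    """R2 = (SST - SSE) / SST, the share of total variation explained."""
    if sst <= 0:
        raise ValueError("SST must be positive")
    return (sst - sse) / sst


print(r_squared(100, 1000))  # 0.9
```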

20
Q

Explain Statistical Significance?

A

The significance level (alpha) is a threshold you can adjust based on how much uncertainty you are willing to accept in your model.
An alpha of .05 corresponds to a 95% confidence level.

Statistical significance is a determination that a relationship between two or more variables is caused by something other than chance.

Generally, a p-value of 5% or lower is considered statistically significant.

21
Q

What is the approximate confidence level that corresponds to a critical t value of 2?

A

According to the empirical rule, about 95% of the area under the normal curve is contained within ±2 standard deviations of the mean.
Answer is 95%
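
A quick check of the empirical-rule figure using the standard normal distribution from Python's standard library. This sketch assumes the normal curve is an acceptable stand-in for the t distribution, which holds only approximately and improves as the sample size grows.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1
area = z.cdf(2) - z.cdf(-2)  # area within 2 standard deviations of the mean
print(round(area, 4))  # 0.9545
```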

22
Q

Develop a specific forecast based on a given value of X

example: Yr = -7.8 + 0.81 * X (assets)

Let X (assets) = 50 and alpha = 0.05

A
example: Yr = -7.8 + 0.81 * X (assets)
Let X (assets) = 50 and alpha = 0.05
All I'm going to do is plug the 50 into the X symbol,
multiply it by the slope, .81,
and add the Y-intercept of -7.8 (that is, subtract 7.8),
generating a debt level of $32.7.

Yr = 0.81 * 50 + (-7.8) = 40.5 - 7.8 = 32.7
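
The point forecast can be sketched in Python as follows. The function name `forecast_debt` is hypothetical. The alpha of 0.05 given in the question would only matter for an interval around the forecast, which this sketch does not compute.

```python
def forecast_debt(assets, intercept=-7.8, slope=0.81):
    """Point forecast Yr = intercept + slope * X for the card's example line."""
    return intercept + slope * assets


print(round(forecast_debt(50), 2))  # 32.7
```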

23
Q

Bivariate regression analysis

- definition

A

The process of developing a statistically based linear model between two variables

24
Q

Coincident indicators

- definition

A

Metrics like GDP that move along with the general economy

25
Q

Coefficient of determination (R-square)

- definition

A

A numerical measure of the amount of variance explained by the regression model

26
Q

Correlation analysis

- definition

A

The process of determining the extent of the relationship between two variables

27
Q
Correlation coefficient (r)
- definition
A

The degree of linear association between two variables (also known as Pearson's r). The value of the correlation coefficient is between -1 and 1.
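
A minimal sketch of how Pearson's r can be computed from paired data, using only the standard library. The data below are made up to show the two extremes, and the function name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r: co-variation of X and Y scaled by their individual variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up data: one series rises with X, the other falls as X rises.
x_values = [1.0, 2.0, 3.0, 4.0]
rises_with_x = [10.0, 20.0, 30.0, 40.0]
falls_with_x = [40.0, 30.0, 20.0, 10.0]

print(pearson_r(x_values, rises_with_x))  # 1.0
print(pearson_r(x_values, falls_with_x))  # -1.0
```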

28
Q

Correlation matrix

- definition

A

A table that reports the correlation coefficients between the various sets of variables

29
Q
Dependent variable (Y)
- definition
A

The variable that is being explained or predicted

30
Q

Dummy variable

- definition

A

A variable that is limited to a value of either 0 or 1

31
Q
Independent variable (X)
- definition
A

The variable used to explain the variability in, or to predict the values of, the dependent variable

32
Q

Intercept (b0)

- definition

A

The estimated value of Y when X is 0

33
Q

Least squares method

- definition

A

A procedure for developing estimates for the regression model coefficients based on minimizing the sum of the squares of the residuals
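
For the two-variable case, the least squares estimates have a closed form: b1 = Sxy / Sxx and b0 = mean(Y) - b1 * mean(X). The sketch below applies it to made-up points that lie exactly on the line Yr = -2 - 3 * X; the function name `fit_line` is an assumption.

```python
def fit_line(xs, ys):
    """Least squares intercept (b0) and slope (b1) for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx                # slope
    b0 = mean_y - b1 * mean_x     # intercept
    return b0, b1


# Made-up points on the line Yr = -2 - 3 * X, so the fit recovers it exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [-5.0, -8.0, -11.0, -14.0]
b0, b1 = fit_line(xs, ys)
print(b0, b1)  # -2.0 -3.0
```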

34
Q

Multicollinearity

- definition

A

A condition in which two or more of the predictor variables are highly correlated with each other

35
Q

Multiple regression analysis

- definition

A

The process of developing a statistically based linear model between a single dependent variable and multiple independent variables

36
Q

p-value

- definition

A

The probability of obtaining a sample statistic, e.g., slope (b1), at least as extreme as the one observed if the null hypothesis is true; small p-values tend to support rejecting the null hypothesis

37
Q

Residual

- definition

A

The difference between the actual values of the dependent variable and its predicted values from the regression model

38
Q

Residual plot

- definition

A

A graph that shows a plot of the error terms (residuals) versus the values of the independent variable

39
Q

Scatter diagram

- definition

A

A plot of the independent (horizontal axis) versus dependent (vertical axis) variable values

40
Q

Simple regression analysis

- definition

A

The process of developing a statistically based linear model between two variables

41
Q

Simple regression line
- definition

What is the formula

A

An equation in the form Y = b0 + b1 * X

42
Q

Slope (b1)

- definition

A

The amount of change in Y for a one-unit change (increase or decrease) in X

43
Q

Standard error of the estimate

- definition

A

An estimate of the standard deviation of the dependent variable (Y) for any given value of the independent variable (X)
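
One common way to compute this statistic is sqrt(SSE / (n - k - 1)), where k is the number of predictors, so with one predictor the divisor is n - 2. The sketch below assumes that formula; the actual and fitted values are hypothetical and the function name is illustrative.

```python
from math import sqrt


def standard_error_of_estimate(actual, predicted, n_predictors=1):
    """sqrt(SSE / (n - k - 1)); with one predictor this is sqrt(SSE / (n - 2))."""
    n = len(actual)
    dof = n - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than estimated coefficients")
    sse = sum((y - yr) ** 2 for y, yr in zip(actual, predicted))
    return sqrt(sse / dof)


# Hypothetical actual and fitted values, for illustration only.
y_actual = [3.0, 5.0, 7.0, 10.0]
y_fitted = [3.5, 5.0, 7.5, 9.0]
print(round(standard_error_of_estimate(y_actual, y_fitted), 3))  # 0.866
```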

44
Q

Step-wise solution process

- definition

A

A methodology for selecting only statistically significant variables when taken in combination

45
Q

t-value

- definition

A

Typically, two t-values are used in analyzing the statistical significance of the regression model: one is the computed t and the other is the critical t

46
Q

What does the regression sum of squares (RSS) represent?

A

The amount of variation in Y that is explained by changes in X

47
Q

The standard error

A

measures the amount of scatter, or variation, in the actual data around the fitted regression function.

48
Q

Multiple regression analysis

A

deals with the relationship between two or more predictor variables and a target variable (response or dependent)

49
Q

examples where we can use multiple regression:

A
FICO credit score
Insurance rates
Bankruptcy potential
University admissions
Medical diagnostics
Sports performance
50
Q

What are the four multiple regression performance metrics?

A
  • Correlation coefficients - give the nature and extent of the relationship between x and y
  • R-squared - gives us an indicator of how well we have done in explaining the variability
  • t- and p-values - give us a measure of statistical significance
  • Betas - tell the relative impact of each of the variables on y
51
Q

Correlation coefficients

A

gives the nature and extent of the relationship between x and y

52
Q

R-squared

A

gives us an indicator of how well we have done in explaining the variability
Also called the Coefficient of Determination

53
Q

t- and p-values

A

give us a metric for rating the statistical significance of independent variables

54
Q

Betas

A

tells the relative impact of each of the variables on y

55
Q

quiz: What is the value for the dependent variable when Yr = 35 + 2 * X1 – 3 * X2 where X1 = 4 and X2 = 11

A

Answer: 10
Explanation: Substituting X1 = 4, X2 = 11 into the regression equation yields: Yr = 35 + 2 * X1 - 3 * X2 = 35 + 2 * 4 - 3 * 11 = 35 + 8 - 33 = 10
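
The substitution can be checked with a short Python sketch. The function name `predict` is illustrative; the coefficients are the ones given in the question.

```python
def predict(x1, x2):
    """Yr = 35 + 2 * X1 - 3 * X2, the equation from this quiz card."""
    return 35 + 2 * x1 - 3 * x2


print(predict(4, 11))  # 10
```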

56
Q

quiz: What is the standard method that is used to generate the regression model?

A

Answer: Ordinary least squares
Explanation: Ordinary least squares (OLS) is a procedure for estimating the intercept and slope coefficients in a linear regression model. This method minimizes the sum of squared errors between the actual Y values and the estimated Y values generated from the regression model.

57
Q

quiz: What is the standard procedure used to select only statistically significant variables for the regression model?

A

Answer: Stepwise
Explanation: The stepwise procedure is used for selecting predictor variables to enter (forward) the regression model or to be removed (backward) from the regression model.

58
Q

quiz: How many model combinations are possible given three predictor variables?

A

Answer: 7
Explanation: There are a total of seven different variable combinations that are possible as follows (A, B, C, AB, AC, BC, ABC).
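
The count can be reproduced by enumerating every non-empty subset of the predictors, which for k predictors gives 2^k - 1 models. A minimal sketch using the labels A, B, C from the explanation:

```python
from itertools import combinations

predictors = ["A", "B", "C"]

# Every non-empty subset of the predictors is one candidate model.
models = [
    combo
    for size in range(1, len(predictors) + 1)
    for combo in combinations(predictors, size)
]

print(len(models))                   # 7
print(["".join(m) for m in models])  # ['A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']
```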

59
Q

quiz: How does the difference between R-square and the adjusted R-square change as the sample size increases?

A

Answer: Decreases
Explanation: The adjusted coefficient of determination is the proportion of total variance in the dependent variable explained by the independent variables adjusted for the sample size and the number of predictor variables. As the sample size increases the difference between R-square and the adjusted R-square decreases.
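
A sketch of why the gap shrinks, assuming the usual adjustment formula, adjusted R-square = 1 - (1 - R-square) * (n - 1) / (n - k - 1), with n observations and k predictors. The R-square of 0.90 and k = 3 are made-up inputs.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square for n observations and k predictor variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2, k = 0.90, 3  # hypothetical values for illustration
gaps = []
for n in (10, 100, 1000):
    gap = r2 - adjusted_r2(r2, n, k)
    gaps.append(gap)
    print(n, round(gap, 4))
```

With these inputs the printed gap falls from about 0.05 at n = 10 toward zero as n grows.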

60
Q

quiz: What does the coefficient of determination measure?

A

Answer: Measures the proportion of variation in Y explained by the predictor variables.
Explanation: Coefficient of determination (R-square) reports the proportion of total variance in the dependent variable explained by the set of independent variables.

61
Q

quiz: What is the appropriate decision when the p-value is less than alpha?

A

Answer: Reject Ho
Explanation: A p-value is the probability of obtaining a test statistic (e.g., t statistic) at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. Typically, the null hypothesis is rejected if the p-value is less than the stated alpha.

62
Q

quiz: Which analysis procedure is based on removing predictor variables one at a time from the regression model?

A

Answer: Backward stepwise
Explanation: The backward stepwise procedure starts with all of the selected candidate variables in the regression model. Typically, the variable with the largest p-value is removed first and the process continues until the p-values of the remaining variables are less than the stated alpha.

63
Q

quiz: Explain beta coefficient? What is it also known as?

A

A standardized beta coefficient compares the strength of the effect of each individual independent variable on the dependent variable.

They are also known as Standardized slopes

Explanation: Beta coefficients (also known as standardized slopes) are regression model coefficients that have been standardized resulting in a variance of one. This procedure is done so that one can identify the relative impact of changes in each predictor variable on the target variable when each of the predictor variables are measured in different units (e.g., dollars, age, gender).
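
One common way to obtain a standardized slope from an ordinary slope is to rescale it by the ratio of standard deviations: beta = b * s_x / s_y. The sketch below assumes that conversion. The data and the two slopes are hypothetical (they are not fitted values), and the function name is illustrative.

```python
from statistics import stdev


def standardized_beta(b, x_values, y_values):
    """Rescale a raw slope b into standard-deviation units: b * s_x / s_y."""
    return b * stdev(x_values) / stdev(y_values)


# Hypothetical predictors measured in different units (dollars, years).
income = [30000.0, 45000.0, 60000.0, 75000.0]
age = [25.0, 35.0, 45.0, 55.0]
target = [12.0, 20.0, 25.0, 33.0]

# Hypothetical raw slopes, chosen only to show the rescaling.
print(standardized_beta(0.0004, income, target))
print(standardized_beta(0.1, age, target))
```

The raw slopes differ by orders of magnitude because of the units, while the standardized values are directly comparable.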

64
Q

quiz: What is the difference between Y and Yr called?

A

Answer: Residual
Explanation: The difference between the actual Y value and the Yr value generated by the regression model is called the residual or error term.

65
Q

Explain the p-value.

A

A measure of the probability that an observed difference could have occurred just by random chance

It assesses only one coefficient at a time

66
Q

the lower the p-value then ______ .(complete the sentence)

A

the greater the statistical significance of the observed difference

Essentially, the lower the p-value, the more you can rely on the coefficient being tested.

67
Q

What does The F-test do?

A

Compares the fits of different linear models
Can assess multiple coefficients simultaneously

68
Q

Why do we need to use hypothesis tests in statistics?

A

A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data.