Ch. 9 Regression Flashcards
What is Simple regression?
is the study about the linear relationship between two variables. Regression is about characterizing linear relationships
What do you use the independent variable for in Regression?
We’regoingtouseittopredictsomething Xis the predictorvariable or independent variable
What do you use the Dependent variable for in Regression?
Ybeingthetargetvariable. we use the independent variables to predict Y
Which variable is plotted on the vertical axis in a scatter diagram?
The dependent variable is plotted on the Y-axis (vertical axis)
What is the Correlation Coefficient range and what sign represents it?
(r) ranges from -1 to 1
If there is Perfect negative correlation - what is r?
r = -1
If there is Perfect positive correlation - what is r?
r = 1
No linear relationship - what is r?
r = 0
Positive Correlation example
floorspace of homegoesup,sodoesthepriceofthehome.
As X goes up Y goes up
Negative Correlation example
interestratestendtogoup,newhousingstartstendtogo
downbecausehomesaremoreexpensive.
As X goes down Y goes down
What is the sign of the slope when the correlation coefficient (r) is positive?
Positive
What does a correlation coefficient of -0.2 suggest?
Weak negative association
Which Greek symbol represents the population correlation coefficient?
ρ
typically is .05
The slope and correlation coefficient should have the same sign? True or Flase?
True - Same sign (+ or -) as the correlation coefficient
What is the regression Model formula?
Yr = b0 + b1 * X
Y, dependent variable
bo, Y-intercept when X equals zero (population)
b1, slope for regression line (population)
X, independent variable
e, error term (difference between actual and estimated Y)
What is the value of the residual for X = 5 and Y = -19 given regression equation Yr = -1 -3 * X?
Yr = -1 -3 *5 = -16, residual = Y – Yr = -20 – (-17) = -3!
What is the value of the residual for X = 5 and Y = -20, given the regression equation Yr = -2 - 3
Yr = -2 -3 *5 = -17 residual = Y – Yr = -20 – (-17) = -3!
What is the sign for the Coefficient of Determination? And what does it do in regression?
(R2)
R2 ranges from 0 to 1
very simple statistic that allows me to judge my performance.
larger r2 the better
Coefficient of Determination (R2) Formula
What is the value of the coefficient of determination if SSE = 100 and SST =1000?
COD = (SST-SSE)/SST = (1000-100)/1000 = 0.90
Explain Statistical Significance?
variable that can be adjusted based on how accurate you are willing to accept your model to be
.5 means its 95% accurate
Statistical significance is a determination that a relationship between two or more variables is caused by something other than chance.
Generally, a p-value of 5% or lower is considered statistically significant.
What is the approximate confidence level that corresponds to a critical t value of 2?
According to the empirical rule 95% of the area under the normal curve is contained within + 2 standard deviations.
Answer is 95%
Develop a specific forecast based on a given value of X
example: Yr = -7.8 + 0.81 * X (assets)
Let X (assets) = 50 and alpha = 0.05
example: Yr = -7.8 + 0.81 * X (assets) Let X (assets) = 50 and alpha = 0.05 AllI'mgoingtodoistoplugthe50intothexsymbol, multiplyitbytheslope .81, andsubtractthey intercept, generatingformeadebtlevelof$32.7.
32.7 = 0.81*50+-7.8
Bivariate regression analysis
- definition
The process of developing a statistically based linear model between two variables
Coincident indicators
- definition
Metrics like GDP that move along with the general economy
Coefficient of determination (R-square)
- definition
Anumerical measure of the amount of variance explained by the regression model
Correlation analysis
- definition
The process of determining the extent of the relationship between two variables
Correlation coefficient (r) - definition
The degree of linear association between two variables (also known as Pearson’s r) the value of correlation coefficient is between –1 and 1
Correlation matrix
- definition
A table that reports the correlation coefficients between the various sets of variables
Dependent variable (Y) - definition
The variable that is being explained or predicted
Dummy variable
- definition
A variable that is limited to a value of either 0 or 1
Independent variable (X) - definition
The variable used to explain the variability in, or to predict the values of, the dependent variable
Intercept (b0)
- definition
The estimated value of Y when X is 0
Least squares method
- definition
A procedure for developing estimates for the regression model coefficients based on minimizing the sum of the squares of the residuals
Multicollinearity
- definition
A condition when two or more of the predictor variables are correlated
Multiple regression analysis
- definition
The process of developing a statistically based linear model between a single dependent variable and multiple independent variables
p-value
- definition
The probability that the observed sample statistic, e.g., slope (b1), occurred by chance; small p-values tend to support rejecting the null hypothesis
Residual
- definition
The difference between the actual values of the dependent variable and its predicted values from the regression model
Residual plot
- definition
A graph that shows a plot of the error terms (residuals) versus the values of the independent variable
Scatter diagram
- definition
A plot of the independent (horizontal axis) versus dependent (vertical axis) variable values
Simple regression analysis
- definition
The process of developing a statistically based linear model between two variables
Simple regression line
- definition
What is the formula
An equation in the form Y = b0 + b1 * X
Slope (b1)
- definition
The amount of change in Y for a one-unit change (increase or decrease) in X
Standard error of the estimate
- definition
An estimate of the standard deviation of the dependent variable (Y) for any given value of the independent variable (X)
Step-wise solution process
- definition
A methodology for selecting only statistically significant variables when taken in combination
t-value
- definition
Typically, two t-values are used in analyzing the statistical significance of the regression model: one is the computed t and the other is the critical t
the regression sum of squares (RSS) represents?
the amount of variation in Y with a change in X
The standard error
measures the amount of scatter, or variation, in the actual data around the fitted regression function.
Multiple regression analysis
deals with the relationship between two or more predictor variables and a target variable (response or dependent)
examples where we can use multiple regression:
FICO credit score Insurance rates Bankruptcy potential University admissions Medical diagnostics Sports performance
What are the four Multi Regression performance Metrics?
- Correlation coefficients - gives the nature and extent of the relationship between x and y
- R-squared - gives us an indicator of how well we have done in explaining the variability
- t- and p-values (statistical significance)
- Betas - tells the relative impact of each of the variables on y
Correlation coefficients
gives the nature and extent of the relationship between x and y
R-squared
gives us an indicator of how well we have done in explaining the variability
Also called the Coefficient of Determination
t- and p-values
(gives us a metric for rating the statistical significance of independent variables)
Betas
tells the relative impact of each of the variables on y
quiz: What is the value for the dependent variable when Yr = 35 + 2 * X1 – 3 * X2 where X1 = 4 and X2 = 11
Answer: 10
Explanation: Substituting X1 = 4, X2 = 11 into the regression equation yields: Yr = 35 + 2 * X1 – 3 * X2 = 35 + 24 - 311 = 10
quiz: What is the standard method that is used to generate the regression model?
Answer: Ordinary least squares
Explanation: Ordinary least squares (OLS) is a procedure for estimating the intercept and slope coefficients in a linear regression model. This method minimizes the sum of squared errors between the actual Y values and the estimated Y values generated from the regression model.
quiz: What is the standard procedure used to select only statistically significant variables for the regression model?
Answer: Stepwise
Explanation: The stepwise procedure is used for selecting predictor variables to enter (forward) the regression model or to be removed (backward) from the regression model.
quiz: How many model combinations are possible given three predictor variables?
Answer: 7
Explanation: There are a total of seven different variable combinations that are possible as follows (A, B, C, AB, AC, BC, ABC).
quiz: How does the difference between R-square and the adjusted R-square change as the sample size increases?
Answer: Decreases
Explanation: The adjusted coefficient of determination is the proportion of total variance in the dependent variable explained by the independent variables adjusted for the sample size and the number of predictor variables. As the sample size increases the difference between R-square and the adjusted R-square decreases.
quiz: What does the coefficient of determination measure?
Answer: Measures the proportion of variation in Y explained by the predictor variables.
Explanation: Coefficient of determination (R-square) reports the proportion of total variance in the dependent variable explained by the set of independent variables.
quiz: What is the appropriate decision when the p-value is less than alpha?
Answer: Reject Ho
Explanation: A p-value is the probability of obtaining a test statistic (e.g., t statistic) at least as extreme as the one that was actually observe assuming that the null hypothesis is true. Typically, the null hypothesis is rejected if the p-value is less than the stated alpha.
quiz: Which analysis procedure is based on removing predictor variables one at a time from the regression model?
Answer: Backward stepwise
Explanation: The backward stepwise procedure starts will all of the selected candidate variables in the regression model. Typically, the variable with the largest p-value is removed first and the process continues until the p-values of the remaining variables are less than the stated alpha.
quiz: Explain beta coefficient? What is it also known as?
A standardized beta coefficient compares the strength of the effect of each individual independent variable to the dependent variable.
They are also known as Standardized slopes
Explanation: Beta coefficients (also known as standardized slopes) are regression model coefficients that have been standardized resulting in a variance of one. This procedure is done so that one can identify the relative impact of changes in each predictor variable on the target variable when each of the predictor variables are measured in different units (e.g., dollars, age, gender).
quiz: What is the difference between Y and Yr called?
Answer: Residual
Explanation: The difference between the actual Y value and the Yr value generated by the regression model is called the residual or error term.
explain P Value?
a measure of probability that an observed difference could have occurred just by random choice
only one coefficient assessment at a time
the lower the p-value then ______ .(complete the sentence)
the greater the statistical significance of the observed difference
essentially the lower the p value the more you can rely on the coefficient it is testing significance on
What does The F-test do?
compares the fits of different linear models
assesses multiple coefficients simultaneously
can assess multiple coefficients simultaneously
Why we need to use Hypothesis tests in statistics?
It tests two mutually exclusive statements about a data population to determine which statement is best supported by the sample data.