unit 3 - ch 13 - simple linear regression (slr) Flashcards
Underlying all of SLR is
chance
Chance
Correlation is passive (is)
Chance is application (of)
Are they moving in tandem (x and y)
Data always varies due to reason or chance
Chance is the foundation on which regression is built
number of sales for six salespeople (SP)
We don’t know how much each salesperson sold
Number of sales → Y variable
Guess each salesperson’s sales?
Rule: You must guess the same number for each person
Your guess?
The mode → 10 (guess 6 times)
Right 2/6 times
How much error with each guess?
E = Y-10 (guess)
total error –> ESS –> error sum of squares
ESS (mode) = Sigma (Y-10)^2
comparison of guesses
We want to limit our losses, but it’s like golf: we’re not going to hit a hole in one, but we want to make multiple good shots to eventually get to the hole
Limit your losses; don’t just aim for a hole in one
Not really really wrong
Guessing the mean is this.. Limiting our losses
Substitute the word usually for average
How much better can we do than guessing? We build off of this
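A quick sketch of the guessing comparison. The sales numbers here are made up for illustration (the notes do not list the actual data); they are chosen only so that the mode is 10 and is right 2/6 times, as in the example. It compares the ESS from guessing the mode with the ESS from guessing the mean.

```python
from statistics import mean, mode

# Hypothetical sales for six salespeople (NOT the data from the notes).
sales = [10, 10, 7, 12, 14, 13]

def ess(values, guess):
    """Error sum of squares when the same guess is used for every person."""
    return sum((y - guess) ** 2 for y in values)

mode_guess = mode(sales)   # most common value
mean_guess = mean(sales)   # average value

ess_mode = ess(sales, mode_guess)
ess_mean = ess(sales, mean_guess)

print(f"Guess the mode ({mode_guess}): ESS = {ess_mode}")
print(f"Guess the mean ({mean_guess}): ESS = {ess_mean}")
```

With these made-up numbers the mean gives the smaller ESS. That is the "limit your losses" idea: no single constant guess has a lower error sum of squares than the mean.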
predictions
Guessing to predictions
X is new here
Is the x variable and y variable correlated?
Use fx function to get r value
r = 0.9218
Use fx function for intercept
b = 2.0909
Use fx function for the slope
m = 0.8182
Line of best fit
line
y = mx + b
regression equation (line of best fit)
y hat = b + mx
y hat = predicted value
b = y-intercept
m = slope
example
y hat = 2.0909 + 0.8182(x)
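A minimal sketch of what the fx functions are doing, assuming they are the usual spreadsheet slope, intercept, and correlation functions (the notes do not name them). The x and y values are hypothetical, so the fitted numbers differ from the 2.0909 and 0.8182 above; the last lines use the notes' own equation to make one prediction.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (intercept b, slope m, correlation r) for a least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    m = sxy / sxx              # slope
    b = y_bar - m * x_bar      # y-intercept
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return b, m, r

# Hypothetical data (NOT the data from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b, m, r = fit_line(xs, ys)
print(f"y hat = {b:.4f} + {m:.4f}(x), r = {r:.4f}")

def predict(x, intercept=2.0909, slope=0.8182):
    """Predicted value from the regression equation in the notes."""
    return intercept + slope * x

y_hat_at_10 = predict(10)  # x = 10 is an arbitrary illustrative input
print(f"y hat at x = 10: {y_hat_at_10:.4f}")
```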
FM = full model
Using all predictive variables (x variables)
In SLR = 1 predictive variable (FM)
Chance model had no predictive variable
Full model all predictive variables
Line of best fit
SLR: common business practices
Predicting and/or forecasting
>Hiring decisions
>Inventory cycles
>Future sales
Understanding underlying elements
>Marketing strategy
> Operation efficiency
Supplement executive creativity
> Reveal new insights
SLR: Looking for relationships, making associations, drawing conclusions
Example of assumptions:
Outfit → Purchasing Power
Job → Disposition
Car → Personality
residuals (there is no perfect model)
Residuals woo!!!
Residual is another name for error
line of best fit - on exam (multiple questions)
memorize the picture in notes!!!!!!!!!
Line of best fit not perfect fit
Would only happen if there were a perfect correlation
If r = +/- 1 or e = 0 there would be a perfect fit
Residuals (error) =
Residuals (error) = Y - Y hat
e = actual - predicted
e not r
Ordinary least squares regression =
Ordinary least squares regression = the least total (squared) error
Doesn’t care if error is positive or negative, cares about magnitude of error
Sigma (Y-Y hat)^2 (squaring!!!!)
When plotted, errors should look either randomly scattered or normally distributed
properties and qualities of residuals (error)
notation: for a sample –> e, for a population –> ε (epsilon)
if r = +1 or -1, e = 0
ordinary least squares regression: total error decrease
Sigma (Y-Y hat) = 0, always! So we square the errors: Sigma (Y-Y hat)^2
Sigma (Y-Y hat)^2 –> total error: 7.81
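A small sketch of why the errors get squared. The data are hypothetical (not the data behind the 7.81 above). The raw residuals from a least squares line cancel out to zero, so the total error has to be measured with squared residuals.

```python
# Hypothetical data (NOT the data set behind the 7.81 in the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Ordinary least squares slope and intercept.
m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
b = y_bar - m * x_bar

# e = actual - predicted for every point.
residuals = [y - (b + m * x) for x, y in zip(xs, ys)]

sum_of_errors = sum(residuals)              # cancels to (about) zero
sse = sum(e ** 2 for e in residuals)        # total squared error

print(f"Sum of errors: {sum_of_errors:.10f}")
print(f"Sum of squared errors: {sse:.2f}")
```

The sum of errors prints as zero up to floating point rounding, while the sum of squared errors stays positive.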
plot errors
randomly scattered around the x-axis
normally distributed
randomly scattered around the x-axis
> more errors as you draw closer to x-axis (in the middle)
that is because the model is reducing the error
in taking random samples our error should be random
plots and data point
If the data points fall into some sort of pattern, you do not have a linear relationship. The relationship can be parabolic etc., but it is not linear
If residuals fall out in a pattern it is not linear
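A sketch of what a residual pattern looks like, using made-up data that follow a parabola (y = x squared). Fitting a straight line to them leaves residuals that are positive at both ends and negative in the middle, which is a pattern rather than random scatter.

```python
# Hypothetical parabolic data: y = x^2.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares line through the parabolic data.
m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
b = y_bar - m * x_bar

residuals = [y - (b + m * x) for x, y in zip(xs, ys)]
signs = ["+" if e > 0 else "-" for e in residuals]

print("Residuals:", residuals)
print("Signs:    ", " ".join(signs))
```

The signs come out as + - - - +, a U shape around the x-axis, so a straight line is the wrong model for these data.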
chance model vs full model
FM = full model
Using all predictive variables (x variables)
In SLR = 1 predictive variable (FM)
Chance model had no predictive variable
Full model all predictive variables
Line of best fit
Total variation in the Y-variable can be divided into 2 distinct components:
Regression term
Y’s relationship with the X-variable(s)
Picked up in full model (has x in the formula)
Residual term
Random factors not in the model (error)
Years of experience, gender, age etc. that are not in the model but can influence a salesperson's sales
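A sketch of the two components using hypothetical data. The names SST, SSR, and SSE are common textbook labels for total, regression, and residual variation; they are not used in the notes. The point is that the regression term and the residual term add back up to the total variation in Y.

```python
# Hypothetical data (NOT from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
b = y_bar - m * x_bar
y_hats = [b + m * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)                  # total variation in Y
ssr = sum((y_hat - y_bar) ** 2 for y_hat in y_hats)      # regression term
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))  # residual term

share_explained = ssr / sst  # fraction of variation picked up by the full model

print(f"Total = {sst:.2f}, regression = {ssr:.2f}, residual = {sse:.2f}")
print(f"Share explained by X: {share_explained:.0%}")
```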
Four Key Concepts for SLR
Concept 1: The coefficient of determination
coefficient of determination - RSQ - the percentage of the variation in the y-variable that is explained by the variation in the x-variable(s)
Don’t confuse the coefficient of determination (RSQ) with the correlation coefficient (r, ρ)
A percentage
range = 0-1
Practical because percentages are understandable
square r
r = 0.9218
RSQ = about 85%
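The arithmetic behind "square r", as a short check using the r value from the notes:

```python
r = 0.9218          # correlation coefficient from the notes
rsq = r ** 2        # coefficient of determination

print(f"RSQ = {rsq:.4f}, or about {rsq:.0%}")
```

So roughly 85% of the variation in Y is explained by X, and the remaining 15% or so is left in the residual term.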
How high does RSQ need to be
Useful in context = good RSQ
Human behavior RSQ is lower because behavior is complex
rsq: context
how high RSQ needs to be depends on context
Four Key Concepts for SLR
Concept 2: Isolating the slope
The effect of marginal inputs on predicted outcomes
Example: Major league baseball
Y = Wins
X = Payroll (USD, millions)
Y hat = 67 + 0.04(x)
q1: how much does each win, above what the regression model provides, cost?
q2: if an MLB team owner increases payroll by 100 million, what marginal effect on wins can fans expect?
(look at notes)
Hint: Rise over run (isolating the slope)
Q1 = $25 million (USD)
Q2 = 4 more wins
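A sketch of how both answers fall out of the slope alone, using the equation from the notes (Y hat = 67 + 0.04x, with x in millions of USD). The variable names are just for illustration.

```python
INTERCEPT = 67.0   # predicted wins at zero payroll
SLOPE = 0.04       # extra predicted wins per $1 million of payroll

def predicted_wins(payroll_millions):
    """Predicted wins for a given payroll in millions of USD."""
    return INTERCEPT + SLOPE * payroll_millions

# Q1: run over rise, i.e. millions of dollars per additional predicted win.
cost_per_win_millions = 1 / SLOPE

# Q2: rise for a run of 100, i.e. extra predicted wins for +$100 million.
extra_wins = SLOPE * 100

print(f"Q1: about ${cost_per_win_millions:.0f} million per extra win")
print(f"Q2: about {extra_wins:.0f} more wins for +$100 million")
```

The intercept (67) drops out of both answers, which is why the hint says to isolate the slope.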
Regression toward the mean:
extreme outcomes tend to be followed by moderate outcomes
Small samples have more variation than large ones
Four Key Concepts for SLR
Concept 3: Over and under performing the model
(look at notes to see overperforming vs underperforming)
e = Y - Yhat
Y above the line: +e
Y below the line: -e
red circle: overperforming teams
black circles: underperforming teams
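A sketch of over vs underperforming, using the baseball equation from the notes. The teams, payrolls, and win totals are made up for illustration; they are not the teams in the picture.

```python
def predicted_wins(payroll_millions):
    """Regression equation from the notes: Y hat = 67 + 0.04(x)."""
    return 67 + 0.04 * payroll_millions

# Hypothetical teams: (name, payroll in millions of USD, actual wins).
teams = [
    ("Team A", 100, 80),
    ("Team B", 150, 70),
]

labels = {}
for name, payroll, wins in teams:
    e = wins - predicted_wins(payroll)   # e = Y - Y hat
    # Positive e: point sits above the line; negative e: below the line.
    labels[name] = "overperforming" if e > 0 else "underperforming"
    print(f"{name}: e = {e:+.1f} ({labels[name]})")
```

Note that Team B spends more but sits below the line: performance is judged against the model's prediction, not against raw wins.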
Four Key Concepts for SLR
Concept 4: The Restricted Model
(look at notes)
a) CM: no predictor variables
House example: Beach house
Money lending
“How much can you lend”
“Let me guess”
“?”
CM is for comparison purposes only not for use in practice
b) FM: all predictor variables
c) RM: some predictor variables (get prequalified by loan officer)
why use a restricted model
Why use a restricted model?
RM works well enough
RM is less complicated
RM is cheaper
Linear regression for two variables is based on a linear equation with one independent variable. The equation has the form:
y = a + bx
regression analysis
Regression analysis is a statistical technique that can test the hypothesis that a variable is dependent upon one or more other variables. Further, regression analysis can provide an estimate of the magnitude of the impact of a change in one variable on another. This last feature, of course, is all important in predicting future values.
Regression analysis is based upon a functional relationship among variables and, further, assumes that the relationship is linear. This linearity assumption is required because, for the most part, the theoretical statistical properties of non-linear estimation are not well worked out yet by the mathematicians and econometricians.