unit 3 - ch 13 - simple linear regression (slr) Flashcards

1
Q

Underlying all of SLR is

A

chance

2
Q

Chance

A

Correlation is passive (is)
Chance is application (of)
Are x and y moving in tandem?

Data always varies due to reason or chance
Chance is the foundation on which regression is built

3
Q

number of sales for six salespersons (SP)

A

We don’t know how much each salesperson sold

Number of sales → Y variable
Guess each salesperson’s sales?
Rule: You must guess the same number for each person

Your guess?
The mode → 10 (guess 6 times)
Right 2/6 times

4
Q

How much error with each guess?

A

e = Y − 10 (where 10 is the guess)

5
Q

total error → ESS → error sum of squares

A

ESS(mode) = Σ(Y − 10)²
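The chance-model guess and its ESS can be sketched in Python. The six sales figures below are hypothetical stand-ins (the deck's actual data isn't shown), chosen so the mode, 10, appears 2 of 6 times as on the card:

```python
from statistics import mode

# Hypothetical sales for six salespersons (not the deck's actual data);
# the mode, 10, appears twice, so guessing 10 is right 2/6 times.
sales = [8, 9, 10, 10, 11, 13]

guess = mode(sales)                         # same guess for every person: 10
ess = sum((y - guess) ** 2 for y in sales)  # ESS(mode) = sigma (Y - 10)^2
print(guess, ess)
```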

6
Q

comparison of guesses

We want to limit our losses, but it's like golf: it's not that we're going to hit a hole in one, but what we want is to make multiple good shots that eventually get us to the hole

A

Limit your losses; it's not just about a hole in one
Avoid being really, really wrong
Guessing the mean does this: it limits our losses
Substitute the word "usually" for "average"
How much better can we do than guessing? We build off of this

7
Q

predictions

A

From guessing to predictions
X is new here

Are the x variable and y variable correlated?

Use the fx function to get the r value
r = 0.9218

Use the fx function for the intercept
b = 2.0909

Use the fx function for the slope
m = 0.8182
Line of best fit
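The Excel fx steps above (CORREL, INTERCEPT, SLOPE) can be reproduced by hand with the usual least-squares formulas. The (x, y) pairs below are hypothetical stand-ins, so the outputs won't match the deck's 0.9218 / 2.0909 / 0.8182 exactly:

```python
from statistics import mean

# Hypothetical (x, y) pairs standing in for the deck's data.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 4, 4, 6, 7, 7]

x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

m = sxy / sxx                  # slope (what Excel's SLOPE returns)
b = y_bar - m * x_bar          # intercept (Excel's INTERCEPT)
r = sxy / (sxx * syy) ** 0.5   # correlation coefficient (Excel's CORREL)
print(m, b, r)
```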

8
Q

line

A

y = mx + b

9
Q

regression equation (line of best fit)

A

y hat = b + mx

y hat = predicted value
b = y-intercept
m = slope

example
y hat = 2.0909 + 0.8182(x)
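Plugging an x value into the example equation gives a prediction; x = 10 here is an arbitrary illustrative input, not a value from the deck:

```python
# The deck's example line of best fit, used as a prediction function.
def predict(x):
    return 2.0909 + 0.8182 * x  # y_hat = b + m*x

# x = 10 is a hypothetical input chosen for illustration.
print(predict(10))
```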

10
Q

FM = full model

A

Using all predictive variables (x variables)

In SLR, the FM has 1 predictive variable
The chance model has no predictive variables
The full model uses all predictive variables
Line of best fit

11
Q

SLR: common business practices

A

Predicting and/or forecasting
>Hiring decisions
>Inventory cycles
>Future sales

Understanding underlying elements
>Marketing strategy
>Operational efficiency

Supplement executive creativity
> Reveal new insights

12
Q

SLR: Looking for relationships, making associations, drawing conclusions
Example of assumptions:

A

Outfit → Purchasing Power
Job → Disposition
Car → Personality

13
Q

residuals (there is no perfect model)

A

Residuals woo!!!
Residual is another name for error

14
Q

line of best fit - on exam (multiple questions)

A

memorize the picture in notes!!!!!!!!!

15
Q

Line of best fit not perfect fit

A

A perfect fit would only happen if there were a perfect correlation
If r = ±1, then e = 0 and there would be a perfect fit

16
Q

Residuals (error) =

A

Residuals (error) = Y - Y hat
e = actual - predicted
e, not r (the residual, not the correlation coefficient)

17
Q

Ordinary least squares regression =

A

Ordinary least squares regression = least total error

Doesn’t care if the error is positive or negative; cares about the magnitude of the error
Σ(Y − Ŷ)² (squaring!)

Errors should plot randomly and be normally distributed

18
Q

properties and qualities of residuals (error)

A

notation: for a sample → e, for a population → ε (epsilon, the “funny looking e”)

if r = +1 or −1, e = 0

ordinary least squares regression: minimizes total error

Σ(y − ŷ) = 0, always! So we use Σ(y − ŷ)² instead

Σ(y − ŷ)² → total error: 7.81
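A quick sketch of why we square: on any OLS fit the raw residuals sum to zero, so only the squared sum measures total error. The data below is hypothetical (the deck's 7.81 comes from its own data):

```python
from statistics import mean

# Hypothetical data; fit OLS, then check the two residual facts.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 4, 4, 6, 7, 7]

x_bar, y_bar = mean(xs), mean(ys)
m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
    sum((x - x_bar) ** 2 for x in xs)
b = y_bar - m * x_bar

residuals = [y - (b + m * x) for x, y in zip(xs, ys)]
raw_sum = sum(residuals)              # ~0: positive and negative errors cancel
sse = sum(e ** 2 for e in residuals)  # squaring keeps the magnitudes
print(raw_sum, sse)
```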

19
Q

plot errors

A

randomly scattered around the x-axis
normally distributed

> more errors as you draw closer to the x-axis (in the middle)
that is because the model is reducing the error
in taking random samples, our error should be random

20
Q

plots and data point

A

If the data points fall out in some sort of pattern, you do not have a linear relationship; it can be parabolic, etc., but it is not linear

If the residuals fall out in a pattern, it is not linear

21
Q

chance model vs full model

A


FM = full model
Using all predictive variables (x variables)
In SLR, the FM has 1 predictive variable
The chance model has no predictive variables
The full model uses all predictive variables
Line of best fit

22
Q

Total variation in the Y-variable can be divided into 2 distinct components:

A

Regression term
Y’s relationship with the X-variable(s)
Picked up in full model (has x in the formula)

Residual term
Random factors not in the model (error)
Years of experience, gender, age, etc. that are not in the model but can influence a salesperson’s sales

23
Q

Four Key Concepts for SLR
Concept 1: The coefficient of determination

A

Coefficient of determination (RSQ): the percentage of the variation in the y-variable that is explained by the variation in the x-variable(s)

Don’t confuse the coefficient of determination (RSQ) with the correlation coefficient (r, ρ)
A percentage
range = 0-1
Practical because percentages are understandable

24
Q

square r

A

r = 0.9218
RSQ = about 85%
How high does RSQ need to be?

Useful in context = good RSQ
Human behavior RSQ is lower because behavior is complex
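Squaring the deck's r reproduces the "about 85%" figure:

```python
# RSQ is the square of the correlation coefficient from the deck's example.
r = 0.9218
rsq = r ** 2
print(rsq)  # about 0.85: roughly 85% of the variation in y is explained
```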

25
Q

rsq: context

A

whether an RSQ is high enough depends on context

26
Q

Four Key Concepts for SLR
Concept 2: Isolating the slope

A

The effect of marginal inputs on predicted outcomes
Example: Major league baseball
Y = Wins
X = Payroll (USD, millions)
Y hat = 67 + 0.04(x)

27
Q

q1: how much does each win – above what the regression model provides – cost?

q2: if an MLB team owner increases payroll by $100 million, what marginal effect on wins can fans expect?

(look at notes)

A

Hint: Rise over run (isolating the slope)

Q1 = $25 (USD, millions)
Q2 = 4 more wins
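Both answers fall straight out of the slope in the deck's model ŷ = 67 + 0.04x; a one-line check of the rise-over-run reasoning:

```python
# Isolating the slope in the deck's model y_hat = 67 + 0.04x
# (Y = wins, X = payroll in USD millions).
slope = 0.04                 # extra wins per extra $1M of payroll

cost_per_win = 1 / slope     # Q1: run over rise -> $ millions per marginal win
extra_wins = slope * 100     # Q2: effect of a $100M payroll increase
print(cost_per_win, extra_wins)
```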

28
Q

Regression toward the mean:

A

extreme outcomes tend to be followed by moderate outcomes
Small samples have more variation than large ones

29
Q

Four Key Concepts for SLR
Concept 3: Over and under performing the model

(look at notes to see overperforming vs underperforming)

A

e = Y - Yhat

Y above the line: +e
Y below the line: -e

red circle: overperforming teams
black circles: underperforming teams

30
Q

Four Key Concepts for SLR
Concept 4: The Restricted Model

(look at notes)

A

a) CM: no predictor variables
House example: Beach house
Money lending
“How much can you lend”
“Let me guess”
“?”
CM is for comparison purposes only not for use in practice

b) FM: all predictor variables

c) RM: some predictor variables (get prequalified by loan officer)

31
Q

why use a restricted model

A

Why use a restricted model?
RM works well enough
RM is less complicated
RM is cheaper

32
Q

Linear regression for two variables is based on a linear equation with one independent variable. The equation has the form:

A

y = a + bx

33
Q

regression analysis

A

Regression analysis is a statistical technique that can test the hypothesis that a variable is dependent upon one or more other variables. Further, regression analysis can provide an estimate of the magnitude of the impact of a change in one variable on another. This last feature, of course, is all important in predicting future values.

Regression analysis is based upon a functional relationship among variables and, further, assumes that the relationship is linear. This linearity assumption is required because, for the most part, the theoretical statistical properties of non-linear estimation have not yet been well worked out by the mathematicians and econometricians.