unit 3 - ch 13 - simple linear regression (slr) Flashcards by mary merid

Underlying all of SLR is

chance

Chance

Correlation is passive (is)
Chance is application (of)
Are they moving in tandem (x and y)?

Data always varies due to reason or chance
Chance is the foundation on which regression is built

Number of sales for six salespeople (SP)

We don’t know how much each salesperson sold

Number of sales → Y variable
Guess each salesperson’s sales?
Rule: You must guess the same number for each person

Your guess?
The mode → 10 (guess 6 times)
Right 2/6 times

How much error with each guess?

e = Y - 10 (the guess)

Total error → ESS → error sum of squares

ESS (mode) = Sigma (Y - 10)^2

comparison of guesses

We want to limit our losses, but it’s like golf. It’s not that we’re going to hit a hole in one; what we want is to make multiple good shots to eventually get to the hole

Limit your losses, don’t just aim for a hole in one
Not really, really wrong
Guessing the mean does this: it limits our losses
Substitute the word “usually” for “average”
How much better can we do than guessing? We build off of this
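
A minimal Python sketch of why guessing the mean limits losses: it compares the error sum of squares (ESS) for the mode guess and the mean guess. The `sales` numbers and the `ess` helper are hypothetical, made up for illustration; they are not the six salespeople from the notes.

```python
from statistics import mean, mode

# Hypothetical sales for six salespeople (NOT the data from the course notes).
sales = [10, 10, 13, 15, 16, 20]

def ess(values, guess):
    """Error sum of squares when the same guess is used for every value."""
    return sum((y - guess) ** 2 for y in values)

mode_guess = mode(sales)   # most frequent value
mean_guess = mean(sales)   # arithmetic average

ess_mode = ess(sales, mode_guess)
ess_mean = ess(sales, mean_guess)
print(f"mode guess {mode_guess}: ESS = {ess_mode}")
print(f"mean guess {mean_guess}: ESS = {ess_mean}")
```

With these made-up numbers the mode guess gives ESS = 170 and the mean guess gives ESS = 74. For any data set, no single guess has a smaller ESS than the mean.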

predictions

Guessing to predictions
X is new here

Is the x variable and y variable correlated?

Use fx function to get r value
r = 0.9218

Use fx function for intercept
b = 2.0909

Use fx function for the slope
m = 0.8182
Line of best fit
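
The “fx function” steps presumably refer to built-in spreadsheet functions (an assumption; the card does not name them). As a sketch of what such functions compute, here are r, the intercept b, and the slope m worked out from scratch in Python. The x and y values are hypothetical, not the data behind the values on this card, so the results differ.

```python
from math import sqrt

# Hypothetical data (NOT from the course notes): x is some predictor,
# y is the number of sales for six salespeople.
x = [1, 2, 3, 4, 5, 6]
y = [10, 10, 13, 15, 16, 20]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products around the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

r = s_xy / sqrt(s_xx * s_yy)   # correlation coefficient
m = s_xy / s_xx                # slope
b = y_bar - m * x_bar          # y-intercept

print(f"r = {r:.4f}, b = {b:.4f}, m = {m:.4f}")
```

The slope is the cross-product sum divided by the spread in x, and the intercept is chosen so the line passes through the point (mean of x, mean of y).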

line

y = mx + b

regression equation (line of best fit)

y hat = b + mx

y hat = predicted value
b = y-intercept
m = slope

example
y hat = 2.0909 + 0.8182(x)
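
The example equation can be turned into a small prediction function. This is only a sketch: the name `predict` and the input x = 10 are assumptions chosen for illustration; the intercept and slope are the ones from the card.

```python
B = 2.0909  # y-intercept from the flashcard example
M = 0.8182  # slope from the flashcard example

def predict(x):
    """Predicted value y hat = b + m*x for a given x."""
    return B + M * x

# Hypothetical input: what does the line predict at x = 10?
print(f"{predict(10):.4f}")  # 10.2729
```

At x = 0 the prediction is just the intercept, and each extra unit of x adds the slope (0.8182) to the prediction.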

FM = full model

Using all predictive variables (x variables)

In SLR = 1 predictive variable (FM)
Chance model had no predictive variable
Full model all predictive variables
Line of best fit

SLR: common business practices

Predicting and/or forecasting
>Hiring decisions
>Inventory cycles
>Future sales

Understanding underlying elements
>Marketing strategy
>Operational efficiency

Supplement executive creativity
>Reveal new insights

SLR: Looking for relationships, making associations, drawing conclusions
Example of assumptions:

Outfit → Purchasing Power
Job → Disposition
Car → Personality

residuals (there is no perfect model)

Residuals woo!!!
Residual is another name for error

line of best fit - on exam (multiple questions)

memorize the picture in notes!!!!!!!!!

Line of best fit not perfect fit

A perfect fit would only happen with perfect correlation
If r = +/- 1 (so every e = 0) there would be a perfect fit

Residuals (error) =

Residuals (error) = Y - Y hat
e = actual - predicted
e not r

Ordinary least squares regression =

Ordinary least squares regression = least total (squared) error

Doesn’t care if error is positive or negative, cares about magnitude of error
Sigma (Y - Y hat)^2 (squaring!!!!)

When plotted, errors should look randomly scattered and normally distributed

properties and qualities of residuals (error)

Notation: for a sample → e; for a population → ε (epsilon, the “funny-looking e”)

If r = +1 or -1, e = 0

Ordinary least squares regression: minimizes total squared error

Sum (Y - Y hat) = 0, always! So we square each error first: Sigma (Y - Y hat)^2

Sigma (Y - Y hat)^2 → total error: 7.81
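
A sketch of the two facts above, using hypothetical data (so the total error here is not the 7.81 from the notes): the residuals from a least-squares line sum to zero, which is why they are squared before being added up. The helper name `fit_line` is an assumption.

```python
def fit_line(x, y):
    """Return (intercept b, slope m) of the least-squares line y hat = b + m*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    m = s_xy / s_xx
    b = y_bar - m * x_bar
    return b, m

# Hypothetical data, NOT the data from the course notes.
x = [1, 2, 3, 4, 5, 6]
y = [10, 10, 13, 15, 16, 20]

b, m = fit_line(x, y)
residuals = [yi - (b + m * xi) for xi, yi in zip(x, y)]  # e = actual - predicted

sum_e = sum(residuals)                # positive and negative errors cancel
sse = sum(e ** 2 for e in residuals)  # squaring keeps only the magnitude

print(f"sum of e = {sum_e:.4f}, sum of e^2 = {sse:.4f}")
```

The plain sum is useless as a measure of total error because it is always zero for a least-squares line with an intercept; the squared sum is the quantity the line is chosen to minimize.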

plot errors

randomly scattered around the x-axis
normally distributed

> more errors as you draw closer to the x-axis (in the middle)
this is because the model is reducing the error
when we take random samples, our error should be random

plots and data point

If the data points fall into some sort of curved pattern, you do not have a linear regression. The relationship can be parabolic etc., but it is not linear

If the residuals fall into a pattern, the relationship is not linear

chance model vs full model

??????

FM = full model
Using all predictive variables (x variables)
In SLR = 1 predictive variable (FM)
Chance model had no predictive variable
Full model all predictive variables
Line of best fit

Total variation in the Y-variable can be divided into 2 distinct components:

Regression term
Y’s relationship with the X-variable(s)
Picked up in full model (has x in the formula)

Residual term
Random factors not in the model (error)
Years of experience, gender, age, etc. that are not in the model but can influence a salesperson’s sales
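
The split can be checked numerically. In this Python sketch (hypothetical data; the names SST, SSR, and SSE are conventional labels that do not appear on the cards), the total variation in Y equals the regression part plus the residual part.

```python
# Hypothetical data, NOT the data from the course notes.
x = [1, 2, 3, 4, 5, 6]
y = [10, 10, 13, 15, 16, 20]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b = y_bar - m * x_bar
y_hat = [b + m * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation in Y
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression term (explained by x)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual term (error)

print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}")
```

Here SST is also the error of the chance model (guessing the mean for everyone), so SSE being much smaller than SST shows how much the x-variable improves on guessing.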

Four Key Concepts for SLR
Concept 1: The coefficient of determination

Coefficient of determination (RSQ): the percentage of the variation in the y-variable that is explained by the variation in the x-variable(s)

Don’t confuse the coefficient of determination (RSQ) with the correlation coefficient (r, ρ)
A percentage
range = 0-1
Practical because percentages are understandable

square r

r = 0.9218
RSQ = about 85%
How high does RSQ need to be?

Useful in context = good RSQ
Human behavior RSQ is lower because behavior is complex
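
The “square r” step as arithmetic, using the r value from the card. A tiny Python sketch; the variable names are assumptions.

```python
r = 0.9218      # correlation coefficient from the card
rsq = r ** 2    # coefficient of determination

print(f"RSQ = {rsq:.4f} (about {rsq:.0%})")
```

Because r is squared, RSQ is never negative, and a negative correlation of the same strength gives the same RSQ.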

rsq: context

How high RSQ needs to be depends on context

Four Key Concepts for SLR Concept 2: Isolating the slope

The effect of marginal inputs on predicted outcomes
Example: Major League Baseball
Y = Wins
X = Payroll (USD, millions)
Y hat = 67 + 0.04(x)

Q1: How much does each win, above what the regression model provides, cost?
Q2: If an MLB team owner increases payroll by $100 million, fans can expect what marginal effect on wins? (look at notes)

Hint: Rise over run (isolating the slope)
Q1 = $25 (USD, millions)
Q2 = 4 more wins
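
Both answers come straight from the slope. A Python sketch of the arithmetic, using the intercept and slope from the baseball card (the variable names are assumptions):

```python
INTERCEPT = 67.0  # predicted wins at zero payroll, from the card
SLOPE = 0.04      # additional predicted wins per $1 million of payroll

# Q1: run over rise, i.e. payroll needed for one more predicted win.
cost_per_win = 1 / SLOPE         # in $ millions

# Q2: rise for a run of 100, i.e. effect of $100 million more payroll.
extra_wins = SLOPE * 100

print(f"Q1: ${cost_per_win:.0f} million per extra win")
print(f"Q2: {extra_wins:.0f} more wins")
```

The intercept plays no part in either answer; marginal questions depend only on the slope.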

Regression toward the mean:

Extreme outcomes tend to be followed by moderate outcomes
Small samples have more variation than large ones

Four Key Concepts for SLR Concept 3: Over and under performing the model (look at notes to see overperforming vs underperforming)

e = Y - Y hat
Y above the line: +e
Y below the line: -e
red circles: overperforming teams
black circles: underperforming teams
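
A sketch of the sign rule in Python, reusing the baseball equation from Concept 2. The team names, payrolls, and win totals are hypothetical, invented only to show one positive and one negative residual.

```python
def predicted_wins(payroll_millions):
    """Baseball model from the Concept 2 card: Y hat = 67 + 0.04(x)."""
    return 67 + 0.04 * payroll_millions

# Hypothetical teams: (name, payroll in $ millions, actual wins).
teams = [("Team A", 100, 80), ("Team B", 200, 70)]

labels = {}
for name, payroll, wins in teams:
    e = wins - predicted_wins(payroll)  # e = Y - Y hat
    if e > 0:
        labels[name] = "overperforming"   # Y above the line
    elif e < 0:
        labels[name] = "underperforming"  # Y below the line
    else:
        labels[name] = "on the line"
    print(f"{name}: e = {e:+.1f} -> {labels[name]}")
```

Note that the team with the larger payroll is the underperformer here: performance is judged against what the model predicts for that payroll, not against other teams’ win totals.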

Four Key Concepts for SLR Concept 4: The Restricted Model (look at notes)

a) CM: no predictor variables
House example: Beach house
Money lending: “How much can you lend?” “Let me guess” “?”
CM is for comparison purposes only, not for use in practice
b) FM: all predictor variables
c) RM: some predictor variables (get prequalified by loan officer)

why use a restricted model

Why use a restricted model?
RM works well enough
RM is less complicated
RM is cheaper

Linear regression for two variables is based on a linear equation with one independent variable. The equation has the form:

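y = a + bx, where a is the y-intercept and b is the slope (the other cards write the same line as y hat = b + mx)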

regression analysis

Regression analysis is a statistical technique that can test the hypothesis that a variable is dependent upon one or more other variables. Further, regression analysis can provide an estimate of the magnitude of the impact of a change in one variable on another. This last feature, of course, is all-important in predicting future values. Regression analysis is based upon a functional relationship among variables and, further, assumes that the relationship is linear. This linearity assumption is required because, for the most part, the theoretical statistical properties of non-linear estimation are not well worked out yet by the mathematicians and econometricians.

(33 cards)