Chapter 13 - Simple Linear Regression Flashcards
What is the statistical method to model the linear relationship between 2 numerical variables?
Simple linear regression
How do we know if two variables are linearly related?
If the mean of the response variable Y depends linearly on the value of the predictor variable X
What is the linear equation that determines a fitted line?
Also called the regression equation
Y = βo + β1x
Where:
βo - is the intercept
β1 - is the slope
What is a graphical method to determine the relationship between X (predictor) and Y (response)?
Scatter plots
When looking at a scatter plot, if the means of Y at different values of X are close to a straight line (although not necessarily right on the line), what can be said about the relationship?
X and Y are linearly related
If the mean of Y decreases as the value of x increases, what type of linear relationship is this?
Negative
If the mean of Y increases as the value of X increases, what type of linear relationship is this?
Positive
What variable is X when looking at simple linear regression?
The predictor variable
What variable is Y when looking at simple linear regression?
The response variable
How do we denote the mean of Y when the predictor value is x?
μY|X=x
What is the best fitted line that is found based on the least-squares criterion?
Regression line
or least-squares line
What is the least squares criterion?
The line that best fits a set of data points is the one having the SMALLEST possible sum of the squared errors (residuals) which are made in using the fitted line to predict the y values
Basically, it helps us to determine the best line for the set of data points
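As a rough illustration of the criterion, the sketch below compares the sum of squared errors for the fitted line from the used-car example later in this set against two other candidate lines. The helper name `sum_squared_errors` and the two alternative lines are made up for the comparison; they are not from the cards.

```python
# Used-car data from these cards: age in years, price in $1000.
ages = [1, 1, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 10, 10, 13]
prices = [14, 13, 13, 10, 10, 9, 9, 7, 7, 8, 7, 6, 5, 4, 3]


def sum_squared_errors(b0, b1, xs, ys):
    """Sum of squared differences between each observed y and the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Least-squares line, using the rounded coefficients worked out in the cards.
sse_fitted = sum_squared_errors(14.118, -0.9432, ages, prices)

# Hypothetical alternatives: a steeper line, and a flat line at the mean price.
sse_steeper = sum_squared_errors(15.0, -1.1, ages, prices)
sse_flat = sum_squared_errors(sum(prices) / len(prices), 0.0, ages, prices)

print(sse_fitted, sse_steeper, sse_flat)
```

The fitted line should give the smallest of the three sums, which is what the criterion asks for.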
Which value represents the sum of the squares of the differences between x and its mean?
Sxx
What does Sxx represent?
The sum of the squared differences between the x values and their mean
What is the computing formula for Sxx?
Sxx = Σxsquared - ((Σx)squared/n)
Which value represents the measure of the total variability of the yi’s from their mean ȳ?
Syy
What does Syy represent?
The measure of the total variability of the yi’s from their mean ȳ
What is the computing formula for Syy?
Syy = Σysquared - ((Σy)squared/n)
What value represents the sum of the product of the differences between x values and the mean of x and the differences between y values and the mean of y?
Sxy
What does Sxy represent?
the sum of the product of the differences between x values and the mean of x and the differences between y values and the mean of y
What is the computing formula for Sxy?
Sxy = Σxy - ((Σx)(Σy)/n)
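The computing formulas give the same numbers as the "sum of differences from the mean" descriptions. A minimal sketch that checks this on a small made-up data set (the data and the function names `s_definition` and `s_computing` are assumptions for illustration only):

```python
def s_definition(us, vs):
    """Sum of (u - mean of u) * (v - mean of v), straight from the definition."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))


def s_computing(us, vs):
    """Computing formula: sum(uv) - (sum u)(sum v)/n."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n


# Made-up data, not from the cards.
x = [1, 2, 4, 7]
y = [3, 5, 4, 10]

# Passing the same list twice gives Sxx or Syy; passing x and y gives Sxy.
sxx = s_computing(x, x)
syy = s_computing(y, y)
sxy = s_computing(x, y)

print(sxx, syy, sxy)
print(s_definition(x, x), s_definition(y, y), s_definition(x, y))
```

Both print lines should show the same three values, up to floating-point rounding.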
We are analyzing the relationship between the ages of used cars and their sale prices, and 15 cars are selected. The values (age in years, price in $1000) are:
Age Price ($1000)
1 14
1 13
3 13
4 10
4 10
5 9
5 9
6 7
7 7
7 8
8 7
8 6
10 5
10 4
13 3
What is the value of Σx?
Σx = 92
We are analyzing the relationship between the ages of used cars and their sale prices, and 15 cars are selected. The values (age in years, price in $1000) are:
Age Price ($1000)
1 14
1 13
3 13
4 10
4 10
5 9
5 9
6 7
7 7
7 8
8 7
8 6
10 5
10 4
13 3
What is the value of Σxsquared?
Σxsquared = 724
We are analyzing the relationship between the ages of used cars and their sale prices, and 15 cars are selected. The values (age in years, price in $1000) are:
Age Price ($1000)
1 14
1 13
3 13
4 10
4 10
5 9
5 9
6 7
7 7
7 8
8 7
8 6
10 5
10 4
13 3
What is the value of Σy?
Σy = 125
We are analyzing the relationship between the ages of used cars and their sale prices, and 15 cars are selected. The values (age in years, price in $1000) are:
Age Price ($1000)
1 14
1 13
3 13
4 10
4 10
5 9
5 9
6 7
7 7
7 8
8 7
8 6
10 5
10 4
13 3
What is the value of Σysquared?
Σysquared = 1193
We are analyzing the relationship between the ages of used cars and their sale prices, and 15 cars are selected. The values (age in years, price in $1000) are:
Age Price ($1000)
1 14
1 13
3 13
4 10
4 10
5 9
5 9
6 7
7 7
7 8
8 7
8 6
10 5
10 4
13 3
What is the value of Σxy?
Σxy = 616
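The five sums in the cards above can be checked in a few lines. A small sketch, with the age and price columns typed in as two lists (the variable names are my own):

```python
# Age (years) and price ($1000) for the 15 cars, in the order listed in the cards.
ages = [1, 1, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 10, 10, 13]
prices = [14, 13, 13, 10, 10, 9, 9, 7, 7, 8, 7, 6, 5, 4, 3]

n = len(ages)
sum_x = sum(ages)                                   # Σx
sum_x2 = sum(x * x for x in ages)                   # Σx squared
sum_y = sum(prices)                                 # Σy
sum_y2 = sum(y * y for y in prices)                 # Σy squared
sum_xy = sum(x * y for x, y in zip(ages, prices))   # Σxy

print(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy)  # 15 92 724 125 1193 616
```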
We are analyzing the relationship between the ages of used cars and their sale prices, and 15 cars are selected. Important values (ages in years, prices in $1000) are:
n = 15
Σx = 92
Σxsquared = 724
Σy = 125
Σysquared = 1193
Σxy = 616
What is the value of Sxx?
Sxx = Σxsquared - ((Σx)squared/n)
= 724 - (92squared/15)
= 159.733
We are analyzing the relationship between the ages of used cars and their sale prices, and 15 cars are selected. Important values (ages in years, prices in $1000) are:
n = 15
Σx = 92
Σxsquared = 724
Σy = 125
Σysquared = 1193
Σxy = 616
What is the value of Syy?
Syy = Σysquared - ((Σy)squared/n)
= 1193 - (125squared/15)
= 151.333
We are analyzing the relationship between the ages of used cars and their sale prices, and 15 cars are selected. Important values (ages in years, prices in $1000) are:
n = 15
Σx = 92
Σxsquared = 724
Σy = 125
Σysquared = 1193
Σxy = 616
What is the value of Sxy?
Sxy = Σxy - ((Σx)(Σy)/n)
= 616 - ((92 × 125)/15)
= -150.667
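The three computing formulas can be applied directly to the summary values. A short sketch, assuming the sums from the cards above (variable names are my own):

```python
# Summary values from the cards.
n, sum_x, sum_x2, sum_y, sum_y2, sum_xy = 15, 92, 724, 125, 1193, 616

sxx = sum_x2 - sum_x ** 2 / n      # Sxx = Σx squared - (Σx) squared / n
syy = sum_y2 - sum_y ** 2 / n      # Syy = Σy squared - (Σy) squared / n
sxy = sum_xy - sum_x * sum_y / n   # Sxy = Σxy - (Σx)(Σy) / n

print(round(sxx, 3), round(syy, 3), round(sxy, 3))  # 159.733 151.333 -150.667
```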
How do we calculate the value of slope?
b1 = Sxy/Sxx
How do we calculate the value of the y intercept?
bo = ȳ - b1x̄
We are analyzing the relationship between the ages of used cars and their sale prices, and 15 cars are selected. Important values (ages in years, prices in $1000) are:
n = 15
Σx = 92
Σxsquared = 724
Σy = 125
Σysquared = 1193
Σxy = 616
Sxx = 159.733
Syy = 151.333
Sxy = -150.667
What is the value for the slope?
b1 = Sxy/Sxx
= -150.667/159.733
= -0.9432
We are analyzing the relationship between the ages of used cars and their sale prices, and 15 cars are selected. Important values (ages in years, prices in $1000) are:
n = 15
Σx = 92
Σxsquared = 724
Σy = 125
Σysquared = 1193
Σxy = 616
Sxx = 159.733
Syy = 151.333
Sxy = -150.667
b1 = -0.9432
What is the value for the intercept?
bo = ȳ - b1x̄
= (Σy)/n - b1*(Σx)/n
=(Σy - b1*Σx)/n
= (125-(-0.9432)*92)/15
= 14.118
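Slope and intercept can be computed together from the summary values. In this sketch the slope is not rounded before it is used, so the intercept comes out at roughly 14.1185; the card's 14.118 comes from plugging in the rounded slope -0.9432. Variable names are my own.

```python
# Summary values from the cards.
n, sum_x, sum_x2, sum_y, sum_xy = 15, 92, 724, 125, 616

sxx = sum_x2 - sum_x ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

b1 = sxy / sxx                       # slope: b1 = Sxy / Sxx
b0 = sum_y / n - b1 * (sum_x / n)    # intercept: b0 = y-bar - b1 * x-bar

print(b1, b0)
```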
We are analyzing the relationship between the ages of used cars and their sale prices, and 15 cars are selected. Important values (ages in years, prices in $1000) are:
n = 15
Σx = 92
Σxsquared = 724
Σy = 125
Σysquared = 1193
Σxy = 616
Sxx = 159.733
Syy = 151.333
Sxy = -150.667
b1 = -0.9432
What will happen to the cost of a car when it becomes 1 year older?
On average, its price will decrease by 0.9432 thousand dollars (about $943)
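One way to see the slope interpretation is to compare predictions one year apart. A small sketch using the rounded coefficients from the cards (the function name `predicted_price` is an assumption):

```python
B0, B1 = 14.118, -0.9432  # rounded intercept and slope from the cards, in $1000


def predicted_price(age_years):
    """Predicted sale price in $1000 for a car of the given age."""
    return B0 + B1 * age_years


# Difference between a 6-year-old and a 5-year-old car's predicted price.
change = predicted_price(6) - predicted_price(5)
print(change, change * 1000)
```

The difference equals the slope no matter which pair of consecutive ages is used, because the fitted relationship is a straight line.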
When we use the fitted regression equation to make a prediction, what should we avoid? What does this mean? Why do we avoid it?
We avoid extrapolation
This means predicting the value of a response variable when the value of the predictor variable is outside the observed range.
Because the linear relationship was only observed within the range of the data and may not hold outside it, so predictions far beyond either end can be nonsensical. For example, a car doesn’t become worth a negative amount of money just because it is now 20 years old. Cheap, yes; someone paying you to take it away, not likely :)
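The 20-year-old car can be checked numerically. This sketch uses the rounded coefficients from the cards and the observed age range of the sample (1 to 13 years); the function name `predict_price` and the idea of returning an in-range flag are my own choices, not part of the cards.

```python
B0, B1 = 14.118, -0.9432   # rounded intercept and slope from the cards, in $1000
MIN_AGE, MAX_AGE = 1, 13   # smallest and largest ages observed in the sample


def predict_price(age_years):
    """Return (predicted price in $1000, True if the age is inside the observed range)."""
    in_range = MIN_AGE <= age_years <= MAX_AGE
    return B0 + B1 * age_years, in_range


price_5, ok_5 = predict_price(5)      # interpolation: inside the observed range
price_20, ok_20 = predict_price(20)   # extrapolation: outside the observed range

print(price_5, ok_5)
print(price_20, ok_20)
```

The prediction at age 20 is negative, which is the kind of result extrapolation can produce.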
What is the correlation coefficient and what is it denoted by?
Denoted by r
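Measures the strength and direction of the linear relationship between the response variable and predictor variable
Values near -1 indicate a strong negative linear relationship (exactly -1 is a perfect one)
Values near 1 indicate a strong positive linear relationship (exactly 1 is a perfect one)
Values near 0 indicate no linear relationship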
What do the values of the correlation coefficient tell us about the relationship between the response variable and predictor variable?
Values near -1 indicate a strong negative linear relationship
Values near 1 indicate a strong positive linear relationship
Values near 0 indicate no linear relationship
How is the correlation coefficient (r) calculated?
r = Sxy/√(SxxSyy)
We are analyzing the relationship between the ages of used cars and their sale prices, and 15 cars are selected. Important values (ages in years, prices in $1000) are:
n = 15
Σx = 92
Σxsquared = 724
Σy = 125
Σysquared = 1193
Σxy = 616
Sxx = 159.733
Syy = 151.333
Sxy = -150.667
b1 = -0.9432
What is the correlation coefficient? What does this tell us about the relationship between age of the car and the sale price?
r = Sxy/√(SxxSyy)
= -150.667 / √(159.733*151.333)
= -0.969
There is a strong negative linear association so as the age increases, the sale price decreases
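The same calculation as a short sketch, using the rounded Sxx, Syy and Sxy from the cards:

```python
import math

# Rounded values from the cards.
sxx, syy, sxy = 159.733, 151.333, -150.667

r = sxy / math.sqrt(sxx * syy)   # r = Sxy / sqrt(Sxx * Syy)
print(round(r, 3))  # -0.969
```

The sign of r always matches the sign of Sxy (and of the slope), since the square root in the denominator is positive.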
What would an r=0.01 value tell us?
There is essentially no linear relationship
What would an r=0.5 value tell us?
That there is a positive linear relationship, but it’s not very strong
What would an r=1 value tell us?
A perfect positive linear relationship (all points fall exactly on a line with positive slope)
What would an r=-0.5 value tell us?
There is a negative linear relationship, but it’s not very strong
What would an r=-1 value tell us?
A perfect negative linear relationship (all points fall exactly on a line with negative slope)
What is the coefficient of determination and what is it denoted by?
Denoted as Rsquared
A method to evaluate the utility of a regression equation for making predictions. It measures the percentage of variation in the observed values of the response variable that can be explained by the regression model
How can we calculate the coefficient of determination (Rsquared)?
Rsquared = rsquared
The value is always between 0 and 1
What does the value of the coefficient of determination (Rsquared) tell us?
The value is always between 0 and 1. When the Rsquared value is near 1, it indicates that the regression model is useful for making predictions
What is the distribution of the response variable at a given predictor value called?
Conditional distribution, because the distribution of Y depends on the x value. Picture the 3D drawing with a distribution curve superimposed at each x value
We have a correlation coefficient value of r=-0.9691, what is the coefficient of determination?
Rsquared = rsquared
=(-0.9691)squared
=0.9392
= 93.92% of the variation in the observed y-values can be explained by the linear regression equation
The larger the Rsquared value is, the more useful the regression equation is for making predictions
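As a cross-check on the "explained variation" reading, the sketch below computes Rsquared two ways from the raw used-car data: as r squared, and as 1 minus (sum of squared residuals)/Syy. The second form is not in the cards; it is a standard identity for simple linear regression. Without intermediate rounding the value is about 0.9391; the card's 0.9392 comes from squaring the rounded r = -0.9691.

```python
import math

ages = [1, 1, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 10, 10, 13]
prices = [14, 13, 13, 10, 10, 9, 9, 7, 7, 8, 7, 6, 5, 4, 3]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(prices) / n

sxx = sum((x - mean_x) ** 2 for x in ages)
syy = sum((y - mean_y) ** 2 for y in prices)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, prices))

b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Way 1: square the correlation coefficient.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Way 2: fraction of total variation in y NOT left over in the residuals.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ages, prices))
explained_fraction = 1 - sse / syy

print(r_squared, explained_fraction)
```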
What are the mean and standard deviation of the conditional distribution called?
Conditional mean and conditional standard deviation
A single observation of the response variable is very unlikely to equal the conditional mean exactly. How do we build this into the regression model?
y = βo + β1x + ε
Where ε is the error term added to the model to capture the variation
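To get a feel for the error term, here is a small simulation sketch: several observations at the same x, each equal to the conditional mean plus a random ε. The parameter values (14.0, -1.0, 0.8), the normal error, and the function name `simulate_responses` are all assumptions chosen for illustration, not estimates from the cards.

```python
import random


def simulate_responses(beta0, beta1, sigma, xs, seed=0):
    """Draw y = beta0 + beta1*x + error for each x, with normal error of std dev sigma."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


xs = [5, 5, 5, 5]  # four hypothetical observations at the same predictor value

noisy = simulate_responses(14.0, -1.0, 0.8, xs)     # scattered around the conditional mean 9.0
no_noise = simulate_responses(14.0, -1.0, 0.0, xs)  # sigma = 0: every y sits on the line

print(noisy)
print(no_noise)
```

With sigma set to 0 every observation equals the conditional mean; with a positive sigma the observations scatter around it.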
What assumptions need to be met for making inferences about a linear regression equation?
Normal populations - each conditional distribution of Y should be normal
Linearity - the conditional mean of Y at X=x is βo + β1x
Equal standard deviations - the conditional standard deviations are the same for all values of x
Independent observations
What is a residual and what is it denoted by?
It is denoted by e
It is the difference between the observed value of y and the value predicted by the fitted line
It is an estimate of error ε
What is the equation for the residual?
ei = yi - (bo+b1xi)
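Applied to the used-car data, the residual formula can be sketched as follows. The slope and intercept are recomputed here without rounding, so the residuals sum to zero up to floating-point error (a general property of least-squares lines with an intercept, not something stated in the cards).

```python
ages = [1, 1, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 10, 10, 13]
prices = [14, 13, 13, 10, 10, 9, 9, 7, 7, 8, 7, 6, 5, 4, 3]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(prices) / n

sxx = sum((x - mean_x) ** 2 for x in ages)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, prices))

b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# e_i = y_i - (b0 + b1 * x_i): observed price minus price predicted by the fitted line.
residuals = [y - (b0 + b1 * x) for x, y in zip(ages, prices)]

for x, y, e in zip(ages, prices, residuals):
    print(x, y, round(e, 3))
```

A positive residual means the car sold for more than the line predicts for its age; a negative one means it sold for less.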