chapter 13: the simple linear regression model Flashcards
the simple linear regression model
the simple linear regression model assumes that the relationship between the dependent variable and independent variable can be approximated by a straight line
y: dependent variable
x: independent variable
what can we use to tentatively decide whether there is an approximate straight line relationship between x and y?
scatter plot
scatter diagram
what is the simple linear regression model formula?
y = B0 + B1x + E
contains the mean level Uy
the y intercept B0
the slope B1
the error term E
what is the mean level of the simple linear regression model formula?
Uy = B0 + B1x
the line of means
Uy is the mean of all the potential values of y at a given value of x
as x changes, Uy changes along the straight line Uy = B0 + B1x
the y intercept: B0
the slope: B1
the error term E
describes the effects on y of all factors other than the value of the independent variable x
can be positive, negative or 0
what does it mean for the error term E to be 0?
there is no difference between the observed y and the mean level Uy
what does it mean for the error term E to be bigger than 0?
the point lies above the line of means Uy = B0 + B1x
y is greater than the mean level Uy for the corresponding x value
what does it mean for the error term E to be lower than 0?
the point lies below the line of means Uy = B0 + B1x
y is less than the mean level Uy for the corresponding x value
what is the impact of B1 (the slope)
if B1 is positive, the regression line will go up
if B1 is negative, the regression line will go down
what are the regression parameters
the y intercept B0
the slope B1
true or false
we can interpret changes along the regression line as a change in the independent variable causing a change in the dependent variable
false
regression alone does not establish a causal effect of the independent variable on the dependent variable
we can only say that the two variables move together and that the independent variable contributes information for predicting the dependent variable
the least squares line
the estimated regression line that best fits the observed data (in the least squares sense)
y^ = b0 + b1x
y^: the predicted value of y
b0: point estimate of the y intercept B0
b1: point estimate of the slope B1
how is the predicted value of the dependent variable y found?
y^i = b0 + b1xi
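b1 = SSxy / SSxx, where SSxy = Σ(xi - x-)(yi - y-) and SSxx = Σ(xi - x-)^2
b0 = y- - b1x-, where y- and x- are the means of the observed y and x values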
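A minimal Python sketch of the two least squares formulas above, using only the standard library. The function name `least_squares` and the five-point dataset are hypothetical, made up for illustration.

```python
def least_squares(x, y):
    """Return (b0, b1) for the least squares line y^ = b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length (at least 2 points)")
    x_bar = sum(x) / n  # x-: mean of the x values
    y_bar = sum(y) / n  # y-: mean of the y values
    # SSxy = sum of (xi - x-)(yi - y-), SSxx = sum of (xi - x-)^2
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    if ss_xx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b1 = ss_xy / ss_xx       # b1 = SSxy / SSxx
    b0 = y_bar - b1 * x_bar  # b0 = y- - b1x-
    return b0, b1


# hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(f"y^ = {b0:.2f} + {b1:.2f}x")  # prints: y^ = 2.20 + 0.60x
```

Here x- = 3, y- = 4, SSxy = 6 and SSxx = 10, so b1 = 0.6 and b0 = 4 - 0.6(3) = 2.2.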
what is the residual of an observation?
ei = yi - y^i
yi: the observed y; y^i: the predicted y
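A small sketch of the residual formula. The function name `residuals` and the data are hypothetical; b0 = 2.2 and b1 = 0.6 are the least squares estimates for that made-up data.

```python
def residuals(x, y, b0, b1):
    """Return the residuals ei = yi - y^i for the line y^ = b0 + b1*x."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# hypothetical data and its least squares estimates
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
e = residuals(x, y, 2.2, 0.6)
print([round(ei, 2) for ei in e])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

Positive residuals are points above the fitted line, negative ones are below it. For a least squares line with an intercept, the residuals always sum to (essentially) zero.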
what is the experimental region
the range of previously observed values of the independent variable x
the point prediction of an individual value
the point prediction of an individual value of the dependent variable when the value of the independent variable is x0: y^ = b0 + b1x0
here we predict the error term to be 0
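A sketch of a point prediction that also flags extrapolation. The function name `point_prediction` is hypothetical, and it assumes the experimental region is simply the interval from the smallest to the largest observed x.

```python
def point_prediction(b0, b1, x0, observed_x):
    """Return (y_hat, inside): y_hat = b0 + b1*x0 (the error term is predicted
    to be 0), and inside says whether x0 lies in the experimental region,
    assumed here to be [min(observed_x), max(observed_x)]."""
    y_hat = b0 + b1 * x0
    inside = min(observed_x) <= x0 <= max(observed_x)
    return y_hat, inside


observed_x = [1, 2, 3, 4, 5]  # hypothetical observed x values
print(point_prediction(2.2, 0.6, 3.5, observed_x))
print(point_prediction(2.2, 0.6, 10, observed_x))  # second element is False
```

The second call still returns a number, but x0 = 10 lies outside the experimental region, so that prediction is an extrapolation.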
simple coefficient of determination
a measure of the potential usefulness of the simple linear regression model
r^2 (r squared)
explained variation / total variation
r^2 is never less than 0 and never greater than 1
the closer r^2 is to 1, the larger the proportion of the total variation that is explained by the simple linear regression model, and the better the model can predict y
how do you calculate the error of prediction in the simple coefficient of determination?
yi - y-
y- is the mean of the observed y values; use this error when we do not use the predictor x
yi - y^i when we do use the predictor x
what is the total variation?
the sum of squared prediction errors when we do not use the predictor x: Σ(yi - y-)^2
this quantity measures the total amount of variation exhibited by the observed values of y
explained variation + unexplained variation
what is the unexplained variation
another name for the SSE (sum of squared errors)
the sum of squared prediction errors when we use the predictor variable x
quantity that measures the amount of variation in the values of y that is not explained by the predictor variable
total variation - explained variation
explained variation
total variation - unexplained variation
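The three kinds of variation and r^2 can be computed together. This is a minimal sketch; the function name `variation` and the data are hypothetical, and it assumes the y values are not all equal (otherwise the total variation is 0 and r^2 is undefined).

```python
def variation(x, y):
    """Return (total, explained, unexplained, r2) for the least squares fit of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    # total variation: squared prediction errors when we do NOT use x (predict y-)
    total = sum((yi - y_bar) ** 2 for yi in y)
    # unexplained variation (SSE): squared prediction errors when we DO use x
    unexplained = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    explained = total - unexplained
    r2 = explained / total  # simple coefficient of determination
    return total, explained, unexplained, r2


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
print([round(v, 4) for v in variation(x, y)])  # [6.0, 3.6, 2.4, 0.6]
```

So for this made-up data, 60% of the total variation in y is explained by the model.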
what is the best way to gauge prediction accuracy?
by calculating a prediction interval
the simple correlation coefficient r
a measure of the strength and direction of the linear relationship between x and y
r = +sqrt(r^2) if b1 is positive
r = -sqrt(r^2) if b1 is negative
why can r be negative but not r^2
cause bruv, something ^2 can’t be negative
r though, it can be negative
stays between -1 and 1
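A sketch of the sign rule for r. The function name `correlation` and the data are hypothetical. It uses the identity explained variation = b1 * SSxy, which holds for a least squares line, so r^2 = b1 * SSxy / (total variation).

```python
import math


def correlation(x, y):
    """Return r: +sqrt(r^2) if b1 > 0 and -sqrt(r^2) if b1 < 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)  # total variation
    b1 = ss_xy / ss_xx
    r2 = (b1 * ss_xy) / ss_yy  # explained / total
    # give sqrt(r^2) the sign of the slope b1
    return math.copysign(math.sqrt(r2), b1)


x = [1, 2, 3, 4, 5]  # hypothetical data
print(round(correlation(x, [2, 4, 5, 4, 5]), 4))  # positive slope, r > 0
print(round(correlation(x, [5, 4, 5, 4, 2]), 4))  # negative slope, r < 0
```

Both datasets have r^2 = 0.6; only the sign of b1 decides whether r is +sqrt(0.6) or -sqrt(0.6).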
what does it mean for x and y to be highly related and positively correlated
r is near 1
what does it mean for x and y to be highly related and negatively correlated
r is near -1
what are the regression assumptions?
- at any given value of x, the population of potential error term values has a mean equal to 0
- there is a constant variance assumption
at any value of x, the population of potential error term values has a variance that does not depend on the value of x
error term values per x values have equal variances
- normality assumption
- independence assumption
what does it mean for the different populations of potential error term values corresponding to different values of x to have equal variances?
at any value of x, the population of potential error term values has a variance that does not depend on the value of x
what's the normality assumption (3) of the regression assumptions?
at any given value x, the population of error term values has a normal distribution
what's the independence assumption (4) of the regression assumptions?
any one value of the error term E is statistically independent of any other value of E
the error term for one observed y is independent of the error term for any other observed y
they don’t affect each other
what do the overall regression assumptions mean?
for every value of x, the population of potential error term values is normally distributed
the mean of the population of error terms is 0
the variance does not depend on the value of x
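One way to see what the assumptions say is to generate data that satisfies them. This sketch draws each E independently from a normal distribution with mean 0 and the same standard deviation at every x; the function name `simulate` and the "true" values B0 = 2, B1 = 0.5, sigma = 1 are hypothetical.

```python
import random


def simulate(x_values, beta0, beta1, sigma, seed=None):
    """Generate y = B0 + B1*x + E, with each E drawn independently from a
    normal distribution with mean 0 and standard deviation sigma (the same
    sigma at every x), so all four regression assumptions hold."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in x_values]


# hypothetical true parameters: B0 = 2, B1 = 0.5, sigma = 1
x_values = [1, 2, 3, 4, 5] * 200
y_values = simulate(x_values, 2.0, 0.5, 1.0, seed=42)
errors = [y - (2.0 + 0.5 * x) for x, y in zip(x_values, y_values)]
print(sum(errors) / len(errors))  # close to 0, the assumed error mean
```

Real data need not behave this way; the simulation only shows what the model assumes.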
why do we predict the mean of the population of error terms to be 0?
because it has a 50% chance of being positive and a 50% chance of being negative (the normal distribution of error terms is symmetric about 0)
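what is the mean square error?
the point estimate s^2 of the common variance of the error term populations: s^2 = SSE / (n - 2)
what is the standard error?
the point estimate s of the common standard deviation of the error term populations: s = sqrt(s^2)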
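A sketch of the mean square error and standard error, dividing the SSE by n - 2 because two parameters (b0 and b1) were estimated from the data. The function name `standard_error` and the data are hypothetical.

```python
import math


def standard_error(x, y):
    """Return (s2, s): mean square error s^2 = SSE / (n - 2) and standard error s."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points so that n - 2 is positive")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)  # point estimate of the error variance
    return s2, math.sqrt(s2)


s2, s = standard_error([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(round(s2, 4), round(s, 4))  # 0.8 0.8944
```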
which line best fits the observed data, and why?
the least squares regression line
it is the line that minimizes the sum of the squared residuals
why is it dangerous to extrapolate out of the experimental region?
because we do not know that x and y have a linear relationship outside the experimental region
explain what the total variation, explained variation, and unexplained variation are
The total variation is the sum of the squared prediction errors when we do not use the predictor x
The unexplained variation is the sum of the squared prediction errors when we do use the predictor x
The explained variation measures the improvement in the fit when we do use the predictor x
when is a simple linear regression model useful?
when there is a significant relationship between x and y
how do we test the significance of the relationship between x and y?
with the null hypothesis H0: B1 = 0
this says there is no change in the mean value of y associated with an increase in x
Ha: B1 =/= 0
if the regression assumptions hold, how many degrees of freedom does the t distribution have (cause you gotta use t distribution)
n - 2
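A sketch of the t statistic for H0: B1 = 0, using t = b1 / s_b1 with s_b1 = s / sqrt(SSxx). The function name `slope_t_test` and the data are hypothetical; the critical value 3.182 is t_0.025 with 3 degrees of freedom, read from a t table.

```python
import math


def slope_t_test(x, y):
    """Return (b1, t, df) for testing H0: B1 = 0 against Ha: B1 =/= 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error
    s_b1 = s / math.sqrt(ss_xx)   # standard error of b1
    return b1, b1 / s_b1, n - 2


b1, t, df = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
# two-sided test at the 0.05 level: reject H0 if |t| > t_0.025 (3 df) = 3.182
print(round(t, 4), df, abs(t) > 3.182)  # 2.1213 3 False
```

With only five made-up points, t = 2.12 is not large enough to reject H0 at the 0.05 level.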
is there a difference between using a one sided or two sided critical value or p-value?
nah boy
at what significance level do we see very strong evidence that the regression relationship is significant?
0.01 significance level
at what significance level do we see strong evidence that the regression relationship is significant?
0.05 significance level
how do we test the significance of the y intercept
H0: B0 = 0
Ha: B0 =/= 0
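how do we reject H0 in favor of Ha when testing the significance of the y intercept?
by setting the probability of a type 1 error equal to α and rejecting H0 if |t| = |b0 / sb0| is greater than the critical value tα/2 (with n - 2 degrees of freedom), or equivalently if the p-value is less than α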
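A sketch of the intercept test, using t = b0 / s_b0 with s_b0 = s * sqrt(1/n + x-^2 / SSxx). The function name `intercept_t_test` and the data are hypothetical; 3.182 is t_0.025 with 3 degrees of freedom from a t table.

```python
import math


def intercept_t_test(x, y):
    """Return (b0, t, df) for testing H0: B0 = 0 against Ha: B0 =/= 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    s_b0 = s * math.sqrt(1 / n + x_bar ** 2 / ss_xx)  # standard error of b0
    return b0, b0 / s_b0, n - 2


b0, t, df = intercept_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(round(t, 4), df, abs(t) > 3.182)  # 2.3452 3 False
```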
what is the f test
another way of checking the significance of the slope
checking if H0: B1 = 0
how do you do the f test?
F = (explained variation) / ((unexplained variation) / (n - 2)), compared with an F distribution with 1 numerator and n - 2 denominator degrees of freedom
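A sketch of the F statistic for H0: B1 = 0. The function name `f_statistic` and the data are hypothetical, and it assumes the fit is not perfect (a zero SSE would make F undefined). In simple linear regression F equals the square of the slope's t statistic, so the F test and the two-sided t test agree.

```python
def f_statistic(x, y):
    """Return (F, df1, df2) with F = explained / (unexplained / (n - 2))."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    total = sum((yi - y_bar) ** 2 for yi in y)
    unexplained = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # SSE
    explained = total - unexplained
    return explained / (unexplained / (n - 2)), 1, n - 2


f, df1, df2 = f_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(round(f, 4), df1, df2)  # 4.5 1 3
```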
what is the difference between a confidence interval and a prediction interval?
A confidence interval is intended to capture the mean value of y
based on the standard error sy^
A prediction interval is intended to capture an individual observation of y
based on the standard error s(y - y^)
what does the distance value measure
The distance between x0 and x-
x0: value of x corresponding to a certain point estimate or point prediction
x-: mean of the observed x values
how does the distance value affect confidence intervals
the bigger the distance value, the wider the confidence interval
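A sketch of both intervals at x0, assuming the usual forms: distance value = 1/n + (x0 - x-)^2 / SSxx, sy^ = s * sqrt(distance value), and s(y - y^) = s * sqrt(1 + distance value). The function name `intervals` and the data are hypothetical, and the caller supplies the t critical value (3.182 is t_0.025 with 3 degrees of freedom, read from a t table).

```python
import math


def intervals(x, y, x0, t_crit):
    """Return (y_hat, distance_value, confidence_interval, prediction_interval)
    at x = x0; t_crit is t_(alpha/2) with n - 2 degrees of freedom."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x0
    dv = 1 / n + (x0 - x_bar) ** 2 / ss_xx    # distance value
    ci_half = t_crit * s * math.sqrt(dv)      # uses sy^ = s * sqrt(dv)
    pi_half = t_crit * s * math.sqrt(1 + dv)  # uses s(y - y^) = s * sqrt(1 + dv)
    ci = (y_hat - ci_half, y_hat + ci_half)
    pred = (y_hat - pi_half, y_hat + pi_half)
    return y_hat, dv, ci, pred


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
for x0 in (3, 5):
    y_hat, dv, ci, pred = intervals(x, y, x0, 3.182)
    print(x0, round(dv, 2), round(ci[1] - ci[0], 3), round(pred[1] - pred[0], 3))
```

Moving x0 from x- = 3 out to 5 raises the distance value from 0.2 to 0.6, which widens both intervals; the prediction interval is always the wider of the two because of the extra 1 under the square root.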