Final session 11b Flashcards
When do we use simple linear regression?
When we want to summarize the linear relationship between two variables, X and Y
How do we do simple linear regression?
We can do this by drawing a straight line on the scatterplot
What is a straight line on the scatterplot called?
regression line
The regression line is a straight line that describes what?
How Y changes as X changes
For a given observation i, what is our simple linear regression equation?
Yi = b0 + b1X1i + ei
Explain the parts of Yi = b0 + b1X1i + ei
Yi is the value of the DV for observation i
X1i is the value of the IV for observation i
b0 is the intercept and b1 is the slope
ei is the residual for observation i
What are residuals called in the population?
errors
What is assumed about errors?
They are assumed to be normally distributed with constant variance
How to determine the best regression line?
The “best” regression line is one that has the smallest residuals
What is a residual?
The “vertical” difference between the regression line and each data point
What method is typically used to determine the best regression line?
Method of least squares
What is the method of least squares?
The most common method for finding the best regression line. The least-squares regression line of Y on X1 is the line that makes the sum of squared residuals (SSR) as small as possible
The least-squares regression line of Y on X1 is determined in such a way that it makes SSR _____
as small as possible
b0 and b1 are determined to do what?
minimize SSR
What is the goal of least squares?
To minimize the portion of Y that is left unexplained by the regression line
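As a rough illustration of the least-squares cards above, the sketch below computes b0 and b1 with the standard closed-form formulas for one predictor and checks that nudging either value only makes SSR larger. The data and the names `ss_residual` and `ssr_nudged` are made up for this example and are not part of the cards.

```python
# Hypothetical toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form least-squares estimates for a single predictor.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
b1 = s_xy / s_xx              # slope
b0 = mean_y - b1 * mean_x     # intercept


def ss_residual(intercept, slope):
    """Sum of squared residuals (SSR in the cards' notation) for a candidate line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


ssr_best = ss_residual(b0, b1)

# Moving either coefficient away from the estimates should only increase SSR.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
ssr_nudged = [ss_residual(b0 + d0, b1 + d1) for d0, d1 in nudges]

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SSR = {ssr_best:.2f}")
print("every nudged line has larger SSR:", all(s > ssr_best for s in ssr_nudged))
```

Note that SSR here means the residual sum of squares, as in these cards; some textbooks use SSR for the regression sum of squares instead.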
How is simple regression used for prediction?
Using our regression line, if we only had a value on X1, we could predict the value of Y
Plug the value of the predictor(s) into the equation for the regression line
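A minimal sketch of the "plug in" step: the function below returns the predicted Y for a given X1. The coefficient values and the name `predict` are hypothetical, chosen only to show the arithmetic.

```python
def predict(x_new, b0, b1):
    """Return the predicted Y for a given X1 value: Y-hat = b0 + b1 * X1."""
    return b0 + b1 * x_new


# Hypothetical coefficients, chosen only for illustration.
b0, b1 = 2.2, 0.6
y_hat = predict(6, b0, b1)
print(f"Predicted Y at X1 = 6: {y_hat:.1f}")
```

The prediction has no residual term because ei is unknown for a new observation; the line gives the expected value of Y at that X1.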
What is the statistical test for significance of the slope for?
Often it is of interest to test the relationship between Y and a predictor variable
If we call β1 the population value for b1, is β1 equal to zero in the population?
Statistical test for significance of the slope: give the hypotheses
Typically we are testing this hypothesis:
H0: β1 = 0
There is no linear relationship between X1 and Y (no effect of X1 on Y)
H1: β1 ≠ 0
There is a linear relationship between X1 and Y
To test H0 for a slope, what do we use?
A t-test of the form:
t = (b1 − 0) / sb1, where sb1 is the standard error of b1
What is the df for the statistical test for significance of the slope?
dfR = N − 2
If the absolute value of the observed t is greater than the critical value of t with dfR = N − 2 (and α = .05, two-tailed), we may reject the null hypothesis
What does this indicate?
that the slope is significantly different from zero, suggesting a statistically significant effect of X1 on Y
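The slope test can be sketched numerically as follows. The cards do not give a formula for sb1; this example assumes the usual one for a single predictor, sb1 = sqrt(MSR / Σ(X − mean of X)²), with MSR = SSR / dfR. The data are made up, and the critical value 3.182 is the two-tailed t-table value for df = 3 at α = .05.

```python
import math

# Hypothetical toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
s_xx = sum((xi - mean_x) ** 2 for xi in x)
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / s_xx
b0 = mean_y - b1 * mean_x

# Residual sum of squares and its degrees of freedom (dfR = N - 2).
ss_resid = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
df_r = n - 2
ms_resid = ss_resid / df_r

# Assumed standard-error formula for the slope with one predictor.
se_b1 = math.sqrt(ms_resid / s_xx)
t_obs = (b1 - 0) / se_b1

# Two-tailed critical t for df = 3, alpha = .05, taken from a standard t table.
t_crit = 3.182
reject_h0 = abs(t_obs) > t_crit

print(f"t({df_r}) = {t_obs:.3f}, reject H0: {reject_h0}")
```

With only five made-up observations the test fails to reject H0 even though the fitted slope is positive, which shows how strongly small samples limit power.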
For the t-test to provide accurate results, the following assumptions are required:
Relationship between predictor and outcome is linear
Independent observations
Homoscedasticity
Normally distributed errors
What is homoscedasticity?
Variance of errors does not depend on the value of the predictor, X1
(Simple linear regression) As in ANOVA, we can also divide the variance (or variation) in the DV (Y) into different parts resulting from different sources
In regression analysis, the total variation in Y is partitioned into:
SSM: The variation in Y that is explained by the model (i.e., the regression line)
SSR: The variation in Y that is unexplained by the regression line (i.e., the residuals)
SST: The total amount of variation in Y
SST = SSM + SSR
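A quick numerical check of the partition, using made-up data and a least-squares fit. The variable names are illustrative only; `ss_r` is the residual sum of squares, matching the cards' SSR.

```python
# Hypothetical toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
b0 = mean_y - b1 * mean_x
y_hat = [b0 + b1 * xi for xi in x]

ss_t = sum((yi - mean_y) ** 2 for yi in y)               # total variation in Y
ss_m = sum((yh - mean_y) ** 2 for yh in y_hat)           # explained by the line
ss_r = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # residual (unexplained)

print(f"SST = {ss_t:.2f}, SSM = {ss_m:.2f}, SSR = {ss_r:.2f}")
print("SST equals SSM + SSR:", abs(ss_t - (ss_m + ss_r)) < 1e-9)
```

The identity holds exactly only for the least-squares line with an intercept; for an arbitrary line the two parts need not add up to SST.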
For simple linear regression, the F statistic can be used for testing what?
whether the model overall significantly predicts the dependent variable
For the F test in the case of a single predictor, what are the hypotheses?
H0: β1 = 0
H1: β1 ≠ 0
If the observed F ratio is greater than a critical value of F with dfM and dfR at α, we may _____ H0
reject
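The cards do not spell out the F ratio, so this sketch assumes the usual definition F = MSM / MSR, with MSM = SSM / dfM, MSR = SSR / dfR, dfM = 1 (one predictor) and dfR = N − 2. The data are made up, and 10.13 is the F-table critical value for (1, 3) degrees of freedom at α = .05.

```python
# Hypothetical toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
b0 = mean_y - b1 * mean_x
y_hat = [b0 + b1 * xi for xi in x]

ss_m = sum((yh - mean_y) ** 2 for yh in y_hat)           # model sum of squares
ss_r = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # residual sum of squares

df_m = 1        # number of predictors
df_r = n - 2    # residual degrees of freedom

# Assumed definition: F is the ratio of the two mean squares.
f_obs = (ss_m / df_m) / (ss_r / df_r)

# Critical F for (1, 3) df at alpha = .05, taken from a standard F table.
f_crit = 10.13
reject_h0 = f_obs > f_crit

print(f"F({df_m}, {df_r}) = {f_obs:.2f}, reject H0: {reject_h0}")
```

With a single predictor, F equals the square of the slope's t statistic, so the F test and the t-test of β1 = 0 always agree.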
Explain the coefficient of determination (R2)
The proportion of the total variation in Y accounted for by the model: R2 = SSM / SST
The coefficient of determination (R2) ranges from 0 to 1: explain
The larger R2, the more variance of the DV is explained
0 = No explanation
1 = Perfect explanation
In simple regression, the relationship with Pearson correlation (r) is:
r2 = R2
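To see that r2 = R2 in simple regression, the sketch below computes R2 as SSM / SST from a least-squares fit and Pearson's r directly from the same made-up data. All names and numbers are illustrative only.

```python
import math

# Hypothetical toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)   # this is also SST

# R2 from the regression: SSM / SST.
b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x
ss_m = sum((b0 + b1 * xi - mean_y) ** 2 for xi in x)
r_squared = ss_m / s_yy

# Pearson correlation computed directly from the data.
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"R2 = {r_squared:.3f}, r = {r:.3f}, r squared = {r ** 2:.3f}")
```

The equality is specific to one predictor; with several predictors R2 is the squared correlation between Y and the predicted values instead.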
R2 is _____ high; an adjusted value, R2adj, is _____
biased; approximately unbiased
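The cards do not say how R2adj is computed. One commonly used formula, assumed here, is R2adj = 1 − (1 − R2)(N − 1) / (N − k − 1), where k is the number of predictors. The function name `adjusted_r2` and the input values are hypothetical.

```python
def adjusted_r2(r2, n, k=1):
    """Adjusted R2 under the assumed formula 1 - (1 - R2)(N - 1) / (N - k - 1).

    r2: unadjusted coefficient of determination
    n:  number of observations
    k:  number of predictors (1 in simple regression)
    """
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical values: R2 = .60 from N = 5 observations and one predictor.
r2_adj = adjusted_r2(0.6, n=5, k=1)
print(f"R2 = 0.600, adjusted R2 = {r2_adj:.3f}")
```

The adjustment shrinks R2 more when N is small relative to the number of predictors, and it can fall below zero when R2 is close to zero.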