Simple Linear Regression Flashcards
What is the goal of regression?
To predict Y (outcome variable) from X (predictor).
Which variable is fixed in a regression equation?
X is a fixed variable and Y is always the random variable.
T or F: there is no sampling error involved in Y.
Why or why not?
F. There is no sampling error involved in X because X is a fixed variable; Y, being a random variable, does involve sampling error.
Concerning population parameters to go along with sample statistics in a simple linear regression, what do we predict Y from?
We predict the outcome Y from β0 (the intercept) plus β1 (the slope) multiplied by the predictor, plus ε (epsilon), the population error term: Y = β0 + β1X + ε. The sample counterpart of ε is the residual, e.
What is our purpose in modeling error?
Our purpose is to find the line that best summarizes the relationship between X and Y.
What is error called for population and for sample ?
What is model error?
Epsilon (ε) for the population; e, or the residual, for the sample.
It is the amount by which individual data points deviate from the model line.
Define sampling error.
The difference between a population parameter and a sample statistic.
What is the goal in simple linear regression, in regards to the line?
Our goal is to find the best-fitting line.
We are trying to find, from all possible lines, the one that results in the least difference between the observed data and the line.
How do we use the regression line to predict values?
We fit a statistical model to the data in the form of a straight line. This line is the line that BEST FITS the pattern of data.
What does y-hat indicate?
The line itself. It is notated as y-hat (ŷ) to mark that it is the prediction, distinct from the observed data, which contain error.
Which contains error: The line or the model?
The model contains error.
Why is Y-hat considered a predictive Y?
What does this have to do with residuals?
Y-hat is considered a predictive Y because it signifies the Y-values that are predicted from the line.
The difference between what’s predicted from the line and the observed value (Y) is the residual.
What is y-hat’s equation?
ŷ = b0 + b1X
b0 = intercept, b1 = slope, X = predictor value
How do we compute the correlations for a simple linear regression in R? What does it produce?
rcorr(as.matrix(dataset))
It produces the correlation (r), the n, and the p-value for each pair of variables.
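A minimal sketch of that call, assuming the Hmisc package (which provides rcorr) and a small hypothetical dataset whose variable names are illustrative:

    library(Hmisc)   # provides rcorr()
    # hypothetical data: physical health problems and doctor visits
    dat <- data.frame(problems = c(1, 3, 2, 5, 4),
                      visits   = c(1, 2, 2, 4, 3))
    rcorr(as.matrix(dat))   # prints r, n, and the p-value for each pair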
What information do we need to create a regression equation?
We need to fill in the intercept (b0) and the slope (b1), so we need to determine the line of best fit.
How is regression conceptually similar to ANOVA?
With an ANOVA, we compared MS-between and MS-within; we wanted to minimize MS-within (error), and we want to do the same with regression by making the error as small as possible.
Before, we wanted to see how points deviated from the mean; now we want to see how each point deviates from the regression line.
Why do we create a sum of squares for a simple linear regression equation?
We want to minimize the sum of the squared residuals (OLS solution).
Each point's distance from the line gives a residual, and we add them up. The problem is that the distances of points above the line match the distances of points below the line, so the sum comes to zero. So we must square.
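A quick illustration of why raw residuals cannot just be summed, using lm() on hypothetical data:

    x <- c(1, 3, 2, 5, 4)     # hypothetical predictor values
    y <- c(1, 2, 2, 4, 3)     # hypothetical outcome values
    fit <- lm(y ~ x)
    sum(residuals(fit))       # essentially 0: above and below the line cancel
    sum(residuals(fit)^2)     # positive: the quantity OLS actually minimizes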
What are the similarities and differences between a correlation and the simple linear regression?
If there’s just 1 predictor, we see a lot of similarities. The only thing that changes is how we treat the variables (prediction vs. description).
After we run the rcorr function in R and we see a significant result for a predictor, what do we do next?
Since the variables are highly correlated, we can go on to predict the direction of that relationship.
Why do we compute the Ordinary least squares (OLS) solution? What is the criterion to be minimized in OLS?
We compute the OLS solution because it's the estimation procedure for regression that minimizes the sum of the squared residuals.
As long as we can put numbers to b0 and b1, what information can we find?
Provide an example if b0 = 1 and b1 = 4.
If b0 equals 1 and b1 equals 4, I can fill those into my regression equation: ŷ = 1 + 4X. Filling in all of my X values gives the ŷ values; the difference between Y and ŷ gives the residuals; and once I have the residuals, I can compute the sum of squared residuals.
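A sketch of that worked example in R, using the card's b0 = 1 and b1 = 4 with hypothetical X and Y values:

    b0 <- 1; b1 <- 4           # values from the card
    x <- c(0, 1, 2, 3)         # hypothetical predictor values
    y <- c(2, 4, 10, 12)       # hypothetical observed outcomes
    y_hat <- b0 + b1 * x       # predicted values: 1, 5, 9, 13
    resid <- y - y_hat         # residuals: 1, -1, 1, -1
    sum(resid^2)               # sum of squared residuals: 4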
Explain what a derivative is and why we use it.
The derivative finds, at a particular location on a curve, the slope of the straight line tangent to the curve there. We can never get the slope of a curve directly, but we can find the straight line that touches the curve at exactly one point, and the slope of that tangent line tells us how good our model is at that point.
After the variety of guesses the computer makes, we ultimately end up at the minimum of the sum-of-squared-residuals function. The way we know we are at the minimum is that the line tangent to the curve there has a slope of zero.
We take the function and set its derivative to zero; by the magic of calculus, we get equations for b0 and b1.
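The calculus result can be checked numerically: minimizing the sum of squared residuals by brute force over (b0, b1) lands on the same estimates lm() gives. A sketch with hypothetical data:

    x <- c(1, 3, 2, 5, 4); y <- c(1, 2, 2, 4, 3)       # hypothetical data
    ss <- function(b) sum((y - (b[1] + b[2] * x))^2)   # SS residual as a function of (b0, b1)
    optim(c(0, 0), ss)$par   # numeric minimum of the function
    coef(lm(y ~ x))          # closed-form OLS estimates; the two agree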
Why is SSy (sum of squares y) missing in the bivariate information in SL regression?
Compare this to the correlation equation.
In the correlation equation, we divided SSCPxy by the square root of SSx times SSy (r = SSCPxy / √(SSx · SSy)) because we were interested in how our variables related to each other after removing what is unique to each variable.
When I'm interested in predicting Y, I don't care about how Y varies with itself, only about how Y varies along with X. We aren't interested in the univariate information of Y, just in how the two variables relate to each other after removing what is unique to X, so I divide by SSx alone and keep everything that X shares with Y.
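The contrast shows up directly in the formulas. A sketch computing both from the same hypothetical data:

    x <- c(1, 3, 2, 5, 4); y <- c(1, 2, 2, 4, 3)    # hypothetical data
    sscp_xy <- sum((x - mean(x)) * (y - mean(y)))
    ss_x    <- sum((x - mean(x))^2)
    ss_y    <- sum((y - mean(y))^2)
    sscp_xy / sqrt(ss_x * ss_y)   # r: standardizes by both SSx and SSy
    sscp_xy / ss_x                # b1: SSy drops out when predicting Y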
Conceptually, what are b1 and b0?
What is the equation for both?
b1 (slope) tells me for every 1 unit increase in X, how much Y changes. It tells me the change in Y based on changes in X.
b1 = SSCPxy / SSx
b0 (intercept) is the mean of Y minus b1 times the mean of the predictor.
To obtain b0, we compute the slope first; then we multiply it by the mean of X and subtract that product from the mean of Y.
b0 = Y-bar - b1(X-bar)
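A sketch computing b1 and b0 by hand from these formulas and checking them against lm(), with hypothetical data:

    x <- c(1, 3, 2, 5, 4); y <- c(1, 2, 2, 4, 3)                     # hypothetical data
    b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # SSCPxy / SSx
    b0 <- mean(y) - b1 * mean(x)                                     # Y-bar - b1(X-bar)
    c(b0, b1)                                                        # matches coef(lm(y ~ x))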
If b1 equals 0.758, what does this mean?
Put this into context when predicting the number of doctor visits and health problems.
For every 1 unit increase in X, Y changes by .758.
For every 1 additional health problem, the number of doctor visits goes up by .758.
In the context of health problems on doctor visits, what does the b0, or the intercept formula, tell us exactly?
It tells us if we had NO physical health problems, we would expect to go to the doctor .036 times (almost zero times).
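Plugging the card's estimates into the prediction equation shows both interpretations at once:

    b0 <- 0.036; b1 <- 0.758    # estimates from the card
    problems <- 0:4             # number of physical health problems
    b0 + b1 * problems          # expected doctor visits: 0.036 at zero problems,
                                # rising by 0.758 per additional problem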
What does the output for regression in R show?
What is the function?
The function is lm(); the fitted model is conventionally stored in an object called fit, and summary(fit) prints the output.
The output for regression in R shows the residuals and the coefficients (b0 and b1).
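A minimal sketch of that step, using hypothetical data with illustrative variable names:

    dat <- data.frame(problems = c(1, 3, 2, 5, 4),   # hypothetical data
                      visits   = c(1, 2, 2, 4, 3))
    fit <- lm(visits ~ problems, data = dat)
    summary(fit)   # prints the Residuals section and the Coefficients table
                   # (the intercept row is b0; the problems row is b1)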
What are the interpretations of b0 and b1 regression equations?
Utilize doctor visits and health problems in both interpretations:
b0 = .036 b1 = .758
b0 interpretation:
The expected number of the DV (outcome) is (b0 value) when no IV (predictor) has been reported.
ex) The expected number of doctor visits is .036 when no physical health problems have been reported.
b1 interpretation:
The number of the DV is expected to increase by (b1 value) for every additional unit of the IV.
ex) The expected number of doctor visits is expected to increase by .758 for every additional physical health problem.
How do we find the best fit of a line?
By setting the derivative to 0 and solving for the two unknowns, b0 and b1, we find the best fit of the line.
Based on OLS, what is the equation for the residual quantity we minimize, given the information we have?
The SS residual (Σe²); on the information tables, the formula is Σ(Y − ŷ)².
Which of the following indicates the best fitting line:
y, ŷ, SSCPxy, b0, b1
ŷ: computing b0 + b1X for each observation gives the best-fitting line.
What properties of regression equations are given as true BASED on the OLS solution?
- Sum of the residuals = 0: Σ(y − ŷ) = 0.
- Sum of the squared residuals is at a minimum: Σ(y − ŷ)² is as small as possible.
- Sum of the observed values equals the sum of the fitted values: Σy = Σŷ.
- The regression line always goes through the point (X̄, Ȳ).
- Residuals are uncorrelated with the predictor: the correlation between e and X is zero.
- The fitted Y value is less extreme on Y than the associated X value is on X (this property is called regression toward the mean).
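Several of these properties can be verified directly on a fitted model. A sketch with hypothetical data:

    x <- c(1, 3, 2, 5, 4); y <- c(1, 2, 2, 4, 3)     # hypothetical data
    fit <- lm(y ~ x)
    sum(residuals(fit))                              # ~0: residuals sum to zero
    sum(fitted(fit)) - sum(y)                        # ~0: fitted and observed sums match
    predict(fit, newdata = data.frame(x = mean(x)))  # equals mean(y): line passes through (X-bar, Y-bar)
    cor(residuals(fit), x)                           # ~0: residuals uncorrelated with predictor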