Simple Linear Regression Flashcards

1
Q

What is the goal of regression?

A

To predict Y (outcome variable) from X (predictor).

2
Q

Which variable is fixed in a regression equation?

A

X is a fixed variable and Y is always the random variable.

3
Q

T or F: there is no sampling error involved in Y.

Why or why not?

A

F. X involves no sampling error because X is a fixed variable; Y is a random variable, so it does involve sampling error.

4
Q

In terms of the population parameters that go along with the sample statistics in a simple linear regression, what do we predict Y from?

A

We predict the outcome Y from beta-naught (the intercept) plus beta-1 (the slope) multiplied by the predictor, plus epsilon, the population error term: Y = β0 + β1X + ε. (The sample counterpart of epsilon is the residual, e.)

5
Q

What is our purpose in modeling error?

A

Our purpose is to find the line that best summarizes the relationship between X and Y.

6
Q

What is error called for the population and for the sample?

What is model error?

A

Epsilon for population, e or residual for sample.

Model error is the amount by which each person's observed score deviates from the model line.

7
Q

Define sampling error.

A

The difference between a population parameter and a sample statistic.

8
Q

What is the goal in simple linear regression, with regard to the line?

A

Our goal is to find the line of best fit.

We are trying to find, out of all possible lines, the one that results in the least difference between the observed data and the line.

9
Q

How do we use the regression line to predict values?

A

We fit a statistical model to the data in the form of a straight line. This line is the line that BEST FITS the pattern of data.

10
Q

What does y-hat indicate?

A

Y-hat denotes the line itself - the hat marks that these predicted values are distinct from the observed values, which contain error.

11
Q

Which contains error: The line or the model?

A

The model contains error.

12
Q

Why is Y-hat considered a predicted Y?

What does this have to do with residuals?

A

Y-hat is a predicted Y because it signifies the Y-values that are predicted from the line.

The difference between what’s predicted from the line and the observed value (Y) is the residual.

13
Q

What is y-hat’s equation?

A

ŷ = b0 + b1X

b0 = intercept
b1 = slope
X = predictor value
14
Q

How do we compute the correlation for a simple linear regression in R? What does it produce?

A

rcorr(as.matrix(dataset))

Produces r (the correlation), n, and the p-value for each pair of variables.
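rcorr() comes from the Hmisc package. A minimal sketch of the call, assuming a hypothetical data frame named dataset with numeric columns:

library(Hmisc)                 # provides rcorr()
rcorr(as.matrix(dataset))      # prints matrices of r, n, and p-values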

15
Q

What information do we need to create a regression equation?

A

We need to fill in the intercept (b0) and slope (b1) - so we need to determine the line of best fit.

16
Q

How is regression conceptually similar to ANOVA?

A

With an ANOVA, we compared MSbetween and MSwithin - we wanted to minimize MSwithin (error), and we want to do the same with regression by making the error as small as possible.

Before, we wanted to see how points deviated from the mean, but now we want to see how each point deviates from the regression line.

17
Q

Why do we create a sum of squares for a simple linear regression equation?

A

We want to minimize the sum of the squared residuals (OLS solution).

Each point's distance from the line gives a residual, and we add them up. The problem is that the distances of points above the line cancel the distances of points below the line, so the sum comes to zero. So we must square the residuals before summing.
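A small sketch of why the squaring is needed, using made-up x and y vectors (assumptions, not the deck's data):

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
fit <- lm(y ~ x)
sum(resid(fit))     # ~0: residuals above and below the line cancel
sum(resid(fit)^2)   # > 0: the sum of squared residuals that OLS minimizes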

18
Q

What are the similarities and differences between a correlation and the simple linear regression?

A

If there’s just 1 predictor, we see a lot of similarities. The only thing that changes is how we treat the variables (prediction vs. description).

19
Q

After we run the rcorr function in R and see a significant result for a predictor, what do we do next?

A

Since the variables are significantly correlated, we can use the predictor to predict the outcome in the direction of that relationship.

20
Q

Why do we compute the Ordinary least squares (OLS) solution? What is the criterion to be minimized in OLS?

A

We compute the OLS solution because it’s an estimation procedure done for regression where we minimize the sum of the squared residuals.

21
Q

As long as we can put numbers to b0 and b1, what information can we find?

Provide an example if b0 = 1 and b1 = 4.

A

If b0 equals 1 and b1 equals 4, the regression equation is ŷ = 1 + 4X. Filling in all of my X values gives the ŷ values; the difference between each observed Y and its ŷ gives the residuals; and once I have the residuals, I can compute the sum of squared residuals.
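A sketch of that chain of computations in R, with hypothetical x and y values added for illustration:

b0 <- 1; b1 <- 4
x <- c(1, 2, 3)
y <- c(6, 8, 14)
yhat <- b0 + b1 * x   # y-hat: values predicted from the line
e <- y - yhat         # residuals: observed minus predicted
sum(e^2)              # sum of squared residuals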

22
Q

Explain what derivative is and why we use the function of a derivative.

A

The derivative gives the slope of the line tangent to a curve at a particular location. We can never take the slope of a curve directly, but we can find the straight line that touches the curve at exactly one point, and the slope of that tangent line tells us how good our model is at that point.

The sum of squared residuals, as a function of the candidate b values, forms a curve. As we try different candidate lines, we move toward the minimum of that function - and the way we know we are at the minimum is that the tangent line there has a slope of zero.

This happens when we take the function and set its derivative to zero - and, by the magic of calculus, we end up with equations for b0 and b1.

23
Q

Why is SSy (the sum of squares of Y) missing from the bivariate information in SL regression?

Compare this to the correlation equation.

A

In the correlation equation, we divided SSCPxy by the square root of SSx times SSy, because we were interested in how our variables related to each other after removing what is unique to each variable.

When I'm interested in predicting Y, I don't care how Y varies with itself - only how Y varies along with X. We aren't interested in the univariate information of Y (SSy), just in how the two variables relate after removing what is unique to X, so I'm left with Y and everything that X shares with Y.

24
Q

Conceptually, what are b0 and b1?

What is the equation for both?

A

b1 (slope) tells me for every 1 unit increase in X, how much Y changes. It tells me the change in Y based on changes in X.

b1 = SSCPxy / SSx

b0 (intercept) is the mean of Y minus b1 times the mean of X.

To obtain b0, we compute the slope first; then we multiply it by the mean of X and subtract that from the mean of Y.

b0 = Y-bar - b1(X-bar)
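A sketch computing b1 and b0 by hand from these formulas and checking them against lm(); the x and y vectors are made up:

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
SSCPxy <- sum((x - mean(x)) * (y - mean(y)))  # sum of cross-products
SSx <- sum((x - mean(x))^2)                   # sum of squares of X
b1 <- SSCPxy / SSx             # slope
b0 <- mean(y) - b1 * mean(x)   # intercept
c(b0, b1)
coef(lm(y ~ x))                # should match the hand computation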

25
Q

If b1 equals 0.758, what does this mean?

Put this into context when predicting the number of doctor visits and health problems.

A

For every 1 unit increase in X, Y changes by .758.

For every 1 additional health problem, the number of doctor visits goes up by .758.

26
Q

In the context of health problems on doctor visits, what does b0, the intercept, tell us exactly?

A

It tells us if we had NO physical health problems, we would expect to go to the doctor .036 times (almost zero times).

27
Q

What does the output for regression in R show?

What is the function?

A

The model is fit with the lm() function and saved as an object (e.g., fit).

The output for regression in R shows the residuals and coefficients.
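A sketch of the call, using hypothetical names (visits, problems, health) for the doctor-visit example:

fit <- lm(visits ~ problems, data = health)  # fit the simple linear regression
summary(fit)                                 # residuals and coefficient table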

28
Q

What are the interpretations of b0 and b1 regression equations?

Utilize doctor visits and health problems in both interpretations:

b0 = .036
b1 = .758
A

b0 interpretation:
The expected number of dv is (b0 value) when no iv has been reported.
ex) The expected number of doctor visits is .036 when no physical health problems have been reported.

b1 interpretation:
The number of dv is expected to increase by (b1 value) for every additional iv.
ex) The number of doctor visits is expected to increase by .758 for every additional physical health problem.

29
Q

How do we find the best fit of a line?

A

By setting the derivative to 0, we find the best fit of the line, solving for the two unknowns, b0 and b1.

30
Q

Based on OLS, what is the equation for the residual quantity we minimize, given the information we have?

A

The SS residual (Σe²); in the information tables, the formula is Σ(Y − ŷ)².

31
Q

Which of the following indicates the best fitting line:

y
ŷ 
SSCPxy
b0
b1
A

ŷ. We compute b0 + b1X for each of the observations to get the best fitting line.

32
Q

What are given properties of regression equations BASED on OLS solutions being true?

A
  1. Sum of residuals = 0: Σ(y − ŷ) = 0
  2. Sum of the squared residuals, Σ(y − ŷ)², is at a minimum
  3. Sum of observed values equals sum of the fitted values: Σy = Σŷ - the sum of the observed values equals the sum of the fitted (predicted) values.
  4. The regression line always goes through the point (X-bar, Y-bar).
  5. Residuals are uncorrelated with the predictor - the correlation between the residuals and X is zero.
  6. The fitted Y value is less extreme on Y than the associated X value is on X (this property is called regression towards the mean).
33
Q

What does the 6th property of regression equations (regression towards the mean) mean: “The ŷ values are less extreme on Y than the associated X value is on X”?

A

If we plug an X value above or below the mean into the regression equation, the prediction is pulled closer to - regresses towards - the mean, so the outcome is less extreme.

This makes the ŷ values less extreme (located closer to the mean) on Y than their corresponding x-values on X.

Ex) ŷ = .036 + .758 (7.58)
= 5.78

34
Q

Define standard deviation and variance.

A

SD = roughly the average deviation from the mean (the square root of the average squared deviation)

Variance = the average SQUARED deviation from the mean

35
Q

If I want to know the variability of the points around the regression line (instead of the variability of points around the mean), what must I compute?

Provide the formula for the Standard Error of the Estimate (Residual standard error).

A

We estimate the spread of the scores around the regression line the same way we did around the mean: a sum of squares over its df gives the variance of the estimate, and its square root gives the SD-like quantity, the standard error of the estimate.

Formula for the standard error of the estimate:
sqrt( Σ(Y − ŷ)² / (n − k − 1) )

*n is the # of rows or observations

*k is the # of predictors (1 for simple linear regression).
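A sketch computing this by hand and checking it against R's residual standard error; the x and y vectors are made up:

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
fit <- lm(y ~ x)
n <- length(y); k <- 1                          # 1 predictor in SLR
sqrt(sum((y - fitted(fit))^2) / (n - k - 1))    # standard error of the estimate
summary(fit)$sigma                              # R reports the same value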

36
Q

What is the definition of variance of the estimate?

A

The average squared distance of each point from the regression line

37
Q

In order to use the standard error of the estimate in a meaningful way, what has to happen?

A

Our residuals have to be distributed normally.

38
Q

What is the formula for the standard error of the slope?

*it will be on the exam (is on the HW assignment).

A

The square root of MSresidual over SSx:

SE(b1) = sqrt( MSresidual / SSx )

Numerator:
MSresidual - the variability left in our model after removing everything associated with X.

Denominator:
SSx, the sum of squares of X.
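A sketch of the same computation, checked against the coefficient table; x and y are made-up vectors as before:

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
fit <- lm(y ~ x)
MSres <- sum(resid(fit)^2) / (length(y) - 1 - 1)  # SSres / (n - k - 1)
SSx <- sum((x - mean(x))^2)
sqrt(MSres / SSx)                                 # standard error of the slope
summary(fit)$coefficients["x", "Std. Error"]      # should match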

39
Q

Why do we compute a one-sample t-test for regression?

Conceptually, what’s the numerator and denominator indicative of?

What is the formula?

What does a significant t-value indicate?

A

T-tests are a measure of whether the predictor is making a significant contribution to the model.

We are testing the hypothesis that a b-coefficient significantly differs from zero.

> If the slope (b1) is not different from zero, then we can't use our x-variable to predict y, because the slope would be flat (if our x was 17, or any other value, the prediction would still be in the same place on y).

We are comparing the size of b to the amount of error in its estimate - if the standard error is small, we can conclude that the b-values across samples would all be similar to the b-value in our sample.

t = b-observed ÷ standard error of b

A larger t-value, or smaller p-value, indicates that the predictor has a significant effect on predicting the outcome.

40
Q

What does it mean when a t-test produces a significant effect for the slope?

What if the intercept is not significant?

A

Our slope is significantly different from zero, which is what we want.

If the intercept is not significant, it is not significantly different from zero - the line effectively passes through the origin.

41
Q

What are the various synonyms of ŷ?

A

Predicted value

Fitted value

42
Q

When we examine the model fit, what are we examining?

A

When we take off the hat and add an e for error, we want to know if we’re doing a good job predicting the y-value from our line (which is dependent on x-information).

43
Q

What are the identities of model fit?

What are we doing when we examine the model fit?

A

Total deviation = Regression + Residual

We are trying to see how good the predictor is.

44
Q

What is the formula for Regression deviation?

A

(ŷ − y-bar)

45
Q

Compare regression deviation and residual deviation to ANOVA.

What are we trying to do when we solve for regression deviation?

What’s the conceptual equation for the Regression deviation?

A

The regression deviation (or model) is like SSbetween. It’s the amount of variability in Y that’s explained by our line (predictor).

The residual deviation is what can’t be explained by the line.

I want to know how much my model has improved my ability to predict Y over just having the mean of Y. Because ŷ involves X information, I'm asking: am I able to improve my ability to predict Y by including the predictor, or is there no real difference between what I have based on the model and the mean of Y?

We are trying to see if ADDING the predictor improves my ability to predict Y from X, or if it is no different from just using the mean.

So it's the improvement due to the model over what's left unexplained by the model.

46
Q

In a graph of partitioned quantities, how would we identify whether the regression or residual explains the variability? (p.10)

How do we compute the variability of the points?

A

For an observed point, the distance between the observed value and the line is the residual, while the distance between the line and y-bar is the regression portion. If the regression portion is larger than the residual portion, the predictor is good - we are explaining more variability by including the predictor (e.g., health problems predicting doctor visits).

We compute the variability of the points by taking sums of squares / df, eventually computing MSregression / MSresidual (similar to ANOVA).

47
Q

Since we can partition the variability in regression (SS model and SSerror), what can we build?

A

A model fit ANOVA table

48
Q

Build the model fit ANOVA table.

A

Source     | SS        | df        | MS                  | F
Regression | Σ(ŷ − ȳ)² | k         | SSreg / k           | MSreg / MSres
Residual   | Σ(Y − ŷ)² | n − k − 1 | SSres / (n − k − 1) |
Total      | Σ(Y − ȳ)² | n − 1     |                     |

49
Q

What is the interpretation of a significant F-value following a model fit ANOVA table?

What does the significant F-value indicate?

A

The regression model significantly fits the data, such that the number of reported physical health problems significantly predicts the number of doctor visits, F(1, 8) = 5.85, p < .05.

50
Q

How do you run an ANOVA table in R?

A

anova(model), where model is the fitted regression object - e.g., model <- lm(y ~ x, data = dataset).

51
Q

If our predictor is found to be good, what do we compute next?

A

R² = The effect size, or the proportion of variance.

We divide SSregression by SStotal and report the number as a percentage.

52
Q

What’s the difference in r² and R²?

A

r² is correlation effect size, and we square the correlation.

R² is the effect size of regression, where we divide SSregression by SStotal. R² is also the squared correlation between the predictor and outcome.

53
Q

What is the interpretation for R² if it equals .4278?

A

43% of the variability in Y (number of doctor visits) can be explained by X (the number of reported physical health problems).

54
Q

Is 43% of the variability enough to explain the majority of the variability in Y?

A

No - we're not explaining the majority of the variability in Y from X; less than half of it is explained.

55
Q

List and describe the Gauss-Markov Assumptions. There are 7 properties.

A

Running regressions is contingent on the Gauss-Markov assumptions. We also need to know how to test them (which happens after we run the regression model and then evaluate it - after data screening). If any of these are violated, we can look to other tests.

The first 3 assumptions are about our variables:

  1. All predictors are quantitative (numeric) or dichotomous (e.g., gender dummy coded as 0 or 1), and the criterion is quantitative, continuous, and unbounded (-∞ to ∞). All variables are measured without error.
  2. All predictors must have non-zero variance. A zero variance indicates that the predictor is constant: we can't make a line, and mathematically the slope would be undefined.
  3. There is an absence of perfect multicollinearity. Multicollinearity means two or more predictor variables in a multiple regression model are highly correlated - predictors should not be (nearly) the same. On a graph, it will look platykurtic.

The remaining 4 assumptions are about our error:

  1. The expected (average) value of the error term is 0 at each value of the predictors. If x = 1, then across all points where x = 1 the average residual equals zero.
  2. Each predictor is uncorrelated with the error term.
  3. The variance of the error term is constant (homoscedasticity) at each value of the predictors.
  4. Error terms for different observations are uncorrelated (independence of observations).
56
Q

Why do we calculate F-tests?

What is the formula of doing the F-test of R²?

What is this indicating conceptually?

Why is this formula important to know?

A

The significance of R² can be tested using an F-ratio.

F = (R² / k) ÷ ((1 − R²) / (n − k − 1))

Conceptually, we take the variability explained by our model (R², per df) over the variability left over (1 − R², per df).

This formula is important because it lets us get the F-value without computing the SS or variances.
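A sketch of the formula as a small R function; the inputs below (R² = .4278, k = 1, and an assumed n = 10, so df = 8) come from the deck's example:

f_from_r2 <- function(R2, k, n) (R2 / k) / ((1 - R2) / (n - k - 1))
f_from_r2(0.4278, 1, 10)   # ~5.98, in the ballpark of the reported F(1, 8) = 5.85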

57
Q

Rather than looking at Sums of squares and computing variances, what can we do to obtain the F-value?

A

We can take the proportions of variability explained by the model and the proportions of variability unexplained by the model, both per degrees of freedom, to produce the F value.

58
Q

Why is meeting the Gauss-Markov assumptions important?

A

Meeting the 7 assumptions is what justifies running the regression: when they hold, the OLS estimates are unbiased and the model can be generalized from the sample to the population.

59
Q

What is the assumption of homoscedasticity?

What would homo and heteroscedasticity look like on a scatterplot?

A

It's the assumption that the variance of our error term is constant at each value of our predictor.

Homoscedasticity refers to the assumption that the dependent variable (Y) exhibits similar amounts of variance across the range of values of the independent variable (X).

On a scatterplot, observations that are shaped like a megaphone are heteroscedastic - we want an even spread at each value of X.
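A sketch of the usual visual check, with made-up x and y vectors as in the earlier sketches:

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
fit <- lm(y ~ x)
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")  # a megaphone shape = heteroscedastic
abline(h = 0, lty = 2)   # we want an even band of points around zero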

60
Q

If all of our Gauss-Markov assumptions are met, what does this mean?

A
It means that OLS is Blue!
B - best
L - linear
U - unbiased
E - estimator

This means that the distribution of my parameters is unbiased and the mean is in the center of the distribution.

61
Q

What is the final assumption of normality?

Why do we need normality?

A
  1. At each value of the IVs, the errors are normally distributed (no skew, platykurtosis, or bi-modality).

This CAN be violated while still NOT violating the Gauss-Markov assumptions.

However, we need normality because we test our coefficients with a one-sample t-test, and one of the assumptions of a one-sample t-test is normality. Without normality the standard error will be biased, and therefore the t-value will be incorrect.

62
Q

What does ‘model’ in an equation get replaced by?

A

It gets replaced by the equation of the line of best fit, b0 + b1X, which defines the line.

63
Q

According to the book, a simple way to look at the residual term is:

A

The difference between the score predicted by the line and the score that the participant actually obtained.

64
Q

Residuals are synonymous with _______.

A

Residuals are synonymous with ‘deviations’.

Residuals are technically the deviations from the line.

65
Q

What is goodness of fit, conceptually?

A

How well a model that is generated fits the data - based on how well the data predicted by the model actually corresponds to the data that’s collected.

We still need to assess this model to make sure it is the best one for the data.

66
Q

Conceptually, what does the value of SSresidual represent?

Conceptually, what does the value of SStotal represent?

Together, what do they calculate conceptually?

A

SSresidual represents the degree of inaccuracy when the best model is fitted to the data.

SStotal represents the total squared difference between the observed values and the mean of Y.

Together they are used to calculate how much better the regression line (line of best fit) is than the mean - that is, R-squared.

67
Q

Conceptually, what is F-ratio?

A

It is the measure of how much the model has improved the prediction of the outcome compared to the level of inaccuracy of the model.

So MSreg (improvement) ÷ MSres (errors and inaccuracies).

68
Q

Conceptually, why is predicting an outcome to a mean a bad idea?

A

The line representing the mean is flat - so as predictor values change, the value of the outcome doesn't change.

*Thinking back to the author predicting album sales from a $1 versus $100,000 difference in advertising - using the mean would say that they would sell $200,000 either way…

69
Q

If R produces b0, or the intercept estimate as 134.1, what is it telling us when we are trying to predict album sales? (*in terms of dollars - round everything to thousands)

A

That when no money is spent on advertising (when X = 0), the model predicts that 134,100 albums will be sold.

70
Q

If R produces b1, or the slope estimate of 0.096 in predicting album sales, what does this mean? (*in terms of dollars - round everything to thousands)

A

That for an increase of $1,000, the model predicts 96 extra album sales.

71
Q

What does it mean when assumptions are met?

A

When all assumptions are met, we can apply the model we get from the sample to the population, because the coefficients and parameters of that regression are unbiased.

72
Q

What are the 3 assumptions related to Variables in the Gauss-Markov assumptions for regression?

A
  1. All predictors must be quantitative or dichotomous. The criterion must be quantitative, continuous, and unbounded. All measured without error.
  2. All predictors must have non-zero variance. A zero variance indicates that the predictor is constant: there won't be a line, and the slope would be undefined.
  3. No perfect multicollinearity - no (perfect) correlation between predictors.
73
Q

What are the 4 assumptions related to Error in the Gauss-Markov assumptions for regression?

A
  1. The expected value of the error term is zero at each value of the predictor. So if X = 1, the average residual across all points where X = 1 equals zero.
  2. Predictors are uncorrelated with the error term.
  3. Variance of the Error Term is constant (HOMOSCEDASTICITY) at each value of predictor.
  4. Error terms for different observations are Uncorrelated (Independence of observation).
74
Q

What is the criterion to be minimized in OLS?

A

We are MINIMIZING the sum of the squared residuals.

The properties of regression equations based on the OLS solution being true:
1. Sum of residuals = 0: Σ(y − ŷ) = 0
2. Sum of the squared residuals, Σ(y − ŷ)², is at a minimum
3. Sum of observed values equals sum of the fitted values: Σy = Σŷ - the sum of the observed values equals the sum of the fitted (predicted) values.
4. The regression line always goes through the point (X-bar, Y-bar).
5. Residuals are uncorrelated with the predictor - the correlation between the residuals and X is zero.
6. The fitted Y value is less extreme on Y than the associated X value is on X (this property is called regression towards the mean).

75
Q

What does it mean when the Gauss-Markov assumptions are met?

A

It signifies that the distribution of the parameters is UNBIASED and the mean is in the center of the distribution.

76
Q

What do we get when we add the assumption of normally distributed errors?

A

It gives us accurate standard errors, which allow us to do t-tests properly.

77
Q

What is the Variance of the Estimate and what other names do we know it by?

A

Variance of the Estimate, AKA MSresidual.

The Variance of the Estimate measures the residual variability that can't be explained by our regression. It appears as the denominator of the F-ratio: MSReg ÷ MSRes.

78
Q

How is the Variance of the estimate used to calculate other measures in regression analysis?

A

It is the denominator in our F-test.

79
Q

What two things can we do once we have standard errors for b’s?

Know how to do them.

A

We can run t-tests to test the significance of the intercept (b0) and of the slope coefficient (b1): t = b ÷ SE(b).

80
Q

What is the Coefficient of Determination? How do we calculate it from elements in the ANOVA table?

How do we test and interpret it?

A

It's R-squared, and we calculate it as SSReg ÷ SSTotal.

It gives the proportion of variability in the criterion that's attributable to the predictor (X). We test it with an F-ratio and interpret it as the percentage of variance explained.

81
Q

Given output from an SLR, interpret the following:

R-squared
b0
b1
Plotting the regression line
Calculate the predicted and residual values for given X scores
A

R-squared -> the proportion of variability in Y that's explained by X

b0 -> the value of Y when X is 0

b1 -> the change in Y when X increases by 1 unit

To plot the regression line, we plot the predicted (ŷ) values.

To calculate a residual, we do y − ŷ (observed minus predicted).
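A closing sketch of those last two steps, with made-up x and y vectors:

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
fit <- lm(y ~ x)
predict(fit, newdata = data.frame(x = c(2, 4)))  # y-hat for given X scores
resid(fit)                                       # residuals: y minus y-hat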