chapter 13: the simple linear regression model Flashcards

1
Q

the simple linear regression model

A

the simple linear regression model assumes that the relationship between the dependent variable and independent variable can be approximated by a straight line

y: dependent variable
x: independent variable

2
Q

what can we use to tentatively decide whether there is an approximate straight-line relationship between x and y?

A

scatter plot

scatter diagram

3
Q

what is the simple linear regression model formula?

A

y = B0 + B1x + E

contains the mean level Uy

the y intercept B0

the slope B1

the error term E

4
Q

what is the mean level of the simple linear regression model formula?

A

Uy = B0 + B1x

the line of means

the mean value of y at a given x is the mean level Uy

as x changes, the mean level Uy moves along this straight line

the y intercept: B0

the slope: B1

5
Q

the error term E

A

describes the effects on y of all factors other than the value of the independent variable x

can be positive, negative or 0

6
Q

what does it mean for the error term E to be 0?

A

the observed y is equal to the mean level Uy, so the point lies exactly on the line of means

7
Q

what does it mean for the error term E to be bigger than 0?

A

the point lies above the line of means Uy = B0 + B1x

the observed y is higher than the mean level Uy at the corresponding x value

8
Q

what does it mean for the error term E to be lower than 0?

A

the point lies below the line of means Uy = B0 + B1x

the observed y is lower than the mean level Uy at the corresponding x value

9
Q

what is the impact of B1 (the slope)

A

if B1 is positive, the regression line slopes upward (the mean of y increases as x increases)

if B1 is negative, the regression line slopes downward (the mean of y decreases as x increases)

10
Q

what are the regression parameters

A

the y intercept B0

the slope B1

11
Q

true or false

we can reflect the changes made in the regression line as a change in the independent variable causing a change in the dependent variable

A

false

we cannot say that a change in the independent variable causes a change in the dependent variable

we can say that the two variables move together and that the independent variable contributes information for predicting the dependent variable

12
Q

the least squares line

A

the estimated regression line that best fits the observed data (it minimizes the sum of squared residuals)

y^ = b0 + b1x

y^: the predicted value of y

b0: point estimate of the y intercept B0
b1: point estimate of the slope B1

13
Q

how is the predicted value of the dependent variable y found

A

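y^i = b0 + b1xi

b1 = SSxy / SSxx

b0 = y- - b1x-

SSxy: sum of (xi - x-)(yi - y-)
SSxx: sum of (xi - x-)^2
y-, x-: sample means of the observed y and x values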
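
The formulas on this card can be sketched in a few lines of Python. The data below are made up for illustration and the function name `least_squares` is an assumption, not something from the chapter.

```python
# Made-up illustrative data (not from the flashcards).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def least_squares(x, y):
    """Return the point estimates (b0, b1) of the least squares line."""
    n = len(x)
    x_bar = sum(x) / n  # x-: mean of the x values
    y_bar = sum(y) / n  # y-: mean of the y values
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx        # slope estimate
    b0 = y_bar - b1 * x_bar   # y intercept estimate
    return b0, b1

b0, b1 = least_squares(x, y)
# Predicted value y^i = b0 + b1 * xi for each observed xi.
predictions = [b0 + b1 * xi for xi in x]
print(b0, b1, predictions)
```

For this small data set SSxy = 6 and SSxx = 10, so the slope estimate is 6 / 10 and the intercept estimate follows from the two sample means.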

14
Q

what is the residual of an observation?

A

yi - y^i

yi: the observed y
y^i: the predicted y

15
Q

what is the experimental region

A

the range of the previously observed values of x

16
Q

the point prediction of an individual value

A

y^ = b0 + b1x0: the point prediction of an individual value of the dependent variable when the value of the independent variable is x0

here we predict the error term to be 0

17
Q

simple coefficient of determination

A

a measure of the potential usefulness of the simple linear regression model

r^2 (r squared)

explained variation / total variation

r^2 is never less than 0 and never greater than 1

the closer it is to 1, the larger the proportion of the total variation that is explained by the simple linear regression model, and the better the model predicts y

18
Q

how do you calculate the prediction error used in the simple coefficient of determination?

A

yi - y-: the prediction error when we do not use x (every y is predicted by the mean y-)

yi - y^i: the prediction error when we do use x

19
Q

what is the total variation?

A

the sum of squared prediction errors when we do not use the predictor variable x

this quantity measures the total amount of variation exhibited by the observed values of y

explained variation + unexplained variation

20
Q

what is the unexplained variation

A

another name for the SSE

the sum of squared prediction errors when we use the predictor variable x

quantity that measures the amount of variation in the values of y that is not explained by the predictor variable

total variation - explained variation

21
Q

explained variation

A

total variation - unexplained variation
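
A short sketch of how the three variations and r^2 fit together, using made-up data (an assumption for illustration only). The variable names are hypothetical.

```python
# Made-up illustrative data (not from the flashcards).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares point estimates: b1 = SSxy / SSxx, b0 = y- - b1 * x-.
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Total variation: squared prediction errors when x is not used.
total_variation = sum((yi - y_bar) ** 2 for yi in y)
# Unexplained variation (SSE): squared prediction errors when x is used.
unexplained_variation = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
# Explained variation: the improvement gained by using x.
explained_variation = total_variation - unexplained_variation

r_squared = explained_variation / total_variation
print(total_variation, unexplained_variation, explained_variation, r_squared)
```

With these numbers the total variation is 6 and the SSE is about 2.4, so roughly 60% of the variation in y is explained by the line.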

22
Q

what is the best way to get prediction accuracy

A

by calculating a prediction interval

23
Q

the simple correlation coefficient r

A

a measure of correlation and strength of linear relationship between x and y

r = +sqrt(r^2) if b1 is positive

r = -sqrt(r^2) if b1 is negative
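
As a rough check of the sign rule above, the sketch below computes r from r^2 and the sign of b1, then compares it with the direct formula SSxy / sqrt(SSxx * SSyy). The data and the helper name `correlation_from_r_squared` are assumptions made for illustration.

```python
import math

def correlation_from_r_squared(x, y):
    """Return (r, r_direct): r built from r^2 and the sign of b1, and r computed directly."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)  # total variation
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = (ss_yy - sse) / ss_yy
    # r takes the sign of the slope estimate b1.
    r = math.copysign(math.sqrt(r_squared), b1)
    r_direct = ss_xy / math.sqrt(ss_xx * ss_yy)
    return r, r_direct

# Made-up data: one upward-sloping set and its mirror image.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]
y_down = [5, 4, 5, 4, 2]

r_up, r_up_direct = correlation_from_r_squared(x, y_up)
r_down, r_down_direct = correlation_from_r_squared(x, y_down)
print(r_up, r_down)
```

The two data sets share the same r^2, but r is positive for the first and negative for the second.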

24
Q

why can r be negative but not r^2

A

because the square of any real number can't be negative

r though, it can be negative

stays between -1 and 1

25
Q

what does it mean for x and y to be highly related and positively correlated

A

r is near 1

26
Q

what does it mean for x and y to be highly related and negatively correlated

A

r is near -1

27
Q

what are the regression assumptions?

A
  1. at any given value of x, the population of potential error term values has a mean equal to 0
  2. constant variance assumption: at any value of x, the population of potential error term values has a variance that does not depend on the value of x (the error term populations for different x values have equal variances)
  3. normality assumption
  4. independence assumption
28
Q

what does it mean for the different populations of potential error term values, one per value of x, to have equal variances?

A

at any value of x, the population of potential error term values has a variance that does not depend on the value of x

29
Q

what's the normality assumption (3) of the regression assumptions?

A

at any given value x, the population of error term values has a normal distribution

30
Q

what's the independence assumption (4) of the regression assumptions?

A

any one value of the error term E is statistically independent of any other value of E

the E for one observed y is independent of the E for any other observed y

they don’t affect each other

31
Q

what do the overall regression assumptions mean?

A

for every value of x, the population of potential error term values is normally distributed

the mean of the population of error terms is 0

the variance does not depend on the value of x

32
Q

why do we predict the mean of the population of error terms to be 0?

A

because, under the regression assumptions, an error term is as likely to be positive as negative, so the positive and negative values balance out around 0

33
Q

what is the mean square error?

A

s^2 = SSE / (n - 2), the point estimate of the variance of the error term

34
Q

what is the standard error

A

s = sqrt(SSE / (n - 2)), the point estimate of the standard deviation of the error term
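
A minimal sketch of the mean square error and the standard error, assuming the usual simple linear regression definitions s^2 = SSE / (n - 2) and s = sqrt(s^2). The data are made up for illustration.

```python
import math

# Made-up illustrative data (not from the flashcards).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# SSE: sum of squared residuals (the unexplained variation).
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Two parameters (b0 and b1) were estimated, so we divide by n - 2.
mean_square_error = sse / (n - 2)
standard_error = math.sqrt(mean_square_error)
print(mean_square_error, standard_error)
```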

35
Q

which line best fits the observed data?

why?

A

the least squares regression line

It is the line minimizing the sum of the squared residuals

36
Q

why is it dangerous to extrapolate out of the experimental region?

A

because we do not know that x and y have a linear relationship outside the experimental region

37
Q

explain what is the total variation, explained variation, and unexplained variation

A

The total variation is the sum of the squared prediction errors when we do not use the predictor x

The unexplained variation is the sum of the squared prediction errors when we do use the predictor x

The explained variation measures the improvement in the fit when we do use the predictor x

38
Q

when is a simple linear regression model useful?

A

when there is a significant relationship between x and y

39
Q

how do we test the significance of the relationship between x and y?

A

with the null hypothesis (H0) being B1 = 0

this says there is no change in the mean value of y associated with an increase of x

Ha: B1 =/= 0

40
Q

if the regression assumptions hold, how many degrees of freedom does the t distribution have (cause you gotta use t distribution)

A

n - 2
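
The slope test can be sketched as follows, assuming the standard test statistic t = b1 / sb1 with sb1 = s / sqrt(SSxx). The data are made up, and the critical value 3.182 is the table value of t(0.025) with 3 degrees of freedom, hard-coded here as an assumption instead of being looked up by a library.

```python
import math

# Made-up illustrative data (not from the flashcards).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
degrees_of_freedom = n - 2
s = math.sqrt(sse / degrees_of_freedom)  # standard error

# Standard error of the slope estimate and the t statistic for H0: B1 = 0.
s_b1 = s / math.sqrt(ss_xx)
t_stat = b1 / s_b1

# Two-sided test at alpha = 0.05: table value t(0.025) with 3 df (assumed).
t_critical = 3.182
reject_h0 = abs(t_stat) > t_critical
print(degrees_of_freedom, t_stat, reject_h0)
```

With only five made-up points the t statistic stays below the critical value, so H0: B1 = 0 is not rejected at the 0.05 level in this example.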

41
Q

is there a difference between using a one sided or two sided critical value or p-value?

A

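for the same test, the critical value approach and the p-value approach always lead to the same conclusion

a one-sided test and a two-sided test do differ: they use different critical values (t alpha vs. t alpha/2) and different p-values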

42
Q

at what significance level do we see very strong evidence that the regression relationship is significant?

A

0.01 significance level

43
Q

at what significance level do we see strong evidence that the regression relationship is significant?

A

0.05 significance level

44
Q

how do we test the significance of the y intercept

A

H0: B0 = 0

Ha: B0 =/= 0

45
Q

how do we reject H0 in favor of Ha when testing the significance of the y intercept?

A

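by setting the probability of a Type I error (alpha) and rejecting H0 if |t| = |b0 / sb0| is greater than the critical value t(alpha/2) with n - 2 degrees of freedom, or equivalently if the p-value is less than alpha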

46
Q

what is the f test

A

another way of checking the significance of the slope

checking if H0: B1 = 0

47
Q

how do you do the f test?

A

F = explained variation / (unexplained variation / (n - 2))
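
A small sketch of the F statistic on made-up data. It also checks the known fact that, in simple linear regression, F equals the square of the t statistic for the slope. The critical value 10.13 is the table value of F(0.05) with 1 and 3 degrees of freedom, hard-coded as an assumption.

```python
import math

# Made-up illustrative data (not from the flashcards).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

total_variation = sum((yi - y_bar) ** 2 for yi in y)
unexplained_variation = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
explained_variation = total_variation - unexplained_variation

# F = explained variation / (unexplained variation / (n - 2))
f_stat = explained_variation / (unexplained_variation / (n - 2))

# t statistic for the slope, for comparison with F.
s = math.sqrt(unexplained_variation / (n - 2))
t_stat = b1 / (s / math.sqrt(ss_xx))

# Table value F(0.05) with 1 numerator and 3 denominator df (assumed).
f_critical = 10.13
reject_h0 = f_stat > f_critical
print(f_stat, t_stat ** 2, reject_h0)
```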

48
Q

what is the difference between a confidence interval and a prediction interval?

A

A confidence interval is intended to capture the mean value of y

based on standard error sy^

A prediction interval is intended to capture an individual observation of y

based on standard error s(y - y^)

49
Q

what does the distance value measure

A

The distance between x0 and x-

x0: value of x corresponding to a certain point estimate or point prediction

x-: mean of x values

50
Q

how does the distance value affect confidence intervals

A

the bigger the distance value, the wider the confidence interval
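
A hedged sketch of how the distance value feeds into both intervals, assuming the standard simple linear regression formulas: distance value = 1/n + (x0 - x-)^2 / SSxx, confidence interval half-width = t * s * sqrt(distance value), prediction interval half-width = t * s * sqrt(1 + distance value). The data, the function name `intervals_at`, and the hard-coded table value t(0.025) = 3.182 for 3 degrees of freedom are all assumptions for illustration.

```python
import math

# Made-up illustrative data (not from the flashcards).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRITICAL = 3.182  # table value t(0.025) with n - 2 = 3 df (assumed)

def intervals_at(x0, x, y, t_critical):
    """Return (y_hat, distance_value, ci_half_width, pi_half_width) at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error
    y_hat = b0 + b1 * x0          # point estimate / point prediction
    # The distance value grows as x0 moves away from the mean x-.
    distance_value = 1 / n + (x0 - x_bar) ** 2 / ss_xx
    ci_half_width = t_critical * s * math.sqrt(distance_value)
    pi_half_width = t_critical * s * math.sqrt(1 + distance_value)
    return y_hat, distance_value, ci_half_width, pi_half_width

near = intervals_at(4, x, y, T_CRITICAL)  # x0 close to the mean of x
far = intervals_at(5, x, y, T_CRITICAL)   # x0 farther from the mean of x
print(near)
print(far)
```

Moving x0 from 4 to 5 raises the distance value, and both half-widths grow with it. The prediction interval is always the wider of the two at a given x0.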