Linear Regression Flashcards

1
Q

Regression analysis uses a _______ model to predict a _______ variable (DV) by using one or more _______ variables (IVs).

A

Statistical

Response

Predictor

2
Q

In regression analysis, β0 and β1 are called _______

A

parameters

3
Q

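What are the four steps of hypothesis testing?

A

Step 1: State the hypotheses.

one-sided: H0: μ ≤ μ0, Ha: μ > μ0 (or H0: μ ≥ μ0, Ha: μ < μ0)

two-sided: H0: μ = μ0, Ha: μ ≠ μ0

for the slope: H0: β1 = 0 (no linear association between x and y; x is not useful for predicting y), Ha: β1 ≠ 0

Step 2: Compute the test statistic.

t = (x̄ - μ0)/(s/√n) with df = n - 1
t* = b1/s{b1} with df = n - 2

Step 3: Find the critical value: t{1 - α; df} (one-sided) OR t{1 - α/2; df} (two-sided)

Step 4: For a two-sided test, reject H0 if t ≥ +crit val or t ≤ -crit val. For a one-sided test, reject H0 if t falls beyond the critical value in the direction of Ha.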
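The four steps can be sketched for the slope test, H0: β1 = 0. This is a minimal illustration, not a definitive implementation: the data are hypothetical, s{b1} = s/√SSxx is the standard formula (it is not stated on the cards), the slope test uses n - 2 degrees of freedom, and the critical value is copied from a t table rather than computed.

```python
import math

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least squares fit.
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# s = sqrt(SSE / (n - 2)); s{b1} = s / sqrt(SSxx) (assumed standard formula).
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))
s_b1 = s / math.sqrt(ss_xx)

# Step 2: test statistic for H0: beta1 = 0.
t_star = b1 / s_b1

# Step 3: two-sided critical value t{1 - alpha/2; n - 2} for alpha = 0.05
# and n - 2 = 3, copied from a t table (an assumption, not computed here).
t_crit = 3.182

# Step 4: decision rule for the two-sided test.
reject_h0 = abs(t_star) >= t_crit
print(f"t* = {t_star:.3f}, reject H0: {reject_h0}")
```

With only five points the test statistic stays inside the critical values, so H0 is not rejected for this toy data set.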

4
Q

What is the simple linear regression model?

A

Y = β0 + β1X1 + ε

5
Q

In linear regression, E(ε)=

A

0

6
Q

In linear regression, σ²{ε} =

A

σ²

7
Q

In linear regression, ε’s are/are not correlated and have covariance of ___.

A

ε’s are uncorrelated and have covariance of 0

8
Q

Least squares estimates of the betas _____ the sum

∑ (i = 1 to n) [yi - (β0 + β1xi)]²

A

minimize

9
Q

Interpretation of β1

Y = β0 + β1X1 + ε

A

For each one-unit increase in x, the mean of y increases (or decreases) by β1 units.

(e.g., for each additional hour of TV a student watches, GPA drops by 0.2 points on average)

10
Q

Interpretation of β0

Y = β0 + β1X1 + ε

A

The mean of y when x = 0

(e.g., on average, first-year students who don’t watch TV have a GPA of 3.9)

11
Q

ŷ is the ____ regression line.

A

estimated

12
Q

b1 and b0 are estimates for

A

β1 and β0

13
Q

What is the equation for b1?

A

b1 = SSxy / SSxx

14
Q

What is the equation for b0?

A

b0 = ȳ - b1x̄

15
Q

What is the equation for SSxx?

A

All of the following expressions are equal

∑(xi - x̄)²

(∑xi²) - n(x̄)²

(n - 1)sx²
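A quick numeric check that the three forms agree, as a sketch on hypothetical data (the x values are made up for illustration). `statistics.variance` is the sample variance with an n - 1 denominator, i.e. sx².

```python
from statistics import mean, variance

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
n = len(x)
x_bar = mean(x)

ss_xx_def = sum((xi - x_bar) ** 2 for xi in x)            # sum of (xi - x_bar)^2
ss_xx_short = sum(xi ** 2 for xi in x) - n * x_bar ** 2   # (sum of xi^2) - n * x_bar^2
ss_xx_var = (n - 1) * variance(x)                         # (n - 1) * sx^2

print(ss_xx_def, ss_xx_short, ss_xx_var)
```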

16
Q

SSxx must be positive/negative.

A

positive (it is a sum of squared deviations; it equals 0 only if every xi is the same)

17
Q

What is the equation for SSxy?

A

∑(xi - x̄)(yi - ȳ)

(∑xiyi) - n(x̄)(ȳ)

18
Q

When creating a table for an estimated regression line, which 5 columns should you include?

A

xi | yi | xi² | yi² | xiyi
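The column totals of that table are all that is needed to get SSxx, SSxy, b1, and b0. A minimal sketch, assuming a small hypothetical data set (the numbers and variable names are illustrative only):

```python
# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# The five table columns: xi, yi, xi^2, yi^2, xi*yi.
rows = [(xi, yi, xi ** 2, yi ** 2, xi * yi) for xi, yi in zip(x, y)]
sum_x, sum_y, sum_x2, sum_y2, sum_xy = (sum(col) for col in zip(*rows))

x_bar, y_bar = sum_x / n, sum_y / n
ss_xx = sum_x2 - n * x_bar ** 2      # (sum of xi^2) - n * x_bar^2
ss_xy = sum_xy - n * x_bar * y_bar   # (sum of xi*yi) - n * x_bar * y_bar
ss_yy = sum_y2 - n * y_bar ** 2      # uses the yi^2 column

b1 = ss_xy / ss_xx                   # slope estimate
b0 = y_bar - b1 * x_bar              # intercept estimate
print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
```

The yi² column is not needed for b1 or b0, but it gives SSyy, which is used later for SSE.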

19
Q

What is the equation for the error term εi?

A

εi = yi - E(yi)

20
Q

What is the equation for the residual ei?

A

ei = yi - ŷi

21
Q

s² is the ________

A

sample variance

22
Q

What is the equation for s²?

A

In regression, the expressions below are equal

SSE/(n - 2)

MSE

For a single sample with no predictor, the analogous formula is (∑(xi - x̄)²) / (n - 1)

23
Q

s is the ____________

A

sample standard deviation

24
Q

What is the equation for s?

A

√MSE

√(SSE/(n-2))

25
Q

SSE is

A

The sum of the squared errors

26
Q

What is the equation for SSE?

A

All of the expressions below are equal

∑ei²

∑(yi - ŷi)²

SSyy - b1²SSxx
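A short sketch showing that the residual form and the shortcut form give the same SSE, and how MSE and s follow from it. The data are hypothetical and the variable names are illustrative only.

```python
import math

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Residuals ei = yi - y_hat_i, then SSE as the sum of squared residuals.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse_resid = sum(e ** 2 for e in residuals)

# Shortcut form: SSyy - b1^2 * SSxx.
sse_short = ss_yy - b1 ** 2 * ss_xx

mse = sse_resid / (n - 2)   # s^2
s = math.sqrt(mse)          # s
print(f"SSE = {sse_resid:.3f} = {sse_short:.3f}, MSE = {mse:.3f}, s = {s:.3f}")
```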

27
Q

What do s² = 0.045 and s = 0.212 mean?

A

If the distribution of GPA for people who watch x hours of TV is approximately normal, then about 95% of them are expected to have GPAs within 2(0.212) = 0.424 units of the value given by the simple linear regression model

28
Q

You should assume ____ for hypothesis testing and confidence intervals

A

normality

29
Q

b1 and b0 are _______ for β1 and β0

A

least squares estimators

30
Q

Why do you want to have a large range of data?

A

The more variation there is in x, the larger SSxx is and the more precisely the slope can be estimated.
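One way to see this, assuming the standard formula s{b1} = s/√SSxx (not stated on the cards): for the same s, spreading the x values out shrinks the standard error of the slope. The sketch below reuses s = 0.212 from the GPA card; both sets of x values are hypothetical.

```python
import math

s = 0.212  # residual standard deviation from the GPA example

# Hypothetical designs: same number of points, different spread in x.
narrow_x = [4, 5, 6]
wide_x = [0, 5, 10]

def ss_xx(xs):
    """Sum of squared deviations of xs around their mean."""
    x_bar = sum(xs) / len(xs)
    return sum((xi - x_bar) ** 2 for xi in xs)

def slope_se(s, xs):
    """Standard error of b1, assuming s{b1} = s / sqrt(SSxx)."""
    return s / math.sqrt(ss_xx(xs))

print(f"narrow: SSxx = {ss_xx(narrow_x):.0f}, s{{b1}} = {slope_se(s, narrow_x):.4f}")
print(f"wide:   SSxx = {ss_xx(wide_x):.0f}, s{{b1}} = {slope_se(s, wide_x):.4f}")
```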

31
Q

Sampling distribution of (b1 - β1)/s{b1}?

A

t-distribution with n - 2 degrees of freedom,

because two parameters (β0 and β1) are estimated, by b0 and b1

32
Q

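What does it mean to have a 95% CI?

A

If we took 100 samples of size n and built a 95% CI from each, we would expect about 95 of those intervals to contain the true value of β1

Interpretation: we are 95% confident that the true slope β1 lies within this interval (it is not a statement that 95% of all b1’s fall in this range)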
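A 95% CI for β1 has the form b1 ± t{1 - α/2; n - 2}·s{b1}. The sketch below works one through on hypothetical data. It assumes the standard formula s{b1} = s/√SSxx and takes the critical value from a t table rather than computing it.

```python
import math

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))
s_b1 = s / math.sqrt(ss_xx)   # assumed standard formula for s{b1}

# t{0.975; 3} copied from a t table (alpha = 0.05, df = n - 2 = 3).
t_crit = 3.182

margin = t_crit * s_b1
ci_low, ci_high = b1 - margin, b1 + margin
print(f"95% CI for beta1: ({ci_low:.3f}, {ci_high:.3f})")
```

The interval for this toy data set contains 0, which matches a two-sided t-test that fails to reject H0: β1 = 0 at α = 0.05.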

33
Q

What is interval estimation?

A

A confidence interval for the mean of Y when x = xh
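A sketch of that interval on hypothetical data. It assumes the standard formula s{ŷh} = s·√(1/n + (xh - x̄)²/SSxx), which is not given on the cards. The value xh = 4 is arbitrary, and the critical value comes from a t table.

```python
import math

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

x_h = 4                      # arbitrary x value of interest
y_hat_h = b0 + b1 * x_h      # estimated mean of Y at x_h

# Assumed standard error of the estimated mean response at x_h.
se_mean = s * math.sqrt(1 / n + (x_h - x_bar) ** 2 / ss_xx)

t_crit = 3.182               # t{0.975; 3} from a t table
ci_low = y_hat_h - t_crit * se_mean
ci_high = y_hat_h + t_crit * se_mean
print(f"95% CI for E(Y) at x = {x_h}: ({ci_low:.3f}, {ci_high:.3f})")
```

The interval is narrowest at xh = x̄ and widens as xh moves away from the mean of x.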

34
Q

SSTo

A

The total variation in y around ȳ, i.e., the error when no model is used at all; it does not change when a different model or new variables are used

35
Q

SSE

A

The error/variation remaining when using SLR; the variation in y not explained by x; a high value means the model leaves too much error

36
Q

SSR

A

The reduction in error from fitting the model (SSTo - SSE); the chunk of variation in y explained by using x (we want this to be large)

37
Q

What are the components of the ANOVA table?

A

Source of Variation | SS | df | MS
Regression | SSR | 1 | MSR = SSR / 1
Error | SSE | n - 2 | MSE = SSE / (n - 2)
Total | SSTo | n - 1 |
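The table entries can be filled in directly from the data. A minimal sketch on a hypothetical data set (names and numbers are illustrative only):

```python
# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ss_to = sum((yi - y_bar) ** 2 for yi in y)              # total variation around y_bar
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
ssr = ss_to - sse                                       # explained variation

msr = ssr / 1         # regression df is 1 in SLR
mse = sse / (n - 2)   # error df is n - 2

print("Source     | SS    | df | MS")
print(f"Regression | {ssr:.3f} | 1  | {msr:.3f}")
print(f"Error      | {sse:.3f} | {n - 2}  | {mse:.3f}")
print(f"Total      | {ss_to:.3f} | {n - 1}  |")
```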

38
Q

What does an F-test for model usefulness tell us?

A

Whether R² is statistically significant, but not whether the model is practically useful

39
Q

What are the four steps in conducting an F-test?

A

Step 1:

two-sided: H0: β1 = 0 Ha: β1 ≠ 0

Step 2: F* = MSR/MSE = SSR/MSE, since MSR = SSR/1 in SLR (these quantities are never negative; want F* to be > 1)

Step 3: F{1 - α; 1, n - 2} (*numerator df is always 1 in SLR)

Step 4: if F* > F{1 - α; 1, n - 2}, we reject H0 and we have evidence that the SLR model is useful

  • In SLR (one predictor variable), the t-test for β1 = 0 is the same as the F-test
  • In SLR only: F* = (t*)² and √(Fcrit) = tcrit, where tcrit is the two-sided critical value t{1 - α/2; n - 2}
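The link between the two tests can be checked numerically. The sketch below uses hypothetical data, assumes s{b1} = √(MSE/SSxx), and takes both critical values (α = 0.05, 3 error df) from t and F tables rather than computing them.

```python
import math

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

ss_to = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssr = ss_to - sse

# Step 2: F* = MSR / MSE, with MSR = SSR / 1.
mse = sse / (n - 2)
f_star = (ssr / 1) / mse

# t* for H0: beta1 = 0, assuming s{b1} = sqrt(MSE / SSxx).
t_star = b1 / math.sqrt(mse / ss_xx)

# Step 3: critical values copied from tables (not computed here).
f_crit = 10.13   # F{0.95; 1, 3}
t_crit = 3.182   # t{0.975; 3}

# Step 4: decision rule.
reject_h0 = f_star > f_crit
print(f"F* = {f_star:.3f}, (t*)^2 = {t_star ** 2:.3f}, reject H0: {reject_h0}")
print(f"sqrt(Fcrit) = {math.sqrt(f_crit):.3f}, tcrit = {t_crit}")
```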