Lecture 9 - Regression Depression Flashcards

1
Q

Why is regression analysis used so often?

A

It gives a model that can be used for prediction: of future values, or of values further along the gradients measured

2
Q

How is the independence assumption evaluated?

A

By plotting the residuals against the fitted values, in increasing order
Independence holds when the residuals fluctuate uniformly about 0 (no pattern)
i.e. no unbalanced groups of + or - residuals
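
A minimal Python sketch of this check, assuming a small made-up data set (the x and y values and the helper name `fit_line` are illustrative, not from the lecture). It fits a least-squares line and lists the residuals in order of increasing fitted value, so that long runs of same-signed residuals would be easy to spot.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up illustrative data (not from the lecture).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 3.8, 6.1, 8.4, 9.7, 12.2, 13.6, 16.1]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Order the residuals by increasing fitted value and record their signs.
ordered = sorted(zip(fitted, residuals))
signs = "".join("+" if r > 0 else "-" for _, r in ordered)

for f, r in ordered:
    print(f"fitted = {f:6.2f}   residual = {r:+.2f}")
print("sign pattern:", signs)
```

In practice the same pairs would be drawn as a residuals-versus-fitted scatterplot; the printed sign pattern is only a text stand-in for that plot.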

3
Q

When does a lack of independence with residuals occur?

A

When adjacent residuals tend to be similar and thus appear to be correlated (autocorrelation)
i.e. groups of points that consistently fall below or above the line

4
Q

What is positive autocorrelation (positive dependence)?

A

Residuals tend to be followed by residuals of the same sign: positive residuals are followed by other positive residuals, and negative by negative

5
Q

What is negative autocorrelation?

A

Residuals tend to be followed by residuals of the opposite sign: positive residuals are followed by negative residuals, and vice versa, so the signs alternate
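
A rough Python sketch of how the sign of the autocorrelation could be checked numerically. The lag-1 coefficient below is one common estimator, not necessarily the one used in the lecture, and the two residual sequences are made up for illustration. Same-signed neighbours give a positive coefficient; alternating signs give a negative one.

```python
def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation of residuals, assuming they are centered on 0."""
    num = sum(a * b for a, b in zip(residuals, residuals[1:]))
    den = sum(r * r for r in residuals)
    return num / den


# Made-up residual sequences (hypothetical, for illustration only).
runs = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]         # same signs cluster together
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # signs flip every step

r_runs = lag1_autocorrelation(runs)                # 0.5
r_alternating = lag1_autocorrelation(alternating)  # -5/6, about -0.83

print("runs:", r_runs)
print("alternating:", r_alternating)
```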

6
Q

What is the purpose of regression depression?

A

To examine linearity between two variables, i.e. to determine whether a linear relationship exists between them

7
Q

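If the R² value is good, what might still lead us to be concerned with what a scatter plot depicts?

A

If the scatterplot shows a plateau: the response levels off, so a straight line misdescribes the relationship even when R² is high

This could mean there are constraints on the data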

8
Q

Do we want to fit a curve to the data (polynomial)?

A

Usually not. Fitting a polynomial curve complicates the model, and the result is not very suitable for biology
Sometimes we might transform the data in this situation instead
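
A small Python sketch of the transformation idea, under stated assumptions: the data are made up so that y rises quickly and then levels off, and a log base 2 transform of x is an assumed choice (the lecture does not say which transformation to use). The helper `r_squared` is hypothetical and computes R² as SSm / SSt for a straight-line fit.

```python
import math


def r_squared(x, y):
    """R^2 of the least-squares line of y on x, computed as SSm / SSt."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_model = sum((intercept + slope * xi - y_bar) ** 2 for xi in x)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return ss_model / ss_total


# Made-up data in which y rises quickly at first and then levels off.
x = [1, 2, 4, 8, 16, 32]
y = [0.1, 0.9, 2.1, 2.9, 4.1, 4.9]

r2_raw = r_squared(x, y)
# Assumed transformation: log base 2 of the predictor.
r2_log = r_squared([math.log2(xi) for xi in x], y)

print(f"R^2 on raw x:   {r2_raw:.3f}")
print(f"R^2 on log2(x): {r2_log:.3f}")
```

For this made-up data the straight line fits the transformed predictor much more closely than the raw one, which is the point of transforming rather than fitting a polynomial.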

9
Q

What is involved in deriving R² and the decomposition of variability?

A

A least-squares analysis: fit the line that minimizes the unexplained variability in the dependent (response) variable y, then split the total variability in y into explained and unexplained parts

10
Q

What is the Y-bar?

A

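The mean of all the observed y values (the response). It is the reference point around which the total variability in y is measured; the fitted line always passes through (X-bar, Y-bar)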

11
Q

What is (B) in the decomposition of variability?

A

The vertical distance from the observed value to the expected (fitted) value: the residual, error, or unexplained term
(yi - y-hat-i)

12
Q

What is yi?

A

the observed value

13
Q

What is y-hat-i?

A

the expected (fitted) value of y predicted by the regression line at xi

14
Q

What is (C) in the decomposition of variability?

A

The model term (y-hat-i - Y-bar): the vertical distance from the expected (fitted) value to the mean of the observed y values (Y-bar)
Also called the model, regression, or explained term

15
Q

What is (A) in the decomposition of variability?

A

The total deviation (yi - Y-bar), made up of the two components (B) and (C)

(A) = (B) + (C), which gives the total variability
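
A short Python sketch of the (A) = (B) + (C) split for each observation, assuming a made-up data set (the numbers are illustrative, not from the lecture). For every point it computes the total deviation, the residual, and the model term, and stores them so the identity can be checked.

```python
import math

# Made-up illustrative data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

rows = []
for xi, yi in zip(x, y):
    y_hat = intercept + slope * xi
    a = yi - y_bar      # (A) total deviation
    b = yi - y_hat      # (B) residual, unexplained
    c = y_hat - y_bar   # (C) model, explained
    rows.append((a, b, c))
    print(f"A = {a:+.2f}   B = {b:+.2f}   C = {c:+.2f}")

# The identity holds point by point, up to floating-point rounding.
print(all(math.isclose(a, b + c, abs_tol=1e-12) for a, b, c in rows))
```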

16
Q

What is the equation for the total variation?

A

Total variation (A) = Residual (error/unexplained, B) + Model (explained/regression, C)

17
Q

Why do we sum the squares of the residuals?

A

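To get rid of the negative differences between the observed yi and the expected y-hat-i (the same applies to differences from Y-bar, the mean of the y values)
Without squaring, the positive and negative differences cancel: for a least-squares line with an intercept the raw residuals add up to 0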
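
A minimal Python sketch of the cancellation, assuming a made-up data set (illustrative values, not from the lecture). The raw residuals from a least-squares line with an intercept sum to zero up to rounding, while the squared residuals do not.

```python
import math

# Made-up illustrative data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

sum_raw = sum(residuals)                      # cancels to (almost exactly) 0
sum_squared = sum(r ** 2 for r in residuals)  # SSe, strictly positive here

print(f"sum of residuals:         {sum_raw:.6f}")
print(f"sum of squared residuals: {sum_squared:.3f}")
```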

18
Q

What is the equation for the total sum of squares (SSt)? And how does this relate to A, B and C?

A

Total SSt (from A) = Residual SSe (from B) + Model SSm (from C)

19
Q

What is SSe a measure of?

A

SSe is a measure of how well the regression line fits the actual data (the squared differences between observed and expected values); the smaller SSe is, the better the fit

20
Q

What is SSm a measure of?

A

SSm is a measure of how different the line y-hat-i is from Y-bar (how different is the slope from 0)

21
Q

What is the equation for the coefficient of determination R²?

A

R² = SSm (C) / SSt (A, total), which is the same as 1 - SSe / SSt
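
A Python sketch of the R² calculation, assuming a made-up data set (illustrative values, not from the lecture). It computes R² as SSm / SSt, and for comparison also as 1 - SSe / SSt and as the squared Pearson correlation, which agree for simple linear regression.

```python
import math

# Made-up illustrative data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

ss_model = sum((fi - y_bar) ** 2 for fi in fitted)
ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_total = sum((yi - y_bar) ** 2 for yi in y)

r2 = ss_model / ss_total
r2_from_residuals = 1 - ss_resid / ss_total
pearson_r = sxy / math.sqrt(sxx * ss_total)

print(f"R^2 = SSm/SSt     : {r2:.4f}")
print(f"R^2 = 1 - SSe/SSt : {r2_from_residuals:.4f}")
print(f"r^2 (correlation) : {pearson_r ** 2:.4f}")
```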

22
Q

What is the Model referring to in an ANOVA table?

A

The treatment/factor; in a regression this is the variation explained by the predictor (x)

23
Q

How do you determine F with the ANOVA?

A

F=MSm/MSE

or F=MSm/MSres (same thing, different label)

24
Q

What is the equation for the sum of squares Model (SSm)?

A

Sum of (y-hat-i - Y-bar)²

25
Q

What is the equation for the sum of squares Residual (SSres or SSe)?

A

Sum of (yi - y-hat-i)²

26
Q

What is the equation for the sum of squares total (SSt)?

A

Sum of (yi - Y-bar)²
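
A Python sketch of the three sums of squares (model, residual, total), assuming a made-up data set (illustrative values, not from the lecture). It computes each one directly from its definition and then checks that SSt = SSe + SSm.

```python
import math

# Made-up illustrative data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

ss_model = sum((fi - y_bar) ** 2 for fi in fitted)            # SSm
ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # SSe (SSres)
ss_total = sum((yi - y_bar) ** 2 for yi in y)                 # SSt

print(f"SSm = {ss_model:.3f}")
print(f"SSe = {ss_resid:.3f}")
print(f"SSt = {ss_total:.3f}")
print("SSt == SSe + SSm:", math.isclose(ss_total, ss_resid + ss_model))
```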

27
Q

What is the degree of freedom for the Residual/Error?

A

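n-2 (n observations minus the two estimated coefficients, intercept and slope, in simple linear regression)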

28
Q

What is the degree of freedom for the Total?

A

n-1

29
Q

How do you determine the MS (Mean Square)?

A

The sum of squares divided by the degrees of freedom

SS/df

30
Q

How do you calculate F?

A

MSmodel/MSresidual
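
A Python sketch of the ANOVA table for a simple linear regression, assuming a made-up data set (illustrative values, not from the lecture). It uses 1 degree of freedom for the model (one predictor), n-2 for the residual and n-1 for the total, then forms each MS as SS/df and F as MSmodel/MSresidual. No p-value is computed, since that needs the F distribution, which is not in the Python standard library.

```python
import math

# Made-up illustrative data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

ss_model = sum((fi - y_bar) ** 2 for fi in fitted)
ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

df_model = 1        # one predictor in simple linear regression
df_resid = n - 2
ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
f_stat = ms_model / ms_resid

print(f"{'Source':<10}{'df':>4}{'SS':>10}{'MS':>10}{'F':>10}")
print(f"{'Model':<10}{df_model:>4}{ss_model:>10.3f}{ms_model:>10.3f}{f_stat:>10.1f}")
print(f"{'Residual':<10}{df_resid:>4}{ss_resid:>10.3f}{ms_resid:>10.3f}")
print(f"{'Total':<10}{n - 1:>4}{ss_model + ss_resid:>10.3f}")
```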

31
Q

What is H0 for the ANOVA table for simple linear regression?

A

H0: there is no linear relationship between y and x (the slope is 0)

i.e. x cannot be used to predict y

32
Q

We can usually expect the slope and intercept not to equal exactly 0, so why do we test these hypotheses anyway?

A

We test the hypotheses anyway to see whether the difference from 0 is statistically significant

33
Q

What are the 3 null hypotheses for the regression coefficients?

A

1) The intercept is no different from 0
(the intercept is the constant term)
2) The slope is no different from 0
(the slope is the coefficient on the predictor x; in the example, weight)
3) There is no relationship between the response and the predictor
(the overall F-test; in simple linear regression it is equivalent to the slope test)
34
Q

There are 2 t-tests for the regression coefficients. Why?

A

Because each coefficient (intercept and slope) is tested separately against its own null hypothesis
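
A Python sketch of the two t statistics, assuming a made-up data set (illustrative values, not from the lecture) and the usual standard-error formulas for simple linear regression. Each coefficient is divided by its own standard error; p-values are left out because the t distribution is not in the Python standard library. As a cross-check, the squared slope t statistic equals the ANOVA F.

```python
import math

# Made-up illustrative data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

# Residual mean square (MSe) with n-2 degrees of freedom.
ms_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - 2)

# Standard errors of the two coefficients.
se_slope = math.sqrt(ms_resid / sxx)
se_intercept = math.sqrt(ms_resid * (1 / n + x_bar ** 2 / sxx))

t_slope = slope / se_slope              # tests H0: slope = 0
t_intercept = intercept / se_intercept  # tests H0: intercept = 0

# ANOVA F for comparison: MSm / MSe, with 1 model degree of freedom.
f_stat = sum((fi - y_bar) ** 2 for fi in fitted) / ms_resid

print(f"t (slope)     = {t_slope:.2f}")
print(f"t (intercept) = {t_intercept:.2f}")
print(f"t_slope^2 = {t_slope ** 2:.1f},  F = {f_stat:.1f}")
```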

35
Q

What is done to test the assumption of linearity?

A

A scatterplot is drawn with the linear regression line and its 95% confidence intervals
If the data points do not scatter evenly about the regression line (for example, they curve away from it), then a linear regression may not be appropriate

36
Q

How do you fix the linearity assumption if the points do not scatter evenly about the regression line?

A

Fit a curvilinear relationship instead (as with the plateau problem), or transform the data

37
Q

What is done to test the assumption of normality?

A

A normal Q-Q plot is drawn. If the residuals come from a normal distribution, the standardized residuals should fall on the line of the plot. If they bend away from the line at the ends, that could indicate skew or heavy tails.
i.e. the points should track the straight line
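
A Python sketch of the points behind a normal Q-Q plot, assuming a made-up data set (illustrative values, not from the lecture). Two simplifications are assumptions of this sketch: residuals are standardized by dividing by the square root of MSe (ignoring leverage), and the plotting positions are (i - 0.5)/n, which is one common convention among several.

```python
import math
from statistics import NormalDist

# Made-up illustrative data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Simplified standardization: divide by the residual standard deviation.
ms_resid = sum(r ** 2 for r in residuals) / (n - 2)
standardized = sorted(r / math.sqrt(ms_resid) for r in residuals)

# Theoretical normal quantiles at plotting positions (i - 0.5) / n.
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each pair is one point on the Q-Q plot; normal residuals track a line.
for t, s in zip(theoretical, standardized):
    print(f"theoretical = {t:+.2f}   standardized residual = {s:+.2f}")
```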

38
Q

How are the data ordered in a residual diagnostics plot (4-in-1)?

A

Ordered from smallest to largest, with the residuals standardized so that they center on 0; the relative variability must be the same across the range (no patterns)