Lecture 9 - Regression Depression Flashcards by Cassie Bosma

Why is regression analysis used so often?

It gives a model for prediction, either into the future or further along the gradients that were measured

How is the independence assumption evaluated?

By plotting the residuals against the fitted values in increasing order
Independence holds when the residuals fluctuate uniformly about 0 (no pattern)
i.e. no unbalanced groups of positive or negative residuals

When does a lack of independence with residuals occur?

When adjacent residuals tend to be similar and so appear to be correlated (autocorrelation)
i.e. groups of residuals that consistently fall below or above the line

What is positive autocorrelation (dependence)?

Positive residuals are followed by other positive residuals (neighbouring residuals share the same sign)

What is negative autocorrelation?

Residuals are followed by residuals of the opposite sign, so positive and negative residuals alternate
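
One way to put a number on these sign patterns is the lag-1 autocorrelation of the residuals. The sketch below is illustrative only: the function name `lag1_autocorrelation` and both toy residual sequences are invented for the example and do not come from the lecture.

```python
def lag1_autocorrelation(residuals):
    """Correlation between each residual and the one that follows it."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Neighbouring deviations with the same sign add to the numerator,
    # neighbouring deviations with opposite signs subtract from it.
    numerator = sum(
        (residuals[i] - mean) * (residuals[i + 1] - mean) for i in range(n - 1)
    )
    denominator = sum((e - mean) ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, ordered by fitted value.
grouped = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]      # runs of the same sign
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips at every step

r_grouped = lag1_autocorrelation(grouped)
r_alternating = lag1_autocorrelation(alternating)
print(r_grouped, r_alternating)
```

A value clearly above 0 points to positive autocorrelation, a value clearly below 0 to negative autocorrelation, and a value near 0 is what independent residuals should give.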

What is the purpose of regression depression?

To determine whether a linear relationship exists between two variables

If the R² value is good, what might still concern us about what the scatter plot depicts?

If the scatterplot shows a plateau, which could mean there are constraints on the data

Do we want to fit a curve to the data (polynomial)?

Usually not: fitting a polynomial curve gets complicated and is not very suitable for biological data
Sometimes we transform the data instead

What is involved in deriving R² and the decomposition of variability?

A least-squares analysis that fits the best line, i.e. the line that minimizes the variability of the dependent (response) variable y around it

What is the Y-bar?

The mean of all observed y values; the variability of the dependent response is measured around it

What is (B) in the decomposition of variability?

The linear distance from the observed value to the expected value (yi - y-hat-i)
= residual, error, or unexplained term

What is yi?

the observed value

What is y-hat-i?

the expected (fitted) value on the regression line

What is (C) in the decomposition of variability?

The linear distance from the expected value to the mean of the observed y values (y-hat-i - Y-bar)
= model or regression term

What is (A) in the decomposition of variability?

The total variability (yi - Y-bar), made up of the two components (B) and (C)

(A) = (B) + (C)

What is the equation for the total variation?

Total variation (A) = Residual (B: error, unexplained) + Model (C: explained, regression)
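
Written per observation in standard notation, with the cards' y-hat-i as $\hat{y}_i$ and Y-bar as $\bar{y}$, the decomposition can be restated as below. This is a restatement of the A/B/C cards, not a formula quoted from the lecture.

```latex
\underbrace{y_i - \bar{y}}_{\text{(A) total}}
  = \underbrace{y_i - \hat{y}_i}_{\text{(B) residual}}
  + \underbrace{\hat{y}_i - \bar{y}}_{\text{(C) model}}
```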

Why do we sum the squares of the residuals?

To get rid of the negative signs on the differences (e.g. between the observed yi and the expected y-hat-i)
Without squaring, positive and negative differences would cancel: the residuals of a least-squares line add up to 0

What is the equation for the total sum of squares (SSt), and how does it relate to A, B, and C?

Total SSt (A) = Residual SSe (B) + Model SSm (C)

What is SSe a measure of?

SSe is a measure of how well the regression line fits the actual data (difference between observed and expected values)

What is SSm a measure of?

SSm is a measure of how different the fitted line (y-hat-i) is from Y-bar, i.e. how different the slope is from 0

What is the equation for the coefficient of determination R²?

R² = SSm / SSt, i.e. Model (C) / Total (A)
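
The sums of squares and R² can be checked numerically. The following is a minimal sketch on invented data: the helper `fit_line` and the x and y values are hypothetical, and the code uses a plain least-squares fit rather than any particular statistics package.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
y_bar = sum(y) / len(y)                      # Y-bar: mean of the observed y
y_hat = [b0 + b1 * xi for xi in x]           # y-hat-i: expected values

ss_t = sum((yi - y_bar) ** 2 for yi in y)               # (A) total
ss_e = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # (B) residual
ss_m = sum((fi - y_bar) ** 2 for fi in y_hat)           # (C) model
r_squared = ss_m / ss_t

print(f"SSt={ss_t:.4f}  SSe={ss_e:.4f}  SSm={ss_m:.4f}  R2={r_squared:.4f}")
```

Up to floating-point rounding, `ss_t` equals `ss_e + ss_m`, which is the A = B + C relationship applied to the squared and summed terms.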

What is the Model referring to in an ANOVA table?

The treatment/factor

How do you determine F with the ANOVA?

F=MSm/MSE

or F=MSm/MSres (same thing, different label)

What is the equation for the sum of squares Model (SSm)?

Sum of (y-hat-i - Y-bar)²

What is the equation for the sum of squares Residual (SSres or SSe)?

Sum of (yi - y-hat-i)²

What is the equation for the sum of squares total (SSt)?

Sum of (yi - Y-bar)²

What is the degree of freedom for the Residual/Error?

n-2

What is the degree of freedom for the Total?

n-1

How do you determine the MS (Mean Square)?

The sum of squares divided by the degrees of freedom: MS = SS/df

How do you calculate F?

MSmodel/MSresidual
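
The ANOVA quantities from the last few cards can be assembled in one place. This is a sketch under stated assumptions: the function `anova_simple_regression` and the data are hypothetical, and the model degrees of freedom are taken as 1, which is what remains when the residual df (n-2) is subtracted from the total df (n-1). The p-value is left out because it needs the F distribution, which the Python standard library does not provide.

```python
def anova_simple_regression(x, y):
    """Return the ANOVA table entries for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    y_hat = [intercept + slope * xi for xi in x]

    ss_model = sum((fi - y_bar) ** 2 for fi in y_hat)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

    df_model = 1          # assumption: (n - 1) - (n - 2)
    df_resid = n - 2
    ms_model = ss_model / df_model   # MS = SS / df
    ms_resid = ss_resid / df_resid
    return {
        "df_model": df_model,
        "df_resid": df_resid,
        "df_total": n - 1,
        "MS_model": ms_model,
        "MS_resid": ms_resid,
        "F": ms_model / ms_resid,
    }


# Hypothetical data, invented for illustration.
table = anova_simple_regression(
    [1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1]
)
for name, value in table.items():
    print(name, value)
```

A large F means the model mean square is large relative to the residual mean square, which counts as evidence against the null hypothesis of no relationship.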

What is the H(0): for the ANOVA table for simple linear regression?

H(0): there is no relationship between y and x, i.e. x cannot be used to predict y

We can usually expect the slope and intercept to not exactly equal 0, so why do we test these hypotheses anyway?

To see whether the difference from 0 is significant

What are the 3 null hypotheses of the regression coefficients?

1) The intercept is no different from 0 (intercept = constant)
2) The slope is no different from 0 (x = slope? in the example, weight = slope)
3) There is no relationship between the response and the predictor (a combination of the other tests?)

There are 2 t-tests in the regression coefficients. Why?

Because each coefficient, and its hypothesis, is tested separately
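
The two t-tests can be sketched as one t statistic per coefficient. The block below is illustrative: `coefficient_t_tests` and the data are hypothetical, and the standard-error formulas are the usual textbook ones for simple linear regression, assumed here rather than taken from the lecture. Turning each t value into a p-value would need the t distribution with n-2 degrees of freedom, which is not included.

```python
import math


def coefficient_t_tests(x, y):
    """t statistics for H0: intercept = 0 and H0: slope = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual mean square, using the residual df of n - 2.
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ms_resid = ss_resid / (n - 2)

    # Standard errors of the two coefficients (textbook formulas, assumed).
    se_slope = math.sqrt(ms_resid / sxx)
    se_intercept = math.sqrt(ms_resid * (1.0 / n + x_bar ** 2 / sxx))

    return {
        "intercept": intercept,
        "slope": slope,
        "t_intercept": intercept / se_intercept,
        "t_slope": slope / se_slope,
    }


# Hypothetical data, invented for illustration.
result = coefficient_t_tests(
    [1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1]
)
print(result)
```

With a single predictor, the square of `t_slope` matches the F value from the ANOVA table, so the slope t-test and the overall F-test agree.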

What is done to test the assumption of linearity?

A scatterplot is drawn with the linear regression line and its 95% confidence intervals. If the data points are weakly scattered about the regression line, a linear regression may not be appropriate

How do you fix the linearity assumption if the points are weakly scattered about the regression line?

Plot a curvilinear relationship (plateau problem?)

What is done to test the assumption of normality?

A normal Q-Q plot is drawn. If the residuals come from a normal distribution, the standardized residuals should fall on the straight line of the plot, i.e. the points should track the line. If they veer off at the ends, that could indicate problems in the tails
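
The points behind a normal Q-Q plot can be computed by hand. This is a minimal sketch: `qq_points` and the residual values are hypothetical, and the plotting positions (i - 0.5)/n are one common convention chosen here as an assumption, since statistical packages differ on this detail.

```python
from statistics import NormalDist, mean, pstdev


def qq_points(residuals):
    """Pair each sorted standardized residual with a normal quantile."""
    n = len(residuals)
    centre = mean(residuals)
    spread = pstdev(residuals)
    # Standardize, then order from smallest to largest.
    standardized = sorted((e - centre) / spread for e in residuals)
    # Theoretical quantiles of the standard normal distribution.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, standardized))


# Hypothetical residuals, invented for illustration.
residuals = [-0.9, 0.4, 1.3, -0.2, 0.1, -1.5, 0.8]
points = qq_points(residuals)
for theoretical_q, observed_q in points:
    print(f"{theoretical_q:7.3f}  {observed_q:7.3f}")
```

If the residuals are roughly normal, the two columns rise together and the plotted points lie close to a straight line.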

How are the data ordered in a Residual Diagnostics plot (4 in 1)?

Ordered from smallest to largest and standardized so that the residuals are centered on 0; the relative variability must be the same throughout (no patterns)

(38 cards)