Lecture 9 - Regression Depression Flashcards
Why is regression analysis used so often?
It models a predictor for the future, or further along the gradients measured?
How is the independence assumption evaluated?
By plotting the residuals against increasing observed values (fitted)
occurs when the residuals fluctuate uniformly about 0 (no pattern)
ie. no unbalanced + or - groups
When does a lack of independence with residuals occur?
When adjacent residuals tend to be similar and thus appear to be correlated = autocorrelation
ie. groupings that consistently fall below or above the line
What is positive autocorrelation (dependence?)?
Positive residuals are followed by other positive individuals
What is negative autocorrelation?
negative residuals are followed by other negative residuals
What is the purpose of regression depression?
To examine linearity between two variables
Determine if a linear relationship exists between two variables
If the R2 value is good, what might still lead us to be concerned with what a scatter plot depicts?
If the scatterplot shows a plateau
Could mean constraints on data
Do we want to fit a curve to the data (polynomial)?
Not desirable to curve data and fit a polynomial because it gets complicated and is not very suitable for biology
Sometimes we might transform the data in this situation
What is deriving R2 and the decomposition of variability?
a least square analysis to fit the best line that reduces the variability around the dependent response variable (y)
What is the Y-bar?
Mean of all x and y values (pivots variability around dependent response?)
What is (B) in the decomposition of variability?
linear distance from observed to expected = residual or error or unexplained term
(yi-y-hat-i)
What is yi?
the observed value
What is y-hat-i?
the expected value
What is (C) in the decomposition of variability?
The Model (y-hat-i - Y-bar) and is linear distance from the expected to the mean of all x and y values (Y-bar) =model or regression #
What is (A) in the decomposition of variability?
The two components of (B) and (C)
where (A) = (B)+(C) to = the total variability