3. Aug 24th Flashcards
Can the null hypothesis be true?
Yes, in a controlled manipulative study.
What are multiple types of regression tests an example of?
The general linear model
- The most important topic of this entire class
What is the simplest form of the general linear model?
Simple linear regression
- Continuous x, continuous y
What are the two purposes of regression?
- Fit a line to data
- Test if the slope of that line is significant
- – The slope is significant if p-value < 0.05
What are the three possible causes of a large p-value?
If the p-value > 0.05, we don’t know if:
A) Sample size is too small
B) Effect size is too small
C) Too much noise
The traditional equation for a line
y = mx + b
b is the y intercept
m is the slope (change in y value/change in x value)
The stats equation for a line
General
Y = B0 + B1(X)
B0 is a constant (the y-intercept)
B1 is the regression coefficient (the slope)
Specific
y-hat = B0 + B1(X)
y-hat = predicted value of the dependent variable
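The equation above can be fit directly from data. A minimal sketch with numpy (the x and y values here are made-up illustration data):

```python
import numpy as np

# hypothetical data: continuous x, continuous y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# np.polyfit with degree 1 returns [slope, intercept], i.e. [B1, B0]
beta1, beta0 = np.polyfit(x, y, 1)

# predicted values: y-hat = B0 + B1 * x
y_hat = beta0 + beta1 * x
print(beta0, beta1)
```

Here the fitted slope is about 2, meaning y increases by roughly 2 for every 1-unit increase in x.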
What two elements make up the relationship between every x and y (i.e. the line)?
Our equation PLUS epsilon (ε)
ε = error (normally distributed)
What two requirements must be met to be a “best fit line”?
- Average error = 0
- – (Sum of distances from each point to the line)/n = 0
- The sum of squared error (SSE) is minimized
- – Squaring to get rid of negatives
- – Complicated models are done iteratively
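Both requirements can be checked numerically for a least-squares fit. A small sketch using numpy and the same kind of made-up data as above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

beta1, beta0 = np.polyfit(x, y, 1)
residuals = y - (beta0 + beta1 * x)

# Requirement 1: the residuals (errors) average to zero
mean_error = residuals.mean()

# Requirement 2: SSE is minimized; nudging the slope away
# from the fitted value can only increase it
sse = np.sum(residuals ** 2)
sse_worse = np.sum((y - (beta0 + (beta1 + 0.1) * x)) ** 2)
print(mean_error, sse, sse_worse)
```

Any other slope/intercept pair gives a larger SSE, which is what "best fit" means here.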
What is the more technical name for “total variance in y”?
Total Sum of Squares
Total Sum of Squares (SST): summed squared distance from the data to the null hypothesis line (0 slope, i.e. the mean of y)
Represents ALL the information we have
SST = SSR + SSE
SST = Σ(i=1 to n) (Yi − Y-bar)^2
Yi = any individual Y (observed dependent variable)
Y-bar = average/mean of Y
What 2 things does the total sum of squares partition variation into?
https://365datascience.com/sum-squares/
- SSR- sum of squares due to regression
— The variation in y due to variation in x
SSR = Σ(i=1 to n) (Yi-hat − Y-bar)^2
Yi-hat = Your predicted value of y
Y-bar = average/mean of Ys, AKA the mean of the dependent variable
- SSE- sum of squares due to error
— Noise
SSE = Σ(i=1 to n) (ei)^2
ei = difference between the observed value and the predicted value (Yi − Yi-hat)
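The partition SST = SSR + SSE can be verified numerically. A sketch with numpy, again using made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

beta1, beta0 = np.polyfit(x, y, 1)
y_hat = beta0 + beta1 * x
y_bar = y.mean()

sst = np.sum((y - y_bar) ** 2)       # total variation in y
ssr = np.sum((y_hat - y_bar) ** 2)   # variation in y due to variation in x
sse = np.sum((y - y_hat) ** 2)       # noise
print(sst, ssr, sse)
```

For a least-squares fit, SSR + SSE always adds back up to SST (up to floating-point rounding).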
What type of p-value will you get if sum of square error (SSE) is larger than sum of squares regression (SSR)?
A large p-value (greater than 0.05)
What type of p-value will you get if sum of squares regression (SSR) is greater than sum of squares error (SSE)?
A smaller p-value (smaller than 0.05)
Also means that variation in y is mostly explained by variation in x
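This SSR-versus-SSE comparison is what drives the p-value. A sketch using scipy's `stats.linregress` (assuming scipy is available), with made-up data showing a strong trend, so SSR dominates SSE:

```python
import numpy as np
from scipy import stats

# hypothetical data with a strong linear trend (SSR >> SSE)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 14.2, 15.9])

result = stats.linregress(x, y)
# p-value is small because movement in y is mostly due to x
print(result.slope, result.pvalue)
```

With noisier data (larger SSE relative to SSR), the same call would return a p-value above 0.05.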
Important takeaways of regression
1) Best fit lines mean
- –a) Average error = 0
- –b) Minimize sum of squares error (SSE)
2) P-values are calculated by partitioning variation in y into
- – Sum of squares regression (SSR)
- – Sum of squares error (SSE)
What does regression display?
Correlation
NOT causation