Regression Flashcards
What is polynomial interpolation (an exact polynomial fit) and what are its disadvantages
Fitting n data points exactly with a polynomial of degree n-1 (n coefficients)
This passes through every data point with zero error, which is bad because real data normally includes errors/outliers, so the noise gets fitted too. It also takes a lot of time/computing power; it is often desirable to fit a polynomial with fewer parameters than there are data points.
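A minimal sketch of the exact fit described on this card, using the Lagrange form of the polynomial through n points. The function name `lagrange_eval` and the sample data are illustrative assumptions, not part of the original notes. The data follow y = 2x except for one outlier, and the polynomial reproduces the outlier along with everything else.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the unique degree n-1 polynomial through (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Basis polynomial: equals 1 at xi and 0 at every other data point.
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total

# Hypothetical data: y = 2x, except for an outlier at x = 2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 9.0, 6.0]

# The fitted values at the data points match the data, outlier included.
fitted = [lagrange_eval(xs, ys, x) for x in xs]
```

Because the outlier is fitted exactly, the curve between the data points is pulled away from the underlying straight line, which is the disadvantage the card describes.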
What is the equation for the simple linear regression model
Y = ax + b + e
e: error term (random variable, normally distributed with mean 0)
a: slope
b: intercept
How do we find the coefficients in the simple linear regression equation
Rearrange for the error term, ei = yi - a xi - b, and form the ESS (error sum of squares: the sum of all the squared error values). Then partially differentiate the ESS with respect to each coefficient (a and b), set each derivative = 0, and solve the two linear equations that result.
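The procedure above can be sketched in pure Python. Solving the two equations dESS/da = 0 and dESS/db = 0 gives the closed-form expressions used below; the function name `fit_line` and the sample data are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) minimising ESS = sum((y - a*x - b)**2)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Solving dESS/da = 0 and dESS/db = 0 simultaneously gives:
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sxy / sxx              # slope
    b = y_mean - a * x_mean    # intercept
    return a, b

# Hypothetical noise-free data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
```

With noise-free data the fit recovers the slope and intercept of the underlying line; with noisy data it returns the line that minimises the ESS.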
What is the correlation coefficient, R
A measure of the strength of the linear relationship between two variables.
Close to 0 = uncorrelated
Positive = variables are positively correlated
Large magnitude (close to 1 or -1) = greater degree of correlation
For simple linear regression its magnitude is equal to the sq root of R^2, and its sign matches the sign of the slope
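As an illustration, the sample correlation coefficient can be computed directly from its definition. The function name `correlation` and the two small data sets are assumptions made for this sketch.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient R between xs and ys."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfectly increasing and a perfectly decreasing line.
r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])
```

The first data set gives R at the positive extreme and the second at the negative extreme; noisy data falls somewhere in between.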
How do we modify the ESS to get the RSS (regression sum of squares)
Change yi to y bar (the mean y value): ESS = sum of (yi - a xi - b)^2 becomes RSS = sum of (y bar - a xi - b)^2
ESS represents the error of the regression's estimate around the actual value
RSS represents the deviation of the regression's estimate from the mean
What is the TSS, total sum of squares
Sum (from i = 1 to n): (yi - y bar)^2
Is equal to ESS + RSS for a least squares fit that includes an intercept
Represents the variation of y around its mean
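The identity TSS = ESS + RSS can be checked numerically for a least squares line. This is a sketch under assumed names (`sums_of_squares`) and made-up data, not a general proof.

```python
def sums_of_squares(xs, ys):
    """Fit y = a*x + b by least squares and return (ESS, RSS, TSS)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sxy / sxx
    b = y_mean - a * x_mean
    estimates = [a * x + b for x in xs]
    ess = sum((y - est) ** 2 for y, est in zip(ys, estimates))  # estimate vs actual
    rss = sum((est - y_mean) ** 2 for est in estimates)         # estimate vs mean
    tss = sum((y - y_mean) ** 2 for y in ys)                    # actual vs mean
    return ess, rss, tss

# Hypothetical noisy data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
ess, rss, tss = sums_of_squares(xs, ys)
```

Up to floating point rounding, `ess + rss` matches `tss` for this fit.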
What is the R^2 statistic
The fraction of the total variation in y (TSS) accounted for by the regression
R^2 = RSS/TSS
A value closer to 1 shows the estimated regression function fits the data better
Useful even when a non-linear model is used, whereas the R statistic is only useful for describing the strength of linear relationships.
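A short sketch computing R^2 = RSS/TSS for a least squares line and comparing it with the square of the correlation coefficient R. The function name `r_squared_and_r` and the data are assumptions for illustration.

```python
import math

def r_squared_and_r(xs, ys):
    """Return (R^2, R) for a least squares line fitted to the data."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    tss = sum((y - y_mean) ** 2 for y in ys)
    a = sxy / sxx
    b = y_mean - a * x_mean
    # RSS: spread of the regression's estimates around the mean of y.
    rss = sum((a * x + b - y_mean) ** 2 for x in xs)
    r2 = rss / tss
    r = sxy / math.sqrt(sxx * tss)
    return r2, r

# Hypothetical noisy data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r2, r = r_squared_and_r(xs, ys)
```

For a straight-line fit like this one, `r2` agrees with `r * r` up to rounding; for a non-linear model only the RSS/TSS form carries over.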