Quant Flashcards
What is linear regression?
Finding the straight-line relationship between 2 variables so that one can be used to predict the other
What are the SSE, SSR and SST?
With a line of best fit, one must determine the error between the line and the data points. These 3 variables quantify that
SSR is the pRedicted (regression) deviation - it is the difference between the line of best fit and the mean of the data set
SSE is the ERROR deviation - it is the difference between the line of best fit and the data point
SST is the sum of SSE and SSR - it shows the total deviation from the mean to the data point
Remember these are all SQUARED and then summed over every data point
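A minimal sketch of the three sums in Python. The data set and the helper name fit_line are made up for illustration; they are not part of the cards.

```python
# Made-up illustration data (an assumption, not from the cards).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def fit_line(x, y):
    """Least-squares intercept (b0) and slope (b1) for y = b0 + b1 * x."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


b0, b1 = fit_line(x, y)
y_mean = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]  # points on the line of best fit

ssr = sum((yh - y_mean) ** 2 for yh in y_hat)          # line vs mean
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # data point vs line
sst = sum((yi - y_mean) ** 2 for yi in y)              # data point vs mean

print(round(ssr, 4), round(sse, 4), round(sst, 4))
```

For this data SSR and SSE add up to SST, which is a quick sanity check on any hand calculation.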
What is the formula for r squared
R squared = SSR / SST
It shows how much of the total variation in Y the model explains, i.e. how predictive the model is
Does a high or low r squared mean the relationship is stronger?
High r squared means a STRONG relationship
R squared, what are the highest and lowest numbers it could be?
It is between 0 and 1
What are the degrees of freedom?
Degrees of freedom are the number of data points you have (n) minus the number of independent variables in the model (k) minus 1.
You want the degrees of freedom high to have a good model
DF = n-k-1
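As degrees of freedom increase, R squared ________? And why?
As degrees of freedom increase, R squared tends to decrease.
Think if you only had 2 data points: the line passes through both, so the r^2 (relationship) would be 1. Putting in more data points (which raises the DF) would tend to DECREASE r^2. Putting in more variables lowers the DF and never decreases r^2.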
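A small sketch of the 2 data point case, assuming a made-up data set and a hypothetical helper r_squared. It fits a simple least-squares line and returns SSR / SST.

```python
def r_squared(x, y):
    """r^2 = SSR / SST for a simple least-squares line."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    b1 = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
          / sum((xi - x_mean) ** 2 for xi in x))
    b0 = y_mean - b1 * x_mean
    ssr = sum((b0 + b1 * xi - y_mean) ** 2 for xi in x)
    sst = sum((yi - y_mean) ** 2 for yi in y)
    return ssr / sst


# Made-up data (an assumption for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r2_two_points = r_squared(x[:2], y[:2])  # n = 2, so DF = 2 - 1 - 1 = 0
r2_all_points = r_squared(x, y)          # n = 5, so DF = 5 - 1 - 1 = 3

print(r2_two_points, round(r2_all_points, 4))
```

With only 2 points the line fits perfectly; with all 5 points the same kind of line explains less of the variation.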
Formula for Y / relationship between x and Y
Y = β0 + β1 x + error
When comparing y = beta 0 + beta 1 * x + e, which is the independent and which is the dependent variable?
Y is dependent, x is independent
If the confidence level rises (from 90% to 99%), does the probability of rejecting the null hypothesis go up or down? Why?
The probability will go….. down. The confidence interval will get wider (to ensure we are more confident we have the right number), so the hypothesised value is more likely to fall inside it.
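A sketch of the widening interval using z scores from Python's standard library. The sample mean, standard error and hypothesised value are hypothetical numbers picked only for illustration.

```python
from statistics import NormalDist

# Hypothetical numbers, chosen only for illustration.
sample_mean = 10.0
standard_error = 1.0
null_value = 12.0  # hypothesised value under the null


def z_interval(mean, se, confidence):
    """Two-sided interval: mean +/- z * se."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return mean - z * se, mean + z * se


for confidence in (0.90, 0.95, 0.99):
    low, high = z_interval(sample_mean, standard_error, confidence)
    rejected = not (low <= null_value <= high)
    print(f"{confidence:.0%}: ({low:.3f}, {high:.3f}) reject null: {rejected}")
```

With these numbers the null value sits outside the 90% interval but inside the 99% one, so the same data rejects at 90% and fails to reject at 99%.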
What is a t statistic? What is the formula?
A t test checks whether a hypothesised number could be the true value of a statistic, based on the estimate we have, its standard error, and a t score.
t = (estimate - hypothesised value) / standard error
Equivalently, build the interval: the estimate + and - the t score * standard error, and check whether the hypothesised number falls inside it.
The t score comes from the t table using the degrees of freedom (n - 2 for simple linear regression).
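A minimal sketch of a t statistic for the slope of a simple regression, testing a hypothesised slope of 0. The data is made up, and the critical value is hard-coded from a standard t table (95% two-sided, 3 degrees of freedom) because the standard library has no t distribution.

```python
import math

# Made-up data (an assumption for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_mean = sum(x) / n
y_mean = sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
b0 = y_mean - b1 * x_mean

# Residual standard error with n - 2 degrees of freedom.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))
se_b1 = s / math.sqrt(sxx)  # standard error of the slope

hypothesised_slope = 0.0  # null: no relationship
t_stat = (b1 - hypothesised_slope) / se_b1

# Critical value from a t table: 95% two-sided, df = n - 2 = 3.
T_CRITICAL = 3.182
print(round(t_stat, 4), abs(t_stat) > T_CRITICAL)
```

Here the t statistic is smaller than the table value, so a slope of 0 cannot be ruled out at 95% with only 5 points.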
What is the standard error?
SD / square root n
OR
Epsilon (which is Y - β0 - β1 x) < the formula for Y in reverse.
(Sum of epsilon squared / (n-2)) ^ 0.5
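Both versions sketched side by side on a made-up data set (an assumption for illustration). The first is the standard error of a mean, the second the standard error of a simple regression.

```python
import math
from statistics import stdev

# Made-up data (an assumption for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(y)

# Version 1: standard error of the mean = SD / sqrt(n).
se_mean = stdev(y) / math.sqrt(n)

# Version 2: standard error of the regression.
x_mean = sum(x) / n
y_mean = sum(y) / n
b1 = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
      / sum((xi - x_mean) ** 2 for xi in x))
b0 = y_mean - b1 * x_mean
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]  # epsilon
se_regression = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(round(se_mean, 4), round(se_regression, 4))
```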
How to find SSR
Sum of (line of best fit - mean) squared
How to find SSE
Sum of (value - line of best fit) squared
What is an f test?
It compares the variances of 2 data sets (or, in a regression, the explained vs the unexplained variance) to check if they’re statistically consistent
Confidence interval formula
= mean +- t or z score * standard error
What are the z scores for 90, 95 and 99%? (two-sided)
- 90%: 1.645
- 95%: 1.96
- 99%: 2.576
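The two-sided z scores can be checked with Python's standard library rather than memorised. The helper name two_sided_z is hypothetical.

```python
from statistics import NormalDist


def two_sided_z(confidence):
    """z score that leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%}: z = {two_sided_z(confidence):.3f}")
```

This prints roughly 1.645, 1.960 and 2.576 for 90, 95 and 99%.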
Coefficient of determination is
r^2
What is correlation squared?
r^2
In the formula Y = β0 + β1 x + error, what is β0?
β0 is the y intercept
Confidence interval explanation and formula
Mean + - t or z stat * standard error.
Check if the OTHER mean (be it the actual or hypothesised mean) is within those boundaries. If it falls outside, reject the null
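What is the p value?
The probability of getting a result at least this extreme if the null hypothesis were true (mnemonic: the "pathetic" value). We want that low (below the significance level, e.g. 0.05) to reject the null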
What are some key assumptions of simple linear regression?
The relationship between x and y is linear
x is uncorrelated with the error terms
The error terms have an expected value of 0 (the residuals sum to 0)
The error terms have a constant variance
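Formula for standard deviation with standard error
SD = standard error * square root of n (the reverse of SE = SD / square root n)
From the raw data: SD = square root of (sum of squared deviations from the mean / (n-1))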
Is variance the same as SST?
Not quite. SST is the total squared deviation from the mean; the variance of Y is SST / (n-1)
Formula for total DOF
Total DOF = k + (n-k-1) = n-1 (regression DOF plus error DOF)
MSR (mean squared regression) and MSE (mean squared error) formulas
MSR = SSR / k
MSE = SSE / (n-k-1)
What is MSR / MSE
F stat
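A minimal sketch of MSR, MSE and the F stat. The sums of squares, n and k are made-up values for a model with 1 variable and 5 data points.

```python
# Made-up sums of squares for a model with k = 1 variable and n = 5 points
# (assumed values, for illustration only).
ssr, sse = 3.6, 2.4
n, k = 5, 1

msr = ssr / k            # mean squared regression
mse = sse / (n - k - 1)  # mean squared error
f_stat = msr / mse

print(round(msr, 4), round(mse, 4), round(f_stat, 4))
```

Note that the explained part (MSR) goes on top, so a large F means the model explains a lot relative to the leftover error.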
Formula for standard error in regression
Square root of (SSE / (n-k-1)), i.e. the square root of MSE
Correlation formula, then R squared formula
Cor(x, y) = Cov(x, y) / (sigma x * sigma y)
R^2 = cor^2
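A sketch of the correlation as covariance divided by the two standard deviations, then squared to get R^2. The data is made up, and the n - 1 divisor (sample versions) is an assumption; it cancels out in the ratio either way.

```python
import math

# Made-up data (an assumption for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_mean = sum(x) / n
y_mean = sum(y) / n

# Sample covariance and standard deviations (divide by n - 1).
cov_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (n - 1)
sd_x = math.sqrt(sum((xi - x_mean) ** 2 for xi in x) / (n - 1))
sd_y = math.sqrt(sum((yi - y_mean) ** 2 for yi in y) / (n - 1))

cor = cov_xy / (sd_x * sd_y)
r_squared = cor ** 2

print(round(cor, 4), round(r_squared, 4))
```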
F stat formula, what it means, and how to interpret it
F stat is testing if there is even a relationship between the y and x variables
It is MSR/MSE
Well over 1 suggests that there is a relationship; to be sure, compare it with the critical F value (DOF k and n-k-1) from the F table
Calculate MSR and MSE
MSR = SSR / k
MSE = SSE / (n-k-1)
MSR/MSE = F
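What does adjusted r^2 do?
It adjusts the r^2 for the number of variables, so that adding variables (which uses up DOF) does NOT automatically increase the r^2
Adjusted r^2 = 1 - (1 - r^2) * (n-1) / (n-k-1)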
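A small sketch of the standard adjusted r^2 formula, 1 - (1 - r^2) * (n-1) / (n-k-1). The r^2 values below are hypothetical: a second variable nudges r^2 up slightly, yet the adjusted figure falls.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted r^2 = 1 - (1 - r^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical case: a second variable nudges r^2 from 0.60 to 0.61.
one_variable = adjusted_r_squared(0.60, n=5, k=1)
two_variables = adjusted_r_squared(0.61, n=5, k=2)

print(round(one_variable, 4), round(two_variables, 4))
```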
Downfall of adjusted R^2?
It is not bound by 0 and 1 (it can go negative), unlike plain R^2
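What is a dummy variable? And how to incorporate it into the formula?
A way of introducing a QUALITATIVE variable. You give the category you care about a value of 1, and every alternative a value of 0. If it is months of the year, and you want only results collected in Jan, the Jan dummy has a value of 1 for Jan results and 0 for the rest. With 12 months you only need 11 dummies (one month is left out as the baseline).
In the formula it goes in as an extra term: Y = β0 + β1 x + β2 D + error, where D is 0 or 1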
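A minimal sketch of month dummies in Python. The helper name month_dummies and the choice of Dec as the left-out baseline month are assumptions for illustration.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
BASELINE = "Dec"  # assumed baseline month; it gets no dummy of its own


def month_dummies(month):
    """Return 11 zero/one dummies, one per non-baseline month."""
    return {m: int(m == month) for m in MONTHS if m != BASELINE}


jan = month_dummies("Jan")
dec = month_dummies("Dec")

print(jan["Jan"], sum(jan.values()), sum(dec.values()))
```

A Jan result has exactly one dummy switched on; a result from the baseline month has all 11 at 0.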
What is heteroskedasticity?
It is unequal variances. Pretty much, the variance of the error terms changes with the value of the x variable instead of staying constant. You don’t want that
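A rough eyeball check, sketched with made-up residuals (assumed values): compare the residual variance at low x with the residual variance at high x. This is only an illustration of the idea, not a formal test.

```python
from statistics import variance

# Made-up residuals, split by whether x was low or high (assumed values).
residuals_low_x = [0.1, -0.2, 0.15, -0.1]
residuals_high_x = [1.5, -2.0, 1.8, -1.2]

# A ratio far above 1 hints that the error variance grows with x.
ratio = variance(residuals_high_x) / variance(residuals_low_x)
print(ratio > 1)
```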