Statistic Variables Flashcards
What is the formula of RSS and what is its purpose?
RSS: Residual Sum of Squares
Formula: sum of (yi - yi_estimated)**2 with i=1..n
Purpose: amount of variability that is left unexplained after performing the fit
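Example: a minimal sketch of the RSS formula in plain Python. The data and the name `rss` are made up for illustration; `y_hat` stands for the fitted values yi_estimated.

```python
# Hypothetical observed responses and fitted values from some model.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]

def rss(y, y_hat):
    """Residual sum of squares: sum of (yi - yi_estimated)**2."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

print(rss(y, y_hat))
```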
What is the standard error of an estimated variable?
Roughly speaking, it is the average amount by which the estimate differs from the actual value of the variable over repeated samples
What is a 95% confidence interval for estimated variable µ? (formula and meaning)
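Formula (approximate): [µ - 2SE(µ) ; µ + 2SE(µ)], where µ inside the brackets is the estimate
Meaning: if the sampling were repeated many times, about 95% of the intervals built this way would contain the true value of µ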
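Example: a sketch of the interval for one particular estimate, the sample mean, assuming its standard error is s/sqrt(n). The sample values and the name `approx_ci95` are hypothetical; the factor 2 is the rounded value used in the formula on this card.

```python
import math
import statistics

# Hypothetical sample drawn from the population whose mean we estimate.
sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]

def approx_ci95(sample):
    """Approximate 95% confidence interval for the mean: estimate +/- 2 * SE."""
    n = len(sample)
    mu_hat = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return mu_hat - 2 * se, mu_hat + 2 * se

low, high = approx_ci95(sample)
print(low, high)
```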
What is the t-statistic of an estimated variable µ? (formula and meaning)
Formula: t_statistic = (µ - 0)/SE(µ)
Meaning: number of standard errors (standard deviations of the estimate) that µ is away from 0
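Example: a sketch for the slope of a simple linear regression, assuming the usual least-squares formulas and SE(slope) = RSE / sqrt(sum((xi - x_mean)**2)). The data and the name `slope_t_statistic` are hypothetical.

```python
import math

# Hypothetical data, roughly following y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def slope_t_statistic(x, y):
    """Return (slope, SE(slope), t-statistic) for a simple linear regression."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least-squares slope
    b0 = y_mean - b1 * x_mean           # least-squares intercept
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    rse = math.sqrt(rss / (n - 2))      # estimate of the error standard deviation
    se_b1 = rse / math.sqrt(sxx)
    return b1, se_b1, (b1 - 0) / se_b1

b1, se_b1, t = slope_t_statistic(x, y)
print(b1, se_b1, t)
```

A large |t|, as with this data, means the estimated slope is many standard errors away from 0.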
What does a small p-value indicate? What is a small enough p-value?
A small p-value indicates that, in the absence of any real association between a predictor and a response, it is unlikely to observe such a substantial association due to chance.
A p-value under 5% usually justifies the rejection of the null hypothesis.
What is the RSE? (formula and meaning)
RSE: Residual Standard Error
Formula (simple linear regression): sqrt((sum over i=1..n of (yi - yi_estimated)**2)/(n-2)) = sqrt(RSS/(n-2))
Meaning: estimate of the standard deviation of the error term; roughly, the average amount that the response will deviate from the true regression line
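Example: a minimal sketch of the RSE formula for a simple linear regression (n - 2 degrees of freedom). The data and the name `rse` are hypothetical; `y_hat` stands for the fitted values.

```python
import math

# Hypothetical observed responses and fitted values.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]

def rse(y, y_hat):
    """Residual standard error: sqrt(RSS / (n - 2))."""
    n = len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(rss / (n - 2))

print(rse(y, y_hat))
```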
What is the R**2-statistic? (formula and meaning)
In a simple linear regression setting, what is an equivalent?
Formula: 1 - RSS/TSS
Meaning: the proportion of variance explained
Equivalent: R**2 = r**2 = Cor(X,Y)**2
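Example: a sketch that computes R**2 as 1 - RSS/TSS for a simple least-squares fit and compares it with the squared correlation between X and Y. The data and the name `r2_and_cor` are hypothetical.

```python
import math

# Hypothetical data, roughly following y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def r2_and_cor(x, y):
    """Return (R**2, Cor(X, Y)) for a simple linear regression of y on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)   # this is the TSS
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - rss / syy
    cor = sxy / math.sqrt(sxx * syy)
    return r2, cor

r2, cor = r2_and_cor(x, y)
print(r2, cor ** 2)  # the two values agree up to rounding error
```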
What is the TSS? (formula and meaning)
TSS: Total Sum of Squares
Formula: sum((yi - y_mean)**2)
Meaning: the amount of variability inherent in the response before the regression is performed
What is the F-statistic? (formula and meaning)
What does the value indicate? What does the interpretation of the value depend upon?
Formula: ((TSS-RSS)/p) / (RSS/(n-p-1))
Meaning: tests whether at least one of the predictors is related to the response
Values:
- close to 1: no relationship
- much greater than 1: evidence of a relationship
Interpretation: it depends on the values of n and p. If n is large, an F-statistic only slightly greater than 1 might still provide evidence of a relationship; if n is small, a larger value is needed.
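Example: a minimal sketch of the F-statistic formula. The function name `f_statistic` and the numbers are hypothetical; n is the number of observations and p the number of predictors.

```python
def f_statistic(tss, rss, n, p):
    """F = ((TSS - RSS) / p) / (RSS / (n - p - 1))."""
    return ((tss - rss) / p) / (rss / (n - p - 1))

# Hypothetical values: one predictor, twelve observations.
print(f_statistic(tss=10.0, rss=5.0, n=12, p=1))
```

With these numbers the numerator is 5/1 and the denominator is 5/10, so F = 10.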
Why do we look at F-statistic and not simply all p-values?
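Because the number of predictors has an influence. Even when no predictor is truly associated with the response, about 5% of the individual p-values will fall below 5% by chance. The more predictors there are, the more likely it is that at least one looks significant and that we incorrectly conclude there is a relationship. The F-statistic adjusts for the number of predictors.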
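Example: a small simulation sketch of this effect. All predictors are generated independently of the response, yet some still get a p-value below 0.05. Simplifying assumptions: each predictor is tested on its own in a simple regression, and the p-value uses a normal approximation to the t distribution. The name `null_p_values` and the settings n=200, p=100 are hypothetical.

```python
import math
import random
from statistics import NormalDist

def null_p_values(n=200, p=100, seed=0):
    """Simulate p predictors unrelated to the response and return one
    approximate two-sided p-value per predictor."""
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n)]
    y_mean = sum(y) / n
    syy = sum((yi - y_mean) ** 2 for yi in y)
    p_values = []
    for _ in range(p):
        x = [rng.gauss(0, 1) for _ in range(n)]  # independent of y by construction
        x_mean = sum(x) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
        r = sxy / math.sqrt(sxx * syy)
        t = r * math.sqrt((n - 2) / (1 - r * r))  # t-statistic of the slope
        # Normal approximation to the t distribution (reasonable for n = 200).
        p_values.append(2 * (1 - NormalDist().cdf(abs(t))))
    return p_values

p_values = null_p_values()
false_positives = sum(pv < 0.05 for pv in p_values)
print(f"{false_positives} of {len(p_values)} null predictors have p < 0.05")
```

On average about 5 of the 100 null predictors are expected to fall below the 5% threshold; the exact count depends on the random seed.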