Stats 2 - Stats & Linear Models Flashcards
What are the components of a Linear Model?
What are the types of variables you can have in a Linear Model?
How many coefficients do continuous terms and categorical factors have in a Linear Model?
Continuous terms always have one coefficient (β2)
Categorical Factors have N − 1 coefficients, where N is the number of levels in the category
Why N-1? Why are we missing a level?
The missing level is incorporated into the baseline/reference value known as the Intercept (β1) –> the level chosen as the reference is, by default, the first alphabetically
What makes a Linear model Linear?
Linear models are just a sum of terms that are linear in the coefficients –> each coefficient is a simple multiplier on its term (coefficients never appear squared, multiplied together, or inside a function)
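For instance (illustrative equations, not from the deck): a model can contain a curved term in x and still be linear, because each β enters only as a plain multiplier:

```latex
\text{linear in the coefficients:}\quad y = \beta_1 + \beta_2 x + \beta_3 x^2
\qquad
\text{not linear:}\quad y = \beta_1 e^{\beta_2 x}\ (\beta_2 \text{ sits inside the exponent})
```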
Examples of types of Linear models?
The response variable is always Continuous
- Simple Regression –> one Continuous explanatory variable
- Multiple Linear Regression –> Continuous/Categorical explanatory variables
- ANOVA –> Categorical Explanatory Variable
- ANCOVA –> Categorical/Continuous explanatory variables
Note –> MLR and ANCOVA are very similar; they just place emphasis on one type of variable or the other
How can we decide what the best fit for our Linear Model is?
Least Squares Fitting Solution
Given the data collected, we need to figure out a Linear model that best represents the data. This is done by…
- minimizing the sum of the squared vertical distances (residuals, R) between the data points and the line
This is called the ‘least squares solution’ –> we minimize the sum of squared R –> we square so that positive and negative residuals do not cancel out.
The plots below show the change in resvar (sum of squared R) as a coefficient is varied –> the minimum corresponds to the solution.
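A minimal sketch of least squares fitting in Python (made-up data; b1/b2 mirror the β1/β2 names above), showing that the fitted slope sits at the minimum of the sum of squared R:

```python
# Closed-form least squares for simple regression y = b1 + b2*x (pure stdlib).

def least_squares(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Slope: covariance of x and y divided by variance of x
    b2 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
         sum((xi - mx) ** 2 for xi in x)
    b1 = my - b2 * mx  # intercept: line passes through (mean x, mean y)
    return b1, b2

def rss(x, y, b1, b2):
    # Sum of squared residuals R: squaring stops +/- residuals cancelling
    return sum((yi - (b1 + b2 * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, b2 = least_squares(x, y)

# Nudging the fitted slope in either direction can only increase the RSS:
assert rss(x, y, b1, b2) <= rss(x, y, b1, b2 + 0.1)
assert rss(x, y, b1, b2) <= rss(x, y, b1, b2 - 0.1)
```

This is exactly the minimum of the resvar curve described above: any other coefficient value gives a larger sum of squared R.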
How can we denote the Linear model with the least squares fitting solution?
Ŷ (Y with a hat)
What are the assumptions of a Linear Model?
- Linearity –> the relationship between the explanatory variables and the response is linear
- Independence –> the residuals/observations are independent of each other
- Normality –> the residuals are normally distributed
- Homoscedasticity –> the residuals have constant variance across the fitted values
Outline the role of each of the diagnostic plots
- Residuals vs Fitted –> checks linearity (residuals should scatter evenly around zero with no pattern)
- Normal Q-Q –> checks that the residuals are normally distributed (points should follow the line)
- Scale-Location –> checks homoscedasticity (constant variance of residuals)
- Residuals vs Leverage –> flags influential points that could bias the fit
Code
par(mfrow = c(2, 2), mar = c(5, 5, 1.5, 1.5))  # 2x2 grid for the four diagnostic plots
plot(model)  # 'model' is your fitted model object, e.g. model <- lm(y ~ x)
What are the two main tests performed in order to check whether your model really explains the data? How do you know if we can rely on it?
F-Test –> Tests how much variation is explained
T-Test –> Tests the significance of the estimated coefficients
Outline the meaning of the terms TSS, ESS and RSS?
TSS –> sum of the squared differences between the observed y values and the mean of y –> tells you the total spread around the mean
ESS –> sum of the squared differences between the predicted y values and the mean of y –> the spread around the mean that the linear model captures
We hope ESS is close to TSS –> the closer it gets, the more of the variation our model explains
RSS –> sum of the squared differences between observed y and predicted y (model) –> i.e. the sum of the squared residuals (R)
RSS outlines how much variation our model cannot explain/take into account
What is the relationship between ESS, RSS and TSS?
TSS = ESS + RSS
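A quick numeric check of this identity on made-up data (pure-Python sketch; TSS = ESS + RSS holds for least-squares fits that include an intercept):

```python
# Fit a simple regression by least squares, then verify TSS = ESS + RSS.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Closed-form least-squares coefficients for y = b1 + b2*x
b2 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
     sum((xi - mx) ** 2 for xi in x)
b1 = my - b2 * mx
yhat = [b1 + b2 * xi for xi in x]  # model predictions

tss = sum((yi - my) ** 2 for yi in y)                  # total spread around the mean
ess = sum((yh - my) ** 2 for yh in yhat)               # spread the model explains
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))   # spread left unexplained

assert abs(tss - (ess + rss)) < 1e-9  # the identity holds
```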
How is the F-Statistic calculated?
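A standard way to write the F-statistic, assuming n observations and p estimated coefficients (including the intercept):

```latex
F = \frac{ESS/(p-1)}{RSS/(n-p)}
```

The numerator is the explained variation per model degree of freedom and the denominator is the unexplained variation per residual degree of freedom, so a large F means the model explains much more than it leaves unexplained.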
Now that you have calculated the F-Statistic, how do you know whether it is significant?
Compare it against an F-distribution with (p − 1, n − p) degrees of freedom (p = number of coefficients, n = number of observations) –> if the resulting p-value is below your threshold (e.g. 0.05), the model explains significantly more variation than the mean alone.
How is the T-Statistic/Value Calculated?
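For each coefficient, the t-statistic is the estimate divided by its standard error:

```latex
t_i = \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)}
```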
Figure out the combinations of Effect size and precision for each plot.
Effect size –> the strength of the relationship between the two variables (the magnitude of the slope)
Precision –> how closely the data points lie along the slope
Dotted lines represent confidence bounds, a visual representation of reliability –> the bounds curve outwards at the extremes because extreme values have more leverage and can bias the slope more
Hence, it is useful to sample at the extremes
Now that you have calculated the t-value for your coefficients, how can you test whether they are statistically significant?
Compare each t-value against a t-distribution with n − p degrees of freedom –> a p-value below your threshold (e.g. 0.05) means the coefficient is significantly different from zero.
Generally speaking in statistics, what is a T-Test?
A test of whether two means differ by more than you would expect from the variability in the data –> in linear models, it tests whether an estimated coefficient differs significantly from zero.
Why are T-Tests not useful for comparing more than 2 levels?
Each pairwise comparison carries its own chance of a false positive, so running many t-tests inflates the overall (familywise) Type I error rate.
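A numeric sketch (standard probability, not from the deck) of how the familywise false-positive rate grows with the number of independent pairwise t-tests:

```python
# With k independent comparisons at alpha = 0.05, the chance of at least
# one false positive is 1 - (1 - alpha)^k.
alpha = 0.05
for k in [1, 3, 6]:  # e.g. 6 pairwise tests for a 4-level factor
    familywise = 1 - (1 - alpha) ** k
    print(k, round(familywise, 3))
```

Even at 3 comparisons the familywise rate is already roughly 0.14, well above the nominal 0.05, which is why an F-test/ANOVA is preferred for factors with many levels.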
Formula to calculate the t-value?
t = estimated coefficient / standard error (SE) of that estimate