Normal Linear Models Flashcards
Linear Regression
y
- dependent/response variable
- assume y is normally distributed for linear regression
Linear Regression
x
x = (x1,…,xp)
- where p is the number of covariates
- independent variables / covariates / predictor variables
Linear Regression
Model
y = α + Σ βj xj + ε
-sum from j = 1 to j = p, where p is the number of predictor variables, and ε is a normal error term
Linear Regression
Residual Sum of Squares
R = Σ (yi − μ̂i)²
- sum from i = 1 to n, where n is the number of observations
- and μ̂i is the fitted value for yi
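The formula above translates directly into a small Python sketch (the observations and fitted values below are made-up toy numbers):

```python
def rss(y, fitted):
    """Residual sum of squares: R = sum over i of (y_i - mu_hat_i)^2."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, fitted))

# Hypothetical observations and fitted values from some model:
y = [2.0, 3.1, 4.9, 6.2]
mu_hat = [2.1, 3.0, 5.0, 6.0]
R = rss(y, mu_hat)  # 0.01 + 0.01 + 0.01 + 0.04 = 0.07
```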
Linear Regression
Compare Models With F-Statistic
-compare model 0 with model 1
-null hypothesis: the simpler model 0 is adequate
-alternative: model 1 fits significantly better
F01 = [(R0 − R1)/(r0 − r1)] / [R1/r1]
-where:
r = residual degrees of freedom
R = residual sum of squares
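The F-statistic on this card is a one-liner in code; the numbers below are illustrative values for nested models 0 (simpler) and 1:

```python
def f_statistic(R0, R1, r0, r1):
    """F01 = [(R0 - R1) / (r0 - r1)] / [R1 / r1] for nested models,
    where R = residual sum of squares, r = residual degrees of freedom."""
    return ((R0 - R1) / (r0 - r1)) / (R1 / r1)

# Made-up values: model 1 has 2 extra parameters and a smaller RSS.
F = f_statistic(R0=100.0, R1=80.0, r0=18, r1=16)  # (20/2) / (80/16) = 2.0
```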
Logistic Function
logistic(x) = 1 / (1 + exp(-x))
Logit Function
-inverse of the logistic function
logit(q) = log(q / (1 − q))
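A quick sketch of both functions, checking numerically that they invert each other:

```python
import math

def logistic(x):
    """logistic(x) = 1 / (1 + exp(-x)), maps the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(q):
    """logit(q) = log(q / (1 - q)), the inverse of the logistic function."""
    return math.log(q / (1.0 - q))

# The two functions undo each other:
assert abs(logit(logistic(0.7)) - 0.7) < 1e-12
assert abs(logistic(logit(0.25)) - 0.25) < 1e-12
```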
Types of Variable
- quantitative
- qualitative
Types of Variable
Quantitative
- continuous
- count
Types of Variable
Qualitative
- un-ordered categorical
- - dichotomous (two categories)
- - polytomous (more than two categories)
- ordered categorical
Types of Normal Linear Model
Quantitative Explanatory Variable, p=1
-simple linear regression
y = α + βx1 + ε
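For simple linear regression the least-squares estimates have a closed form (β̂ = Sxy/Sxx, α̂ = ȳ − β̂x̄); a sketch with made-up data that lies exactly on a line:

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = alpha + beta * x + eps.
    beta_hat = Sxy / Sxx, alpha_hat = ybar - beta_hat * xbar."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta = sxy / sxx
    return ybar - beta * xbar, beta

# Data on the exact line y = 1 + 2x is recovered exactly:
alpha, beta = simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
```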
Types of Normal Linear Model
Quantitative Explanatory Variable, p>1
-multiple linear regression
y = α + Σ βjxj + ε
-sum from j=1 to j=p
Types of Normal Linear Model
Dichotomous Explanatory Variable, p=1
-two sample t-test
-dichotomous: x=1 or 2
y = α + γ I(x=2) + ε
-where, I(x=j) = { 1 if x=j, or 0 else}
Types of Normal Linear Model
Polytomous Explanatory Variable, p=1
-one-way anova
-polytomous: x=1,…,k
y = α + Σ δj I(x=j) + ε
-sum from j=2 to j=k (category 1 is the baseline, as in the dichotomous case)
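With category 1 as the baseline, fitting this one-way ANOVA model reduces to group means: α̂ is the mean of group 1 and each δ̂j is group j's mean minus α̂. A sketch with hypothetical groups (the function name is my own):

```python
from collections import defaultdict

def one_way_anova_fit(x, y):
    """Fit y = alpha + sum_j delta_j I(x=j) with group 1 as baseline.
    alpha_hat = mean of group 1; delta_j_hat = mean(group j) - alpha_hat."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    alpha = means[1]
    deltas = {g: m - alpha for g, m in means.items() if g != 1}
    return alpha, deltas

# Toy data: groups 1, 2, 3 with means 2, 5, 8.
alpha, deltas = one_way_anova_fit([1, 1, 2, 2, 3, 3], [1, 3, 4, 6, 7, 9])
```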
Matrix Representations of Normal Linear Models
Y = XB + E
-where Y is an n×1 vector of observations, X is an n×p ‘design matrix’, B is a p×1 vector of parameters and E is an n×1 vector of errors
Constructing the Design Matrix
1) first column is a vector of 1s (intercept)
2) for each explanatory variable:
- -if quantitative: add xi as a column
- -if qualitative (with k categories): add k dummy columns taking values 0 or 1, then remove one of these columns (to avoid collinearity with the intercept)
3) for interaction terms e.g. for x1*x2, add a column of x1 values multiplied by x2 values
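The construction steps above can be sketched as a small helper (the function name and the 'quant'/'qual' labels are my own; interaction columns, step 3, are omitted for brevity):

```python
def design_matrix(columns, kinds):
    """Build the design matrix X.
    columns: raw variable columns (lists of equal length n).
    kinds: 'quant' or 'qual' for each column.
    Qualitative variables are dummy-coded with the first level dropped."""
    n = len(columns[0])
    X = [[1.0] * n]                      # 1) intercept column of 1s
    for col, kind in zip(columns, kinds):
        if kind == "quant":              # 2) quantitative: use as-is
            X.append([float(v) for v in col])
        else:                            #    qualitative: k dummies, drop one
            levels = sorted(set(col))
            for lev in levels[1:]:
                X.append([1.0 if v == lev else 0.0 for v in col])
    # transpose from a list of columns to an n x p matrix
    return [list(row) for row in zip(*X)]

# One quantitative and one two-level qualitative variable:
X = design_matrix([[1.2, 0.5, 3.3], ["a", "b", "a"]], ["quant", "qual"])
```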
Notation for Models
~
~ = modelled by / regressed by
Maximum Likelihood Estimation
- the likelihood is equal to the probability density function, f(y)
- take ln to get log-likelihood
- differentiate with respect to parameter and set equal to zero
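A worked instance of these steps for Poisson counts: differentiating the log-likelihood and setting it to zero gives λ̂ = mean(y), which the sketch below checks numerically on toy data:

```python
import math

def poisson_loglik(lam, y):
    """Log-likelihood of iid Poisson data:
    sum over i of [y_i * log(lam) - lam - log(y_i!)]."""
    return sum(yi * math.log(lam) - lam - math.lgamma(yi + 1) for yi in y)

y = [2, 3, 1, 4, 2]          # toy counts
lam_hat = sum(y) / len(y)    # d/d(lam) log L = 0  =>  lam_hat = mean(y)

# The log-likelihood really is maximised at lam_hat:
assert poisson_loglik(lam_hat, y) > poisson_loglik(lam_hat - 0.1, y)
assert poisson_loglik(lam_hat, y) > poisson_loglik(lam_hat + 0.1, y)
```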
Normal Distribution
Probability Density Function
f(y) = 1/(σ√(2π)) * exp{-1/2 * [(y-μ)/σ]²}
Poisson Distribution
Probability Mass Function
f(y) = λ^y * exp(-λ) / y!
Binomial Distribution
Probability Mass Function
f(y) = mCy * p^y * [1-p]^(m-y)
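As a sanity check on the two mass functions above (with toy parameter values), each should sum to 1 over its support:

```python
import math

def poisson_pmf(y, lam):
    """f(y) = lam^y * exp(-lam) / y!"""
    return lam ** y * math.exp(-lam) / math.factorial(y)

def binomial_pmf(y, m, p):
    """f(y) = C(m, y) * p^y * (1 - p)^(m - y)"""
    return math.comb(m, y) * p ** y * (1 - p) ** (m - y)

# Binomial support is finite; the Poisson tail past 60 is negligible here.
assert abs(sum(binomial_pmf(y, 10, 0.3) for y in range(11)) - 1.0) < 1e-12
assert abs(sum(poisson_pmf(y, 2.5) for y in range(60)) - 1.0) < 1e-12
```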
What is the purpose of a generalised linear model?
-to model the dependence of a dependent variable, y, on a set of p explanatory variables, x=(x1,…,xp), where, conditionally on x, the observation y has a distribution that is not necessarily normal
Normal Linear Model
Definition
-a model that assumes the distribution of the dependent variable is Gaussian