LEC 11a Simple Linear Regression Flashcards
How to decide on the appropriate type of regression?
Depends on the type of dependent variable
- Continuous variable: linear regression
- Ordinal variable: ordinal regression
- Nominal variable: logistic regression
Simple vs Multiple regression
The simple/multiple distinction applies to continuous, ordinal and nominal dependent variables alike
Simple regression
- only 1 independent variable
Multiple regression
- more than 1 independent variable
Correlation vs Simple linear regression (2)
- definition
- symmetry
Correlation
- quantifies the degree to which 2 random variables are related, provided that the relationship is linear
- makes no distinction between the 2 variables (symmetrical)
Simple linear regression
- determines the best-fitting straight line for a dataset to investigate the change in 1 variable (dependent variable Y) that corresponds to a given change in the other variable (independent variable X), provided that there is significant correlation
- treats X and Y asymmetrically: regressing Y on X gives a different line from regressing X on Y
Applications of simple linear regression (2)
- Describe the linear relationship between the 2 variables
- Predict or estimate the value of Y associated with a fixed value of X
Can values of Y be extrapolated beyond the observed range of X?
No. The relationship between X and Y may not remain linear outside the observed range of X, so predictions should stay within it
Simple linear regression model
Y = alpha + beta(X)
alpha = y-intercept
beta = slope
Alpha meaning
Mean value of Y when X=0
Beta meaning
The change in the mean value of Y that corresponds to a one-unit change in X
Does linear regression test whether there is a linear relationship between the 2 variables?
No.
- it assumes a linear relationship
- it simply finds the best-fitting straight line (y-intercept and slope), whatever the true shape of the data
Hence, always draw a scatter plot first to check whether there is a linear relationship
Linear relationship: linear regression
Non-linear relationship: non-linear regression
Scatter plot to determine use of linear regression
Scatter plot must suggest:
- a linear relationship
- a significant correlation
Assumptions of simple linear regression model (4)
- There is a linear relationship between X and Y
- The observations are independent of one another
- For any specified value of X, the Y values are normally distributed
- For every value of X, the variance of Y is the same (constant or equal variance)
How to determine the best-fitting straight line?
Method of Least Squares
- best-fitting line = the line with the smallest residual sum of squares (sum of squared vertical distances between the observed Y values and the line)
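The least-squares line has a standard closed-form solution, b = Sxy / Sxx and a = ȳ - b·x̄, where Sxy and Sxx are the sums of cross-products and squares of deviations from the means. The sketch below (plain Python, made-up data, illustrative function names) computes it and checks that nudging the intercept or slope only increases the residual sum of squares.

```python
def least_squares(x, y):
    """Return (a, b) for the fitted line y_hat = a + b*x.

    Closed-form least-squares estimates: b = Sxy / Sxx, a = ybar - b * xbar.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def residual_sum_of_squares(x, y, a, b):
    """Sum of squared vertical distances between observed y and the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares(x, y)                  # a = 2.2, b = 0.6 for this data
best = residual_sum_of_squares(x, y, a, b)  # about 2.4

# Any other line (here: small nudges to a or b) has a larger residual sum of squares
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert residual_sum_of_squares(x, y, a + da, b + db) > best
```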
Residual plot
Residuals plotted against the fitted (predicted) Y values
If the model is appropriate, the residuals are randomly scattered above and below ei = 0, with no visible pattern
Test statistics for beta (slope)
Ho & H1
Two-tailed test
Ho :
- there is no effect of the independent variable X on the dependent variable Y
- beta = 0
- equivalent to testing correlation = 0
H1 :
- there is an effect of the independent variable X on the dependent variable Y
- beta ≠ 0
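A sketch of the usual test statistic, using the standard textbook formulas: t = b / SE(b) with SE(b) = s / √Sxx, s² = RSS / (n - 2), on n - 2 degrees of freedom. It also computes the correlation test statistic t = r√(n - 2) / √(1 - r²) to show numerically that the two tests agree. The function name and data are illustrative.

```python
import math


def slope_t_test(x, y):
    """Return (t_slope, t_corr, df) for H0: beta = 0 in simple linear regression.

    t_slope = b / SE(b), with SE(b) = s / sqrt(Sxx) and s^2 = RSS / (n - 2).
    t_corr  = r * sqrt(n - 2) / sqrt(1 - r^2), the usual test of H0: correlation = 0.
    Both are undefined for a perfect fit (r = +1 or -1).
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

    b = sxy / sxx
    a = ybar - b * xbar
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))      # residual standard error
    t_slope = b / (s / math.sqrt(sxx))

    r = sxy / math.sqrt(sxx * syy)    # Pearson correlation
    t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t_slope, t_corr, n - 2


# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t_slope, t_corr, df = slope_t_test(x, y)
print(round(t_slope, 4), round(t_corr, 4), df)  # 2.1213 2.1213 3
```

For this made-up data, |t| = 2.12 is below the two-tailed 5% critical value t(0.025, 3) = 3.182, so H0: beta = 0 would not be rejected.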
Test statistics for alpha (constant)
Seldom done, as the intercept is usually of little practical interest
Two-tailed test
Ho :
- alpha = 0
H1 :
- alpha ≠ 0
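The intercept test uses the same t form with a different standard error. A sketch using the standard formula SE(a) = s·√(1/n + x̄²/Sxx), again on n - 2 degrees of freedom; the function name and data are illustrative.

```python
import math


def intercept_t_test(x, y):
    """Return (t, df) for H0: alpha = 0.

    t = a / SE(a), with SE(a) = s * sqrt(1/n + xbar^2 / Sxx) and s^2 = RSS / (n - 2).
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))                       # residual standard error
    se_a = s * math.sqrt(1 / n + xbar ** 2 / sxx)      # standard error of the intercept
    return a / se_a, n - 2


# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t_alpha, df = intercept_t_test(x, y)
print(round(t_alpha, 4), df)
```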