LEC 11a Simple Linear Regression Flashcards
How to decide on appropriate statistical test for regression?
Depends on the type of dependent variable
- Continuous variable: linear regression
- Ordinal variable: ordinal regression
- Nominal variable: logistic regression
Simple vs Multiple regression
Applies to continuous, ordinal and nominal variables
Simple regression
- only 1 independent variable
Multiple regression
- more than 1 independent variable
Correlation vs Simple linear regression (2)
- definition
- symmetry
Correlation
- quantifies the degree to which 2 random variables are related, provided that the relationship is linear
- makes no distinction between the 2 variables (symmetrical)
Simple linear regression
- determines the best-fitting straight line for a dataset to investigate the change in 1 variable (dependent variable Y) that corresponds to a given change in the other variable (independent variable X), provided that there is significant correlation
- X and Y are asymmetrical
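The symmetry difference can be checked numerically. The following is a minimal sketch on made-up data (the values and the helper names `pearson_r` and `slope` are illustrative assumptions, not part of the lecture): swapping X and Y leaves the correlation unchanged but gives a different regression slope.

```python
from math import sqrt

# Made-up illustrative data (not from the lecture)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def mean(v):
    return sum(v) / len(v)

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    s_ab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    s_aa = sum((ai - ma) ** 2 for ai in a)
    s_bb = sum((bi - mb) ** 2 for bi in b)
    return s_ab / sqrt(s_aa * s_bb)

def slope(pred, resp):
    """Least-squares slope when regressing resp on pred."""
    mp, mr = mean(pred), mean(resp)
    s_pr = sum((p - mp) * (r - mr) for p, r in zip(pred, resp))
    s_pp = sum((p - mp) ** 2 for p in pred)
    return s_pr / s_pp

r_xy = pearson_r(x, y)        # correlation of X with Y
r_yx = pearson_r(y, x)        # same value: correlation is symmetrical
slope_y_on_x = slope(x, y)    # Y as the dependent variable
slope_x_on_y = slope(y, x)    # X as the dependent variable: a different line

print(r_xy, r_yx, slope_y_on_x, slope_x_on_y)
```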
Applications of simple linear regression (2)
1. Describe the linear relationship between the 2 variables
2. Predict or estimate the value of Y associated with a fixed value of X
Can values of Y be extrapolated beyond the observed range of X?
No. The relationship between X and Y may not be linear outside the observed range
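One way to enforce this in practice is to refuse predictions outside the observed X range. This is only a sketch: the function name `predict_within_range` and the coefficient values are hypothetical, chosen for illustration.

```python
def predict_within_range(alpha, beta, x_observed, x_new):
    """Predict Y = alpha + beta * x_new, but only inside the observed X range."""
    lo, hi = min(x_observed), max(x_observed)
    if not lo <= x_new <= hi:
        # Outside the data we have no evidence that the relationship stays linear
        raise ValueError(f"x = {x_new} is outside the observed range [{lo}, {hi}]")
    return alpha + beta * x_new

# Hypothetical fitted coefficients and observed X values
x_observed = [1, 2, 3, 4, 5]
inside = predict_within_range(2.2, 0.6, x_observed, 4)   # allowed
print(inside)
```

Calling the function with a value such as `x_new = 10` raises a `ValueError` instead of returning an extrapolated estimate.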
Simple linear regression model
Y = alpha + beta(X)
alpha = y-intercept
beta = slope
Alpha meaning
Mean value of Y when X=0
Beta meaning
The change in the mean value of Y that corresponds to a one-unit change in X
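The meanings of alpha and beta can be read directly off the model equation. A small sketch with hypothetical coefficient values (2.2 and 0.6 are assumptions for illustration only):

```python
# Hypothetical coefficients for the model Y = alpha + beta * X
alpha, beta = 2.2, 0.6

def mean_y(x_value):
    """Mean value of Y predicted by the line at a given X."""
    return alpha + beta * x_value

at_zero = mean_y(0)                       # equals alpha: mean of Y when X = 0
one_unit_change = mean_y(4) - mean_y(3)   # equals beta: change in mean Y per unit of X

print(at_zero, one_unit_change)
```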
Does linear regression test for linear relationship between the 2 variables?
No.
- it assumes linear relationship
- finds the best-fitting straight line with the y-intercept and slope
Hence, always draw a scatter plot first to check whether the relationship looks linear
Linear relationship : linear regression
Non-linear relationship : non-linear regression
Scatter plot to determine use of linear regression
Scatter plot must suggest :
- Linear relationship
- Significant correlation
Assumptions of simple linear regression model (4)
- There is linear relationship between the variables
- The observations are independent of one another
- For any specified value of X, the distribution of the Y values is normal
- For every value of X, the variance of the Y values is constant (equal variance)
How to determine the best-fitting straight line?
Method of Least Squares
- best-fitting line = line with the smallest residual sum of squares
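The least-squares estimates have a closed form: slope = S_xy / S_xx and intercept = mean(Y) - slope * mean(X). These are the standard textbook formulas, not taken from the flashcards. A minimal sketch on made-up data, which also shows that a slightly different line gives a larger residual sum of squares:

```python
# Made-up illustrative data (not from the lecture)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def least_squares_fit(x, y):
    """Return (alpha, beta) minimising the residual sum of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta = s_xy / s_xx              # slope
    alpha = y_bar - beta * x_bar    # y-intercept
    return alpha, beta

def rss(alpha, beta, x, y):
    """Residual sum of squares for the line Y = alpha + beta * X."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))

alpha, beta = least_squares_fit(x, y)
best = rss(alpha, beta, x, y)
# A line with a slightly different slope fits worse (larger RSS)
worse = rss(alpha, beta + 0.1, x, y)

print(alpha, beta, best, worse)
```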
Residual plot
Residuals plotted against the fitted (predicted) Y values
If the assumptions hold, the residuals are randomly scattered above and below e_i = 0
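The quantities behind a residual plot can be computed as below. This is a sketch on made-up data, assuming the residuals are taken against the fitted values of a least-squares line; it lists the points as text rather than drawing the plot.

```python
# Made-up illustrative data (not from the lecture)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least-squares fit (standard closed-form estimates)
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta = s_xy / s_xx
alpha = y_bar - beta * x_bar

fitted = [alpha + beta * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # e_i = observed - fitted

# Each (fitted, residual) pair is one point on the residual plot
for fi, ei in zip(fitted, residuals):
    side = "above" if ei > 0 else "below"
    print(f"fitted={fi:.1f}  residual={ei:+.1f}  ({side} e_i = 0)")

n_above = sum(1 for ei in residuals if ei > 0)
n_below = sum(1 for ei in residuals if ei < 0)
```

A visible pattern in these points, such as a curve or a funnel shape, would suggest that the linearity or equal-variance assumption is violated.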
Test statistics for beta (slope)
Ho & H1
Two-tailed test
Ho :
- there is no effect of the independent variable X on the dependent variable Y
- beta = 0
- equivalent to testing correlation = 0
H1 :
- there is an effect of the independent variable X on the dependent variable Y
- beta ≠ 0
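The slope test is usually carried out as a t-test with n - 2 degrees of freedom, using t = b / SE(b). This is the standard textbook formula and is not stated in the flashcards. A sketch on made-up data follows; the critical value 3.182 is the tabulated two-tailed 5% value for 3 degrees of freedom.

```python
from math import sqrt

# Made-up illustrative data (not from the lecture)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta = s_xy / s_xx                 # estimated slope
alpha = y_bar - beta * x_bar       # estimated intercept

# Residual variance estimate with n - 2 degrees of freedom
sse = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)

se_beta = sqrt(mse / s_xx)         # standard error of the slope
t_stat = beta / se_beta            # test statistic for H0: beta = 0

# Two-tailed 5% critical value for t with 3 degrees of freedom (from a t table)
T_CRIT = 3.182
reject_h0 = abs(t_stat) > T_CRIT

print(t_stat, reject_h0)
```

With only five made-up points, H0 is not rejected here even though the fitted slope is positive.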
Test statistics for alpha (constant)
Seldom done because the intercept is usually not of interest
Two-tailed test
Ho :
- alpha = 0
H1 :
- alpha ≠ 0
Evaluation of the goodness of fit of the regression model
Coefficient of determination (R^2)
In simple linear regression, R^2 = r^2, where r = Pearson product-moment correlation coefficient
R^2
- meaning
- range
- the proportion of the variability among the observed values of Y that is explained by the linear regression of Y on X
- range : 0 ≤ R^2 ≤ 1
R^2 = 1
All data points lie exactly on the best-fitting line
R^2 = 0
There is no linear relationship between X and Y
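Both the "proportion of variability explained" reading and the identity R^2 = r^2 can be checked numerically. A minimal sketch on made-up data, computing R^2 as 1 - SS(residual) / SS(total):

```python
from math import sqrt

# Made-up illustrative data (not from the lecture)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)      # total variability in Y

beta = s_xy / s_xx
alpha = y_bar - beta * x_bar
ss_residual = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - ss_residual / s_yy             # proportion of variability explained
r = s_xy / sqrt(s_xx * s_yy)                   # Pearson correlation coefficient

print(r_squared, r ** 2)
```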
R^2
Coefficient of determination
Significance level for constant value in statistical report
Not important
Significance level for ANOVA report (2)
- p-value for overall significance of regression model
- in simple linear regression, equal to the p-value for the slope in the Coefficients report
Sum of squares (regression) in ANOVA report
- variability in Y that is explained by the regression model
Sum of squares (residual) in ANOVA report
- variability in Y that is unexplained by the regression model
Sum of squares (total) in ANOVA report
- total variability in Y
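The three sums of squares are linked by SS(total) = SS(regression) + SS(residual). The sketch below checks this on made-up data and forms the ANOVA F statistic with 1 and n - 2 degrees of freedom, which is the standard layout for simple linear regression rather than something stated in the flashcards.

```python
# Made-up illustrative data (not from the lecture)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta = s_xy / s_xx
alpha = y_bar - beta * x_bar
fitted = [alpha + beta * xi for xi in x]

ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)            # explained
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))     # unexplained
ss_total = sum((yi - y_bar) ** 2 for yi in y)                      # total

# Mean squares: 1 df for the regression, n - 2 df for the residual
ms_regression = ss_regression / 1
ms_residual = ss_residual / (n - 2)
f_stat = ms_regression / ms_residual

print(ss_regression, ss_residual, ss_total, f_stat)
```

In simple linear regression F equals the square of the slope's t statistic, which is why the ANOVA report and the Coefficients report show the same p-value.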
Steps to analyse linear regression (4)
1. Check if assumptions of linear regression are fulfilled
  - independent observations
  - for each set of X values, there is equal variance
  - for each set of X values, the distribution of Y values is normal
  - linear relationship between both variables
2. Scatter plot to determine linear relationship
3. Correlation
  - Pearson Product Moment Correlation
  - Spearman Rank Correlation
  - must show significant correlation (proceed to step 4)
4. Conduct linear regression analysis