Simple Linear Correlation and Regression Flashcards
Regression Analysis
- Regression Analysis is a way of estimating the relationship between different variables by examining the behavior of the system
- There are many techniques for modeling and analyzing the dependent and independent variables
- You are basically trying to derive an equation from the graph of your data.
Linear Regression Analysis
The easiest kind of regression is linear regression. Imagine that all of your data lined up in a neat row. You could draw a straight line connecting all points and would be able to create a simple equation Y = mx + b that we talked about earlier. That way you would have a model that would faithfully predict what your system would do given any input of x.
But what if your data only “kinda-sorta” looks like a line?
Multiple linear regression extends the methodology of simple linear regression to more than one independent variable.
Linear Regression
- Statistical technique to estimate the mathematical relationship between a dependent variable (usually denoted as Y) and an independent variable (usually denoted as X).
- In other words, predict the change in the dependent variable according to the change in the independent variable.
- Dependent Variable (or Criterion Variable) - the variable for which we wish to make a prediction
- Independent Variable (or Predictor Variable) - the variable used to explain the dependent variable
When to Use Linear Regression
- In simple linear regression, there is only one independent variable used to predict a single dependent variable.
- In multiple linear regression, more than one independent variable is used to predict a single dependent variable.
- The basic difference between simple and multiple regression is in terms of explanatory variables.
- E.g. compare the crop yield rate against the rainfall rate in a season (a simple linear regression, with one independent variable)
Notes about Linear Regression
- The first step of linear regression is to test the linearity assumption. Plot the values in a scatter plot to observe the relationship between the dependent and independent variables; if the points follow a clearly non-linear pattern (for example, an exponential curve), a straight-line regression equation is not meaningful.
- Draw the line that passes as close as possible to the majority of the points
- This line is called the best fit line, or line of best fit
- The mathematical equation of the line is
- y=a+bx+ε
- Where:
- b – Slope of the line
- a – y intercept when x=0
- ε (epsilon) – Random error: the difference between an observed value of y and the mean value of y for a given value of x.
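To make the roles of a, b and ε concrete, here is a minimal sketch that evaluates the line for a few points and reports the leftover error at each one. The data and the values of a and b are hypothetical and are not taken from these notes.

```python
# Hypothetical observations (not from the source), for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def predict(a, b, x):
    """Return the fitted value a + b*x for one input x."""
    return a + b * x

def residuals(a, b, xs, ys):
    """Return observed y minus fitted y for each point (the estimated errors)."""
    return [y - predict(a, b, x) for x, y in zip(xs, ys)]

# Assumed intercept and slope for this sketch.
a, b = 2.2, 0.6
errors = residuals(a, b, xs, ys)

for x, y, e in zip(xs, ys, errors):
    print(f"x={x}, observed y={y}, fitted y={predict(a, b, x):.1f}, error={e:+.1f}")
```

Each printed error is the estimate of ε for that observation: the part of y that the straight line does not explain.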
Assumptions of Linear Regression
- Linear relationship between dependent and independent variable
- All variables in the regression should be multivariate normal
- There is little or no multicollinearity in the data
- The response variable is continuous, and the residuals have roughly the same spread along the whole regression line (homoscedasticity)
The Method of Least Squares
- The method of least squares is a standard approach in regression analysis for determining the best line for a given data set
- It summarizes the relationship between the given data points as a line drawn through them
- In general, the dependent variable is plotted on the y-axis
- The independent variable is plotted on the x-axis
- The least squares method determines the position of a straight line, also called a trend line, and the equation of that line.
- This straight line is also known as the best fit line
The least squares method means that the overall solution minimizes the sum of the squares of the errors made in the results of every single equation. For instance, the least squares equations can be used to find the values of the coefficients a and b.
If the errors are approximately normally distributed, the normal rules of standard deviation apply here: about 68% of the points should fall within +/- 1 standard error of the line, and about 95.5% within +/- 2 standard errors.
Least Squares
a and b computed
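The formulas themselves are not reproduced in these notes, so the following is a minimal sketch of the usual closed-form least squares solution for one independent variable: b = Sxy / Sxx and a = ȳ - b·x̄. The data and the function name `least_squares_fit` are hypothetical.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimizes the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy: sum of products of deviations; Sxx: sum of squared deviations of x.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical observations (not from the source), for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_fit(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")
```

The intercept line in the function is the same relation used in the DMAIC example, where a is obtained from the means of y and x once b is known.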
Linear Regression example in DMAIC
- Linear Regression is specifically used in the Analyze phase of DMAIC to estimate the mathematical relationship between a dependent variable and an independent variable.
Example: A passenger vehicle manufacturer is reviewing the training records of 10 salespersons. The main aim is to compare each salesperson's achieved target (in %) with the number of sales training modules completed.
- â = ȳ - b̂x̄
- where ȳ = mean of sales target achieved = total / 10 = 822 / 10 = 82.2
Estimate the Variability of Random Errors
Example
σ̂_e = square root of σ̂²_e
E.g. σ̂²_e = 28.95
σ̂_e = √28.95 = 5.38
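The notes do not show how the error variance itself is obtained. A common estimate, assumed here, is the sum of squared errors divided by n - 2. The sketch below applies it to hypothetical data with an assumed fitted line, then repeats the square-root step from the example above.

```python
import math

def standard_error_of_estimate(xs, ys, a, b):
    """Return the estimated standard deviation of the random errors around y = a + b*x."""
    n = len(xs)
    # Sum of squared differences between observed and fitted y values.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    variance = sse / (n - 2)   # two parameters (a and b) were estimated
    return math.sqrt(variance)

# Hypothetical observations and an assumed fitted line (not from the source).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
sigma_e = standard_error_of_estimate(xs, ys, a=2.2, b=0.6)
print(f"standard error of estimate = {sigma_e:.3f}")

# The arithmetic from the example in the notes: variance 28.95.
example_sigma = round(math.sqrt(28.95), 2)
print(example_sigma)
```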
Test of Slope Coefficient
- The existence of a significant relationship between the dependent and independent variables can be tested by checking whether b is equal to 0. If b is significantly different from 0, there is a linear relationship.
- The null and alternative hypotheses are:
- The null hypothesis H0 : b=0
- The alternative hypothesis H1: b≠0
- Degrees of freedom = n-2
Example
Look up the critical value in a t-table critical value chart, or refer to Appendix Q, Values of the t-Distribution, located in the Handbook, 2nd Edition, green tab.
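The worked numbers for this example are not reproduced in these notes, so here is a minimal sketch of the test on hypothetical data. It assumes the usual test statistic t = b / SE(b), with SE(b) equal to the standard error of estimate divided by the square root of Sxx. The critical value is hard-coded from a standard t-table (two-tailed, alpha = 0.05, 3 degrees of freedom).

```python
import math

# Hypothetical observations (not from the source), for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def slope_t_test(xs, ys):
    """Return (t statistic for H0: b = 0, degrees of freedom)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx                      # least squares slope
    a = y_bar - b * x_bar                # least squares intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2
    se_b = math.sqrt(sse / df) / math.sqrt(s_xx)   # standard error of the slope
    return b / se_b, df

t_stat, df = slope_t_test(xs, ys)

# Two-tailed critical value for alpha = 0.05 and df = 3, read from a t-table.
T_CRITICAL = 3.182

# Reject H0 (b = 0) only when |t| exceeds the critical value.
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

With only five hypothetical points, the slope is not significantly different from 0 at this level, even though the fitted slope is positive.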
Confidence Interval Estimate for the Slope b
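Only the heading survives in these notes. A common form of the interval, assumed here, is b ± t × SE(b), where t is the two-tailed critical value for n - 2 degrees of freedom. The inputs below are hypothetical values chosen for illustration.

```python
def slope_confidence_interval(b, se_b, t_critical):
    """Return (lower, upper) bounds of the interval b +/- t_critical * se_b."""
    margin = t_critical * se_b
    return b - margin, b + margin

# Assumed inputs (not from the source): slope, its standard error,
# and the t-table value for a 95% interval with 3 degrees of freedom.
b = 0.6
se_b = 0.2828
t_critical = 3.182

low, high = slope_confidence_interval(b, se_b, t_critical)
print(f"95% confidence interval for b: ({low:.3f}, {high:.3f})")

# If the interval contains 0, the data do not rule out b = 0.
contains_zero = low <= 0 <= high
print(contains_zero)
```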
Correlation Coefficient
Notes and Formula
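The formula is not reproduced in these notes. A minimal sketch, assuming the usual Pearson correlation coefficient r = Sxy / sqrt(Sxx × Syy), on hypothetical data:

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical observations (not from the source), for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(f"r = {r:.4f}, r squared = {r * r:.2f}")
```

r ranges from -1 to +1. In simple linear regression, r squared is the share of the variation in y that the fitted line explains.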