Chapter 16: Simple Linear Regression And Correlation Flashcards
Regression analysis
A technique used to predict the value of one variable on the basis of other variables
Requires developing an equation that describes the relationship between the variable to be forecast (the dependent variable) and the variables the practitioner believes are related to it (the independent variables)
Correlation analysis
Technique used to determine if a relationship exists between two variables
Deterministic models
Equations that allow us to determine the value of the dependent variable exactly from the values of the independent variable(s)
Probabilistic model
Models that include a method to represent the randomness of real-life processes
Starts with a deterministic model and then adds a term to measure the random error of the deterministic component
Error variable
Represented by epsilon
The deviation between an actual data point and the value estimated by the model
Accounts for all variables (measurable and immeasurable) that are not part of the model
First-order linear model
Aka simple linear regression model
Aka straight-line model
Includes only one independent variable
Y=B0 + B1x + e
y = dependent variable; x = independent variable; B0 = y-intercept; B1 = slope of the line (rise/run); e = error variable
(So y=Mx+B + error variable)
X and y must both be interval data
Coefficients B0 and B1 are population parameters (almost always unknown, so must estimate)
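A minimal sketch of the first-order linear model, assuming Python with NumPy; the parameter values, sample size, and x range are made up for illustration:
```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population parameters (illustrative only)
B0, B1 = 5.0, 2.0       # y-intercept and slope
sigma_e = 3.0           # standard deviation of the error variable

x = rng.uniform(0, 10, size=50)        # independent variable (interval data)
e = rng.normal(0, sigma_e, size=50)    # error variable epsilon
y = B0 + B1 * x + e                    # probabilistic model: deterministic part + random error
```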
Least squares line coefficients
For y-hat = b0 + b1x
b1= sample covariance of x and y / sample variance of x
b0= sample mean of y - (b1* sample mean of x)
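A sketch of these coefficient formulas in Python with NumPy; the helper name and the sample data are hypothetical:
```python
import numpy as np

def least_squares_coefficients(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    s_xy = np.cov(x, y)[0, 1]          # sample covariance of x and y (divides by n-1)
    s_xx = np.var(x, ddof=1)           # sample variance of x (divides by n-1)
    b1 = s_xy / s_xx                   # slope = covariance / variance
    b0 = y.mean() - b1 * x.mean()      # intercept = mean of y - b1 * mean of x
    return b0, b1

b0, b1 = least_squares_coefficients([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
```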
Sample variance
s^2 = (sum of (x - mean of x)^2) / (n - 1)
Shortcut: (1/(n-1)) * (sum of all x^2 - (sum of all x)^2 / n)
Excel: VAR function
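A quick check of the shortcut formula against NumPy's built-in sample variance (the data values are made up):
```python
import numpy as np

x = np.array([3.0, 7.0, 8.0, 12.0, 15.0])   # made-up sample
n = len(x)

# Shortcut: s^2 = (1/(n-1)) * (sum of x^2 - (sum of x)^2 / n)
s2_shortcut = (np.sum(x**2) - np.sum(x)**2 / n) / (n - 1)

# Built-in equivalent of Excel's VAR (sample variance, divides by n-1)
s2_builtin = np.var(x, ddof=1)
```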
Sample covariance
Sxy = (sum of (x - mean of x)(y - mean of y)) / (n - 1)
Shortcut: (1/(n-1)) * (sum of all xy - (sum of all x)(sum of all y) / n)
Excel: COVAR function
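Similarly, a sketch of the covariance shortcut checked against NumPy (sample data is hypothetical):
```python
import numpy as np

x = np.array([3.0, 7.0, 8.0, 12.0, 15.0])   # made-up sample
y = np.array([9.0, 20.0, 22.0, 35.0, 44.0])
n = len(x)

# Shortcut: s_xy = (1/(n-1)) * (sum of xy - (sum of x)(sum of y) / n)
sxy_shortcut = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / (n - 1)

# np.cov divides by n-1, matching the sample formula above
# (Excel's legacy COVAR divides by n; COVARIANCE.S matches n-1)
sxy_builtin = np.cov(x, y)[0, 1]
```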
Least squares method
Produces a straight line that minimizes the sum of the squared differences between the actual points and the line
Residuals
The deviations between the actual data points and the least squares line (ei)
ei= y(actual) - y-hat (calculated)
Observations of the error variable
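A small sketch computing residuals; the data and the fitted coefficients below are hypothetical:
```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])      # made-up data
y = np.array([2.1, 3.9, 6.2, 7.8])

b0, b1 = 0.15, 1.92                     # hypothetical fitted coefficients
y_hat = b0 + b1 * x                     # values on the least squares line
residuals = y - y_hat                   # e_i = y (actual) - y-hat (calculated)
```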
Sum of squares for error
Minimized sum of squared deviations between observed y and calculated y
SSE
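SSE is just the sum of the squared residuals; a self-contained sketch with made-up observed and fitted values:
```python
import numpy as np

y = np.array([2.1, 3.9, 6.2, 7.8])             # observed y (made up)
y_hat = np.array([2.07, 3.99, 5.91, 7.83])     # fitted y-hat (made up)

SSE = np.sum((y - y_hat) ** 2)   # sum of squared deviations between observed and fitted y
```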
Regression analysis in excel
Type x and y data into two columns (cannot have missing data)
Go to data, data analysis, regression
Input y range and x range
Intercept coefficient is b0 (intercept)
X data coefficient is b1 (slope)
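If Excel isn't available, comparable output can be produced in Python with statsmodels; the data below are made up for illustration:
```python
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up x data
y = np.array([2.2, 4.1, 5.9, 8.3, 9.8])   # made-up y data

X = sm.add_constant(x)           # add the intercept column
model = sm.OLS(y, X).fit()       # ordinary least squares

b0, b1 = model.params            # intercept coefficient (b0) and x coefficient (b1)
print(model.summary())           # output comparable to Excel's regression table
```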
Inferences from least squares line
The coefficients describe only the sample data; they cannot yet be used to make inferences about the population parameters
The intercept isn't necessarily the value of y when x = 0; it is only an estimate based on the sample data. In general, values of y can't be reliably estimated for values of x outside the range of the sample values
Required conditions for the error variable
1) probability distribution of e is normal
2) the mean of the distribution is 0; that is E(e)=0
3) the standard deviation of e is sigma e, which is a constant regardless of the value of x
1-3: for each value of x, y is a normally distributed random variable whose mean is E(y) = B0 + B1x and whose standard deviation is sigma e
4) the value of e associated with any particular value of y is independent of e associated with any other value of y
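One informal way (a sketch, not the textbook's procedure) to eyeball conditions 1-3 is to plot the residuals after fitting; the data here are made up:
```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # made-up data
y = np.array([2.3, 4.0, 6.1, 7.9, 10.2, 11.8])

b1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)     # least squares slope
b0 = y.mean() - b1 * x.mean()                   # least squares intercept
residuals = y - (b0 + b1 * x)                   # estimates of the error variable

# Conditions 1 and 2: histogram should look roughly normal and centered at 0
plt.hist(residuals, bins=10)

# Condition 3: spread of residuals should look roughly constant across x
plt.figure()
plt.scatter(x, residuals)
plt.axhline(0)
plt.show()
```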