Exam Three Flashcards
Linear equation for one independent variable
y = b0 + b1x (b0 = y-intercept, b1 = slope)
First step to analysis
Constructing a graph of the data (scatterplot)
Deterministic model
An exact relationship: each x value determines exactly one y value, with no random error
Probabilistic model
Allows for variability in y at each x value
Least squares criterion
When multiple lines can fit a scatterplot, the best-fitting line is the one with the smallest sum of squared errors
Calculating error (e)
1: Find the error for each data point, the vertical distance between the point and the line (e = yi - y hat i)
2: Square each error and add them up (sum of squared errors)
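As a small illustration of the least squares criterion, the Python sketch below compares two candidate lines on a made-up data set. The data values, the two candidate lines, and the helper name sum_squared_errors are assumptions chosen for illustration only.

```python
# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def sum_squared_errors(b0, b1, xs, ys):
    """Sum of squared vertical distances from each point to the line y = b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(xs, ys))

sse_line_a = sum_squared_errors(2.2, 0.6, x, y)  # candidate line A
sse_line_b = sum_squared_errors(1.0, 1.0, x, y)  # candidate line B

# The line with the smaller sum of squared errors fits better.
better = "A" if sse_line_a < sse_line_b else "B"
print(sse_line_a, sse_line_b, better)
```

For this data, line A has the smaller sum of squared errors, so it is the better fit of the two.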
S(xx)
Sum of (xi - x bar)^2
S(xy)
Sum of (xi - x bar)(yi - y bar)
S(yy)
Sum of (yi - y bar)^2
b1 =
b1 = S(xy) / S(xx)
b0 =
b0 = y bar - b1 * x bar
What do these equations help us do?
Find the regression equation
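A minimal Python sketch of that calculation, assuming a small made-up data set (the values of x and y are hypothetical), could look like this. It computes S(xx) and S(xy), then the slope b1 and intercept b0.

```python
# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# S(xx) and S(xy) as defined on the cards above.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx            # slope
b0 = y_bar - b1 * x_bar   # intercept

print(f"y hat = {b0:.2f} + {b1:.2f}x")
```

With this data the fitted equation is y hat = 2.20 + 0.60x.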
How to examine the utility of a regression
Determine percentage of the variation in observed values of y that is explained by x (define both variation and amount of variation)
SST
Sum of (yi - y bar)^2
This is a measure of total variation
SSR
Sum of (y hat i - y bar)^2
Measure of the amount of variation in the dependent variable explained by regression
SSE
Sum of (yi - y hat i)^2
Variation in y NOT explained by the regression
Relation between SST, SSR and SSE
SST = SSE + SSR
(Total = unexplained + explained)
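The identity can be checked numerically. The sketch below, using hypothetical data, fits the least squares line and then computes the three sums of squares. Note that the identity is only expected to hold when y hat comes from the least squares line, not from an arbitrary line.

```python
# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation

print(sst, ssr, sse)
```

Up to floating-point rounding, sst equals ssr + sse for this data.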
R^2 (coefficient of determination)
Percentage of variation in the dependent variable explained by the fitted regression
Between 0 and 1: near 0 means the regression is not useful for prediction, near 1 means it is
R^2 equation
R^2 = SSR / SST
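One way to wrap this formula in a reusable function is sketched below. The function name r_squared and the sample data are assumptions for illustration; the function fits the least squares line first, then returns SSR divided by SST.

```python
def r_squared(xs, ys):
    """Coefficient of determination (SSR / SST) for a simple least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in xs]
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sst = sum((yi - y_bar) ** 2 for yi in ys)
    return ssr / sst

# Hypothetical data: a noisy upward trend, then a perfectly linear set.
r2_noisy = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r2_perfect = r_squared([1, 2, 3], [2, 4, 6])
print(round(r2_noisy, 3), round(r2_perfect, 3))
```

Here about 60% of the variation in y is explained by x in the noisy set, while the perfectly linear set gives an R^2 of 1.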
Coefficient of determination
Measure of how well outcomes are replicated by the model (r^2)
MSE
Mean squared error. Estimates the average of the squares of errors
MSE = (1/n) * sum of (yi - y hat i)^2
r
Aka: correlation coefficient
Measures how strong the linear relationship between x and y is. Ranges between -1 and 1. Strongest on either end, weakest close to 0
r = S(xy) / sqrt(S(xx) * S(yy))
Excel: =CORREL(x column, y column)
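For comparison with the spreadsheet function, the same formula can be sketched in plain Python. The function name correlation and the data are hypothetical, and only the standard library is assumed.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r = S(xy) / sqrt(S(xx) * S(yy))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    syy = sum((yi - y_bar) ** 2 for yi in ys)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
r = correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4), round(r ** 2, 4))
```

For simple linear regression, squaring r gives the coefficient of determination, which is why the second printed value is the R^2 for the same data.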
Caution about correlation
Correlation can’t prove causation: a strong correlation can be produced by chance, by the effect of a third variable, etc.
A correlation near 0 may just mean the variables don’t have a linear relationship; they may still be related in a nonlinear way.
Outliers can also affect correlation
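Two of these cautions can be illustrated with made-up numbers. In the sketch below (all data and the helper name correlation are hypothetical), y = x^2 is a perfect but nonlinear relationship with zero correlation, and a single outlier pulls a perfect correlation down sharply.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r = S(xy) / sqrt(S(xx) * S(yy))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    syy = sum((yi - y_bar) ** 2 for yi in ys)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Nonlinear relationship: y is completely determined by x, yet r is 0.
x_curve = [-2, -1, 0, 1, 2]
y_curve = [xi ** 2 for xi in x_curve]
r_curve = correlation(x_curve, y_curve)

# Outlier: five points on a perfect line, then the same points plus one outlier.
r_clean = correlation([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
r_outlier = correlation([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 0])

print(r_curve, r_clean, round(r_outlier, 3))
```

A scatterplot would reveal both situations immediately, which is one reason graphing the data is the first step of the analysis.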
Purpose of sample regression
To estimate the population regression line