Unit 13a: Regression I Simple Linear Regression Flashcards
The correlation coefficient…
A. Will always fall between 0 and 1
B. Compares the mean of a sample to a population
C. Measures how strongly related two variables are
D. Can only be calculated for 2 continuous (i.e., ratio) variables
C. Measures how strongly related two variables are
Purpose of Simple (bivariate) Regression
How relations are used to predict outcomes: the stronger the correlation, the more accurate the prediction.
* Correlated variables can be used to predict each other's values via a simple (bivariate) regression.
* The standard error of the estimate helps us gauge the accuracy of a prediction.
A strong correlation means an accurate or a weak prediction?
accurate
correlation
If two variables co-vary, they have a relation (correlation)
* Regression extends the correlation to make a prediction of one variable on another.
* The accuracy of the prediction depends on the strength of the relation (correlation)
* More information shared between the variables (a higher, stronger correlation) means less error
* Not to be confused with one variable causing another.
* We cannot tell which variable causes which
* Or whether a third variable accounts for the relation (a confounder)
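As a quick sketch of the idea on this card, the correlation coefficient can be computed for two co-varying variables; the data below (hours studied vs. exam score) are made up for illustration.

```python
import numpy as np

# Hypothetical sample data: hours studied (x) and exam score (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 64.0, 70.0])

# Pearson correlation coefficient: how strongly the two variables are related.
# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r.
r = np.corrcoef(x, y)[0, 1]
```

Here r comes out close to 1, so a regression using x to predict y would have little error; note that a high r still says nothing about which variable causes which.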
Simple/Bivariate Regression has how many variables
2:
- 1 outcome variable
- 1 predictor variable
Key Elements of Linear Regression
- F-test: the Omnibus Test
- Is there any association
- Regression Equation
- Beta coefficient
- “Slope” of the function
- This is the important element we want because it has meaningful interpretation
Regression Equation
y = mx + b
* In regression notation: y = β₀ + β₁x
* Introduce an index i for each participant or observation (Xᵢ, Yᵢ):
  Yᵢ = β₀ + β₁Xᵢ
* We allow an error term in the equation for each observation i:
  Yᵢ = β₀ + β₁Xᵢ + εᵢ
The regression equation tells us that y' (the predicted value of y) is a function of the intercept (β₀, sometimes written a), the slope (β₁, sometimes written b), and a value for the predictor variable (x).
Regression Equation parts
Regression Equation
Yᵢ = β₀ + β₁Xᵢ + εᵢ
* Yᵢ: value of the outcome (or response, or dependent) variable for the ith observation
* Xᵢ: value of the predictor (or independent) variable for the ith observation
* β₀ & β₁: regression parameters (the intercept and slope) to be estimated
  - β₀ = the intercept
  - β₁ = the slope
* εᵢ: the random error term for the ith observation
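The intercept and slope on this card can be estimated with the standard least-squares formulas (β₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², β₀ = ȳ − β₁x̄); the data below are hypothetical.

```python
import numpy as np

# Hypothetical data for the sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 64.0, 70.0])

x_bar, y_bar = x.mean(), y.mean()

# Least-squares estimates of the slope (b1) and intercept (b0)
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

# Predicted values y' for each observation: y' = b0 + b1 * x
y_hat = b0 + b1 * x
```

The slope b1 is the element with the meaningful interpretation: the expected change in y for a one-unit change in x.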
Regression Analysis Variables
In regression language, the criterion variable is regressed on the
predictor variable.
* Criterion variable: the variable to predict
  - The dependent variable; the y axis in a scatter-plot
  - Actual values denoted as y
  - Predicted values denoted y' or ŷ
* Predictor variable: the variable used in the prediction
  - The independent variable; the x axis in a scatter-plot
Criterion variable
the variable to predict
Predictor variable
the variable used in the prediction
Determining the line of best fit:
the Least Squares Criterion
A good fit will limit the divergence between our predicted value and
the actual data (the “error”)
* With a single line, we cannot fit the data exactly.
* Some points will be above and some below
* How do we trade-off these errors?
* Minimize the square of the errors
* Squaring treats deviations above and below the line equally
Evidence of prediction error is known as
a residual score.
* The difference between y and y' (or ŷ).
* eᵢ = Yᵢ − Ŷᵢ
the least squares criterion
The sum of the squared differences between the actual (y) and predicted (y') values must be at its lowest possible value.
Regression is designed to minimize the sum of the squared differences between y and y'.
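The least squares criterion on these cards can be illustrated directly: compute the residuals eᵢ = yᵢ − y'ᵢ, sum their squares (SSE), and check that any other line gives a larger SSE than the least-squares line. The data and the alternative line below are made up for the sketch; the standard error of the estimate, s = √(SSE / (n − 2)), is also shown.

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 64.0, 70.0])

# Least-squares slope and intercept (same formulas as the regression-equation card)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Residual scores: e_i = y_i - y'_i (some positive, some negative)
residuals = y - (b0 + b1 * x)
sse = np.sum(residuals ** 2)  # sum of squared errors

# Any other line (here: intercept and slope nudged arbitrarily) yields a larger SSE
other_residuals = y - ((b0 - 1.0) + (b1 + 0.5) * x)
sse_other = np.sum(other_residuals ** 2)

# Standard error of the estimate: the typical size of a prediction error
se_est = np.sqrt(sse / (len(x) - 2))
```

A stronger correlation shrinks the residuals, so SSE and the standard error of the estimate both fall, which is the sense in which more shared information means less error.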