PRE FI LEC 1: REGRESSION ANALYSIS Flashcards
A PREDICTIVE MODELING TECHNIQUE that investigates
the relationship between a DEPENDENT (target) variable and
one or more INDEPENDENT (predictor) variables.
REGRESSION ANALYSIS
Father of regression analysis
Carl F. Gauss (1777-1855)
First person to use the term
"regression"
Francis Galton (1877)
A graphical representation of the relation between two or more variables.
- For two variables x and y, each point on the plot is an x-y pair.
A. GRAPH PLOT
B. REGRESSION PLOT
C. SCATTER PLOT
Scatter plot
We use _________ and ___________ to describe the variation in
one or more variables.
A. REGRESSION; CORRELATION
B. CORRELATION; REGRESSION
regression; correlation
The _______ is the SUM of the squared deviations
of a variable from its mean.
Variation
The variation is the numerator of the _______ of a
sample
Variance
Both the variation and the variance are ____________________ of a sample.
measures of the dispersion
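A minimal sketch of the variation and variance cards, assuming a small made-up sample `x` (the values are illustrative and not from the lecture). The variation is the sum of squared deviations from the mean; dividing it by the sample size less one gives the sample variance.

```python
# Hypothetical sample, chosen only for illustration.
x = [2.0, 4.0, 6.0, 8.0]

n = len(x)
mean_x = sum(x) / n

# Variation: sum of the squared deviations from the mean.
variation = sum((xi - mean_x) ** 2 for xi in x)

# Sample variance: the variation is the numerator, (n - 1) the denominator.
variance = variation / (n - 1)

print("variation:", variation)
print("variance:", variance)
```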
The ___________between two random variables is a statistical measure of the DEGREE TO WHICH THE 2 VARIABLES MOVE TOGETHER.
- captures how one variable is different from its mean as the other variable is different from its mean.
- is calculated as the RATIO OF THE COVARIATION to the SAMPLE SIZE LESS ONE
- its actual value is NOT MEANINGFUL because it is AFFECTED BY THE SCALE of the 2 VARIABLES. That is why we calculate the correlation coefficient: to make something interpretable from the covariance information.
covariance
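The covariance card can be sketched as follows, assuming two short made-up samples `x` and `y` and a hypothetical helper named `covariance`. The second call rescales `x` by 100 to illustrate why the raw covariance value is hard to interpret: it changes with the units of the variables.

```python
def covariance(x, y):
    """Sample covariance: covariation divided by the sample size less one."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariation: sum of the products of the paired deviations from the means.
    covariation = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return covariation / (n - 1)


# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_xy = covariance(x, y)

# Same relationship, but x expressed in units 100 times smaller.
x_scaled = [100.0 * xi for xi in x]
cov_scaled = covariance(x_scaled, y)

print("covariance:", cov_xy)
print("covariance after rescaling x:", cov_scaled)
```

A positive result here means the two samples tend to move together; flipping the sign of every `y` value would make the covariance negative.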
- indicates that the variables TEND TO MOVE TOGETHER
POSITIVE COVARIANCE
- indicates that the variables tend to move in
OPPOSITE DIRECTIONS.
NEGATIVE COVARIANCE
is a measure of the STRENGTH OF THE RELATIONSHIP between or among variables.
correlation coefficient (r)
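A sketch of the correlation coefficient, assuming the usual Pearson definition (covariance divided by the product of the two sample standard deviations), which the cards above imply but do not spell out. The helper name `correlation` and the data are hypothetical. Unlike the covariance, r does not change when a variable is rescaled.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient r for two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variation_x = sum((xi - mean_x) ** 2 for xi in x)
    variation_y = sum((yi - mean_y) ** 2 for yi in y)
    # The (n - 1) terms of the covariance and the variances cancel out.
    return covariation / math.sqrt(variation_x * variation_y)


# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(x, y)
r_scaled = correlation([100.0 * xi for xi in x], y)

print("r:", round(r, 4))
print("r after rescaling x:", round(r_scaled, 4))
```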
is an EXTREME VALUE of a variable.
- may be quite large or small (where large and small are defined relative to the rest of the sample).
- may affect the sample statistics, such as a correlation coefficient.
- may result in spurious correlation.
OUTLIER
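To illustrate how an outlier may affect a correlation coefficient, here is a small sketch with made-up data and a hypothetical helper `correlation` (Pearson's r). Five weakly related points are compared with the same points plus one extreme observation.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient r for two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variation_x = sum((xi - mean_x) ** 2 for xi in x)
    variation_y = sum((yi - mean_y) ** 2 for yi in y)
    return covariation / math.sqrt(variation_x * variation_y)


# Hypothetical sample with almost no linear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 1.0, 4.0, 2.0, 3.0]
r_without = correlation(x, y)

# Add a single extreme observation, large relative to the rest of the sample.
r_with = correlation(x + [20.0], y + [20.0])

print("r without outlier:", round(r_without, 2))  # roughly 0.14
print("r with outlier:", round(r_with, 2))        # roughly 0.97
```

The single extreme point makes the variables look strongly related even though the other five observations show almost no relation.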
is the appearance of a relationship when in fact there is no relation.
Spurious correlation
The correlation coefficient DOES NOT INDICATE A CAUSAL RELATIONSHIP. Certain data items may be highly correlated, but the correlation is not necessarily the result of a causal relationship.
T or F?
T
- is the analysis of the relation between one variable and some other variable(s), assuming a linear relation.
- Also referred to as LEAST SQUARES REGRESSION
and ORDINARY LEAST SQUARES (OLS).
a. The purpose is to explain the variation in a variable (that is, how a variable differs from its mean value) using the variation in one or more other variables.
b. Suppose we want to describe, explain, or
predict why a variable differs from its mean.
c. The least squares principle is that the regression line is determined by minimizing the sum of the squares of the vertical distances between the actual Y values and the predicted values of Y. A line is fit through the XY points such that the sum of the squared residuals (that is, the sum of the squared vertical distances between the observations and the line) is minimized.
Regression
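The least squares principle can be sketched as follows, assuming a single independent variable and the standard closed-form solution (slope = covariation of x and y divided by the variation of x; intercept = mean of y minus slope times mean of x). The function names and the data are hypothetical.

```python
def fit_line(x, y):
    """Return (b0, b1) of the least squares line for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variation_x = sum((xi - mean_x) ** 2 for xi in x)
    b1 = covariation / variation_x   # slope
    b0 = mean_y - b1 * mean_x        # intercept
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared vertical distances between actual and predicted Y."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_line(x, y)
best = sum_squared_residuals(x, y, b0, b1)

# Any other line through the same points has a larger sum of squared residuals.
shifted = sum_squared_residuals(x, y, b0 + 0.1, b1)
tilted = sum_squared_residuals(x, y, b0, b1 + 0.1)

print("b0:", round(b0, 4), "b1:", round(b1, 4))
print("SSR fitted:", round(best, 4))
print("SSR shifted:", round(shifted, 4), "SSR tilted:", round(tilted, 4))
```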
- is the variable whose variation is BEING EXPLAINED by the other variable(s).
- Also referred to as the EXPLAINED VARIABLE, the ENDOGENOUS VARIABLE, or the PREDICTED VARIABLE.
DEPENDENT VARIABLE
- is the variable whose variation is used to explain that of the dependent variable.
- Also referred to as the EXPLANATORY VARIABLE, the EXOGENOUS VARIABLE, or the PREDICTING VARIABLE.
INDEPENDENT VARIABLE
The parameters in a simple regression
equation are the slope (b1) and the intercept
(b0):
yi = b0 + b1 xi + εi
b1 is the change in Y for a one-unit change in X.
- can be positive, negative, or zero
SLOPE
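A short sketch of the slope interpretation, assuming made-up parameter values and hypothetical helpers `predict` and `unit_change`. Raising X by one unit changes the predicted Y by exactly b1, whether b1 is positive, negative, or zero.

```python
def predict(x, b0, b1):
    """Predicted Y from the simple regression line b0 + b1 * x."""
    return b0 + b1 * x


def unit_change(x, b0, b1):
    """Change in predicted Y when X increases by one unit."""
    return predict(x + 1.0, b0, b1) - predict(x, b0, b1)


# Hypothetical intercept and slopes, chosen only for illustration.
b0 = 2.2
for b1 in (0.6, -0.6, 0.0):
    change = unit_change(3.0, b0, b1)
    print("b1 =", b1, "-> change in predicted Y:", round(change, 4))
```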