Week 1 content Flashcards
What do we call the study of the relationship between two variables?
a bivariate analysis
What graphical technique shows the relationship between variables?
scatter diagram
What do you need to draw a scatter diagram?
two variables
scale one variable along the horizontal axis (X axis)
scale the other variable along the vertical axis (Y axis)
What is the dependent variable?
the variable being predicted or estimated
What is the independent variable?
the predictor variable
it provides the basis for estimation
What is the coefficient of correlation also known as?
r
What is the coefficient of correlation (r)?
a measure of the strength of the linear relationship between two variables
it can range from -1 to +1
When r = -1 or 1, what does this indicate?
perfect and strong correlation
If r = -1, and there is a negative slope, what is the correlation?
perfect negative correlation
If r = +1, and there is a positive slope, what is the correlation?
perfect positive correlation
What do positive values of r indicate?
a direct relationship
e.g. for two variables A and B, as the values of A increase, the values of B increase as well
What do negative values of r indicate?
an inverse relationship
e.g. for two variables e and f, as the values of e increase, the values of f decrease
What is the equation for r?
r = ∑(Xi - x̄)(Yi - ȳ) / [(n-1) Sx Sy]
What does the correlation coefficient (r) depend on?
r depends entirely on dispersion
the product of the total dispersions of each variable
the product of the standard deviations of each variable
What does x̄ = ?
x̄ =(ΣXi) / n
the mean of variable X
What does Sx = ?
Sx = √( ∑(Xi - x̄)^2 / (n-1) )
the standard deviation of variable X
What does ȳ = ?
ȳ=(∑Yi)/n
the mean of variable Y
What does Sy = ?
Sy = √( ∑(Yi - ȳ)^2 / (n-1) )
the standard deviation of variable Y
What does n stand for?
the number of observations
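The definitions of x̄, ȳ, Sx, Sy and n can be tied back to the formula for r with a short Python sketch. The data values and the function names (mean, sample_sd, correlation) are made up for illustration and are not part of the course material.

```python
import math

def mean(values):
    # x̄ = (ΣXi) / n
    return sum(values) / len(values)

def sample_sd(values):
    # S = sqrt( Σ(Xi - x̄)^2 / (n - 1) )
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    # r = Σ(Xi - x̄)(Yi - ȳ) / [(n - 1) Sx Sy]
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * sample_sd(xs) * sample_sd(ys))

# Hypothetical observations, chosen only to keep the arithmetic small
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print(round(r, 3))  # 0.775
```

With these made-up numbers r is positive and fairly close to +1, so the relationship would be read as a fairly strong direct one.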
What is r if:
n = 10
x̄ = 22
ȳ = 45
Sx = 9.189
Sy = 14.337
r = ∑(Xi - x̄)(Yi - ȳ) / [(n-1) Sx Sy]
–> with ∑(Xi - x̄)(Yi - ȳ) = 900: r = 900 / [(10-1) (9.189) (14.337)]
–> r ≈ 0.759
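The arithmetic on this card can be checked in a few lines of Python. The variable names are illustrative; the value 900 is the sum of cross-products ∑(Xi - x̄)(Yi - ȳ) that the card plugs into the numerator.

```python
n = 10
s_x = 9.189
s_y = 14.337
sum_cross_products = 900  # Σ(Xi - x̄)(Yi - ȳ), taken from the card's numerator

# r = Σ(Xi - x̄)(Yi - ȳ) / [(n - 1) Sx Sy]
r = sum_cross_products / ((n - 1) * s_x * s_y)
print(round(r, 3))  # 0.759
```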
In Excel, which function calculates the correlation coefficient (r)?
=CORREL(array1, array2)
What does a correlation of 0.759 mean?
it is positive, therefore there’s a positive relationship between the variables
0.759 is close to +1, thus the correlation is strong
What does knowing that a causal relationship exists between two variables imply?
relationship between X and Y is described by a linear function
changes in Y are assumed to be related to changes in X
we can predict the value of a dependent variable based on the value of at least one independent variable
we can explain the impact of changes in an independent variable on the dependent variable
What does correlation NOT mean?
causation
What does causal relationship mean?
one variable is determined by another
What is the linear regression model?
an equation with only two variables plus an error term
What does the error term do?
marks the difference between a deterministic equation and a regression equation
What is the linear regression model equation?
Yi = b0 + b1Xi + ei
where:
Yi = dependent variable
b0 = population y intercept
b1 = population slope coefficient
Xi = independent variable
ei = random error term
What is the random error component in the linear regression equation?
Yi = b0 + b1Xi + ei
ei
What is the linear component in the linear regression equation?
Yi = b0 + b1Xi + ei
b0 + b1Xi
Why do errors occur?
not every point will be on the regression line, most are scattered around it
in contrast, any prediction based on the regression line will be exactly on the line
thus, we can expect an error to occur when comparing the true values to the predicted values
What does the simple linear regression model provide?
an estimate of the observed values
What is the simple linear regression equation / prediction line?
Ŷi = b0 + b1Xi
where:
Ŷi = estimated/predicted Y value for observation i
b0 = estimate of the regression intercept
b1 = estimate of the regression slope
Xi = value of X for observation i
What is b1 and how do you work it out?
the slope
b1 = r (Sy / Sx)
where:
r = the correlation coefficient between Y and X
Sy = the standard deviation of Y
Sx= the standard deviation of X
ȳ = the average of y
x̄ = the average of x
What is b0 and how do you work it out?
the intercept
b0 = ȳ - b1x̄
where:
b1 = the slope, b1 = r (Sy / Sx)
ȳ = the average of y
x̄ = the average of x
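As a rough check of the two formulas, the sketch below plugs in the summary values from the earlier "What is r if" card (r = 0.759, Sx = 9.189, Sy = 14.337, x̄ = 22, ȳ = 45). The variable names are illustrative, and r is the rounded value, so the results are approximate.

```python
r = 0.759                 # rounded correlation coefficient from the earlier card
s_x, s_y = 9.189, 14.337  # standard deviations of X and Y
x_bar, y_bar = 22, 45     # means of X and Y

b1 = r * (s_y / s_x)      # slope: b1 = r (Sy / Sx)
b0 = y_bar - b1 * x_bar   # intercept: b0 = ȳ - b1 x̄

print(round(b1, 2), round(b0, 1))
```

This gives a slope of roughly 1.18 and an intercept of roughly 18.9. Because b0 is defined as ȳ - b1x̄, the resulting line always passes through the point (x̄, ȳ).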
When is b0 the estimated mean value of Y?
when the value of X is zero
b1 is the estimated change in the mean value of Y as a result of what?
a one-unit increase in X
What is the interpretation of a positive slope?
an increase in X corresponds to an increase in Y
ΔY = b1ΔX, with b1 > 0
What is the interpretation of a negative slope?
an increase in X corresponds to a decrease in Y
ΔY = b1ΔX, with b1 < 0
What is the interpretation of the slope (b1) when it equals zero?
there is no linear relationship between Y and X
What is the difference between the predicted and observed values equal to?
the error term
ei = Yi - Ŷi
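A small sketch of ei = Yi - Ŷi on made-up data. The observations are hypothetical, and the slope is computed as ∑(Xi - x̄)(Yi - ȳ) / ∑(Xi - x̄)^2, which is algebraically the same as r (Sy / Sx).

```python
# Hypothetical observations, not from the course material
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope and intercept of the prediction line
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# Predicted values lie exactly on the line; observed values do not
predicted = [b0 + b1 * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

print([round(e, 1) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

Some errors are positive and some are negative; for a line fitted this way they cancel out, so their sum is zero up to rounding.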
What can the linear regression model be used to make and how?
predictions
if the intercept (b0) and the slope (b1) of the prediction line are known, then the quantitative relationship between the dependent variable (Y) and the independent variable (X) is known
–> then we can predict the value of Y given a value of X
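Prediction with a known intercept and slope can be sketched as a one-line function. The function name predict and the values b0 = 2.2 and b1 = 0.6 are hypothetical, chosen only for illustration.

```python
def predict(x, b0, b1):
    # Prediction line: Ŷ = b0 + b1 X
    return b0 + b1 * x

b0, b1 = 2.2, 0.6  # hypothetical intercept and slope

y_hat = predict(10, b0, b1)
print(round(y_hat, 2))

# A one-unit increase in X changes the prediction by b1 (ΔY = b1ΔX)
change = predict(11, b0, b1) - predict(10, b0, b1)
print(round(change, 2))
```

At X = 0 the prediction is simply b0, which matches the interpretation of the intercept as the estimated mean value of Y when X is zero.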