statistics - topic 7 - bivariate regression Flashcards
what is regression analysis used to do?
Explain the impact of changes in an independent variable on a dependent variable
Predict the value of a dependent variable based on the value of at least one independent variable
what is the dependent variable?
The dependent variable is the variable we wish to explain (also called the endogenous variable)
what is the independent variable?
The independent variable is the variable used to explain the dependent variable (also called the exogenous or explanatory variable)
what does the population regression model show?
it shows the relationship between two variables
what are the components of a population regression model?
it has a dependent variable, population intercept, population slope coefficient, independent variable and an error term
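Written out with each component in the order listed above, the model can be sketched as follows. The symbols are the conventional ones and are an assumption here, since the card itself does not name them: $y_i$ is the dependent variable, $\beta_0$ the population intercept, $\beta_1$ the population slope coefficient, $x_i$ the independent variable and $\varepsilon_i$ the error term.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
```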
what is the sample data used for?
Sample data is used to provide an estimate of the population regression model
what are the assumptions required for the least squares estimation to be an accurate estimate?
The true relationship is linear ($Y$ is a linear function of $X$, plus a random error)
The error term, $\varepsilon_i$, is uncorrelated with the random variable, $X$
The error term, $\varepsilon_i$, has a mean of 0 and constant variance, $\sigma^2$ (the latter property is called homoscedasticity):
$E[\varepsilon_i] = 0$ and $E[\varepsilon_i^2] = \sigma^2$ for $i = 1, \dots, n$
The error terms, $\varepsilon_i$, are not correlated with one another, so that:
$E[\varepsilon_i \varepsilon_j] = 0$ for all $i \neq j$
what is the least squares method?
Least squares provides estimates of $\beta_0$ and $\beta_1$ by finding the values of $b_0$ and $b_1$ that minimize the sum of the squared errors (SSE):
$\min SSE = \min \sum_{i=1}^{n} e_i^2 = \min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \min \sum_{i=1}^{n} [y_i - (b_0 + b_1 x_i)]^2$
what is the equation for b1 in the least squares coefficient estimator?
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{\mathrm{Cov}(x, y)}{s_x^2} = r \, \frac{s_y}{s_x}$
where $r$ is $\mathrm{Corr}(x, y)$
how do you find the intercept b0 of the regression line after you have estimated b1?
$b_0 = \bar{y} - b_1 \bar{x}$
because the regression line goes through the sample means $(\bar{x}, \bar{y})$
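The two estimator cards above can be tried on a small data set. This is a minimal sketch, not a reference implementation: the sample values and the function name `least_squares` are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
from statistics import mean

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def least_squares(x, y):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Numerator and denominator of the slope formula.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # The intercept follows from the line passing through (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(x, y)
print(b0, b1)
```

For this data the sums are 6 and 10, so the slope is about 0.6 and the intercept about 2.2; plugging in the mean of x returns the mean of y, as the card states.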
what are the two parts of the variation in a dependent variable?
the total sum of squares (SST) is equal to the regression sum of squares (SSR) plus the error sum of squares (SSE)
what is the formula for the regression sum of squares?
$SSR = \sum (\hat{y}_i - \bar{y})^2$ where $\hat{y}_i$ = predicted value of the dependent variable given $X = x_i$ and $\bar{y}$ = sample mean of the dependent variable
what is the formula for the error sum of squares?
$SSE = \sum (y_i - \hat{y}_i)^2$ where $y_i$ = observed value of the dependent variable and $\hat{y}_i$ = predicted value of the dependent variable given $X = x_i$
what is the coefficient of determination?
it is the proportion of the total variation in the dependent variable that is explained by variation in the independent variable
what is the formula for the coefficient of determination?
$R^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$
what range does the coefficient of determination lie in, and what is it equal to?
$0 \le R^2 \le 1$ and $R^2 = r^2$, where $r$ denotes the correlation coefficient
what does it mean when R^2 is equal to 1?
When $R^2 = 1$, there is a perfect linear relationship between $X$ and $Y$: 100% of the variation in $Y$ is explained by variation in $X$
what does it mean when R^2 is equal to 0?
When $R^2 = 0$, there is no linear relationship between $X$ and $Y$: the value of $Y$ does not depend linearly on $X$
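The sum-of-squares decomposition and the coefficient of determination cards above can be checked numerically. The sketch below assumes a small hypothetical data set; the variable names are illustrative only.

```python
from math import sqrt
from statistics import mean

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]  # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares
r_squared = ssr / sst

# Sample correlation coefficient r, to compare r ** 2 with R^2.
r = s_xy / sqrt(s_xx * sst)

print(sst, ssr, sse, r_squared, r ** 2)
```

Up to floating-point rounding, SST equals SSR plus SSE (6 = 3.6 + 2.4 for this data), and R^2 and the squared correlation both come out at about 0.6.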
what is the standard deviation of e_i and why is this the case?
The standard deviation of $e_i$ is:
$s_e = \sqrt{\frac{\sum e_i^2}{n - 2}} = \sqrt{\frac{SSE}{n - 2}}$
Division is by $n - 2$ instead of $n - 1$ because the estimated regression model contains two estimated coefficients, $b_0$ and $b_1$
what is the standard deviation for b1?
The standard deviation of $b_1$ is:
$s_{b_1} = \sqrt{\frac{s_e^2}{\sum (x_i - \bar{x})^2}}$
what does the standard deviation of b1 show?
It is a measure of variation in the slope of regression lines from different samples
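As a rough illustration of the two standard deviation cards above, the sketch below computes $s_e$ and $s_{b_1}$ for a hypothetical data set. The data and variable names are assumptions made for the example.

```python
from math import sqrt
from statistics import mean

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Residuals e_i = y_i - y_hat_i and their sum of squares (SSE).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

# Divide by n - 2 because two coefficients (b0 and b1) were estimated.
s_e = sqrt(sse / (n - 2))
s_b1 = sqrt(s_e ** 2 / s_xx)

print(s_e, s_b1)
```

Here SSE is about 2.4, so $s_e^2$ is about 0.8 and $s_{b_1}$ is about 0.283.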
what is the test statistic for a hypothesis test of a slope?
$t = \frac{b_1}{s_{b_1}}$
what is the decision rule for the hypothesis test of a slope?
Reject $H_0$ if $t < -t_{n-2,\,\alpha/2}$ or $t > t_{n-2,\,\alpha/2}$
what is the formula for the confidence interval of a slope?
$b_1 \pm t_{n-2,\,\alpha/2} \, s_{b_1}$
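The test statistic, decision rule and confidence interval for the slope can be put together in one short sketch. It assumes a hypothetical data set and tests $H_0: \beta_1 = 0$ at $\alpha = 0.05$. The critical value 3.182 is read from a t table for 3 degrees of freedom rather than computed, so it would have to be replaced for any other sample size or significance level.

```python
from math import sqrt
from statistics import mean

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_b1 = sqrt((sse / (n - 2)) / s_xx)

# Test statistic for H0: beta_1 = 0.
t_stat = b1 / s_b1

# Assumption: t_{n-2, alpha/2} taken from a t table (3 df, alpha = 0.05).
t_crit = 3.182

# Two-sided decision rule and confidence interval for the slope.
reject_h0 = t_stat < -t_crit or t_stat > t_crit
ci_low = b1 - t_crit * s_b1
ci_high = b1 + t_crit * s_b1

print(t_stat, reject_h0, (ci_low, ci_high))
```

For this data t is about 2.12, which is inside the critical values, so $H_0$ is not rejected. The interval runs from roughly -0.30 to 1.50 and contains 0, which agrees with the test.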
what will be affected by the units of measurement used?
The units used to measure the dependent and independent variables will affect $b_0$, $b_1$, $s_{b_0}$, $s_{b_1}$ and the upper and lower confidence limits
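if you multiply the dependent variable by a constant a, what is the effect on the statistics?
$b_0$, $b_1$, their standard deviations and the confidence limits all become $a$ times bigger
if you multiply the independent variable by a constant a, what is the effect on the statistics?
$b_1$, its standard deviation and its confidence limits become $a$ times smaller; $b_0$ is unchanged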
what will not be affected by the units of measurement?
the test statistic and the p-value will be unaffected
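The effect of a change of units can be checked by refitting the line on rescaled data. The sketch below assumes a hypothetical data set and a hypothetical scale factor `a = 100`; `slope_summary` is an illustrative helper, not a library function.

```python
from math import sqrt
from statistics import mean


def slope_summary(x, y):
    """Return (b0, b1, s_b1, t) for a simple least squares fit."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_b1 = sqrt((sse / (n - 2)) / s_xx)
    return b0, b1, s_b1, b1 / s_b1


# Hypothetical sample data and scale factor, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a = 100

base = slope_summary(x, y)
y_scaled = slope_summary(x, [a * yi for yi in y])  # dependent variable times a
x_scaled = slope_summary([a * xi for xi in x], y)  # independent variable times a

print(base)
print(y_scaled)
print(x_scaled)
```

Rescaling the dependent variable multiplies the intercept, slope and slope standard deviation by `a`. Rescaling the independent variable divides the slope and its standard deviation by `a` and leaves the intercept alone. In both cases the t statistic is the same as in the original fit, up to rounding.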
what are outliers?
Outliers are observations that have $Y$ values that differ substantially from their predicted value, $\hat{y}_i$
what are extreme points?
Extreme points are observations that have $X$ values that differ substantially from the $X$ values of other observations
what is the confidence interval for the mean value of Y, given a particular value of X?
$\hat{y}_{n+1} \pm t_{n-2,\,\alpha/2} \, s_e \sqrt{\frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
how does the size of the interval vary with the distance of $x_{n+1}$ from the mean $\bar{x}$?
it varies positively: the interval gets wider as $x_{n+1}$ moves further from $\bar{x}$
what is the prediction interval for $y_{n+1}$ given a particular value of X?
$\hat{y}_{n+1} \pm t_{n-2,\,\alpha/2} \, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
what is the point of the +1 term in the prediction interval for Yn+1?
it adds to the interval width to reflect the added uncertainty for an individual case
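The two interval formulas above differ only by the extra 1 under the square root, which a short sketch makes visible. It assumes a hypothetical data set and, as an assumption, takes the critical value 3.182 from a t table for 3 degrees of freedom and $\alpha = 0.05$. The helper `intervals` is illustrative only.

```python
from math import sqrt
from statistics import mean

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = sqrt(sse / (n - 2))

# Assumption: t_{n-2, alpha/2} taken from a t table (3 df, alpha = 0.05).
t_crit = 3.182


def intervals(x_new):
    """Return (y_hat, half-width for the mean of Y, half-width for one new Y)."""
    y_new = b0 + b1 * x_new
    # Term shared by both formulas: 1/n plus the scaled squared distance from x_bar.
    distance_term = 1 / n + (x_new - x_bar) ** 2 / s_xx
    half_mean = t_crit * s_e * sqrt(distance_term)
    half_pred = t_crit * s_e * sqrt(1 + distance_term)
    return y_new, half_mean, half_pred


for x_new in (3, 4, 5):
    print(x_new, intervals(x_new))
```

Both half-widths grow as the new x value moves away from the sample mean of 3, and the prediction interval is always the wider of the two.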