statistics - topic 7 - bivariate regression Flashcards by ben moulds

what is regression analysis used to do?

Explain the impact of changes in an independent variable on a dependent variable
Predict the value of a dependent variable based on the value of at least one independent variable

what is the dependent variable?

The dependent variable is the variable we wish to explain (also called the endogenous variable)

what is the independent variable?

The independent variable is the variable used to explain the dependent variable (also called the exogenous or explanatory variable)

what does the population regression model show?

It shows the linear relationship between the dependent variable and the independent variable in the population, plus a random error term

what are the components of a population regression model?

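It has a dependent variable ($Y_i$), a population intercept ($\beta_0$), a population slope coefficient ($\beta_1$), an independent variable ($X_i$) and a random error term ($\varepsilon_i$): $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$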

what is the sample data used for?

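Sample data are used to estimate the population regression model. The estimated regression line is $\hat{y}_i = b_0 + b_1 x_i$, where $b_0$ and $b_1$ are the estimates of $\beta_0$ and $\beta_1$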

what assumptions are required for least squares to give accurate estimates of the population coefficients?

The true relationship is linear ($Y$ is a linear function of $X$, plus a random error)
The error term, $\varepsilon_i$, is uncorrelated with the random variable $X$
The error term, $\varepsilon_i$, has a mean of 0 and constant variance, $\sigma^2$ (the latter property is called homoscedasticity):
$E[\varepsilon_i] = 0$ and $E[\varepsilon_i^2] = \sigma^2$ for $i = 1, \dots, n$
The error terms, $\varepsilon_i$, are not correlated with one another, so that:
$E[\varepsilon_i \varepsilon_j] = 0$ for all $i \neq j$

what is the least squares method?

Least squares provides estimates of $\beta_0$ and $\beta_1$ by finding the values of $b_0$ and $b_1$ that minimize the sum of squared errors (SSE):
$\min SSE = \min \sum_{i=1}^{n} e_i^2 = \min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \min \sum_{i=1}^{n} [y_i - (b_0 + b_1 x_i)]^2$
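The expression above is the quantity that least squares minimizes. Below is a minimal Python sketch of it. The helper name `sse` and the five data points are hypothetical illustrations and do not come from the flashcards. For this data the least squares fit happens to be $b_0 = 2.2$, $b_1 = 0.6$, so any nearby candidate line gives a larger SSE.

```python
# Sketch of the least squares objective. `sse` and the data below are
# hypothetical illustrations, not taken from the flashcards.
def sse(x, y, b0, b1):
    """Sum of squared errors: sum over i of [y_i - (b0 + b1 * x_i)]^2."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Two candidate lines: the first is the least squares fit for this data,
# the second is a nearby alternative with a larger SSE.
print(sse(x, y, 2.2, 0.6))  # approximately 2.4
print(sse(x, y, 2.0, 0.7))  # approximately 2.55
```

The closed-form estimators for $b_1$ and $b_0$ give this minimum directly, so no search over candidate lines is needed.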

what is the least squares estimator of the slope, $b_1$?

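$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \dfrac{\mathrm{Cov}(x, y)}{s_x^2} = r \cdot \dfrac{s_y}{s_x}$
where $r = \mathrm{Corr}(x, y)$ is the sample correlation, and $s_x$ and $s_y$ are the sample standard deviations of $x$ and $y$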
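The three expressions are algebraically equal because the $n - 1$ divisors in the sample covariance and the sample variance cancel. The short Python check below computes $b_1$ each way on made-up data. The function name `slope_three_ways` is a hypothetical illustration.

```python
from statistics import mean, stdev


def slope_three_ways(x, y):
    """Return the least squares slope b1 computed by the three equivalent formulas."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)

    # 1) Ratio of the sum of cross-products to the sum of squares of x.
    b1_sums = sxy / sxx

    # 2) Sample covariance over sample variance of x (both divide by n - 1).
    cov_xy = sxy / (n - 1)
    b1_cov = cov_xy / stdev(x) ** 2

    # 3) Sample correlation times the ratio of standard deviations.
    r = cov_xy / (stdev(x) * stdev(y))
    b1_corr = r * stdev(y) / stdev(x)

    return b1_sums, b1_cov, b1_corr


print(slope_three_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # all three approximately 0.6
```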

once $b_1$ has been estimated, how is the intercept $b_0$ found?

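$b_0 = \bar{y} - b_1 \bar{x}$
because the least squares regression line passes through the sample means $(\bar{x}, \bar{y})$. The estimated regression line is then $\hat{y} = b_0 + b_1 x$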
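The Python sketch below fits the line in that order. It computes $b_1$ first and then $b_0 = \bar{y} - b_1 \bar{x}$, and finally confirms that the fitted line passes through $(\bar{x}, \bar{y})$. The function name `fit_line` and the data are hypothetical illustrations.

```python
from statistics import mean


def fit_line(x, y):
    """Least squares intercept and slope: b1 from deviations, then b0 from the means."""
    xbar, ybar = mean(x), mean(y)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
        (xi - xbar) ** 2 for xi in x
    )
    b0 = ybar - b1 * xbar
    return b0, b1


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
print(b0, b1)  # approximately 2.2 and 0.6

# Evaluating the fitted line at x-bar gives y-bar.
print(b0 + b1 * mean(x), mean(y))
```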

what are the two parts of the variation in the dependent variable?

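The total sum of squares (SST) equals the regression sum of squares (SSR, the variation explained by the regression) plus the error sum of squares (SSE, the unexplained variation): $SST = SSR + SSE$, i.e. $\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$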

what is the formula for the regression sum of squares (SSR)?

$SSR = \sum (\hat{y}_i - \bar{y})^2$, where $\hat{y}_i$ is the predicted value of the dependent variable given $X = x_i$ and $\bar{y}$ is the sample mean of the dependent variable

what is the formula for the error sum of squares (SSE)?

$SSE = \sum (y_i - \hat{y}_i)^2$, where $y_i$ is the observed value of the dependent variable and $\hat{y}_i$ is the predicted value of the dependent variable given $X = x_i$
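The decomposition $SST = SSR + SSE$ holds for any least squares line fitted with an intercept. The Python sketch below computes the three sums from their definitions on made-up data and checks that they add up. The helper `sums_of_squares` is a hypothetical illustration.

```python
from statistics import mean


def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least squares line of y on x."""
    xbar, ybar = mean(x), mean(y)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
        (xi - xbar) ** 2 for xi in x
    )
    b0 = ybar - b1 * xbar
    y_hat = [b0 + b1 * xi for xi in x]  # predicted values

    sst = sum((yi - ybar) ** 2 for yi in y)                # total variation
    ssr = sum((yh - ybar) ** 2 for yh in y_hat)            # explained by the regression
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained (residual)
    return sst, ssr, sse


sst, ssr, sse = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(sst, ssr, sse)  # approximately 6, 3.6 and 2.4
```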

what is the coefficient of determination?

It is the proportion of the total variation in the dependent variable that is explained by variation in the independent variable

what is the formula for the coefficient of determination?

$R^2 = \dfrac{SSR}{SST} = \dfrac{\text{regression sum of squares}}{\text{total sum of squares}} = 1 - \dfrac{SSE}{SST}$

what range does the coefficient of determination lie in, and what is it equal to?

$0 \le R^2 \le 1$, and in bivariate (simple) regression $R^2 = r^2$, where $r$ is the sample correlation coefficient between $x$ and $y$
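The Python sketch below computes $R^2 = SSR/SST$ from the fitted values and, separately, the sample correlation $r$. On made-up data it shows that $R^2 = r^2$ in the bivariate case. The helper `r_squared_and_r` is a hypothetical illustration.

```python
from math import sqrt
from statistics import mean


def r_squared_and_r(x, y):
    """Return (R^2 computed as SSR/SST, sample correlation r)."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sst = sum((yi - ybar) ** 2 for yi in y)

    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ssr = sum((b0 + b1 * xi - ybar) ** 2 for xi in x)

    r2 = ssr / sst
    r = sxy / sqrt(sxx * sst)
    return r2, r


r2, r = r_squared_and_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(r2, r * r)  # both approximately 0.6
```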

what does it mean when R^2 is equal to 1?

When $R^2 = 1$, there is a perfect linear relationship between $X$ and $Y$: 100% of the variation in $Y$ is explained by variation in $X$

what does it mean when R^2 is equal to 0?

When $R^2 = 0$, there is no linear relationship between $X$ and $Y$: none of the variation in $Y$ is explained by a linear function of $X$ (the fitted slope $b_1$ is 0)

what is the standard deviation of the residuals $e_i$, and why is it divided by $n - 2$?

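The standard deviation of the residuals $e_i$ (the standard error of the estimate) is:
$s_e = \sqrt{\dfrac{\sum e_i^2}{n - 2}} = \sqrt{\dfrac{SSE}{n - 2}}$
Division is by $n - 2$ instead of $n - 1$ because the estimated regression model contains two estimated coefficients, $b_0$ and $b_1$, and each one uses up a degree of freedom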
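Below is a minimal Python sketch of this formula. The function name is hypothetical. The residuals are made-up values from the illustrative fit $\hat{y} = 2.2 + 0.6x$ to the data $x = 1, \dots, 5$ and $y = 2, 4, 5, 4, 5$.

```python
from math import sqrt


def standard_error_of_estimate(residuals):
    """s_e = sqrt(SSE / (n - 2)); n - 2 because b0 and b1 were estimated."""
    n = len(residuals)
    sse = sum(e ** 2 for e in residuals)
    return sqrt(sse / (n - 2))


residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]  # y_i - y_hat_i for the example fit
print(standard_error_of_estimate(residuals))  # approximately 0.894
```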

what is the standard deviation of $b_1$?

The standard deviation of $b_1$ is:
$s_{b_1} = \sqrt{\dfrac{s_e^2}{\sum (x_i - \bar{x})^2}}$
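Below is a Python sketch of this formula. The helper name is hypothetical and the inputs are made up. It also illustrates a consequence that follows directly from the formula: for the same $s_e$, more spread in the $x$ values gives a smaller $s_{b_1}$.

```python
from math import sqrt
from statistics import mean


def slope_standard_error(x, s_e):
    """s_b1 = sqrt(s_e^2 / sum((x_i - x_bar)^2))."""
    xbar = mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sqrt(s_e ** 2 / sxx)


s_e = sqrt(0.8)  # standard error of the estimate from an illustrative fit
print(slope_standard_error([1, 2, 3, 4, 5], s_e))  # approximately 0.283
print(slope_standard_error([0, 2, 4, 6, 8], s_e))  # smaller: x is more spread out
```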

what does the standard deviation of b1 show?

Study These Flashcards

It is a measure of variation in the slope of regression lines from different samples

what is the test statistic for a hypothesis test of the slope?

Study These Flashcards

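To test $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$, use $t = \dfrac{b_1}{s_{b_1}}$. When $H_0$ is true (assuming normally distributed errors), this statistic follows a $t$ distribution with $n - 2$ degrees of freedom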

what is the decision rule for the hypothesis test of a slope?

Study These Flashcards

Reject $H_0$ if $t < -t_{n-2, \alpha/2}$ or $t > t_{n-2, \alpha/2}$ (a two-tailed test at significance level $\alpha$)
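Below is a minimal Python sketch of the test statistic and the decision rule. Python's standard library has no $t$ quantile function, so the sketch assumes the caller supplies the critical value $t_{n-2,\alpha/2}$, for example from a $t$ table ($t_{3, 0.025} \approx 3.182$ for $n = 5$ and $\alpha = 0.05$). The inputs $b_1 = 0.6$ and $s_{b_1} = \sqrt{0.08}$ come from a made-up five-point example, and the function name is hypothetical.

```python
from math import sqrt


def slope_t_test(b1, s_b1, t_crit):
    """Two-tailed test of H0: beta1 = 0.

    t_crit is t_{n-2, alpha/2}, looked up by the caller; this sketch does
    not compute t quantiles itself.
    """
    t = b1 / s_b1
    reject = t < -t_crit or t > t_crit
    return t, reject


t, reject = slope_t_test(0.6, sqrt(0.08), 3.182)
print(round(t, 3), reject)  # 2.121 False
```

Here $|t| < 3.182$, so $H_0$ is not rejected at the 5% level for this small illustrative sample.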

what is the formula for the confidence interval for the slope?

Study These Flashcards

$b_1 \pm t_{n-2, \alpha/2} \, s_{b_1}$, a $100(1 - \alpha)\%$ confidence interval for $\beta_1$
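Below is a Python sketch of this interval. As before, the caller supplies the critical value from a $t$ table. The example values $b_1 = 0.6$, $s_{b_1} = \sqrt{0.08}$ and $n = 5$ are made up, and the function name is hypothetical.

```python
from math import sqrt


def slope_confidence_interval(b1, s_b1, t_crit):
    """Return (lower, upper) = b1 -/+ t_crit * s_b1."""
    half_width = t_crit * s_b1
    return b1 - half_width, b1 + half_width


lo, hi = slope_confidence_interval(0.6, sqrt(0.08), 3.182)
print(round(lo, 3), round(hi, 3))  # -0.3 1.5
```

The interval contains 0. This matches a two-tailed test at the same $\alpha$, which would fail to reject $H_0: \beta_1 = 0$.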

what will be affected by the units of measurement used?

The units used to measure the dependent and independent variables affect $b_0$, $b_1$, $s_{b_0}$, $s_{b_1}$ and the upper and lower confidence limits

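if you multiply the dependent variable by a constant $a > 0$, what is the effect on the statistics?

$b_0$, $b_1$, $s_{b_0}$, $s_{b_1}$ and the confidence limits are all multiplied by $a$ (they become $a$ times bigger)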

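if you multiply the independent variable by a constant $a > 0$, what is the effect on the statistics?

$b_1$, $s_{b_1}$ and the confidence limits for $\beta_1$ are divided by $a$ (they become $a$ times smaller), while $b_0$ and $s_{b_0}$ are unchanged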

what will not be affected by the units of measurement?

The $t$ test statistic and its $p$-value are unaffected (as is $R^2$)
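The Python sketch below checks these scaling rules numerically on made-up data, using a hypothetical rescaling factor $a = 10$. It refits the line after multiplying $Y$ by $a$ and, separately, after multiplying $X$ by $a$. The helper `fit_with_t` is an illustration and does not come from the source.

```python
from math import sqrt
from statistics import mean


def fit_with_t(x, y):
    """Return (b0, b1, s_b1, t) for the least squares line of y on x."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_b1 = sqrt(sse / (n - 2) / sxx)
    return b0, b1, s_b1, b1 / s_b1


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a = 10  # hypothetical change of units

base = fit_with_t(x, y)
y_scaled = fit_with_t(x, [a * yi for yi in y])  # b0, b1, s_b1 multiplied by a
x_scaled = fit_with_t([a * xi for xi in x], y)  # b1, s_b1 divided by a; b0 unchanged
print(base[3], y_scaled[3], x_scaled[3])  # t is the same in all three fits
```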

what are outliers?

Outliers are observations whose $Y$ values differ substantially from their predicted values, $\hat{y}_i$

what are extreme points?

Extreme points are observations whose $X$ values differ substantially from the $X$ values of the other observations

what is the confidence interval for the mean value of $Y$, given a particular value $x_{n+1}$ of $X$?

$\hat{y}_{n+1} \pm t_{n-2, \alpha/2} \, s_e \sqrt{\dfrac{1}{n} + \dfrac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$, where $\hat{y}_{n+1} = b_0 + b_1 x_{n+1}$

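how does the width of the interval vary with the distance of $x_{n+1}$ from the mean $\bar{x}$?

The interval gets wider as $x_{n+1}$ moves farther from $\bar{x}$, because the term $(x_{n+1} - \bar{x})^2$ grows. It is narrowest at $x_{n+1} = \bar{x}$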

what is the prediction interval for an individual value $Y_{n+1}$, given a particular value $x_{n+1}$ of $X$?

$\hat{y}_{n+1} \pm t_{n-2, \alpha/2} \, s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
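The Python sketch below computes both intervals at a chosen $x_{n+1}$. The helper name is hypothetical and the data are made up. As before, the caller supplies $t_{n-2,\alpha/2}$ from a $t$ table (here $t_{3,0.025} \approx 3.182$). The sketch shows two things: the prediction interval is always wider than the confidence interval for the mean, and both widen as $x_{n+1}$ moves away from $\bar{x}$.

```python
from math import sqrt
from statistics import mean


def intervals_at(x, y, x_new, t_crit):
    """Return (y_hat, CI for the mean of Y, prediction interval for Y) at x_new."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    s_e = sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

    y_hat = b0 + b1 * x_new
    h = 1 / n + (x_new - xbar) ** 2 / sxx  # term shared by both intervals
    ci_half = t_crit * s_e * sqrt(h)       # mean value of Y
    pi_half = t_crit * s_e * sqrt(1 + h)   # individual value of Y (the +1 term)
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
for x_new in (3, 5, 8):
    y_hat, ci, pi = intervals_at(x, y, x_new, 3.182)
    print(x_new, round(ci[1] - ci[0], 3), round(pi[1] - pi[0], 3))
```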

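what is the point of the +1 term in the prediction interval for $Y_{n+1}$?

It widens the interval to reflect the extra uncertainty of predicting an individual value of $Y$ rather than its mean. An individual value varies around the mean with variance $\sigma^2$, which is estimated by $s_e^2$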

(34 cards)