statistics - topic 7 - bivariate regression Flashcards

1
Q

what is regression analysis used to do?

A

Explain the impact of changes in an independent variable on a dependent variable
Predict the value of a dependent variable based on the value of at least one independent variable

2
Q

what is the dependent variable?

A

The dependent variable is the variable we wish to explain (also called the endogenous variable)

3
Q

what is the independent variable?

A

The independent variable is the variable used to explain the dependent variable (also called the exogenous or explanatory variable)

4
Q

what does the population regression model show?

A

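it shows the assumed true linear relationship between the dependent variable $Y$ and the independent variable $X$ in the whole population, with a random error term capturing everything else that affects $Y$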

5
Q

what are the components of a population regression model?

A

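$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $Y_i$ is the dependent variable, $\beta_0$ is the population intercept, $\beta_1$ is the population slope coefficient, $X_i$ is the independent variable and $\varepsilon_i$ is the random error term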

6
Q

what is the sample data used for?

A

Sample data is used to provide an estimate of the population regression model

7
Q

what assumptions are required for least squares estimation to give accurate estimates?

A

The true relationship is linear ($Y$ is a linear function of $X$, plus a random error)
The error term, $\varepsilon_i$, is uncorrelated with the random variable $X$
The error term, $\varepsilon_i$, has a mean of 0 and constant variance $\sigma^2$ (the latter property is called homoscedasticity):
$E[\varepsilon_i] = 0$ and $E[\varepsilon_i^2] = \sigma^2$ for $i = 1, \ldots, n$
The error terms, $\varepsilon_i$, are not correlated with one another, so that:
$E[\varepsilon_i \varepsilon_j] = 0$ for all $i \neq j$
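A minimal Python sketch (standard library only) that simulates data from the population model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ so that these assumptions hold by construction. The parameter values ($\beta_0 = 2$, $\beta_1 = 0.5$, $\sigma = 1$), the sample size and the seed are arbitrary illustrative choices, and drawing the errors from a normal distribution is an extra assumption that the list above does not require.

```python
import random
import statistics

# Hypothetical population parameters, chosen only for illustration
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0
N = 10_000

rng = random.Random(42)  # fixed seed so the run is reproducible

# X values drawn independently of the errors
x = [rng.uniform(0, 10) for _ in range(N)]
# Errors: independent draws with mean 0 and the same variance SIGMA**2 (homoscedastic)
eps = [rng.gauss(0.0, SIGMA) for _ in range(N)]
# Linear population model: Y = beta0 + beta1 * X + epsilon
y = [BETA0 + BETA1 * xi + ei for xi, ei in zip(x, eps)]


def corr(a, b):
    """Sample correlation coefficient computed from deviations about the means."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5


print(round(statistics.fmean(eps), 3))     # close to 0: E[eps_i] = 0
print(round(statistics.variance(eps), 3))  # close to SIGMA**2 = 1
print(round(corr(x, eps), 3))              # close to 0: errors uncorrelated with X
print(round(corr(eps[:-1], eps[1:]), 3))   # close to 0: errors uncorrelated with each other
```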

8
Q

what is the least squares method?

A

Least squares provides estimates of $\beta_0$ and $\beta_1$ by finding the values of $b_0$ and $b_1$ that minimize the sum of the squared errors (SSE):
$\min SSE = \min \sum_{i=1}^{n} e_i^2 = \min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \min \sum_{i=1}^{n} [y_i - (b_0 + b_1 x_i)]^2$
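To make the minimization concrete, here is a small Python sketch that evaluates SSE over a coarse grid of candidate $(b_0, b_1)$ pairs and keeps the smallest. The five data points are hypothetical toy data, and a grid search is only an illustration: in practice the closed-form least squares formulas give the minimizer directly.

```python
# Hypothetical toy data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def sse(b0, b1, x, y):
    """Sum of squared errors: sum of (y_i - (b0 + b1 * x_i))**2."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Brute-force search: b0 in 0.0..4.0 (step 0.1), b1 in 0.0..1.5 (step 0.05)
best = min(
    ((i / 10, j / 20) for i in range(0, 41) for j in range(0, 31)),
    key=lambda b: sse(b[0], b[1], x, y),
)
print(best, round(sse(*best, x, y), 4))  # (2.2, 0.6) 2.4
```

For this toy data the grid happens to contain the exact least squares solution; with other data a grid would only approximate it.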

9
Q

what is the formula for b1, the least squares slope coefficient estimator?

A

$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \dfrac{Cov(x, y)}{s_x^2} = r \dfrac{s_y}{s_x}$
where $r = Corr(x, y)$
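A quick numerical check, in Python on hypothetical toy data, that the three expressions for $b_1$ agree. The sample covariance and the sample variance both use the $n - 1$ divisor, so that divisor cancels in the ratio.

```python
import math

# Hypothetical toy data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

# Form 1: ratio of sums of cross-deviations and squared deviations
b1_sums = sxy / sxx
# Form 2: sample covariance over sample variance of x
cov_xy = sxy / (n - 1)
s2_x = sxx / (n - 1)
b1_cov = cov_xy / s2_x
# Form 3: correlation times the ratio of sample standard deviations
r = sxy / math.sqrt(sxx * syy)
s_x, s_y = math.sqrt(s2_x), math.sqrt(syy / (n - 1))
b1_corr = r * s_y / s_x

print(b1_sums, round(b1_cov, 10), round(b1_corr, 10))  # 0.6 0.6 0.6
```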

10
Q

how do you find b0 and the regression line once b1 has been estimated?

A

$b_0 = \bar{y} - b_1 \bar{x}$, because the regression line goes through the sample means $(\bar{x}, \bar{y})$
The fitted regression line is then $\hat{y}_i = b_0 + b_1 x_i$
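A minimal sketch of the intercept and fitted line, again on hypothetical toy data; `predict` is an illustrative helper name, not part of any library. Evaluating the line at $\bar{x}$ returns $\bar{y}$, which is exactly the property used to derive $b_0$.

```python
# Hypothetical toy data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar  # intercept chosen so the line passes through (x_bar, y_bar)


def predict(x_new):
    """Fitted regression line: y_hat = b0 + b1 * x_new."""
    return b0 + b1 * x_new


print(round(b0, 10), round(b1, 10))        # 2.2 0.6
print(round(predict(x_bar), 10) == y_bar)  # True: the line passes through the means
```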

11
Q

what are the two parts of the variation in the dependent variable?

A

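the total sum of squares (SST) is equal to the regression sum of squares (SSR, the variation explained by the regression) plus the error sum of squares (SSE, the unexplained variation): $SST = SSR + SSE$, i.e. $\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$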
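The decomposition can be checked numerically. This Python sketch (hypothetical toy data, standard library only) fits the least squares line and computes SST, SSR and SSE from the observed and fitted values.

```python
# Hypothetical toy data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]  # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by the regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained (residual)

print(round(sst, 10), round(ssr, 10), round(sse, 10))  # 6.0 3.6 2.4
```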

12
Q

what is the formula for the regression sum of the squares?

A

$SSR = \sum (\hat{y}_i - \bar{y})^2$, where $\hat{y}_i$ = predicted value of the dependent variable given $X = x_i$ and $\bar{y}$ = sample mean of the dependent variable

13
Q

what is the formula for the error sum of the squares?

A

$SSE = \sum (y_i - \hat{y}_i)^2$, where $y_i$ = observed value of the dependent variable and $\hat{y}_i$ = predicted value of the dependent variable given $X = x_i$

14
Q

what is the coefficient of determination?

A

it is the proportion of the total variation in the dependent variable that is explained by variation in the independent variable

15
Q

what is the formula for the coefficient of determination?

A

$R^2 = \dfrac{SSR}{SST} = \dfrac{\text{regression sum of squares}}{\text{total sum of squares}}$

16
Q

in what range does the coefficient of determination lie, and what is it equal to?

A

$0 \le R^2 \le 1$, and in bivariate regression $R^2 = r^2$, where $r$ denotes the sample correlation coefficient between $x$ and $y$
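A short Python check on hypothetical toy data that $SSR/SST$ and the squared sample correlation give the same $R^2$ when there is a single independent variable.

```python
import math

# Hypothetical toy data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)

r_squared = ssr / sst               # R^2 from the sum-of-squares decomposition
r = sxy / math.sqrt(sxx * sst)      # sample correlation coefficient (sst equals sum of (y_i - y_bar)^2)
print(round(r_squared, 10), round(r ** 2, 10))  # 0.6 0.6
```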

17
Q

what does it mean when R^2 is equal to 1?

A

When $R^2 = 1$, there is a perfect linear relationship between $X$ and $Y$: 100% of the variation in $Y$ is explained by variation in $X$

18
Q

what does it mean when R^2 is equal to 0?

A

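When $R^2 = 0$, there is no linear relationship between $X$ and $Y$: the fitted slope is 0, so the regression line explains none of the variation in $Y$ and the predicted value of $Y$ does not depend on $X$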

19
Q

what is the standard deviation of the residuals e_i, and why is it computed this way?

A

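The standard deviation of $e_i$ (also called the standard error of the estimate) is:
$s_e = \sqrt{\dfrac{\sum e_i^2}{n - 2}} = \sqrt{\dfrac{SSE}{n - 2}}$
Division is by $n - 2$ instead of $n - 1$ because the estimated regression model contains two estimated coefficients, $b_0$ and $b_1$, and each coefficient estimated from the data uses up one degree of freedom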
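A sketch of $s_e$ on hypothetical toy data with $n = 5$, so the divisor is $n - 2 = 3$. The residuals of a least squares fit that includes an intercept sum to zero, which serves as a sanity check.

```python
import math

# Hypothetical toy data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # e_i = y_i - y_hat_i
sse = sum(e ** 2 for e in residuals)
s_e = math.sqrt(sse / (n - 2))  # n - 2: two coefficients (b0, b1) were estimated
print(round(sse, 4), round(s_e, 4))  # 2.4 0.8944
```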

20
Q

what is the standard deviation of b1?

A

The standard deviation of $b_1$ is:
$s_{b_1} = \sqrt{\dfrac{s_e^2}{\sum (x_i - \bar{x})^2}}$
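Continuing with the same hypothetical toy data, a sketch that computes $s_{b_1}$ from $s_e$ and the spread of the $x$ values. A larger $\sum (x_i - \bar{x})^2$ (more spread in $x$) gives a smaller $s_{b_1}$.

```python
import math

# Hypothetical toy data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))

s_b1 = math.sqrt(s_e ** 2 / sxx)  # estimated standard deviation of the slope estimate
print(round(s_b1, 4))  # 0.2828
```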

21
Q

what does the standard deviation of b1 show?

A

It is a measure of variation in the slope of regression lines from different samples

22
Q

what is the test statistic for a hypothesis test of a slope?

A

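$t = \dfrac{b_1}{s_{b_1}}$ for testing $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$; under $H_0$ it follows a Student's $t$ distribution with $n - 2$ degrees of freedom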

23
Q

what is the decision rule for the hypothesis test of a slope?

A

Reject $H_0$ if $t < -t_{n-2, \alpha/2}$ or $t > t_{n-2, \alpha/2}$
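A sketch of the two-tailed slope test for $H_0: \beta_1 = 0$ on hypothetical toy data. The critical value $t_{3, 0.025} = 3.182$ is hard-coded from a standard t table to keep the example dependency-free; if SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, n - 2)` returns it instead.

```python
import math

# Hypothetical toy data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
s_e = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
s_b1 = math.sqrt(s_e ** 2 / sxx)

t_stat = b1 / s_b1  # test statistic for H0: beta1 = 0
T_CRIT = 3.182      # t_{n-2, alpha/2} for n - 2 = 3 df and alpha = 0.05 (t table value)
reject_h0 = t_stat < -T_CRIT or t_stat > T_CRIT
print(round(t_stat, 4), reject_h0)  # 2.1213 False
```

With only five observations the test does not reject $H_0$ at the 5% level, even though $b_1 = 0.6$.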

24
Q

what is the formula for the confidence interval of a slope?

A

$b_1 \pm t_{n-2, \alpha/2} \, s_{b_1}$
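The corresponding 95% confidence interval for the slope, as a sketch on hypothetical toy data with the table value $t_{3, 0.025} = 3.182$ hard-coded.

```python
import math

# Hypothetical toy data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
s_e = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
s_b1 = math.sqrt(s_e ** 2 / sxx)

T_CRIT = 3.182  # t_{n-2, alpha/2} for 3 df and alpha = 0.05 (t table value)
lower, upper = b1 - T_CRIT * s_b1, b1 + T_CRIT * s_b1
print(round(lower, 3), round(upper, 3))  # -0.3 1.5
```

The interval contains 0, which agrees with the two-tailed test at the same $\alpha$ failing to reject $H_0: \beta_1 = 0$.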

25
Q

what will be affected by the units of measurement used?

A

The units used to measure the dependent and independent variables will affect $b_0$, $b_1$, $s_{b_0}$, $s_{b_1}$ and the upper and lower confidence limits

26
Q

if you multiply the dependent variable by a constant a, what is the effect on the statistics?

A

$b_0$, $b_1$, $s_{b_0}$, $s_{b_1}$ and the confidence limits are all multiplied by $a$ (for $a > 0$)

27
Q

if you multiply the independent variable by a constant a, what is the effect on the statistics?

A

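$b_1$, $s_{b_1}$ and the confidence limits for the slope are divided by $a$ (for $a > 0$), while $b_0$ and $s_{b_0}$ are unchanged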

28
Q

what will not be affected by the units of measurement?

A

the test statistic and the p value will be unaffected
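A Python sketch that refits the line after rescaling each variable by a hypothetical factor $a = 100$ (for example a change from metres to centimetres). `fit` is an illustrative helper and the toy data are made up; it returns $b_0$, $b_1$, $s_{b_1}$ and the $t$ statistic for each version.

```python
import math


def fit(x, y):
    """Return (b0, b1, s_b1, t) for a least squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b0, b1, s_b1, b1 / s_b1


# Hypothetical toy data and rescaling factor, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a = 100

base = fit(x, y)
y_scaled = fit(x, [a * yi for yi in y])  # dependent variable multiplied by a
x_scaled = fit([a * xi for xi in x], y)  # independent variable multiplied by a

for label, res in [("original", base), ("Y*a", y_scaled), ("X*a", x_scaled)]:
    print(label, [round(v, 4) for v in res])
```

Multiplying $Y$ by 100 multiplies $b_0$, $b_1$ and $s_{b_1}$ by 100; multiplying $X$ by 100 leaves $b_0$ unchanged and divides $b_1$ and $s_{b_1}$ by 100; the $t$ statistic is identical in all three fits.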

29
Q

what are outliers?

A

Outliers are observations whose $Y$ values differ substantially from their predicted values, $\hat{y}_i$

30
Q

what are extreme points?

A

Extreme points are observations that have 𝑋 values that differ substantially from the 𝑋 values of other observations

31
Q

what is the confidence interval for the mean value of Y, given a particular value of X?

A

$\hat{y}_{n+1} \pm t_{n-2, \alpha/2} \, s_e \sqrt{\dfrac{1}{n} + \dfrac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$, where $\hat{y}_{n+1} = b_0 + b_1 x_{n+1}$
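A sketch of the confidence interval for the mean of $Y$ at a new value $x_{n+1} = 4$, using hypothetical toy data and the hard-coded table value $t_{3, 0.025} = 3.182$ (95% level); `mean_response_ci` is an illustrative name.

```python
import math

# Hypothetical toy data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
s_e = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
T_CRIT = 3.182  # t_{n-2, alpha/2} for 3 df and alpha = 0.05 (t table value)


def mean_response_ci(x_new):
    """Confidence interval for the mean value of Y when X = x_new."""
    y_hat = b0 + b1 * x_new
    half = T_CRIT * s_e * math.sqrt(1 / n + (x_new - x_bar) ** 2 / sxx)
    return y_hat - half, y_hat + half


lower, upper = mean_response_ci(4)
print(round(lower, 2), round(upper, 2))  # 3.04 6.16
```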

32
Q

how does the width of the interval vary with the distance of $x_{n+1}$ from the mean $\bar{x}$?

A

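the interval gets wider the further $x_{n+1}$ is from $\bar{x}$, because the term $(x_{n+1} - \bar{x})^2$ grows; it is narrowest at $x_{n+1} = \bar{x}$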

33
Q

what is the prediction interval for an individual value $Y_{n+1}$, given a particular value of X?

A

$\hat{y}_{n+1} \pm t_{n-2, \alpha/2} \, s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$

34
Q

what is the point of the +1 term in the prediction interval for $Y_{n+1}$?

A

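it widens the interval to reflect the added uncertainty of predicting an individual case: an individual value varies around the mean by its own random error $\varepsilon_{n+1}$, whose variance $\sigma^2$ (estimated by $s_e^2$) adds to the uncertainty about the mean, so the prediction interval is always wider than the confidence interval for the mean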
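A final sketch on hypothetical toy data compares the half-widths of the confidence interval for the mean and the prediction interval for an individual value as $x_{n+1}$ moves away from $\bar{x} = 3$. The table value $t_{3, 0.025} = 3.182$ is hard-coded, and `half_widths` is an illustrative helper name.

```python
import math

# Hypothetical toy data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
s_e = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
T_CRIT = 3.182  # t_{3, 0.025} from a t table (assumed 95% level)


def half_widths(x_new):
    """Half-widths of the CI for the mean of Y and the PI for an individual Y at x_new."""
    core = 1 / n + (x_new - x_bar) ** 2 / sxx
    ci = T_CRIT * s_e * math.sqrt(core)      # confidence interval for the mean of Y
    pi = T_CRIT * s_e * math.sqrt(1 + core)  # prediction interval for a single Y_{n+1}
    return ci, pi


widths = {x_new: half_widths(x_new) for x_new in [3, 4, 5, 6]}
for x_new, (ci, pi) in widths.items():
    print(x_new, round(ci, 3), round(pi, 3))
```

Both widths grow with $|x_{n+1} - \bar{x}|$, and the prediction interval is always the wider of the two because of the extra 1 under the square root.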