Chapter 16: Simple Linear Regression And Correlation Flashcards

1
Q

Regression analysis

A

A technique used to predict the value of one variable on the basis of other variables

Requires developing an equation that describes the relationship between the variable to be forecast (dependent variable) and the variables the practitioner believes it is related to (independent variables)

2
Q

Correlation analysis

A

Technique used to determine if a relationship exists between two variables

3
Q

Deterministic models

A

Equations that allow us to determine the value of the dependent variable from the values of the independent variables

4
Q

Probabilistic model

A

Models that include a method to represent the randomness of real-life processes

Start with a deterministic model, then add a term that measures the random error of the deterministic component

5
Q

Error variable

A

Represented by epsilon (written e)

Difference between an actual data point and the value produced by the deterministic part of the model

Accounts for all variables (measurable and unmeasurable) that are not part of the model

6
Q

First-order linear model

A

Aka simple linear regression model
Aka straight-line model

Includes only one independent variable

y = B0 + B1x + e

y = dependent variable
x = independent variable
B0 = y-intercept
B1 = slope of the line (rise/run)
e = error variable

(So y = mx + b plus the error variable, with m = B1 and b = B0)

x and y must both be interval data

Coefficients B0 and B1 are population parameters (almost always unknown, so they must be estimated)

7
Q

Least squares line coefficients

A

For y-hat = b0 + b1x

b1= sample covariance of x and y / sample variance of x

b0= sample mean of y - (b1* sample mean of x)
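A minimal Python sketch of the two coefficient formulas, using only the standard library. The function name `least_squares` and the five-point data set are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
def least_squares(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance of x and y, and sample variance of x (both divide by n - 1)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    s2_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    b1 = s_xy / s2_x           # slope
    b0 = mean_y - b1 * mean_x  # intercept
    return b0, b1


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(round(b0, 4), round(b1, 4))  # 2.2 0.6
```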

8
Q

Sample variance

A

s^2 = sum over all values of (x - mean of x)^2 / (n - 1)

Shortcut: s^2 = (1/(n - 1)) * (sum of all values of x^2 - (sum of all values of x)^2 / n)

Excel: VAR function
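A small Python check (standard library only) that the definition and the shortcut give the same sample variance, cross-checked against `statistics.variance`, which also divides by n - 1. The function names and data are hypothetical.

```python
import statistics


def var_definition(x):
    """Sample variance: sum of (x - mean)^2 divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    return sum((xi - mean_x) ** 2 for xi in x) / (n - 1)


def var_shortcut(x):
    """Shortcut: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(x)
    return (sum(xi * xi for xi in x) - sum(x) ** 2 / n) / (n - 1)


x = [1, 2, 3, 4, 5]
print(var_definition(x), var_shortcut(x), statistics.variance(x))  # 2.5 2.5 2.5
```

The shortcut is convenient by hand, but in floating point it can lose precision when the values are large relative to their spread, so the definitional form is usually safer in code.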

9
Q

Sample covariance

A

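s_xy = sum over all pairs of ((x - mean of x) * (y - mean of y)) / (n - 1)

Shortcut: s_xy = (1/(n - 1)) * (sum of all values of xy - (sum of all values of x)(sum of all values of y) / n)

Excel: COVARIANCE.S function (the older COVAR function divides by n, giving the population covariance)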
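The same comparison for the sample covariance, as a standard-library Python sketch with hypothetical data and function names. The last line shows the population version, which divides by n rather than n - 1; that is what Excel's COVARIANCE.P (and the older COVAR) returns.

```python
def cov_definition(x, y):
    """Sample covariance: sum of (x - mean x)(y - mean y) divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


def cov_shortcut(x, y):
    """Shortcut: (sum of xy - (sum of x)(sum of y) / n) / (n - 1)."""
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return (sum_xy - sum(x) * sum(y) / n) / (n - 1)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
s_xy = cov_definition(x, y)
print(s_xy, cov_shortcut(x, y))  # 1.5 1.5
# Population covariance divides by n instead of n - 1
print(s_xy * (len(x) - 1) / len(x))  # 1.2
```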

10
Q

Least squares method

A

Produces a straight line that minimizes the sum of the squared differences between the actual points and the line

11
Q

Residuals

A

The deviations between the actual data points and the least squares line (ei)

ei= y(actual) - y-hat (calculated)

Observations of the error variable

12
Q

Sum of squares for error

A

Minimized sum of squared deviations between observed y and calculated y

SSE
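A short Python sketch (standard library, hypothetical data) that fits the least squares line, then computes the residuals e_i = y_i - y-hat_i and their sum of squares, SSE.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least squares coefficients (the n - 1 factors in s_xy / s_x^2 cancel)
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
b0 = mean_y - b1 * mean_x

y_hat = [b0 + b1 * xi for xi in x]                 # calculated values on the line
residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # e_i = y_i - y-hat_i
sse = sum(e ** 2 for e in residuals)               # sum of squares for error

print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(round(sse, 4))                     # 2.4
```

For a line fitted with an intercept, the residuals sum to zero (up to rounding).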

13
Q

Regression analysis in Excel

A

Type the x and y data into two columns (no missing data allowed)

Go to Data > Data Analysis > Regression

Enter the y range and the x range

Intercept coefficient is b0 (intercept)

X data coefficient is b1 (slope)

14
Q

Inferences from least squares line

A

The coefficients describe only the sample data; they cannot yet be used as inferences about the population parameters

The intercept is not necessarily the value of y when x = 0; it is an estimate based on the rest of the data. More generally, values of y cannot be reliably determined for values of x outside the range of the sample values

15
Q

Required conditions for the error variable

A

1) The probability distribution of e is normal
2) The mean of the distribution is 0; that is, E(e) = 0
3) The standard deviation of e is sigma e, which is constant regardless of the value of x

Conditions 1-3 imply that for each value of x, y is a normally distributed random variable whose mean is E(y) = B0 + B1x and whose standard deviation is sigma e

4) The value of e associated with any particular value of y is independent of the e associated with any other value of y

16
Q

Methods to assess the regression model

A
  • Standard error of estimate
  • t test of slope
  • Coefficient of determination

All are based on the sum of squares for error (SSE)

17
Q

Sum of squares for error

A

SSE: minimized sum of squared deviations (between the data points and the line defined by the coefficients)

18
Q

Shortcut calculation of SSE

A

SSE = (n - 1)(sample variance of y - (sample covariance of x and y)^2 / sample variance of x)
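A Python sketch of the shortcut (standard library, hypothetical data and function name). For the data below, computing the residuals directly also gives SSE = 2.4, so the two routes agree.

```python
def sse_shortcut(x, y):
    """SSE = (n - 1) * (s_y^2 - s_xy^2 / s_x^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s2_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    s2_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    return (n - 1) * (s2_y - s_xy ** 2 / s2_x)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(sse_shortcut(x, y), 4))  # 2.4
```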

19
Q

Standard error of estimate

A

The standard deviation of the error variable determines fit: if it is large, the fit is poor; if it is small, the fit is good

sigma e is unknown, so it must be estimated from the sample by the standard error of estimate, se

se = square root of (SSE / (n - 2))

Reported as Standard Error in the Excel regression statistics

Judge whether se is small or large by comparing it to the sample mean of the dependent variable; if se is small relative to that mean, the fit is relatively good.

Very useful for comparing models. Not useful as an absolute measure
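A minimal sketch of se in Python. The inputs (SSE = 2.4, n = 5, mean of y = 4.0) are hypothetical; dividing se by the mean of y is one way to express the "compare it to the mean of y" rule of thumb as a ratio.

```python
import math


def standard_error_of_estimate(sse, n):
    """se = sqrt(SSE / (n - 2)); n - 2 because two coefficients were estimated."""
    return math.sqrt(sse / (n - 2))


# Hypothetical values: SSE = 2.4 from n = 5 observations whose y values average 4.0
se = standard_error_of_estimate(2.4, 5)
mean_y = 4.0
print(round(se, 4))           # 0.8944
print(round(se / mean_y, 4))  # 0.2236, se relative to the mean of y
```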

20
Q

Testing the slope

A

A horizontal line (slope = 0) implies the lack of a linear relationship (B1 = population slope)

The test of the slope is a hypothesis test where:
H0: B1 = 0 (i.e., no linear relationship)

H1: B1 ≠ 0 (two-tail test)

21
Q

Test statistic for b1

A

t = (b1 - B1) / s_b1 (sample slope minus hypothesized population slope, divided by the standard error of the sample slope)

(Standard error of the sample slope: s_b1 = se / square root of ((n - 1) * sample variance of x))

Degrees of freedom: v = n - 2
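A standard-library Python sketch of the slope test: it fits the line, computes se and the standard error of the sample slope, then forms t for H0: B1 = 0. The function name `slope_t_test` and the data are hypothetical; the critical value still has to come from a t table or statistics software, since Python's standard library has no t distribution.

```python
import math


def slope_t_test(x, y, beta1_h0=0.0):
    """Return (b1, s_b1, t, v) for testing H0: B1 = beta1_h0."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s2_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    b1 = s_xy / s2_x
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2))          # standard error of estimate
    s_b1 = se / math.sqrt((n - 1) * s2_x)  # standard error of the sample slope
    t = (b1 - beta1_h0) / s_b1
    return b1, s_b1, t, n - 2


b1, s_b1, t, v = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(t, 4), v)  # 2.1213 3
```

With v = 3, the two-tail critical value at a = 0.05 is about 3.182, so t = 2.1213 would not reject H0 for this tiny sample.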

22
Q

Confidence interval estimator of the population slope (B1)

A

b1 ± t(a/2) * s_b1 (sample slope plus or minus the t critical value times the standard error of the sample slope)

v = n - 2
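A sketch of the interval in Python. Because the standard library has no t distribution, the critical value is passed in; the 3.182 below is the table value for t(0.025) with v = 3, and the other inputs are hypothetical.

```python
import math


def slope_confidence_interval(b1, s_b1, t_crit):
    """b1 +/- t(a/2) * s_b1; t_crit is looked up for v = n - 2 degrees of freedom."""
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin


# Hypothetical inputs: b1 = 0.6 and s_b1 = sqrt(0.08) from a sample of n = 5,
# and t(0.025) with v = 3 taken from a t table as 3.182 (95% confidence)
low, high = slope_confidence_interval(0.6, math.sqrt(0.08), 3.182)
print(round(low, 3), round(high, 3))  # -0.3 1.5
```

An interval that contains 0, as this one does, corresponds to a two-tail test at the same a that does not reject H0: B1 = 0.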

23
Q

One-tail tests

A

One-tail tests can be used to test whether there is a positive or negative linear relationship between the variables

H1: B1 < 0 looks for a negative linear relationship

H1: B1 > 0 looks for a positive linear relationship

Same test statistic; halve the two-tail p-value, provided the sign of t matches the direction stated in H1
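A small Python helper showing the halving rule, including the case the rule leaves out: when t lies on the opposite side from H1, the one-tail p-value is 1 minus half the two-tail value. The function name and the input numbers are hypothetical.

```python
def one_tail_p_value(two_tail_p, t, alternative):
    """Convert a two-tail p-value into a one-tail p-value.

    alternative: "greater" for H1: B1 > 0, "less" for H1: B1 < 0.
    Halving is only valid when t falls on the side H1 predicts.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    on_predicted_side = t > 0 if alternative == "greater" else t < 0
    return two_tail_p / 2 if on_predicted_side else 1 - two_tail_p / 2


# Hypothetical regression output: t = 2.0 with a two-tail p-value of 0.10
print(round(one_tail_p_value(0.10, 2.0, "greater"), 3))  # 0.05
print(round(one_tail_p_value(0.10, 2.0, "less"), 3))     # 0.95
```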

24
Q

Coefficient of determination

A

Measures the strength of the linear relationship between the variables (the proportion of the variation in the dependent variable that is explained by variation in the independent variable)

R^2 = (s_xy)^2 / (s_x^2 * s_y^2)

(Square of the sample covariance of x and y, divided by the product of the sample variances of x and y)

Or
R^2 = 1 - SSE / (sum over all values of (y - mean of y)^2)

Essentially explained variation / total variation in y

R Square value in the Excel regression output

The higher the value of R^2, the better the model fits the data
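A Python sketch (standard library, hypothetical data and function name) computing R^2 with both formulas to show they agree.

```python
def r_squared_both_ways(x, y):
    """Compute R^2 as s_xy^2 / (s_x^2 * s_y^2) and as 1 - SSE / SST."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s2_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    s2_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    via_covariance = s_xy ** 2 / (s2_x * s2_y)

    b1 = s_xy / s2_x
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)  # total variation in y
    via_sse = 1 - sse / sst
    return via_covariance, via_sse


a, b = r_squared_both_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 4), round(b, 4))  # 0.6 0.6
```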

25
Q

ANOVA table

A

Part of the Excel regression output: the analysis of variance table

Shows the sources of variation in y

Regression = SSR = variation in y explained by x
Error (residual) = SSE = variation in y still unexplained

SS = sum of squares
MS = mean square (SS/df)

F statistic = MSR/MSE (mean square for regression / mean square for error)
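A standard-library Python sketch that builds the ANOVA quantities for a simple regression; the data and function name are hypothetical. Regression has 1 degree of freedom (one independent variable) and error has n - 2.

```python
def anova_table(x, y):
    """ANOVA quantities for simple linear regression (one independent variable)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    b0 = mean_y - b1 * mean_x
    sst = sum((yi - mean_y) ** 2 for yi in y)                      # total, df = n - 1
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # error, df = n - 2
    ssr = sst - sse                                                # regression, df = 1
    msr = ssr / 1
    mse = sse / (n - 2)
    return {"SSR": ssr, "SSE": sse, "MSR": msr, "MSE": mse, "F": msr / mse}


table = anova_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print({name: round(value, 4) for name, value in table.items()})
# {'SSR': 3.6, 'SSE': 2.4, 'MSR': 3.6, 'MSE': 0.8, 'F': 4.5}
```

With one independent variable, F equals the square of the t statistic for H0: B1 = 0, so the F test and the two-tail t test give the same conclusion.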

26
Q

Cause and effect relationship

A

Remember: a correlation between x and y does not necessarily mean that x determines y. An unknown factor could be determining both, and statistics alone cannot tell the difference. A reasonable theoretical relationship is needed.

27
Q

Sample coefficient of correlation

A

r = s_xy / (s_x * s_y)

Sample coefficient of correlation = sample covariance / (sample standard deviation of x * sample standard deviation of y)

Determines whether there is a linear relationship between two variables

Use for observational data where the two variables are bivariate normally distributed
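A Python sketch of r using sample standard deviations (the n - 1 factors cancel, so population formulas would give the same value). Data and function name are hypothetical.

```python
import math


def sample_correlation(x, y):
    """r = s_xy / (s_x * s_y), using sample (n - 1) statistics throughout."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)


r = sample_correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4), round(r ** 2, 4))  # 0.7746 0.6
```

Squaring r gives the coefficient of determination, R^2, for the same data.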

28
Q

Test statistic for testing that p (population coefficient of correlation) = 0

A

t = r * square root of ((n - 2) / (1 - r^2))

v = n - 2

Provided the variables are bivariate normally distributed

One-tail tests can also check for p < 0 or p > 0
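A minimal Python version of the test statistic. The inputs are hypothetical; as with the slope test, the p-value or critical value has to come from a t table or a statistics package.

```python
import math


def correlation_t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)) with v = n - 2 degrees of freedom."""
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2)), n - 2


# Hypothetical inputs: r = sqrt(0.6) (about 0.7746) from n = 5 pairs
t, v = correlation_t_statistic(math.sqrt(0.6), 5)
print(round(t, 4), v)  # 2.1213 3
```

In simple linear regression this t equals the t statistic for H0: B1 = 0 on the same data, so the two tests agree.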

29
Q

Violation of required condition

A

When the normality requirement is unsatisfied, we can use the Spearman rank correlation coefficient (a nonparametric technique) in place of the t-test of p
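A standard-library Python sketch of the Spearman rank correlation, computed here as the ordinary correlation coefficient applied to the ranks, with tied values given the average of the ranks they occupy. Function names and data are hypothetical, and the sketch does not include the test of significance.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Ordinary (Pearson) correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


print(average_ranks([2, 4, 5, 4, 5]))                       # [1.0, 2.5, 4.5, 2.5, 4.5]
print(round(spearman([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7379
```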