Chapter 16: Simple Linear Regression And Correlation Flashcards
Regression analysis
A technique used to predict the value of one variable on the basis of other variables
Requires developing an equation that describes the relationship between the variable to be forecast (dependent variable) and the variables the practitioner believes are related to it (independent variables)
Correlation analysis
Technique used to determine if a relationship exists between two variables
Deterministic models
Equations that allow us to determine the value of the dependent variable from the values of the independent variable
Probabilistic model
Models that include a method to represent the randomness of real-life processes
Starts with a deterministic model and then adds a term to measure the random error of the deterministic component
Error variable
Represented by epsilon
The difference between an actual data point and the point estimated by the model
Accounts for all variables (measurable and immeasurable) that are not part of the model
First-order linear model
Aka simple linear regression model
Aka straight-line model
Includes only one independent variable
Y=B0 + B1x + e
Y= dependent variable x= independent variable B0= y-intercept B1= slope of the line (rise/run) e= error variable
(So y=Mx+B + error variable)
X and y must both be interval data
Coefficients B0 and B1 are population parameters (almost always unknown, so must estimate)
Least squares line coefficients
For y-hat= b0 + b1x
b1= sample covariance of x and y / sample variance of x
b0= sample mean of y - (b1* sample mean of x)
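The two coefficient formulas above can be sketched in Python. This is a minimal illustration; the function name and the sample data used below are made up, not from the text.

```python
def least_squares(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # sample covariance of x and y
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    # sample variance of x
    s_xx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    b1 = s_xy / s_xx           # slope = sample covariance / sample variance of x
    b0 = mean_y - b1 * mean_x  # intercept from the sample means
    return b0, b1
```

For example, `least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])` gives b0 = 2.2 and b1 = 0.6.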
Sample variance
s^2 = (sum of all values of (x - mean x)^2) / (n-1)
Shortcut = (1/(n-1)) * ((sum of all values of x^2) - ((sum of all values of x)^2)/n)
Excel: VAR function
Sample covariance
Sxy = (sum of ((all values of x - mean x) * (all values of y - mean y))) / (n-1)
Shortcut = (1/(n-1)) * ((sum of all values of xy) - ((sum of all values of x * sum of all values of y)/n))
Excel: COVAR function (note: COVAR divides by n, giving the population covariance; multiply by n/(n-1), or use COVARIANCE.S, for the sample covariance)
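A sketch checking that the shortcut formulas agree with the definitional forms (data values below are illustrative):

```python
def sample_variance_shortcut(x):
    """Shortcut: (1/(n-1)) * (sum of x^2 - (sum of x)^2 / n)."""
    n = len(x)
    return (sum(v * v for v in x) - sum(x) ** 2 / n) / (n - 1)

def sample_covariance_shortcut(x, y):
    """Shortcut: (1/(n-1)) * (sum of xy - (sum of x * sum of y) / n)."""
    n = len(x)
    return (sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n) / (n - 1)
```

On x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], the shortcuts give a sample variance of 2.5 and a sample covariance of 1.5, matching the definitional calculations.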
Least squares method
Produces a straight line that minimizes the sum of the squared differences between the actual points and the line
Residuals
The deviations between the actual data points and the least squares line (ei)
ei= y(actual) - y-hat (calculated)
Observations of the error variable
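The residual definition above translates directly to code. A minimal sketch (coefficients and data below are illustrative):

```python
def residuals(x, y, b0, b1):
    """e_i = actual y minus fitted y-hat = y_i - (b0 + b1 * x_i)."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

A property of the least squares line is that its residuals sum to zero; e.g. `residuals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 2.2, 0.6)` gives [-0.8, 0.6, 1.0, -0.6, -0.2].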
Sum of squares for error
Minimized sum of squared deviations between observed y and calculated y
SSE
Regression analysis in excel
Type x and y data into two columns (cannot have missing data)
Go to data, data analysis, regression
Input y range and x range
Intercept coefficient is b0 (intercept)
X data coefficient is b1 (slope)
Inferences from least squares line
Coefficients describe only the sample data; they are not yet ready to be used as inferences about the population parameters
The intercept isn't necessarily the value of y when x = 0, just an estimate based on the rest of the data; in general, values of y can't be reliably determined for a value of x outside the range of the sample values
Required conditions for the error variable
1) probability distribution of e is normal
2) the mean of the distribution is 0; that is E(e)=0
3) the standard deviation of e is sigma e, which is a constant regardless of the value of x
1-3: for each value of x, y is a normally distributed random variable whose mean is E(y) = B0 + B1x and whose standard deviation is sigma e
4) the value of e associated with any particular value of y is independent of e associated with any other value of y
Methods to assess the regression model
- Standard error of estimate
- t test of slope
- coefficient of determination
All based on the sum of squares for error
Sum of squares for error
SSE: minimized sum of squared deviation (between the data points and the line defined by the coefficients)
Shortcut calculation of SSE
SSE = (n-1) * (sample variance of y - ((sample covariance of x and y)^2 / sample variance of x))
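As a sketch, the shortcut can be verified against the direct sum of squared residuals (coefficients and data below are illustrative):

```python
def sse_direct(x, y, b0, b1):
    """SSE = sum of (y_i - y-hat_i)^2."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def sse_shortcut(x, y):
    """SSE = (n-1) * (s_y^2 - s_xy^2 / s_x^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    return (n - 1) * (s_yy - s_xy ** 2 / s_xx)
```

On x = [1, 2, 3, 4, 5], y = [2, 4, 5, 4, 5] with b0 = 2.2, b1 = 0.6, both routes give SSE = 2.4.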
Standard error of estimate
The standard deviation of the errors indicates fit: if it is large the fit is poor, if it is small the fit is good
Must use sample standard deviation to estimate population
Standard deviation of error variable= square root of (SSE/n-2)
Also standard error value in excel regression statistics
Smallness or largeness of se is judged by comparing it to the sample mean of the dependent variable. If se is small relative to that mean, we can say it is relatively small.
Very useful for comparing models. Not useful as an absolute measure
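A minimal sketch of the formula se = sqrt(SSE / (n-2)):

```python
import math

def standard_error_of_estimate(sse, n):
    """s_e = sqrt(SSE / (n - 2)); n - 2 is the degrees of freedom."""
    return math.sqrt(sse / (n - 2))
```

For example, with SSE = 2.4 and n = 5 (illustrative values), se = sqrt(0.8), roughly 0.894.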
Testing the slope
Horizontal line (slope = 0) implies lack of linear relationship (B1 = slope)
Test of the slope is a hypothesis test where:
H0: B1= 0 (aka, no linear relationship)
H1: B1 =/= 0 (two tail test)
Test statistic for b1
t = (sample slope - population slope) / standard error of sample slope
(Standard error of sample slope = standard error of estimate / square root of ((n-1) * sample variance of the independent variable))
v= n-2
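The test statistic above as a sketch (argument names and the example values are illustrative):

```python
import math

def t_stat_slope(b1, s_e, s_xx, n, beta1=0.0):
    """t = (b1 - beta1) / s_b1, where s_b1 = s_e / sqrt((n-1) * s_x^2).

    b1: sample slope; s_e: standard error of estimate;
    s_xx: sample variance of x; beta1: hypothesized slope (0 under H0).
    """
    s_b1 = s_e / math.sqrt((n - 1) * s_xx)
    return (b1 - beta1) / s_b1
```

With b1 = 0.6, s_e = sqrt(0.8), s_x^2 = 2.5 and n = 5, this gives t of about 2.12, compared against the t distribution with n-2 = 3 degrees of freedom.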
Confidence interval estimator of the population slope (B1)
Sample slope (b1) +/- t(a/2) * standard error of sample slope
v=n-2
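A sketch of the interval; the critical value t(a/2) with n-2 degrees of freedom must come from a t table or software, so it is passed in here (the example uses the standard table value t(0.025, 3) = 3.182):

```python
import math

def slope_ci(b1, s_e, s_xx, n, t_crit):
    """b1 +/- t(a/2) * s_b1, with s_b1 = s_e / sqrt((n-1) * s_x^2)."""
    s_b1 = s_e / math.sqrt((n - 1) * s_xx)
    return b1 - t_crit * s_b1, b1 + t_crit * s_b1
```

With b1 = 0.6, s_e = sqrt(0.8), s_x^2 = 2.5, n = 5 (illustrative values), the 95% interval is roughly (-0.30, 1.50).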
One tail tests
One tail tests can be used to test if there is a positive or negative linear relationship between the variables
H1: B1< 0 looks for a negative linear relationship
H1: B1 > 0 looks for a positive linear relationship
Same test statistic; the p-value is the two-tail p-value divided by 2 (provided the sample slope has the sign that H1 predicts)
Coefficient of determination
Measure of the strength of a linear relationship between variables (how much of the variation in the dependent variable can be explained by variation in the independent variable)
R2 = (s xy)^2 / (s^2 x * s^2 y)
((sample covariance of x and y)^2 / (sample variance of x * sample variance of y))
Or
R2 = 1 - (SSE / (sum of all values of (y - mean of y)^2))
Essentially explained variation/ total variation in y
R square value in excel regression analysis
The higher the value of R2 the better the model fits the data
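A sketch computing R2 both ways, which should agree (data and coefficients below are illustrative):

```python
def r_squared_cov(x, y):
    """R2 = s_xy^2 / (s_x^2 * s_y^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    s_xx = sum((a - mx) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - my) ** 2 for b in y) / (n - 1)
    return s_xy ** 2 / (s_xx * s_yy)

def r_squared_sse(x, y, b0, b1):
    """R2 = 1 - SSE / SST = explained variation / total variation in y."""
    n = len(x)
    my = sum(y) / n
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst
```

On x = [1, 2, 3, 4, 5], y = [2, 4, 5, 4, 5] with b0 = 2.2, b1 = 0.6, both give R2 = 0.6.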
ANOVA table
Part of excel regression analysis: analysis of variance table
Shows sources of variation in y
Regression = SSR = variation in y explained by x
Error (residual) = SSE = variation in y still unexplained
SS = sum of squares MS= mean of squares (ss/df)
F statistic = MSR/MSE (mean of squares regression/mean of squares error)
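The F statistic for the simple regression ANOVA table can be sketched from SSE and the total sum of squares (values below are illustrative):

```python
def anova_f(sse, sst, n):
    """F = MSR / MSE for simple regression (regression df = 1, error df = n-2)."""
    ssr = sst - sse      # regression SS: variation in y explained by x
    msr = ssr / 1        # MS = SS / df; regression df = 1
    mse = sse / (n - 2)  # error df = n - 2
    return msr / mse
```

With SSE = 2.4, SST = 6.0, n = 5, F = 3.6 / 0.8 = 4.5; note this equals the square of the slope t statistic (2.1213^2), as expected in simple regression.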
Cause and effect relationship
Remember: correlation between values of x and y is not necessarily x determining y. Could be an unknown factor determining both. Cannot tell from statistics alone. Need a reasonable theoretical relationship
Sample coefficient of correlation
r = sxy / (sx * sy)
Sample coefficient of correlation = sample covariance of x and y / (sample standard deviation of x * sample standard deviation of y)
Determines whether there is a linear relationship between two variables
Use for observational data with two bivariate normally distributed variables
Test statistic for testing that p (population coefficient of correlation) = 0
t= r(square root of ((n-2)/ (1-r^2)))
V= n-2
Provided variables are bivariate normally distributed
Can also do one tail tests to check for p<0 and p> 0
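A sketch computing r and its test statistic together (data below are illustrative):

```python
import math

def correlation_t(x, y):
    """Return (r, t) where r = s_xy / (s_x * s_y) and
    t = r * sqrt((n-2) / (1 - r^2)) with n-2 degrees of freedom."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    r = s_xy / (s_x * s_y)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t
```

On x = [1, 2, 3, 4, 5], y = [2, 4, 5, 4, 5], r is about 0.775 and t is about 2.12, the same t as the slope test, since the two tests are equivalent in simple regression.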
Violation of required condition
When the normality requirement is unsatisfied, we can use the Spearman rank correlation coefficient (a nonparametric technique) in place of the t-test of p