Chapter 14 Flashcards

1
Q

What does simple linear regression use

A
  1. one independent variable and one dependent variable
  2. a straight line to approximate the relationship between them

2
Q

What does multiple regression use

A
  • 2 or more independent variables
3
Q

What are the 2 objectives for simple linear regression

A
  1. establish whether there is a relationship between two variables (e.g., income and spending)
  2. forecast new observations (e.g., sales over the next quarter)
4
Q

What is the dependent variable denoted by

A

y

5
Q

what is the independent variable denoted by

A

x

6
Q

what is the dependent variable

A

the variable being predicted

7
Q

what is the independent variable

A

the variable(s) used to predict the values of the dependent variable

8
Q

What is the formula for the simple linear regression

A

y = B0 + B1x + E

9
Q

What does Y represent in the simple linear regression model

A

the dependent variable

10
Q

what does B0 represent in the simple linear regression model

A

intercept or constant

11
Q

what does B1 represent in the simple linear regression model

A

the slope of the regression line (the coefficient of x)

12
Q

what does E represent in the simple linear regression model

A

error term

13
Q

What does the error term account for in the simple linear regression model

A

accounts for the variability in y that cannot be explained by the linear relationship between x and y

14
Q

What is the simple linear regression equation

A

E(y) = B0 + B1x

15
Q

What does E(y) represent

A

mean or expected value of y for a given value of x

16
Q

What can we note about B0 and B1 in the simple linear regression equation

A

they are population parameters; the equation can be used directly only when their values are known

17
Q

What is the estimated simple linear regression equation

A

y hat = b0 + b1x

18
Q

When do we use the estimated simple linear regression equation

A

when B0 and B1 are NOT known

19
Q

What does y hat represent in the estimated simple linear regression equation

A

point estimate of E(y)

- provides a prediction of an individual value of y for a given value of x

20
Q

What are B0 and B1

A

population parameters

21
Q

what are b0 and b1

A

sample statistics to estimate B0 and B1

22
Q

If we are trying to predict sales for a given level of advertising, what are the dependent and independent variables

A

Dependent variable - sales (y)

Independent variable - advertising expenditures (x)

23
Q

what does “simple” indicate in simple linear regression

A

one independent variable and one dependent variable

24
Q

What does “linear” indicate in simple linear regression

A

the relationship is approximated using a straight line

25
What is B0 in the simple linear regression model
the y-intercept of the regression line or the value of y when x is 0
26
What is B1 in the simple linear regression model
the slope of the regression line - the slope tells us two things: 1. whether the line is increasing or decreasing 2. how steep it is
27
What is E in the simple linear regression model
the error term | - as good as our model might be, there is always a random error term that cannot be accounted for
28
if the line slopes upward, what is the relationship
as x increases, so does y - positive relationship | B1 - will be positive
29
if the line slopes downward, what is the relationship
as x increases, y decreases, negative relationship | B1 - would be negative
30
what if the line is straight across (the regression line is flat)
no relationship, as x increases, y remains the same | B1 is 0
31
What are the POPULATION parameters for the y intercept and the slope
B0 and B1
32
what are the sample statistics used to estimate B0 and B1
b0 and b1
33
what does y hat represent in the simple linear regression
the predicted value of y for a given x value
34
what is the estimated simple linear regression equation
y hat = b0 + b1x
35
What does the Coefficient of Determination tell us about the estimated regression equation
how well does the estimated regression equation fit the data
36
What does the Coefficient of Determination provide us with
a measure of the goodness of fit
37
in Coefficient of determination, what is the ith residual
the difference between the observed value yi of the dependent variable and its predicted value y hat i
38
for the ith observation, the residual is indicated by what
yi - y hat i
39
What is the formula for the coefficient of determination
r squared = SSR/SST
40
what does r squared represent
the coefficient of determination
41
What does SSR stand for in coefficient of determination
sum of squares due to regression
42
what does SST stand for in Coefficient of determination
sum of squares for the total deviation
43
What is the formula for SSR in coefficient of determination
sum (y hat i - y bar) squared
44
what does the SSR in coefficient of determination measure
the difference between the predicted values and the average, or | how much the y hat values on the estimated regression line deviate from y bar
45
What does SSE in coefficient of determination stand for
sum of squares due to Error
46
What is the formula in Coefficient of determination for SST
sum (yi - y bar) squared
47
what is the formula in Coefficient of determination for SSE
sum (yi - y hat i) squared
48
In Coefficient of determination, how do you calculate SST
SST = SSR + SSE
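The identity on this card can be checked numerically. Below is a minimal sketch in plain Python, assuming a small illustrative sample (the x and y values are not from the deck). It fits the line by least squares, then computes the three sums of squares and r^2.

```python
# Illustrative sample (an assumption, not data from the flashcards)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Sums of squares as defined on the cards
sst = sum((yi - y_bar) ** 2 for yi in y)               # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # due to error

r_squared = ssr / sst
print(f"SST={sst:.2f}  SSR={ssr:.2f}  SSE={sse:.2f}  r^2={r_squared:.2f}")
```

For data fitted by least squares, SSR and SSE should add up to SST apart from floating-point rounding.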
49
What should we expect regarding SST, SSR and SSE in the coefficient of determination
we should expect that SST, SSR and SSE are related: SST = SSR + SSE
50
What would be a perfect fit in coefficient of determination
SSR = SST | SSR / SST = 1
51
What would a poor fit be in coefficient of determination
large values for SSE | - poorest fit when SSR = 0 and SSE = SST
52
What is r squared
the percentage of the variability in y that can be explained by x
53
if r squared = 95.5%, what can we say
95.5% of the variability in grades for instance, can be explained by the number of hours studied
54
What does the correlation Coefficient measure
it measures the strength of the linear association between x and y
56
what is the correlation Coefficient denoted by
r
57
what are the values of r in correlation Coefficient
between -1 and +1
58
In Correlation Coefficient, if r = 1, what does this mean
a perfect positive linear relationship between x and y - all the data points in the sample lie exactly on the regression line, with no deviation, and the line slopes upward
59
In Correlation Coefficient, if r = -1, what does this mean
a perfect negative linear relationship between x and y - all the data points in the sample lie exactly on the regression line, with no deviation, and the line slopes downward
60
In Correlation Coefficient, if r = 0, what does this mean
no linear relationship between x and y
61
what is the formula for correlation Coefficient
rxy = (sign of b1) x square root of the coefficient of determination, or rxy = (sign of b1) x square root of r squared
62
in correlation Coefficient, what is b1
the slope of the estimated regression equation
63
In correlation coefficient, since the square root of a number doesn't tell us whether the result should be negative or positive, we have to look at what
the slope, and then we use the sign of the slope. Example: if b1 is positive (4.74), we use the positive sign, so rxy = +.9505
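A minimal sketch of this sign rule, assuming a hypothetical helper named `correlation_from_fit` (the name and the sample numbers are illustrative, not from the deck):

```python
import math

def correlation_from_fit(b1, r_squared):
    """Return rxy = (sign of b1) * sqrt(r^2)."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must be between 0 and 1")
    # copysign attaches the sign of the slope to the positive square root
    return math.copysign(math.sqrt(r_squared), b1)

print(round(correlation_from_fit(0.6, 0.6), 4))   # upward-sloping line
print(round(correlation_from_fit(-0.6, 0.6), 4))  # downward-sloping line
```

The same r^2 gives a positive or a negative r depending only on the sign of the slope.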
64
if rxy is .9749 what does this indicate
a very strong positive linear relationship between x and y
65
Testing for significance: if y = B0 + B1x + E and B1 = 0, then y =
B0 + E no matter what value x is - the mean value of y does not depend on x (no linear relationship between x and y)
66
What is the null hypothesis and the alternative for testing significance in Simple Linear Regression
H0: B1 = 0 | Ha: B1 ≠ 0
67
What test do we use when testing for significance in simple linear regression
t test
68
what is the formula for the test statistic when testing for significance in simple linear regression
t = b1 / sb1
69
what does sb1 stand for
standard error for slope
70
what is the formula for sb1 (the standard error for the slope)
sb1 = s / square root of sum (xi - x bar) squared, where s is the standard error of the estimate
71
what is the formula for s in sb1
s = square root of (SSE / (n - 2))
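The last few cards chain together: s comes from SSE, sb1 from s, and t from b1 and sb1. A minimal sketch in plain Python, assuming a small illustrative sample (not data from the deck):

```python
import math

# Illustrative sample (an assumption, not data from the flashcards)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))   # standard error of the estimate
sb1 = s / math.sqrt(sxx)       # standard error of the slope
t = b1 / sb1                   # test statistic for H0: B1 = 0

print(f"s={s:.4f}  sb1={sb1:.4f}  t={t:.4f}")
```

The t value would then be compared with a t distribution with n - 2 degrees of freedom. Looking up the critical value or p-value is left out of this sketch.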
72
Coefficient of determination - Definition
A measure of the goodness of fit of the estimated regression equation. It can be interpreted as the proportion of the variability in the dependent variable y that is explained by the estimated regression equation
73
Confidence interval - Definition
The interval estimate of the mean value of y for a given value of x.
74
Correlation coefficient - Definition
A measure of the strength of the linear relationship between two variables
75
Dependent variable - definition
The variable that is being predicted or explained. It is denoted by y.
76
Estimated regression equation - Definition
The estimate of the regression equation developed from sample data by using the least squares method. For simple linear regression, the estimated regression equation is y hat = b0 + b1x.
77
High leverage points - Definition
Observations with extreme values for the independent variables
78
Independent variable - Definition
The variable that is doing the predicting or explaining. It is denoted by x.
79
Influential observation - Definition
An observation that has a strong influence or effect on the regression results.
80
ith residual - Definition
The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation; for the ith observation the ith residual is yi - y hat i.
81
Least squares method - Definition
A procedure used to develop the estimated regression equation. The objective is to minimize sum (yi - y hat i) squared.
82
Mean square error - Definition
The unbiased estimate of the variance of the error term, sigma squared. It is denoted by MSE or s^2.
83
Normal probability plot - Definition
A graph of the standardized residuals plotted against values of the normal scores. This plot helps determine whether the assumption that the error term has a normal probability distribution appears to be valid
84
Outlier - Definition
A data point or observation that does not fit the trend shown by the remaining data
85
Prediction interval - Definition
The interval estimate of an individual value of y for a given value of x.
86
Regression equation - Definition
The equation that describes how the mean or expected value of the dependent variable is related to the independent variable; in simple linear regression, E(y) = B0 + B1x.
87
Regression model - Definition
The equation that describes how y is related to x and an error term; in simple linear regression, the regression model is y = B0 + B1x + E.
88
Residual Analysis - Definition
The analysis of the residuals used to determine whether the assumptions made about the regression model appear to be valid. Residual analysis is also used to identify outliers and influential observations
89
Residual Plot - Definition
Graphical representation of the residuals that can be used to determine whether the assumptions made about the regression model appear to be valid
90
Scatter Diagram - Definition
A graph of bivariate data in which the independent variable is on the horizontal axis and the dependent variable is on the vertical axis
91
Simple linear regression - Definition
Regression analysis involving one independent variable and one dependent variable in which the relationship between the variables is approximated by a straight line.
92
Standard error of the estimate - Definition
The square root of the mean square error, denoted by s. It is the estimate of sigma, the standard deviation of the error term E
93
Standardized residual - Definition
The value obtained by dividing a residual by its standard deviation
94
Regression and Correlation analysis are used to study what
relationships between two or more variables
95
The focus in correlation analysis is on assessment of
the size and the direction of the relationship
96
The relationship between variables is said to be positive if
the two variables increase together and decrease together
97
The relationship between variables is said to be negative if
they move in different directions
98
In regression analysis, on the other hand, the focus is on what
prediction
99
The value of one variable is predicted from the value of another variable based on what
a model relating the two variables. The model has to be estimated, using a sample from the bivariate distribution, before it can be used
100
What is the first thing to do in a regression analysis
plot the data as a scatter diagram, and see if the assumption of a linear relationship is plausible
101
If the data points are scattered in such a way that a straight line can be drawn through them they are
clustered around the line and the assumption of a linear relationship is reasonable
102
if the data is scattered in such a way that a straight line cannot be drawn through them then they are
arranged in some other pattern, such as a curve, or in no pattern at all, and the assumption of a linear relationship is violated
103
Geometrically, the actual value yi is the ________ of the point from the _________ axis to the point on the regression line
height; horizontal
104
The distance between the two (yi and y hat i) is the
error at the given xi
105
if the actual value yi is on the regression line, then yi ________ and the error is ________
= y hat i and the error is zero
106
If the actual value yi is above the regression line, so that yi - y hat i > 0, this results in a _________
positive error
107
if the actual value yi is below the regression line, so that yi - y hat i < 0, this results in a
negative error
108
Is there an error term for each data point?
yes
109
When does a perfect fit occur in the simple linear regression
when all these error terms are zero, which means all the data points are lined up along a straight line
110
is a perfect fit rare in simple linear regression
yes
111
what would be the best regression line
a line through the points that minimizes the errors in some sense
112
Who proposed the least squares method
Gauss
113
Explain the least squares method
deals with the squares of the errors, instead of the errors themselves, so it treats only the size of the errors and not their signs
114
What does the least squares method do
it minimizes the sum of squared errors
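For simple linear regression, the line that minimizes the sum of squared errors has a closed-form solution: b1 = sum((xi - x bar)(yi - y bar)) / sum((xi - x bar) squared) and b0 = y bar - b1 x bar. A minimal sketch, assuming a hypothetical function name `least_squares_fit` and illustrative data (neither is from the deck):

```python
def least_squares_fit(x, y):
    """Return (b0, b1) for the least squares line y hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

def sum_squared_errors(x, y, b0, b1):
    """Sum of squared errors for any candidate line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_fit(x, y)
print(f"y hat = {b0:.2f} + {b1:.2f}x")

# Any other line should give a larger sum of squared errors
best = sum_squared_errors(x, y, b0, b1)
nudged = sum_squared_errors(x, y, b0 + 0.1, b1 - 0.05)
print(best < nudged)
```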
115
Why is the relation of SST = SSR + SSE fundamental
because it assesses the contribution of the regression as a source of variation compared to other sources of variation in the data
116
since we are studying the effect of regression alone, all other sources of variation are what
lumped under the general label "error" and are treated as one source. This is similar to the notion of "between" and "within" variations
117
What is the mean Square due to error for simple linear regression
one measure of the goodness of fit for a regression equation
118
MSE is also
S^2
119
MSE is useful
only in a relative sense. A value of, say, 13.829 for MSE does not tell us whether the fit is good or bad, nor, if good, how good the fit is. It is only useful when we compare it with the MSE for another model or fit
120
When comparing MSE which one is better
the one with the smaller MSE is better
121
What is MSE very useful for
constructing tests of significance and confidence intervals
122
What is the square root of MSE used for
to estimate the standard error of an estimate, which serves as a benchmark for decisions regarding the size of a difference between an estimate and its hypothesized value
123
What test is used with MSE for Simple linear regression
t-test
124
Can MSE be used as a comparison by itself and what test is used
a comparison can be made directly with the MSE, since MSE is itself a measure of variation. This results in the F test
125
An observation can be both an outlier and what
an influential observation, it can be an outlier but not an influential observation, or it can be an influential observation but not an outlier
126
In identifying an outlier, we focus on what
the y value (or equivalently, on the residual or standardized residual) of a point
127
When identifying an influential observation, the focus is on what
the x values
128
Observations whose x values are very different from the x values of the rest of the data are most likely
influential observations
129
Those whose y values are way off the trend of the other points are most likely
outliers
130
The variable being predicted is called
the dependent variable
131
What are the independent variables
The variable or variables being used to predict the value of the dependent variable are called the independent variables
132
In simple linear regression, each observation consists of two values, what are they
1. one for the independent variable | 2. one for the dependent variable
133
Can regression analysis be interpreted as a procedure for establishing a cause-and-effect relationship between variables?
no, it can only indicate how or to what extent variables are associated with each other. Any conclusions about cause and effect must be based upon the judgement of those individuals most knowledgeable about the application
134
Using the estimated regression equation to make predictions outside the range of the values of the independent variable - what caution is there
should be done with caution because outside that range we cannot be sure that the same relationship is valid
135
What does the least squares method do for the estimated regression equation
minimizes the sum of squared deviations between the observed values of the dependent variable yi and the predicted values of the dependent variable y hat i
136
The least squares criterion for estimated regression is used to do what
to choose the equation that provides the best fit. It is the most widely used method
137
Coefficient of determination provides what
a measure of goodness of fit for the estimated regression equation
138
What is SSE a measure of in the estimated regression
it is a measure of the error in using the estimated regression equation to predict values of the dependent variable in the sample
139
If you have no knowledge of the xi values, what would you use to estimate y
you would use the mean value y bar. When x is known, the estimated regression equation is a much better predictor than the mean value
140
What can SSR be thought of as
the explained portion of SST
141
What can SSE be thought of as
the unexplained portion of SST
142
What would be a perfect fit for the estimated regression
yi - y hat i = 0 for every observation. This means that every value of the dependent variable yi lies on the estimated regression line
143
If the estimated regression is a perfect fit, what can we say about SSE
SSE = 0 and SSR/SST = 1
144
If SSE is large, what can we say about the estimated regression
poorer fits will have larger values of SSE
145
What would be the poorest fit when
the largest value for SSE occurs when SSR = 0 and SSE=SST
146
When SSE = SST what kind of fit is this
poorest fit
147
What values will the ratio SSR/SST take
values between 0 and 1
148
what does r^2 stand for
coefficient of determination
149
r^2 formula
SSR/SST
150
if r^2 = SSR/SST is close to 1
a good fit
151
What would r^2 = .9027 mean
90.27% of the variability in yi can be explained by the estimated regression equation
152
What is correlation Coefficient a measure of
a descriptive measure of the STRENGTH of linear association between two variables (x and y)
153
What are the values that correlation coefficient take on
between -1 and +1
154
What does a value of +1 Correlation Coefficient mean
indicates that the two variables x and y are perfectly related in a positive linear sense. That all data points are on a straight line that has a positive slope
155
What do values close to zero represent for Correlation Coefficient
indicate that x and y are not linearly related
156
What is the formula for Correlation Coefficient
rxy = (sign of b1) x square root of the coefficient of determination = (sign of b1) x square root of r^2
157
Coefficient of determination provides a measure between what numbers and Correlation Coefficient provides a measure between what numbers
Coefficient of Determination r^2 is between 0 and 1 | Correlation Coefficient rxy = (sign of b1) x square root of r^2 is between -1 and +1
158
The sample correlation coefficient is restricted to what
A linear relationship between two variables
159
The coefficient of determination can be used for what
nonlinear relationships and relationships that have two or more independent variables. Thus the coefficient of determination, r^2, provides a wider range of applicability
160
Which provides a wider range of applicability coefficient of determination or Correlation Coefficient
Coefficient of Determination
161
When using r^2 we can draw no conclusion about what
whether the relationship between x and y is statistically significant - such conclusion must be based on considerations that involve the sample size and the properties of the appropriate sampling distributions of the least squares estimators
162
SSE is what in Simple linear regression
sum of squared residuals
163
SSE is a measure of what
of the variability of the actual observations about the estimated regression line
164
In simple linear regression, do the F test and t test provide the same results?
yes, if it is just for one I.V.
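One way to see why the two tests agree with a single independent variable: the F statistic, MSR / MSE with MSR = SSR / 1, equals the square of the t statistic for the slope. A minimal numerical check in plain Python, assuming a small illustrative sample (not data from the deck):

```python
import math

# Illustrative sample (an assumption, not data from the flashcards)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

mse = sse / (n - 2)   # mean square error
msr = ssr / 1         # mean square due to regression (one independent variable)
f_stat = msr / mse

t_stat = b1 / (math.sqrt(mse) / math.sqrt(sxx))
print(f"F={f_stat:.4f}  t^2={t_stat ** 2:.4f}")
```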
165
If there is more than one IV, do the F test and t test provide the same results
no, only the F test can be used to test for an overall significant relationship
166
Confidence intervals and prediction intervals show the precision of the regression results. Narrower intervals provide what
a higher degree of precision
167
A confidence interval is an interval estimate of what
the mean value of y for a given value of x
168
a prediction interval is an interval estimate of what
used to predict an individual value of y for a new observation corresponding to a given value of x
169
the margin of error is larger for which interval, a confidence interval or a prediction interval
prediction interval
170
What is the margin of error associated with a prediction interval
t alpha/2 x s pred, where s pred is the estimated standard deviation of an individual predicted value of y
171
In general, the lines of the confidence interval limits and the prediction interval limits both have what
curvature
172
confidence intervals and prediction intervals are both more precise when the value of the IV x* is closer to
x bar
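Both points (the prediction interval is wider, and both intervals are narrowest at x bar) follow from the standard errors behind the two intervals. The sketch below assumes the usual simple linear regression formulas, s y hat = s x sqrt(1/n + (x* - x bar)^2 / Sxx) for the mean and s pred = s x sqrt(1 + 1/n + (x* - x bar)^2 / Sxx) for an individual value. The function name and data are illustrative, and multiplying by t alpha/2 is left out.

```python
import math

def interval_standard_errors(x, y, x_star):
    """Return (s_mean, s_pred) at x_star for a simple linear regression fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    distance = 1 / n + (x_star - x_bar) ** 2 / sxx
    s_mean = s * math.sqrt(distance)       # for the confidence interval
    s_pred = s * math.sqrt(1 + distance)   # for the prediction interval
    return s_mean, s_pred

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
at_center = interval_standard_errors(x, y, 3)   # x* equal to x bar
at_edge = interval_standard_errors(x, y, 5)     # x* far from x bar
print(at_center, at_edge)
```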
173
What may an outlier represent
1. erroneous data - a recording error, should be corrected 2. a violation of the model assumptions - may need to consider another model 3. unusual values that occurred by chance - should stay
174
What is an influential observation
1. it could be an outlier 2. it can influence how the data are interpreted: if this observation were removed, the regression results would change, for example the slope could switch from negative to positive
175
if the Influential observation is valid
1. can contribute to a better understanding of the appropriate model and lead to a better estimated regression equation 2. try to obtain data on intermediate values of x to better understand the relationship between x and y
176
What is high leverage
the farther xi is from its mean (x bar), the higher the leverage of the observation - computer software is needed to help with this
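The calculation that software performs here can be sketched with the standard simple linear regression leverage formula, h i = 1/n + (xi - x bar)^2 / sum((xi - x bar)^2). The function name and data below are illustrative assumptions, not from the deck.

```python
def leverages(x):
    """Return the leverage h_i of each observation from its x value alone."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

x = [1, 2, 3, 4, 5]
h = leverages(x)
print([round(hi, 2) for hi in h])

# Moving one x value far from the rest raises its leverage sharply
h_far = leverages([1, 2, 3, 4, 15])
print(round(h_far[-1], 2))
```

Leverage depends only on the x values, which is why a point can have high leverage without being an outlier in y.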
177
Explain lower leverage
outside the other data points, but won't change the line
178
explain high leverage
far outside the other data points
179
explain lower leverage low influence
near to the line
180
explain high leverage low influence
need some work
181
How is an outlier determined
if its standardized residual is outside the range -2 to +2
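One common way to apply the +2 / -2 rule is with standardized residuals: each residual is divided by its standard deviation, s x sqrt(1 - h i), where h i is the leverage of the observation. A minimal sketch, assuming illustrative data and a hypothetical function name:

```python
import math

def standardized_residuals(x, y):
    """Return the standardized residual of each observation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    result = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx       # leverage of this point
        result.append(e / (s * math.sqrt(1 - h)))
    return result

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
z = standardized_residuals(x, y)
flagged = [i for i, zi in enumerate(z) if abs(zi) > 2]
print([round(zi, 3) for zi in z], flagged)
```

In this small sample no observation is flagged, since every standardized residual lies between -2 and +2.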
182
What is an outlier denoted as on a computer print out
R
183
if we only have one variable, how can we predict another amount
by using the mean
184
if we are only using one variable to predict, what is the best fit line
the mean
185
What do you use to measure how well the estimated regression line FITS the data
r^2, the coefficient of determination
186
What test do we use to test whether B1 is significant
t-test: t = b1 / sb1