Reading Quiz 15 Flashcards

1
Q

least-squares regression

A

fits a straight line to data in order to predict a response variable y from the explanatory variable x
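
As an illustration only, the least-squares line can be computed directly from the usual formulas b = Σ(x - xbar)(y - ybar) / Σ(x - xbar)^2 and a = ybar - b·xbar. The sketch below is a minimal version in plain Python; the function name `least_squares_line` and the five data points are made up for this example and do not come from the flashcards.

```python
def least_squares_line(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-products of deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical data, chosen only to keep the arithmetic easy to follow
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x")
```

For this made-up data the fitted line is roughly y-hat = 2.2 + 0.6x, so the predicted response at x = 3 is about 4.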

2
Q

inference about regression conditions

A
  1. the observations must be independent
  2. the true relationship is linear
  3. the standard deviation of the response about the true line is the same everywhere
  4. the response varies normally about the true regression line
3
Q

the observations must be independent condition

A

in particular, repeated observations on the same individual are not allowed

4
Q

the true relationship is linear condition

A

we can’t observe the true regression line, so we will almost never see a perfect straight-line relationship in our data
look at the scatter plot to check that the overall pattern is roughly linear
a residual plot against x magnifies any unusual pattern

5
Q

the standard deviation of the response about the true line is the same everywhere condition

A

look at the residual plot
the scatter of the data points (the vertical distance) about the y=0 line should be roughly the same over the entire range of the data

6
Q

the response varies normally about the true regression line condition

A

the residuals estimate the deviations of the response from the true regression line, so they should follow a normal distribution
make a boxplot, histogram, or stemplot of the residuals and check for clear skewness or other major departures from normality
slight departures from normality do not greatly affect inference for regression, so they are allowed, particularly when we have many observations

7
Q

regression model

A

says that there is a true regression line μy = α + βx that describes how the mean response μy varies as x changes

8
Q

true regression line

A

μy = α + βx

describes how the mean response μy varies as x changes

9
Q

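the observed response y for any fixed x has a normal distribution with

A

mean μy given by the true regression line and standard deviation σ, which is the same for any value of x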

10
Q

the parameters of the regression model are

A

the intercept α estimated by a, the slope β estimated by b, and the standard deviation σ estimated by s

11
Q

the true slope β says how much

A

the mean response μy changes when x increases by 1

12
Q

the standard deviation σ describes

A

how much variation there is in responses y when x is fixed

13
Q

to estimate σ

A

use the standard error about the line, s

14
Q

s

A

regression standard error
s = sqrt(Σresiduals^2 / (n - 2)) = sqrt(Σ(y - yhat)^2 / (n - 2))
essentially the sample standard deviation of the residuals, but with n - 2 in the denominator
spread of data (measure of variability) around the least squares regression line
“typical” amount of prediction error when using a linear regression model to make predictions
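
The formula for s can be checked by hand on a small data set. The sketch below is one way to do that in plain Python; the function name `regression_standard_error` and the data are assumptions made for this example.

```python
import math


def regression_standard_error(x, y):
    """Return s = sqrt(sum of squared residuals / (n - 2))."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual = observed y minus predicted y-hat
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    return math.sqrt(sse / (n - 2))


# Hypothetical data (same made-up style as a textbook exercise)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
s = regression_standard_error(x, y)
print(f"s = {s:.4f}")
```

Here the squared residuals sum to 2.4 and n - 2 = 3, so s = sqrt(0.8), or about 0.894.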

15
Q

how to compute s on a calculator

A

enter the data into lists L1 and L2

STAT, TESTS, LinRegTTest (s is listed in the output)

16
Q

regression standard error has how many degrees of freedom

A

n - 2

all t procedures in regression inference have n-2 degrees of freedom

17
Q

inference for regression goal

A

predict behavior of y for given values of x

18
Q

inference for regression (continued)

A

there is an “on average” straight-line relationship between y and x
that is, the mean response μy moves along a straight line as the explanatory variable x changes

19
Q

inference can be done in two ways:

A

confidence intervals for the slope and a significance test to test the hypothesis that the true slope is 0

20
Q

confidence intervals for the slope of the true regression line have the general form

A

b ± t*SEb.

21
Q

in practice, we use software to find

A

the slope b of the least-squares line and its standard error SEb

22
Q

formula for confidence interval for slope of true regression line

A

b ± t*SEb, where SEb = s / sqrt(Σ(x - xbar)^2) and the critical value t* comes from the t distribution with n - 2 degrees of freedom
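
A worked version of this interval, as a rough sketch: the code computes s, then SEb, then b ± t*SEb. The function name `slope_confidence_interval` and the data are hypothetical, and the critical value 3.182 is the standard table value of t* for 95% confidence with 3 degrees of freedom, passed in by hand rather than computed.

```python
import math


def slope_confidence_interval(x, y, t_star):
    """Return (b, se_b, low, high) for the interval b +/- t_star * SEb."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))      # regression standard error
    se_b = s / math.sqrt(sxx)         # standard error of the slope
    margin = t_star * se_b
    return b, se_b, b - margin, b + margin


# Hypothetical data with n = 5, so df = n - 2 = 3
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_STAR_95_DF3 = 3.182  # from a t table: 95% confidence, df = 3
b, se_b, low, high = slope_confidence_interval(x, y, T_STAR_95_DF3)
print(f"b = {b:.3f}, SEb = {se_b:.4f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With so few points the interval is wide, roughly -0.30 to 1.50, and it contains 0, which matches the idea that a small sample gives little evidence about the true slope.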

23
Q

some calculators have the program to calculate the confidence interval for the slope

A

STAT, TESTS, LinRegTInt
if you don’t have the program, run LinRegTTest to obtain b and t, then solve for SEb: the test uses β = 0, so t = b/SEb and SEb = b/t
then substitute that into b ± t*SEb, with the critical value t* taken from the t distribution chart using df = n - 2
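
The workaround amounts to a line or two of arithmetic. The sketch below assumes hypothetical calculator output (b = 0.6, t = 2.1213, n = 5) and a table value t* = 3.182 for 95% confidence with df = 3; none of these numbers come from the flashcards.

```python
def se_from_t(b, t, beta_0=0.0):
    """Recover SEb from t = (b - beta_0) / SEb, the statistic the test reports."""
    return (b - beta_0) / t


# Hypothetical LinRegTTest output for a sample of n = 5 points
b = 0.6
t = 2.1213
se_b = se_from_t(b, t)

# Critical value from a t table: 95% confidence, df = n - 2 = 3
t_star = 3.182
low = b - t_star * se_b
high = b + t_star * se_b
print(f"SEb = {se_b:.4f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Note that the t used to recover SEb is the test statistic from the output, while the t* in the interval is a critical value from the table; they are different numbers.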

24
Q

t =

A

(b - β)/SEb

25
Q

to test the hypothesis that the true slope is zero, use the

A
t statistic
t = (b - β)/SEb, which reduces to t = b/SEb when the hypothesized slope is β = 0
also given by software
the t statistic is the standardized slope of the LSRL
STAT, TESTS, LinRegTTest (data in L1 and L2)
26
Q

regression output from statistical software usually gives

A

t and its two sided p-value

for a one-sided test, divide the p-value by 2, provided the sample slope b falls in the direction stated by the alternative hypothesis
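
To make the link between t and its p-values concrete, here is a rough sketch that uses only the Python standard library. It integrates the t density numerically with Simpson's rule instead of calling a statistics package, so it is an approximation for illustration; the function names are made up for this example.

```python
import math


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """Approximate P(|T| >= |t_stat|) using Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # P(-|t| < T < |t|), by symmetry
    return 1 - central_area


def one_sided_p_value(t_stat, df, alternative="greater"):
    """Halve the two-sided p-value only if t points in the alternative's direction."""
    p2 = two_sided_p_value(t_stat, df)
    in_direction = t_stat > 0 if alternative == "greater" else t_stat < 0
    return p2 / 2 if in_direction else 1 - p2 / 2


# Hypothetical test: t = b / SEb with df = n - 2 = 3
t_stat = 0.6 / 0.2828
p_two = two_sided_p_value(t_stat, df=3)
p_one = one_sided_p_value(t_stat, df=3, alternative="greater")
print(f"t = {t_stat:.3f}, two-sided p = {p_two:.3f}, one-sided p = {p_one:.3f}")
```

The `one_sided_p_value` helper shows why the "divide by 2" shortcut needs care: if the sample slope has the opposite sign from the alternative, the one-sided p-value is 1 minus half the two-sided value, not half of it.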

27
Q

the most common hypothesis is

A

Ho: β = 0
this says there is no true linear relationship between x and y
it also says that straight-line dependence on x has no value for predicting y
it also says that the population correlation between x and y is zero

28
Q
To review: we use least-squares regression to study the relationship between two variables, both of which are (quantitative, categorical).
A

quantitative

29
Q

Before doing regressions to study the relationship between two quantitative variables, we should explore the data by examining a _______ and a __________.

A

scatterplot, residual plot

30
Q

The statistic that describes the strength of a linear relationship, that is the same whichever variable is thought of as the explanatory variable, and which has a familiar relationship to the percent of variance in one variable explained by the other, is the ______ ______.

A

correlation coefficient (or just the correlation)
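
A quick numerical check of the symmetry property, as a sketch: the correlation comes out the same whether x or y is treated as explanatory. The function name `correlation` and the data are assumptions made for this example.

```python
import math


def correlation(x, y):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_xy = correlation(x, y)  # x as the explanatory variable
r_yx = correlation(y, x)  # roles swapped
print(f"r(x, y) = {r_xy:.4f}, r(y, x) = {r_yx:.4f}")
```

Both calls give about 0.775 for this data, and squaring that value gives 0.6, the fraction of variation explained.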

31
Q

What is a residual?

A

A residual is the vertical distance between the data point and the regression line, with its sign: observed y minus predicted y, or y - y-hat.
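
As a small sketch of the definition, the code below fits a least-squares line and lists each residual y - y-hat. The function name `residuals` and the data are hypothetical.

```python
def residuals(x, y):
    """Residuals y - y-hat from the least-squares line fitted to (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
resid = residuals(x, y)
for xi, e in zip(x, resid):
    print(f"x = {xi}: residual = {e:+.2f}")
```

Points above the line have positive residuals and points below have negative ones; for a least-squares line the residuals always sum to zero, up to rounding.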

32
Q

The r-squared value, which is part of the regression output, tells us how much of what is what?

A

How much of the variation in the y variable is accounted for by the linear relationship with x.
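
One way to see what r-squared measures, sketched in plain Python: compare the variation left over after fitting the line (sum of squared residuals) with the total variation in y. The function name `r_squared` and the data are assumptions made for this example.

```python
def r_squared(x, y):
    """Return 1 - SSE/SST for the least-squares line fitted to (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                     # total
    return 1 - sse / sst


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2 = r_squared(x, y)
print(f"r-squared = {r2:.2f}")
```

For this data SSE = 2.4 and SST = 6, so r-squared = 0.6: about 60% of the variation in y is accounted for by the linear relationship with x.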

33
Q

Suppose we draw lots of samples and compute a regression line for each sample. The slope and intercept of each sample line estimate true values. Thus the slope and intercept we obtain from our sample are _____ that estimate population ______.

A

statistics; parameters

34
Q

One of the conditions for regression inference is that for any fixed value of x, the response variable y varies according to a _____ distribution.

A

normal

35
Q

Another assumption for regression inference is that for any fixed value of x, the repeated responses y are ____ of each other.

A

independent

36
Q

Another assumption for regression inference is that the means of the sets of y-values for each x value have what relationship to the x values?

A

That the means of the y’s for each x are a linear function of x: μy = α + βx

37
Q

Another assumption for regression inference is that what measure of dispersion is equal for each value of x?

A

The standard deviation of the y’s for the various x values.

38
Q

True or False: the slope and intercept we obtain from the least-squares regression for our sample are unbiased estimators of the slope and intercept, respectively, of the line connecting the population means for each of the x’s.

A

true
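
Unbiasedness can be illustrated with a small simulation, sketched below under assumed values: a made-up "true" line with α = 2, β = 0.5, and σ = 1, sampled 2000 times at x = 1, ..., 10. These parameters and the helper name `fit_line` are hypothetical, chosen only for the demonstration.

```python
import random


def fit_line(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


random.seed(0)  # fixed seed so the run is repeatable
ALPHA, BETA, SIGMA = 2.0, 0.5, 1.0  # assumed "true" parameters
x = list(range(1, 11))

intercepts, slopes = [], []
for _ in range(2000):
    # Each response varies normally about the true line with the same sigma
    y = [ALPHA + BETA * xi + random.gauss(0, SIGMA) for xi in x]
    a, b = fit_line(x, y)
    intercepts.append(a)
    slopes.append(b)

mean_a = sum(intercepts) / len(intercepts)
mean_b = sum(slopes) / len(slopes)
print(f"average intercept = {mean_a:.3f}, average slope = {mean_b:.3f}")
```

Individual sample slopes and intercepts scatter around the true values, but their averages over many samples should land close to α and β, which is what "unbiased" means here.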

39
Q

What is the unbiased estimator for the standard deviation of the y values around the regression line (in other words, the standard deviation of the y values around the means of each of those values for each x)?

A

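The statistic called s, the regression standard error: essentially the standard deviation of the residuals, computed with n - 2 in the denominator. (Strictly speaking, s^2 is the unbiased estimator of σ^2; s is the standard estimate of σ.)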

40
Q

The statistic s represents the estimate of the standard deviation ____ in the regression model.

A

A.  (sigma)

41
Q

The parameter we are usually most interested in estimating from regression output is the (slope, y-intercept) of the line.

A

slope

42
Q

What is the general form for a confidence interval for regression slope?

A

b ± t*SEb

43
Q

The most commonly tested hypothesis about regressions is that Beta, the “Population slope,” is 0. Can you put this hypothesis in some other phrasings?

A

That the straight-line dependence on x is of no value in predicting y. Or that the population correlation between x and y is 0. Or that there is no true linear relationship between x and y in the population.

44
Q

If you form the ratio of the slope obtained in your sample to the standard error of that slope, what is the sampling distribution of that statistic?

A

When the true slope is 0 and the conditions for inference hold, it follows the t distribution with n - 2 degrees of freedom.

45
Q

Regression output usually gives a two-sided p value for the hypothesis test that the population slope is 0. How do you obtain a one-sided p-value for the same hypothesis?

A

Divide the two-sided p-value by two, provided the sample slope falls in the direction stated by the alternative hypothesis.

46
Q

Suppose that in a residual plot, the values are close to 0 when x is low, but the residuals get bigger and bigger in absolute value as the x values get greater. What condition of regression is violated in this circumstance?

A

The condition that the standard deviation of the response around the true line is the same everywhere.

47
Q

Someone examines a residual plot and a scatterplot and observes a curvilinear pattern. What condition of regression is being violated, and what should the researcher consider doing in order to correct this?

A

The condition violated is that the true relationship is linear. The researcher should consider transforming one or more of the variables.
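
As one hypothetical illustration of a transformation, the sketch below uses made-up data that grow exponentially (y doubles with each unit of x). A line fits the raw data poorly, but taking the logarithm of y produces an exactly linear pattern. The log transform is only one possible choice; the right transformation depends on the data.

```python
import math


def fit_and_r_squared(x, y):
    """Return (a, b, r2) for the least-squares line fitted to (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    r2 = sxy ** 2 / (sxx * syy)
    return a, b, r2


# Made-up curved data: y = 3 * 2^x
x = [1, 2, 3, 4, 5, 6]
y = [3 * 2 ** xi for xi in x]

_, _, r2_raw = fit_and_r_squared(x, y)

# Transform the response: log(y) = log(3) + x * log(2), a straight line in x
log_y = [math.log(yi) for yi in y]
a_log, b_log, r2_log = fit_and_r_squared(x, log_y)

print(f"raw data:   r-squared = {r2_raw:.3f}")
print(f"log(y) fit: r-squared = {r2_log:.3f}, slope = {b_log:.4f}")
```

After the transformation the slope equals log(2), about 0.693, and the residuals are essentially zero, so the curved pattern in the residual plot disappears.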