Ch. 17 Flashcards

1
Q

Regression

A

A method that predicts values of one numerical variable from values of another numerical variable

2
Q

Difference between regression and correlation

A

Correlation measures the strength of association between two variables, which is reflected in the scatter of the data

Regression fits a line through the data to predict one variable from another and to measure how steeply one variable changes with changes in the other

3
Q

Linear regression

A

Most common regression

Assumes a linear relationship between variables

4
Q

Least-squares regression line

A

The line for which the sum of the squared deviations in Y (the squared residuals) is smallest
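
A minimal sketch of the least-squares fit, using the standard formulas for the slope and intercept. The data values and the function names are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical example data (not from the flashcards).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def least_squares_line(x, y):
    """Return (intercept a, slope b) of the least-squares line Y-hat = a + b*X."""
    x_bar, y_bar = x.mean(), y.mean()
    b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    a = y_bar - b * x_bar
    return a, b

def sum_squared_deviations(a, b, x, y):
    """Sum of squared vertical deviations of the points from the line."""
    return np.sum((y - (a + b * x)) ** 2)

a, b = least_squares_line(x, y)
ss_best = sum_squared_deviations(a, b, x, y)
# Any other line, e.g. one with a slightly steeper slope, gives a larger sum.
ss_other = sum_squared_deviations(a, b + 0.1, x, y)
print(a, b, ss_best, ss_other)
```

For these data the slope works out to 1.99 and the intercept to 0.05; `np.polyfit(x, y, 1)` returns the same pair in (slope, intercept) order.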

5
Q

Slope (what is it?)

A

The slope of a linear regression is the rate of change in Y per unit X

Represented by b (the sample estimate); the population parameter is β (beta)

6
Q

What is “Y-hat”?

A

It is the predicted value of Y on the regression line for a given value of X

7
Q

What do predicted values of Y tell you?

A

They give you an estimate of the mean value of Y for all individuals that have a given value of X

8
Q

Residual

A

Observed value minus predicted value

9
Q

MSresiduals

A

Gives the variance of the residuals
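
As a rough illustration, residuals and MSresiduals can be computed as below, taking MSresiduals = SSresiduals / (n - 2). The data are hypothetical.

```python
import numpy as np

# Hypothetical example data (not from the flashcards).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# np.polyfit with degree 1 returns (slope, intercept) of the least-squares line.
b, a = np.polyfit(x, y, 1)

y_hat = a + b * x            # predicted values
residuals = y - y_hat        # observed minus predicted

ss_residuals = np.sum(residuals ** 2)
ms_residuals = ss_residuals / (len(x) - 2)   # n - 2 degrees of freedom
print(residuals, ms_residuals)
```

The residuals of a least-squares line sum to zero, so their spread, not their mean, carries the information.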

10
Q

Confidence bands

A

95% Confidence bands measure the precision of the predicted MEAN Y for each value of X

11
Q

Prediction intervals

A

Measure the precision of the predicted SINGLE Y-values for each X (usually 95%)
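
A sketch of why prediction intervals are always wider than confidence bands, using the standard error formulas for simple linear regression. The data and function names are hypothetical. Each interval would be Y-hat plus or minus a critical t-value (n - 2 degrees of freedom) times the standard error; the critical value is not computed here.

```python
import numpy as np

# Hypothetical example data (not from the flashcards).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

n = len(x)
b, a = np.polyfit(x, y, 1)                       # slope, intercept
ms_residuals = np.sum((y - (a + b * x)) ** 2) / (n - 2)
ss_x = np.sum((x - x.mean()) ** 2)

def se_predicted_mean(x0):
    """Standard error of the predicted MEAN Y at x0 (confidence band)."""
    return np.sqrt(ms_residuals * (1.0 / n + (x0 - x.mean()) ** 2 / ss_x))

def se_predicted_single(x0):
    """Standard error of a predicted SINGLE Y at x0 (prediction interval)."""
    return np.sqrt(ms_residuals * (1.0 + 1.0 / n + (x0 - x.mean()) ** 2 / ss_x))

for x0 in (1.0, 3.0, 5.0):
    print(x0, se_predicted_mean(x0), se_predicted_single(x0))
```

Both standard errors are smallest at the mean of X and grow toward the edges of the data, which is why the bands flare outward.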

12
Q

Extrapolation

A

The prediction of the value of a response variable outside the range of X-values in the data

13
Q

Why is extrapolation a bad idea?

A

There is no way to guarantee that the relationship between X and Y holds beyond the range of the data, so predictions made there may be unreliable.
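
One way to guard against accidental extrapolation is to flag any prediction whose X lies outside the observed range. This is only a sketch; the data and the `predict` helper are hypothetical.

```python
import numpy as np

# Hypothetical example data (not from the flashcards).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b, a = np.polyfit(x, y, 1)   # slope, intercept

def predict(x_new):
    """Return (predicted Y, True if x_new lies outside the observed X range)."""
    is_extrapolation = bool(x_new < x.min() or x_new > x.max())
    return a + b * x_new, is_extrapolation

y_inside, flag_inside = predict(3.0)     # within 1..5: interpolation
y_outside, flag_outside = predict(10.0)  # beyond 5: extrapolation
print(y_inside, flag_inside, y_outside, flag_outside)
```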

14
Q

Degrees of Freedom for Regression?

A

n - 2, the residual degrees of freedom used when testing the slope (two parameters, the slope and the intercept, are estimated from the data)

15
Q

When can ANOVA be used in place of the t-test?

A

When the test is two-sided and the null hypothesized slope is ZERO
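
The equivalence can be checked numerically: with a null slope of zero, the ANOVA F-ratio equals the square of the t-statistic for the slope. The sketch below uses hypothetical data.

```python
import numpy as np

# Hypothetical example data (not from the flashcards).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

n = len(x)
b, a = np.polyfit(x, y, 1)   # slope, intercept
y_hat = a + b * x

# ANOVA approach: F = MSregression / MSresiduals.
ss_regression = np.sum((y_hat - y.mean()) ** 2)
ss_residuals = np.sum((y - y_hat) ** 2)
ms_regression = ss_regression / 1        # 1 degree of freedom
ms_residuals = ss_residuals / (n - 2)    # n - 2 degrees of freedom
f_ratio = ms_regression / ms_residuals

# t-test approach: t = (b - 0) / SE_b, with a null hypothesized slope of zero.
se_b = np.sqrt(ms_residuals / np.sum((x - x.mean()) ** 2))
t_stat = (b - 0.0) / se_b

print(f_ratio, t_stat ** 2)
```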

16
Q

R^2

A

SSregression / SStotal; measures the fraction of the variation in Y that is explained by the regression line
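
A short sketch of the calculation, assuming hypothetical data. For a simple linear regression the result also equals the square of the correlation coefficient r.

```python
import numpy as np

# Hypothetical example data (not from the flashcards).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b, a = np.polyfit(x, y, 1)   # slope, intercept
y_hat = a + b * x

ss_regression = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the line
ss_total = np.sum((y - y.mean()) ** 2)           # total variation in Y
r_squared = ss_regression / ss_total

r = np.corrcoef(x, y)[0, 1]                      # correlation coefficient
print(r_squared, r ** 2)
```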

17
Q

What does it mean when R^2 is close to 1?

A

It means that X explains most of the variation in Y (the Y-values are clustered tightly around the regression line, with little scatter)

18
Q

What does it mean when R^2 is close to 0?

A

It means that X explains little of the variation in Y, and the data points will be widely scattered above and below the regression line

19
Q

What is the name for r^2?

A

Coefficient of determination

20
Q

Assumptions for linear regression?

A
  • For each value of X, there is a population of possible Y-values whose mean lies on the true regression line (this is the assumption that the relationship is linear)
  • At each value of X, the Y-values are normally distributed, with the same variance at all values of X
  • At each value of X, the Y-measurements are a random sample from the population of possible Y-values
21
Q

How to detect outliers?

A

Use a scatter plot of the data and examine it

22
Q

How to reduce effect of the outlier?

A

Transform the data

23
Q

How to detect non-linearity?

A

Use a scatter plot and see whether you can fit a straight line through the data well

24
Q

Residual plot

A

A residual plot is a scatter plot of the residuals (Yi - Y-hat, i.e. the observed Y minus the predicted Y) against X, the explanatory variable

25
Q

How would one detect non-normality and/or unequal variance?

A

Inspect a residual plot. If the assumptions are met, it should show:

  • A symmetric cloud of points above and below the horizontal line at 0, with a higher density of points close to the line than away from it
  • Little noticeable curvature moving left to right along the x-axis
  • Approximately equal variance of points above and below the line at all values of X
26
Q

Effect of measurement error in Y in regression?

A

Increases the variance of the residuals and the sampling error of the slope estimate; the expected slope stays the same (no bias)

27
Q

Effect of measurement error in X?

A

Increases the variance of the residuals; causes bias in the expected estimate of the slope (closer to zero than the true slope β, on average)
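
A small simulation can illustrate both effects of measurement error. Everything here is an assumption made for the sketch: the true slope of 2, the noise levels, the sample size, and the random seed.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
n = 5000
true_slope = 2.0

# Simulated "true" values: Y depends linearly on X, plus natural scatter.
x_true = rng.normal(0.0, 1.0, n)
y_true = true_slope * x_true + rng.normal(0.0, 0.5, n)

# Add measurement error to Y only, and separately to X only.
y_noisy = y_true + rng.normal(0.0, 1.0, n)
x_noisy = x_true + rng.normal(0.0, 1.0, n)

def slope_and_residual_variance(x, y):
    """Least-squares slope and MSresiduals for the regression of y on x."""
    b, a = np.polyfit(x, y, 1)
    ms_residuals = np.sum((y - (a + b * x)) ** 2) / (len(x) - 2)
    return b, ms_residuals

slope_clean, var_clean = slope_and_residual_variance(x_true, y_true)
slope_y_error, var_y_error = slope_and_residual_variance(x_true, y_noisy)
slope_x_error, var_x_error = slope_and_residual_variance(x_noisy, y_true)

print("no error:  ", slope_clean, var_clean)
print("error in Y:", slope_y_error, var_y_error)
print("error in X:", slope_x_error, var_x_error)
```

With these settings, error in Y should leave the slope near 2 while inflating the residual variance, and error in X should pull the slope toward zero (to roughly 1, since the error variance equals the variance of X).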

28
Q

How to deal with non-linear relationships?

A

Transformation

Quadratic/Polynomial regressions

Splines (smoothing)
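
As one illustration of the second option, a quadratic regression can be fitted with `np.polyfit`. The data below are hypothetical and noise-free, generated from Y = 1 + 2X - 0.5X^2, so the fit recovers those coefficients.

```python
import numpy as np

# Hypothetical curved data: an exact quadratic, with no noise, for clarity.
x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x - 0.5 * x ** 2

# Straight-line fit versus quadratic fit.
linear_coefs = np.polyfit(x, y, 1)      # (slope, intercept)
quadratic_coefs = np.polyfit(x, y, 2)   # (x^2 term, x term, intercept)

def sum_squared_residuals(coefs, x, y):
    """Sum of squared residuals for a polynomial with the given coefficients."""
    return np.sum((y - np.polyval(coefs, x)) ** 2)

ss_linear = sum_squared_residuals(linear_coefs, x, y)
ss_quadratic = sum_squared_residuals(quadratic_coefs, x, y)
print(quadratic_coefs, ss_linear, ss_quadratic)
```

The straight line leaves a large sum of squared residuals because it cannot follow the curvature, while the quadratic fits these data essentially exactly.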

29
Q

Smoothing

A

Fitting a curve to data without specifying a formula

30
Q

Limits to terms with polynomial?

A

Sample size should be at least 7 times the number of terms

(i.e. Keep It Simple Stupid)

31
Q

Logistic regression

A

Models and tests the relationship between a numerical explanatory variable and a binary response variable
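
A minimal sketch of fitting a logistic regression by gradient ascent on the log-likelihood. The data, function names, learning rate, and iteration count are all assumptions made for illustration; in practice a statistics package would do the fitting and the testing.

```python
import numpy as np

# Hypothetical data: numerical X, binary Y (0 or 1), not perfectly separated.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0])

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, learning_rate=0.1, n_iter=5000):
    """Return (intercept, slope) on the log-odds scale via gradient ascent."""
    x_centered = x - x.mean()          # centering makes the ascent better behaved
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = sigmoid(b0 + b1 * x_centered)
        b0 += learning_rate * np.mean(y - p)
        b1 += learning_rate * np.mean((y - p) * x_centered)
    # Convert the intercept back to the original (uncentered) X scale.
    return b0 - b1 * x.mean(), b1

intercept, slope = fit_logistic(x, y)

def predict_probability(x_new):
    """Predicted probability that Y = 1 at x_new."""
    return sigmoid(intercept + slope * x_new)

print(intercept, slope, predict_probability(1.0), predict_probability(8.0))
```

A positive slope means the predicted probability that Y = 1 rises with X; here it should be below 0.5 at X = 1 and above 0.5 at X = 8.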