Chapter 8: Regression Wisdom Flashcards

1
Q

Define ‘Outlier’.

A

Any data point that stands away from the others can be called an outlier. In regression, cases can be extraordinary in two ways: by standing far apart in the y-direction and having a large residual or by standing far apart in the x-direction and having high leverage.

2
Q

Define ‘Leverage’.

A

Data points whose x-values are far from the mean of x are said to exert leverage on a linear model. High-leverage points pull the line close to them, and so they can have a large effect on the line, sometimes completely determining the slope and intercept. With high enough leverage, their residual can be deceptively small.
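
As a rough illustration, leverage in a one-predictor regression can be computed from the x-values alone. The sketch below uses the standard simple-regression formula h_i = 1/n + (x_i - mean_x)^2 / Sxx; the helper name `leverages` and the sample data are made up for this example.

```python
def leverages(xs):
    """Leverage of each case in a simple (one-predictor) linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    # Sxx: total squared distance of the x-values from their mean
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]

# Hypothetical data: five clustered x-values and one far from the rest
xs = [1, 2, 3, 4, 5, 20]
h = leverages(xs)
for x, h_i in zip(xs, h):
    print(f"x = {x:>2}  leverage = {h_i:.3f}")
```

In this made-up data set the case at x = 20 has by far the largest leverage because it sits farthest from the mean of x. The y-values play no part in the calculation.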

3
Q

Define ‘Influential point’.

A

A point that, if omitted from the data, results in a very different regression model.
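
One way to check for influence is to fit the line twice, once with the suspect point and once without, and compare the results. The sketch below does this with a hand-written least-squares helper (`fit_line`, a name chosen for this example) and invented data.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: five points on y = 2x plus one unusual case at (20, 5)
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 6, 8, 10, 5]

slope_all, intercept_all = fit_line(xs, ys)
slope_omit, intercept_omit = fit_line(xs[:-1], ys[:-1])

print(f"with the point:    slope = {slope_all:.3f}, intercept = {intercept_all:.3f}")
print(f"without the point: slope = {slope_omit:.3f}, intercept = {intercept_omit:.3f}")
```

With these numbers, omitting the single case at x = 20 moves the slope from roughly 0.02 to 2, so that case would count as influential.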

4
Q

Define ‘Extrapolation’.

A

Although linear models provide an easy way to predict values of y for a given value of x, it is unsafe to predict for values of x far from the ones used to find the linear model equation. At times, such extrapolations may pretend to see into the future, but the predictions should not be trusted.
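
A simple guard against accidental extrapolation is to flag any prediction whose x-value lies outside the range of the data used to fit the line. The sketch below is one possible version; `predict_with_range_check` is a hypothetical helper, the data are invented, and a plain range check is only a crude stand-in for "far from the observed x-values".

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_with_range_check(xs, ys, x_new):
    """Return (prediction, extrapolating) for a new x-value."""
    slope, intercept = fit_line(xs, ys)
    # True when x_new falls outside the x-values the model was fitted on
    extrapolating = x_new < min(xs) or x_new > max(xs)
    return intercept + slope * x_new, extrapolating

# Hypothetical data with x-values between 1 and 5
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

inside = predict_with_range_check(xs, ys, 3)
outside = predict_with_range_check(xs, ys, 50)
print("x = 3:", inside)
print("x = 50:", outside)
```

The model returns a number for x = 50 just as readily as for x = 3; only the flag signals that the second prediction rests on no nearby data.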

5
Q

Define ‘Lurking variable’.

A

A variable that is not explicitly part of a model but affects the way the variables in the model appear to be related. Because we can never be certain that observational data are not hiding a lurking variable that influences the relationship between x and y, it is never safe to conclude that a linear model demonstrates a causal relationship, no matter how strong the linear association.
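
A small constructed example can show how a lurking variable produces a strong association on its own. In the sketch below, a third variable z (invented for this illustration) shifts both x and y; within each level of z the two are uncorrelated, yet the pooled data show a correlation close to 1. The `correlation` helper is written out by hand for this example.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Hypothetical cases as (z, x, y): z adds the same amount to x and to y
cases = [
    (0, 0, 1), (0, 1, 0), (0, 0, 0), (0, 1, 1),
    (10, 10, 11), (10, 11, 10), (10, 10, 10), (10, 11, 11),
]

# Correlation of x and y when z is ignored
r_pooled = correlation([x for _, x, _ in cases], [y for _, _, y in cases])

# Correlation of x and y among the cases that share z = 0
low = [(x, y) for z, x, y in cases if z == 0]
r_within = correlation([x for x, _ in low], [y for _, y in low])

print(f"pooled r = {r_pooled:.3f}, r within z = 0: {r_within:.3f}")
```

Here x has no effect on y at all; the pooled association comes entirely from z, which the regression of y on x never sees.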

6
Q

Be careful when you interpret regressions based on summarized data or with restricted data ranges. What do they do to regressions?

A

Regressions with summaries tend to look stronger than the regression based on all the individual cases, whereas restricting the ranges of variables usually makes regressions weaker.
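
Both effects can be seen in a small invented data set. The sketch below compares the correlation for all individual cases, for the group means only, and for a restricted range of x; the data and the `correlation` helper are made up for this illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Hypothetical individual cases: three y-values at each of x = 1, 2, 3
xs = [1, 1, 1, 2, 2, 2, 3, 3, 3]
ys = [0, 1, 2, 1, 2, 3, 2, 3, 4]
r_individual = correlation(xs, ys)

# Summarized data: one mean y per distinct x
group_xs = sorted(set(xs))
group_means = [
    sum(y for x, y in zip(xs, ys) if x == gx) / xs.count(gx) for gx in group_xs
]
r_means = correlation(group_xs, group_means)

# Restricted range: keep only the cases with x <= 2
kept = [(x, y) for x, y in zip(xs, ys) if x <= 2]
r_restricted = correlation([x for x, _ in kept], [y for _, y in kept])

print(f"individual cases: r = {r_individual:.3f}")
print(f"group means:      r = {r_means:.3f}")
print(f"restricted range: r = {r_restricted:.3f}")
```

Averaging removes the scatter within each group, so the means line up more tightly than the individual cases do, while cutting the x-range leaves the same scatter spread over less of the trend.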
