STAT MOD 3: Chapter 4 Flashcards

Describing Bivariate Numerical Data

1
Q

What is a scatter plot? What are the two variables?

A

A graph in which each point represents the combination of two measurements for an individual observation
- used for bivariate numerical data
- Explanatory (independent) variable on the x axis and response (dependent) variable on the y axis

2
Q

What is the form of a relationship? What are the two types of relationship?

A

The overall pattern that the points in the scatter plot follow

Linear: pattern of relationship resembles a straight line
Curved: pattern of relationship resembles a curve

3
Q

What is direction? What are the two types of direction?

A

If the relationship is linear, you can determine its direction

Positive association: values of one variable increase as values of the other variable increase (move in the same direction)

Negative association: values of one variable decrease as values of the other variable increase (move in opposite directions)

4
Q

What is strength of a relationship?

A

How closely the points follow a pattern (linear or curved); a strong relationship has little scatter around the pattern

5
Q

What is correlation coefficient (r)?

A

numerical objective measure that indicates
- strength
- direction
of linear relationship between two numeric variables

can range between -1 and 1
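
A minimal sketch of how r can be computed by hand, assuming the usual Pearson formula (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name and the study-hours data are made up for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 63, 71, 80]
r = correlation(hours, scores)  # positive and close to 1 for this data
```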

6
Q

How do you interpret correlation coefficient?

A

Sign of (r) indicates the direction of the relationship; the magnitude (absolute value) of (r) indicates the strength
- can range between -1 and 1

Between -0.5 and 0.5 = weak
Between -0.8 and -0.5 or between 0.5 and 0.8 = moderate
Between -1 and -0.8 or between 0.8 and 1 = strong
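
The cutoffs above can be sketched as a small helper. The function name is hypothetical, and the handling of values exactly at 0.5 and 0.8 is an assumption, since the card does not say which side the boundaries belong to.

```python
def describe_correlation(r):
    """Return (strength, direction) for a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    # Direction comes from the sign of r
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    # Strength comes from the magnitude of r
    # (assumption: exactly 0.5 counts as moderate, exactly 0.8 as strong)
    size = abs(r)
    if size < 0.5:
        strength = "weak"
    elif size < 0.8:
        strength = "moderate"
    else:
        strength = "strong"
    return strength, direction
```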

7
Q

Is the correlation between x and y different from correlation between y and x?

A

The correlation between x and y is the same as the correlation between y and x

8
Q

What kind of variables does correlation require?

A

Correlation requires that both variables be quantitative

(cannot compute a correlation between two categorical variables or between a categorical variable and a quantitative variable)

9
Q

Does correlation change when doing transformations/conversions between units?

A

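Correlation does not change when units are converted (for example, inches to centimeters), or more generally when either variable is rescaled by adding a constant or multiplying by a positive constant.

This happens because all observations are standardized in the calculation of correlation.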
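
A quick check of this property, as a sketch with made-up height and weight data: converting inches to centimeters and pounds to kilograms should leave r unchanged (up to floating-point rounding). The helper name is hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Hypothetical data: heights in inches, weights in pounds
heights_in = [60, 64, 66, 70, 74]
weights_lb = [115, 130, 150, 160, 190]

# Unit conversions multiply each value by a positive constant
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]

r_imperial = correlation(heights_in, weights_lb)
r_metric = correlation(heights_cm, weights_kg)
```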

10
Q

What is the unit of correlation?

A

The correlation (r) has no unit of measurement—it is just a number

11
Q

Interpreting correlation:

What does positive or negative r indicate?

A

Positive (r) indicates positive association

negative (r) indicates negative association

12
Q

What is the range of correlation?

A

(r) is always between -1 and 1

  • values of r near 0 indicate little or no linear association
  • values of r close to -1 or 1 indicate strong linear association
  • r = -1 or r = 1 means the points fall exactly on a straight line
13
Q

What does correlation only measure?

A

Correlation only measures the strength of a linear relationship
- a curved relationship can be strong and still have a correlation near zero, so r does not describe curved patterns
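
A small sketch of what r does with curved data, using made-up points on y = x^2. On a symmetric range of x the perfectly curved pattern gives r = 0, while on a one-sided range the same curve gives a large r, so r by itself does not reveal whether a pattern is curved. The helper name is hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Symmetric range: the left and right halves of the parabola cancel out
xs_symmetric = [-3, -2, -1, 0, 1, 2, 3]
r_symmetric = correlation(xs_symmetric, [x ** 2 for x in xs_symmetric])

# One-sided range: the same curve is increasing throughout
xs_one_sided = [0, 1, 2, 3, 4, 5, 6]
r_one_sided = correlation(xs_one_sided, [x ** 2 for x in xs_one_sided])
```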

14
Q

Is correlation affected by outliers?

A

Correlation is a non-resistant measure (affected by outliers)
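
To illustrate, here is a sketch with made-up data in which five points lie exactly on a line and a single outlier is then added. The helper name is hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Five points exactly on the line y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_clean = correlation(xs, ys)

# The same points plus one outlier at (6, 0)
r_with_outlier = correlation(xs + [6], ys + [0])
```

In this example one added point is enough to pull r from a perfect linear fit down into the weak range.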

15
Q

What are potential reasons for observed association between explanatory and response variable?

A

1) Causation (best way to establish is through randomized experiment)

2) Confounding variable (there may be causation, but confounding variables make causation hard to prove)

3) Lurking variable (no causation; association can be explained by other variables affecting both explanatory/response)

4) Response variable is causing change in explanatory variable

16
Q

What is a confounding variable?

A

A variable that is not the main concern of the study but may be partially responsible for the observed results

  • there may be causation, but it is tied up with the effect of the confounding variable
17
Q

What is a lurking variable?

A

A variable that affects both the x and y variables, causing us to see an association between them

  • no causation between x and y
  • other variables affect both explanatory/response
18
Q

What is a regression line?

A

A straight line that describes how values of a response variable (y) are related on average to values of explanatory variable (x)

  • used to estimate average value of y at specific value of x
  • used to predict unknown value of y for an individual, given individual’s x value
19
Q

What are components of regression line/equation?

A

y hat = a + bx

20
Q

What is a?

A

intercept
- the predicted value of y when x = 0

21
Q

What is b?

A

slope
- the amount that the predicted y value changes, on average, when x increases by one unit

22
Q

What does it mean if the slope of the regression line is positive? What if it’s negative?

A
  • When slope is positive, direction of relationship is positive (y increases as x increases)
  • When slope is negative, direction of relationship is negative (y decreases as x increases)
23
Q

What is residual?

A

observed y value - predicted y value (the predicted value comes from the regression equation)
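
A minimal sketch of the residual calculation. The line y hat = 10 + 2x and the observed values are made up for illustration, and the function names are hypothetical.

```python
def predict(a, b, x):
    """Predicted y from the regression equation y hat = a + bx."""
    return a + b * x

def residual(a, b, x, y_observed):
    """Observed y minus predicted y."""
    return y_observed - predict(a, b, x)

# Hypothetical regression line: y hat = 10 + 2x
a, b = 10, 2

# At x = 5 the line predicts 20
res_above = residual(a, b, 5, 23)  # observed 23 is above the line
res_below = residual(a, b, 5, 18)  # observed 18 is below the line
```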

24
Q

Interpreting residuals:

What does a positive residual mean?

A

data point falls above regression line

prediction was an underestimate of the observed value

25
Q

Interpreting residuals:

What does a negative residual mean?

A

data point falls below regression line

prediction was an overestimate of the observed value
26
Q

Why is the regression line we use called least squares regression?

A

We want a line that comes as close as possible to the points, so we use the least squares regression coefficients because they minimize the sum of the squared residuals
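
A sketch of a least squares fit, assuming the standard closed-form solution for one explanatory variable (slope = sum of cross-products of deviations divided by the sum of squared x deviations, intercept = mean of y minus slope times mean of x). Function names and data are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y hat = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Sum of (observed y - predicted y)^2 for the line y hat = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(a, b, xs, ys)
```

Nudging a or b away from the fitted values and recomputing `sum_squared_residuals` gives a larger total, which is what "least squares" refers to.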
27
Q

What is the coefficient of determination (r^2)?

A

the proportion of variation in the y variable explained by the x variable
28
Q

What are characteristics of r^2?

A

- ranges between 0 and 1
- high r^2 is better (indicates that a large proportion of the variability in y can be explained by the approximate linear relationship between x and y)
- near 0 means that x doesn't tell us much about y
- near 1 means that x tells us a lot about y
29
Q

How to compute r^2 from r?

A

square r
30
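Q

How to compute r from r^2?

A

take the square root of r^2, then give r the same sign as the slope of the regression line (the square root alone gives only the magnitude of r)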
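
Because squaring discards the sign, recovering r from r^2 needs the direction of the relationship as well. A minimal sketch, with a hypothetical function name, that takes the sign from the regression slope:

```python
from math import sqrt

def r_from_r_squared(r_squared, slope):
    """Recover r from r^2, using the sign of the regression slope."""
    if not 0 <= r_squared <= 1:
        raise ValueError("r^2 must be between 0 and 1")
    magnitude = sqrt(r_squared)
    # r and the slope of the least squares line always share a sign
    return magnitude if slope >= 0 else -magnitude
```

For example, r^2 = 0.64 with a negative slope corresponds to r = -0.8, not 0.8.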
31
Q

What is standard error (Se)?

A

the typical amount by which an observation deviates from the least squares regression line
- smaller Se is better: it indicates that residuals tend to be small
- tells you how much accuracy you can expect when using the least squares regression line to make predictions
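
A sketch of one common way to compute Se for a line with one explanatory variable: the square root of the sum of squared residuals divided by n - 2. The card does not state a formula, so this is an assumption, and the function name, coefficients, and data are made up for illustration.

```python
from math import sqrt

def standard_error(a, b, xs, ys):
    """Se for the line y hat = a + bx (assumes more than two points)."""
    n = len(xs)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Divide by n - 2 because two coefficients (a and b) were estimated
    return sqrt(sse / (n - 2))

# Hypothetical data and its least squares line y hat = 2.2 + 0.6x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
se = standard_error(2.2, 0.6, xs, ys)
```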
32
Q

Locate regression values on Excel output (slope, intercept, r, r^2, Se)

A

- slope: bottom row of the coefficients table
- intercept: "Intercept" row, above the slope on the coefficients table
- r: Multiple R
- r^2: R Square (coefficient of determination)
- Se: Standard Error (in the Regression Statistics table)
33
Q

What is the purpose of a residual plot?

A

a scatter plot of the (x, residual) pairs that can indicate potential problems
34
Q

How does a residual plot evaluate whether a linear regression model is appropriate for the relationship between two variables?

A

- Curves indicate the data does not follow a linear form
- Fanning (spread that widens or narrows across x) indicates that the size of the residuals depends on the x-values
35
Q

What is extrapolation?

A

using a regression line to predict y-values for x-values outside the observed range of the data
- making predictions far outside given x values
36
Q

What is the risk of extrapolation?

A

Extrapolation becomes riskier the farther we move from the range of the given x-values, because there is no guarantee that the relationship will have the same trend outside the range of x-values observed
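
One simple guard, sketched with a hypothetical function name and made-up data, is to check whether a new x value lies outside the observed range before trusting a prediction:

```python
def is_extrapolation(x_new, observed_xs):
    """True if x_new lies outside the range of the observed x-values."""
    return x_new < min(observed_xs) or x_new > max(observed_xs)

# Hypothetical observed x-values used to fit a regression line
observed_xs = [12, 15, 18, 22, 30]

inside = is_extrapolation(20, observed_xs)   # within 12 to 30
outside = is_extrapolation(45, observed_xs)  # beyond the largest x
```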