Chapter 3: Exploring Two-Variable Quantitative Data Flashcards

1
Q

Define Response Variable

A

a quantitative variable that measures the outcome of a study. It is the y-variable.

2
Q

Define Explanatory Variable

A

a quantitative variable that tries to predict or explain changes in the response variable. It is the x-variable.

3
Q

Define Scatterplot

A

a graph that shows the relationship between two quantitative variables measured on the same individuals. Each individual appears as a point (x, y) on the graph.

4
Q

Describing a Scatterplot

A

DUFS + context

  • Direction (positive/negative/no association)
  • Unusual Features (points that don’t fit the overall pattern, or distinct clusters)
  • Form (linear/non-linear)
  • Strength (strong/moderate/weak)
5
Q

Define Correlation

A

measures the strength and direction of a LINEAR association

symbol = r (equation on the formula sheet)
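The usual formula-sheet form is r = (1/(n - 1)) Σ ((x_i - x bar)/Sx)((y_i - y bar)/Sy), the average product of the z-scores with n - 1 in the denominator. A minimal Python sketch of that calculation; the data values are hypothetical, and Python's `statistics` module supplies the means and sample standard deviations:

```python
from statistics import mean, stdev

def correlation(x, y):
    """r = (1/(n - 1)) * sum of (z-score of x) * (z-score of y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y))
    return total / (n - 1)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correlation(x, y), 4))  # 0.7746
```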

6
Q

Facts about Correlation [r]

A
  1. -1 ≤ r ≤ 1
  2. indicates direction by its sign
  3. r = ±1 = perfect linear relationship; r near 0 = weak or no linear relationship
  4. both variables must be quantitative to calculate r
  5. does not matter which variable is x or y
  6. is not affected by changes in the units of measurement
  7. has no unit of measurement
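A quick check of facts 5 and 6, as a sketch with made-up height and weight values: swapping which variable is x and which is y, or converting heights from inches to centimeters, leaves r unchanged.

```python
from statistics import mean, stdev

def correlation(x, y):
    # r = (1/(n - 1)) * sum of (z-score of x) * (z-score of y)
    x_bar, y_bar, s_x, s_y = mean(x), mean(y), stdev(x), stdev(y)
    return sum((xi - x_bar) / s_x * (yi - y_bar) / s_y for xi, yi in zip(x, y)) / (len(x) - 1)

# Hypothetical heights (inches) and weights (pounds), for illustration only
height_in = [60, 62, 65, 68, 71]
weight_lb = [115, 120, 140, 155, 170]

r = correlation(height_in, weight_lb)
r_swapped = correlation(weight_lb, height_in)   # fact 5: x and y switched
height_cm = [2.54 * h for h in height_in]       # fact 6: change the units of x
r_metric = correlation(height_cm, weight_lb)

print(round(r, 6), round(r_swapped, 6), round(r_metric, 6))  # all three agree
```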
7
Q

Cautions about Correlation [r]

A
  1. correlation ≠ causation
  2. does not measure form
  3. should only be used to describe linear relationships
  4. not a resistant measure of strength (a single outlier can change r a lot)
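A sketch of cautions 2 through 4 with hypothetical data: points lying exactly on the curve y = x^2 give r = 0 even though the relationship is perfect, and adding one unusual point at (10, 0) to otherwise positively associated data moves r from about 0.77 to about -0.55.

```python
from statistics import mean, stdev

def correlation(x, y):
    # r = (1/(n - 1)) * sum of (z-score of x) * (z-score of y)
    x_bar, y_bar, s_x, s_y = mean(x), mean(y), stdev(x), stdev(y)
    return sum((xi - x_bar) / s_x * (yi - y_bar) / s_y for xi, yi in zip(x, y)) / (len(x) - 1)

# Cautions 2 and 3: a perfect curved (non-linear) relationship still has r = 0
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # y = x^2 exactly
print(correlation(x, y))   # essentially 0

# Caution 4: r is not resistant; one unusual point changes it a lot
x2 = [1, 2, 3, 4, 5]
y2 = [2, 4, 5, 4, 5]
print(round(correlation(x2, y2), 4))              # 0.7746
print(round(correlation(x2 + [10], y2 + [0]), 4)) # about -0.55: the sign flips
```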
8
Q

Define Regression Line

A

a line that models how a response variable changes as an explanatory variable changes

y hat (predicted y) = a + bx, where a = y-intercept and b = slope

9
Q

Define Extrapolation

A

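using the regression line to predict y for x-values outside the interval of x-values that were used to calculate the line.
often extremely unreliable, because the linear pattern may not continue outside that interval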
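A small sketch of flagging extrapolation. `predict` is a hypothetical helper (not a calculator or library function) that returns y hat together with whether x falls outside the range of x-values used to fit the line; the line y hat = 2.2 + 0.6x and its x-values are assumed for illustration.

```python
def predict(a, b, x_new, x_data):
    """Hypothetical helper: return (y hat, True if x_new is outside the fitted x-range)."""
    y_hat = a + b * x_new
    extrapolating = not (min(x_data) <= x_new <= max(x_data))
    return y_hat, extrapolating

# Hypothetical LSRL y hat = 2.2 + 0.6x, fit to x-values from 1 to 5
x_data = [1, 2, 3, 4, 5]
print(predict(2.2, 0.6, 3.5, x_data))  # inside the data range: flag is False
print(predict(2.2, 0.6, 40, x_data))   # far outside: flag is True, do not trust y hat
```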

10
Q

Define Residual

A

the difference between the actual value of y and the predicted value of y: residual = y - y hat.
- (+) = actual point is above the regression line; underprediction
- (-) = actual point is below the regression line; overprediction
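A minimal sketch of computing residuals and reading their signs, assuming hypothetical data whose LSRL is y hat = 2.2 + 0.6x:

```python
def residuals(x, y, a, b):
    """Residual = actual y - predicted y, for the line y hat = a + b*x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def describe(res):
    # Positive residual: point above the line, so the line underpredicted
    if res > 0:
        return "above the line (underprediction)"
    # Negative residual: point below the line, so the line overpredicted
    if res < 0:
        return "below the line (overprediction)"
    return "on the line"

# Hypothetical data and its LSRL y hat = 2.2 + 0.6x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
res = residuals(x, y, 2.2, 0.6)
for xi, r in zip(x, res):
    print(f"x = {xi}: residual = {r:+.1f}, point is {describe(r)}")
```

For the LSRL the residuals always sum to zero, which is a handy arithmetic check.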

11
Q

Interpretation of Correlation [r]

A

The linear association between __x-context__ and __y-context__ is __weak/moderate/strong (strength)__ and __positive/negative (direction)__.

ex. the linear association between __student absences__ and __final grades__ is __fairly strong__ and __negative__ (r = -0.93).

12
Q

Interpretation of Residual

A

The actual __y-context__ was __residual__ __above/below__ the predicted value when __x-context = #__.

ex. the actual __heart rate__ was __4.5 beats per minute__ __above__ the value predicted when __Matt ran for 5 minutes__.

13
Q

Interpretation of Y-Intercept

A

The predicted __y-context__ when __x-context = 0__ is __y-intercept__.

  • the interpretation has to make sense when x = 0. When it does not, write “does not make sense when x = 0.”

ex. the predicted __time to checkout at the grocery store__ when there are __0 customers in line__ is __72.95 seconds__.

14
Q

Interpretation of Slope

A

The predicted __y-context__ __increases/decreases__ by __slope__ for each additional __x-context__.

ex. the predicted __heart rate__ __increases__ by __4.3 beats per minute__ for each additional __minute jogged__.

15
Q

Interpretation of Standard Deviation of Residuals [s]

A

The actual __y-context__ is typically about __s__ away from the value predicted by the LSRL.

ex. The actual __SAT score__ is typically about __14.3 points__ away from the value predicted by the LSRL.

16
Q

Interpretation of Coefficient of Determination [r^2]

A

About __r^2 %__ of the variation in __y-context__ can be explained by the linear relationship with __x-context__.

ex. About __87.3%__ of the variation in __electricity production__ is explained by the linear relationship with __wind speed__.

17
Q

Define Least-Squares Regression (LSR) Line

A

the line that makes the sum of the squared residuals as small as possible
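A sketch illustrating the "least squares" idea with hypothetical data: the LSRL y hat = 2.2 + 0.6x gives a sum of squared residuals of 2.4, and each of the nudged lines tried below gives a larger sum.

```python
def sum_squared_residuals(x, y, a, b):
    # Sum of (actual y - predicted y)^2 for the line y hat = a + b*x
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data; its LSRL is y hat = 2.2 + 0.6x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
best = sum_squared_residuals(x, y, 2.2, 0.6)
print(round(best, 4))  # 2.4

# Nudging the intercept and/or slope away from the LSRL makes the total larger
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05), (0.1, -0.05)]:
    other = sum_squared_residuals(x, y, 2.2 + da, 0.6 + db)
    assert other > best
    print(f"a = {2.2 + da:.2f}, b = {0.6 + db:.2f}: sum of squared residuals = {other:.4f}")
```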

18
Q

Define Residual Plot

A

a scatterplot that shows the residuals as the y-values and the explanatory variable as the x-variable.
- if there is no leftover pattern, the linear regression model is appropriate
- if there is a leftover pattern (such as a curve), the linear model is not appropriate –> a different form of regression model is needed

  • menu - 4 - 7 - 2
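Without a calculator, the same idea can be sketched in Python as a text-only stand-in for the plot. Fitting a line to hypothetical curved data (y = x^2) leaves residuals that go +, -, -, -, +, a clear curved leftover pattern.

```python
from statistics import mean

def lsrl(x, y):
    """Least-squares intercept a and slope b."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical curved data: y = x^2
x = [0, 1, 2, 3, 4]
y = [xi ** 2 for xi in x]
a, b = lsrl(x, y)  # a = -2.0, b = 4.0
res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(res)         # [2.0, -1.0, -2.0, -1.0, 2.0]

# Signs of the residuals in x order: a U-shaped pattern means "not linear"
signs = "".join("+" if r > 0 else "-" if r < 0 else "0" for r in res)
print(signs)       # +---+
```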
19
Q

Other measures of how well the LSR line fits the data

A
  1. Standard Deviation of the Residuals, s
  2. Coefficient of Determination, r^2
20
Q

Define Standard Deviation of the Residuals, s

A

measures the typical distance between the actual y-values and the predicted y-values.

equation on the formula sheet
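The usual formula is s = sqrt( Σ(y_i - y hat_i)^2 / (n - 2) ). A minimal sketch, assuming hypothetical data whose LSRL is y hat = 2.2 + 0.6x:

```python
from math import sqrt

def residual_sd(x, y, a, b):
    """s = sqrt(sum of squared residuals / (n - 2)) for the line y hat = a + b*x."""
    n = len(x)
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return sqrt(ssr / (n - 2))

# Hypothetical data and its LSRL
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(residual_sd(x, y, 2.2, 0.6), 3))  # 0.894
```

Here the actual y-values are typically about 0.894 units away from the values predicted by the LSRL.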

21
Q

Define Coefficient of Determination, r^2

A

measures the percent reduction in the sum of squared residuals when using the LSR line to make predictions, instead of just using the mean of the y-values.

r^2 is the percent of the variability in the response variable (y in context) that is accounted for by the LSR line.
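The "percent reduction" definition can be written as r^2 = 1 - (sum of squared residuals using the LSRL) / (sum of squared residuals using y bar). A sketch with hypothetical data whose LSRL is y hat = 2.2 + 0.6x:

```python
from statistics import mean

# Hypothetical data and its LSRL
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6
y_bar = mean(y)

sst = sum((yi - y_bar) ** 2 for yi in y)                     # predict every y with y bar
ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # predict with the LSRL
r_squared = 1 - ssr / sst
print(round(r_squared, 3))  # 0.6
```

For this data r = 0.7746, and 0.7746^2 is about 0.6, matching the value above: about 60% of the variation in y is accounted for by the LSR line.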

22
Q

Calculate LSR Line

A
  1. use the b equation, b = r(Sy/Sx) –> find b
  2. use the y bar equation, y bar = a + b(x bar) –> find a
  3. use the y hat equation, y hat = a + bx –> plug in a and b
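The three steps can be sketched in Python with hypothetical data; `lsrl` is an illustrative helper name, and r comes from the z-score formula:

```python
from statistics import mean, stdev

def lsrl(x, y):
    """Return (a, b) for y hat = a + b*x, following the three steps."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    r = sum((xi - x_bar) / s_x * (yi - y_bar) / s_y for xi, yi in zip(x, y)) / (n - 1)
    b = r * s_y / s_x        # step 1: b = r(Sy/Sx)
    a = y_bar - b * x_bar    # step 2: y bar = a + b(x bar), solved for a
    return a, b

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = lsrl(x, y)
print(f"y hat = {a:.1f} + {b:.1f}x")  # step 3: y hat = 2.2 + 0.6x
```

Because step 2 uses y bar = a + b(x bar), the resulting line always passes through (x bar, y bar).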
23
Q

Facts about the LSR Line

A
  1. distinction between the explanatory and response variables is essential
  2. close connection between the correlation and slope of the LSR Line (same sign)
  3. the LSR line always passes through (x bar, y bar) -> (y hat - y bar) = r(Sy/Sx)(x - x bar)
  4. the LSR line is not resistant (outliers and influential points can change it substantially)
24
Q

Define Outlier in regression

A

a point that does not follow the pattern of the data and has a large residual

25
Q

Define High Leverage in regression

A

points that have much larger or much smaller x-values than the other points in the data set

26
Q

Define Influential Point in regression

A

any point that, if removed, substantially changes the slope, y-intercept, correlation, coefficient of determination, or standard deviation of the residuals

27
Q

Best way to investigate the influence of such points [outliers and high-leverage]

A

outliers and high-leverage points can be influential in regression calculations. The best way to investigate their influence is to do the regression calculations with and without them and see how much the results differ.
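A sketch of the "with and without" comparison using hypothetical data. The added point (10, 0) has high leverage (its x-value is far from the others) and turns out to be influential: the slope changes from 0.6 to about -0.34 and r from about 0.77 to about -0.55.

```python
from statistics import mean

def fit(x, y):
    """Return the LSRL intercept a, slope b, and correlation r."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return {"a": y_bar - b * x_bar, "b": b, "r": sxy / (sxx * syy) ** 0.5}

# Hypothetical data, then the same data plus one unusual point at (10, 0)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
without = fit(x, y)
with_point = fit(x + [10], y + [0])

for key in ("a", "b", "r"):
    print(f"{key}: without = {without[key]:.3f}, with = {with_point[key]:.3f}")
```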
28
Q

Four types of Outliers, High Leverage, and Influential Points

A
  1. outlier, not influential
  2. outlier, influential
  3. high leverage, not influential
  4. high leverage, influential

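29
Q

Transforming to Achieve Linearity

A
  1. Linear = x vs. y
  2. Power = log(x) vs. log(y), or ln(x) vs. ln(y)
  3. Exponential = x vs. log(y)
for numbers 2 and 3, undo the log to find y hat: 10^# if log was used, or e^# if ln was used (# = the value predicted by the transformed line).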
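A sketch of the power-model recipe with hypothetical data that follow y = 3x^2 exactly: fit a line to log(x) vs. log(y), then undo the log with 10^# to get y hat. Base-10 logs are assumed here; with ln, use e^# instead.

```python
from math import log10
from statistics import mean

def lsrl(x, y):
    """Least-squares intercept a and slope b."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical data that follow a power model exactly: y = 3x^2
x = [1, 2, 3, 4, 5]
y = [3, 12, 27, 48, 75]

# Power model: fit a line to log(x) vs. log(y)
a, b = lsrl([log10(v) for v in x], [log10(v) for v in y])
print(round(b, 3), round(10 ** a, 3))  # 2.0 (the power) and 3.0 (the multiplier)

# Predict y at x = 6: get the predicted log(y), then undo the log with 10^#
log_y_hat = a + b * log10(6)
print(round(10 ** log_y_hat, 3))       # 108.0
```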