AP Stat Ch 3 and 12.2 Flashcards

0
Q

Explanatory variable

A

A variable that may help explain or predict changes in a response variable.
Also called the independent variable. Plotted on the x axis.

1
Q

Response variable

A

Measures an outcome of a study.

Also called the dependent variable. Plotted on the y axis.

2
Q

Which is the explanatory variable?

  1. Scuba diving: depth and visibility
  2. World population vs. year
  3. Amount of rain vs. crop growth
  4. Height vs. GPA
A
  1. Depth
  2. Year
  3. Amount of rain
  4. Neither (no association expected)
3
Q

Scatter plot

A

The most effective way to display the relationship between two quantitative variables measured on the same individuals.

4
Q

Tips for drawing scatterplot by hand

A
  1. Plot explanatory variable on x axis
  2. Label both axes
  3. Scale the axes with uniform intervals
  4. Make plot large enough to see details
5
Q

Four major features in interpreting scatter plots

A

Direction
Form
Scatter
Outliers

6
Q

Direction

A

A pattern from the upper left to the lower right is said to have a negative direction. A pattern from lower left to upper right has a positive direction.

7
Q

Form

A

The overall shape of the pattern: approximately linear, curved, exponential…

8
Q

Scatter

A

Strength of the relationship: how closely the points follow the form.

Described on a scale from weak to strong.

9
Q

Positive vs negative association

A

Positive when above-average values of one variable tend to accompany above-average values of the other. Slope is positive.
Negative when above-average values of one variable tend to accompany below-average values of the other. Slope is negative.

10
Q

Correlation

A

The correlation, r, is a common measure used to numerically assess the association between two quantitative variables. It measures the direction and strength of a linear relationship, on a scale of -1 to 1.
Indicates direction by its sign and strength by how far r is from 0.
Obtained on the calculator from the STAT menu, CALC, option 8.
Don’t need to calculate by hand, but it is the sum of the products of the z-scores (z of x times z of y for each point), divided by n-1.
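
A minimal Python sketch of the by-hand calculation, assuming the usual definition of r as the sum of the products of z-scores divided by n - 1. The data (hours, score) and the function name are made up for illustration only.

```python
import math

def correlation(xs, ys):
    """r = sum of (z_x * z_y) over all points, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Product of the two z-scores for each individual.
    z_products = [((x - mean_x) / sx) * ((y - mean_y) / sy)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

# Hypothetical data, not from the flashcards.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]
r = correlation(hours, score)
print(round(r, 3))
```

The result always lands between -1 and 1, with the sign giving the direction.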

11
Q

What happens as r gets closer to 0

A

The linear relationship gets weaker.

It gets stronger as r moves further from zero (toward -1 or 1).

12
Q

Properties of r

A
  1. No units
  2. Doesn’t depend on which variable is x and which is y, since the product of the z-scores of x times y is the same as y times x.
  3. Correlation requires both variables to be quantitative
  4. -1 <= r <= 1
    When r is greater than zero, relationship is positive.
    When r is less than zero, relationship is negative.
  5. r = 1 or -1 only when the data are perfectly linear.
  6. Value of r is a measure of the strength of a linear relationship only. Measures how closely the data fall along a straight line. An r value near zero doesn’t indicate no relationship, but rather no linear relationship.
  7. Not resistant: outliers can strongly affect r.
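
Several of these properties can be checked numerically. The sketch below is only an illustration with made-up data: it swaps x and y, changes the units of x, adds one unusual point, and tries a perfectly curved data set.

```python
import math

def corr(xs, ys):
    """Correlation r computed from sums of squares and cross products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical, nearly linear data.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]

r_xy = corr(x, y)
r_yx = corr(y, x)                              # swap the roles of x and y
r_rescaled = corr([v * 2.54 for v in x], y)    # change the units of x
r_with_outlier = corr(x + [7], y + [0.0])      # add one unusual point

# A perfect curved relationship (y = x^2) with no linear trend.
curve_x = [-2, -1, 0, 1, 2]
r_curved = corr(curve_x, [v ** 2 for v in curve_x])

print(r_xy, r_yx, r_rescaled, r_with_outlier, r_curved)
```

Swapping and rescaling leave r unchanged, the single unusual point pulls r down sharply, and the curved data give r of zero even though y is completely determined by x.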
13
Q

Don’t confuse correlation with causation

A

Just because the number of students taking stats has increased while murders are down doesn’t mean that one causes the other.

14
Q

Regression line

A

A line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.
Calculator: STAT, CALC, option 8
LSRL: least-squares regression line
y hat = a + bx

15
Q

LSRL form

A

y hat = a + bx
a is the y intercept
b is the slope
y hat is the value of y predicted by the model
When interpreting the slope, always say "according to the model": as x increases by one unit, y is predicted to change by b.
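
A short sketch of how a and b can be computed, assuming the standard least-squares formulas (slope = sum of cross products over sum of squares of x, and the line passes through the point of means). These formulas are not stated on the cards, and the data are made up.

```python
def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                # slope
    a = mean_y - b * mean_x      # intercept: line goes through (mean_x, mean_y)
    return a, b

# Hypothetical data: hours studied and test score.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]

a, b = lsrl(hours, score)
predicted = a + b * 3.5          # prediction inside the range of the data
print(a, b, predicted)           # 46.0 6.0 67.0
```

Interpretation in the card's wording: according to the model, each additional hour is associated with a predicted increase of 6 points.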

16
Q

Extrapolation

A

Predicting with the regression line outside the domain of the data. DO NOT EXTRAPOLATE EVER

17
Q

Residuals

A

The difference between an observed value of the response variable and the value predicted by the regression line. The vertical distance from the point to the line.
Residual = y minus y hat (observed minus predicted)
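
A small illustration of residuals as observed minus predicted. The line y hat = 46 + 6x and the data are assumptions chosen for the example, not values from the cards.

```python
# Hypothetical data and an assumed regression line y hat = 46 + 6x.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]
a, b = 46, 6

predicted = [a + b * x for x in hours]
residuals = [y - y_hat for y, y_hat in zip(score, predicted)]

for x, y, e in zip(hours, score, residuals):
    if e > 0:
        position = "above the line (prediction too low)"
    elif e < 0:
        position = "below the line (prediction too high)"
    else:
        position = "on the line"
    print(x, y, e, position)
```

For a least-squares line the residuals sum to zero, which is the case for this example.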

18
Q

What does it mean when residual is pos/neg?

A

When pos, y is greater than y hat, so the observed value is above the prediction (above the LSRL).
When neg, y is less than y hat, so the observed value is below the prediction (below the LSRL).

19
Q

Important questions to consider with LSRL

A
  1. Is a linear model really appropriate, or would a curved model be better?
  2. Are there any unusual aspects of the data set?
  3. If we make predictions, how accurate will they be?
20
Q

Residual plot

A

Scatterplot of the regression residuals against the explanatory variable. If there is a clear pattern, a linear model is not the best choice. Random scatter around zero supports a linear model.
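
As a rough numerical sketch of what a residual pattern looks like, the example below fits a least-squares line to made-up curved data (y = x squared) and lists the residuals in order of x. No plotting library is assumed.

```python
def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b

# Hypothetical curved data: y = x^2.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

a, b = lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(residuals)   # positive, negative, negative, negative, positive
```

The residuals run positive, then negative, then positive again: a U-shaped pattern, which suggests a curved model would fit better than a line.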

21
Q

If an observation has a positive residual, then…

A

Y minus y hat is positive. So y is above the expected value. So y is above the line. The prediction is too low.

22
Q

If an observation has a negative residual, then…

A

Y minus y hat is negative, so y hat is larger. This means that the predicted value is too high. The observed value is below the line.

23
Q

Only way to tell you if a linear model is the best choice…

A

RESIDUAL PLOT PATTERN!

24
Q

Standard deviation about least squares regression line

A

Shows how close the observations tend to be to the line: the approximate size of a typical prediction error (residual).
Represented by s.
If s = 4 units (same units as y), the typical deviation of an observed y from its predicted value is about 4 units.
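
A minimal sketch of the calculation, assuming the usual formula s = square root of (sum of squared residuals divided by n - 2). The formula is not given on the card, and the residuals are made up.

```python
import math

# Hypothetical residuals (y minus y hat) from a least-squares fit to 5 points.
residuals = [0, 2, -3, 0, 1]
n = len(residuals)

sse = sum(e ** 2 for e in residuals)   # sum of squared residuals
s = math.sqrt(sse / (n - 2))           # n - 2 because the line estimates a and b
print(round(s, 2))
```

Here s comes out in the same units as y and reads as the typical size of a prediction error.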

25
Q

Coefficient of determination (r squared)

A

r squared is the proportion of the variability in the y variable that can be explained by the linear relationship between x and y.
Also tells you how well the LSRL predicts values of y.
Say that r squared = .74 (no units).
74% of the variability in the response variable is attributed to its linear relationship with years of experience. Equivalently, using the LSRL instead of the mean y value for every prediction reduces the sum of squared prediction errors by 74%.
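
One way to see the comparison with "using the mean every time" is to compute both sets of squared errors. This sketch assumes the standard identity r squared = 1 - SSE/SST and uses made-up data with an assumed line y hat = 46 + 6x.

```python
# Hypothetical data and an assumed least-squares line y hat = 46 + 6x.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]
a, b = 46, 6

mean_y = sum(score) / len(score)

# Squared error when predicting with the LSRL.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, score))
# Squared error when predicting the mean of y every time.
sst = sum((y - mean_y) ** 2 for y in score)

r_squared = 1 - sse / sst
print(round(r_squared, 3))
```

The closer SSE is to zero relative to SST, the closer r squared is to 1.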

26
Q

How to get r from r squared

A

Take the square root, then make it positive if the slope is positive and negative if the slope is negative.
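
The rule can be written as a tiny helper. The function name and the sample values are hypothetical.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Square root of r squared, with the sign taken from the LSRL slope."""
    magnitude = math.sqrt(r_squared)
    return magnitude if slope >= 0 else -magnitude

r_up = r_from_r_squared(0.74, 2.5)     # positive slope
r_down = r_from_r_squared(0.74, -2.5)  # negative slope
print(round(r_up, 2), round(r_down, 2))
```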

27
Q

Correlation and regression describe only…

A

LINEAR RELATIONSHIPS

28
Q

Is correlation resistant?

A

No. Correlation is not resistant: outliers can strongly affect r.

29
Q

Outlier

A

Observation that lies outside the overall pattern of the other observations.

30
Q

Influential point

A

An observation is influential if removing it would markedly change the result of the calculation. Points that are outliers in the x direction of a scatterplot are often influential.
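
A quick way to check influence is to fit the line with and without the point and compare. The sketch below uses made-up data with one point far out in the x direction.

```python
def slope(xs, ys):
    """Least-squares slope b for y hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            / sum((x - mean_x) ** 2 for x in xs))

# Hypothetical data: five points on a line, plus one point far to the right.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
outlier_x, outlier_y = 20, 5

slope_without = slope(xs, ys)
slope_with = slope(xs + [outlier_x], ys + [outlier_y])
print(slope_without, round(slope_with, 3))
```

One extra point drags the slope from 2 to nearly 0, so that point is influential.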

31
Q

Influential points and outlier relationship

A

An influential point is always an outlier.

An outlier is not always influential.

32
Q

Lurking variable

A

A variable that is not among the explanatory or response variables in a study, yet may influence the interpretation of relationships among those variables.

33
Q

Summary of chapter 3:

A
  1. Graph data
  2. Generate LSRL
  3. r tells us how well the data fit the line (correlation), regardless of whether this is the most appropriate model
  4. Residual plot tells us whether a linear model is appropriate.
  5. s tells us our typical error if/when we use the LSRL to predict y
  6. r squared tells us how much better our LSRL is at predicting y than using the mean y value every time. It is also the percent of variability in y explained by the linear relationship with x
34
Q

Exponential models

A

Take ln or log of just the y variable.
Fit a line to (x, ln y), then solve for y hat.
Resulting model: y hat = a * b^x

35
Q

Base of exponential function

A

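In the form y hat = a*b^x
b will be positive.
b = 1 +/- r, where r here is the growth or decay rate per unit of x (not the correlation).
For example, if b = 1.057, then y hat is increasing 5.7% for every increase of one in x
If b = .79, then y hat is decreasing 21% for every increase of one in x
When b is less than one, y hat is decreasing. When b is greater than one, y hat is increasing.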
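
A sketch of the whole exponential procedure: take ln of y, fit a least-squares line to (x, ln y), then undo the log to get a and b. The data are generated from an assumed model y = 100 * 1.057^x purely for illustration.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

# Hypothetical data that grow 5.7% per unit of x.
years = [0, 1, 2, 3, 4, 5]
balance = [100 * 1.057 ** t for t in years]

# ln(y hat) = intercept + slope * x, so y hat = e^intercept * (e^slope)^x.
intercept, slope = fit_line(years, [math.log(y) for y in balance])
a = math.exp(intercept)
b = math.exp(slope)
percent_change = (b - 1) * 100   # percent growth (or decay if negative) per unit of x
print(round(a, 3), round(b, 3), round(percent_change, 1))
```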

36
Q

Power model

A

Take log or ln of both x and y.
Resulting model: y hat = a*x^b
When x is multiplied by a factor k, y hat is multiplied by k^b (for example, doubling x multiplies y hat by 2^b).
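
A sketch of the power-model procedure: take ln of both variables, fit a least-squares line to (ln x, ln y), and read b from the slope and a from the intercept. The data are generated from an assumed model y = 2 * x^1.5 for illustration only.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

# Hypothetical data following y = 2 * x^1.5.
xs = [1, 2, 4, 8, 16]
ys = [2 * x ** 1.5 for x in xs]

# ln(y hat) = ln(a) + b * ln(x), so the slope is b and e^intercept is a.
intercept, slope = fit_line([math.log(x) for x in xs],
                            [math.log(y) for y in ys])
a = math.exp(intercept)
b = slope
doubling_factor = 2 ** b   # multiplier on y hat when x doubles
print(round(a, 3), round(b, 3), round(doubling_factor, 3))
```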