Lecture 6 - Correlation Flashcards

1
Q

What percentage of marathon runners finish in under 5.5 hours if finishing times are normally distributed with a mean of 4.5 hours and a standard deviation of 1 hour?

A

A finishing time of 5.5 hours has a z-score of (5.5 − 4.5) / 1 = 1. Using a z-table, a z-score of 1 corresponds to the 84th percentile, meaning about 84% of runners finish in under 5.5 hours.
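
The z-table lookup can be checked numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist`, assuming finishing times are normally distributed; the variable names are illustrative only.

```python
from statistics import NormalDist

mean_hours = 4.5   # mean finishing time from the card
sd_hours = 1.0     # standard deviation from the card
cutoff = 5.5       # finishing time of interest

# Convert the raw score to a z-score: z = (X - mean) / sd
z = (cutoff - mean_hours) / sd_hours

# Area under the standard normal curve to the left of z
proportion_under = NormalDist().cdf(z)

print(f"z = {z:.2f}, proportion under cutoff = {proportion_under:.4f}")
```

The proportion comes out at roughly 0.84, matching the z-table value.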

2
Q

What is the 80th percentile of marathon finishing times with a mean of 4.5 hours and a standard deviation of 1 hour?

A

Using a z-table, the 80th percentile corresponds to a z-score of approximately 0.84. Converting this to a raw score:
X = z × σ + μ = 0.84 × 1 + 4.5 = 5.34 hours
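
The reverse lookup (percentile to raw score) can be sketched the same way. This assumes normally distributed finishing times and uses `NormalDist.inv_cdf` from the standard library in place of the z-table; the variable names are illustrative.

```python
from statistics import NormalDist

mean_hours = 4.5   # mean finishing time from the card
sd_hours = 1.0     # standard deviation from the card

# z-score that has 80% of the standard normal distribution below it
z_80 = NormalDist().inv_cdf(0.80)

# Convert back to a raw score: X = z * sd + mean
time_80 = z_80 * sd_hours + mean_hours

print(f"z = {z_80:.2f}, 80th percentile = {time_80:.2f} hours")
```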

3
Q

Why are many things normally distributed?

A

Variation will tend to follow a normal curve when:
- the variable’s state is affected by many influences
- the influences are separate from each other (not associated)
- the range of possible variation is unlimited
These conditions are described more precisely by the Central Limit Theorem.
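
A small simulation can illustrate the idea. This is a sketch under stated assumptions, not a proof: each simulated score is the sum of 50 independent influences drawn from a uniform distribution, and both the number of influences and the number of scores are arbitrary choices made for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

N_INFLUENCES = 50   # hypothetical number of independent influences per score
N_SCORES = 10_000   # hypothetical number of simulated scores

# Each score is the sum of many small, independent influences
scores = [
    sum(random.uniform(-1, 1) for _ in range(N_INFLUENCES))
    for _ in range(N_SCORES)
]

m, s = mean(scores), stdev(scores)

# For a normal curve, about 68% of scores fall within 1 SD of the mean
# and about 95% fall within 2 SD
within_1sd = sum(abs(x - m) <= s for x in scores) / N_SCORES
within_2sd = sum(abs(x - m) <= 2 * s for x in scores) / N_SCORES

print(f"within 1 SD: {within_1sd:.3f}, within 2 SD: {within_2sd:.3f}")
```

Although each individual influence is uniformly distributed, the summed scores should land close to the 68% and 95% proportions expected of a normal curve.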

4
Q

What are the properties of a true normal distribution?

A

In a mathematically defined normal curve:
- mean, median and mode are identical
- the curve is symmetrical around the center
- the tails are infinitely long
- the tails approach zero while never reaching it (asymptotic approach)
- improbability increases at an increasing rate with distance from the mean
- the area under the curve is finite (it equals 1 when the curve is treated as a probability distribution)
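
Several of these properties can be checked numerically. This is a minimal sketch using the standard normal curve from Python's `statistics.NormalDist`; the specific z values tested are arbitrary.

```python
from statistics import NormalDist

curve = NormalDist(mu=0, sigma=1)

# Symmetry: the curve has the same height at equal distances either side of the mean
symmetric = abs(curve.pdf(-1.5) - curve.pdf(1.5)) < 1e-12

# Half the area lies below the mean, so the mean is also the median
half_below_mean = curve.cdf(0)

# Tails: the height keeps shrinking but stays above zero
heights = [curve.pdf(z) for z in (1, 2, 3, 4, 5)]

# Each extra SD multiplies the height by a smaller factor than the one before,
# which is one way to read "improbability increases at an increasing rate"
ratios = [heights[i + 1] / heights[i] for i in range(len(heights) - 1)]

print(symmetric, half_below_mean)
print(ratios)
```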

5
Q

What is the difference between variation in a single variable and associations between pairs of variables?

A

Variation in a single variable involves understanding the spread of scores within that variable, which informs estimates of probability, causes of variation, and descriptive statistics. Associations between pairs of variables involve understanding the probabilistic relationship between two variables, which helps in guessing the state of one variable from the other and in making causal inferences.

6
Q

Give an example of an association between two variables

A

An example of an association is the one between regular exercise and weight loss. Knowing someone’s level of regular exercise can help predict their weight loss.

7
Q

What are the steps involved in data cleaning before analyzing associations?

A

  1. Enter the data into the spreadsheet carefully
  2. Check each variable for impossible scores:
    a. determine the range of possible scores for each variable
    b. delete any impossible scores
    c. replace deleted values with correct ones if the data source is available
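
The range check in step 2 can be sketched in code. The data, the 1-to-7 scale, and the `clean` helper below are all hypothetical and chosen only for illustration; `None` stands in for an empty spreadsheet cell.

```python
# Hypothetical data: ratings on a 1-to-7 scale, with None marking a missing value
raw_scores = [4, 7, 71, 2, 0, 5, None, 6]

MIN_SCORE, MAX_SCORE = 1, 7   # step (a): assumed possible range for this variable

def clean(scores, low, high):
    """Replace any score outside [low, high] with None (step b)."""
    return [x if x is not None and low <= x <= high else None for x in scores]

cleaned = clean(raw_scores, MIN_SCORE, MAX_SCORE)

# Positions of deleted values, so they can be re-checked against the source (step c)
flagged = [i for i, (before, after) in enumerate(zip(raw_scores, cleaned))
           if before is not None and after is None]

print(cleaned)   # [4, 7, None, 2, None, 5, None, 6]
print(flagged)   # [2, 4]
```

Here 71 (likely a double keystroke) and 0 fall outside the scale, so both are deleted and flagged for checking against the original data source.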
8
Q

Define correlation and explain its direction and strength

A

Correlation typically refers to a linear association between two continuous variables (ordinal or better) and quantifies the strength of this association. The direction can be positive (both variables increase together) or negative (one variable increases while the other decreases).
The strength of the correlation indicates how precisely you can predict one variable’s state from the other, quantified by the correlation coefficient r, which ranges from -1 to 1.

9
Q

How is the correlation coefficient r calculated?

A

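One way to calculate r involves:
1. multiplying the two z-scores (one for each variable) for each data point
2. summing those products
3. dividing by the number of paired scores (this assumes the z-scores were computed with the population standard deviation)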
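
The three steps above can be sketched as follows. The paired scores are hypothetical, and the z-scores are computed with the population standard deviation (`statistics.pstdev`), which is the assumption under which dividing by the number of pairs gives r.

```python
from statistics import mean, pstdev

# Hypothetical paired scores, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def z_scores(values):
    """Standardise values using the population SD (divide by N)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

zx, zy = z_scores(x), z_scores(y)

# Step 1: multiply the paired z-scores; step 2: sum the products
sum_of_products = sum(a * b for a, b in zip(zx, zy))

# Step 3: divide by the number of pairs
r = sum_of_products / len(x)

print(f"r = {r:.3f}")
```

If the z-scores were instead computed with the sample standard deviation (dividing by N − 1), the sum of products would need to be divided by N − 1 to give the same r.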

10
Q

Given the following scores for two variables X and Y, calculate the correlation coefficient r:

A

refer to photo

11
Q

What does a scatterplot of a perfect positive correlation look like compared to a scatterplot with no correlation?

A

A scatterplot of a perfect positive correlation will show all points lying on a straight line with a positive slope, whereas a scatterplot with no correlation will have points spread out with no discernible pattern, and the line of fit will be flat (horizontal).
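
The two cases can be compared numerically rather than visually. This is a minimal sketch with hypothetical data; `slope_and_r` is an illustrative helper that computes the least-squares slope and Pearson's r from deviation scores.

```python
from statistics import mean

def slope_and_r(x, y):
    """Least-squares slope of y on x, and Pearson's r, from deviation scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

x = [1, 2, 3, 4, 5]
y_perfect = [2 * v + 1 for v in x]   # every point lies on one rising line
y_none = [3, 1, 2, 1, 3]             # hypothetical scores with no linear trend

slope_perfect, r_perfect = slope_and_r(x, y_perfect)
slope_none, r_none = slope_and_r(x, y_none)

print(f"perfect: slope = {slope_perfect}, r = {r_perfect}")
print(f"none:    slope = {slope_none}, r = {r_none}")
```

The first data set gives r = 1 with a positive slope; the second gives r = 0 with a flat line of fit.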
