16- Association and Correlation II Flashcards

1
Q

What is Covariation?

A

Means “varying together” or “varying jointly”.

It is the cross product, for each observation, of the deviations from the means an…
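A minimal Python sketch of the cross-product idea. It assumes the sample covariance, where the cross products are summed and divided by n - 1 (the population version divides by n). The card text is cut off, so the divisor is an assumption, and the x and y values are made up for illustration.

```python
def covariance(x, y):
    """Sample covariance: sum of the cross products of deviations from the means, over n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # One cross product per observation: (x_i - mean_x) * (y_i - mean_y)
    cross_products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    return sum(cross_products) / (len(x) - 1)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(covariance(x, y))  # 1.5
```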

2
Q

Are signs important for Covariance?

A

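Yes. The signs of the cross products tell us whether the covariation is high or low (when most cross products share the same sign they add up to a large covariance; when the signs are mixed they cancel out), as well as its direction (positive or negative).

1) Linear points with positive slope = High positive covariation
2) Linear points with negative slope = High negative covariation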

3
Q

Characteristics of Covariation

A

1) Magnitude of covariance is dependent on the units of x and y
2) Multiple pairs of variables are therefore not directly comparable
3) To compare multiple pairs you need a standardized form of covariance
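A short sketch of point 1, using hypothetical height and weight values: expressing the same heights in centimetres instead of metres multiplies the covariance by 100, even though the relationship itself has not changed. The sample (n - 1) divisor is an assumption; the effect is the same with the population divisor.

```python
import math


def covariance(x, y):
    """Sample covariance (divisor n - 1)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


# Hypothetical data, for illustration only
height_cm = [160, 170, 180]
height_m = [h / 100 for h in height_cm]
weight_kg = [55, 65, 80]

cov_cm = covariance(height_cm, weight_kg)
cov_m = covariance(height_m, weight_kg)
print(cov_cm, cov_m)
print(math.isclose(cov_cm, 100 * cov_m))  # True: only the units changed
```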

4
Q

What is Correlation?

A

A dimensionless measure that indicates the nature (direction) and degree of linear association between two variables.

It is a standardized form of covariance, with values ranging from -1 to +1.

5
Q

What are the Correlation ranges for interpretation?

A
  1. Perfect +ve = +1.00
  2. Moderate +ve = +0.50
  3. Weak +ve = +0.25
  4. No correlation = 0.00
  5. Weak -ve = -0.25
  6. Moderate -ve = -0.50
  7. Perfect -ve = -1.00
6
Q

What is the Pearson’s Correlation Coefficient?

A

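The covariance of x and y divided by the product of the standard deviations of x and y (population standard deviations with the population covariance, or sample standard deviations with the sample covariance)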
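A minimal sketch of that definition in Python, using population standard deviations and the matching population covariance (dividing by n). Because the same n appears above and below the fraction, using n - 1 throughout gives the same r. The data are illustrative; the second call rescales x to show that r, unlike the covariance, does not depend on units.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return cov / (sd_x * sd_y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))                     # 0.7746
print(round(pearson_r([v * 100 for v in x], y), 4))  # 0.7746 (units of x changed)
```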

7
Q

Purpose of Pearson’s Correlation Coefficient

A

To help quantify the associations we see on a scatter plot, chart, or map

8
Q

Assumptions of Pearson’s Correlation Coefficient

A
  1. Variables must be interval or ratio
  2. Data pairs must be selected randomly from the population
  3. Relationship between X and Y is linear
  4. Constant variance (homoscedasticity)
  5. The variables X and Y must share a joint bivariate normal distribution
9
Q

Pearson’s Correlation Coefficient Values

A
  1. Values of -1 and +1 correspond to a perfect negative and a perfect positive linear relationship between X and Y
  2. A value of 0 indicates that no linear relationship exists between X and Y; this does not by itself mean they are independent, since a non-linear relationship may still exist
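A small illustration of why r = 0 does not imply independence: below, y is completely determined by x (y is x squared), yet Pearson's r is exactly 0 because the relationship is symmetric rather than linear. The data are made up for illustration.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]  # y depends entirely on x, but the curve is symmetric

print(pearson_r(x, y))  # 0.0: no linear relationship, yet x and y are not independent
```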
10
Q

Spearman’s Correlation Coefficient

A

Computes Pearson’s (linear) correlation on the ranks of x and y
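A minimal Python sketch of that idea: rank each variable, then apply Pearson's formula to the ranks. Tied values get the average of the ranks they span, a common convention assumed here. The cubic data are illustrative: the relationship is monotonic but not linear, so Spearman's coefficient is 1 while Pearson's r on the raw values is below 1.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear

print(round(pearson_r(x, y), 3))  # below 1: the raw relationship is curved
print(spearman_rho(x, y))         # 1.0: the ranks agree perfectly
```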

11
Q

Properties of Spearman’s Correlation Coefficient

A
  1. Relaxes the normality and linearity assumptions
  2. Data can be ordinal
  3. Based on the differences between the ranks of each pair
  4. Coefficient is interpreted the same way as Pearson’s (value of 0 = no monotonic association)
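Point 3 refers to the classic rank-difference form of Spearman's coefficient, rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)), where d is the difference between the two ranks of each observation. That formula is exact only when there are no ties, so this sketch refuses tied data; the values are illustrative.

```python
def ranks_no_ties(values):
    """Ranks 1..n; assumes all values are distinct."""
    if len(set(values)) != len(values):
        raise ValueError("tied values: use average ranks and Pearson's formula instead")
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_rank_difference(x, y):
    n = len(x)
    rank_x, rank_y = ranks_no_ties(x), ranks_no_ties(y)
    # d_i = difference between the two ranks of observation i
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


x = [10, 20, 30, 40, 50]
y = [3, 1, 4, 5, 9]
print(spearman_rank_difference(x, y))  # 0.9
```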
12
Q

Kendall’s Correlation

A

Measures the strength of the monotonic relationship between X and Y.

It is resistant to the effect of a small number of outliers and is used…
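A minimal sketch of Kendall's coefficient in its tau-b form (an assumption; the card does not say which variant), built by counting concordant and discordant pairs. Because only the order of the values matters, replacing the largest y value with an extreme outlier leaves tau unchanged, which illustrates the resistance to outliers described above. The data are illustrative.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b from counts of concordant and discordant pairs."""
    concordant = discordant = 0
    tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from every count
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # the pair is ordered the same way on x and y
        else:
            discordant += 1  # the pair is ordered oppositely
    denominator = math.sqrt(
        (concordant + discordant + tied_x_only) * (concordant + discordant + tied_y_only)
    )
    if denominator == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denominator


x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 5, 9]
y_outlier = [3, 1, 4, 5, 900]  # same ordering, one extreme value

print(kendall_tau_b(x, y))          # 0.8
print(kendall_tau_b(x, y_outlier))  # 0.8: only the order of the values matters
```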

13
Q

When to use Spearman’s?

A

When…
* Data Types: analyzing relationships between ordinal, interval, or ratio variables
* Assumption of Monotonicity: Spearman’s does not require a linear relationship; it is well suited to detecting monotonic relationships, whether increasing or decreasing
* Larger Sample Sizes: more powerful and efficient with larger samples

14
Q

When to Use Kendall’s?

A
  • Data Types: when data is strictly ordinal
  • Ties in Data: effective in handling tied values (when two or more data points have the same rank)
  • Smaller Sample Sizes: works well with limited data
  • When You Want to Emphasize the Relative Order: assesses association based on the order or ranking of data points rather than their actual values
15
Q

Possible Issues with Correlations

A
  • Non-Linear Associations
  • Correlation does not equal causation
  • Spatial Aggregation Impacting Analysis
16
Q

Problem: Non-Linear Association

A

Occurs when the data are clearly associated, but not in a linear fashion.

Solution: apply a transformation to one or both variables and then calculate the correlation…
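A sketch of that fix, assuming a natural-log transformation (the card does not name one; the right choice depends on the shape of the data). The made-up y values grow exponentially with x: Pearson's r on the raw values understates the association, while r between x and log(y) is essentially 1.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data: y grows exponentially with x (clearly associated, but curved)
x = list(range(1, 11))
y = [math.exp(v) for v in x]

# Assumed transformation: natural log of y, which turns this curve into a straight line
log_y = [math.log(v) for v in y]

print(round(pearson_r(x, y), 3))      # raw values: well below 1
print(round(pearson_r(x, log_y), 3))  # transformed: 1.0
```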

17
Q

Problem: Correlation Does Not Equal Causation

A

An association may be the result of interdependency (autocorrelation), which happens when a random variable is correlated with itself, for example over time or across space.
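To make “a variable correlated with itself” concrete, here is a sketch of lag-1 autocorrelation for an ordered series, assuming a time series: the series is compared with itself shifted by one step. Spatial versions of the idea (for example Moran's I) compare each location with its neighbours instead of the next time step. Both series are illustrative.

```python
def lag1_autocorrelation(series):
    """Correlation of a series with itself shifted by one step (sample ACF at lag 1)."""
    n = len(series)
    mean = sum(series) / n
    # Cross products of each value's deviation with the next value's deviation
    numerator = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    # Normalised by the total squared deviation of the whole series
    denominator = sum((v - mean) ** 2 for v in series)
    return numerator / denominator


trend = [1, 2, 3, 4, 5, 6, 7, 8]            # each value resembles its neighbour
alternating = [1, -1, 1, -1, 1, -1, 1, -1]  # each value is the opposite of its neighbour

print(lag1_autocorrelation(trend))        # 0.625
print(lag1_autocorrelation(alternating))  # -0.875
```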

18
Q

Problem: Spatial Aggregation

A

At the individual level, there may be a high correlation between two variables, but at a provincial level this association becomes weak or non-existent.
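A constructed, entirely hypothetical example of the effect: within each of three provinces, x and y rise together, so the individual-level correlation is high, but once each province is reduced to its average values, those provincial means show no linear relationship at all.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical provinces: (mean x, mean y) for each one
province_means = {"A": (0, 1), "B": (1, 0), "C": (2, 1)}
# Within every province, individuals spread out along a line where x and y rise together
offsets = [-10, -5, 0, 5, 10]

individual_x, individual_y = [], []
provincial_x, provincial_y = [], []
for centre_x, centre_y in province_means.values():
    xs = [centre_x + d for d in offsets]
    ys = [centre_y + d for d in offsets]
    individual_x += xs
    individual_y += ys
    # Aggregate: one averaged observation per province
    provincial_x.append(sum(xs) / len(xs))
    provincial_y.append(sum(ys) / len(ys))

print(round(pearson_r(individual_x, individual_y), 3))  # individual level: close to 1
print(round(pearson_r(provincial_x, provincial_y), 3))  # provincial level: 0
```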

19
Q

Recap: Correlation

A
  • Pearson’s R is most widely used and most useful for linear relationships
  • Spearman’s can be used as a non-parametric equivalent to Pearson’s
  • Use Kendall’s when dealing with monotonic, non-linear trends
  • Use chi-squared test and Cramer’s V for nominal data presented in a contingency table
  • Always plot data before calculating correlations