Exploratory Data Analysis 6.4 Correlations Flashcards

1
Q

What is correlation?

A

Correlation is a statistical measure that describes the relationship between two variables. It tells us whether and how strongly the two variables are related to each other.

2
Q

What is the correlation coefficient?

A

A single number that ranges from -1 to 1, summarizing the strength and direction of a correlation.

3
Q

What does a correlation close to -1 indicate?

A

A strong negative relationship: as one variable increases, the other tends to decrease.

4
Q

What does a correlation of 0 indicate?

A

No linear relationship. The variables may still be related in a nonlinear way.

5
Q

What does a correlation close to 1 indicate?

A

A strong positive relationship: as one variable increases, the other tends to increase too.

6
Q

What is Pearson’s correlation?

A

Pearson’s correlation (r) measures the strength and direction of the linear relationship between two numeric variables. Its sign tells you whether an increase in one variable is associated with an increase or a decrease in the other.

Value Range: -1 to 1
Positive Correlation (r > 0): Both variables increase together

Example: Height & Weight (taller people tend to weigh more)

Negative Correlation (r < 0): One increases, the other decreases

Example: Study Time & Video Game Time (more studying, less gaming)

No Correlation (r ≈ 0): No linear relationship

Example: Shoe size & IQ
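The three cases above can be checked numerically. The following is a minimal sketch, assuming made-up height, weight, study, and gaming values (illustrative only, not real measurements) and a hand-written helper named `pearson`:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's r computed from sums of deviations (the 1/n factors cancel)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the squared-deviation sums.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical illustrative data.
heights_cm = [150, 160, 165, 172, 180, 188]
weights_kg = [52, 58, 63, 70, 77, 85]
study_hours = [1, 2, 3, 4, 5]
gaming_hours = [5, 4, 4, 2, 1]

r_height_weight = pearson(heights_cm, weights_kg)    # expected to be positive
r_study_gaming = pearson(study_hours, gaming_hours)  # expected to be negative
print(round(r_height_weight, 3), round(r_study_gaming, 3))
```

With these values the first coefficient comes out close to 1 and the second close to -1, matching the positive and negative cases on the card.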
7
Q

What is Spearman’s correlation?

A
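  • Unlike Pearson’s correlation, which measures straight-line (linear) relationships, Spearman’s correlation is computed on the ranks of the data, so it measures how consistently one variable increases or decreases as the other increases (a monotonic relationship).
  • It’s useful when data isn’t normally distributed or has outliers, because it depends on the order of the values rather than their exact magnitudes.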
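One way to see what "ranked data" means is to compute Spearman's correlation as Pearson's correlation applied to ranks. The sketch below assumes hand-written helpers (`pearson`, `average_ranks`, `spearman`) and a small made-up cubic dataset:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's r computed from sums of deviations."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y always increases with x, but along a curve (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print("Pearson:", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
```

Because the order of `y` matches the order of `x` exactly, the rank-based coefficient is 1, while Pearson's r is lower because the points do not lie on a straight line.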
8
Q

When should Spearman’s correlation be used?

A

Use Spearman’s correlation when:

  • The relationship between two variables is not linear but is monotonic (consistently increasing or consistently decreasing).
  • Your data has outliers that could distort Pearson’s correlation.
  • Your data is not normally distributed (e.g., heavily skewed).
  • You’re working with ordinal (ranked) data, like survey ratings or finishing positions (e.g., 1st, 2nd, 3rd place).
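The outlier case in the list above can be illustrated with a small experiment. This is a sketch under assumed data: nine made-up points with no clear trend plus one extreme point, with hand-written `pearson`, `ranks`, and `spearman` helpers:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's r computed from sums of deviations."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; a tied value gets the average rank of its tie group."""
    ordered = sorted(values)
    # Positions of v in the sorted list are first+1 .. first+count; average them.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman(xs, ys):
    """Spearman's correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: the last point (50, 60) is an extreme outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
y = [5, 3, 6, 2, 7, 4, 6, 3, 5, 60]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print("Pearson:", round(r_pearson, 2))
print("Spearman:", round(r_spearman, 2))
```

On this data the single outlier pushes Pearson's r close to 1, while the rank-based coefficient stays much lower, because the outlier counts only as "the largest value" once the data is ranked.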
9
Q

What is a correlation plot?

A
  • A tool in Exploratory Data Analysis that shows the correlations between every pair of numeric variables in a dataset.
  • It is a visual representation, typically a grid (heatmap) of the correlation matrix, that uses colors or printed values to display how strongly each pair of variables is correlated.
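The numbers behind a correlation plot are a correlation matrix. The sketch below builds one for a small made-up dataset (the column names and values are hypothetical) and prints it as a text grid; a plotting library would color each cell instead of printing the number:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's r computed from sums of deviations."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical dataset: column name -> values, one entry per observation.
data = {
    "height": [150, 160, 165, 172, 180, 188],
    "weight": [52, 58, 63, 70, 77, 85],
    "gaming": [5, 1, 4, 2, 6, 3],
}
names = list(data)

# Correlation of every column with every other column (including itself).
matrix = {a: {b: pearson(data[a], data[b]) for b in names} for a in names}

# Header row, then one row per variable with values rounded to two decimals.
print(" " * 8 + "".join(f"{name:>8}" for name in names))
for a in names:
    print(f"{a:>8}" + "".join(f"{matrix[a][b]:8.2f}" for b in names))
```

The diagonal is always 1 (each variable is perfectly correlated with itself) and the grid is symmetric, which is why many correlation plots show only one triangle.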
10
Q

What does correlation not imply?

A

Causation. Correlation does not mean that one variable causes the other.

11
Q

What is an example of a misleading correlation?

A

The flow of water in a stream and the amount of water in a puddle may be correlated, but neither causes the other: both are driven by a third variable, rainfall.
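A small simulation can make the stream and puddle example concrete. This is only a toy model under stated assumptions: the coefficients, noise levels, and variable names are invented for illustration, and each quantity is assumed to depend on rainfall plus its own independent noise.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson's r computed from sums of deviations."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

random.seed(0)  # fixed seed so the run is repeatable

# Assumed toy model: rainfall drives both quantities; neither affects the other.
rainfall = [random.uniform(0, 50) for _ in range(500)]
stream_noise = [random.gauss(0, 5) for _ in rainfall]
puddle_noise = [random.gauss(0, 2) for _ in rainfall]
stream_flow = [2.0 * r + e for r, e in zip(rainfall, stream_noise)]
puddle_depth = [0.5 * r + e for r, e in zip(rainfall, puddle_noise)]

# Strong correlation between stream and puddle, even with no causal link.
r_raw = pearson(stream_flow, puddle_depth)
# With rainfall's contribution removed, only the independent noise is left.
r_without_rain = pearson(stream_noise, puddle_noise)

print("stream vs puddle:", round(r_raw, 2))
print("after removing rainfall:", round(r_without_rain, 2))
```

The first coefficient is high and the second is near zero, which is the pattern expected when a shared cause, rather than a direct link, produces the correlation.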

12
Q

What is Anscombe’s Quartet?

A
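  • A set of four datasets with nearly identical summary statistics (means, variances, correlation, and fitted regression line) that look very different when plotted, illustrating how a correlation coefficient alone can be misleading.
  • It shows that data visualization is essential for understanding data.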
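The claim can be checked directly on Anscombe's published values. The sketch below uses a hand-written `pearson` helper and computes the mean of y and the correlation for each of the four datasets:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's r computed from sums of deviations."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Anscombe's quartet: datasets I to III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Summary statistics per dataset: (mean of y, Pearson's r).
summary = {
    name: (sum(ys) / len(ys), pearson(xs, ys))
    for name, (xs, ys) in quartet.items()
}
for name, (mean_y, r) in summary.items():
    print(f"{name:>3}: mean y = {mean_y:.2f}, r = {r:.3f}")
```

All four rows print essentially the same mean and correlation, yet scatter plots of the four datasets show a roughly linear cloud, a smooth curve, a line with one outlier, and a vertical stack of points with one outlier.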