Correlation Analysis Flashcards
1
Q
Correlation Analysis
A
- examines whether two sets of data are associated with each other
- when data points lie close to the line on a scatter plot, the variables are strongly associated
- the coefficient ranges from -1 to 1
2
Q
Pearson Correlation Coefficient
A
- Measures the degree of LINEAR association b/w two variables
- Positive correlation reflects a tendency for high values of one variable to go with high values of the other
- Negative correlation reflects a tendency for high values of one variable to go with low values of the other
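Worked example: a minimal sketch of how the sample Pearson coefficient can be computed from deviations about the means. The function name pearson_r and the data (hours, score, errors) are made up for illustration and are not part of the original cards.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for this example
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]   # rises with hours
errors = [9, 8, 6, 5, 3]       # falls with hours

r_pos = pearson_r(hours, score)    # close to +1: high goes with high
r_neg = pearson_r(hours, errors)   # close to -1: high goes with low
print(round(r_pos, 3), round(r_neg, 3))
```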
3
Q
Population Correlation (ρ)
A
- measure based on the population
4
Q
Sample Correlation (r)
A
- measure based on a sample
5
Q
4 Rules for Interpreting Correlation Coefficients
A
- Correlation does not imply causality
- Correlation can be influenced by the size of the sample
- Visually examining the scatter plot will tell you more than the correlation coefficient
- Decide what counts as a “good” coefficient for the situation
6
Q
The smaller the sample…
A
the higher the chance of getting a meaningless coefficient
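Simulation sketch: one way to see this is to draw two unrelated variables many times and compare how large r gets by chance at different sample sizes. The sample sizes (5 and 200), the number of trials, the seed, and the helper names are arbitrary choices for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def mean_abs_r(n, trials, rng):
    """Average |r| over repeated samples of two independent variables."""
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # unrelated to xs by construction
        total += abs(pearson_r(xs, ys))
    return total / trials

rng = random.Random(0)           # fixed seed so the run is repeatable
small = mean_abs_r(5, 500, rng)
large = mean_abs_r(200, 500, rng)
print("average |r| with n=5:  ", round(small, 3))
print("average |r| with n=200:", round(large, 3))
```

The true correlation is zero in both cases, so any |r| that appears is chance alone; it is typically much larger in the small samples.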
7
Q
To create a meaningful coefficient…
A
- use a large sample
- make sure the data show variability
8
Q
Linear vs Non-Linear
A
- Pearson’s Correlation Coefficient is for linear associations
- for non-linear relationships it can show a coefficient close to zero even when the variables are strongly related
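Worked example: a small sketch of a non-linear case. Here y is completely determined by x (y = x squared), yet the Pearson coefficient comes out at zero because the relationship is a symmetric curve rather than a line. The data and the helper name pearson_r are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]      # perfect U-shaped (non-linear) relationship

r_curve = pearson_r(xs, ys)
print(r_curve)                 # the positive and negative halves cancel out
```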
9
Q
Lack of Equal Variability
A
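- Pearson’s r describes the average strength of the linear relationship between x and y across the whole range of the data
- watch for regions of higher or lower variability, where the relationship is weaker or stronger than r suggests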
10
Q
Discontinuous Distributions
A
- running the analysis without looking at the data first can cause problems when there are outliers
- a few extreme points can produce a misleading correlation
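Worked example: a sketch of how a single outlier can manufacture a correlation. The nine clustered points below have no linear association at all; adding one distant point (a made-up value of 20, 20) pushes r to roughly 0.98. All data and names are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# A 3-by-3 grid of points: x and y are unrelated within the cluster
cluster_x = [1, 2, 3, 1, 2, 3, 1, 2, 3]
cluster_y = [1, 1, 1, 2, 2, 2, 3, 3, 3]

r_cluster = pearson_r(cluster_x, cluster_y)

# Same cluster plus one point far away from the rest
r_with_outlier = pearson_r(cluster_x + [20], cluster_y + [20])

print("cluster only:", round(r_cluster, 3))
print("with outlier:", round(r_with_outlier, 3))
```

A scatter plot would show the isolated point immediately, which the coefficient alone cannot.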
11
Q
Deciding what is a “good” correlation
A
- the goal is to build confidence in our “r”
- the larger the absolute value of r, the more likely the correlation coefficient is meaningful
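Sketch: one common way to put a number on this confidence, not necessarily the method these cards have in mind, is the t statistic t = r * sqrt((n - 2) / (1 - r^2)), used to test whether the population correlation is zero (it assumes roughly bivariate normal data). The function name t_statistic and the example values are illustrative.

```python
import math

def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing a zero population correlation."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r * r))

# Larger r at the same sample size gives a larger t
t_weak = t_statistic(0.5, 6)
t_strong = t_statistic(0.9, 6)

# The same r with a larger sample also gives a larger t
t_big_sample = t_statistic(0.5, 102)

print(round(t_weak, 3), round(t_strong, 3), round(t_big_sample, 3))
```

A larger |t| means the observed r is harder to explain by chance, which ties the size of r and the size of the sample together.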