Stats Flashcards
Bivariate Definition
- It means data with two variables, e.g. a graph of life expectancy plotted against birth rate for a country.
Dependent vs Independent Variables (with the example of weight of crop yielded vs amount of rainfall)
- The dependent variable's outcome is reliant on the independent variable.
- Obviously, the weight of crop yielded depends on the amount of rainfall, and not the other way around.
- Therefore, the weight of crop yielded is the dependent variable, and the amount of rainfall is the independent variable.
Random, Non-Random, and Control Variables
- Random variables cannot be predicted; their values do not depend on any other variable.
- Control variables are non-random: you set their values yourself, often at regular intervals of your choice (e.g. time).
Correlation and Linear
- Correlation is essentially how close the data points are to lying on a straight line.
- Linear just means in a straight line.
Correlation Vs Association
- Correlation (linear association) is about how close data is to lying on a straight line (strictly linear).
- Association is about how closely related two variables of data are.
- Therefore, correlation is a type of association: it describes how close two variables are to having a linear relationship.
PMCC
- Denoted r, used to describe the strength of linear correlation (between -1 and 1).
No Correlation: Between -0.1 and 0.1
Perfect Negative/Positive Correlation: -1 / 1
Weak Negative Correlation: -0.1 to -0.5
Moderate Negative Correlation: -0.5 to -0.8
Strong Negative Correlation: -0.8 to -1
Weak Positive Correlation: 0.1 to 0.5
Moderate Positive Correlation: 0.5 to 0.8
Strong Positive Correlation: 0.8 to 1
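The PMCC can be computed from first principles; a minimal Python sketch (the function name `pearson_r` is my own):

```python
import math

def pearson_r(xs, ys):
    """Compute the PMCC (Pearson's r) from first principles."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Perfectly linear data gives perfect correlation
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3], [3, 2, 1]))        # -1.0
```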
How do you know if data has a normal distribution?
- For bivariate data, the cloud of points has a roughly elliptical shape on a scatter graph.
Random Variable Definition
- A variable whose value is a numerical outcome of a random phenomenon.
- Denoted with a capital letter, e.g. X.
- The probability distribution of X tells us the possible values of X and how probable each one is.
- Can be discrete or continuous.
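A minimal sketch of a discrete random variable's probability distribution, here the score X on a fair six-sided die (an illustrative example, not from the cards):

```python
from fractions import Fraction

# Discrete random variable X = score on a fair six-sided die.
# The distribution maps each possible value of X to its probability.
distribution = {x: Fraction(1, 6) for x in range(1, 7)}

# Probabilities over all possible values of X must sum to 1
print(sum(distribution.values()))  # 1
```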
Cohen’s interpretation of effect size (PMCC)
Small Effect Size (r ≈ 0.1):
The relationship between two variables is weak.
Medium Effect Size (r ≈ 0.3):
The relationship between two variables is moderate.
Large Effect Size (r ≈ 0.5):
The relationship between two variables is strong.
Scenarios where we would use Spearman’s Rank over PMCC
- Deals with subjective data (e.g. rankings from judges).
- Deals with ordinal, non-numerical data (e.g. grades A, B, C).
- Deals with non-linear (monotonic) association, unlike PMCC, which is strictly linear.
Interpreting Spearman’s Rank Correlation Coefficient
- It is interpreted in the same way as PMCC, on a scale from -1 to 1.
- 0.8 for example, would represent a strong positive association (not correlation).
How to double check spearman’s rank
- After making the two rows of ranks, so long as they are correct, inputting those rank values and finding their PMCC should give you the same coefficient.
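This check can be sketched in Python: compute Spearman's coefficient via the usual formula 1 - 6Σd²/(n(n² - 1)) and compare it with the PMCC of the rank rows (the helper names are my own, and the simple ranking assumes no tied values):

```python
import math

def pearson_r(xs, ys):
    """PMCC from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(data):
    """Rank values 1..n ascending, assuming no ties."""
    order = sorted(data)
    return [order.index(v) + 1 for v in data]

def spearman_formula(xs, ys):
    """Spearman's rank coefficient via 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

xs = [10, 20, 30, 40, 50]
ys = [3, 1, 4, 2, 5]
# With no ties, the formula and the PMCC of the ranks agree
print(spearman_formula(xs, ys))            # 0.5
print(pearson_r(ranks(xs), ranks(ys)))     # 0.5
```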
What do you do if multiple pieces of data are tied (share the same value)?
- Average the ranks that they would take if they were different values.
- If you have 7, 7, 7 at the start of the rankings, they would occupy the 1st, 2nd, and 3rd rank if they were different values.
- Add the ranks and average them: (1 + 2 + 3) / 3 = 2, so they are all given the rank 2.
- The next piece of data takes the rank it would have if all the values were different, so here 4th.
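The tie-averaging rule above can be sketched as (the function name is my own):

```python
def average_ranks(data):
    """Rank values 1..n ascending, giving tied values the mean of the ranks they span."""
    indexed = sorted(range(len(data)), key=lambda i: data[i])
    ranks = [0.0] * len(data)
    pos = 0
    while pos < len(indexed):
        # Find the run of equal (tied) values starting at pos
        end = pos
        while end + 1 < len(indexed) and data[indexed[end + 1]] == data[indexed[pos]]:
            end += 1
        # Average the ranks the tied values would occupy, e.g. (1 + 2 + 3) / 3 = 2
        avg = ((pos + 1) + (end + 1)) / 2
        for i in range(pos, end + 1):
            ranks[indexed[i]] = avg
        pos = end + 1
    return ranks

# 7, 7, 7 would occupy ranks 1, 2, 3, so each gets 2; the 9 takes rank 4
print(average_ranks([7, 7, 7, 9]))  # [2.0, 2.0, 2.0, 4.0]
```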
Issue with PMCC hypothesis testing
- When using large data samples, the critical value is so low that even a very small PMCC is statistically significant, suggesting a correlation that in reality is too weak to be of practical interest.
Example of two tailed PMCC hypothesis test
- Let ρ (rho) be the population correlation coefficient between exam scores and heights.
H0: ρ = 0
H1: ρ ≠ 0
- Find the critical value in the table (two-tailed).
- Compare the sample PMCC with the critical value (r must be greater than the positive critical value, or less than the negative critical value, to reject H0).
Example, with critical values ±0.7 and r = -0.35: since -0.35 > -0.7, the result is not significant, so we fail to reject H0; there is insufficient evidence to suggest a correlation between X and Y. (Had r fallen beyond the critical value, we would reject H0 and conclude there is evidence of correlation.)
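The two-tailed decision rule can be sketched as a tiny helper (the function name and the critical value 0.7 are illustrative; real critical values come from the table for your sample size and significance level):

```python
def reject_h0_two_tailed(r, critical):
    """Reject H0: rho = 0 when |r| exceeds the (positive) critical value."""
    return abs(r) > abs(critical)

# r = -0.35 does not pass the critical value, so we fail to reject H0
print(reject_h0_two_tailed(-0.35, 0.7))  # False
# r = -0.85 is beyond -0.7, so we reject H0
print(reject_h0_two_tailed(-0.85, 0.7))  # True
```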