Bivariate Data Flashcards
What is PMCC?
The product moment correlation coefficient (PMCC) measures how strong the correlation on a scatter diagram is
Dependent vs independent variables
Independent variable is the thing you change
Dependent variable is the thing you measure
Traditionally we plot dependent variable on y axis and independent on x axis
Random, non-random and control variables
Random variables = anything that CAN’T be predicted
Non-random = things we can predict
Control = things that stay the same but you adjust to see the effect (isn’t this independent lol)
Control = non-random
But this varies per graph: control is non-random, random can be independent
WHY MUST THE DATA IN A SCATTER GRAPH be “roughly elliptical”?
What does this allow you to do?
If it’s roughly elliptical it means it probably came from a (bivariate) normal distribution
And thus if it is roughly elliptical it means you can perform a PMCC statistical hypothesis test!
This is what makes the TEST VALID
What to look out for when deciding if data is elliptical, even if it looks elliptical
- is it two islands? Because on their own, the islands aren’t elliptical
- is it only elliptical because of outliers?
Basically if only a few data points are making it look elliptical, remove them and double check; if it shows no correlation now, then don’t use it
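The "remove them and double check" step can be sketched in Python. This is only an illustration: the data points are made up, and `pmcc` is a helper name chosen here, not something from the cards.

```python
from math import sqrt

def pmcc(points):
    """PMCC of a list of (x, y) pairs, using deviations from the means."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_yy = sum((y - mean_y) ** 2 for _, y in points)
    return s_xy / sqrt(s_xx * s_yy)

# Made-up data: a small cluster with no trend, plus one far-away point.
cluster = [(1, 3), (3, 1), (1, 1), (3, 3), (2, 2)]
suspected_outliers = [(10, 10)]

r_all = pmcc(cluster + suspected_outliers)  # looks strong (about 0.93)
r_without = pmcc(cluster)                   # 0 once the outlier is removed

print(round(r_all, 2), round(r_without, 2))
```

Here the single far-away point is what creates the apparent correlation, so the PMCC test should not be used on this data.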
Difference between association and correlation?
Can something be curved and still be an association?
1) Association is any relationship between two variables
Correlation is a SPECIAL type of association called LINEAR ASSOCIATION, which means how close the points are to a straight line
2) However, you can have curved (non-linear) associations too
PMCC: what is it, and where are the correlation strengths on the number line?
(Say 0.67, what strength is this? And -0.2?)
It is a measure of how strong the correlation is
Mod 0 to 0.1 is NO CORRELATION
Mod 0.1 to 0.5 is WEAK
Mod 0.5 to 0.8 is MODERATE
Mod 0.8 to 1 is STRONG
Mod 1 is PERFECT CORRELATION
0.67 = moderate, -0.2 = weak
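The bands above can be written as a small Python lookup. A minimal sketch: the function name is made up, and which band an exact boundary value (0.1, 0.5, 0.8) falls into is an assumption, since the card doesn't say.

```python
def describe_correlation(r):
    """Label a PMCC value using the strength bands from this card."""
    if not -1 <= r <= 1:
        raise ValueError("PMCC must lie between -1 and 1")
    size = abs(r)  # "mod": strength ignores the sign
    if size == 1:
        strength = "perfect"
    elif size >= 0.8:   # assumption: boundaries go to the stronger band
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    elif size >= 0.1:
        strength = "weak"
    else:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

print(describe_correlation(0.67))  # moderate positive
print(describe_correlation(-0.2))  # weak negative
```

The sign only gives the direction of the correlation; the strength comes from the size of the value.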
What does a normal distribution curve look like?
THUS HOW DO BIVARIATE NORMAL DISTRIBUTION data points form an ELLIPTICAL SHAPE?
1) A normal distribution will be a bell curve shape, e.g. for heights most people will be in the middle, some really tall, some less
2) If you plot two of them against each other, the majority ends up in the middle, with some on the tails, thus it forms an ellipse
Thus if roughly elliptical = can say it’s roughly bivariate normal
Why do we care if data is bivariate normal?
This is a condition needed to do a PMCC statistical hypothesis test
How to do PMCC?
Just use the alternative formula with the sample statistics to get the value
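A rough Python sketch of the calculation, assuming the "alternative formula" means the summary-statistics form r = Sxy / sqrt(Sxx × Syy), with Sxy = Σxy − (Σx)(Σy)/n, Sxx = Σx² − (Σx)²/n and Syy = Σy² − (Σy)²/n. The card doesn't spell the formula out, so check it against your formula sheet. The data is made up.

```python
from math import sqrt

def pmcc(xs, ys):
    """PMCC from the sample summary statistics (sums of x, y, xy, x^2, y^2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_xx - sum_x ** 2 / n
    s_yy = sum_yy - sum_y ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)

# Made-up sample: Sxy = 6, Sxx = 10, Syy = 6, so r = 6 / sqrt(60)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pmcc(xs, ys), 3))  # 0.775
```

In an exam the sums are often given directly as the sample statistics, so only the last four lines of the function are needed.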