Chapter 3) Bivariate Data Flashcards
What is the proper MARK SCHEME definition of what makes a variable independent?
x is independent because x is NOT SUBJECT TO RANDOM VARIATION
How to say why something IS an outlier (mark scheme)?
All of the points except this one seem to lie on a straight line / follow the same trend
Thus it’s an outlier
What’s the difference between random and non-random independent variables?
What makes a variable random?
1) Non-random is when you can CONTROL the value the variable takes
Like distance, time etc = non-random
But you can have a random independent variable, like rainfall: you can’t control that
2) Random is when you can’t predict the value, you’re just measuring it
What’s the difference between CORRELATION and ASSOCIATION?
What is ASSOCIATION?
Correlation is a SPECIAL type of association = linear association
So stronger correlation means the points lie closer to a straight line!!!
Association is a way of describing the relationship between two variables, could be quadratic for example
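A quick sketch of why correlation only measures the LINEAR kind of association. The `pmcc` helper and the data are made up for illustration (not from a mark scheme); it assumes the usual formula r = Sxy / sqrt(Sxx * Syy).

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient, r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # points exactly on a straight line
quadratic = [x ** 2 for x in xs]   # points exactly on a curve, not a line

r_linear = pmcc(xs, linear)        # linear association, so r is 1
r_quadratic = pmcc(xs, quadratic)  # clear association, but r is 0
print(round(r_linear, 3), round(r_quadratic, 3))
```

So the quadratic data has a perfect association but no correlation at all, because the relationship isn’t a straight line.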
So what’s the point of using scatter diagrams to start off with?
1) can allow us to spot obvious outliers and discard them from calculations
2) can tell us the shape of the scatter, which determines what procedure to use
- so if it’s linear we know we can use PMCC, as it’s correlation
- but if it’s curved (non-linear, but still always increasing or always decreasing) might have to use rank correlation instead
- can tell us if the data is roughly elliptical, suggesting it came from a bivariate normal distribution, which means it’s valid for a PMCC hypothesis test
3) can tell us the degree of correlation, so we know if our calculated values are sensible or not
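A rough sketch of why the shape of the scatter decides the procedure. It assumes made-up data on a curve that always increases, and uses the fact that Spearman’s rank coefficient is the PMCC of the ranks. The `pmcc` and `ranks` helpers are illustrative names, and `ranks` assumes there are no tied values.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient, r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank 1 for the smallest value upwards (assumes no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]             # curved, but always increasing

r_pmcc = pmcc(xs, ys)                 # less than 1: points are not on a line
r_rank = pmcc(ranks(xs), ranks(ys))   # exactly 1: the order agrees perfectly
print(round(r_pmcc, 3), round(r_rank, 3))
```

PMCC marks the data down for not being on a straight line, while the rank coefficient only cares that y goes up every time x goes up.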
What are the MOST important things to watch out for on a scatter diagram?
- outliers
- trend
- ROUGHLY ELLIPTICAL, which means we can do a PMCC test on it
If the scatter shows two distinct groups that each LOOK like you could draw an ellipse around them, why discard it?
Why is it even showing this?
1) Could be height vs IQ, and you have a group of 20 year olds who are taller and will have higher IQ, compared to 5 year olds who are shorter and have lower IQ
But this DOESN’T MEAN height and IQ are necessarily related, it just means a third variable linked to the distinct feature of the groups (age) was involved
2) Thus a case where there would be no correlation within each group, but when the different groups are plotted on the same graph it appears to be positive. Can’t use this, so DISCARD TWO DISTINCT GROUPS
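A small sketch of the two-groups problem, with invented numbers (not real height or IQ data). Within each group there is no correlation, but pooling the groups gives a strong positive r. The `pmcc` helper is an illustrative name.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient, r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up scores: the younger group is low on both variables,
# the older group is high on both variables.
young_x, young_y = [1, 2, 3, 4], [2, 1, 1, 2]
old_x, old_y = [11, 12, 13, 14], [12, 11, 11, 12]

r_young = pmcc(young_x, young_y)                    # no correlation in group
r_old = pmcc(old_x, old_y)                          # no correlation in group
r_pooled = pmcc(young_x + old_x, young_y + old_y)   # looks strongly positive
print(round(r_young, 3), round(r_old, 3), round(r_pooled, 3))
```

The pooled r is high only because of the gap between the groups (the third variable), not because x and y are related inside either group.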
Also, when else should you reject an ellipse or a trend?
(Covering one or two data points)
This is why you PLOT YOUR POINTS, so you can see if there’s dodgy data!
If there are 1 or 2 points that you cover up, and the rest of the data now looks RANDOM, then reject it, as these points could be outliers, and it’s not valid to assign a whole trend based on them
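The “cover up a point” check can be sketched numerically. The data here is invented: five points with no trend plus one extreme point. The `pmcc` helper is an illustrative name.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient, r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5, 20]
ys = [3, 1, 4, 1, 3, 20]   # the last point (20, 20) is the suspect one

r_all = pmcc(xs, ys)              # strong positive, driven by one point
r_rest = pmcc(xs[:-1], ys[:-1])   # "cover up" the last point: no trend left
print(round(r_all, 3), round(r_rest, 3))
```

One point is creating the whole “trend” here, which is exactly the case where you shouldn’t trust the correlation.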
Association
The way in which two variables are related
Linear association = correlation
What does PMCC give you? (r)
A value between -1 and 1 which tells you how strong the correlation is
-1 is extreme negative, 1 is extreme positive
From -0.1 to 0.1 that’s NO CORRELATION
From 0.1 to 0.5 weak, 0.5 to 0.8 moderately strong (same sizes with a minus sign for negative correlation)
0.8 to 1 is strong
1 is PERFECT!
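The bands above can be written as a small lookup. This is only a sketch: `describe_r` is a made-up helper name, the bands are a rough guide rather than fixed rules, and which band an exact boundary value (0.1, 0.5, 0.8) falls into is an assumption made here.

```python
from math import isclose

def describe_r(r):
    """Turn a PMCC value into a rough description using the bands in these notes."""
    size = abs(r)
    if size > 1 and not isclose(size, 1.0):
        raise ValueError("r must lie between -1 and 1")
    if isclose(size, 1.0):
        strength = "perfect"
    elif size >= 0.8:            # assumption: 0.8 itself counts as strong
        strength = "strong"
    elif size >= 0.5:            # assumption: 0.5 itself counts as moderate
        strength = "moderately strong"
    elif size > 0.1:             # assumption: 0.1 itself counts as none
        strength = "weak"
    else:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

for value in (0.05, -0.3, 0.65, -0.9, 1.0):
    print(value, "->", describe_r(value))
```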
What does data coming from a bivariate normal distribution mean?
And what does it look like (what can you do)?
It means BOTH SETS OF DATA (bivariate) came from A NORMAL DISTRIBUTION EACH
Hence some points are extreme and most are in the middle
Thus you can draw an ellipse around the data
- and if you can, data = bivariate normal = can be used for PMCC hypothesis test