RMA: WEEK 9 Flashcards
Strength of correlations
- A correlation is likely strong when the points on a scatterplot closely follow the pattern of a line (doesn’t need to be perfect but must be apparent)
- A correlation might be weak if the points do not follow a clear line but still drift in a particular direction
- Perfect data is data which follows a straight line with little to no variation > suggests something may be wrong with the data, as it’s not normal for real data to form a perfect line
Linear relationship
- refers to data which follows a straight line on a scatterplot > if so, the relationship is described as linear
Purpose of a correlation analysis
- determine if there is a linear relationship between variables
- direction of relationship (positive/negative)
- strength of relationship (weak, moderate, strong)
Types of correlation
Positive- When both variables increase or decrease at the same time
Negative- Where one variable increases and the other decreases
Zero- No relationship between the co-variables
Correlation coefficients
- Pearson correlation coefficient = r
- Spearman’s correlation coefficient = rˢ (rs)
- Corr values are between -1 and 1
- Positive value = positive correlation (e.g: 0.8)
- Negative value = negative correlation (e.g: -0.8)
- if r = 0 there is no linear corr
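A minimal sketch of how the sign and size of r come out of raw scores. The data sets and variable names are made up for illustration; the function follows the standard Pearson formula (sum of cross-products divided by the square root of the product of the sums of squares).

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed directly from the raw scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of the deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for five ppts
hours_revised = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 64, 70]   # rises as revision rises
errors_made = [9, 8, 6, 5, 2]       # falls as revision rises

r_pos = pearson_r(hours_revised, exam_score)   # positive value
r_neg = pearson_r(hours_revised, errors_made)  # negative value
print(round(r_pos, 2), round(r_neg, 2))
```

Both values land between -1 and 1, and the sign matches the direction of the relationship.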
Characteristics of correlations
- Correlations use bivariate data (two measured variables per ppt) > they make no distinction between IV + DV
- A correlation doesn’t aim to establish a difference or effect > the aim is to find a relationship between variables
- Correlation coefficients won’t change if the units of measurement of the variables change (e.g: changing from minutes to hours won’t change the correlation coef)
Non-linear relationships
- If on a scatterplot the data shows a pattern that doesn’t follow a line (e.g: the shape of a curve), r won’t pick this up > a low r (even r = 0) doesn’t mean there is no relationship, but the data must be analysed differently
- Relationships can exist which do not follow a line, as long as there is a consistent pattern in behaviour
Correlation coefficient: Pearson’s r
- Calculated directly from the raw scores.
- Suitable for data on an interval or ratio scale.
- Highly affected by outliers.
- Not suitable for skewed data.
Correlation coefficient: Spearman’s rs
- Calculated from the ranking of the raw scores.
- Suitable for data on an ordinal scale.
- Marginally affected by outliers.
- Suitable for skewed data.
Pearson’s r or Spearman’s rs?
- Pearson’s r is more powerful than rs so it is preferable to use > but the data needs to meet the criteria + be clean, w/ no outliers or skew
- If data is messy then Spearman’s rs is more suitable > although it is less powerful, it copes better w/ outliers and skewed data
- e.g: before an outlier is introduced, r and rs are both +0.95, so they perform similarly. When an outlier is included, r = +0.90 while rs = +0.92 > rs shifts less than r, so it is more robust when outliers are present
Problem with sample sizes in r
- With small sample sizes, an apparent correlation risks being due to chance > e.g: N=3: patterns may arise w/o a real relationship. N=10: patterns are less likely to arise w/o a real relationship
- The larger the sample size = the greater the certainty that the relationship is real
Density curves
- A density curve is a smoothed, idealised version of a histogram of ppts’ scores > mathematical model describing how the scores of ppts in the pop are distributed
- One type of density curve = a normal distribution
- Density curves are helpful when dealing w/ lots of ppts > allow results to be generalised to the population as a whole
Normal distribution
- When the mean, median and mode all sit on the middle line of the curve (the average)
- the more real/accurate data points, the better the fit to the curve > may ignore outliers
- mean = median = mode > all the same value, which means the curve is symmetrical
Skewed distributions
Positive- scores cluster on the left, w/ a long tail (minority scores) on the right
Negative- scores cluster on the right, w/ a long tail (minority scores) on the left
Mode is always at the peak of the curve
Median is typically between mode and mean
Mean leans towards the LEFT in NEGATIVE SKEW (score is smaller than median + mode)
Mean leans towards the RIGHT in POSITIVE SKEW (score is bigger than median + mode)
this is because the mean is affected by extreme values + gets pulled towards the minority scores in the tail
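A small sketch of the ordering of mode, median and mean, using made-up scores. The first data set has a long tail on the right (positive skew); the second is its mirror image (negative skew).

```python
from statistics import mean, median, mode

# Hypothetical positively skewed scores: most are low, a few are very high
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

# Mirror image of the same scores gives a negative skew
negative_skew = [16 - s for s in positive_skew]

pos_mode, pos_median, pos_mean = mode(positive_skew), median(positive_skew), mean(positive_skew)
neg_mode, neg_median, neg_mean = mode(negative_skew), median(negative_skew), mean(negative_skew)

# Positive skew: mode < median < mean (mean pulled right by the high scores)
print(pos_mode, pos_median, pos_mean)
# Negative skew: mean < median < mode (mean pulled left by the low scores)
print(neg_mode, neg_median, neg_mean)
```

In both cases the mode stays at the most common score and the mean is the value that moves furthest towards the tail.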
What we can do with density curves
- Displays the overall pattern (shape) of a distribution > always on or above the horizontal axis
- Curves are calculated so they have an area of exactly 1 underneath them > the area under the curve contains 100% of scores from the population (all data is in the curve)
- If you know certain values of the model (e.g., mean or SD) > can make predictions about the overall population + carry out calculations > can work out the area above or below a given score
e.g: if the total area above a given score was 0.2, that means 20% of ppts scored above it (in a normal distribution the area above the mean itself is always 0.5).
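A minimal sketch of this kind of area calculation for a normal density curve, using `NormalDist` from Python's standard library (Python 3.8 or later). The mean of 100 and SD of 15 are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical population model: normal distribution, mean 100, SD 15
scores = NormalDist(mu=100, sigma=15)

# cdf(x) gives the area under the curve to the left of x
area_below_mean = scores.cdf(100)     # half the curve lies below the mean
area_above_115 = 1 - scores.cdf(115)  # area above a score 1 SD over the mean

# inv_cdf works the other way: which score has 80% of the area below it,
# i.e. 20% of ppts scoring above it?
cutoff_top_20 = scores.inv_cdf(0.8)

print(round(area_below_mean, 3))
print(round(area_above_115, 3))
print(round(cutoff_top_20, 1))
```

Because the total area is 1, each area reads directly as a proportion of the population: an area of about 0.16 above 115 means roughly 16% of ppts would be expected to score above 115 under this model.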