statistical analysis and design Flashcards
what is correlational research?
= non-experimental study which determines the relationship between two variables without manipulating them or controlling for extraneous variables
why would we use correlational studies?
- when it would be unethical and harmful to manipulate variables
- allows the researcher to observe natural variations
what is a key limitation of correlational studies?
- correlation does not equal causation
- the third variable problem -> two variables can be statistically related but not because they cause each other but because a third variable causes both of them.
what are the two characteristics of a correlational relationship
- the direction:
positive correlation = both variables increase together
negative correlation = one variable increases and the other decreases
- the strength:
measured using covariance and the correlation coefficient (r)
positive covariance = both variables tend to increase or decrease together
negative covariance = when one variable is high, the other tends to be low
how to calculate the sample covariance of two variables
covariance = a measure of how the two variables change together.
limitations:
- affected by units of measurement
- indicates direction but not strength
step 1 = for each participant, subtract the mean from their value, for both variables, then multiply the two deviations together, giving one value per participant
step 2 = sum all the values created in step 1
step 3 = divide by the number of pairs of observations minus 1
if covariance is + -> positive relationship
if covariance is - -> negative relationship
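The three steps above can be sketched in Python (the data values are made up for illustration):

```python
def sample_covariance(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Step 1: for each participant, multiply the two deviations from the mean
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    # Step 2: sum the products; Step 3: divide by n - 1
    return sum(products) / (n - 1)

# Hypothetical example data
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 7]
cov = sample_covariance(hours, scores)  # positive -> positive relationship
```

A positive result here indicates the two variables tend to rise and fall together.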
how do you standardise covariance across different units of measurement and what is the formula for this?
(makes the covariance easier to compare between different pairs of variables)
= transform the covariance into a correlation coefficient (r)
- indicates strength and direction
Range: -1 to +1
Interpretation:
|r| ≈ 0.1-0.3 → Weak
|r| ≈ 0.4-0.6 → Moderate
|r| ≈ 0.7-0.9 → Strong
|r| = 1 → Perfect
calculated by:
r = covariance of the two variables / (SD of variable 1 × SD of variable 2)
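The standardisation above can be sketched in Python using the standard-library statistics module (example data is made up):

```python
import statistics

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # Sample covariance, as calculated in the earlier steps
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Standardise: divide by the product of the two sample SDs
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 7])  # strong positive correlation
```

Dividing by the two standard deviations removes the units of measurement, so r always falls between -1 and +1.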
null hypothesis significance testing in correlational analysis
Null Hypothesis (H₀): No correlation (r = 0).
Alternative Hypothesis (H₁): Significant correlation (r ≠ 0).
- when stating a directional hypothesis, only state direction, not the strength (strong, moderate, weak)
what are the assumptions of a Pearson correlation as a parametric test?
- levels of measurement: interval or ratio (continuous) variables
- related pairs: each observation has two paired values
- linearity: the relationship between the variables should be linear (check with a residuals vs. fitted plot; flat red line = linear)
- normality: residuals are normally distributed (check with a Q-Q plot; values close to the diagonal)
- homoscedasticity: the variability or spread of one variable remains constant across the range of the other variable (check with a scale-location plot; flat red line)
- absence of outliers: outliers can distort results
what can we use to visualise relationship between two continuous variables?
a scatterplot
how to calculate degrees of freedom for Pearson correlation
df = N-2
- one degree of freedom is lost for each variable
Spearman correlation as a non parametric test: what is it and when do we use it?
= calculates the relationship based on the rank order of the data, rather than the actual values.
we use when:
- the data is ordinal (ranked 1,2,3,4,5)
- the data violates assumptions of Pearson correlation
- relationship between variables is non-linear
- the residuals are not normally distributed
what are the steps to calculate the spearman correlation coefficient (rs)
- rank the scores for each variable separately (smallest value gets rank 1, next smallest gets rank 2, etc.)
- calculate the difference between each ranked pair (not the original scores) = d
(variable 1 ranked score - variable 2 ranked score)
- square the differences = d²
- sum the squared differences
- plug the values into Spearman's formula, where n = number of observations:
rs = 1 - (6 × Σd²) / (n(n² - 1))
- this formula is used when there are no tied ranks
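The ranking-and-formula steps above can be sketched in Python (the data is made up, and the simple ranking helper assumes no tied values):

```python
def spearman_rs(xs, ys):
    # Assumes no ties: each value maps to a unique rank
    def ranks(vals):
        order = sorted(vals)
        return [order.index(v) + 1 for v in vals]  # smallest value gets rank 1

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    # d = difference between ranked pairs; square and sum
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # Standard Spearman formula (no tied ranks)
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

rs = spearman_rs([1, 2, 3, 4, 5], [10, 20, 40, 30, 50])  # strong positive
```

Because only the rank order matters, the result is unchanged by any monotonic transformation of the raw scores.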
how to handle tied observations Spearman correlation datasets
- when two or more observations have the same value, we have ties in the data (difficult to rank) -> need to calculate tied ranks
- identify which ranks a set of tied observations would get naturally
- assign each of them the average of those natural ranks
for example: if two values would be ranked 3 and 4, assign both 3.5
- if more than two values are tied, add all the natural ranks together and divide by how many tied values there are
- do this for all sets of tied observations
- tied ranks slightly affect the standard Spearman formula
- alternative approach: apply the Pearson correlation to the rank scores
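The averaging rule above can be sketched as a small Python helper (the values are made up):

```python
def tied_ranks(vals):
    # Each value gets the average of the natural ranks (sorted positions)
    # that all copies of that value would occupy
    order = sorted(vals)
    return [
        sum(i + 1 for i, v in enumerate(order) if v == x) / order.count(x)
        for x in vals
    ]

# The two 3s would naturally get ranks 1 and 2, so both receive 1.5
ranks = tied_ranks([7, 3, 3, 9])  # [3.0, 1.5, 1.5, 4.0]
```

Applying the Pearson formula to ranks produced this way gives the tie-corrected Spearman coefficient.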
compare independent and dependent correlation coefficients
- independent correlations = two correlations come from different, unrelated groups
eg -> comparing the correlation between TikTok usage & GPA for high school vs. college students
- dependent correlations = two correlations share a common variable
eg -> comparing the correlation between study hours & statistical anxiety vs. study hours & attitude towards statistics (common variable: study hours)
describe hypothesis testing for correlation coefficients
Null Hypothesis (H₀): No difference between the two correlations.
Alternative Hypothesis (H₁): A statistically significant difference exists.
Use ‘cocor’ package in R to compare correlation coefficients.
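cocor is an R package; as a rough Python sketch of the underlying idea for the independent-groups case only, Fisher's z transformation can be used to compare two correlations (the r values and sample sizes below are made up):

```python
import math

def fisher_z_test(r1, n1, r2, n2):
    # Fisher z transformation makes the sampling distribution of r
    # approximately normal
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    # Standard error of the difference for two independent samples
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    # Compare the result against the standard normal distribution
    return (z1 - z2) / se

z = fisher_z_test(0.5, 103, 0.3, 103)  # hypothetical group correlations
```

Dependent correlations need a different test that accounts for the shared variable, which is one reason to rely on cocor in practice.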
the r critical value
= the smallest r-value needed to find a significant effect
- if the r value you have calculated is equal to or greater than the r critical value at a particular significance level, then your test is significant
- don't take polarity (-/+) into account; compare the absolute value of r
the difference between strength and direction in a relationship
Direction= Indicates whether the relationship is positive or negative.
Strength = Indicates how closely the two variables follow a linear pattern.
Measured by correlation coefficient (r)
A strong correlation means data points are close to a straight line, while a weak correlation means they are more scattered.
example
r = 0.9 -> direction = positive, strength = strong
cohens rule of thumb for effect size
small = 0.1
medium = 0.3
large = 0.5
- not the same as the interpretation of correlation coefficient values, where 0.5 would be considered moderate and ±1 perfect (range -1 to +1)
what is the difference between correlation and regression studies
Correlation: Measures the strength & direction of the relationship between two variables.
Example: Is ease of purchase correlated with purchase intention?
Regression: Determines whether one variable predicts another.
Example: Does ease of purchase predict purchase intention?
Key distinction: Correlation does not imply causation, while regression models predictive relationships.
describe the variables in regression analysis
predictor variable:
- the independent variable
- the explanatory variable
- the x variable in the regression model
- eg ease of purchase
outcome variable:
- the dependent variable
- the criterion variable
- the y variable in the regression model
- eg purchase intention
what is regression analysis
= A statistical technique that models relationships between variables.
Answers: “By how much will Y change if X changes?”
Types:
Simple Linear Regression: One predictor (X) predicting → One outcome (Y).
Example: Does ease of purchase predict purchase intention?
Multiple Linear Regression: Two or more predictors (X1, X2, etc.) predicting → One outcome (Y).
Example: Do ease of purchase and influencer endorsements predict purchase intention?
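Simple linear regression can be sketched with the usual least-squares formulas (the ease-of-purchase and purchase-intention scores below are made up):

```python
def simple_linear_regression(xs, ys):
    # Fit y = b0 + b1 * x by least squares
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of deviation products over sum of squared x deviations
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    # Intercept: the line passes through (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    return b0, b1

ease = [1, 2, 3, 4, 5]        # hypothetical predictor (X) scores
intention = [2, 4, 5, 4, 7]   # hypothetical outcome (Y) scores
b0, b1 = simple_linear_regression(ease, intention)
# b1 answers "by how much will Y change if X increases by 1?"
```

The slope b1 is the model's answer to the regression question, whereas a correlation would only report strength and direction.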
the mean model
used to make predictions in regression
mean model:
Predicts the outcome (Y) using the mean of all responses.
Ignores the predictor variable.
Example:
- the model predicts that the outcome variable (purchase intention) will always be the mean value, regardless of the predictor variable value (ease of purchase)
- however, the actual purchase intention is often higher or lower than the mean value
- this model is not good, as it does not capture the data well
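A minimal sketch of the mean model, using hypothetical purchase-intention scores, shows why it fits poorly (the residuals it leaves are large):

```python
intention = [2, 4, 5, 4, 7]  # made-up outcome scores

# The mean model predicts the same value for every participant,
# ignoring the predictor entirely
mean_prediction = sum(intention) / len(intention)

# Residuals: how far each actual score falls from the prediction
residuals = [y - mean_prediction for y in intention]
```

Regression improves on this by letting the prediction vary with the predictor, shrinking those residuals.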