Lecture 3 - Quantitative Methods Flashcards
Difference b/w chi-square and regression analysis
- Chi-square analysis looks at the association between two categorical variables.
- Regression analysis looks at predicting one continuous variable from another.
What is a correlation?
the association b/w 2 continuous variables
Correlation is a form of. . .
BIVARIATE analysis (measure of relationship b/w 2 variables)
Correlation quantifies the relationship (typically linear) between two variables X & Y in terms of direction and degree.
An association b/w 2 variables can be. . .
LINEAR or NON-LINEAR
What does the Pearson correlation coefficient (r) measure?
measures the linear association between two continuous variables.
It compares how much the two variables vary together to how much they vary separately.
It ranges from -1 to +1. An r value of 0 indicates no linear association between the two variables.
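As a minimal sketch of this definition (pure Python, made-up numbers), Pearson's r is the covariability of X and Y divided by the product of their separate variabilities:

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Covariability: how much X and Y vary together.
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Separate variability of each variable.
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

# A perfect positive linear association gives r = 1:
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
```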
R squared (r^2) is. . .
the percentage of variance accounted for (i.e. how much of the variation in the outcome variable is explained by the predictor)
What is variability?
how much a given variable varies from observation to observation
What is covariability?
how much two variables vary together
Degree of relationship
- Small effect: 0.1 < |r| < 0.3
- Medium effect: 0.3 < |r| < 0.5
- Large effect: 0.5 < |r| < 0.7
What can greatly influence the value of a correlation?
extreme scores / outliers
Pearson’s r is. . .
BIDIRECTIONAL
- meaning, the correlation b/w Variable B and Variable A is the same as that b/w Variable A & B
Regression toward the mean:
- With imperfect correlation, an extreme score on one measure tends to be followed by a less extreme score on the other measure.
- This is because extreme scores are often partly due to chance, and it is unlikely that chance will push the other value to an equally extreme level.
Null hypothesis testing
we assume there is no effect (i.e. no association)
The null hypothesis –
ρ = 0. Rho (ρ) is the correlation in the population.
- If the probability of finding an r this big when the true population correlation is zero (ρ = 0) is small (p < .05), we reject the null hypothesis. So, we infer that the correlation in the population is NOT zero: there is a significant association between the two variables.
Whether r is significant will depend on. . .
the size of the sample (n)
it will also depend on
the degrees of freedom (df).
- For correlation, df = n – 2.
- For a small correlation to be significant, a high df (lots of participants/data) is required.
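A sketch of why sample size matters, assuming the usual t-test of a correlation (t = r·√df / √(1 − r²), with df = n − 2): the same small r yields a much larger t, and hence a smaller p, when n is large.

```python
import math

def t_from_r(r, n):
    # Convert a correlation into a t statistic with df = n - 2.
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# The same small correlation (r = 0.2) with different sample sizes:
print(round(t_from_r(0.2, 10), 2))   # 0.58 - far from significant
print(round(t_from_r(0.2, 100), 2))  # 2.02 - crosses the usual threshold
```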
When is Spearman’s correlation coefficient used?
- when the data are ORDINAL (ranked), or when the relationship is monotonic (one-directional) but NON-LINEAR
- Convert the data to ranks before calculating correlations. Converting to ranks can linearize non-linear data.
- Useful for non-linear data and data sets with outliers.
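A sketch of the rank-then-correlate idea (pure Python, toy data, no ties assumed): Spearman's rho is just Pearson's r applied to the ranks, so a monotone but non-linear relationship becomes perfectly linear after ranking.

```python
import math

def ranks(values):
    # Rank the data from 1 = smallest (no ties assumed, for simplicity).
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotone but non-linear

print(round(pearson_r(x, y), 3))                # 0.943 - linearity violated
print(round(pearson_r(ranks(x), ranks(y)), 3))  # 1.0 - Spearman's rho
```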
What is regression used for?
- predicting one variable from another
- However, unless the correlation is perfect (r = +1 or r = -1), the prediction will not be exact.
- Looks for the line of best fit (regression line)
Errors in regression are assumed to be
- independent
- normally distributed (with a mean of 0)
- homoscedastic (equal error variance for levels of predicted Y)
How is the assumption of the independence of errors assessed in regression?
- by reflecting on the sampling procedure for the study (and NOT by looking at plots of residuals)
What does least squares estimation do?
estimates the parameters (regression coefficients) by minimizing the total squared error
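As a sketch with made-up data, the least squares estimates for simple regression have closed forms: the slope b is the covariability of X and Y over the variability of X, and the intercept a = mean(Y) − b·mean(X).

```python
def least_squares(x, y):
    # Estimate intercept (a) and slope (b) by minimizing total squared error.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Slope: covariability of X and Y divided by the variability of X.
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx  # the fitted line passes through (mean X, mean Y)
    return a, b

a, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])  # data lie on Y = 1 + 2X
print(a, b)  # 1.0 2.0
```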
What does the ANOVA (F test) for regression tell us?
- The F test tells us whether the variance explained is significantly different from zero
- If the F test is NOT significant, the regression is worthless: the predictor does not explain the outcome variable at all
- Null hypothesis: r^2 = 0.
- The F test has 2 degrees of freedom. For simple regression, the first is always 1, and the second is n – 2 (the same df as the t-test of the regression coefficient).
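A sketch of the F–t relationship for simple regression (hypothetical values assumed): F with (1, n − 2) df can be computed from r², and equals the square of the t statistic for the slope.

```python
import math

def f_from_r2(r2, n):
    # Simple regression F(1, n-2): explained variance over
    # unexplained variance per residual degree of freedom.
    return r2 / ((1 - r2) / (n - 2))

def t_from_r(r, n):
    # t statistic for the correlation/slope, df = n - 2.
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# With r = 0.2 (so r^2 = 0.04) and n = 100, F equals t squared:
print(round(f_from_r2(0.04, 100), 3))     # 4.083
print(round(t_from_r(0.2, 100) ** 2, 3))  # 4.083
```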
Regression parameters (a, b)
- The intercept (a) is the estimated value of Y when X = 0.
- The slope (b) (gradient; the unstandardized regression coefficient in JASP) indicates whether there is a relationship between X & Y, whether that relationship is positive or negative, and the estimated change in Y when X increases by 1. H0: b = 0.
- The slope can be transformed into standardized form (convert X & Y into z-scores, and then do the regression). This is called a standardized regression coefficient (called beta in SPSS/JASP), and is the same as the correlation coefficient for bivariate data.
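A sketch of that last claim (toy data assumed): z-scoring X and Y and refitting the slope gives the standardized coefficient (beta), which for bivariate data equals Pearson's r.

```python
import math

def zscores(v):
    # Standardize: subtract the mean, divide by the (population) SD.
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in v) / n)
    return [(x - m) / sd for x in v]

def slope(x, y):
    # Least squares slope of Y on X.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
beta = slope(zscores(x), zscores(y))  # slope on z-scored data = beta
print(round(beta, 4) == round(pearson_r(x, y), 4))  # True
```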
Regression diagnostics
a) Histogram of residuals – normality: we want the errors to be approximately
normally distributed
b) Residuals plot – homoscedasticity: a scatterplot of residuals against predicted values
to check for heteroscedasticity. The absence of any systematic pattern supports the
assumption of homoscedasticity (i.e. that the variance of the residual/error term in the
regression model is constant).
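A sketch of where the residuals for these diagnostic plots come from (toy data assumed): fit the regression, then subtract each predicted Y from the observed Y.

```python
# Fit a simple regression, then inspect the residuals (toy data).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))
a = my - b * mx

# Residual = observed Y minus predicted Y; these are what the
# histogram and the residuals plot are drawn from.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print([round(e, 2) for e in residuals])  # [0.04, -0.13, 0.2, -0.17, 0.06]

# Least squares forces the residuals to sum to (numerically) zero:
print(abs(sum(residuals)) < 1e-9)  # True
```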