Correlation & Multiple Regression Flashcards
What is correlation?
A statistical association (dependency) between two separately measured variables, i.e. the extent to which they vary together.
What type of graph is used for correlations?
scatterplots
A Pearson correlation coefficient of -1.0 means X and Y are ____
in a perfect negative (inverse) linear relationship: as one increases, the other decreases proportionally
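A minimal sketch of how Pearson's coefficient could be computed by hand, using made-up data in which y falls by exactly 2 for every unit increase in x. The function name pearson_r and the data are illustrative assumptions, not part of the original cards.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data: a perfect negative linear relationship.
x = [1, 2, 3, 4, 5]
y = [10 - 2 * xi for xi in x]
r = pearson_r(x, y)
print(round(r, 6))
```

Any perfectly linear, downward-sloping data set should give the same coefficient regardless of the slope's size.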
Which measure of association should be used when both variables are interval/ratio, e.g. temperature?
Pearson’s coefficient
Which measure of association should be used when both variables are ordinal (ranked)?
Spearman’s or Kendall’s rank correlation coefficient
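As a rough illustration, Spearman's coefficient can be viewed as Pearson's coefficient applied to the ranks of the data. The sketch below assumes that view; the helper names (average_ranks, pearson_r, spearman_rho) and the data are hypothetical. Tied values receive the mean of the ranks they occupy.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up data: y always rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [xi ** 3 for xi in x]
rho = spearman_rho(x, y)   # rank-based association
r = pearson_r(x, y)        # linear association on the raw values
print(round(rho, 6), r < rho)
```

Because the ranks of x and y agree exactly here, the rank-based coefficient is at its maximum while Pearson's coefficient on the raw values is lower.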
Which measure of association should be used when both variables are true dichotomies, e.g. male/female or yes/no?
Phi coefficient
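A small sketch of the phi coefficient for a 2x2 table of counts, using the standard formula (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)). The function name and the counts are made up for illustration.

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table of counts laid out as [[a, b], [c, d]]."""
    numerator = a * d - b * c
    # Product of the two row totals and the two column totals.
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Hypothetical counts: rows = group (A / B), columns = answer (yes / no).
phi = phi_coefficient(20, 10, 5, 15)
print(round(phi, 3))
```

A value near 0 would indicate no association between the two dichotomies; values near -1 or +1 would indicate a strong one.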
The point-biserial coefficient is used when one variable is ____ and the other variable is ____
a true dichotomy; interval/ratio
If there are more than two variables and you want to assess the relationship between one pair after controlling for the effect of the other variable(s), what type of correlation is this?
partial correlation
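A hedged sketch of a first-order partial correlation, using the usual formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The data are made up so that x and y are related only through a third variable z; all names are illustrative.

```python
from itertools import product
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def partial_r(xs, ys, zs):
    """Correlation of x and y after removing the linear effect of z from both."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Made-up balanced data: x and y each depend on z plus their own separate noise.
rows = list(product([-1, 1], repeat=3))  # every combination of z, e1, e2
z = [zi for zi, _, _ in rows]
x = [2 * zi + e1 for zi, e1, _ in rows]
y = [2 * zi + e2 for zi, _, e2 in rows]

r_raw = pearson_r(x, y)         # inflated by the shared dependence on z
r_partial = partial_r(x, y, z)  # association left once z is accounted for
print(round(r_raw, 3), round(abs(r_partial), 3))
```

In this constructed example the raw correlation is strong, yet it essentially disappears once z is controlled for.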
What is multiple linear regression?
A concept related to correlation.
It describes the linear relationship between two or more predictor variables and a single criterion (outcome) variable.
The goal of a regression model is to find the best fit between the model and the observations. This is done by adjusting the values of the ____ until the prediction error is minimised.
regression coefficients
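One common way to find the coefficients that minimise the squared prediction error is ordinary least squares via the normal equations. The sketch below assumes that method (the cards do not name one) and uses noise-free, made-up data generated from y = 1 + 2*x1 - 3*x2. The function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_least_squares(predictors, y):
    """Return [intercept, b1, b2, ...] minimising the sum of squared errors."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1 for the intercept
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Made-up data generated exactly from y = 1 + 2*x1 - 3*x2 (no noise).
predictors = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in predictors]
coefficients = fit_least_squares(predictors, y)
print([round(b, 6) for b in coefficients])
```

With noise-free data the fitted coefficients should match the generating ones. With real data they would be the values that make the squared prediction error as small as possible.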
What is the residual sum of squares?
The sum of the squared differences between the observed values and the values predicted by the regression model. It measures the amount of variation in the data that the model does not explain.
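A minimal sketch of the calculation, with made-up observed and predicted values. The function name is an illustrative assumption.

```python
def residual_sum_of_squares(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

# Hypothetical observations and model predictions.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 8.0]
rss = residual_sum_of_squares(observed, predicted)
print(rss)  # residuals 0.5, -0.5, 0.0, 1.0 -> 0.25 + 0.25 + 0.0 + 1.0 = 1.5
```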
You can assess the goodness of fit of a regression model by using a multiple correlation coefficient (R). What is this a correlation between?
A correlation between the predicted values and the observed values
You can assess the goodness of fit of a regression model by using a coefficient of determination (R^2). This is simply the proportion of ______ explained by the ______.
variance (in the criterion variable); regression model
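A short sketch linking the two goodness-of-fit measures. It fits a least-squares line to made-up data with a single predictor for brevity (the same definitions apply with more predictors), then computes R^2 as 1 - RSS/TSS and R as the correlation between predicted and observed values. All variable names are illustrative.

```python
from math import sqrt

# Made-up data with a single predictor.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * xi for xi in x]

# R^2: proportion of the total variation explained by the model.
rss = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # unexplained
tss = sum((yi - mean_y) ** 2 for yi in y)                  # total
r_squared = 1 - rss / tss

# R: Pearson correlation between predicted and observed values.
mean_p = sum(predicted) / n
spy = sum((pi - mean_p) * (yi - mean_y) for pi, yi in zip(predicted, y))
spp = sum((pi - mean_p) ** 2 for pi in predicted)
multiple_r = spy / sqrt(spp * tss)

print(round(r_squared, 3), round(multiple_r ** 2, 3))
```

For a least-squares fit that includes an intercept, squaring R gives R^2, which is why the two printed values agree.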
F-ratios in ANOVA can be used to assess the goodness of fit of the linear regression model. What does a high F-ratio indicate?
A good model: the variance explained by the model is large relative to the residual (unexplained) variance, i.e. prediction error is low.
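A rough sketch of the F-ratio as the model mean square divided by the residual mean square. The function name, parameter names, and the sums of squares passed in are hypothetical values chosen for illustration.

```python
def f_ratio(tss, rss, n_obs, n_predictors):
    """F = (explained variation per predictor) / (unexplained variation per residual df)."""
    ms_model = (tss - rss) / n_predictors           # model mean square
    ms_residual = rss / (n_obs - n_predictors - 1)  # residual mean square
    return ms_model / ms_residual

# Hypothetical sums of squares: total = 6.0, residual = 2.4, from 5 observations
# and 1 predictor.
f = f_ratio(tss=6.0, rss=2.4, n_obs=5, n_predictors=1)
print(round(f, 3))
```

A larger F means the model accounts for more variation per predictor than is left over per residual degree of freedom. Whether it is large enough to be significant depends on the F distribution with the corresponding degrees of freedom.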
A simultaneous (standard) multiple regression approach is used when ____
no a priori model is assumed and all predictor variables are entered into the model at the same time