Correlation & Multiple Regression Flashcards
What is correlation?
An association or dependency between two independently observed variables.
What type of graph is used for correlations?
A scatterplot
A Pearson correlation coefficient of -1.0 means X and Y are____
perfectly negatively (inversely) linearly related: as X increases, Y decreases by a proportional amount
Which measure of association should be used when both variables are interval/ratio, e.g. temperature?
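A minimal sketch of the card above, assuming made-up data in which Y is an exact decreasing linear function of X. The helper name pearson_r and the data are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y falls by exactly 2 for every unit increase in x
x = [1, 2, 3, 4, 5]
y = [-2 * v + 3 for v in x]
r = pearson_r(x, y)
print(r)  # prints -1.0
```

Any exact decreasing straight-line relationship gives r = -1.0, whatever the slope.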
Pearson’s coefficient
Which measure of association should be used when both variables are ordinal (rank)?
Spearman’s or Kendall’s rank correlation coefficient
Which measure of association should be used when both variables are true dichotomies, e.g. male/female or yes/no?
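One way to see what Spearman's coefficient does is to compute Pearson's coefficient on the ranks of the data. The sketch below assumes that approach, with tied values given the mean of their ranks; the function names and data are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y = x cubed is monotonic but not linear in x
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
print(rho)  # prints 1.0
```

Because only the ordering matters, a monotonic but non-linear relationship still gives a rank correlation of 1.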
Phi coefficient
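As an illustration, phi can be computed from the four cell counts of a 2x2 contingency table. The sketch below assumes the table layout [[a, b], [c, d]] and uses made-up counts.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table of counts laid out as [[a, b], [c, d]]."""
    # Product of the two row totals and the two column totals
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom

# Hypothetical counts: rows are group A / group B, columns are yes / no
perfect = phi_coefficient(10, 0, 0, 10)   # every A says yes, every B says no
unrelated = phi_coefficient(5, 5, 5, 5)   # answers split evenly in both groups
print(perfect, unrelated)  # prints 1.0 0.0
```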
Point-biserial coefficient is used when one variable is ____ and the other variable is _____
true dichotomy and interval
If there are more than two variables and you want to assess the relationship between one pair after accounting for the other variable(s), what type of correlation is this?
partial correlation
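For three variables, the first-order partial correlation can be written in terms of the three pairwise Pearson correlations. The sketch below assumes that standard formula; the correlation values passed in are made up.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y after removing the linear effect of Z."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical pairwise correlations
print(partial_correlation(0.6, 0.5, 0.5))   # part of the X-Y link remains
print(partial_correlation(0.25, 0.5, 0.5))  # prints 0.0: Z accounts for all of it
```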
What is multiple linear regression?
A concept related to correlation.
It describes the relationship between two or more predictor variables and a single criterion (outcome) variable.
The goal of a regression model is to find the best fit between the model and the observations. This is done by adjusting the values of the _____________ until the prediction error is minimised.
regression coefficients
What is the residual sum of squares?
The sum of the squared differences between the observed values and the values predicted by the model. It measures the amount of variance in a data set that is not explained by a regression model.
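A minimal sketch of these two ideas, assuming the simplest case of a single predictor and made-up data: the least-squares coefficients are the ones that make the residual sum of squares as small as possible, so changing either coefficient increases it.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residual_sum_of_squares(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
rss_best = residual_sum_of_squares(x, y, b0, b1)
# Nudging the slope away from the least-squares value makes the fit worse
rss_other = residual_sum_of_squares(x, y, b0, b1 + 0.1)
print(rss_best < rss_other)  # prints True
```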
You can assess the goodness of fit of a regression model by using a multiple correlation coefficient (R). What is this a correlation between?
A correlation between the predicted values and the observed values
You can assess the goodness of fit of a regression model by using a coefficient of determination (R^2). This is simply the proportion of ______ explained by the ______.
Variance (in the criterion variable); regression model. R^2 is the proportion of variance explained by the regression model.
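The two goodness-of-fit measures can be compared on a small example. The sketch below assumes made-up observed values and the predictions of a least-squares line fitted to them (predicted = 0.05 + 1.99 * x for x = 1..5); under that assumption, squaring R gives R^2.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared(observed, predicted):
    """Proportion of variance in the observed values explained by the model."""
    mean_obs = sum(observed) / len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # unexplained
    tss = sum((o - mean_obs) ** 2 for o in observed)              # total
    return 1 - rss / tss

# Hypothetical observed values and least-squares predictions for them
observed = [2.1, 3.9, 6.2, 7.8, 10.1]
predicted = [0.05 + 1.99 * x for x in [1, 2, 3, 4, 5]]

multiple_r = pearson_r(predicted, observed)  # R: predicted vs observed
r2 = r_squared(observed, predicted)          # R^2: variance explained
print(round(multiple_r ** 2, 6) == round(r2, 6))  # prints True
```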
F-ratios in ANOVA can be used to assess the goodness of fit of the linear regression model. What does a high F-ratio indicate?
A good model: the variance explained by the model is large relative to the residual (prediction) error.
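The F-ratio for the overall model can be expressed through R^2, the sample size n, and the number of predictors k. The sketch below assumes that standard form, F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with made-up inputs.

```python
def f_ratio(r_squared, n, k):
    """F-ratio for a regression with k predictors fitted to n observations."""
    explained_per_df = r_squared / k                   # model mean square (scaled)
    unexplained_per_df = (1 - r_squared) / (n - k - 1) # residual mean square (scaled)
    return explained_per_df / unexplained_per_df

# Hypothetical models fitted to 12 observations with one predictor
weak = f_ratio(0.5, 12, 1)
strong = f_ratio(0.9, 12, 1)
print(weak)           # prints 10.0
print(strong > weak)  # prints True: more variance explained, higher F
```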
A simultaneous (standard) multiple regression approach is used when ____
no a priori model is assumed and all predictor variables are fit together
A stepwise approach is not a good approach because_____
it capitalises on chance variation in the sample, so it tends to overfit the data
If a relationship is already known and we want to account for it, what multiple regression approach is taken?
Hierarchical (sequential) multiple regression
Cook’s distance measures the extremity of an _____; values greater than 1 are cause for concern.
Define scedasticity.
Scedasticity refers to how the spread (variance) of the residual error is distributed across the range of the predictor variable.
Multiple linear regression assumes homo_______
Homoscedasticity: the variance of the residuals stays relatively constant over the range of the predictor variable.
________ refers to a high similarity between two or more variables.
Multicollinearity
Singularity refers to a redundant variable. This typically occurs when one variable ____
is a combination of two or more other variables, e.g. a total intelligence score computed from its subscale scores
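One way to illustrate singularity is to check the determinant of the correlation matrix, which is zero when one variable is an exact linear combination of the others. The sketch below assumes made-up subscale scores (the names verbal and performance are illustrative) and a total that is their sum.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Hypothetical subscale scores; total is redundant given the two subscales
verbal = [10, 12, 9, 15, 11, 14]
performance = [8, 13, 10, 12, 9, 15]
total = [v + p for v, p in zip(verbal, performance)]

variables = [verbal, performance, total]
corr = [[pearson_r(a, b) for b in variables] for a in variables]
determinant = det3(corr)
print(abs(determinant) < 1e-9)  # prints True: the matrix is singular
```

Dropping either the total or one of the subscales from the predictor set removes the redundancy.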
What are we trying to detect if we look for high multivariate correlations?
What are we trying to detect if we look for high bivariate correlations?
Small range of the predictor variable restricts _________
statistical power