Regression Flashcards
What is a correlation?
An association or dependency between two independently observed variables
What graph/plot can we use to visualise a correlation?
A Bar chart
B Bar Graph
C Q-Q Plot
D Scatterplot
D Scatterplot
Pearson correlation coefficient scores give an r value ranging between -1 and 1.
A score of 0 indicates what?
What does a score of 1 indicate?
What does a score of -1 indicate?
What do positive and negative scores indicate?
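0 = no linear relationship between the variables
1 = perfect positive linear relationship
-1 = perfect negative (inverse) linear relationship
Positive = variables are positively correlated (as one increases, so does the other)
Negative = variables are negatively correlated (as one increases, the other decreases)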
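The r values above can be checked by hand. This is a minimal sketch in plain Python; the function name `pearson_r` and the data are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]                       # hypothetical data
r_pos = pearson_r(xs, [2, 4, 6, 8, 10])    # points on a rising line
r_neg = pearson_r(xs, [10, 8, 6, 4, 2])    # points on a falling line
r_zero = pearson_r(xs, [2, 1, 3, 1, 2])    # no linear trend
```

Here `r_pos` comes out at 1, `r_neg` at -1 and `r_zero` at (approximately) 0.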
If variables are both interval/ratio, we use a ___________ coefficient test, giving us an ___ value
Pearson’s coefficient
r value
If both variables are ordinal, we use either a Spearman’s rank, giving us a ____ value, or a Kendall’s rank, giving us a ____ value
Spearman’s = Rho
Kendall’s = tau
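Spearman's rho can be thought of as Pearson's r calculated on the ranks of the data. The sketch below assumes there are no tied values (ties need averaged ranks, which are not handled here); the helper names and the data are hypothetical.

```python
from math import sqrt

def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical ratings: the relationship is monotonic but not linear
satisfaction = [1, 2, 3, 4, 5]
spend = [10, 11, 15, 40, 90]
rho = spearman_rho(satisfaction, spend)
```

Because the ordering of the two variables matches exactly, `rho` is 1 even though the raw values do not fall on a straight line.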
If both variables are dichotomous (binary), we use a ____ coefficient.
phi coefficient
If we have one dichotomous variable, and one interval/ratio variable, we use a _____-______ coefficient, giving us an ___ value.
point-biserial coefficient
rpb value
A partial correlation is used when we have more than __ variables, and we want to test the _________ of a pair, whilst __________ for another variable.
partial correlation = when more than 2 variables
want to test correlation/association of one pair whilst accounting for a third
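For three variables, the partial correlation of x and y while accounting for z can be worked out from the three bivariate correlations. A minimal sketch using the standard first-order formula; the function name and the input correlations are made up for illustration.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    numerator = r_xy - r_xz * r_yz
    denominator = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical bivariate correlations
r_partial = partial_r(r_xy=0.5, r_xz=0.6, r_yz=0.6)
```

With these inputs the correlation of 0.5 drops to roughly 0.22 once z is accounted for.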
Multiple linear regressions describe/examine what?
The relationship between two or more predictor variables and a criterion variable
True or false: virtually all statistical models we use (ANOVA, t test, correlations) are special cases of the regression model?
True
The regression line has the equation:
y = ax + b
Where y is the ______, x is the _______, a is the ____/______, and b is the _-_______
y is the outcome (plotted vertically), x is the predictor (plotted horizontally)
a = slope/gradient, b = y-intercept
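The slope and intercept of the line can be estimated by least squares. A minimal sketch in plain Python; `fit_line` and the data are hypothetical, chosen so that the points lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a straight line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]          # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
```

The fit recovers a slope of 2 and a y-intercept of 1.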
A residual error is how far a ____ _____ is from the _____ __ ___.
how far a data point lies from the line of fit
SST = ___ + ____
SST = SSR + SSM
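The split of the total sum of squares into a residual part and a model part can be verified numerically. The sketch below fits a least-squares line to made-up data and computes all three sums; the function name is hypothetical.

```python
def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSM) for a least-squares straight-line fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [slope * x + intercept for x in xs]
    # Total: actual values around the mean
    sst = sum((y - mean_y) ** 2 for y in ys)
    # Residual: actual values around the fitted line
    ssr = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    # Model: fitted values around the mean
    ssm = sum((p - mean_y) ** 2 for p in predicted)
    return sst, ssr, ssm

# Hypothetical data
sst, ssr, ssm = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

For this data SST is 6, SSR is 2.4 and SSM is 3.6 (up to floating-point rounding), so the two parts add up to the total.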
prediction error is the difference between the ______ value and the _____ value
Prediction error = difference between the predicted value and the actual value
The best fit of the ____ occurs by minimizing _____ ______
best fit of model occurs by minimising prediction error
The coefficient of determination value is represented as ________
R squared
The goodness of fit of a model can be assessed using what 3 measures?
Multiple correlation coefficient = R
Coefficient of determination = R squared
F-Ratio
Similar to ANOVA, F-ratios of regressions compare the ________ variance to the ________ variance or the ________ variance. Higher F-ratios represent a _____ model and a smaller prediction error (_______ value - ______ value).
comparing explained variance to residual variance or total variance
higher F-ratios = better model
better prediction, i.e. a smaller gap of actual value - predicted value
In Regressions, we use _____ squares rather than ____ __ squares for F-Ratios.
F = ____/____
use mean squares rather than sum of squares
F = MSM/MSR
MSM = ____/____
MSR = ____/____
MSM = SSM/dfM
MSR = SSR/dfR
dfM (degrees of freedom M) is the number of _______ _______
dfR (degrees of freedom R) is the number of ____________ minus the number of ________
dfM = number of predictor variables
dfR = number of observations (participants) - number of coefficients
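Putting the mean squares and degrees of freedom together gives the F-ratio. A minimal sketch, assuming the model includes an intercept so that the number of coefficients is the number of predictors plus one; the function name and input values are hypothetical.

```python
def f_ratio(ss_model, ss_residual, n_observations, n_predictors):
    """F = MSM / MSR for a regression with an intercept (assumed)."""
    df_model = n_predictors
    # Coefficients = one per predictor plus the intercept (assumption)
    df_residual = n_observations - (n_predictors + 1)
    ms_model = ss_model / df_model
    ms_residual = ss_residual / df_residual
    return ms_model / ms_residual

# Hypothetical sums of squares: 5 observations, 1 predictor
f_value = f_ratio(ss_model=3.6, ss_residual=2.4, n_observations=5, n_predictors=1)
```

Here dfM = 1 and dfR = 3, so MSM = 3.6, MSR = 0.8 and F is about 4.5.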
Effect size for regressions use the value of ___________.
A small effect size is a value of ______
A medium effect size is a value of ______
A large effect size is a value of ______
Cohen's f squared
small effect size = 0.02
medium = 0.15
large = 0.35
Cohen's f squared = ________ / (_______)
R squared / (1 - R squared)
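The conversion from R squared to Cohen's f squared is a one-liner. In the sketch below the function names are hypothetical, and treating the benchmark values 0.02, 0.15 and 0.35 as lower cut-offs (with "negligible" below 0.02) is an assumption made for illustration.

```python
def cohens_f_squared(r_squared):
    """Cohen's f squared from the coefficient of determination."""
    return r_squared / (1 - r_squared)

def effect_label(f_squared):
    """Label an effect size, treating the benchmarks as lower cut-offs (assumption)."""
    if f_squared >= 0.35:
        return "large"
    if f_squared >= 0.15:
        return "medium"
    if f_squared >= 0.02:
        return "small"
    return "negligible"  # hypothetical label for values under 0.02

f2 = cohens_f_squared(0.2)   # 0.2 / 0.8
label = effect_label(f2)
```

An R squared of 0.2 gives an f squared of about 0.25, which falls in the medium band.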
What are the 3 main types of regression, and how does each work?
Simultaneous - no a priori hypotheses - all variables are entered together
Stepwise - no a priori hypotheses - predictor variables are added/removed one at a time
Hierarchical - based on a priori knowledge - create several models by adding/removing variables at each step, then compare the models to see which explains the data best
Outliers are points which _______ substantially from others. They can affect the _______ _____ of ______. ______ distance measures the ______ of an outlier, where a value over _ is concerning
outliers = points that deviate substantially from other data points
can affect the linear model of fit
Cook's distance measures the influence of an outlier (how bad it is); a value over 1 is concerning
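Cook's distance can be calculated by hand for a regression with a single predictor. This is a sketch under that assumption, using the standard formula based on each point's residual and leverage; the function name and data are made up, with the last point deliberately placed far from the trend.

```python
def cooks_distances(xs, ys):
    """Cook's distance for each point in a one-predictor least-squares fit."""
    n = len(xs)
    p = 2  # number of coefficients: slope and intercept
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # Mean squared error of the fit
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for x, e in zip(xs, residuals):
        # Leverage: how unusual this point's x value is
        leverage = 1 / n + (x - mean_x) ** 2 / sxx
        distances.append(e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances

# Hypothetical data: the first five points follow y = 2x, the last does not
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 30]
distances = cooks_distances(xs, ys)
```

Only the last point has a Cook's distance above 1 (about 2.2), flagging it as the concerning outlier.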
Scedasticity refers to the distribution of the residual error.
What is homoscedasticity, how can it be seen?
What is heteroscedasticity, how can it be seen?
homoscedasticity = distribution of residuals remains constant over the range of the predictor, with no discernible pattern.
heteroscedasticity = distribution of residuals varies systematically, forming a pattern
Multicollinearity refers to what? Is it good or bad?
multicollinearity refers to high similarity (correlation) between two or more predictor variables - we do not want this
Singularity refers to a _______ variable. This is when one variable is a _________ of two or more variables or __-______.
singularity refers to redundant variable
when one variable = combo of two subscores/subscales
Multicollinearity can be detected using _________ correlations
Singularity can be detected using ________ correlations.
Both can also be checked using ________ values
multicollinearity detected with bivariate correlations
singularity tested using multivariate correlations
both can be tested with tolerance values
Multicollinearity can be detected from ____ tolerance values
A Low
B High
C Varying
D Similar
A Low
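Tolerance for a predictor is 1 minus the R squared obtained when that predictor is predicted from the other predictors. With only two predictors this reduces to 1 minus their squared correlation, which is what the sketch below assumes; the function names and data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def tolerance_two_predictors(x1, x2):
    """Tolerance when the model has exactly two predictors (assumption)."""
    return 1 - pearson_r(x1, x2) ** 2

x1 = [1, 2, 3, 4, 5]
# Hypothetical second predictors: one moderately related, one nearly identical
tol_moderate = tolerance_two_predictors(x1, [2, 4, 5, 4, 5])
tol_collinear = tolerance_two_predictors(x1, [1.1, 2.0, 2.9, 4.2, 5.0])
```

The nearly identical predictor produces a tolerance close to 0, while the moderately related one gives 0.4.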
The rule of thumb is that number of __________ (N) should be high compared to the number of ________ variables (m).
N should be high compared to the number of predictors (m)
A small range of the predictor variable ______ statistical power.
restricts statistical power