Correlation and MR Theory Flashcards
What are the three types of multiple regression
Simultaneous
Stepwise
Hierarchical
What are correlations and regressions used for
The study of the relationship between two or more variables and how they relate to each other
When do you do a regression as opposed to a correlation
When there is more than one fixed variable
What does regression allow
The prediction of Y on the basis of knowledge of X
What does correlation allow
Measure of strength of relationship between X and Y
How are correlations and multiple regressions plotted
Scatter plots
What is a scatterplot
A 2-D diagram with one point for each participant, where the coordinates are that participant's scores on the two variables
What are correlation and scatter plots linked to
The degree to which the points cluster around the regression line
What values does the correlation coefficient lie between
-1 and +1
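As a minimal sketch of why the coefficient stays between -1 and +1, Pearson's r can be computed by hand in Python. The function name `pearson_r` and the scores are illustrative assumptions, not part of the original cards.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: the co-variation of X and Y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up scores for illustration only.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # points fall exactly on an increasing line
falling = [10, 8, 6, 4, 2]   # points fall exactly on a decreasing line

print(pearson_r(hours, rising))
print(pearson_r(hours, falling))
```

Points lying exactly on a line give the extreme values; any scatter around the line pulls r towards 0.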
What measure of association do you use when V1 and V2 are interval or ratio
Pearson’s r
What measure of association do you use with ordinal rank variables of ranking
Spearman’s rho
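A rough sketch of Spearman's rho, assuming there are no tied scores so the shortcut formula 1 - 6Σd² / (n(n² - 1)) applies. The helper names `ranks` and `spearman_rho` and the two judges' rankings are made up for illustration.

```python
def ranks(values):
    """Rank each value from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman's rho from squared rank differences (no-ties shortcut)."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Hypothetical rankings of five items by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_rho(judge_a, judge_b))
```

Because only the rank order matters, any relationship that always increases (even a curved one) gives rho = 1.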
What measure of association do you use with ordinal rank variables of opinions
Kendall’s tau
What measure of association do you use with true dichotomy variables
Phi
What measure of association do you use with a true dichotomy and interval variables
Point-biserial
How can a Venn diagram be used to represent a correlation
The size of each circle represents the variance of a variable; overlapping circles denote correlated variables
What does a Venn diagram represent
The size of each circle represents the variance of a variable; overlapping circles denote correlated variables
What does a partial correlation look at
Measures the strength of dependence between 2 variables that is not accounted for by the way in which they both change in response to variations in a selected subset of other variables
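For the simplest case of one control variable Z, the partial correlation can be sketched from the three pairwise correlations using the standard first-order formula. The function name `partial_r` and the input values are illustrative assumptions.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y after removing what each shares with Z."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical pairwise correlations.
print(partial_r(0.5, 0.5, 0.5))  # X-Y link shrinks once Z is controlled
print(partial_r(0.5, 0.0, 0.0))  # Z unrelated to both, so nothing changes
```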
What is a multiple regression for
To learn about the relationship between several independent variables (predictors) and one dependent variable (criterion)
What can multiple regressions create
Predictive tools
What are examples of predictive tools
Estate agents analysing selling price for each house
Psychology studies on depression
What is the general equation for multiple regressions
Ŷ = b0 + b1X1 + b2X2 + … + bpXp
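The general equation can be read as "start at the intercept, then add each predictor's score times its coefficient". A minimal sketch, using invented coefficients for a house-price style example (none of these numbers come from real data):

```python
def predict(b0, coefficients, scores):
    """Predicted Y: the intercept plus the sum of b_i * X_i over all predictors."""
    return b0 + sum(b * x for b, x in zip(coefficients, scores))

# Hypothetical model: price (in thousands) from floor area and number of bedrooms.
intercept = 50.0
slopes = [2.0, 10.0]      # b1 for floor area, b2 for bedrooms
house = [100.0, 3.0]      # X1 = 100 square metres, X2 = 3 bedrooms

print(predict(intercept, slopes, house))  # 50 + 2*100 + 10*3
```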
What does the equation for linear regression involve
b0 is the point where the line crosses the Y axis (the intercept), and the b coefficients are chosen to give the best prediction of Y
What is the objective of a multiple regression
Find the best fit with data
Minimize overall prediction error, hopefully to one predictor
In the multiple regression equation what does the Y hat represent
The score predicted by the model. The residual is the difference between the predicted score and the actual score. SS means sum of squares: subtract the predicted value from the actual value, then square it. Squaring gives more weight to points far from the line
What does the MR equation give
The equation of a line b1 = slope
b0 = intercept
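For a single predictor, the slope and intercept that minimise the squared prediction error have a closed form. A minimal sketch, with the function name `fit_line` and the data invented for illustration:

```python
def fit_line(xs, ys):
    """Least-squares line: returns (b0, b1) = (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # the line passes through the two means
    return b0, b1

# Made-up data lying exactly on Y = 1 + 2X.
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)
```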
What does the equation assess
The goodness of fit, based on how far each point is from what is expected
In a multiple correlation coefficient if the correlation is poor
Then poor model predictability
What is the symbol for the coefficient of determination
R2
What is R2
Proportion of variability in data set accounted for by statistical model
What tests the significance of the squared multiple correlation coefficient
F-ratio
What does the F-ratio give for a multiple correlation coefficient
Improvement in prediction of criterion compared to inaccuracy of the model
What is the equation for assessing the goodness of the fit
R2 = SSM / SST or
F = MSM / MSR
What does the equation for assessing the goodness of the fit involve
Taking SS and dividing by DF
What does the equation of assessing the goodness of the fit punish for
Too many variables
What should be used instead of SS
MS
What are the equations for MSM and MSR
MSM = SSM / dfM
MSR = SSR / dfR
What is the equation for number of variables in model
dfM
What is the number of observations minus the number of parameters being estimated
dfR
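The goodness-of-fit quantities on these cards can be sketched in a few lines. This assumes the predictions come from a least-squares fit with an intercept (so SSM = SST - SSR), and that dfR is the number of observations minus the number of estimated parameters (the slopes plus the intercept). The function name and data are illustrative.

```python
def goodness_of_fit(actual, predicted, n_predictors):
    """Return R2 = SSM / SST and F = MSM / MSR for a fitted regression."""
    n = len(actual)
    mean_y = sum(actual) / n
    ss_total = sum((y - mean_y) ** 2 for y in actual)                 # SST
    ss_resid = sum((y - p) ** 2 for y, p in zip(actual, predicted))   # SSR
    ss_model = ss_total - ss_resid                                    # SSM
    df_model = n_predictors              # dfM: number of variables in the model
    df_resid = n - n_predictors - 1      # dfR: observations minus parameters
    ms_model = ss_model / df_model       # MSM
    ms_resid = ss_resid / df_resid       # MSR
    return {"R2": ss_model / ss_total, "F": ms_model / ms_resid}

# Made-up scores; predictions are from the least-squares line Y = 2.2 + 0.6X
# fitted to X = 1..5.
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
print(goodness_of_fit(actual, predicted, n_predictors=1))
```

Dividing each SS by its degrees of freedom is what penalises a model for using many predictors: each extra variable raises dfM and lowers dfR.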
What does a good model cause
A large improvement in prediction due to the model (large MSM)
What does a small difference between model and data cause
A large F ratio, which can be used to measure how good the fit is in the data
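What are the three ways to add terms to a model
Put all the variables in at once (but the more variables, the less generalisable the result), let the computer choose on statistical grounds, or enter them in an order driven by theory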
What are the 3 types of regression
Simultaneous
Stepwise
Hierarchical
What type of model does the simultaneous model have
No a priori model
How are all the IVs entered in a simultaneous regression
All IVs entered at once
What is a negative of the simultaneous regression
Overfitting the data
What type of model is a stepwise model
No a priori model
How are the IVs entered in a stepwise model
Computer chooses, on statistical grounds; a posteriori model
What is a negative of a stepwise regression
Capitalises on chance effects
What is a strength of the hierarchical regression
Theoretically sound
How is the data entered in a hierarchical regression
A priori
driven by theory
What factors affect correlation
Outliers and influential points
Homo/heteroscedasticity
Singularity and multicollinearity
Number of cases vs number of predictors
Range
Distribution
What does homo/heteroscedasticity refer to
How evenly the points are scattered around the regression line
What does singularity and multicollinearity refer to
Independence of the predictor variables; they can't correlate above .90
Define an outlier
Points which deviate markedly from others in the sample
How is an outlier assessed
By Cook's distance
Define homoscedasticity
Variability of scores (errors) in one continuous variable is roughly the same at all values of the second variable, giving uniform scatter along the line
Define heteroscedasticity
One variable is skewed or the relationship is non-linear: the variance is tight and then grows as X grows, even when X is well distributed. The more drastic this is, the harder it is to fit your model, because the variance is not homogeneous.
Define singularity
Independent variables that are perfectly correlated, so one is redundant with the others
Define multicollinearity
Variables that are highly correlated >0.90
What is the problem of singularity
Don't want to measure the same thing twice; statistically, it prevents matrix inversion
What are the solutions for singularity
For high bivariate correlations (>0.90), compute the correlations and remove a variable when appropriate
What is the solution for high multivariate correlations
Examine SMC (squared multiple correlation of each IV)
What is the tolerance level for high multivariate correlations
1 - SMC
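A small sketch of tolerance for the special case of only two predictors, where the SMC of each IV is simply the squared correlation between the two. With more predictors the SMC would come from regressing each IV on all the others, which is not shown here. The names and the correlation value are illustrative.

```python
def tolerance(smc):
    """Tolerance = 1 - SMC; values near 0 mean the IV is almost redundant."""
    return 1 - smc

# Hypothetical correlation between two predictors, above the 0.90 warning level.
r_between_predictors = 0.95
smc = r_between_predictors ** 2   # two-predictor case only
print(tolerance(smc))
```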
Why is the number of cases important
N/M (number of cases / number of predictors) being too small will cause meaningless results
What sample size is needed to test a multiple correlation, assuming a medium effect size
N > 50 + 8*m
m = number of predictors
What sample size is needed to test individual predictors, assuming a medium effect size
N > 104 + m; want N to be at least 100, and the more scores there are, the greater the issue of fitting them in a meaningful way
What sample size is needed for a stepwise regression
N > 40*m
m = number of predictors
How is the required sample size calculated with a small effect size, a skewed DV or measurement error
N > (8/f2) + (m – 1)
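The sample-size rules of thumb on these cards can be compared side by side with a short sketch. The function names are invented for illustration, and each returns the threshold that N should exceed.

```python
def n_rule_50_plus_8m(m):
    """Threshold from N > 50 + 8*m."""
    return 50 + 8 * m

def n_rule_104_plus_m(m):
    """Threshold from N > 104 + m."""
    return 104 + m

def n_rule_40m(m):
    """Threshold from N > 40*m (stepwise regression)."""
    return 40 * m

def n_rule_f2(f2, m):
    """Threshold from N > (8 / f2) + (m - 1), with f2 as Cohen's effect size."""
    return 8 / f2 + (m - 1)

m = 5  # hypothetical number of predictors
print(n_rule_50_plus_8m(m), n_rule_104_plus_m(m), n_rule_40m(m))
print(n_rule_f2(0.02, m))  # a small effect size pushes the requirement up sharply
```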
What figure indicates a small effect size with Cohen's f2
0.02
What figure indicates a medium effect size with Cohen's f2
0.15
What figure indicates a large effect size with Cohen's f2
0.35
What is another issue with regressions
Range
Why is the range an issue with regressions
A small range restricts power of tests, want a good spread
What is the quartet relating to the distribution of variables
Anscombe’s quartet
What is Anscombe’s quartet
Four data sets with the same mean, variance, correlation and regression line, but very different distributions when plotted
What does Anscombe’s quartet show
That summary statistics can suggest a strong effect that is misleading unless you look at the data
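As a sketch of the point, the first two of Anscombe's four published data sets can be summarised in plain Python. The helper name `summarise` is an assumption; the numbers are Anscombe's data sets I and II, which share the same X values.

```python
import math

def summarise(xs, ys):
    """Return (mean of Y, regression slope, Pearson r) for one data set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_y, sxy / sxx, sxy / math.sqrt(sxx * syy)

x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

print(summarise(x, y1))  # roughly linear scatter when plotted
print(summarise(x, y2))  # a smooth curve when plotted
```

The two summaries come out almost identical even though only the first set looks like a linear relationship on a scatter plot.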
With 1+ IVs and aiming to predict the DV from the IVs, or to look at the relationship between them, what stats test do you use
Multiple regression
With 1+ IVs aiming to ask what is the relationship between 2 variables once effect of others removed what stats test do you use
Partial correlation
When testing two sets of IV and trying to find what is common between the two what stats test do you use
Canonical correlation
How do you report a multiple regression
Table 3 displays the relationship between BART task performance and each predictor variable. The coefficient for social risk taking (SOEPsoc; b = .762) indicates that as social risk taking increases by one unit, BART score increases by .762. Social risk taking is the only significant predictor of BART task performance, t(492) = 4.345, p < .001, indicating that it is the best predictor of BART task performance out of the seven subscales.