Correlation and MR Theory Flashcards

1
Q

What are the three types of multiple regression

A

Simultaneous
Stepwise
Hierarchical

2
Q

What are correlations and regressions used for

A

The study of the relationship between two or more variables and how they relate to each other

3
Q

When do you do a regression as opposed to a correlation

A

When there is more than one fixed variable

4
Q

What does regression allow

A

The prediction of Y on the basis of knowledge of X

5
Q

What does correlation allow

A

Measure of strength of relationship between X and Y

6
Q

How are correlations and multiple regressions plotted

A

Scatter plots

7
Q

What is a scatterplot

A

A 2-D diagram with one point for each participant, where the coordinates are that participant's scores on the two variables

8
Q

What are correlation and scatter plots linked to

A

The degree to which the points cluster around the regression line

9
Q

Between what values does the correlation coefficient (and the standardised slope in a simple regression) lie

A

-1 and +1

10
Q

What measure of association do you use when V1 and V2 are interval or ratio

A

Pearson’s r
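
A minimal Python sketch of how Pearson's r could be computed for two interval or ratio variables. The function name `pearson_r` and the sample scores are illustrative assumptions, not part of the original cards.

```python
import math

def pearson_r(x, y):
    """Pearson's r: cross-products of deviations scaled by both spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the two means
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical scores for five participants
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]
r = pearson_r(hours, marks)
print(round(r, 3))  # 0.775
```

The result always falls between -1 and +1; a perfectly linear positive relationship gives +1.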

11
Q

What measure of association do you use with ordinal variables based on rankings

A

Spearman’s rho
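
One common way to obtain Spearman's rho is to rank each variable and then compute Pearson's r on the ranks. The sketch below assumes that approach; the helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the judges' rankings are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson's r, used here on ranks rather than raw scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp_xy / math.sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Spearman's rho as Pearson's r computed on the ranked data."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical rankings of five items by two judges
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
rho = spearman_rho(judge_a, judge_b)
print(round(rho, 2))  # 0.8
```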

12
Q

What measure of association do you use with ordinal variables based on opinions

A

Kendall’s tau

13
Q

What measure of association do you use with two true dichotomous variables

A

Phi

14
Q

What measure of association do you use with one true dichotomous variable and one interval variable

A

Point-biserial

15
Q

How can a Venn diagram be used to represent a correlation

A

The size of each circle represents the variance of a variable; overlapping circles denote correlated variables

16
Q

What does a Venn diagram represent

A

The size of each circle represents the variance of a variable; overlapping circles denote correlated variables

17
Q

What does a partial correlation look at

A

It measures the strength of dependence between 2 variables that is not accounted for by the way in which they both change in response to variations in a selected subset of other variables
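
For the simplest case of a single control variable Z, the partial correlation can be obtained from the three pairwise correlations. The sketch below uses the standard first-order formula; the function name `partial_r` and the correlation values are hypothetical.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation between X and Y, controlling for Z."""
    # Remove the part of the X-Y correlation that runs through Z
    numerator = r_xy - r_xz * r_yz
    # Rescale by the variance in X and in Y not shared with Z
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical pairwise correlations
r = partial_r(r_xy=0.50, r_xz=0.40, r_yz=0.30)
print(round(r, 3))  # 0.435
```

If Z is unrelated to both X and Y, the partial correlation equals the ordinary correlation.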

18
Q

What is a multiple regression for

A

To learn about the relationship between several independent variables (predictors) and one dependent variable (criterion)

19
Q

What can multiple regressions create

A

Predictive tools

20
Q

What are examples of predictive tools

A

Estate agents analysing selling price for each house

Psychology studies on depression

21
Q

What is the general equation for multiple regressions

A

Ŷ = b0 + b1X1 + b2X2 + … + bpXp
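
As a rough illustration of the equation, the sketch below computes a predicted score for one case from an intercept and a list of slopes. The function name `predict` and the coefficient values are hypothetical.

```python
def predict(b0, slopes, xs):
    """Return Y-hat = b0 + b1*X1 + ... + bp*Xp for a single case."""
    if len(slopes) != len(xs):
        raise ValueError("need exactly one slope per predictor")
    return b0 + sum(b * x for b, x in zip(slopes, xs))

# Hypothetical model: intercept 2.0, slopes 0.5 and -1.0
y_hat = predict(2.0, [0.5, -1.0], [4.0, 1.0])
print(y_hat)  # 3.0
```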

22
Q

What does the equation for linear regression involve

A

The intercept (b0) is the point where the line crosses the Y axis, and the slope (b) is chosen to give the best prediction of Y

23
Q

What is the objective of a multiple regression

A

Find the best fit with data

Minimise overall prediction error, ideally with as few predictors as possible

24
Q

In the multiple regression equation what does the Y hat represent

A

The value of the criterion predicted by the model. The difference between the actual score and the predicted score is the residual. SS means sum of squares: subtract the predicted value from the actual value, square the result, and add these up across cases. Squaring gives more weight to points far from the line
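
The sum-of-squares idea can be sketched in a few lines of Python. The function name `residual_sum_of_squares` and the scores are hypothetical; the example is only meant to show how squaring weights large residuals more heavily.

```python
def residual_sum_of_squares(actual, predicted):
    """Sum of squared differences between actual and predicted scores."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))

# Hypothetical actual and predicted scores for three cases
actual = [3.0, 5.0, 4.0]
predicted = [3.5, 4.0, 4.0]

# Residuals are -0.5, 1.0 and 0.0; squared they are 0.25, 1.0 and 0.0
ss_r = residual_sum_of_squares(actual, predicted)
print(ss_r)  # 1.25
```

The residual of 1.0 is twice the size of the residual of 0.5 but contributes four times as much to the total.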

25
What does the MR equation give
The equation of a line: b1 = slope | b0 = intercept
26
What does the equation assess
The goodness of fit, based on how far each point is from what the model predicts
27
What does it mean if the multiple correlation coefficient is low
The model predicts poorly
28
What is the symbol for the coefficient of determination
R2
29
What is R2
Proportion of variability in data set accounted for by statistical model
30
What statistic tests the significance of the squared multiple correlation coefficient (R2)
F-ratio
31
What does the F-ratio give in a multiple regression
The improvement in prediction of the criterion due to the model, compared to the inaccuracy of the model
32
What is the equation for assessing the goodness of the fit
R2 = SSM / SST | F = MSM / MSR
33
What does the equation for assessing the goodness of the fit involve
Taking SS and dividing by DF
34
What does the equation of assessing the goodness of the fit punish for
Too many variables
35
What should be used instead of SS
MS
36
What are the equations for MSM and MSR
MSM = SSM / dfM | MSR = SSR / dfR
37
What is dfM
The number of variables (predictors) in the model
38
What is dfR
The number of observations minus the number of parameters being estimated
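The goodness-of-fit quantities on the preceding cards can be combined in a short Python sketch. It assumes SST = SSM + SSR and that the parameters being estimated are the predictors plus an intercept; the function name `goodness_of_fit` and the input values are hypothetical.

```python
def goodness_of_fit(ss_model, ss_residual, n_cases, n_predictors):
    """Return R2, the mean squares and the F-ratio from the sums of squares."""
    ss_total = ss_model + ss_residual  # assumed decomposition: SST = SSM + SSR
    df_model = n_predictors
    # Assumption: parameters estimated = predictors + one intercept
    df_residual = n_cases - n_predictors - 1
    ms_model = ss_model / df_model
    ms_residual = ss_residual / df_residual
    return {
        "R2": ss_model / ss_total,
        "MSM": ms_model,
        "MSR": ms_residual,
        "F": ms_model / ms_residual,
    }

# Hypothetical values: SSM = 60, SSR = 40, 23 cases, 2 predictors
fit = goodness_of_fit(60.0, 40.0, n_cases=23, n_predictors=2)
print(fit)
```

Dividing each SS by its df is what penalises a model for carrying too many variables: adding predictors raises dfM and lowers dfR, so an added predictor has to reduce SSR enough to be worth the degrees of freedom it uses.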
39
What does a good model cause
A large improvement in prediction due to the model (large MSM)
40
What does a small difference between model and data cause
A large F ratio, which can be used to measure how good the fit is in the data
41
What are the three ways to add terms to a model
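Put all the variables in at once (simultaneous), let the computer choose (stepwise), or enter them in a theory-driven order (hierarchical). The more variables, the less generalisable the result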
42
What are the 3 types of regression
Simultaneous | Stepwise | Hierarchical
43
What type of model does the simultaneous model have
No a priori model
44
How are all the IVs entered in a simultaneous regression
All IVs entered at once
45
What is a negative of the simultaneous regression
Over fitting the data
46
What type of model is a stepwise model
No a priori model
47
How are the IVs entered in a stepwise model
The computer chooses, on statistical grounds | a posteriori model
48
What is a negative of a stepwise regression
Capitalises on chance effect
49
What is a strength of the hierarchical regression
Theoretically sound
50
How is the data entered in a hierarchical regression
A priori | driven by theory
51
What factors affect correlation
Outliers and influential points
Homo-/heteroscedasticity
Singularity and multicollinearity
Number of cases vs number of predictors
Range
Distribution
52
What does homo-/heteroscedasticity refer to
How evenly the points are scattered around the regression line
53
What do singularity and multicollinearity refer to
Independence of the predictor variables; they should not correlate above .90
54
Define an outlier
A point which deviates markedly from the others in the sample
55
How is an outlier assessed
By Cook's distance
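A minimal sketch of Cook's distance for the special case of a simple regression with one predictor, using the usual leverage-based formula. The function name `cooks_distances` and the data are hypothetical; the cut-off of 1 mentioned afterwards is a common rule of thumb rather than something stated on the cards.

```python
def cooks_distances(x, y):
    """Cook's distance for each case in a one-predictor least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_x = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / ss_x
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    p = 2  # parameters estimated: intercept and slope
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for a, e in zip(x, residuals):
        # Leverage: how far this case's X is from the mean of X
        h = 1 / n + (a - mean_x) ** 2 / ss_x
        distances.append((e ** 2 / (p * mse)) * h / (1 - h) ** 2)
    return distances

# Hypothetical data in which the last case departs from the trend
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 2.0, 2.9, 4.1, 5.0, 12.0]
d = cooks_distances(x, y)
print([round(v, 2) for v in d])
```

Here the last case has by far the largest distance, and it exceeds 1, so it would typically be examined as an influential outlier.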
56
Define homoscedasticity
The variability of scores (errors) in one continuous variable is roughly the same at all values of the second variable, giving a uniform scatter around the line
57
Define heteroscedasticity
The variability of scores is not uniform across values of the other variable, e.g. the variance is tight at low X and then grows as X grows. It can arise when one variable is skewed or the relationship is non-linear, and the more drastic it is, the harder it is to fit the model
58
Define singularity
Independent variables that are perfectly correlated (redundant), so one adds no information beyond the others
59
Define multicollinearity
Independent variables that are highly correlated (> 0.90)
60
What is the problem of singularity
Logically, you don't want to measure the same thing twice; statistically, it prevents matrix inversion
61
What are the solutions for singularity
For high bivariate correlations (> 0.90): compute the correlations and remove a variable where appropriate
62
What is the solution for high multivariate correlations
Examine SMC (squared multiple correlation of each IV)
63
What is the tolerance level for high multivariate correlations
Tolerance = 1 - SMC
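A small sketch of the tolerance calculation. It uses the two-predictor case, where the SMC of one IV with the other is simply the squared bivariate correlation; the function name `tolerance` and the correlation of .95 are hypothetical.

```python
def tolerance(smc):
    """Tolerance of an IV: the share of its variance not explained by the other IVs."""
    if not 0.0 <= smc <= 1.0:
        raise ValueError("SMC must lie between 0 and 1")
    return 1.0 - smc

# Hypothetical case: two IVs that correlate at r = .95
r_between_ivs = 0.95
smc = r_between_ivs ** 2  # with only two IVs, SMC is the squared correlation
tol = tolerance(smc)
print(round(tol, 4))  # 0.0975
```

A tolerance close to 0 signals that the IV is almost entirely redundant with the other predictors.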
64
Why is the number of cases important
If N/m (number of cases / number of predictors) is too small, the results will be meaningless
65
What sample size is needed for a multiple correlation, assuming a medium effect size
N > 50 + 8 * m | m = number of predictors
66
What sample size is needed for a simple linear regression, assuming a medium effect size
N > 104 + m | want N of at least 100; the more predictors, the harder it is to fit them in a meaningful way
67
What sample size is needed for a stepwise regression, assuming a medium effect size
N > 40 * m | m = number of predictors
68
What sample size is needed with a small effect size, a skewed DV or measurement error
N > (8 / f2) + (m - 1) | f2 = Cohen's f2
69
What figure indicates a small effect size with Cohen's f2
0.02
70
What figure indicates a medium effect size with Cohen's f2
0.15
71
What figure indicates a large effect size with Cohen's f2
0.35
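The sample-size rules of thumb on the preceding cards are simple arithmetic, sketched below in Python. Each function returns the threshold that N should exceed; the function names and the choice of five predictors are hypothetical.

```python
def n_multiple_correlation(m):
    """Threshold for testing a multiple correlation: N > 50 + 8m."""
    return 50 + 8 * m

def n_simple_regression(m):
    """Threshold given on the cards for a simple linear regression: N > 104 + m."""
    return 104 + m

def n_stepwise(m):
    """Threshold for a stepwise regression: N > 40m."""
    return 40 * m

def n_for_effect_size(f2, m):
    """Threshold when the effect size is specified: N > (8 / f2) + (m - 1)."""
    return 8 / f2 + (m - 1)

m = 5  # hypothetical number of predictors
print(n_multiple_correlation(m))   # 90
print(n_simple_regression(m))      # 109
print(n_stepwise(m))               # 200
for label, f2 in [("small", 0.02), ("medium", 0.15), ("large", 0.35)]:
    print(label, round(n_for_effect_size(f2, m), 1))
```

With five predictors, a small effect (f2 = 0.02) pushes the threshold to roughly 404 cases, far above the medium-effect rules.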
72
What is another issue with regressions
Range
73
Why is the range an issue with regressions
A small range restricts the power of the tests; you want a good spread of scores
74
What is the quartet relating to the distribution of variables
Anscombe's quartet
75
What is Anscombe's quartet
Four data sets with the same mean, variance, correlation and regression line
76
What does Anscombe's quartet show
That the statistics can suggest a strong effect if you do not look at the data
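The point can be checked numerically. The sketch below uses the four data sets published by Anscombe (1973) and computes the means and Pearson's r for each; the helper names `mean` and `pearson_r` are illustrative.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by sets I to III
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Mean of X, mean of Y and correlation for each data set
summary = {name: (mean(x), mean(y), pearson_r(x, y)) for name, (x, y) in quartet.items()}
for name, (mx, my, r) in summary.items():
    print(name, round(mx, 2), round(my, 2), round(r, 2))
```

All four sets give practically the same summary figures, yet a scatter plot of each looks very different (one is curved, one is driven by a single outlier), which is why the data should be plotted before the statistics are trusted.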
77
With 1+ IVs and aiming to predict the DV from the IVs, or to examine the relationship between them, what stats test do you use
Multiple regression
78
With 1+ IVs, aiming to ask what the relationship between 2 variables is once the effect of the others is removed, what stats test do you use
Partial correlation
79
When testing two sets of variables and trying to find what is common between the two, what stats test do you use
Canonical correlation
80
How do you report a multiple regression
Table 3 displays the relationship between BART task performance and each predictor variable. For social risk taking (SOEPsoc), b = .762 indicates that as social risk taking increases by one unit, BART score increases by .762. Social risk taking is the only significant predictor of BART task performance, t(492) = 4.345, p < .001, indicating that it is the best predictor of BART task performance out of the seven subscales.