Correlation and MR Theory Flashcards

1
Q

What are the three types of multiple regression

A

Simultaneous
Stepwise
Hierarchical

2
Q

What are correlations and regressions used for

A

To study the relationship between two or more variables

3
Q

When do you do a regression as opposed to a correlation

A

When there is more than one fixed variable

4
Q

What does regression allow

A

The prediction of Y on the basis of knowledge of X

5
Q

What does correlation allow

A

Measure of strength of relationship between X and Y

6
Q

How are correlations and multiple regressions plotted

A

Scatter plots

7
Q

What is a scatterplot

A

A 2-D diagram with one point for each participant, where the coordinates are that participant’s scores on the two variables

8
Q

What are correlation and scatter plots linked to

A

The degree to which the points cluster around the regression line

9
Q

What values does the correlation coefficient lie between

A

-1 and +1

10
Q

What measure of association do you use when V1 and V2 are interval or ratio

A

Pearson’s r
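
As an illustration of the product-moment correlation for two interval or ratio variables, here is a minimal sketch in plain Python. The function name `pearson_r` and the data are hypothetical, chosen only for the example.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp_xy / math.sqrt(ss_x * ss_y)

hours = [1, 2, 3, 4, 5]        # hypothetical interval data
score = [52, 55, 61, 64, 70]   # hypothetical interval data
r = pearson_r(hours, score)
print(round(r, 3))
```

The result always falls between -1 and +1; values near either end mean the points cluster tightly around a straight line.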

11
Q

What measure of association do you use with ordinal rank variables of ranking

A

Spearman’s rho
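
One common way to compute a rank correlation of this kind is to convert each variable to ranks and then apply the product-moment formula to the ranks. The sketch below assumes tied values share the average of their ranks; the helper names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp_xy / math.sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Rank both variables, then correlate the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: a monotonic but non-linear relationship
rho = spearman_rho([1, 2, 3, 4], [1, 4, 9, 16])
print(rho)
```

Because only the ordering matters, any strictly increasing relationship gives a coefficient of 1 even when it is not a straight line.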

12
Q

What measure of association do you use with ordinal rank variables of opinions

A

Kendall’s tau

13
Q

What measure of association do you use with true dichotomy variables

A

Phi
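
For two true dichotomies the data reduce to a 2x2 table of counts, and the coefficient can be computed directly from the four cells. This is a minimal sketch; the cell labels a, b, c, d and the counts are hypothetical.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table of counts laid out as [[a, b], [c, d]]."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator

# Hypothetical counts: rows = group 1 / group 2, columns = yes / no
phi = phi_coefficient(20, 10, 5, 25)
print(round(phi, 3))
```

When every case falls on one diagonal of the table the coefficient reaches +1 or -1.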

14
Q

What measure of association do you use with one true dichotomy and one interval variable

A

Point-biserial

15
Q

How can a Venn diagram be used to represent a correlation

A

The size of each circle represents the variance of a variable; overlapping circles denote correlated variables

16
Q

What does a Venn diagram represent

A

The size of each circle represents the variance of a variable; overlapping circles denote correlated variables

17
Q

What does a partial correlation look at

A

Measures the strength of dependence between 2 variables that is not accounted for by the way in which they both change in response to variations in a selected subset of other variables
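
With a single control variable Z, the partial correlation of X and Y can be written in terms of the three pairwise correlations. The sketch below uses that first-order formula; the function name and the correlation values are hypothetical.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical pairwise correlations
r_partial = partial_r(0.60, 0.50, 0.40)
print(round(r_partial, 3))
```

If Z is unrelated to both X and Y, the partial correlation equals the ordinary correlation between X and Y.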

18
Q

What is a multiple regression for

A

To learn about the relationship between several independent variables (predictors) and one dependent variable (criterion)

19
Q

What can multiple regressions create

A

Predictive tools

20
Q

What are examples of predictive tools

A

Estate agents analysing selling price for each house

Psychology studies on depression

21
Q

What is the general equation for multiple regressions

A

Ŷ = b0 + b1X1 + b2X2 + … + bpXp
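
To show how the general equation produces a prediction for one case, here is a minimal sketch. The coefficients and predictor values are made up for illustration (loosely following the house-price example) and do not come from any fitted model.

```python
def predict(b0, coefficients, predictors):
    """Return Y-hat = b0 + b1*X1 + ... + bp*Xp for one case."""
    if len(coefficients) != len(predictors):
        raise ValueError("need one coefficient per predictor")
    return b0 + sum(b * x for b, x in zip(coefficients, predictors))

# Hypothetical model: intercept, then slopes for floor area (m2) and bedrooms
y_hat = predict(50000.0, [1200.0, 8000.0], [90.0, 3.0])
print(y_hat)
```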

22
Q

What does the equation for linear regression involve

A

The intercept (b0) is the point where the line crosses the Y axis, and the slope (b) is chosen to give the best prediction of Y

23
Q

What is the objective of a multiple regression

A

Find the best fit with data

Minimize overall prediction error, hopefully to one predictor

24
Q

In the multiple regression equation what does the Y hat represent

A

The predicted value of the criterion from the model. The difference between the predicted score and the actual score is the prediction error. SS means sum of squares: subtract the predicted value from the actual value, then square it. Squaring gives more weight to points that are far from the line

25
Q

What does the MR equation give

A

The equation of a line

b1 = slope

b0 = intercept
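
For a single predictor, the least-squares slope and intercept can be computed by hand from deviations around the means. This is a minimal sketch under that one-predictor assumption; `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept (b0) and slope (b1) for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    b1 = sp_xy / ss_x            # slope
    b0 = mean_y - b1 * mean_x    # intercept: the line passes through the means
    return b0, b1

# Hypothetical data lying exactly on Y = 1 + 2X
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)
```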

26
Q

What does the equation assess

A

The goodness of fit, based on how far each point is from what is expected

27
Q

If the multiple correlation coefficient is poor, what follows

A

Then poor model predictability

28
Q

What is the symbol for the coefficient of determination

A

R2

29
Q

What is R2

A

Proportion of variability in data set accounted for by statistical model

30
Q

What is the square of multiple correlation coefficient

A

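R2, the coefficient of determination (the F-ratio is a separate statistic used to test the fit of the model)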

31
Q

What does the F-ratio give in a multiple regression

A

Improvement in prediction of criterion compared to inaccuracy of the model

32
Q

What is the equation for assessing the goodness of the fit

A

R2 = SSM / SST or

F = MSM / MSR
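
The two goodness-of-fit formulas can be sketched as follows. The sketch assumes the predictions come from a least-squares fit with an intercept, so that SST = SSM + SSR, and it uses dfM = number of predictors and dfR = N - number of predictors - 1. The function name and data are hypothetical.

```python
def goodness_of_fit(y, y_hat, n_predictors):
    """Return (R2, F) from observed scores, predicted scores and predictor count."""
    n = len(y)
    mean_y = sum(y) / n
    ss_t = sum((obs - mean_y) ** 2 for obs in y)                  # total SS
    ss_r = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))  # residual SS
    ss_m = ss_t - ss_r                                            # model SS
    df_m = n_predictors
    df_r = n - n_predictors - 1
    ms_m = ss_m / df_m   # mean square for the model
    ms_r = ss_r / df_r   # mean square for the residuals
    return ss_m / ss_t, ms_m / ms_r

# Hypothetical data; y_hat is the least-squares line 2.2 + 0.6 * X for X = 1..5
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
r2, f_ratio = goodness_of_fit(y, y_hat, n_predictors=1)
print(round(r2, 3), round(f_ratio, 3))
```

Dividing each sum of squares by its degrees of freedom is what makes F sensitive to the number of predictors: adding variables lowers dfR and so raises MSR unless the residual SS drops enough to compensate.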

33
Q

What does the equation for assessing the goodness of the fit involve

A

Taking SS and dividing by DF

34
Q

What does the equation of assessing the goodness of the fit punish for

A

Too many variables

35
Q

What should be used instead of SS

A

MS

36
Q

What are the equations for MSM and MSR

A

MSM = SSM / dfM

MSR = SSR / dfR

37
Q

What is the symbol for the number of variables in the model

A

dfM

38
Q

What degrees of freedom are based on the number of observations

A

dfR (number of observations minus the number of parameters being estimated)

39
Q

What does a good model cause

A

A large improvement in prediction due to the model (large MSM)

40
Q

What does a small difference between model and data cause

A

A large F ratio, which can be used to measure how good the fit is in the data

41
Q

What are the three ways to add terms to a model

A

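Put all the variables in at once (simultaneous), although the more variables, the less generalisable the result; let the computer choose on statistical grounds (stepwise); or enter them in an order driven by theory (hierarchical)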

42
Q

What are the 3 types of regression

A

Simultaneous
Stepwise
Hierarchical

43
Q

What type of model does the simultaneous model have

A

No a priori model

44
Q

How are all the IVs entered in a simultaneous regression

A

All IVs entered at once

45
Q

What is a negative of the simultaneous regression

A

Over fitting the data

46
Q

What type of model is a stepwise model

A

No a priori model

47
Q

How are the IVs entered in a stepwise model

A

Computer chooses, on statistical grounds; a posteriori model

48
Q

What is a negative of a stepwise regression

A

Capitalises on chance effects

49
Q

What is a strength of the hierarchical regression

A

Theoretically sound

50
Q

How is the data entered in a hierarchical regression

A

A priori

driven by theory

51
Q

What factors affect correlation

A
Outliers and influential points
Homo/heteroscedasticity
Singularity and multicollinearity
Number of cases vs number of predictors
Range
Distribution
52
Q

What does homo/heteroscedasticity refer to

A

How evenly the points are scattered around the regression line

53
Q

What do singularity and multicollinearity refer to

A

Independence of the variables; predictors can’t correlate above .90

54
Q

Define an outlier

A

Points which deviate markedly from the others in the sample

55
Q

How is an outlier assessed

A

By Cook’s distance

56
Q

Define homoscedasticity

A

Variability of scores (errors) in one continuous variable is the same at all values of the second variable, giving uniform scatter along the line

57
Q

Define heteroscedasticity

A

One variable is skewed or the relationship is non-linear: the variance is tight at first and then grows as X grows, even when X is well distributed. The more drastic this is, the harder it is to fit your model, because the variance is not uniform.

58
Q

Define singularity

A

Independent variables that are perfectly correlated, so that one is redundant with the others

59
Q

Define multicollinearity

A

Variables that are highly correlated >0.90

60
Q

What is the problem of singularity

A

Don’t want to measure the same thing twice; statistically, it prevents matrix inversion

61
Q

What are the solutions for singularity

A

For high bivariate correlations (>0.90), compute the correlations and remove a variable when appropriate

62
Q

What is the solution for high multivariate correlations

A

Examine SMC (squared multiple correlation of each IV)

63
Q

What is the tolerance level for high multivariate correlations

A

1 - SMC
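
As a small worked case, with only two predictors the SMC of one predictor on the other is just their squared correlation, so tolerance follows directly. The sketch below assumes that two-predictor case; the correlation value is hypothetical.

```python
def tolerance(smc):
    """Tolerance of a predictor given its squared multiple correlation (SMC)."""
    return 1.0 - smc

# Hypothetical correlation between two predictors
r_between_predictors = 0.95
smc = r_between_predictors ** 2   # valid only because there are two predictors
tol = tolerance(smc)
print(round(tol, 4))
```

A tolerance close to 0 means the predictor is almost fully explained by the other predictors, which is the multicollinearity problem described in the earlier cards.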

64
Q

Why is the number of cases important

A

N/M (number of cases / number of predictors) being too small will cause meaningless results

65
Q

What sample size is needed to test a multiple correlation, assuming a medium effect size

A

N > 50 + 8*m

m = number of predictors

66
Q

What sample size is needed to test individual predictors, assuming a medium effect size

A

N > 104 + m; want N to be at least 100, since the more predictors there are, the harder it is to fit them in a meaningful way

67
Q

What sample size is needed for a stepwise regression

A

N > 40 * m

m = number of predictors

68
Q

What sample size is needed with a small effect size, a skewed DV or measurement error

A

N > (8 / f2) + (m - 1)

69
Q

What figure indicates a small effect size with Cohen’s f2

A

0.02

70
Q

What figure indicates a medium effect size with Cohen’s f2

A

0.15

71
Q

What figure indicates a large effect size with Cohen’s f2

A

0.35
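
The sample-size rules of thumb from the preceding cards can be collected into a few helper functions. This is only a convenience sketch: the function names are hypothetical, each returns the threshold that N should exceed, and m is the number of predictors.

```python
def rule_50_plus_8m(m):
    """Threshold from N > 50 + 8*m."""
    return 50 + 8 * m

def rule_104_plus_m(m):
    """Threshold from N > 104 + m."""
    return 104 + m

def rule_40m(m):
    """Threshold from N > 40*m (stepwise regression)."""
    return 40 * m

def rule_effect_size(f2, m):
    """Threshold from N > (8 / f2) + (m - 1), with f2 as Cohen's f2."""
    return 8 / f2 + (m - 1)

m = 5  # hypothetical number of predictors
print(rule_50_plus_8m(m), rule_104_plus_m(m), rule_40m(m))
for f2 in (0.02, 0.15, 0.35):   # small, medium, large
    print(f2, round(rule_effect_size(f2, m), 1))
```

The last rule shows why small effects are expensive: with f2 = 0.02 the required N is several times larger than with f2 = 0.15 for the same number of predictors.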

72
Q

What is another issue with regressions

A

Range

73
Q

Why is the range an issue with regressions

A

A small range restricts power of tests, want a good spread

74
Q

What is the quartet relating to the distribution of variables

A

Anscombe’s quartet

75
Q

What is Anscombe’s quartet

A

Four datasets with the same mean, variance, correlation and regression line

76
Q

What does Anscombe’s quartet give

A

A warning: summary statistics can suggest a strong effect without looking at the data, so always plot it
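
A quick way to see the point of the quartet is to compute the summary statistics for two of its datasets. The values below are typed from memory of the published quartet (sets I and II, which share the same X), so it is worth checking them against a trusted copy before relying on them.

```python
import math

# Sets I and II of Anscombe's quartet; X is shared between them
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def mean(values):
    return sum(values) / len(values)

def pearson_r(a, b):
    """Product-moment correlation between two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    sp = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    ss_a = sum((p - mean_a) ** 2 for p in a)
    ss_b = sum((q - mean_b) ** 2 for q in b)
    return sp / math.sqrt(ss_a * ss_b)

for label, y in (("I", y1), ("II", y2)):
    print(label, round(mean(y), 2), round(pearson_r(x, y), 2))
```

The two sets give nearly identical means and correlations, yet a scatter plot shows set I as a roughly linear cloud and set II as a smooth curve.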

77
Q

With 1+ IVs and aiming to predict the DV from the IVs, or to examine the relationship between them, what stats test do you use

A

Multiple regression

78
Q

With 1+ IVs aiming to ask what is the relationship between 2 variables once effect of others removed what stats test do you use

A

Partial correlation

79
Q

When testing two sets of IV and trying to find what is common between the two what stats test do you use

A

Canonical correlation

80
Q

How do you report a multiple regression

A

Table 3 displays the relationship between BART task performance and each predictor variable. For social risk taking (SOEPsoc), b = .762 indicates that as social risk taking increases by one unit, BART score increases by .762. Social risk taking is the only significant predictor of BART task performance, t(492) = 4.345, p < .001, indicating that social risk taking is the best predictor of BART task performance out of the seven subscales.