Correlations Flashcards

1
Q

What is the best way to look at residuals?

A

P-P plot (probability-probability plot)

2
Q

Linear function

A

Same variable but different units of measurement -> slopes become arbitrary

Would have complete shared variance

3
Q

What are the differences between partial and semi-partial correlations?

A

Semi-partial - used to examine the additional predictive value of a predictor, residualises one variable

Partial - used to statistically control other predictors, residualises both variables
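
A minimal sketch of the residualising idea, using made-up data and hypothetical helper names (pearson_r, residualise). It assumes a single control variable z: the semi-partial correlation removes z from the predictor only, while the partial correlation removes z from both the predictor and the response.

```python
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5


def residualise(target, control):
    """Residuals of `target` after a simple regression on `control`."""
    mt, mc = mean(target), mean(control)
    sxy = sum((c - mc) * (t - mt) for c, t in zip(control, target))
    sxx = sum((c - mc) ** 2 for c in control)
    slope = sxy / sxx
    intercept = mt - slope * mc
    return [t - (intercept + slope * c) for t, c in zip(target, control)]


# Made-up data: y is the response, x the predictor of interest, z the control.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3, 2, 6, 4, 8, 9, 12, 10]

x_res = residualise(x, z)  # x with z removed
y_res = residualise(y, z)  # y with z removed

semi_partial = pearson_r(y, x_res)  # residualises one variable (x only)
partial = pearson_r(y_res, x_res)   # residualises both variables
```

The partial correlation is never smaller in magnitude than the semi-partial one, because its denominator excludes the variance in y that z already explains.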

4
Q

When do you use multiple regression?

A

If you want to predict a response variable using many predictors

Can also determine whether more than one predictor is needed

5
Q

When do you use semi-partial correlation?

A

If you want to determine how much benefit a predictor gives you on top of several other predictors

6
Q

When do you use partial correlation?

A

If you want to examine the strength of a relationship between variables while holding other variables constant

7
Q

What would you expect if you correlate a z score and percentile of a variable?

A

Not complete correlation but very close

Perfect Kendall’s Tau

Very highly correlated = collinearity (not linear function though)

Radically changes p value, standard error etc - cannot identify a unique effect of z score
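
A rough illustration with made-up scores. It assumes "percentile" means the area under the standard normal curve below each z score, and the helper names are hypothetical. Because the percentile is a monotone but non-linear function of z, Kendall's Tau comes out perfect while Pearson's r falls just short of 1.

```python
from itertools import combinations
from statistics import NormalDist, mean, pstdev

# Made-up scores with no ties.
scores = [52, 61, 47, 70, 58, 66, 43, 75, 55, 63]

m, sd = mean(scores), pstdev(scores)
z = [(s - m) / sd for s in scores]
# Assumption: percentile = area under the standard normal curve below z.
percentile = [100 * NormalDist().cdf(v) for v in z]


def pearson_r(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5


def kendall_tau(a, b):
    """(consistent - inconsistent) / total pairs; assumes no ties."""
    pairs = list(combinations(zip(a, b), 2))
    score = 0
    for (a1, b1), (a2, b2) in pairs:
        score += 1 if (a2 - a1) * (b2 - b1) > 0 else -1
    return score / len(pairs)


r = pearson_r(z, percentile)      # very close to, but below, 1
tau = kendall_tau(z, percentile)  # every pair moves consistently
```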

8
Q

What is multi-collinearity?

A

Two or more predictors are highly correlated with each other, regardless of how well they correlate with the response variable

9
Q

What is VIF?

A

Variance inflation factor - collinearity diagnostic
Increases with correlation between predictors
>9 is considered problematic (3 for the square root of VIF)
Computed by regressing each predictor on the other predictors, ignoring the DV
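
A small sketch of the calculation for the simplest case of two predictors, with made-up data and a hypothetical helper r_squared. With only two predictors, the R^2 from regressing one on the other is just their squared correlation; with more predictors, each one would be regressed on all of the others.

```python
from statistics import mean


def r_squared(target, predictor):
    """R^2 from a simple regression of `target` on `predictor`."""
    mt, mp = mean(target), mean(predictor)
    spt = sum((p - mp) * (t - mt) for p, t in zip(predictor, target))
    spp = sum((p - mp) ** 2 for p in predictor)
    stt = sum((t - mt) ** 2 for t in target)
    return spt ** 2 / (spp * stt)


# Made-up predictors; note that no DV appears anywhere in the calculation.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]

r2 = r_squared(x1, x2)   # how well x2 predicts x1
vif = 1 / (1 - r2)       # grows without bound as r2 approaches 1
root_vif = vif ** 0.5    # the square-root scale mentioned in the card
```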

10
Q

In what circumstances can’t you have a linear relationship?

A

Between a predictor and a discrete DV

11
Q

What is correlation?

A

It’s all about prediction - if there is a relationship between two variables we can use x to estimate y

12
Q

How can we characterise a relationship?

A

Strength - how well one variable can predict another

Form - what is the shape of the relationship

Direction (if form is monotone) - is the direction positive or negative

13
Q

What are the criteria for strength?

A

There are none - it’s a subjective idea

14
Q

What is Kendall’s Tau?

A

A non-parametric correlation test

Used when the data set is small with a large number of tied ranks

15
Q

How is Kendall’s Tau useful?

A

Can draw more accurate generalisations with Kendall’s Tau than Spearman’s

Helps us understand strength and direction of monotone relationships

Resistant to outliers

Tau-b is used to handle the problem of tied ranks

16
Q

How are movements between points characterised?

A

Consistent - as you go up in x you go up in y (positive)

Inconsistent - as you go up in x you go down in y (negative)

17
Q

How do you calculate Tau?

A

1. Calculate the proportion of consistent movements (consistent / total)
2. Tau = (2 X proportion of consistent movements) - 1

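
The two steps can be sketched as follows, assuming no tied values (so every pair of points is either consistent or inconsistent). The function name kendall_tau_a and the data are made up for illustration.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Tau from the proportion of consistent pairs; assumes no ties."""
    pairs = list(combinations(zip(x, y), 2))
    # A movement is consistent when x and y change in the same direction.
    consistent = sum(
        1 for (x1, y1), (x2, y2) in pairs if (x2 - x1) * (y2 - y1) > 0
    )
    proportion = consistent / len(pairs)  # step 1
    return 2 * proportion - 1             # step 2


# Made-up data: 8 of the 10 pairs move consistently.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)
```

With a proportion of 1 the formula gives Tau = 1, with a proportion of 0 it gives Tau = -1, and a 50/50 split gives 0.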
18
Q

What makes Kendall’s Tau non-parametric?

A

Slope and intercept aren’t needed, so it doesn’t assume a parametric form for the relationship

19
Q

What is standardisation?

A

Convert into standard set of units (SDs) to overcome dependence on measurement scale problem

20
Q

Pearson’s correlation

A

Coefficient = r -> ranges between -1 and 1 (0 = no linear relationship)

For linear relationships only

Highly sensitive to outliers

Strong when big x standardised scores are paired with big y standardised scores

Positive when positive x standardised scores are paired with positive y standardised scores and vice versa
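
One way to see the link to standardised scores: r can be computed as the mean of the products of paired z-scores (using the population SD). The sketch below uses made-up height and weight data and hypothetical helper names, and also shows that changing the units of one variable leaves r unchanged.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Convert raw values to SD units (population SD)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def pearson_r(x, y):
    """Mean of the products of paired z-scores."""
    products = [zx * zy for zx, zy in zip(z_scores(x), z_scores(y))]
    return mean(products)


# Made-up data for illustration.
height_cm = [150, 160, 165, 172, 180, 191]
weight_kg = [52, 58, 63, 70, 77, 90]

r_metric = pearson_r(height_cm, weight_kg)

# Same heights in inches: standardisation removes the unit difference.
height_in = [h / 2.54 for h in height_cm]
r_mixed = pearson_r(height_in, weight_kg)
```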

21
Q

What is a z-score?

A

SD score

22
Q

How do you compare independent correlations?

A

Transform the r’s into z values using Fisher’s z transformation
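
A sketch of the usual follow-up, under the assumption that the two correlations come from independent samples: the difference between the transformed values is divided by its standard error and referred to the standard normal distribution. The function names and the example values (r = 0.6 with n = 50, r = 0.3 with n = 60) are made up.

```python
from math import atanh, sqrt
from statistics import NormalDist


def fisher_z(r):
    """Fisher's z transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return atanh(r)


def compare_independent_correlations(r1, n1, r2, n2):
    """Return the z statistic and two-sided p value for r1 versus r2."""
    z_diff = fisher_z(r1) - fisher_z(r2)
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # SE of the difference
    z_stat = z_diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p


# Hypothetical values for illustration only.
z_stat, p_value = compare_independent_correlations(0.6, 50, 0.3, 60)
```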

23
Q

What is the first step in regression?

A

Units must be unstandardised

24
Q

How is the regression slope defined?

A

b1 = (SDy/SDx) X r

25
Q

How do you find the intercept for regression?

A

b0 = mean of y - ( b1 X mean of x)
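
The slope and intercept formulas can be checked on a small made-up data set. This is only a sketch: it uses the sample SD for both variables and for r, and the predict helper is hypothetical.

```python
from statistics import mean, stdev

# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mx, my = mean(x), mean(y)

# Pearson's r from the sample covariance and sample SDs.
covariance = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
r = covariance / (stdev(x) * stdev(y))

b1 = (stdev(y) / stdev(x)) * r  # slope: (SDy / SDx) X r
b0 = my - b1 * mx               # intercept: mean of y - (b1 X mean of x)


def predict(x_value):
    """Predicted y on the fitted line."""
    return b0 + b1 * x_value
```

A consequence of the intercept formula is that the fitted line always passes through the point (mean of x, mean of y).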

26
Q

What is the linear regression model?

A

yi = b0 + b1(xi) + error

27
Q

What are the assumptions of linear regression?

A

The true relationship is linear, has intercept b0, slope b1 and is contaminated by error

All errors are independent - this can’t be assumed when observations are correlated with each other, e.g. repeated measurements over time

28
Q

What are residuals?

A

Prediction errors

The regression line minimises the sum of squared residuals

Diagnose problems with assumptions

29
Q

What are good residuals?

A

Don’t show systematic trends

Equally variable

Normally distributed

Don’t have outliers

30
Q

What is R^2?

A

The coefficient of determination

Measures the amount of variability in one variable that is shared by another variable

Tells us how close points are to the line - error

Compare how well regression line can predict y compared to mean of y

31
Q

How do you calculate R^2?

A

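R^2 = 1 - (SS residual / SS total), where SS residual is the sum of squared errors around the regression line and SS total is the sum of squared deviations around the mean of y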
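
A minimal sketch on made-up data. It assumes the numerator is the sum of squared prediction errors around the regression line and the denominator is the sum of squared deviations around the mean of y, so R^2 compares the line against simply predicting the mean.

```python
from statistics import mean

# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mx, my = mean(x), mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)

b1 = sxy / sxx      # least-squares slope
b0 = my - b1 * mx   # least-squares intercept
predicted = [b0 + b1 * a for a in x]

# Squared errors around the regression line.
ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))
# Squared errors around the mean of y.
ss_total = sum((b - my) ** 2 for b in y)

r_squared = 1 - ss_residual / ss_total
```

With a single predictor, this value equals the square of Pearson's r between x and y.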

32
Q

What is adjusted R^2?

A

Corrects R^2 for overfitting

Minimising the sum of squared residuals gives the best possible line for the sample

Even the true regression line won’t do better in that sample

So the sample R^2 is biased upwards -> can’t be 0

Use SPSS

33
Q

What is the formula for multiple regression?

A

Yi = B0 + B1x1i + B2x2i + B3x3i + Error

34
Q

What is collinearity?

A

A limitation of multiple regression

Occurs when predictors are highly correlated - contain essentially the same information

No unique contribution or relationship can be determined

35
Q

What are the symptoms of collinearity?

A

Strong predictors are nonetheless non-significant

Large standard errors

Coefficients change radically when new predictors are added

High VIF

R^2 for many predictors is basically the same as for each separately

36
Q

How do we control for variables?

A

We hold it constant by residualising it

37
Q

What is an interaction?

A

The effect of one IV differs as a function of another IV - (Geoff)

Product of two predictors and the slope of one variable depends on another (Main one from stats)

38
Q

What is centering?

A

Subtracting the mean of x from every x value, so that the intercept becomes the predicted y when x = mean of x

This increases meaningfulness of intercept and reduces collinearity with interactions and polynomial regressions

39
Q

How do you centre a predictor? (Formula)

A

Yi = B0 + B1(Xi - mean of X) + error

40
Q

What changes after centering?

A

The intercept (SE and Significance)

The slope does NOT change in SE or significance
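
A small sketch with made-up data and a hypothetical fit_line helper. It only shows the estimates, not the SEs or significance tests: after centering x, the slope estimate is identical and the intercept becomes the mean of y.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for a simple regression."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Made-up data for illustration.
x = [10, 12, 15, 18, 20]
y = [30, 34, 41, 46, 52]

b0_raw, b1_raw = fit_line(x, y)

# Centre x by subtracting its mean from every value.
x_centred = [v - mean(x) for v in x]
b0_centred, b1_centred = fit_line(x_centred, y)
```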

41
Q

What is Dummy Coding?

What is the formula?

A

Used when you want to include a discrete predictor, e.g. gender (g)

You would code one group 1 and the other 0, which alters the equation when substituted in -> for 1, the B2 term remains; for 0, it disappears

Yi = B0 + B1xi + B2gi + error

42
Q

What is dummy coding measuring?

A

As B0 and B2 sum together to make the intercept for the group coded 1, dummy coding is looking to see how much B2 changes the intercept

43
Q

How do we create an interaction using dummy coding?

A

Add a new predictor that is the product of the dummy code variable (gender) and x, giving the term B3gixi
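
A sketch of how the dummy code and the product term act on the equation. The coefficient values are hypothetical and the error term is left out: without the product term the groups differ only in intercept, and with it the slope also depends on group.

```python
def predict(x, g, b0=10.0, b1=2.0, b2=3.0, b3=0.0):
    """Yi = B0 + B1*xi + B2*gi + B3*gi*xi (error term omitted).

    All coefficient values are hypothetical, chosen for illustration.
    """
    return b0 + b1 * x + b2 * g + b3 * g * x


# Dummy coding only (b3 = 0): the two groups differ only in intercept.
intercept_group0 = predict(0, g=0)  # B0
intercept_group1 = predict(0, g=1)  # B0 + B2

# With the product term, the slope for x also depends on group.
slope_group0 = predict(1, 0, b3=1.5) - predict(0, 0, b3=1.5)  # B1
slope_group1 = predict(1, 1, b3=1.5) - predict(0, 1, b3=1.5)  # B1 + B3
```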

44
Q

What are the types of interaction?

A

Continuous/continuous -> rare (polynomial)
Discrete/continuous
Discrete/discrete -> ANOVA

45
Q

How do you centre when using dummy codes?

A

Contrast codes - in this case they would be -0.5 and +0.5

46
Q

How do you centre continuous predictors?

A

Using mean

47
Q

What is polynomial regression?

A

Regression that is still linear in its coefficients but models a non-linear (curved) relationship

Contains squared terms - which is like an interaction between x and itself - causes slope to change

Causes curves in form

X must be centred

48
Q

What is a linear combination?

A

Sum of terms multiplied by constants (doesn’t mean it’s linear)

49
Q

What is the form of a polynomial?

A

b0 + b1x + b2x^2 + b3x^3 + … + bnx^n

50
Q

What is the letter n in a polynomial?

A

The order or degree

Highest power

51
Q

What does increasing parameters cause?

A

An increase in flexibility which is not always a good thing

52
Q

What is the benefit of centering x in a polynomial?

A

Prevents collinearity between x and x squared

Allows it to fit the curve better and identify unique components causing curvature
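
The collinearity point can be illustrated with made-up, evenly spaced x values and a hypothetical pearson_r helper. Raw x and x squared are almost perfectly correlated, while centred x and its square are uncorrelated here because the values are symmetric around the mean. With skewed data the correlation would be reduced rather than removed.

```python
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5


# Made-up predictor: the integers 1 to 9.
x = list(range(1, 10))
r_raw = pearson_r(x, [v ** 2 for v in x])

# Centre x, then square the centred values.
mx = mean(x)
x_centred = [v - mx for v in x]
r_centred = pearson_r(x_centred, [v ** 2 for v in x_centred])
```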

53
Q

How do you deal with overfitting?

A

Adjusted R^2

Replication

Cross validation - split data into parts and fit curve to one part (fit data) then test in other part (hold out data)
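
A bare-bones sketch of the fit/hold-out idea, using made-up data and hypothetical helpers. It assumes a single split into alternating halves and compares a straight line against simply predicting the mean of the fit data. Real use would typically involve repeated or random splits.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for a simple regression."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def mse(actual, predicted):
    """Mean squared prediction error."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


# Made-up data: a line (1 + 2x) plus small fixed "noise" values.
x = list(range(10))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.2]
y = [1 + 2 * a + e for a, e in zip(x, noise)]

# Split: even positions are the fit data, odd positions are held out.
fit_x, fit_y = x[0::2], y[0::2]
hold_x, hold_y = x[1::2], y[1::2]

# Fit on one part only, then test on the hold-out part.
b0, b1 = fit_line(fit_x, fit_y)
line_error = mse(hold_y, [b0 + b1 * a for a in hold_x])
baseline_error = mse(hold_y, [mean(fit_y)] * len(hold_y))
```

A model that overfits would look good on the fit data but show a large error on the hold-out data.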

54
Q

What are the assumptions of linear regression analysis?

A

DV is interval scaled

DV is a linear combination of predictors

Observations/errors are independent

Homoscedasticity (errors have equal variance)

Errors are normally distributed

55
Q

What is the regression technique for a binary DV?

A

Logistic regression

56
Q

What is the regression technique for counts with upper limits?

A

Logistic regression

57
Q

What is the regression technique for counts without upper limits?

A

Regression using rates

58
Q

What is the regression technique used for time to event DVs?

A

Regression using rates

59
Q

What is the regression technique used for ordinal DV?

A

Ordinal regression