correlation and regression Flashcards

1
Q

what is the point of tests of relationship

A

see if there is an association between variables and if there is a cause and effect relationship

2
Q

define correlation

A

two variables changing together: in the same direction (positive correlation) or in opposite directions (negative correlation)

3
Q

define regression

A

model how one variable (the response) depends on another (the predictor), usually to investigate a presumed causal effect

4
Q

what does regression assume

A

that one variable depends on the other (you can predict y from X)

5
Q

relationship between X and Y for correlation

A

doesn’t matter which is y or x

6
Q

regression or correlation?

you can predict y from x

A

regression

7
Q

plot to use when you are looking at the relationship between 2 continuous variables

A

scatter plot

8
Q

what does a scatter plot show you

A

the relationship between two continuous variables

9
Q

you have two continuous variables. what plot do you use to see their relationship

A

scatter plot

10
Q

what’s it called when you are looking at whether two sets of observations are associated

A

correlation

11
Q

what’s it called when you are looking at how strong an association is

A

correlation

12
Q

what does correlation tell you

A

whether observations are correlated and how strong or significant the association is

13
Q

what test do you run to see if two observations are correlated

A

Pearson’s product-moment correlation, Spearman’s rank-order correlation or Kendall’s rank-order correlation
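
As a rough illustration of how the first two coefficients differ, the sketch below computes Pearson's r from its definition and Spearman's rho as Pearson's r applied to ranks. The function names and the toy data are made up for this example, and tied values are given their average rank, which is a common convention rather than something stated on the card.

```python
import math

def pearson_r(x, y):
    """Pearson's r: co-variation of x and y scaled by the spread of each."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Toy data (assumed): y rises with x, but along a curve, not a line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(r, rho)
```

Because the toy relationship is perfectly monotonic but curved, rho reaches 1 while r stays slightly below it.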

14
Q

what are the assumptions for Pearson’s correlation

A
  1. both variables are continuous
  2. both are normally distributed (bivariate normal distribution)
15
Q

when would you use Spearman’s rank-order correlation

A

if your variables are not normally distributed and you can’t run a Pearson’s correlation

16
Q

nonparametric equivalent to Pearson’s product-moment correlation

A

Spearman’s rank-order correlation

17
Q

what tests if there is a linear relationship between variables

A

Pearson correlation coefficient

18
Q

what do you use Pearson correlation coefficient for

A

to see if there is a linear relationship

19
Q

explain what a value of r>0 indicates for Pearson’s correlation coefficient

A

positive linear relationship

20
Q

explain what a value of r<0 indicates for Pearson’s correlation coefficient

A

negative linear relationship

21
Q

explain what a value of r=0 indicates for Pearson’s correlation coefficient

A

no linear relationship

22
Q

what does linear correlation tell you

A

indicates whether variables are related (p<0.05), and how strong that relationship is

23
Q

what influences p-values for Pearson’s correlation

A
  • sample size
  • large N can give low P, even when effect (r) is weak
  • high r can have non-significant p-values if N is low
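
One way to see why sample size matters: the usual significance test for Pearson's r converts r to a t statistic with n - 2 degrees of freedom, t = r * sqrt((n - 2) / (1 - r^2)). The sketch below only evaluates that formula; the function name and the two example cases are invented for illustration.

```python
import math

def t_statistic(r, n):
    """t statistic for H0: no linear correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Weak effect, large sample (assumed values).
t_weak_large = t_statistic(0.1, 1000)

# Strong effect, tiny sample (assumed values).
t_strong_small = t_statistic(0.8, 4)

print(t_weak_large, t_strong_small)
```

At the two-sided 0.05 level the critical t is about 1.96 for 998 degrees of freedom and about 4.30 for 2 degrees of freedom, so the weak correlation with N = 1000 comes out significant while the strong correlation with N = 4 does not.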
24
Q

how does the r value affect results from Pearson’s correlation

A

high r can have non-significant p-values if N is low

25
Q

how does sample size influence Pearson’s correlation results

A

high N can give low P, even when effect (r) is weak
high r can have non-significant P-values if N is low

26
Q

what test to use if you want to compare ranked variables

A

Spearman’s rank-order correlation

27
Q

which test is a more conservative approach for correlation

A

Spearman’s correlation

28
Q

what are the assumptions for regression

A
  1. there is a causal relationship
  2. you can predict Y (effect, response) from X (cause, predictor, covariate)
29
Q

what are the assumptions for linear regression

A
  1. assumes you can express the relationship between Y and X as a linear equation
  2. y is distributed normally at each value of x
  3. the variance is equal (homogeneity of variance)
  4. errors are independent (no serial correlation)
30
Q

in the regression equation (y=mx+b), which is the dependent variable and which is the independent variable

A

y is dependent
x is independent

31
Q

difference between parameters vs variables

A

variables vary, parameters are constant

32
Q

in the y=mx+b equation, which are variables and which are parameters

A

x and y are variables
b and m are parameters

33
Q

goal for looking at regression equation

A

to find values of b and m (parameters) that provide the best fit to the data

34
Q

how to calculate residuals

A

actual-predicted values

35
Q

how is the best fit line chosen

A

the sum of the squared vertical distances of the points from the line (the squared residuals) is minimized
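
A minimal sketch of that least-squares idea for a single predictor, using the closed-form slope and intercept. The function name and the four data points are assumptions for illustration only.

```python
def fit_line(x, y):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

def sum_squared_residuals(x, y, m, b):
    """Sum of squared (actual - predicted) values for the line y = m*x + b."""
    return sum((c - (m * a + b)) ** 2 for a, c in zip(x, y))

# Toy data (assumed).
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]

m, b = fit_line(x, y)
residuals = [c - (m * a + b) for a, c in zip(x, y)]  # actual - predicted
best_sse = sum_squared_residuals(x, y, m, b)
other_sse = sum_squared_residuals(x, y, 2.0, 0.0)    # a different candidate line
print(m, b, best_sse, other_sse)
```

The fitted line has a smaller sum of squared residuals than the alternative line y = 2x, which is what "best fit" means here.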

36
Q

variance=?

A

mean squared residual

37
Q

mean squared residual= ?

A

variance

38
Q

what is R2

A

coefficient of determination

39
Q

define coefficient of determination (r2)

A

proportion of the variance in the observed values of the dependent variable that is explained by the regression model
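
To make the definition concrete, R² can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes the predicted values have already been obtained from some fitted line; the numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Share of the variation in the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # unexplained
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)              # total
    return 1 - ss_res / ss_tot

# Hypothetical data: predictions come from the line y = 1.9x at x = 1..4.
observed = [2, 4, 5, 8]
predicted = [1.9, 3.8, 5.7, 7.6]

r2 = r_squared(observed, predicted)
print(round(r2, 4))
```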

40
Q

what null hypothesis is being tested with linear regression

A

there is no linear relationship between X and Y

41
Q

regression coefficient=

A

slope

42
Q

what is slope called in regression equation

A

regression coefficient

43
Q

how do you test the assumptions of linear regression

A
  1. examine linearity assumption
  2. examine for constant variance for all levels (homoscedasticity)
  3. evaluate normal distribution
  4. evaluate independence assumption
44
Q

how to do residual analysis

A

compute the residual for each observation: the difference between its observed and predicted value

45
Q

how to do graphical analysis of residuals

A

plot the residuals:
1. residuals vs independent
2. residuals vs predicted
3. residual vs order of the data
4. residual lag plot
5. histogram of the residuals

46
Q

how to see independence of errors

A

Durbin-Watson statistic

47
Q

what do you use the Durbin-Watson statistic for

A

to see independence of errors

48
Q

how does the Durbin-Watson statistic indicate whether errors are independent

A

if D is near 0, positive correlation between successive errors
if D is near 2, no correlation
if D is near 4, negative correlation
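
For reference, the statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below applies that formula to two made-up residual sequences; the function name and data are assumptions for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson D for residuals listed in observation order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Residuals that flip sign every step: negatively correlated neighbours.
d_alternating = durbin_watson([1, -1, 1, -1, 1, -1])

# Residuals that stay on one side for a while: positively correlated neighbours.
d_runs = durbin_watson([1, 1, 1, -1, -1, -1])

print(d_alternating, d_runs)
```

The alternating sequence gives a D above 2 (toward 4) and the sequence with long runs gives a D below 2 (toward 0), matching the card.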

49
Q

Durbin-Watson: if D=0, what is correlation

A

positive correlation

50
Q

Durbin-Watson: if D=2, what is correlation

A

no correlation

51
Q

Durbin-Watson: if D=4, what is correlation

A

negative correlation

52
Q

what is quantile-quantile (Q-Q) plot used for

A

it’s a technique for determining if two data sets come from populations with a common distribution

53
Q

what does Q-Q plot do

A

plots the quantiles of the first data set against the quantiles of the second dataset

54
Q

how to see normal distribution with quantiles

A

plot the theoretical quantiles on horizontal axis and the sample quantiles on vertical axis
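
A small sketch of how the points for a normal Q-Q plot could be built with the standard library alone. The plotting positions (i + 0.5) / n are one of several conventions and are an assumption here, as are the function name and the sample values.

```python
from statistics import NormalDist

def normal_qq_points(sample):
    """Pairs of (theoretical normal quantile, sample quantile) for a Q-Q plot."""
    n = len(sample)
    ordered = sorted(sample)                      # sample quantiles (vertical axis)
    std_normal = NormalDist()                     # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))        # theoretical goes on the horizontal axis

# Made-up sample values.
points = normal_qq_points([2.1, 3.4, 1.9, 5.0, 4.2])
for theoretical_q, sample_q in points:
    print(round(theoretical_q, 3), sample_q)
```

If the sample is roughly normal, these points fall close to a straight line when plotted.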

55
Q

uniform distribution for Q-Q plot has what shape

A

S shape

56
Q

can you compare models based on R2

A

not directly: r2 never decreases when additional predictors are added to the model
- you can compare adjusted r2 over models with different numbers of parameters

57
Q

what is adjusted r2

A
  • increases when a new predictor is included only if the new predictor improves r2 more than would be expected by chance
  • comparable over models with different number of parameters
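
The usual formula behind this is adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. The sketch below evaluates it for two hypothetical models; the R² values, n, and p are invented for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical model with 3 predictors.
adj_three = adjusted_r2(0.900, n=20, p=3)

# Same data with a 4th predictor that raises R-squared only slightly.
adj_four = adjusted_r2(0.901, n=20, p=4)

print(round(adj_three, 5), round(adj_four, 5))
```

Here the extra predictor nudges R² up but adjusted R² goes down, because the gain is too small to offset the penalty for the added parameter.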
58
Q

what do you use to compare different models with different numbers of parameters

A

adjusted r2

59
Q

adjusted r2 will always be ___ r2

A

adjusted r2 will always be ≤ r2

60
Q

which r2 value do you use to report

A

if you are comparing models, use adjusted r2
otherwise, report r2

61
Q

what measures how much the fitted values in the model change when the nth data point is deleted

A

Cook’s distance

62
Q

explain results from Cook’s D

A
  • large D indicates that the data point strongly influences the fitted values
  • if D>0.5 then that data point is worthy of further investigation as it may be influential
  • if D>1, then that data point is quite likely to be influential
63
Q

if Cook’s D is large, what does that mean

A

the data point strongly influences the fitted values

64
Q

if Cook’s D is >0.5, what does that mean

A

the data point is worthy of further investigation

65
Q

if Cook’s D > 1, what does that mean

A

the data point is likely to be influential

66
Q

what do you use if there are several factors affecting the dependent variable

A

multiple regression

67
Q

what is multiple regression

A

estimates the relationship between variables, taking into account additional variables

68
Q

function of multiple regression?

A
  1. controls for confounders
  2. tests for interactions between predictors
  3. improves predictions
69
Q

how to know which predictor (coefficient) has the strongest effect in multiple regression

A

compare the coefficients in the linear regression equation. the larger the absolute value, the stronger its effect

70
Q

what if predictors are in different units? how do we compare them to see which has the strongest effect

A

standardize the variables
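
Standardizing usually means converting each variable to z-scores: subtract the mean and divide by the standard deviation. A minimal sketch, assuming the sample standard deviation is used; the function name and values are invented for illustration.

```python
from statistics import mean, stdev

def standardize(values):
    """Convert values to z-scores (mean 0, sample standard deviation 1)."""
    mu = mean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]

# Made-up predictor measured in centimetres.
height_cm = [150, 160, 170, 180, 190]
height_z = standardize(height_cm)
print([round(z, 3) for z in height_z])
```

After every predictor is put on this unitless scale, the regression coefficients can be compared with each other directly.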

71
Q

does standardizing variables change p?

A

if you have an interaction term, standardizing variables can make the p values change for main effects but not for interaction effects

72
Q

what is multicollinearity

A

an independent variable is highly correlated with another independent variable in a multiple regression equation

73
Q

impacts of multicollinearity

A
  1. undermines the statistical significance of an independent variable
  2. reduces the precision of the estimated coefficients, which weakens the statistical power of the regression model
74
Q

what is collinear

A

when two predictor variables express a linear relationship

75
Q

if r>0.6 for correlations in multiple regression, what do you do

A

exclude them from the model

76
Q

what is variance inflation factor

A

estimates how much the variance of a coefficient is inflated because of linear dependence with other predictors
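
The standard formula is VIF = 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. With only two predictors, that R² is just their squared correlation, which is the case sketched below; the function name and the correlation values are illustrative assumptions.

```python
def variance_inflation_factor(r2_with_other_predictors):
    """VIF from the R-squared of one predictor regressed on the other predictors."""
    return 1 / (1 - r2_with_other_predictors)

# Two-predictor case: R-squared equals the squared correlation between them.
vif_moderate = variance_inflation_factor(0.6 ** 2)   # predictors correlated at r = 0.6
vif_high = variance_inflation_factor(0.9 ** 2)       # predictors correlated at r = 0.9

print(round(vif_moderate, 4), round(vif_high, 4))
```

A VIF of 1 means no inflation; the value grows quickly as the predictors become more strongly correlated.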

77
Q

explain the difference between interactions and multicollinearity

A

an interaction is about the joint effect of the X variables on Y
multicollinearity is about relationships between X variables (ignoring Y)
