correlation and regression Flashcards

1
Q

what is the point of tests of relationship

A

see if there is an association between variables and if there is a cause and effect relationship

2
Q

define correlation

A

two variables changing together at the same time: in the same direction (positive correlation) or in opposite directions (negative correlation)

3
Q

define regression

A

models how one variable depends on another, used to investigate the presumed causal effect of one variable on the other

4
Q

what does regression assume

A

that one variable depends on the other (you can predict y from X)

5
Q

relationship between X and Y for correlation

A

doesn’t matter which is y or x

6
Q

regression or correlation?

you can predict y from x

A

regression

7
Q

plot to use when you are looking at the relationship between 2 continuous variables

A

scatter plot

8
Q

what does a scatter plot show you

A

the relationship between two continuous variables

9
Q

you have two continuous variables. what plot do you use to see their relationship

A

scatter plot

10
Q

what’s it called when you are looking at whether two sets of observations are associated

A

correlation

11
Q

what’s it called when you are looking at how strong an association is

A

correlation

12
Q

what does correlation tell you

A

whether observations are correlated and how strong or significant the association is

13
Q

what test do you run to see if two sets of observations are correlated

A

Pearson’s product-moment correlation, Spearman’s rank-order correlation or Kendall’s rank-order correlation

14
Q

what are the assumptions for Pearson’s correlation

A
  1. both variables are continuous
  2. both are normally distributed (bivariate normal distribution)
15
Q

when would you use Spearman’s rank-order correlation

A

if your variables are not normally distributed and you can’t run a Pearson’s correlation

16
Q

nonparametric equivalent to Pearson’s product-moment correlation

A

Spearman’s rank-order correlation

17
Q

what tests if there is a linear relationship between variables

A

Pearson correlation coefficient

18
Q

what do you use Pearson correlation coefficient for

A

to see if there is a linear relationship

19
Q

explain what a value of r>0 indicates for Pearson’s correlation coefficient

A

positive linear relationship

20
Q

explain what a value of r<0 indicates for Pearson’s correlation coefficient

A

negative linear relationship

21
Q

explain what a value of r=0 indicates for Pearson’s correlation coefficient

A

no linear relationship
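
To make the three cases (r>0, r<0, r=0) concrete, here is a minimal sketch that computes Pearson's r by hand. The function name `pearson_r` and the toy data are illustrative assumptions, not part of the original cards.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
r_pos = pearson_r(x, [2 * v + 1 for v in x])  # y rises with x  -> 1.0
r_neg = pearson_r(x, [-3 * v for v in x])     # y falls with x  -> -1.0
r_zero = pearson_r(x, [v ** 2 for v in x])    # U-shaped curve  -> 0.0
```

The last case is worth noting: y depends perfectly on x, but the relationship is curved, so r is 0. A value of r=0 rules out a linear relationship, not every relationship.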

22
Q

what does linear correlation tell you

A

indicates whether variables are related (p<0.05), and how strong that relationship is

23
Q

what influences p-values for Pearson’s correlation

A
  • sample size
  • large N can give low P, even when effect (r) is weak
  • high r can have non-significant p-values if N is low
24
Q

how does the r value impact results from Pearson’s correlation

A

high r can have non-significant p-values if N is low

25
how does sample size influence Pearson’s correlation results
  • high N can give low P, even when effect (r) is weak
  • high r can have non-significant p-values if N is low
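
The interplay between r and N can be seen in the t statistic commonly used to test a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sketch below is a hedged illustration: the function name and the example values of r and n are assumptions chosen for the demonstration.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: no linear correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Weak effect, large sample: t is far above the two-tailed 5% critical
# value (about 1.96 for large df), so p is low.
t_weak_large_n = correlation_t_statistic(0.2, 1000)

# Strong effect, tiny sample: t is below the two-tailed 5% critical
# value for 3 degrees of freedom (3.182), so p is not significant.
t_strong_small_n = correlation_t_statistic(0.8, 5)
```

This is why r (effect size) and p (significance) should be reported together.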
26
what test to use if you want to compare ranked variables
Spearman’s rank-order correlation
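
One way to see why this test suits ranked variables: Spearman's coefficient can be computed as Pearson's r applied to the ranks of the data. The following is a minimal sketch under that definition; the helper names and the toy data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]   # monotonic but not linear in x
rho = spearman_rho(x, y)  # 1.0: the rank orders agree exactly
r = pearson_r(x, y)       # below 1: the relationship is curved
```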
27
which test is a more conservative approach for correlation
Spearman’s correlation
28
what are the assumptions for regression
  1. there is a causal relationship
  2. you can predict Y (effect, response) from X (cause, predictor, covariate)
29
what are the assumptions for linear regression
  1. the relationship between Y and X can be expressed as a linear equation
  2. Y is normally distributed at each value of X
  3. the variance is equal at each value of X (homogeneity of variance)
  4. errors are independent (no serial correlation)
30
in the regression equation (y=mx+b), which is the dependent variable and which is the independent variable
y is dependent; x is independent
31
difference between parameters vs variables
variables vary, parameters are constant
32
in the y=mx+b equation, which are variables and which are parameters
x and y are variables; m and b are parameters
33
goal for looking at regression equation
to find values of m and b (parameters) that provide the best fit to the data
34
how to calculate residuals
actual-predicted values
35
how is the best fit line chosen
the sum of the squared vertical distances of the points from the line (the squared residuals) is minimized
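
The least-squares idea can be sketched for a single predictor with the usual closed-form solution, where the slope is the sum of cross-products divided by the sum of squares of x. The function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, b

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
m, b = fit_line(x, y)

# Residual = actual - predicted
residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

# Any other line, e.g. y = 2x, has a larger sum of squared residuals
sse_other = sum((yi - 2 * xi) ** 2 for xi, yi in zip(x, y))
```

With an intercept in the model, the residuals of the fitted line sum to zero, which is a handy sanity check.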
36
variance=?
mean squared residual
37
mean squared residual= ?
variance
38
what is R2
coefficient of determination
39
define coefficient of determination (r2)
proportion of the variance in the observed values of the dependent variable that is explained by the regression model
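
As a worked illustration of this definition, R2 can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The data and fitted values below are hypothetical, chosen only to show the arithmetic.

```python
def r_squared(ys, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

y = [2.1, 3.9, 6.2, 7.8, 10.1]
# Hypothetical fitted values from the line y = 1.99x + 0.05 at x = 1..5
predicted = [2.04, 4.03, 6.02, 8.01, 10.00]
r2 = r_squared(y, predicted)

# A model that always predicts the mean explains none of the variance
mean_y = sum(y) / len(y)
r2_mean_only = r_squared(y, [mean_y] * len(y))
```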
40
what null hypothesis is being tested with linear regression
there is no linear relationship between X and Y (the slope is 0)
41
regression coefficient=
slope
42
what is slope called in regression equation
regression coefficient
43
how do you test the assumptions of linear regression
  1. examine the linearity assumption
  2. examine for constant variance at all levels (homoscedasticity)
  3. evaluate the normal distribution assumption
  4. evaluate the independence assumption
44
how to do residual analysis
calculate the residual for each observation: the difference between its observed and predicted value
45
how to do graphical analysis of residuals
plot the residuals:
  1. residuals vs independent variable
  2. residuals vs predicted values
  3. residuals vs order of the data
  4. residual lag plot
  5. histogram of the residuals
46
how to see independence of errors
Durbin-Watson statistic
47
what do you use the Durbin-Watson statistic for
to see independence of errors
48
how does the Durbin-Watson statistic tell you about independence of errors
  • if D=0, positive correlation between successive errors
  • if D=2, no correlation
  • if D=4, negative correlation
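
The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. Here is a minimal sketch with made-up residual sequences; the function name is an illustrative choice.

```python
def durbin_watson(residuals):
    """D = sum((e[t] - e[t-1])^2) / sum(e[t]^2); ranges from 0 to 4."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Residuals that stay on the same side for long runs: D falls toward 0
d_positive = durbin_watson([1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0])   # 0.5

# Residuals that flip sign every step: D rises toward 4
d_negative = durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])   # 3.5
```

The residuals must be in the order the data were collected, since the statistic only compares neighbours in the sequence.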
49
Durbin-Watson: if D=0, what is the correlation
positive correlation
50
Durbin-Watson: if D=2, what is the correlation
no correlation
51
Durbin-Watson: if D=4, what is the correlation
negative correlation
52
what is quantile-quantile (Q-Q) plot used for
it’s a technique for determining if two data sets come from populations with a common distribution
53
what does Q-Q plot do
plots the quantiles of the first data set against the quantiles of the second dataset
54
how to see normal distribution with quantiles
plot the theoretical quantiles on horizontal axis and the sample quantiles on vertical axis
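
A normal Q-Q plot pairs each sorted sample value with the matching quantile of a standard normal distribution. The sketch below only builds the (theoretical, sample) pairs and leaves the plotting out. The plotting positions (i - 0.5) / n are one common convention and are an assumption here, as are the function name and the sample data.

```python
from statistics import NormalDist

def normal_qq_points(sample):
    """Return (theoretical quantile, sample quantile) pairs for a normal Q-Q plot."""
    n = len(sample)
    ordered = sorted(sample)  # sample quantiles (vertical axis)
    # Theoretical standard-normal quantiles (horizontal axis)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

points = normal_qq_points([4.8, 5.1, 5.0, 4.9, 5.3, 4.7, 5.2])
```

If the sample is roughly normal, these points fall close to a straight line when plotted.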
55
uniform distribution for Q-Q plot has what shape
S shape
56
can you compare models based on R2
not directly: r2 never decreases when additional predictors are added to the model, so compare adjusted r2 across models with different numbers of parameters
57
what is adjusted r2
  • increases when a new predictor is included only if the new predictor improves r2 more than would be expected by chance
  • comparable over models with different numbers of parameters
58
what do you use to compare different models with different numbers of parameters
adjusted r2
59
adjusted r2 will always be ___ r2
adjusted r2 will always be less than or equal to r2
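
The penalty for extra predictors is visible in the standard formula, adjusted R2 = 1 - (1 - R2)(n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. The R2 values, n, and p below are invented for illustration.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Model A: 2 predictors, R^2 = 0.80
adj_a = adjusted_r_squared(0.80, n=30, p=2)

# Model B: 4 more predictors raise R^2 only slightly, to 0.81
adj_b = adjusted_r_squared(0.81, n=30, p=6)
```

Model B has the higher R2 but the lower adjusted R2, so under these assumed numbers the extra predictors are not earning their place.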
60
which r2 value do you use to report
if you are comparing models, use adjusted r2; otherwise, report r2
61
what measures how much the fitted values in the model change when the nth data point is deleted
Cook’s distance
62
explain results from Cook’s D
  • large D indicates that the data point strongly influences the fitted values
  • if D>0.5, then that data point is worthy of further investigation as it may be influential
  • if D>1, then that data point is quite likely to be influential
63
if Cook’s D is large, what does that mean
the data point strongly influences the fitted values
64
if Cook’s D is >0.5, what does that mean
the data point is worthy of further investigation
65
if Cook’s D > 1, what does that mean
the data point is likely to be influential
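
For simple linear regression, Cook's distance can be computed from each residual and its leverage: D_i = (e_i^2 / (p * MSE)) * h_i / (1 - h_i)^2, with p = 2 fitted parameters. The sketch below uses that formula on made-up data in which the last point sits far from the others. The function name and data are assumptions for illustration.

```python
def cooks_distances(xs, ys):
    """Cook's distance for each point in a one-predictor least-squares fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b = mean_y - m * mean_x
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
    p = 2  # fitted parameters: slope and intercept
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for x, e in zip(xs, residuals):
        h = 1 / n + (x - mean_x) ** 2 / sxx  # leverage of this point
        distances.append((e ** 2 / (p * mse)) * h / (1 - h) ** 2)
    return distances

x = [1, 2, 3, 4, 5, 10]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]  # last point is far out in x and off-trend
d = cooks_distances(x, y)
most_influential = d.index(max(d))  # index of the point with the largest D
```

In this toy data the last point has by far the largest D, well above 1, which flags it as the one to investigate first.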
66
what do you use if there are several factors affecting the dependent variable
multiple regression
67
what is multiple regression
estimates the relationship between variables, taking into account additional variables
68
function of multiple regression?
  1. controls for confounders
  2. tests for interactions between predictors
  3. improves predictions
69
how to know which predictor (coefficient) has the strongest effect in multiple regression
compare the regression coefficients: the larger the absolute value, the stronger the effect (provided the predictors are on the same scale)
70
what if predictors are in different units? how do we compare them to see which has the strongest effect
standardize the variables
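
Standardizing usually means converting each variable to z-scores: subtract the mean and divide by the standard deviation, so every predictor ends up with mean 0 and standard deviation 1. This is a minimal sketch; the variable names and values are hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Convert values to z-scores (mean 0, sample standard deviation 1)."""
    mu = mean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]

height_cm = [150, 160, 170, 180, 190]
weight_kg = [52.0, 61.5, 68.0, 80.5, 95.0]

z_height = standardize(height_cm)
z_weight = standardize(weight_kg)

# Both variables are now unitless and on the same scale
z_height_mean, z_height_sd = mean(z_height), stdev(z_height)
z_weight_mean, z_weight_sd = mean(z_weight), stdev(z_weight)
```

Coefficients fitted on standardized predictors are then read as the change in Y per one standard deviation of each predictor, which makes them directly comparable.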
71
does standardizing variables change p?
if you have an interaction term, standardizing variables can make the p values change for main effects but not for interaction effects
72
what is multicollinearity
an independent variable is highly correlated with another independent variable in a multiple regression equation
73
impacts of multicollinearity
  1. undermines the statistical significance of an independent variable
  2. reduces the precision of the estimated coefficients, which weakens the statistical power of the regression model
74
what is collinear
when two predictor variables express a linear relationship
75
if r>0.6 for correlations in multiple regression, what do you do
exclude them from the model
76
what is variance inflation factor
estimates how much the variance of a coefficient is inflated because of linear dependence with other predictors
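
In general, the VIF for predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others. With only two predictors this reduces to 1 / (1 - r^2), with r the Pearson correlation between them. The sketch below covers only that two-predictor special case, and the data are invented.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

x1 = [1, 2, 3, 4, 5, 6]
x2_collinear = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # almost exactly 2 * x1
x2_unrelated = [2, 5, 1, 1, 5, 2]                # uncorrelated with x1

vif_high = vif_two_predictors(x1, x2_collinear)  # very large: severe collinearity
vif_low = vif_two_predictors(x1, x2_unrelated)   # 1.0: no inflation
```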
77
explain the difference between interactions and multicollinearity
interactions are about the joint effect of the X variables on Y; multicollinearity is about relationships between X variables (ignoring Y)