Correlation and Regression Flashcards

1
Q

What is the maximum correlation coefficient that is possible?

A

1.0
-all data points lie exactly on a straight line

2
Q

What is the purpose of Correlation?

A

To evaluate the relationship between two variables (X) and (Y), e.g. age and GFR

-GFR in the future can be predicted with an equation

3
Q

What can be described by the Correlation line?

A

-quantitatively describes the relationship between the two variables:
-Strength: given by the correlation coefficient (the closer it is to 1 or -1, the closer the data points lie to the line)
-Direction: the positive or negative slope of the line

4
Q

What is the term for the correlation coefficient?

A

Pearson product-moment coefficient of correlation (r)
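
As a rough illustration of what r measures, here is a minimal sketch that computes it from paired data using only the Python standard library. The function name `pearson_r` and the age/GFR numbers are invented for this example and do not come from the cards.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ages (years) and GFR values, invented for illustration
age = [30, 40, 50, 60, 70]
gfr = [110, 100, 90, 80, 70]
r = pearson_r(age, gfr)  # the points lie exactly on a falling line, so r is -1
print(r)
```

The sign of the result gives the direction and its size gives the strength. The sketch does not guard against a variable with zero variance, which would make the denominator zero.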

5
Q

Which type of data are used to describe correlations?

A

-interval or ratio data
in other words: continuous data

6
Q

Which type of data is used for Spearman's rho (r_s)?
(a different type of correlation coefficient)

A

Ordinal data
-Ranked
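
One common way to compute Spearman's rho is to rank both variables and then take the Pearson correlation of the ranks. The sketch below follows that approach, giving tied values the average of their ranks; the helper names and the ordinal scores are hypothetical.

```python
import math

def ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ordinal scores on 1-5 scales, invented for illustration
pain = [1, 2, 3, 4, 5]
satisfaction = [5, 4, 4, 2, 1]
rho = spearman_rho(pain, satisfaction)
print(round(rho, 2))
```

Because only the ranks enter the calculation, any relationship that rises or falls consistently gives a rho of 1 or -1, even when it is not a straight line.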

7
Q

General mantra

A

Correlation does not equal Causation

8
Q

Which conditions are required to establish Causality?

A

-Controlled conditions
-Randomization (RCT), placebo
-9 Bradford Hill criteria (discussed early in the semester), including:

biologic credibility of the association, logical time sequence (cause precedes outcome), a dose-response relationship, and consistency of findings across several studies

9
Q

Interpretation of Correlations: r-value

A

-r = 0.00 to 0.25: little to no relationship
-r = 0.25 to 0.50: a fair degree of relationship
-r = 0.50 to 0.75: a moderate to good relationship
-r = 0.75 to 1.00: a good to excellent relationship
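
The cut-offs above can be written as a small lookup. This is only a sketch: the function name is hypothetical, and two choices are assumptions not stated on the card, namely that negative values are judged by their absolute size and that a value exactly on a boundary falls into the lower band.

```python
def describe_strength(r):
    """Map a correlation coefficient to the verbal labels used on this card."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # assumption: strength ignores the sign, which only gives direction
    if size <= 0.25:
        return "little to no relationship"
    if size <= 0.50:
        return "fair"
    if size <= 0.75:
        return "moderate to good"
    return "good to excellent"

print(describe_strength(0.9))
print(describe_strength(-0.3))
```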

10
Q

What is the Coefficient of Determination (r^2)?

A

-indicates the percentage of the total variance in the Y scores that can be explained by the X scores
-it is explanatory

e.g. GFR on the Y-axis and age on the X-axis -> r = 0.9, so r^2 = 0.81 -> 81% of the variance in the Y variable (GFR) can be explained by the X variable (age)
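
To see why r^2 can be read as "variance explained", the sketch below fits a least-squares line and checks that 1 minus (residual sum of squares / total sum of squares) equals the square of r. The data and names are invented for illustration, and ordinary least squares is assumed as the fitting method.

```python
import math

# Hypothetical ages (years) and GFR values, invented for illustration
age = [30, 40, 50, 60, 70]
gfr = [105, 102, 88, 85, 70]

n = len(age)
mean_x, mean_y = sum(age) / n, sum(gfr) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, gfr))
sxx = sum((x - mean_x) ** 2 for x in age)
syy = sum((y - mean_y) ** 2 for y in gfr)  # total sum of squares of Y

r = sxy / math.sqrt(sxx * syy)

# Least-squares line and the variation it leaves unexplained
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(age, gfr))

explained_fraction = 1 - ss_res / syy
print(round(r ** 2, 4), round(explained_fraction, 4))  # the two values agree
```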

11
Q

Linear VS Curvilinear

A

Coefficient r is a measure of linear relationships only

-curvilinear relationships are not described accurately by the linear correlation coefficient
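
A small sketch of this limitation: the hypothetical scores below rise and then fall in a perfectly regular inverted-U pattern, yet the linear coefficient r comes out as 0. All names and numbers are invented for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical inverted-U data: the score peaks in the middle of the x range
x = [10, 20, 30, 40, 50]
score = [60, 90, 100, 90, 60]

r = pearson_r(x, score)
print(r)  # 0.0, although score clearly depends on x
```

Plotting the data first would reveal the curve that the single number hides.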

12
Q

Interpretation of Correlation

A

-the relationship between 2 variables should not be interpreted solely on the basis of the correlation coefficient r

-the variables should be plotted to see whether a linear or curvilinear relationship exists and whether an r value is appropriate

-an association of 0.4 is not twice as strong as one of 0.2
-the difference in association between 0.5 and 0.6 is not the same as that between 0.8 and 0.9

13
Q

Correlation Matrix

A

-analyzing several variables at one time

-presenting correlation coefficients for all pairs of variables

-each variable is correlated with every other variable to see whether any of them are related to one another
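
A correlation matrix can be sketched as a table holding r for every pair of variables. The example below builds one as a nested dictionary; the variable names (`age`, `gfr`, `sbp`) and their values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return {name: {other_name: r}} for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical measurements on the same five subjects, invented for illustration
data = {
    "age": [30, 40, 50, 60, 70],
    "gfr": [105, 102, 88, 85, 70],
    "sbp": [118, 125, 122, 135, 140],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in data])
```

The diagonal is always 1 (each variable with itself) and the table is symmetric, so only the entries on one side of the diagonal carry information.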

14
Q

Significance of Correlation Coefficient

A

-the observed correlation is one of an infinite number of possible correlations -> obtained from a random sample of the population
-> subject to sampling error, or BIAS
-> needs to be tested for statistical SIGNIFICANCE

15
Q

What would be the Null Hypothesis when determining the Correlation Coefficient?

A

H0 states that r=0 -> no correlation

16
Q

What are the statistical tests used to reject the H0?

A

-Pearson product-moment coefficient of correlation (r)

-Spearman rho (r_s)

-> to determine the p-value
-if the p-value is less than 0.05, we reject H0 and state that the 2 variables are correlated; this DOESN’T mean that there is a STRONG correlation

17
Q

Which factor affects the correlation coefficient?

A

-Sample Size

-CAUTION:
-with a large sample size statistical significance can be achieved even with a low r-value (0.2, p < 0.05)
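
The effect of sample size can be sketched numerically. The cards do not say how the p-value is calculated, so this example swaps in the Fisher z approximation (an assumption; statistical software typically reports an exact test instead), using only the Python standard library. The function name and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def approx_p_value(r, n):
    """Approximate two-sided p-value for H0: no correlation (Fisher z method)."""
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r) * math.sqrt(n - 3)  # approximately standard normal under H0
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same weak correlation, two hypothetical sample sizes
p_small = approx_p_value(0.2, 20)    # above 0.05: not significant
p_large = approx_p_value(0.2, 100)   # just below 0.05: significant
print(round(p_small, 3), round(p_large, 3))
```

The r-value is identical in both cases; only the sample size changes which side of 0.05 the p-value lands on, which is why significance says nothing about strength.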

18
Q

Can a correlation be extrapolated to values outside the frame it was tested for?

A

No, it should be limited to the range of values used to produce the correlations

19
Q

Independence of variables

A

For a correlation between variables to be considered valid, the observations have to be independent of each other

-don't use related subjects, e.g. relatives (mom and children) -> they will give a high correlation value

20
Q

Regression

A

used to predict the dependent variable (Y) based on the independent variable (X)

-e.g. (Y) = GFR, (X) = Age
-GFR depends on the age

21
Q

What is linear Regression?

A

-Plotting the variables -> fitting a line through the data points

-provides an equation that helps to predict the (Y) variable
-Y = m*X + b; e.g. (Y) = GFR, (X) = Age
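
A minimal sketch of how m and b might be obtained, assuming ordinary least squares as the fitting method (the usual choice, although the card does not name one). The function name and the age/GFR values are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x  # the fitted line passes through the point of means
    return m, b

# Hypothetical ages (years) and GFR values, invented for illustration
age = [30, 40, 50, 60, 70]
gfr = [105, 102, 88, 85, 70]

m, b = fit_line(age, gfr)
predicted_gfr = m * 55 + b  # prediction for age 55, inside the observed 30-70 range
print(round(m, 2), round(b, 1), round(predicted_gfr, 2))
```

The prediction is made for an age within the range of the data, in line with the card on not extrapolating beyond the values used to produce the relationship.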

22
Q

What is non-linear Regression used for?

A

-Curvilinear relationships
-e.g. psychomotor ability and age
-low in kids, peaks at age 30, and then declines

23
Q

When to use a Multivariate Analysis

A

When more than one variable affects variable (Y)
-e.g. effect of age and salt intake on GFR

-Often used to develop predictive models of risk: prediction of cardiovascular risk based on BP, age, sex, lipids, etc.
-> Incorporates the effect of multiple variables on a single outcome
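
To make "multiple variables, single outcome" concrete, the sketch below fits a least-squares model with two predictors by solving the normal equations with plain Gaussian elimination. Everything here is an assumption for illustration: the helper names, the data, and the coefficients (120, -0.8, -1.5) used to generate a noise-free outcome so that the fit can be checked.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        known = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x

def fit_multiple(rows, ys):
    """Least-squares coefficients [intercept, b1, b2, ...] for several predictors."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 models the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical predictors: (age in years, salt intake in g/day)
predictors = [(30, 5), (40, 9), (50, 6), (60, 10), (70, 7), (45, 12)]
# Outcome generated from an invented formula with no noise
gfr = [120 - 0.8 * age - 1.5 * salt for age, salt in predictors]

intercept, b_age, b_salt = fit_multiple(predictors, gfr)
print(round(intercept, 3), round(b_age, 3), round(b_salt, 3))
```

Each coefficient describes the effect of one predictor with the other held fixed, which is what distinguishes this from running two separate single-variable regressions.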

24
Q

Which type of Regression is used for continuous outcome variables in Multivariate Analysis?

A

Multiple Linear Regression

25
Q

What is Logistic Regression used for?

A

-dichotomous outcomes (variables)

-YES or NO questions
-e.g. to predict the mortality of a patient admitted to the hospital -> predictor variables (X): diagnosis, disease state, age, etc. -> outcome (Y): LIFE or DEATH

-another example: discharge from the hospital at day 3
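
A sketch of how a logistic model turns several predictors into a yes/no prediction: a weighted sum of the predictors is passed through the logistic function to give a probability between 0 and 1. The coefficients below are made up and not fitted to any data, the severity grade is a hypothetical predictor, and the 0.5 cut-off is an assumption.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Logistic model: probability of the 'yes' outcome for one patient."""
    linear = intercept + sum(c * v for c, v in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear))  # logistic function, always in (0, 1)

# Hypothetical coefficients, NOT estimated from real patients
intercept = -6.0
coefficients = [0.05, 1.2]  # per year of age, per disease-severity grade
patient = [70, 2]           # age 70, severity grade 2

p_death = predicted_probability(intercept, coefficients, patient)
predicted_outcome = "death" if p_death >= 0.5 else "survival"  # assumed cut-off
print(round(p_death, 3), predicted_outcome)
```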

26
Q

How is Logistic Regression different from Multivariate Analysis?

A

-Logistic Regression also includes multiple variables, but the OUTCOME is dichotomous

27
Q

QUESTION

A

How is the Multivariate Analysis different from the Correlation Matrix?