L11 to L13: Correlation and Regression Flashcards

1
Q

Define Correlation

A

ASSUMING that relationship is linear, it QUANTIFIES degree to which 2 random variables are related

2
Q

What is correlation coefficient (R)?

A

QUANTITATIVE measure of the STRENGTH and DIRECTION of a linear relationship between two variables

3
Q

Two types of correlation analysis, and when to use them?

A
  1. Pearson product-moment correlation (PPMC): Parametric test, used when both variables are continuous and normally distributed
  2. Spearman rank correlation (SRC): Non-parametric test (NPT), used when ≥1 variable is not normally distributed, OR when data are ordinal
4
Q

Assumptions of correlation analyses

A
  1. Observations (x,y pairs) are independent of one another
  2. Pairs of observations (x,y) are randomly selected
  3. PPMC: underlying population (ppn) of both variables is normally distributed
5
Q

Hyp of correlation analysis

A

H0: r = 0 (no correlation)
H1: r ≠ 0 (correlation exists), or r > 0 or r < 0 if one-tailed

6
Q

Advantage of SRC over PPMC

A

Decreased sensitivity to outliers since ranks are used (similar to other NPT)
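
A minimal Python sketch of the outlier point, assuming hand-rolled helpers (`pearson_r`, `ranks`, `spearman_rho` are illustrative names, not from the lecture) and made-up data. SRC is computed here as PPMC applied to the ranks, so one extreme y value changes only its rank, not the coefficient.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y rises steadily with x, but the last y is an outlier
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, 10, 120]

print("PPMC:", round(pearson_r(x, y), 3))    # pulled well below 1 by the outlier
print("SRC :", round(spearman_rho(x, y), 3)) # ranks are still perfectly ordered
```

Because the outlier is still the largest y, its rank is unchanged, which is why SRC stays at 1 while PPMC drops.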

7
Q

When you receive data and want to check for correlation, the VERY FIRST STEP you should do

A

Construct scatter plot and roughly scan for linear relationship

This checks whether the assumption of a linear relationship holds, before quantifying the linearity

8
Q

Distinguish between correlation and simple linear regression (SLiR)

A
  • Correlation: Find out how strongly linear the x-y relationship is, provided the scatter plot already suggests a linear relation. No independent/dependent variable is defined yet
  • SLiR: Provided that correlation is SIGNIFICANT, give BEST-FIT LINE for DEFINED x and y (defined independent/dependent variable)
9
Q

Purpose of SLiR

A

Estimate y for defined x using equation obtained from best fit line

10
Q

One disadvantage of SLiR

A

Not suitable for extrapolation. Equation only applies WITHIN data range

11
Q

The equation for SLiR, and what does each symbol mean?

A

y = a + Bx

y: Dependent variable
x: Independent variable
a: y-intercept
B: Slope, i.e. change in MEAN of y that corresponds to a one-unit change in x

12
Q

Assumptions of SLiR

A
  1. Assume variables have linear relationship
  2. Observations are independent
  3. For any values of x, y is NORMALLY distributed
  4. For any x, variances of y are equal (similar to other tests)
13
Q

How does SLiR get its line of best fit?

A

Method of least squares
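
A minimal sketch of the method of least squares for y = a + Bx, assuming the standard closed-form solution (B = Sxy / Sxx, a = mean(y) - B·mean(x)). The function name `least_squares_fit` and the data are illustrative only.

```python
def least_squares_fit(x, y):
    """Return (a, B) that minimise the sum of squared vertical residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sxy: co-variation of x and y; Sxx: variation of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
a, B = least_squares_fit(x, y)
print(a, B)  # 1.0 2.0
```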

14
Q

Hypotheses of SLiR and tail?

A

H0: No effect by x on y (B = 0)
H1: B ≠ 0
- ALWAYS two-tailed

15
Q

Given B = 1.657, a = 23.811, x is body weight (BW, in kg), and y is systolic blood pressure (SBP, in mmHg), p = 0.001, construct the regression equation and formulate a conclusion.

A

y = 23.811 + 1.657(BW)

Conclusion:

  • For every 1kg increase in BW, the MEAN SBP increases by 1.657 mmHg.
  • At a sig level of 0.05, there is a statsig effect of BW on SBP (p = 0.001)

(rmb both word explanation of equation and sig level)
(rmb units)
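
As a rough illustration, the equation from this card can be evaluated directly. The 70 kg and 71 kg inputs are hypothetical and assumed to lie within the range of the original data (the equation should not be used for extrapolation); `predicted_mean_sbp` is an illustrative name.

```python
A_INTERCEPT = 23.811  # a, in mmHg (from the card)
B_SLOPE = 1.657       # B, in mmHg per kg (from the card)

def predicted_mean_sbp(bw_kg):
    """Estimated MEAN systolic blood pressure (mmHg) for a body weight in kg."""
    return A_INTERCEPT + B_SLOPE * bw_kg

# Hypothetical body weights, assumed to be inside the observed data range
sbp_70 = predicted_mean_sbp(70)
sbp_71 = predicted_mean_sbp(71)
print(round(sbp_70, 3))           # 139.801
print(round(sbp_71 - sbp_70, 3))  # 1.657, i.e. the slope B
```

The difference between the two predictions equals B, which is the "for every 1 kg increase" statement in the conclusion.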

16
Q

What is R2? What does it mean if:
R2 = 1
R2 = 0

A

Proportion of variability among observed y values that is explained by linear regression of y on x

R2 = 1: All pts lie on line
R2 = 0: Regression explains none of the variability in y (best-fit line is flat at mean of y)
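
A small sketch of how R2 can be computed for SLiR, assuming the usual definition R2 = 1 - SS_residual / SS_total around a least-squares line. The function name and both data sets are made up for illustration.

```python
def r_squared(x, y):
    """R2 of the least-squares line y = a + Bx fitted to the data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variability left over after fitting the line
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variability of y around its mean
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_residual / ss_total

# Hypothetical data: every point on the line y = 1 + 2x
perfect = r_squared([1, 2, 3, 4], [3, 5, 7, 9])
# Hypothetical data: best-fit slope is 0, so the line explains nothing
flat = r_squared([1, 2, 3], [1, 2, 1])
print(perfect, flat)
```

In the second data set the fitted line is horizontal at the mean of y, so the residual and total variability are the same and R2 comes out as 0.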
17
Q

When data is obtained (for one dependent and one independent variable), what is the proper step to get linear equation if you suspect linear relationship?

A
  1. Construct scatter plot and scan for linear relationship
  2. If linear: Use Corr. analysis to check whether linearity is statsig
  3. If statsig, proceed to use SLiR to obtain equation and R2
18
Q

Distinguish between SLiR and multiple linear regression (MLiR)

A

MLiR: extension of SLiR describing relationship between dep var. and MORE THAN ONE INDEP VAR.

19
Q

Assumptions of MLiR

A
  1. Observations are independent
  2. For any x, distribution of y is normal
  3. For any SET of values x, variance is constant
  4. There is LITTLE OR NO MULTICOLLINEARITY among all indep var.
20
Q

What does Bi represent in MLiR

A

Change in mean value of y that corresponds to one-unit change in xi, AFTER controlling for all other indep var. (i.e. keeping all other values constant)

21
Q

Distinguish the purpose between adjusted R2 and R2

A
  • Adjusted R2: Used to compare between models that have different numbers of indep variables, as it compensates for complexity
    E.g. MLiR vs SLiR
  • Normal R2: Proportion of variability in y explained by the model, with no penalty for the number of indep variables
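
A sketch of the complexity penalty, assuming the commonly used formula adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), where n is sample size and k is the number of indep variables. The formula is not given in the card, and the numbers below are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for a model with k independent variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical comparison: same raw R2 and sample size, different model sizes
simple_model = adjusted_r2(0.50, n=30, k=1)    # e.g. SLiR
complex_model = adjusted_r2(0.50, n=30, k=5)   # e.g. MLiR with 5 indep var.
print(round(simple_model, 3), round(complex_model, 3))
```

With the raw R2 held equal, the model with more indep variables gets the lower adjusted R2.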
22
Q

Purpose of dummy variables

A

Using NUMBERS to identify categories of nominal variables (coz MLiR can only take numbers)
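
A minimal sketch of dummy coding, assuming a nominal variable with one reference category that is coded as all zeros. The helper `dummy_code` and the group labels are hypothetical.

```python
def dummy_code(values, reference):
    """Turn category labels into 0/1 indicator columns, one per non-reference level."""
    levels = sorted(set(values) - {reference})
    return [{f"is_{level}": int(value == level) for level in levels}
            for value in values]

# Hypothetical treatment groups; "ctrl" is the reference category
groups = ["ctrl", "dose1", "dose2", "dose1"]
coded = dummy_code(groups, reference="ctrl")
for label, row in zip(groups, coded):
    print(label, row)
```

A variable with 3 categories needs only 2 dummy variables, since the reference group is identified by both being 0.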

23
Q

Given:

  • Data collected: BMI at f/u, Baseline BMI
  • Interventions: two different dosage of drugs (1 and 2 dummy-coded)
  • B1 = -2.064, p = 0.06
  • B2 = -1.941, p = 0.005
  • B3 = 0.984, p = 0.0442
  • a = 0.428

State the MLiR equation and explain what each coefficient means

A

y = 0.428 - 2.064 (Dose1) - 1.941 (Dose 2) + 0.984 (Baseline BMI)

  • B1: The Mean BMI@f/u of dose 1 grp is 2.064 kg/m2 smaller than that of ctrl, AFTER controlling for BASELINE BMI
  • B2: The Mean BMI@f/u of dose 2 grp is 1.941 kg/m2 smaller than that of ctrl, AFTER controlling for BASELINE BMI
  • B3: For every 1 kg/m2 increase in Baseline BMI, mean BMI@f/u increases by 0.984 kg/m2, after controlling for tx grps (makes no sense to interpret, hence not impt)

At sig. level of 0.05, there is statsig assoc btwn tx and BMI@f/u AFTER ctrlling for baseline BMI (as long as at least one p < 0.05 among the tx betas)
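
The equation from this card can be checked numerically. This sketch assumes Dose1 and Dose2 are 0/1 dummy variables with ctrl coded as (0, 0), and uses a hypothetical baseline BMI of 30 kg/m2. `predicted_mean_bmi_fu` is an illustrative name.

```python
def predicted_mean_bmi_fu(dose1, dose2, baseline_bmi):
    """Estimated mean BMI at follow-up (kg/m2) from the card's MLiR equation."""
    return 0.428 - 2.064 * dose1 - 1.941 * dose2 + 0.984 * baseline_bmi

baseline = 30  # hypothetical baseline BMI, held constant across groups
ctrl = predicted_mean_bmi_fu(0, 0, baseline)
dose1 = predicted_mean_bmi_fu(1, 0, baseline)
dose2 = predicted_mean_bmi_fu(0, 1, baseline)

print(round(ctrl - dose1, 3))  # 2.064, i.e. |B1|
print(round(ctrl - dose2, 3))  # 1.941, i.e. |B2|
```

Holding baseline BMI fixed and switching one dummy from 0 to 1 changes the prediction by exactly that dummy's coefficient, which is what "after controlling for baseline BMI" means here.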

24
Q

Recommended max number of indep var. to analyse for MLiR

A

n/10, where n is sample size

25
Q

General meaning of B in MLiR, with a control group involved

A

The mean change/difference in y btwn control and x, AFTER controlling for baseline characteristics

26
Q

Three types of model in MLiR

A
  1. Enter: All indep var. entered into equation, good for small set of predictors
  2. Fwd selection: Begin with empty equation and add one at a time beginning with highest corr. first. Once in, variable remains
  3. Backward elimination: reverse of fwd, PREFERRED over fwd selection. (var. removed if they don’t contribute to regression equation)
27
Q

What kind of data do Logistic regression (LoR) analyse?

A
  • Dependent: dichotomous nominal variable
  • Independent: ≥1 continuous/ordinal/nominal variable

28
Q

General Equation for logistic regression

A

loge(O) = a + Bx

O = Outcome (the left side is the natural log of the odds of O)
x = exposure
29
Q

What is odds ratio? Mathematically, what is it represented by?

A

Measure the STRENGTH of association between E and O

Represented by e^B (obtained from LoR equation)
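
A short sketch of the e^B relationship, assuming B is a coefficient on the natural-log odds scale. The function names and the example values are illustrative only.

```python
import math

def odds_ratio_from_coefficient(B):
    """OR = e^B, where B is a logistic regression coefficient."""
    return math.exp(B)

def coefficient_from_odds_ratio(odds_ratio):
    """Inverse direction: B = ln(OR)."""
    return math.log(odds_ratio)

# Hypothetical coefficients
print(odds_ratio_from_coefficient(0.0))      # 1.0: B = 0 means no association
print(odds_ratio_from_coefficient(-0.5) < 1) # True: negative B gives OR below 1
print(round(coefficient_from_odds_ratio(2.0), 3))
```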

30
Q

General expression of OR in words, given that OR = 1.1

A

Those who are E (exposed) have 1.1 times the odds (i.e. 10% higher odds) of developing O compared with those who are uE (unexposed)

31
Q

What does OR equal when there is no assoc between E and O?

A

OR = 1

32
Q

General Equation to calculate OR. How to calculate from 2x2 table?

A

OR = Odds that case was exposed/ Odds that ctrl was exposed

From 2x2 table, take quotient of cross products
i.e. ad/bc
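
A sketch of the cross-product calculation, assuming this 2x2 layout: a = exposed cases, b = exposed ctrls, c = unexposed cases, d = unexposed ctrls. The layout and the counts are assumptions for illustration; check how the table in your notes is labelled.

```python
def odds_ratio_2x2(a, b, c, d):
    """OR as the quotient of cross products, ad/bc.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    return (a * d) / (b * c)

# Hypothetical case-control counts
a, b, c, d = 40, 20, 10, 30
odds_case_exposed = a / c   # odds that a case was exposed
odds_ctrl_exposed = b / d   # odds that a ctrl was exposed

print(odds_ratio_2x2(a, b, c, d))             # 6.0
print(odds_case_exposed / odds_ctrl_exposed)  # same ratio, computed the long way
```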

33
Q

Assumptions for LoR

both SLoR and MLoR

A
  1. Dependent variable should be dichotomous
  2. Observations are independent
  3. There is linear relationship between independent variable and loge(O)
  4. MLoR: Little or no multicollinearity among independent variables
34
Q

State the Hypotheses for SLoR and MLoR

A

SLoR:

  • H0: OR = 1
  • H1: OR ≠ 1

MLoR

  • H0: ORi = 1, after controlling for all other variables
  • H1: ORi ≠ 1, after controlling for all other variables
35
Q

Given MLoR was carried out:

  • Exposed to drug: p = 0.031, Exp(B) = 2.192, 95%CI = 1.053-4.567
  • Gender: p = 0.240, Exp(B) = 0.665, 95%CI = 0.338-1.312
  • Outcome of interest: Side effect

State whether the OR for E is adjusted or crude. Formulate a conclusion

A
  • OR is ADJUSTED odds ratio (since there is another variable)
  • Conclusion: Subjects who were E to drug had 2.19 times (95% CI: 1.05 - 4.57) the odds of developing the SE compared to those who were not exposed, AFTER controlling for gender (p = 0.031)
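
A quick cross-check that is not part of the card itself: since H0 is OR = 1, a 95% CI that excludes 1 generally agrees with p < 0.05. The sketch below applies that rule of thumb to the values given in this card; `ci_excludes_one` is an illustrative helper.

```python
def ci_excludes_one(lower, upper):
    """True if the confidence interval for an OR does not contain 1."""
    return lower > 1 or upper < 1

# 95% CIs taken from the card
drug_ci = (1.053, 4.567)    # p = 0.031
gender_ci = (0.338, 1.312)  # p = 0.240

print("Drug  :", ci_excludes_one(*drug_ci))    # True, consistent with p < 0.05
print("Gender:", ci_excludes_one(*gender_ci))  # False, consistent with p > 0.05
```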
36
Q

In what kind of studies is OR most likely used?

A

Case-control studies (CCS), and maybe cross-sectional studies (XSS)