L11 to L13: Correlation and Regression Flashcards by Zhen Heng Lim

Define Correlation

ASSUMING that the relationship is linear, it QUANTIFIES the degree to which 2 random variables are related

What is the correlation coefficient (r)?

QUANTITATIVE measure of the STRENGTH and DIRECTION of a linear relationship between two variables

Two types of correlation analysis, and when to use them?

Pearson product-moment correlation (PPMC): parametric test, used when both variables are continuous and (approximately) normally distributed
Spearman rank correlation (SRC): non-parametric test (NPT), used when ≥1 variable is not normally distributed, OR when data are ordinal
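
A minimal Python sketch of both coefficients, assuming two equal-length lists of paired observations. Spearman's coefficient is computed as Pearson's r applied to the ranks, with tied values given the average of their ranks. The function names are illustrative only, not a library API.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of paired data."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank values 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # monotonic but not linear (y = x**3)
print(round(pearson_r(x, y), 3), spearman_rho(x, y))
```

Here y = x³ is perfectly monotonic but curved, so Spearman's coefficient is exactly 1 while Pearson's r is somewhat lower (about 0.94).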

Assumptions of correlation analyses

Observations are independent (each (x, y) pair is independent of the other pairs)
Pairs of observations (x, y) are randomly selected
PPMC: the underlying populations of both variables are normally distributed

Hypotheses of correlation analysis (ρ = population correlation coefficient)

H0: ρ = 0 (no correlation)
H1: ρ ≠ 0 (correlation present), or ρ > 0 or ρ < 0 for a one-tailed test
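
For PPMC, H0 is usually tested with t = r·√(n − 2) / √(1 − r²), which follows a t distribution with n − 2 degrees of freedom when H0 is true. A minimal sketch of the statistic only (the p-value or critical value still comes from a t table or a stats package); `correlation_t` is a hypothetical name, and r = 0.5 with n = 27 are made-up inputs.

```python
from math import sqrt


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least 3 pairs")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


t = correlation_t(0.5, 27)  # hypothetical r and sample size
df = 27 - 2
print(round(t, 3), df)
```

With 25 df, the two-tailed critical value at α = 0.05 is about 2.06, so t ≈ 2.89 would lead to rejecting H0.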

Advantage of SRC over PPMC

Decreased sensitivity to outliers, since ranks are used (similar to other NPTs)
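
A small illustration with made-up data: nine points on a perfect line plus one extreme outlier. Spearman's coefficient is computed as Pearson's r on the ranks; the helper `simple_ranks` is hypothetical and assumes there are no tied values, which holds for this example.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of paired data."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simple_ranks(values):
    """Ranks 1..n; assumes no tied values (enough for this illustration)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


x = list(range(1, 11))
y = list(range(1, 10)) + [100]  # last point is an extreme outlier

r = pearson_r(x, y)
rho = pearson_r(simple_ranks(x), simple_ranks(y))
print(round(r, 2), rho)
```

The single outlier drags Pearson's r down to about 0.59, while Spearman's coefficient stays at 1 because the outlier does not change the ordering of the values.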

When you receive data and want to check for correlation, what is the VERY FIRST STEP?

Construct scatter plot and roughly scan for linear relationship

This checks the assumption that the variables have a linear relationship before quantifying how strong that linear relationship is
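
Why the scatter plot comes first: Pearson's r only measures LINEAR association, so a strong curved relationship can still give r ≈ 0. A small illustrative sketch with made-up data; `pearson_r` is a hypothetical helper, not a library function.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of paired data."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # y is fully determined by x, but the curve is U-shaped
r = pearson_r(x, y)
print(r)  # 0.0: no LINEAR relationship, even though a plot shows a clear pattern
```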

Distinguish between correlation and simple linear regression (SLiR)

Correlation: quantifies how linearly x and y are related, provided the scatter plot already suggests a linear relationship. No independent/dependent variable is defined yet
SLiR: Provided that correlation is SIGNIFICANT, give BEST-FIT LINE for DEFINED x and y (defined independent/dependent variable)

Purpose of SLiR

Estimate y for a given x using the equation of the best-fit line

One disadvantage of SLiR

Not suitable for extrapolation. Equation only applies WITHIN data range
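
One way to respect this limit in code is to refuse predictions outside the observed x range. A minimal sketch; `predict_within_range` and its arguments are hypothetical, with x_min and x_max taken to be the smallest and largest x values in the data used to fit the line.

```python
def predict_within_range(a, b, x, x_min, x_max):
    """Return the SLiR estimate a + b*x, refusing to extrapolate.

    x_min and x_max are the smallest and largest x values used to fit the line.
    """
    if not x_min <= x <= x_max:
        raise ValueError(f"x = {x} is outside the observed range [{x_min}, {x_max}]")
    return a + b * x


print(predict_within_range(1.0, 2.0, 3, x_min=1, x_max=4))  # 7.0
try:
    predict_within_range(1.0, 2.0, 10, x_min=1, x_max=4)
except ValueError as err:
    print("refused:", err)
```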

The equation for SLiR and what each symbol means

y = a + Bx

y: Dependent variable
x: Independent variable
a: y-intercept
B: Slope, i.e. the change in the MEAN of y that corresponds to a one-unit change in x

Assumptions of SLiR

Assume variables have a linear relationship
Observations are independent
For any value of x, y is NORMALLY distributed
For any x, the variances of y are equal (homoscedasticity, similar to other tests)

How does SLiR get its line of best fit?

Method of least squares
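
The least-squares slope and intercept have closed forms: B = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − B·x̄. A minimal sketch with made-up data; the name `fit_slr` is hypothetical.

```python
from statistics import mean


def fit_slr(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx      # slope that minimises the sum of squared residuals
    a = my - b * mx    # the fitted line passes through (x-bar, y-bar)
    return a, b


a, b = fit_slr([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```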

Hypotheses of SLiR and tail?

H0: No effect by x on y (B = 0)
H1: B ≠ 0
- ALWAYS two-tailed

Given B = 1.657, a = 23.811, x is body weight (BW, kg), and y is systolic blood pressure (SBP, mmHg), p = 0.001, construct the regression equation and formulate a conclusion.

y = 23.811 + 1.657(BW)

Conclusion:

For every 1kg increase in BW, the MEAN SBP increases by 1.657 mmHg.
At a sig level of 0.05, there is a statsig effect of BW on SBP (p = 0.001)

(rmb both word explanation of equation and sig level)
(rmb units)
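
The equation can be checked numerically. A minimal sketch; 70 kg is a hypothetical body weight, assumed to lie within the study's observed weight range (the range is not given here), and `predict_sbp` is an illustrative name.

```python
A = 23.811  # intercept a (mmHg)
B = 1.657   # slope B (mmHg per kg of body weight)


def predict_sbp(bw_kg):
    """Estimated mean SBP (mmHg) for a body weight of bw_kg (kg)."""
    return A + B * bw_kg


print(round(predict_sbp(70), 3))                    # 139.801
print(round(predict_sbp(71) - predict_sbp(70), 3))  # 1.657
```

The second line confirms the interpretation of B: a 1 kg difference in BW changes the predicted mean SBP by 1.657 mmHg.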

What is R2? What does it mean if:
R2 = 1
R2 = 0

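Proportion of the variability among the observed y values that is explained by the linear regression of y on x

R2 = 1: All pts lie on the line (x explains all of the variability in y)
R2 = 0: x explains none of the variability in y (the best-fit line is flat at the mean of y)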
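
R2 can be computed as 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals around the fitted line and SS_tot is the sum of squared deviations of y from its mean. A minimal sketch with made-up data illustrating both extremes; `r_squared` is a hypothetical helper.

```python
from statistics import mean


def r_squared(x, y):
    """R2 = 1 - SS_res / SS_tot for the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


print(r_squared([1, 2, 3, 4], [3, 5, 7, 9]))  # 1.0: every point on the line
print(r_squared([1, 2, 3, 4], [2, 1, 1, 2]))  # 0.0: x explains none of the variation
```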

When data is obtained (for one dependent and one independent variable), what are the proper steps to get a linear equation if you suspect a linear relationship?

Construct scatter plot and scan for linear relationship
If linear: use correlation analysis to check whether the correlation is statsig
If statsig, proceed to use SLiR to obtain equation and R2
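
Steps 2 and 3 can be sketched as a single function, with step 1 (looking at the scatter plot) left to the analyst. This is a minimal sketch, not a standard routine: `correlate_then_regress` is a hypothetical name, the data are made up, and the caller supplies the two-tailed critical t value for n − 2 df instead of a computed p-value.

```python
from math import sqrt
from statistics import mean


def correlate_then_regress(x, y, t_crit):
    """Test H0: rho = 0; if rejected, fit y = a + B*x and report R2.

    t_crit: two-tailed critical t for n - 2 df at the chosen alpha,
    looked up by the caller (e.g. 2.306 for df = 8, alpha = 0.05).
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    r = sxy / sqrt(sxx * syy)
    # A perfect correlation would divide by zero, so treat it as an infinite t.
    t = float("inf") if abs(r) >= 1 else r * sqrt(n - 2) / sqrt(1 - r ** 2)
    if abs(t) < t_crit:
        return {"r": r, "significant": False}
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return {"r": r, "significant": True, "a": a, "B": b, "R2": 1 - ss_res / syy}


x = list(range(1, 11))
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]  # made-up data
result = correlate_then_regress(x, y, t_crit=2.306)
print(result["significant"], round(result["R2"], 4), round(result["r"] ** 2, 4))
```

In SLiR, R2 equals the square of Pearson's r, so the two printed values agree.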

Distinguish between SLiR and multiple linear regression (MLiR)

MLiR: extension of SLiR describing relationship between dep var. and MORE THAN ONE INDEP VAR.

Assumptions of MLiR

Observations are independent
For any x, the distribution of y is normal
For any SET of x values, the variance of y is constant
There is LITTLE OR NO MULTICOLLINEARITY among all indep var.

What does Bi represent in MLiR

Change in mean value of y that corresponds to one-unit change in xi, AFTER controlling for all other indep var. (i.e. keeping all other values constant)

Distinguish the purpose between adjusted R2 and R2

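Adjusted R2: used to compare models with different numbers of indep. variables, as it penalises extra variables (unadjusted R2 never decreases when a variable is added)
E.g. MLiR vs SLiR
Unadjusted R2: the proportion of variability in y explained by the model (the definition above)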
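
Adjusted R2 is commonly computed as 1 − (1 − R2)(n − 1) / (n − k − 1), where n is the sample size and k the number of independent variables. A minimal sketch with hypothetical R2 values; `adjusted_r_squared` is an illustrative name.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R2 for a model with n observations and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical models fitted to the same n = 30 subjects.
slir = adjusted_r_squared(0.50, n=30, k=1)
mlir = adjusted_r_squared(0.52, n=30, k=4)
print(round(slir, 3), round(mlir, 3))
```

Here the MLiR model has the higher R2 but the lower adjusted R2 (about 0.443 vs 0.482), so its three extra predictors do not earn their added complexity.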

Purpose of dummy variables

Using NUMBERS to identify the categories of nominal variables (because MLiR can only take numbers)
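
A nominal variable with k categories is coded with k − 1 dummy variables; the category coded as all zeros is the reference (here the control group), and each dummy's B compares its group with that reference. A minimal sketch for a control group and two doses; the group labels and `dummy_code` are hypothetical.

```python
GROUPS = ("ctrl", "dose1", "dose2")  # hypothetical labels; ctrl is the reference


def dummy_code(group):
    """Return the (Dose1, Dose2) indicators; the reference group is (0, 0)."""
    if group not in GROUPS:
        raise ValueError(f"unknown group: {group}")
    return (int(group == "dose1"), int(group == "dose2"))


for g in GROUPS:
    print(g, dummy_code(g))
```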

Given:

Data collected: BMI at f/u, Baseline BMI
Interventions: two different doses of the drug (Dose 1 and Dose 2, dummy-coded against ctrl)
B1 = -2.064, p = 0.06
B2 = -1.941, p = 0.005
B3 = 0.984, p = 0.0442
a = 0.428

State the MLiR equation and explain what each coefficient means

y = 0.428 - 2.064 (Dose1) - 1.941 (Dose 2) + 0.984 (Baseline BMI)

B1: The mean BMI@f/u of the dose 1 grp is 2.064 kg/m2 smaller than that of the ctrl grp, AFTER controlling for BASELINE BMI
B2: The mean BMI@f/u of the dose 2 grp is 1.941 kg/m2 smaller than that of the ctrl grp, AFTER controlling for BASELINE BMI
B3: For every 1 kg/m2 increase in baseline BMI, mean BMI@f/u increases by 0.984 kg/m2, after controlling for tx grps (baseline BMI is only a covariate here, hence not the main interest)

At a sig. level of 0.05, there is a statsig assoc btwn tx and BMI@f/u AFTER controlling for baseline BMI (as long as at least one tx B has p < 0.05; here B2, p = 0.005)
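
Plugging values into the fitted equation shows what "after controlling for baseline BMI" means: at any fixed baseline BMI, the predicted difference between a dose group and control is exactly that group's B. A minimal sketch; the baseline BMIs of 25 and 30 kg/m2 are hypothetical inputs and `predict_bmi_fu` is an illustrative name.

```python
def predict_bmi_fu(group, baseline_bmi):
    """Mean BMI at follow-up (kg/m2) from the fitted MLiR equation above."""
    dose1 = 1 if group == "dose1" else 0
    dose2 = 1 if group == "dose2" else 0
    return 0.428 - 2.064 * dose1 - 1.941 * dose2 + 0.984 * baseline_bmi


for baseline in (25, 30):  # hypothetical baseline BMIs
    ctrl = predict_bmi_fu("ctrl", baseline)
    print(baseline,
          round(predict_bmi_fu("dose1", baseline) - ctrl, 3),
          round(predict_bmi_fu("dose2", baseline) - ctrl, 3))
```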

Recommended max number of indep var. to analyse for MLiR

n/10, where n is sample size

General meaning of B in MLiR, with a control group involved

The mean change/difference in y between the control group and the group coded by x, AFTER controlling for baseline characteristics

Three types of model in MLiR

1. Enter: all indep. var. entered into the equation at once; good for a small set of predictors
2. Fwd selection: begin with an empty equation and add one variable at a time, beginning with the highest correlation. Once in, a variable remains
3. Backward elimination: reverse of fwd selection (start with all variables; remove those that don't contribute to the regression equation). PREFERRED over fwd selection

What kind of data does logistic regression (LoR) analyse?

- Dependent: dichotomous nominal variable
- Independent: ≥1 continuous/ordinal/nominal variable

General Equation for logistic regression

loge(odds of O) = a + Bx
O = outcome
x = exposure

What is odds ratio? Mathematically, what is it represented by?

Measures the STRENGTH of association between E and O
Represented by e^B (B obtained from the LoR equation)
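
Because loge(odds) is linear in x, exponentiating gives the odds, and the ratio of the odds at x = 1 (exposed) and x = 0 (unexposed) is e^B. A minimal sketch with hypothetical coefficients a = -1.2 and B = ln 2; the function names are illustrative.

```python
from math import exp, log


def odds_of_outcome(a, b, x):
    """Odds of O from the logistic model loge(odds) = a + b*x."""
    return exp(a + b * x)


def probability_of_outcome(a, b, x):
    """Convert odds to probability: p = odds / (1 + odds)."""
    odds = odds_of_outcome(a, b, x)
    return odds / (1 + odds)


a, b = -1.2, log(2.0)  # hypothetical coefficients; x = 1 exposed, 0 unexposed
odds_ratio = odds_of_outcome(a, b, 1) / odds_of_outcome(a, b, 0)
print(round(odds_ratio, 6), round(exp(b), 6))  # both 2.0
```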

General expression of OR in words, given that OR = 1.1

Those who are E (exposed) have 1.1 times the odds (i.e. 10% higher odds, which is not the same as being 10% more likely) of developing O compared with those who are uE (unexposed)
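
Odds and probabilities differ, so an OR of 1.1 does not in general mean a 10% higher risk. A minimal sketch converting an OR into the risk in the exposed group, for two hypothetical risks in the unexposed group; `risk_in_exposed` is an illustrative name.

```python
def risk_in_exposed(p_unexposed, odds_ratio):
    """Probability of O in the exposed group, given the unexposed risk and the OR."""
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    odds_exposed = odds_ratio * odds_unexposed
    return odds_exposed / (1 + odds_exposed)


for p0 in (0.5, 0.01):  # hypothetical risks in the unexposed group
    p1 = risk_in_exposed(p0, 1.1)
    print(p0, round(p1, 4), "risk ratio:", round(p1 / p0, 3))
```

When the outcome is common (50% risk in the unexposed), OR = 1.1 corresponds to a risk ratio of only about 1.048; when the outcome is rare (1%), the risk ratio (about 1.099) is close to the OR.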

What does OR equal when there is no assoc between E and O?

OR = 1

General Equation to calculate OR. How to calculate from 2x2 table?

OR = Odds that a case was exposed / Odds that a ctrl was exposed
From a 2x2 table (a = exposed cases, b = exposed ctrls, c = unexposed cases, d = unexposed ctrls), take the quotient of the cross products, i.e. ad/bc
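
A minimal sketch of the 2x2 calculation, assuming the layout in the docstring (exposure in rows, case/control status in columns); the counts are made up and `odds_ratio_2x2` is an illustrative name.

```python
def odds_ratio_2x2(a, b, c, d):
    """OR from a 2x2 table laid out as:

                 Cases  Controls
    Exposed        a       b
    Unexposed      c       d
    """
    odds_case_exposed = a / c  # odds that a case was exposed
    odds_ctrl_exposed = b / d  # odds that a control was exposed
    return odds_case_exposed / odds_ctrl_exposed  # equals (a*d) / (b*c)


print(round(odds_ratio_2x2(20, 10, 30, 40), 3))  # hypothetical counts
```

Both routes give the same answer: (20/30) / (10/40) = (20 × 40) / (10 × 30) ≈ 2.667.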

Assumptions for LoR (both SLoR and MLoR)

1. Dependent variable should be dichotomous
2. Observations are independent
3. There is a linear relationship between each independent variable and loge(odds of O)
4. MLoR: little or no multicollinearity among independent variables

State the Hypotheses for SLoR and MLoR

SLoR:
- H0: OR = 1
- H1: OR ≠ 1
MLoR:
- H0: ORi = 1, after controlling for all other variables
- H1: ORi ≠ 1, after controlling for all other variables

Given MLoR was carried out:
- Exposed to drug: p = 0.031, Exp(B) = 2.192, 95% CI = 1.053-4.567
- Gender: p = 0.240, Exp(B) = 0.665, 95% CI = 0.338-1.312
- Outcome of interest: side effect (SE)
State whether the OR for E is adjusted or crude. Formulate a conclusion.

- OR is an ADJUSTED odds ratio (since another variable, gender, is in the model)
- Conclusion: Subjects who were exposed to the drug had 2.19 times (95% CI: 1.05-4.57) the odds of developing the SE compared with those who were not exposed, AFTER controlling for gender (p = 0.031)
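
The reported numbers can be cross-checked. A minimal sketch: B is recovered as loge(Exp(B)), an OR is significant at the 5% level when its 95% CI excludes 1, and, assuming Wald-type intervals (symmetric on the log-odds scale), the OR should sit near the geometric midpoint √(lower × upper) of its CI. `ci_excludes_one` is a hypothetical helper.

```python
from math import log, sqrt


def ci_excludes_one(lower, upper):
    """An OR is statistically significant at the CI's level if the CI excludes 1."""
    return not (lower <= 1 <= upper)


results = {
    "Exposed to drug": (2.192, 1.053, 4.567),
    "Gender": (0.665, 0.338, 1.312),
}
for name, (odds_ratio, lower, upper) in results.items():
    b = log(odds_ratio)           # the LoR coefficient B, since OR = e^B
    centre = sqrt(lower * upper)  # geometric midpoint of the CI on the OR scale
    print(name, round(b, 3), ci_excludes_one(lower, upper), round(centre, 3))
```

The drug's CI excludes 1 (consistent with p = 0.031) while gender's includes 1 (consistent with p = 0.240), and both geometric midpoints (about 2.193 and 0.666) match the reported Exp(B) to rounding.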

In what kind of studies is OR most likely used?

Case-control studies (CCS), and sometimes cross-sectional studies (XSS)