Lectures 8 and 9 (Correlation, Regression, CIs) Flashcards

1
Q

The dependent variable is on what axis?

A

Y-axis

2
Q

The independent variable is on what axis?

A

X-axis

3
Q

When should you use a correlation analysis?

A
  • examine relationship between variables
  • estimate strength of association between variables
  • when independent and dependent variables are not clearly different
  • when regression requirements not met
4
Q

A correlation coefficient of 0 means:

A
  • there is no linear association between the two variables (a non-linear relationship may still exist)
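
A coefficient of 0 rules out only a straight-line relationship. The sketch below, using a hand-written `pearson_r` helper and made-up data (both are illustrative assumptions, not part of the lecture), shows a variable that depends perfectly on x yet has a Pearson coefficient of 0:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs]      # perfect but non-linear dependence on x
ys_line = [2 * x + 1 for x in xs]    # perfect linear dependence on x

r_curve = pearson_r(xs, ys_curve)    # 0: no linear trend at all
r_line = pearson_r(xs, ys_line)      # 1: points fall exactly on a rising line
print(r_curve, r_line)
```
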
5
Q

A regression is:

A
  • fits a line to the data and shows how well the data fit that line
  • r-value close to 0 = little or no linear correlation
  • r-value close to 1 or -1 = strong correlation
  • r-squared is the proportion of the variation in Y that is explained by variation in X.
6
Q

When should you use regression analysis?

A
  • look for a trend in data between variables
  • more than one X (independent) variable = multiple regression
  • predict a dependent variable
  • adjust for confounding variables
  • curve fitting (pharmacokinetics)
  • calibration and laboratory assays
  • detect patterns in microarray data
7
Q

Regression r-value close to 0:

A

little or no linear association

8
Q

Regression r-value close to 1:

A

strong association

9
Q

Regression r-squared value tells you:

A
  • the proportion of the variation in Y that is explained by variation in X.
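
As a rough illustration, r-squared can be computed from a least-squares line as 1 minus the ratio of residual variation to total variation in Y. The helpers `fit_line` and `r_squared` and the dose/response numbers below are hypothetical, chosen only to make the arithmetic easy to follow:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys):
    """Share of the variation in ys explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

dose = [1, 2, 3, 4, 5]         # hypothetical X values
response = [2, 4, 5, 4, 5]     # hypothetical Y values

slope, intercept = fit_line(dose, response)
r2 = r_squared(dose, response)
print(slope, intercept, r2)
```

With these numbers the fitted line is roughly y = 2.2 + 0.6x and about 60% of the variation in the response is explained by dose.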
10
Q

Parametric test characteristics:

A
  • assume variables are normally distributed with equal variances
  • dependent on mean and variance
  • susceptible to outliers
  • requires continuous variables
11
Q

Non-parametric test characteristics:

A
  • based on ranks
  • make no assumptions about the distribution, variance, or mean of the data
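
Because rank-based tests see only the order of the values, a single extreme value has limited influence. A minimal sketch, assuming a hypothetical `average_ranks` helper and made-up measurements:

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

typical = [4.1, 5.0, 4.7, 5.3, 4.9]
with_outlier = [4.1, 5.0, 4.7, 53.0, 4.9]   # 5.3 replaced by an extreme value

ranks_typical = average_ranks(typical)
ranks_outlier = average_ranks(with_outlier)
print(ranks_typical, ranks_outlier)
```

Both lists receive identical ranks, so any statistic built on the ranks is unchanged by the outlier.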
12
Q

You can transform non-linear data to linear data by:

A
  • taking logs
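
For example, exponential decay is curved on the raw scale but a straight line on the log scale. The sketch below assumes a hypothetical concentration that starts at 100 and decays at a rate of 0.5 per time unit:

```python
import math

times = [0, 1, 2, 3, 4]
conc = [100 * math.exp(-0.5 * t) for t in times]   # hypothetical exponential decay
log_conc = [math.log(c) for c in conc]

# Change between consecutive time points on each scale
raw_steps = [conc[i + 1] - conc[i] for i in range(len(conc) - 1)]
log_steps = [log_conc[i + 1] - log_conc[i] for i in range(len(log_conc) - 1)]

print(raw_steps)   # steps shrink over time: the raw data are curved
print(log_steps)   # every step is about -0.5: the logged data are linear
```
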
13
Q

Three ways you can control for outliers:

A
  1. using non-parametric test
  2. dropping the outlier(s)
  3. log transformation
14
Q

Multivariate regression:

A
  • more than one X (independent) variable
  • allows adjustment for confounders
  • models interactions between variables by including product terms (the variables multiplied together)
15
Q

Stepwise regression:

A
  • finds the top contributing variable, then the second, then the third, etc. until a point of diminishing returns is reached.
    • i.e., looks for the group of variables with the largest collective r-squared value.
16
Q

Multiple logistic regression:

A
  • a multivariate analysis
  • adjusts for confounding
  • useful when outcome is dichotomous
  • provides a direct estimate of the ODDS RATIO for each independent variable
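
In a logistic model, the odds ratio for a variable is the exponential of its coefficient. The sketch below is a simplified, unadjusted case with a single yes/no exposure, where the odds ratio can be read straight from a 2×2 table; a multiple logistic regression would instead give estimates adjusted for the other variables. The `odds_ratio` helper and the counts are hypothetical:

```python
import math

def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Unadjusted odds ratio from the four cells of a 2x2 table."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed

# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed have the outcome
or_exposure = odds_ratio(30, 70, 10, 90)

# A logistic model with only this predictor would report log(OR) as its coefficient
coefficient = math.log(or_exposure)
print(or_exposure, coefficient, math.exp(coefficient))
```
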
17
Q

When the distribution of your data is not normal, what type of test should you use?

A

non-parametric

18
Q

If you are analyzing more than one type of independent (X) variable, what type of analysis should you use?

A

multivariate regression

19
Q

Principal Component Analysis (PCA):

A
  • takes many variables and reduces them to a smaller set of components (combinations of the original variables)
  • gives you the components that best explain the variation in the data
20
Q

Risk factors are modifiable through:

A

primary prevention

21
Q

Prognostic factors are modifiable through:

A

secondary prevention

22
Q

Common prognosis endpoints:

A
  1. case fatality (patients with disease who die of it)
  2. disease-specific mortality (people per 10,000 who are dying of specific disease)
  3. response
  4. remission
  5. recurrence
23
Q

Equipoise:

A
  • a genuine lack of consensus in the medical community about which treatment is better for a condition
  • makes it ethical to run an RCT
24
Q

Kaplan-Meier Analysis:

A
  • most widely used survival analysis:
  • a graph of time to event
  • every horizontal segment is a time period
  • every vertical drop is an event (e.g., death); dropouts are censored and do not cause a drop
  • larger the sample size, smoother the curve
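
In the standard product-limit (Kaplan-Meier) estimate, the curve steps down only at observed event times, while censored patients simply leave the at-risk count. A minimal sketch, assuming a hypothetical `kaplan_meier` helper and made-up follow-up data:

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g., death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # gather everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # the curve drops only when an event is observed
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # events and censored patients both leave the risk set
        at_risk -= leaving
    return curve

times = [2, 3, 3, 5, 8, 9]      # hypothetical follow-up times (months)
events = [1, 1, 0, 1, 0, 1]     # 0 marks a censored patient

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

The patient censored at month 8 produces no step; the curve drops at months 2, 3, 5, and 9 only.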
25
Q

Kaplan-Meier analysis truncation:

A

when a patient enters the study after it has already started

26
Q

Kaplan-Meier analysis censoring:

A

when a patient's outcome is unknown because they drop out, are lost to follow-up, or the study ends before the event occurs

27
Q

Can a Kaplan-Meier analysis handle covariates?

A

No.

  • use a Cox regression for this
28
Q

Cox regression:

A
  • multivariate survival analysis
  • can control for other factors
  • calculates a hazard ratio (interpreted much like a relative risk)
29
Q

Equipoise allows for:

A
  • randomized controlled trials to occur
  • equipoise = uncertainty in the medical community
30
Q

Variance =

A

measure of the spread/dispersion of values around the mean.

31
Q

Standard deviation =

A

√v (v = variance)

  • does not shrink systematically as sample size increases (the standard error of the mean does)
32
Q

Standard error of the mean (SEM) =

A

SD/ √n
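
To see how the SEM differs from the SD, the sketch below repeats the same made-up values four times: the spread of the data stays about the same, while the SEM falls because n is larger. The `sem` helper is a hypothetical name, and `statistics.stdev` is the sample standard deviation:

```python
import math
import statistics

def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

small = [4, 6, 5, 7, 3]     # hypothetical measurements, n = 5
large = small * 4           # same values repeated, n = 20

print(statistics.stdev(small), sem(small))
print(statistics.stdev(large), sem(large))
```
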

33
Q

Central limit theorem posits:

A
  • as sample size increases, the distribution of sample means approaches a normal distribution centered on the population mean
    • its spread (the SEM) shrinks, i.e. narrower confidence interval
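
A small simulation can make this concrete. The sketch below assumes draws from a uniform distribution on 0 to 1 (population SD about 0.2887) and a fixed seed; `sample_means` is a hypothetical helper. The spread of the sample means should come out close to SD/√n:

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the sketch is reproducible

def sample_means(sample_size, repeats=2000):
    """Means of `repeats` independent samples of the given size."""
    means = []
    for _ in range(repeats):
        draws = [rng.random() for _ in range(sample_size)]
        means.append(sum(draws) / sample_size)
    return means

spread_n5 = statistics.stdev(sample_means(5))
spread_n50 = statistics.stdev(sample_means(50))

print(spread_n5, 0.2887 / 5 ** 0.5)     # observed vs. SD / sqrt(5)
print(spread_n50, 0.2887 / 50 ** 0.5)   # observed vs. SD / sqrt(50)
```
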
34
Q

Interquartile range:

A
  • IQR contains 50% of the observations
    • (from the 25th - 75th percentile)
35
Q

Confidence intervals describe:

A
  • the uncertainty that surrounds an estimate (e.g., a sample mean)
  • the larger the sample size, the narrower the CI = MORE PRECISE STUDY
36
Q

Equation for 95% CI:

A

95% CI = mean ± 1.96 × (SD/√n)

  • SD = standard deviation
  • n = sample size
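
The formula can be sketched in a few lines. The `ci95` helper and the blood pressure readings are hypothetical, and the 1.96 multiplier is the normal approximation used on this card (small samples are often handled with a t-value instead):

```python
import math
import statistics

def ci95(values):
    """95% confidence interval for the mean: mean +/- 1.96 * SD / sqrt(n)."""
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean - half_width, mean + half_width

systolic = [118, 122, 125, 130, 120, 127, 124, 126]   # hypothetical readings
low, high = ci95(systolic)
print(low, high)
```
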
37
Q

For a correlation to be statistically significant, its confidence interval cannot contain:

A

0

0 = no correlation

38
Q

For a relative risk, hazard ratio, or odds ratio to be statistically significant, its confidence interval cannot contain:

A

1
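
Read as a significance check, both rules say the same thing: the interval must exclude the "no effect" value (0 for a correlation, 1 for a ratio). A minimal sketch with a hypothetical `is_significant` helper and made-up intervals:

```python
def is_significant(ci_low, ci_high, null_value):
    """True when the confidence interval excludes the no-effect value."""
    return not (ci_low <= null_value <= ci_high)

# Ratios (relative risk, hazard ratio, odds ratio): no effect = 1
ratio_excludes_one = is_significant(1.2, 3.4, null_value=1)    # True
ratio_includes_one = is_significant(0.8, 2.1, null_value=1)    # False

# Correlation coefficient: no effect = 0
corr_includes_zero = is_significant(-0.1, 0.5, null_value=0)   # False

print(ratio_excludes_one, ratio_includes_one, corr_includes_zero)
```
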