25. Correlation, Regression, and Survival Analysis Flashcards

1
Q

what is the statistic for a linear relation between two variables?

A

correlation coefficient (value is between -1 and +1)

  • the closer the correlation coefficient is to +/-1, the stronger the linear relationship (less scatter)
  • correlation coefficient of 0 indicates no linear relationship
  • positive value means variables move in same direction
  • negative value means variables move in opposite directions
  • if Excel reports an R^2 value for a straight-line fit to the data, the correlation coefficient is R, the square root of R^2 (with the sign of the slope)
  • does NOT tell you what the relationship is - doesn’t tell you how much one variable changes when the other is changed
  • when the correlation coefficient is zero, two variables may still have some (non-linear) relationships
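The points above can be checked with a small sketch. The helper name `pearson_r` and the toy data are illustrative assumptions, not part of the original card; the last case shows a perfect but non-linear relationship whose correlation coefficient is still zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]                               # made-up data
r_linear = pearson_r(xs, [2 * x + 1 for x in xs])    # variables move in the same direction
r_inverse = pearson_r(xs, [-3 * x for x in xs])      # variables move in opposite directions
r_curved = pearson_r(xs, [x ** 2 for x in xs])       # exact relationship, but not linear

print(r_linear, r_inverse, r_curved)
```

Note that `r_linear` is the same for y = 2x + 1 as it would be for any other upward line: the coefficient measures scatter around a line, not how steep the line is.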
2
Q

what is the overall use of regression analysis?

A

to quantify the association between two variables and to control for confounding

3
Q

regression quantifies what?

A

association between variables

4
Q

regression adjusts for what?

A

confounding variables (if measured)

5
Q

two main goals of regression analysis?

A
  • causal analysis: identification of risk factors (“causation”) in the presence of, or adjusting for, other variables; beware that causality is often difficult or impossible to establish
  • predict outcome with independent variables
6
Q

linear regression tests for what?
logistic regression tests for what?
cox proportional hazards regression tests for what?

A

linear regression tests for a linear relationship between 2+ variables (measurement data)

logistic regression tests for a relationship between various covariates and the proportion of cases with given characteristics (eg dichotomous/categorical data)

cox proportional hazards regression tests for the effect of covariates on the time to an event

7
Q

simple linear regression

A

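closely related to the coefficient of correlation (testing whether the slope is 0 is equivalent to testing whether the correlation is 0, and R^2 is the square of the correlation coefficient)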

  • one independent variable
  • relationship between X and Y is described by a linear function
8
Q

how is the linear regression model fit?

A

least squares criterion

  • the sum of the squared differences between the observed and predicted values of Y, across the observed values of X, is MINIMIZED
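A minimal sketch of the least squares criterion for one independent variable, using the standard closed-form solution. The function names `fit_line` and `sse` and the five data points are assumptions made up for illustration.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for a single independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sse(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted Y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]   # made-up data
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
best = sse(xs, ys, b0, b1)
# Nudging the slope or the intercept away from the fitted values
# can only increase the sum of squared differences.
worse_slope = sse(xs, ys, b0, b1 + 0.1)
worse_intercept = sse(xs, ys, b0 - 0.5, b1)
print(b0, b1, best)
```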
9
Q

interpretation of slope and intercept for simple linear regression? statistical significance?

A
  • slope: estimated change in Y, with one unit change in X (Beta 1)
  • intercept is the estimated value of Y when X= 0
  • statistical significance determined by HYPOTHESIS TESTING using a t-test for slope (Beta 1)
    - null hypothesis: Beta1 = 0, there is no linear relationship
    - alternative hypothesis: Beta1 doesn’t equal 0, a linear relationship does exist
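The t-test for the slope can be sketched as follows, assuming the usual formula t = Beta1 / SE(Beta1) with n - 2 degrees of freedom. The function name `slope_t_test` and the data are illustrative assumptions; the critical value 3.182 is the two-sided 5% cutoff for 3 degrees of freedom taken from a standard t-table, since the Python standard library has no t-distribution.

```python
import math

def slope_t_test(xs, ys):
    """Return (slope, standard error of slope, t statistic, degrees of freedom)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2                              # two parameters were estimated
    se_slope = math.sqrt(sse / df / sxx)    # sqrt(residual variance / Sxx)
    return slope, se_slope, slope / se_slope, df

xs = [1, 2, 3, 4, 5]   # made-up data
ys = [2, 4, 5, 4, 5]
slope, se_slope, t_stat, df = slope_t_test(xs, ys)

T_CRITICAL_DF3 = 3.182  # two-sided alpha = 0.05, df = 3 (t-table value)
reject_null = abs(t_stat) > T_CRITICAL_DF3
print(slope, se_slope, t_stat, df, reject_null)
```

With only five points the positive slope does not reach significance here, so the null hypothesis Beta1 = 0 is not rejected for this toy data set.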
10
Q

if we have more than one variable in the regression model, the model gives you what?

A

the effect of changes in each variable, while it holds the others constant

11
Q

multiple (or multivariable) linear regression

A
  • relation between a continuous outcome variable and a set of independent variables
  • the regression coefficient for each X represents the amount by which Y changes on average when X changes by 1 unit and all of the other X’s remain the same
12
Q

logistic regression - when do you use it?

A

use to examine an association (and control for confounding) for dichotomous outcomes (modeled using a logistic transformation)
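A sketch of what the logistic transformation does: it maps a probability between 0 and 1 onto the whole number line (the log-odds), where a linear model can be used. The coefficients below are hypothetical values chosen for illustration, not estimates from any data.

```python
import math

def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Map a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-z))

# Hypothetical model: log-odds(pass) = INTERCEPT + SLOPE * hours_of_sleep
INTERCEPT = -4.0   # assumed, for illustration only
SLOPE = 0.5        # assumed, for illustration only

p_pass_8h = inverse_logit(INTERCEPT + SLOPE * 8)   # predicted probability at 8 hours
odds_ratio_per_hour = math.exp(SLOPE)              # odds multiply by this per extra hour
print(p_pass_8h, odds_ratio_per_hour)
```

Exponentiating a logistic regression coefficient gives an odds ratio, which is why results from this model are usually reported that way.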

13
Q

censored data

A

includes loss to follow-up or not having the event observed by the end of the study (information is thus limited by the amount of time participants spent in the study)

14
Q

person-time analysis (how to, and limitations)

A
  • find the number of events / person-time = # events per person-time (eg 3 recurrences in 15 person-months = 0.2 recurrences per person-month)
  • these are called INCIDENCE DENSITY RATES
  • divide one rate by the other to obtain the incidence rate ratio (IRR) (eg those eating more eggs have 1.2 times the rate of developing high cholesterol); can get a P-value and CI on that
  • limitations: assumes the rate is constant over time, so the risk of seeing the outcome is treated as the same for one person followed for 2 months as for two people followed for one month
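The arithmetic above can be sketched in a few lines. The 3 events in 15 person-months come from the card; the second group (6 events in 25 person-months) is a made-up comparison chosen so that the ratio comes out at 1.2.

```python
def incidence_density(events, person_time):
    """Events per unit of person-time."""
    return events / person_time

rate_unexposed = incidence_density(3, 15)   # from the card: 0.2 per person-month
rate_exposed = incidence_density(6, 25)     # hypothetical comparison group
irr = rate_exposed / rate_unexposed         # incidence rate ratio

# The limitation in the last bullet: only total person-time enters the rate,
# so these two follow-up patterns are indistinguishable.
one_person_two_months = sum([2])
two_people_one_month = sum([1, 1])
print(rate_unexposed, rate_exposed, irr)
```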
15
Q

time to event (SURVIVAL) analysis

A
  • analyses that include the time to an event occurrence (outcome of interest is not always death)
  • display the data using survival curves (plot the proportion of individuals still at risk and free from the outcome (survivors) at each time)
  • analyze data and adjust for confounding (cox proportional hazards regression)
16
Q

kaplan-meier plots

A
  • display the cumulative survival
  • if we instead plot (1-survival) we would get a cumulative incidence plot
  • RR at any time point is the ratio of the cumulative incidence of the two curves
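A minimal sketch of how the cumulative survival in such a plot can be computed with the product-limit (Kaplan-Meier) estimator. The function name `kaplan_meier` and the six follow-up times are illustrative assumptions; censored subjects (event flag 0) stay in the risk set until their follow-up ends and then drop out without counting as events.

```python
def kaplan_meier(times, events):
    """Return [(time, cumulative survival)] at each time with an observed event.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        n_leaving = sum(1 for ti in times if ti == t)   # events plus censored at t
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

times = [2, 3, 3, 5, 8, 8]     # made-up follow-up times
events = [1, 1, 0, 1, 0, 0]    # 0 = censored
curve = kaplan_meier(times, events)
cumulative_incidence = [(t, 1 - s) for t, s in curve]   # the (1 - survival) plot
print(curve)
```

Computing `cumulative_incidence` separately for two groups and dividing the values at a chosen time point gives the RR described in the last bullet.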
17
Q

cox proportional hazards regression

A
  • outcome is TIME FROM AN INITIAL OBSERVATION TO SUBSEQUENT EVENT
  • goal: examine risk factors or predictors of time to event
  • dependent (Y) variable is the HAZARD FUNCTION at a given time (the probability that an individual will experience an event at time t, given that the individual survived until that time t)
  • hazard is the rate at which events occur (assumes some baseline hazard)
  • hazard ratio: the ratio of the rates at which subjects in two groups are experiencing events
18
Q

RR vs Hazard ratio

A
  • RR tells you whether events are more or less likely to occur in one group vs another
  • hazard ratio tells you whether an event is likely to occur “faster” or “slower” in one group vs another
19
Q

what type of statistical analysis would you use? : the association between duration of sleep the night before and the likelihood of passing a test, among medical students?

A

logistic regression (since outcome is dichotomous)

20
Q

what type of statistical analysis would you use? : the association between duration of sleep and caloric intake in medical students?

A

linear regression, or correlation

21
Q

what type of statistical analysis would you use? : the association between duration of sleep and caloric intake in medical students, controlling for gender?

A

multivariable linear regression

22
Q

what type of statistical analysis would you use? : comparing male and female medical students over a 10-year period after graduation from medical school to assess whether gender is associated with career switch?

A

kaplan meier plot (timed event in which time is of interest)

23
Q

the type of outcome variable helps you choose the type of regression analysis:
for continuous?
for dichotomous?
for time-to-event?

A

continuous: linear
dichotomous: logistic

time-to-event: survival curve, proportional hazards