Biostats test 4 Flashcards

(65 cards)

1
Q

What is binary logistic regression

A

prediction of a binary-valued dependent variable (DV) on the basis of other variables. The response variable is not continuous but binary-valued, as in:

yes/no, has/does not have, alive/dead, increased/decreased

2
Q

values of outcome variable

A

failure (coded 0) or success (coded 1)

success means “has the property”

3
Q

what do we try to predict with binary logistic regression

A

the probability of success as a function of covariates

p(success) = f(cov1, cov2, …)

4
Q

Assumptions of binary logistic

A
  • categories must be mutually exclusive (no overlap) and collectively exhaustive (all cases can be assigned)
  • if so, for all cases, success or failure can be coded in the data
5
Q

in logistic regression, for predicted probabilities to be meaningful, they must…..

A

lie between the values of 0 and 1

6
Q

Assumptions of logistic regression

A

categories must be mutually exclusive (no overlap) and collectively exhaustive (all cases can be assigned)

7
Q

Link function

A

relates the linear model part to the dependent variable so that the predicted outcome range is restricted to lie between 0 and 1

8
Q

Similarities of BLR with OLS

A
  • model building and its issues (collinearity, order of entry, influential cases)
9
Q

Dissimilarities of BLR with OLS

A
  • DV is binary (categorical), not continuous
  • Interpretation of coefficients
  • Assessment of model fit/quality of obtained model
10
Q

How do we ensure the restricted [0,1] outcome range for the predicted values

A
  • link function (logit) is used to relate the linear model part to the outcome variable
  • it transforms the predicted values so that the outcomes are constrained to fall in the meaningful 0 to 1 range
  • Regression techniques that make use of some kind of link function are called Generalized Linear Models (GLM)
11
Q

The logit function

A
  • Natural logarithm of odds
  • Logit = ln(odds) = ln(y hat / (1 - y hat))
12
Q

The logistic regression model when combined with the logit (link) function

A

ln(y hat / (1 - y hat)) = b0 + b1X1 + b2X2 + …

So the difference with OLS model is on the dependent variable side of the model
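
A minimal sketch of how the logit and its inverse connect the linear part of the model to a probability. The coefficient values b0 and b1 and the predictor value x are made up for illustration; they do not come from any fitted model.

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p); requires 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map any real-valued linear predictor z back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for a one-predictor model: ln(odds) = b0 + b1 * x
b0, b1 = -1.0, 0.5
x = 2.0

log_odds = b0 + b1 * x            # linear part, can be any real number
p_hat = inverse_logit(log_odds)   # predicted probability of success

print(log_odds, p_hat)
```

Applying logit to p_hat returns the original linear predictor, which is why the right-hand side of the model can stay an ordinary linear equation.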

13
Q

In Logistic Regression, rather than looking at the B coefficients themselves, we look at

A
  • ODDS RATIO = e to the power of b, where e is the base of the natural log ln
  • change in the odds (of success) for a one-unit change in the predictor
14
Q

Wald test

A

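Tests whether a regression coefficient differs significantly from 0 (equivalently, whether its odds ratio differs from 1); it gives the p-value reported for the odds ratio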

15
Q

Under H0, odds ratio is

A
  • Odds ratio = 1
16
Q

What if we want to calculate the odds ratio for a non-unit size

A

multiply the regression coefficient by the size of the change before you raise e to that power: odds ratio = e^(b × size)
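
A short sketch of this calculation. The coefficient value (0.05 per year of age) is hypothetical and only serves to show the arithmetic.

```python
import math

def odds_ratio(b, units=1.0):
    """Odds ratio for a change of `units` in the predictor: exp(b * units)."""
    return math.exp(b * units)

b_age = 0.05                      # hypothetical coefficient per year of age

or_1 = odds_ratio(b_age)          # odds ratio per 1 year
or_10 = odds_ratio(b_age, 10)     # odds ratio per 10 years

print(or_1, or_10)
```

The 10-year odds ratio equals the 1-year odds ratio raised to the 10th power, so multiplying the coefficient first and exponentiating afterwards gives the same answer as compounding ten one-unit steps.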

17
Q

Multiplicative effect

A

The combined effect of predictors on the DV is a product of separate effects, so the effects of odds ratios multiply in binary logistic regression, while they add up in ordinary linear regression
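
A small numeric check of the multiplicative effect, assuming a two-predictor model with made-up coefficients b0, b1, and b2.

```python
import math

# Hypothetical coefficients: ln(odds) = b0 + b1 * x1 + b2 * x2
b0, b1, b2 = -2.0, 0.7, 0.4

def odds(x1, x2):
    """Odds of success for the given predictor values."""
    return math.exp(b0 + b1 * x1 + b2 * x2)

baseline_odds = odds(0, 0)
combined_odds = odds(1, 1)        # both predictors increased by one unit

# Baseline odds multiplied by each predictor's odds ratio
product_of_effects = baseline_odds * math.exp(b1) * math.exp(b2)

print(combined_odds, product_of_effects)
```

On the log-odds scale the effects add (b1 + b2), which is exactly why they multiply on the odds scale.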

18
Q

From probabilities to classification - what is classification of cases based on?

A

The predicted probabilities for success. The default setting for classification as ‘success’ is a predicted probability for success > .50.

19
Q

confusion matrix

A

the confusion matrix in your output gives predicted versus observed successes and failures, based on the cut-value (always think of a 2x2 table).
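
A minimal sketch of how the four cells of the 2x2 table can be counted from predicted probabilities and a cut value. The observed outcomes and probabilities are made-up illustration data, and the cell labels (TP, FP, TN, FN) are a naming choice for this example.

```python
def confusion_matrix(observed, probabilities, cut_value=0.5):
    """Count predicted versus observed successes (1) and failures (0)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, p in zip(observed, probabilities):
        predicted = 1 if p > cut_value else 0
        if predicted == 1 and y == 1:
            counts["TP"] += 1     # predicted success, observed success
        elif predicted == 1 and y == 0:
            counts["FP"] += 1     # predicted success, observed failure
        elif predicted == 0 and y == 0:
            counts["TN"] += 1     # predicted failure, observed failure
        else:
            counts["FN"] += 1     # predicted failure, observed success
    return counts

observed = [1, 1, 1, 0, 0, 0, 1, 0]
probabilities = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.8, 0.3]

matrix = confusion_matrix(observed, probabilities)
print(matrix)
```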

20
Q

Classification errors

A

false positives and false negatives

21
Q

The null model

A

starting point in the model building, a model without any actual predictors

if you have to predict without knowing anything about the predictors, the LARGEST CATEGORY WINS!
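
A minimal sketch of the null model's prediction rule, using made-up outcome codes: every case is assigned to the largest category, and the share of that category is the accuracy any model with predictors has to beat.

```python
from collections import Counter

# Made-up observed outcomes: 1 = success, 0 = failure
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

counts = Counter(outcomes)
majority_class, majority_count = counts.most_common(1)[0]

# The null model predicts the majority class for every case
null_accuracy = majority_count / len(outcomes)

print(majority_class, null_accuracy)
```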

22
Q

How to decide which covariates to include

A

in the end, need to classify cases as success or fail cases - two groups

Suppose you have a continuous covariate and success and fail groups differ on mean → covariate may help to classify as success or fail

Suppose you have categorical covariates, and success and fail groups have different distributions. Chi-square test of homogeneity will be significant → categorical covariate may help to classify as success or fail

23
Q

Receiver operating characteristic curve

A

a plot that illustrates the performance of a binary classifier system at different discrimination thresholds

typically gives the true positive rate (sensitivity) against the false positive rate (1 - specificity) at various cut value settings

lets you see how different cut value settings affect the classification results of your classifier

lenient threshold (low cut value): sensitivity is up!

strict threshold (high cut value): specificity is up!

Optimum typically: combination of high sensitivity and high specificity
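
A sketch of how the points of such a curve can be computed, assuming made-up observed outcomes and predicted probabilities. Each cut value yields one point (1 - specificity, sensitivity).

```python
def sensitivity_specificity(observed, probabilities, cut_value):
    """Return (sensitivity, specificity) when p > cut_value is classified as success."""
    pairs = list(zip(observed, probabilities))
    tp = sum(1 for y, p in pairs if y == 1 and p > cut_value)
    fn = sum(1 for y, p in pairs if y == 1 and p <= cut_value)
    tn = sum(1 for y, p in pairs if y == 0 and p <= cut_value)
    fp = sum(1 for y, p in pairs if y == 0 and p > cut_value)
    return tp / (tp + fn), tn / (tn + fp)

observed = [1, 1, 1, 0, 0, 0, 1, 0]
probabilities = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.8, 0.3]

roc_points = []
for cut in (0.25, 0.5, 0.75):     # lenient, default, strict
    sens, spec = sensitivity_specificity(observed, probabilities, cut)
    roc_points.append((1 - spec, sens))   # (false positive rate, true positive rate)

print(roc_points)
```

With these data the lenient cut value detects every success but misclassifies half of the failures, while the strict cut value produces no false positives but misses half of the successes.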

24
Q

A lenient (low) cut value leads to

A

higher sensitivity! Success easily detected (at the expense of increased false positives)

25
A strict (high) cut value leads to
higher specificity! Good at keeping false positives down, but less sensitive
26
ROC curve
area under the curve is an overall indicator of diagnostic accuracy
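
One way to compute the area under the curve without drawing it, sketched on made-up data. It relies on a property not stated on this card: the area equals the probability that a randomly chosen success case gets a higher predicted probability than a randomly chosen failure case (ties count as half).

```python
def auc(observed, probabilities):
    """Area under the ROC curve via pairwise comparison of successes and failures."""
    positives = [p for y, p in zip(observed, probabilities) if y == 1]
    negatives = [p for y, p in zip(observed, probabilities) if y == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0       # success case ranked above failure case
            elif pos == neg:
                wins += 0.5       # tie counts as half
    return wins / (len(positives) * len(negatives))

observed = [1, 1, 1, 0, 0, 0, 1, 0]
probabilities = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.8, 0.3]

area = auc(observed, probabilities)
print(area)
```

A value of 0.5 corresponds to chance-level classification and 1.0 to perfect separation of the two groups.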
27
what does P/(1-P) in the logistic regression model represent
It represents the ODDS of the event happening
28
what data does the logistic regression model use to calculate probabilities, e.g. in the height and gender example, and how does it do it
Data it uses:
- predictor variable (X)
- binary dependent variable (Y)
How it uses it:
1. during training, the model fits the data to the equation logit = B0 + B1X and learns B0 and B1 by estimating how the log odds of Y change with X
2. it inputs the predictor value into the fitted equation to calculate the log odds, then converts them into a probability using the inverse of the logit (the logistic function)
3. it uses the cutoff to classify the observation into one of the two outcomes
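
The three steps can be sketched on a tiny made-up height and gender data set. This is an illustration under stated assumptions, not how statistical software does it: the fitting routine below is plain gradient ascent on the log-likelihood, chosen for brevity, and the data, learning rate, and iteration count are all hypothetical.

```python
import math

def inverse_logit(z):
    """Convert log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, iterations=5000):
    """Step 1: estimate b0 and b1 by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            residual = y - inverse_logit(b0 + b1 * x)
            grad0 += residual
            grad1 += residual * x
        b0 += learning_rate * grad0
        b1 += learning_rate * grad1
    return b0, b1

# Made-up data: height in cm, gender coded 1 = male, 0 = female
heights = [150, 155, 160, 165, 170, 175, 180, 185]
male = [0, 0, 0, 1, 0, 1, 1, 1]

# Centre and rescale the predictor so the simple fitting loop stays stable
mean_height = sum(heights) / len(heights)
scaled = [(h - mean_height) / 10.0 for h in heights]

b0, b1 = fit_logistic(scaled, male)

def predict(height_cm, cut_value=0.5):
    """Steps 2 and 3: log odds -> probability -> class."""
    log_odds = b0 + b1 * (height_cm - mean_height) / 10.0
    probability = inverse_logit(log_odds)
    return probability, int(probability > cut_value)

print(predict(185), predict(150))
```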
29
odds ratios
change in the odds (of success) for a one-unit change in the predictor
30
under H0, what is the odds ratio for logistic regression model
1
31
survival analysis is about
analyzing time-to-event data
32
examples of descriptive research questions that would warrant survival data
- median survival time in clinical studies
- how does the probability for an event change over time?
33
examples of inferential research questions that would warrant survival data
- What variables explain the time to event best?
- Do they shorten or lengthen the expected time?
- By how much?
34
mortality
refers to dropout in data
35
censored data
when you have partial information: if a case drops out, you do not know when, if ever, the endpoint was reached. Survival analysis was developed to take censored data into account.
36
right-censored
event not experienced at the end of the study - might or might not occur later, we don't know
37
left censored
event occurs during study, but starting point lies before start of study, so exact time interval is not known
38
interval censored
event occurs during study, but not exactly known when
39
non-censored data
start time is known, end point of interest reached during study
40
non-informative censoring
censoring is not related to the likelihood of developing the event of interest
41
hazard rate
the probability for the event to occur in the next (small) time interval, assuming the event has not yet occurred. Like an instantaneous risk.
42
survival rate
cumulative probability for non-event for a certain amount of time after some starting point
43
Mortality
Subjects are lost to follow-up (drop out) before the end of the study so you do not know if/when the end point was ever reached. this is one of the reasons for not being able to use logistic regression for time-to-event data.
44
censored data
incomplete time to event data
45
right censored cases
event not experienced at the end of the study
It might or might not occur later; we don't know. Could be dropout too.
46
left censored cases
event occurs during study, but starting point lies before start of study, exact time interval not known
47
interval censored cases
event occurs during study, but not exactly known when
47
non-censored data
start time is known, end point of interest reached during study
48
non-informative censoring
in survival regression, we assume that censoring is not related to the likelihood of developing the event of interest, and that subjects whose data are censored would have had the same distribution of times to event had they actually been observed
49
hazard rate
instantaneous risk: the probability that, if a case has survived to time t, the event will be experienced in the next small time interval, from t to t + Δt
50
survival rate
cumulative probability for the non-event for a certain amount of time after some starting point
51
median survival time
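the time at which the cumulative survival probability drops to 0.50, i.e. the time by which half of the cases have experienced the event (a median, not the mean time to event)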
52
Describing survival data using life tables
- Break down the range of survival times into smaller time slices
- Tabulate the counts of all relevant events (including censored data) per time slice
- Probabilities for 'at risk', of 'dying', and of 'surviving' can be computed on the basis of these counts
- Look at cumulative survival data per time slice
53
Kaplan-Meier life table
- Every case has its own row in the data file
- For each case, information about the survival time and censoring is entered
- The resulting curves are smoother than for grouped life tables, as time is specified per subject, not per (fixed) time slice
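
A minimal sketch of the Kaplan-Meier calculation on made-up data, with one (time, event) pair per case. The event code (1 = event observed, 0 = right-censored) is an assumption of this example. At each event time the running survival probability is multiplied by the fraction of at-risk cases that did not experience the event; censored cases leave the risk set without lowering the curve.

```python
def kaplan_meier(times, events):
    """Return [(time, cumulative survival)] at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_at_t = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:
            events_at_t += data[i][1]
            leaving += 1
            i += 1
        if events_at_t > 0:
            survival *= 1.0 - events_at_t / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving      # events and censored cases both leave the risk set
    return curve

def median_survival_time(curve):
    """First time at which cumulative survival is at or below 0.50."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None                   # survival never drops to 0.50

times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
median = median_survival_time(curve)
print(curve, median)
```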
54
How to get around confounders playing a role in survival regression
If confounders play a role, life tables can present a misleading picture. However, you can create survival curves for different categories and compare these.
55
what is the log rank test used for
comparing survival times for different groups
56
how does the log rank test work
it computes the scaled difference between the observed and expected number of events per time slice; these differences are then combined
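
A sketch of a two-group log rank statistic on made-up data, assuming the common formulation in which the observed minus expected events of one group are summed over all event times and the squared sum is divided by its variance. The result is compared against a chi-square distribution with 1 degree of freedom (critical value 3.84 at alpha = .05). The function name and data are hypothetical, and the code assumes at least one event time with cases at risk in both groups.

```python
def log_rank_statistic(times_a, events_a, times_b, events_b):
    """Chi-square statistic comparing survival in groups A and B (1 = event, 0 = censored)."""
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)    # at risk in A
        n_b = sum(1 for x in times_b if x >= t)    # at risk in B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n   # observed minus expected in A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance

# Made-up survival times; all events observed
early = log_rank_statistic([1, 2, 3, 4], [1, 1, 1, 1], [5, 6, 7, 8], [1, 1, 1, 1])
same = log_rank_statistic([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
print(early, same)
```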
57
what is cox proportional hazard regression used for
determining whether there is a significant relationship between one or more covariates and hazard, quantifying and testing these relationships, and generating a prognosis curve
58
dependent variable of the cox proportional hazard regression
ln(hazard)
59
prognosis curve
a predicted survival curve customized for any specific combination of covariate values
60
Building cox regression model
1. baseline curve is estimated at the mean values of all covariates
2. model coefficients are determined (if Exp(b) is greater than 1, i.e. b is greater than 0, hazard is up and survival is down)
61
what is h0 in ln (h) = ln(h0) + b1X1 + b2X2 + … bkXk

baseline hazard - overall shape of curve
62
calculating h from ln (h) = ln(h0) + b1X1 + b2X2 + … bkXk

h = h0 × e^(b1X1 + b2X2 + … + bkXk). So the hazard is the baseline hazard multiplied by the exponential of the linear model.
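
A small sketch of this formula with made-up values for the baseline hazard, coefficients, and covariates. It also shows that the ratio of two hazards that differ by one unit in a single covariate equals e to the power of that coefficient, whatever the baseline hazard is.

```python
import math

def hazard(h0, coefficients, covariates):
    """h = h0 * exp(b1*X1 + b2*X2 + ...)."""
    linear_part = sum(b * x for b, x in zip(coefficients, covariates))
    return h0 * math.exp(linear_part)

# Hypothetical values for illustration only
coefficients = [0.5, -0.3]
h_exposed = hazard(0.02, coefficients, [1, 2])     # X1 = 1
h_reference = hazard(0.02, coefficients, [0, 2])   # X1 = 0, X2 unchanged

hazard_ratio = h_exposed / h_reference             # equals exp(b1)

# Same comparison with a different baseline hazard gives the same ratio
ratio_other_baseline = hazard(0.10, coefficients, [1, 2]) / hazard(0.10, coefficients, [0, 2])

print(h_exposed, hazard_ratio, ratio_other_baseline)
```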
63
hazard ratio (relative risk) explanation and interpretation
Exp(b), or e to the power of b: the relative risk of experiencing the event (e.g., failure, death) between two groups, or for a one-unit increase in a predictor.
64
Proportional hazard assumption
Effects of covariates on the likelihood for the event are assumed constant over time, so ratio of hazards is constant (proportional) over time