Biostats test 4 Flashcards

1
Q

What is binary logistic regression

A

prediction of a binary-valued DV on the basis of other variables, so response variable is not continuous, but binary-valued, as in:

yes/no, has/does not have, alive/dead, increased/decreased

2
Q

values of outcome variable

A

failure (coded 0) or success (coded 1)

success means “has the property”

3
Q

what do we try to predict with binary logistic regression

A

the probability of success (DV = 1) on the outcome variable as a function of covariates: p(success) = f(cov1, cov2, …)

so probability of success is a function of covariates

4
Q

Assumptions of binary logistic

A
  • categories must be mutually exclusive (no overlap) and collectively exhaustive (all cases can be assigned)
  • if so, for all cases, success or failure can be coded in the data
5
Q

Link function

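A

the function that relates the linear part of the model to the outcome variable; in binary logistic regression this is the logit, ln(odds)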
6
Q

Similarities of BLR with OLS

A
  • model building and its issues (collinearity, order of entry, influential cases)
7
Q

Dissimilarities of BLR with OLS

A
  • DV is binary (categorical), not continuous
  • Interpretation of coefficients
  • Assessment of model fit/quality of obtained model
8
Q

How do we ensure the restricted [0,1] outcome range for the predicted values

A
  • link function (logit) is used to relate the linear model part to the outcome variable
  • it transforms the predicted values so that the outcomes are restrained to fall in the meaningful 0 to 1 range
  • Regression techniques that make use of some kind of link function are called Generalized Linear Models (GLM)
9
Q

The logit function

A
  • Natural logarithm of odds
  • Logit = ln(odds) = ln(y hat / (1 - y hat))
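
A minimal sketch of the logit and its inverse, to show how the link maps probabilities onto the whole real line and back into the 0 to 1 range. The function names `logit` and `inv_logit` and the probabilities used are illustrative choices, not part of the course material.

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p); defined for 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Inverse of the logit (the logistic function): maps any real z into (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Illustrative probabilities: odds, log odds, and the round trip back to p.
for p in (0.1, 0.5, 0.9):
    z = logit(p)
    print(f"p={p:.1f}  odds={p / (1 - p):.3f}  logit={z:+.3f}  back to p={inv_logit(z):.1f}")
```

A probability of .50 corresponds to odds of 1 and a logit of 0; probabilities below .50 give negative logits and probabilities above .50 give positive ones.
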
10
Q

The logistic regression model when combined with the logit (link) function

A

ln(y hat / (1 - y hat)) = b0 + b1X1 + b2X2 + …

So the difference with OLS model is on the dependent variable side of the model

11
Q

In BLR, rather than looking at the b coefficients themselves, we look at

A
  • ODDS RATIO = e to the power of b, where e is the base of the natural log (ln)
  • change in the odds (of success) for a one-unit change in the predictor
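
A small worked example of turning a coefficient into an odds ratio and applying it. The coefficient `b = 0.7` and the starting probability of .20 are made-up values for illustration only.

```python
import math

b = 0.7                    # hypothetical coefficient on the logit scale
odds_ratio = math.exp(b)   # odds ratio = e to the power of b

def odds_from_prob(p):
    return p / (1 - p)

def prob_from_odds(odds):
    return odds / (1 + odds)

p_before = 0.20                         # hypothetical probability of success
odds_before = odds_from_prob(p_before)
odds_after = odds_before * odds_ratio   # a one-unit increase multiplies the odds
p_after = prob_from_odds(odds_after)

print(f"odds ratio = {odds_ratio:.3f}")
print(f"odds: {odds_before:.3f} -> {odds_after:.3f}")
print(f"probability: {p_before:.3f} -> {p_after:.3f}")
```

The odds ratio multiplies the odds, not the probability, so the change in probability depends on where you start.
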
12
Q

Wald test

A

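Tests whether a predictor's coefficient differs from 0, i.e. whether its odds ratio differs from 1, and gives the p-value for that test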
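
A sketch of how a Wald p-value can be obtained, assuming the usual form of the statistic: the coefficient divided by its standard error, compared against a standard normal distribution. The helper name `wald_test` and the values of `b` and `se` are hypothetical. Some packages report the squared statistic as a chi-square with 1 degree of freedom, which gives the same p-value.

```python
import math

def wald_test(b, se):
    """Return the Wald z statistic and its two-sided p-value under H0: b = 0."""
    z = b / se
    # Two-sided tail area of the standard normal, via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p_value = wald_test(b=0.7, se=0.25)   # hypothetical estimate and standard error
print(f"z = {z:.2f}, Wald chi-square = {z ** 2:.2f}, p = {p_value:.4f}")
```
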

13
Q

Under H0, the odds ratio is

A
  • Odds ratio = 1
14
Q

Multiplicative effect

A

The combined effect of predictors on the odds of success is the product of their separate effects: odds ratios multiply in binary logistic regression, whereas effects add up in ordinary linear regression
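
A short numerical check of why the effects multiply: the coefficients add on the logit scale, and exponentiating a sum gives a product. The two coefficients are made-up values.

```python
import math

b1, b2 = 0.4, 0.9   # hypothetical coefficients for two predictors

# Raising both predictors by one unit adds b1 + b2 to the logit ...
combined_or = math.exp(b1 + b2)
# ... which is the same as multiplying the two separate odds ratios.
product_of_ors = math.exp(b1) * math.exp(b2)

print(f"OR1 = {math.exp(b1):.3f}, OR2 = {math.exp(b2):.3f}")
print(f"combined = {combined_or:.3f}, product = {product_of_ors:.3f}")
```
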

15
Q

From probabilities to classification - what is classification of cases based on?

A

The predicted probabilities for success. The default setting for classification as ‘success’ is a predicted probability for success > .50.

16
Q

confusion matrix

A

the confusion matrix in your output gives predicted versus observed successes and failures, based on the cut-value

17
Q

Classification errors

A

false positives and false negatives
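
The classification step, the confusion matrix, and the two kinds of error can be sketched together. The predicted probabilities and observed outcomes below are made-up data, and `classify` and `confusion_matrix` are illustrative helper names.

```python
def classify(probs, cut=0.5):
    """Classify a case as success (1) when its predicted probability exceeds the cut value."""
    return [1 if p > cut else 0 for p in probs]

def confusion_matrix(observed, predicted):
    """Count true/false positives and negatives from observed and predicted codes."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for obs, pred in zip(observed, predicted):
        if pred == 1:
            counts["TP" if obs == 1 else "FP"] += 1
        else:
            counts["TN" if obs == 0 else "FN"] += 1
    return counts

probs = [0.91, 0.62, 0.55, 0.48, 0.30, 0.12]   # made-up predicted probabilities
observed = [1, 1, 0, 1, 0, 0]                   # made-up observed outcomes

predicted = classify(probs)
cm = confusion_matrix(observed, predicted)
print(predicted)
print(cm)
```

Here the case with probability .55 is a false positive (predicted success, observed failure) and the case with probability .48 is a false negative.
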

18
Q

The null model

A

starting point in the model building, a model without any actual predictors

if you have to predict without knowing anything about the predictors, the LARGEST CATEGORY WINS!
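
A minimal illustration of the null model on made-up outcome data: with no predictors, every case is assigned to the largest category, and the share of that category is the accuracy any later model has to beat.

```python
from collections import Counter

observed = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]   # made-up outcomes: 3 successes, 7 failures

counts = Counter(observed)
majority, n_majority = counts.most_common(1)[0]   # the largest category wins
null_accuracy = n_majority / len(observed)

print(f"null model predicts {majority} for every case")
print(f"proportion correctly classified: {null_accuracy:.2f}")
```
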

19
Q

How to decide which covariates to include

A

in the end, we need to classify cases as success or failure cases (two groups), so include the covariates that improve this classification

20
Q

Receiver operating characteristic (ROC)

A

a plot that illustrates the performance of a binary classifier system at different discrimination thresholds

typically gives true positive rate against the false positive rate at various cut value settings

allows you to see how different cut value settings affect the classification results of your classifier

lenient threshold (low cut value): sensitivity is up!

strict threshold (high cut value): specificity is up!

Optimum typically: combination of high sensitivity and high specificity

21
Q

ROC curve

A

area under the curve (AUC) is an overall indicator of diagnostic accuracy
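
A sketch of how ROC points and the area under the curve can be computed by hand, assuming made-up predicted probabilities and outcomes. Each distinct probability is used as a cut value (a case counts as predicted success when its probability is at or above the cut), and the area is approximated with the trapezoid rule. The helper names are illustrative.

```python
def roc_points(observed, probs):
    """(false positive rate, true positive rate) at each cut value, strictest first."""
    n_pos = sum(observed)
    n_neg = len(observed) - n_pos
    points = [(0.0, 0.0)]   # cut value above every probability: nothing classified as success
    for cut in sorted(set(probs), reverse=True):
        tp = sum(1 for o, p in zip(observed, probs) if p >= cut and o == 1)
        fp = sum(1 for o, p in zip(observed, probs) if p >= cut and o == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

probs = [0.91, 0.62, 0.55, 0.48, 0.30, 0.12]   # made-up predicted probabilities
observed = [1, 1, 0, 1, 0, 0]                   # made-up observed outcomes

points = roc_points(observed, probs)
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}  TPR (sensitivity) = {tpr:.2f}")
print(f"AUC = {auc(points):.3f}")
```

Lowering the cut value moves along the curve toward the top right: sensitivity rises while specificity (1 minus the false positive rate) falls.
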

22
Q

what does P/(1-P) in the logistic regression model represent

A

It represents the ODDS of the event happening

23
Q

what data does logistic regression model use to calculate probabilities, eg in the height and gender example and how does it do it

A

Data it uses:
- predictor variable (x)
- binary dependent variable (y)

How it uses it:

  1. during training, the model fits the equation logit = B0 + B1X to the data and learns B0 and B1 by estimating how the log odds of Y change with X
  2. for a new case, it enters the predictor value into the fitted equation to get the log odds, then converts the log odds into a probability with the logistic (inverse logit) function, p = 1 / (1 + e^(-logit))
  3. uses the cutoff to classify the observation into one of the two outcomes
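
The three steps can be sketched end to end. This is an illustration under stated assumptions, not the procedure any particular package uses: the heights and outcomes are made up, the coefficients are estimated with plain gradient ascent on the log-likelihood (statistical software typically uses other maximum-likelihood routines), and heights are centred at 170 cm and divided by 10 purely to keep the fitting stable.

```python
import math

def inv_logit(z):
    """Convert log odds into a probability."""
    return 1 / (1 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, steps=5000):
    """Step 1: estimate b0 and b1 by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        errors = [yi - inv_logit(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += lr * sum(errors) / n
        b1 += lr * sum(e * xi for e, xi in zip(errors, x)) / n
    return b0, b1

# Made-up data: height in cm and a binary outcome coded 0/1.
heights = [155, 160, 163, 166, 168, 171, 173, 176, 180, 185]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
x = [(h - 170) / 10 for h in heights]   # centred and rescaled predictor

b0, b1 = fit_logistic(x, y)

def predict(height, cut=0.5):
    log_odds = b0 + b1 * (height - 170) / 10   # Step 2a: log odds from the fitted equation
    p = inv_logit(log_odds)                    # Step 2b: log odds -> probability
    return p, int(p > cut)                     # Step 3: classify with the cut value

for h in (158, 170, 182):
    p, label = predict(h)
    print(f"height {h}: p(success) = {p:.2f}, classified as {label}")
```
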
24
Q

odds ratios

A

change in the odds (of success) for a one-unit change in the predictor

25
Q

under H0, what is the odds ratio for logistic regression model

A

1

26
Q

survival analysis is about

A

analyzing time-to-event data

27
Q

examples of descriptive research questions that would warrant survival data

A

median survival time in clinical studies

how does the probability for an event change over time?

28
Q

examples of inferential research questions that would warrant survival data

A
  • What variables explain the time to event best?
  • Do they shorten or lengthen the expected time?
  • By how much?
29
Q

mortality

A

refers to dropout in data

30
Q

censored data

A

when you have partial information: if a case drops out, you do not know when, if ever, the endpoint was reached. survival analysis was developed to take censored data into account.

31
Q

right-censored

A

event not experienced at the end of the study - might or might not occur later, we don’t know

32
Q

left censored

A

event occurs during study, but starting point lies before start of study, so exact time interval is not known

33
Q

interval censored

A

event occurs during study, but not exactly known when

34
Q

non-censored data

A

start time is known, end point of interest reached during study

35
Q

non-informative censoring

A

censoring is not related to the likelihood of developing the event of interest

36
Q

hazard rate

A

the probability for the event to occur in the next (small) time interval, given that the event has not yet occurred. like an instantaneous risk

37
Q

survival rate

A

cumulative probability for non-event for a certain amount of time after some starting point
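
A sketch of how hazard and survival can be estimated from data with right-censored cases. It uses the Kaplan-Meier (product-limit) approach, which these cards do not name, so treat it as an assumed technique; the follow-up times and event indicators are made up. At each event time the hazard is the number of events divided by the number of cases still at risk, and the survival probability is the running product of (1 - hazard).

```python
def survival_table(times, events):
    """times: follow-up time per case; events: 1 = event observed, 0 = right-censored."""
    table = []
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)   # cases still being followed at time t
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        hazard = n_events / at_risk                    # risk of the event at t, given survival so far
        survival *= 1 - hazard                         # cumulative probability of no event up to t
        table.append((t, at_risk, n_events, hazard, survival))
    return table

times = [2, 3, 3, 5, 6, 8, 8, 10]    # made-up follow-up times
events = [1, 1, 0, 1, 0, 1, 1, 0]    # 0 marks a right-censored case

table = survival_table(times, events)
for t, at_risk, n_events, hazard, survival in table:
    print(f"t={t:>2}  at risk={at_risk}  events={n_events}  hazard={hazard:.3f}  survival={survival:.3f}")
```

Censored cases count in the at-risk group up to the time they leave the study and contribute no event, which is how their partial information is taken into account.
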
