Quiz 2 Flashcards

1
Q

Six key regression assumptions

A
Model linear in the parameters
Random sampling
No perfect collinearity
Zero conditional mean assumption
Homoscedasticity
Errors independent of covariates and normally distributed
2
Q

How to evaluate goodness of fit in Poisson regression?

A

Use the deviance: D = -2 log(L_fitted / L_saturated)
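As a quick sketch (plain NumPy, made-up counts and fitted means), the deviance can be computed directly from observed counts y and fitted means mu; for a Poisson model the saturated likelihood sets each fitted mean equal to the observed count:

```python
import numpy as np

def poisson_deviance(y, mu):
    """Poisson deviance D = 2 * sum[ y*log(y/mu) - (y - mu) ],
    with the convention y*log(y/mu) := 0 when y = 0."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    term = np.where(y > 0, y * np.log(np.where(y > 0, y / mu, 1.0)), 0.0)
    return 2.0 * np.sum(term - (y - mu))

# Hypothetical counts and fitted means
y  = np.array([0, 2, 3, 5])
mu = np.array([0.5, 1.8, 3.2, 4.9])
D = poisson_deviance(y, mu)   # 0 only at a perfect fit
```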

3
Q

How to do Poisson regression for rates?

A

Model the mean count as mu_i = lambda_i * t_i (the rate times the time at risk); equivalently, include log(t_i) as an offset in the linear predictor
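A minimal numerical sketch (NumPy, hypothetical coefficients and person-time) showing that multiplying the rate by time at risk is the same as adding log(t) as an offset on the log scale:

```python
import numpy as np

# Hypothetical rate model: log(lambda_i) = b0 + b1 * x_i
b0, b1 = -1.0, 0.5
x = np.array([0.0, 1.0, 2.0])
t = np.array([10.0, 5.0, 2.0])               # person-time at risk

lam = np.exp(b0 + b1 * x)                    # rate per unit time
mu_direct = lam * t                          # mean count = rate * time
mu_offset = np.exp(b0 + b1 * x + np.log(t))  # same mean via a log(t) offset
```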

4
Q

Assumptions for Poisson regression for rates

A

For rates: we assume the rate is constant over time within an individual
For rates from summary data: in addition, we assume that each individual in a group follows a Poisson distribution with the same mean

5
Q

How do we do Poisson regression with standardized rates?

A

Calculate an expected count for each group based on the standardization variables, and use this as an offset term

6
Q

How do we assess overdispersion in Poisson regression?

A

Use standardized (Pearson) residuals, which should be approximately independent with mean 0 and variance 1. Evaluate with a residual plot, or with a test statistic: the sum of squared standardized residuals is approximately chi-square distributed with df = n - p - 1. To estimate the magnitude of overdispersion, divide the sum of squared standardized residuals by n - p - 1.
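A sketch of this estimate (plain NumPy, hypothetical counts and fitted means); values well above 1 suggest overdispersion:

```python
import numpy as np

def overdispersion_estimate(y, mu, p):
    """Sum of squared standardized (Pearson) residuals divided by n - p - 1,
    where p is the number of covariates in the model."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    r = (y - mu) / np.sqrt(mu)     # Pearson residual for a Poisson model
    n = len(y)
    return np.sum(r**2) / (n - p - 1)
```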

7
Q

What is a problem with the log binomial model?

A

Because the log link is non-canonical for the binomial distribution, the model is less numerically stable than logistic regression and may fail to converge; in that case, you can fit a Poisson model with robust variance estimates instead.

8
Q

Assumption for non-parametric survival analysis

A

Non-informative censoring

9
Q

How to calculate the survivor function

A

Multiply (1 - h(t_j)) over all event times up to and including the one being considered: S(t) = prod over {j: t_j <= t} of (1 - h(t_j))
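This product-limit calculation can be sketched in a few lines of NumPy (hypothetical risk table, with h(t_j) estimated as deaths/at-risk at each event time):

```python
import numpy as np

def km_survival(deaths, at_risk):
    """Kaplan-Meier product: S(t_j) = prod_{k <= j} (1 - d_k / n_k).
    deaths[k] = events at time t_k; at_risk[k] = number at risk just before t_k."""
    h = np.asarray(deaths, float) / np.asarray(at_risk, float)
    return np.cumprod(1.0 - h)

# Hypothetical risk table with three event times
S = km_survival(deaths=[2, 1, 1], at_risk=[10, 7, 5])
```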

10
Q

What happens to the mean survival time if the last observation is censored?

A

Results in an underestimate of the true mean

11
Q

How to compare survival at a fixed time point?

A

Examine whether the confidence intervals around each curve's survival estimate at that time point overlap

12
Q

What is the Kaplan-Meier estimator used for?

A

Used to estimate the survivor function S(t) in the presence of censoring

13
Q

What is the log-rank test used for?

A

Used to compare survival across the entire distribution

14
Q

When do we need a Cox proportional hazards model?

A

When we have more than a single binary covariate

15
Q

What does the baseline hazard in a Cox model represent?

A

The underlying hazard when all covariates are equal to zero (not estimated)

16
Q

What is the distinction for the likelihood estimation procedure used in a Cox model?

A

Partial maximum likelihood estimation – conditioned on the observed event times

17
Q

What assumptions do we need for the Cox model?

A

Proportionality of hazards
Linear relationship between continuous covariates and log hazard
Non-informative censoring

18
Q

What happens with ties in a Cox model?

A

If a censoring time and an event time coincide, the event is assumed to occur first. For tied event times there are several methods: the exact method is best with a small number of ties, while Efron's method is preferable with many ties.

19
Q

What is the difference between MSE and the variance of the sampling distribution?

A

The variance of the sampling distribution measures spread around the expected value of the estimator, while the MSE measures spread around the true population value of the parameter. Since MSE = variance + bias^2, they are equal when there is no bias.
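A simulation sketch of this decomposition (NumPy, with a deliberately biased hypothetical estimator: the sample mean plus 0.5):

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu = 5.0

# Hypothetical biased estimator: sample mean + 0.5, over many repeated samples
estimates = np.array([rng.normal(true_mu, 1.0, 30).mean() + 0.5
                      for _ in range(20_000)])

variance = np.var(estimates)               # spread around E[estimator]
bias = estimates.mean() - true_mu          # approximately 0.5
mse = np.mean((estimates - true_mu) ** 2)  # spread around the true value
# mse equals variance + bias**2
```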

20
Q

How do we quantify reliability?

A

In terms of estimator’s sampling variance/standard error

21
Q

How do we quantify validity?

A

In terms of the bias of the estimator (difference between expected value and population true value)

22
Q

How do we quantify accuracy?

A

Using MSE/RMSE

23
Q

How many possible samples of size n can come from a group of N elements?

A

(N choose n) = N! / (n!(N - n)!)
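In Python this binomial coefficient is available directly in the standard library:

```python
import math

def n_samples(N, n):
    """Number of possible samples of size n from N elements: N! / (n!(N - n)!)."""
    return math.comb(N, n)
```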

24
Q

When do we use the finite population correction? What happens if we do not?

A

When calculating the sampling variance in a simple random sample. We multiply the regular sampling variance formula by (1-f) where f is the sampling fraction, n/N. If we do not use the correction, the standard error of the parameter estimate will be overestimated.
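A minimal sketch of the corrected variance of the mean under simple random sampling (NumPy, hypothetical sample and population size):

```python
import numpy as np

def srs_mean_variance(sample, N):
    """Estimated sampling variance of the mean under SRS, multiplied by
    the finite population correction (1 - f), where f = n / N."""
    sample = np.asarray(sample, float)
    n = len(sample)
    f = n / N
    return (1 - f) * sample.var(ddof=1) / n
```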

25
Q

How does using stratified random sampling with proportional allocation change the estimated sampling variance?

A

The estimated sampling variance will be slightly smaller than for simple random sampling.

26
Q

What is the design effect?

A

(1 + (M - 1)rho), where M is the cluster size and rho is the intraclass correlation. This factor multiplies the sampling variance of the mean estimator in a cluster sample to account for correlation within clusters.
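A one-line sketch (hypothetical cluster size, intraclass correlation, and SRS variance):

```python
def design_effect(M, rho):
    """DEFF = 1 + (M - 1) * rho, with M the cluster size and rho the
    intraclass correlation; it multiplies the SRS sampling variance."""
    return 1 + (M - 1) * rho

var_srs = 0.04                                    # hypothetical SRS variance of the mean
var_cluster = design_effect(20, 0.05) * var_srs   # inflated by within-cluster correlation
```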

27
Q

How do we make a cluster sample representative of a population?

A

We use weights with each element weighted as the inverse of its probability of selection
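A sketch of inverse-probability weighting (NumPy, hypothetical values and selection probabilities):

```python
import numpy as np

# Hypothetical sample: observed values and the probability each element
# had of being selected into the sample
values = np.array([10.0, 12.0, 20.0])
p_select = np.array([0.5, 0.5, 0.1])

w = 1.0 / p_select                               # inverse-probability weights
weighted_mean = np.sum(w * values) / np.sum(w)   # representative estimate
```

The rarely-sampled element (p = 0.1) gets a weight of 10, since it stands in for ten population elements like it.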

28
Q

What is the definition of reliability under the parallel test assumptions?

A

Ratio of true score variance to observed score variance

29
Q

What are the parallel test assumptions?

A

1) Each item’s relationship to the latent variable is identical
2) Errors for each item are uncorrelated with each other and with the true score
3) The amount of error in each item is identical

30
Q

What are tau equivalent and essentially tau equivalent tests?

A

Tau equivalent tests assume identical true scores for each item. Essentially tau equivalent tests assume that true scores only differ by a constant.

31
Q

What are congeneric tests?

A

All items share a common latent variable, but do not need equal error variances or equally strong relationships to the latent variable

32
Q

What does random, non-differential measurement error do to beta estimates, and why?

A

The estimates are attenuated toward the null because the denominator of the beta estimate (the variance of the observed covariate) now includes an error variance term greater than zero: beta_obs = beta_true * var(T) / (var(T) + var(E)).
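A simulation sketch of this attenuation (NumPy, hypothetical data; true and error variances are both 1, so the reliability, and hence the attenuation factor, is 0.5):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta_true = 2.0

x_true = rng.normal(0.0, 1.0, n)           # true covariate, variance 1
y = beta_true * x_true + rng.normal(0.0, 1.0, n)
x_obs = x_true + rng.normal(0.0, 1.0, n)   # add measurement error, variance 1

# OLS slope = cov(x_obs, y) / var(x_obs); error variance inflates the denominator
beta_obs = np.cov(x_obs, y)[0, 1] / np.var(x_obs)
# beta_obs is close to beta_true * 0.5
```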

33
Q

What is the attenuation factor?

A

This is the ratio of true score variance to observed variance, which is the definition of reliability.

34
Q

What are five ways to quantify reliability?

A

1) rxx for parallel tests
2) Split halves rxx
3) Internal consistency - Cronbach’s alpha
4) Inter-rater reliability - Cohen’s kappa; Cronbach’s alpha
5) Test-retest reliability

35
Q

What assumptions are required for rxx?

A

Parallel test assumptions

36
Q

How do we interpret the correlation between parallel tests?

A

This is the squared correlation of each test with the true score: the percent of test variance that is true score variance

37
Q

What does the covariance matrix tell us about rxx?

A

We can express the ratio of joint variation to total variation as the sum of the off-diagonal elements (the item covariances) divided by the total scale variance (the sum of all elements of the covariance matrix), multiplied by a correction of k/(k-1), where k is the number of items.
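This covariance-matrix form of Cronbach's alpha can be sketched directly (NumPy, hypothetical covariance matrices):

```python
import numpy as np

def cronbach_alpha(cov):
    """alpha = (k / (k - 1)) * (off-diagonal sum / total scale variance),
    computed from a k x k item covariance matrix."""
    cov = np.asarray(cov, float)
    k = cov.shape[0]
    total = cov.sum()                 # total scale variance
    off_diag = total - np.trace(cov)  # joint (shared) variation
    return (k / (k - 1)) * (off_diag / total)
```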

38
Q

What formula is used for split halves reliability?

A

Spearman-Brown formula: rxx = (k * rbar) / (1 + (k - 1) * rbar)

where k is the number of items (or parts) in the whole scale and rbar is the average correlation between them. For split halves, k = 2, giving rxx = 2r / (1 + r), where r is the correlation between the two halves.
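The formula as a one-liner (hypothetical split-half correlation of 0.6):

```python
def spearman_brown(r, k=2):
    """Projected reliability of a scale k times as long: k*r / (1 + (k-1)*r).
    k=2 is the split-half case."""
    return (k * r) / (1 + (k - 1) * r)

# A split-half correlation of 0.6 implies a full-scale reliability of 0.75
full = spearman_brown(0.6)
```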

39
Q

How do we quantify internal consistency and what assumptions do we make?

A

Quantified by Cronbach’s alpha; assumes we are measuring a single underlying latent variable

40
Q

How do we quantify inter-rater reliability?

A

Cronbach’s alpha, or Cohen’s kappa: k = (P_o - P_e) / (1 - P_e), where P_o is the observed agreement and P_e is the agreement expected by chance
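Cohen's kappa can be sketched from a raters' agreement (confusion) table (NumPy, hypothetical counts):

```python
import numpy as np

def cohens_kappa(table):
    """kappa = (P_o - P_e) / (1 - P_e) from a square agreement table
    where table[i, j] = count rated category i by rater A, j by rater B."""
    table = np.asarray(table, float)
    n = table.sum()
    p_o = np.trace(table) / n                          # observed agreement
    p_e = (table.sum(0) * table.sum(1)).sum() / n**2   # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 2x2 agreement between two raters
kappa = cohens_kappa([[20, 5], [10, 15]])
```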