Quiz 2 Flashcards
Six key regression assumptions
Model linear in the parameters; random sampling; no perfect collinearity; zero conditional mean; homoscedasticity; errors independent of covariates and normally distributed
How to evaluate goodness of fit in Poisson regression?
Use the deviance: -2 log(likelihood of the fitted model / likelihood of the saturated model)
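As a rough illustration of the deviance, the sketch below computes it for a Poisson fit from observed counts and fitted means. The function names and the numbers are hypothetical, and the fitted means are assumed to come from some model fitted elsewhere.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log likelihood of counts y given means mu (all mu > 0)."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def poisson_deviance(y, mu):
    """Deviance = 2 * (loglik of saturated model - loglik of fitted model)."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # y * log(y / mu) is taken as 0 when y == 0 (its limiting value)
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total

# Hypothetical observed counts and fitted means
y = [2, 3, 6, 7]
mu = [2.5, 3.5, 5.0, 7.0]
print(poisson_deviance(y, mu))
```

The saturated model sets each fitted mean equal to its observed count, so a smaller deviance means the fitted model is closer to a perfect fit.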
How to do Poisson regression for rates?
Replace the mean count with lambda_i * t_i (the rate times the amount of time), so that log(t_i) enters the model as an offset
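A minimal sketch of the rate model, assuming a single binary covariate and using a hand-written Newton-Raphson fit rather than any particular library. The model is log(E[y_i]) = log(t_i) + b0 + b1*x_i; the function name and the data are hypothetical.

```python
import math

def fit_poisson_rate(y, x, t, iters=50):
    """Fit log(E[y_i]) = log(t_i) + b0 + b1*x_i by Newton-Raphson.

    y: event counts, x: covariate values, t: exposure times.
    log(t_i) acts as an offset (its coefficient is fixed at 1).
    """
    # Start the intercept at the log of the crude overall rate
    b0, b1 = math.log(sum(y) / sum(t)), 0.0
    for _ in range(iters):
        mu = [ti * math.exp(b0 + b1 * xi) for xi, ti in zip(x, t)]
        # Score vector (gradient of the log likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information matrix (2 x 2, symmetric)
        i00 = sum(mu)
        i01 = sum(mi * xi for mi, xi in zip(mu, x))
        i11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = i00 * i11 - i01 * i01
        # Newton step: beta += information^{-1} * score
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1

# Hypothetical data: counts, binary exposure, person-time
y = [4, 6, 15, 9]
x = [0, 0, 1, 1]
t = [100, 150, 120, 80]
b0, b1 = fit_poisson_rate(y, x, t)
print(math.exp(b0), math.exp(b1))  # baseline rate and rate ratio
```

With a single binary covariate, exp(b0) should match the crude rate in the x = 0 group and exp(b1) the ratio of the two crude rates, which gives a quick check on the fit.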
Assumptions for Poisson regression for rates
For rates: we assume the rate is constant over time within an individual
For rates from summary data: in addition, we assume that each individual in a group follows a Poisson distribution with the same mean
How do we do Poisson regression with standardized rates?
Calculate an expected count for each group based on the standardization variables, and use this as an offset term
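One way to sketch the expected count, assuming indirect standardization with reference rates per stratum (for example, age bands). The reference rates, person-time values, and function name are hypothetical.

```python
import math

def expected_count(reference_rates, person_time):
    """Expected events for one group: sum over strata of rate * person-time."""
    return sum(r * t for r, t in zip(reference_rates, person_time))

# Hypothetical reference rates per person-year in three age bands,
# and the person-years one group contributes to each band
reference_rates = [0.001, 0.005, 0.02]
person_time = [1000, 500, 100]

e = expected_count(reference_rates, person_time)
offset = math.log(e)  # the offset enters the linear predictor with coefficient 1
print(e, offset)
```

With log(expected count) as the offset, the exponentiated coefficients compare observed to expected counts across groups.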
How do we assess overdispersion in Poisson regression?
Use standardized residuals, which should be approximately independent with mean 0 and variance 1. We can evaluate a residual plot or use a test statistic: the sum of squared residuals, which is chi-square distributed with df = n-p-1. To estimate the magnitude of overdispersion, we divide this test statistic by n-p-1.
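The check above can be sketched as follows, assuming the standardized residuals are Pearson residuals, (y_i - mu_i) / sqrt(mu_i). The counts and fitted means are hypothetical, and p is the number of covariates.

```python
import math

def pearson_dispersion(y, mu, p):
    """Return (chi-square statistic, df, estimated dispersion)."""
    # Pearson residuals: the Poisson variance equals the mean
    resid = [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]
    chi2 = sum(r * r for r in resid)
    df = len(y) - p - 1
    return chi2, df, chi2 / df

# Hypothetical observed counts and fitted means from a one-covariate model
y = [0, 4, 2, 9, 1, 12]
mu = [2, 3, 3, 5, 4, 6]
chi2, df, phi = pearson_dispersion(y, mu, p=1)
print(chi2, df, phi)
```

A dispersion estimate well above 1 suggests overdispersion; a value near 1 is consistent with the Poisson variance assumption.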
What is a problem with the log binomial model?
Because the log link is non-canonical for the binomial, the model is less stable than the logistic model and may fail to converge, in which case you can fit a Poisson model with robust variance estimates.
Assumption for non-parametric survival analysis
Non-informative censoring
How to calculate the survivor function
Multiply 1-h(t_j) over all periods up to and including the one being considered: S(t) = product over t_j <= t of (1-h(t_j))
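A short sketch of this product, assuming the hazard in each period is estimated as events divided by the number at risk. The function name and the numbers are hypothetical.

```python
def survivor_function(at_risk, events):
    """Cumulative product of (1 - h_j), with h_j = events_j / at_risk_j."""
    surv = []
    s = 1.0
    for n_j, d_j in zip(at_risk, events):
        h_j = d_j / n_j      # estimated hazard in period j
        s *= 1.0 - h_j       # survive period j given survival so far
        surv.append(s)
    return surv

# Hypothetical numbers at risk and events at three ordered event times
at_risk = [10, 8, 5]
events = [2, 1, 1]
print([round(s, 3) for s in survivor_function(at_risk, events)])  # [0.8, 0.7, 0.56]
```

Censored individuals leave the risk set between event times without counting as events, which is why the number at risk can drop by more than the number of events.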
What happens to the mean survival time if the last observation is censored?
Results in an underestimate of the true mean
How to compare survival at a fixed time point?
Examine whether the survival curves for that time point have overlapping confidence intervals
What is the Kaplan-Meier estimator used for?
Used to estimate the survivor function S(t) in the presence of censoring
What is the log-rank test used for?
Used to compare survival across the entire distribution
When do we need a Cox proportional hazards model?
When we have more than a single binary covariate, such as a continuous covariate or several covariates at once
What does the baseline hazard in a Cox model represent?
The underlying hazard when all covariates are equal to zero (not estimated)
What is the distinction for the likelihood estimation procedure used in a Cox model?
Partial maximum likelihood estimation – conditioned on the observed event times
What assumptions do we need for the Cox model?
Proportionality of hazards
Linear relationship between continuous covariates and log hazard
Non-informative censoring
What happens with ties in a Cox model?
If censoring and event times coincide, the event is assumed to come first. Otherwise, there are several methods, with exact best for small numbers of ties, and Efron’s method preferable with many ties.
What is the difference between MSE and the variance of the sampling distribution?
The variance of the sampling distribution is measured relative to the expected value of the estimator, while the MSE is measured relative to the true population value of the parameter. Since MSE = variance + bias^2, they are equal when there is no bias.
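The relationship can be checked numerically. The sketch below treats a small list of estimates as the whole sampling distribution (each value equally likely); the estimates and the true value are hypothetical.

```python
def mse_decomposition(estimates, true_value):
    """Return (variance, bias, mse) of an estimator's sampling distribution."""
    m = len(estimates)
    expected = sum(estimates) / m
    # Variance: spread around the estimator's own expected value
    variance = sum((e - expected) ** 2 for e in estimates) / m
    # Bias: expected value minus the true parameter value
    bias = expected - true_value
    # MSE: spread around the true parameter value
    mse = sum((e - true_value) ** 2 for e in estimates) / m
    return variance, bias, mse

variance, bias, mse = mse_decomposition([4.0, 5.0, 6.0, 7.0], true_value=5.0)
print(variance, bias, mse)  # 1.25 0.5 1.5
```

Here 1.25 + 0.5^2 = 1.5, so the MSE exceeds the variance by exactly the squared bias.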
How do we quantify reliability?
In terms of the estimator’s sampling variance/standard error
How do we quantify validity?
In terms of the bias of the estimator (difference between expected value and population true value)
How do we quantify accuracy?
Using MSE/RMSE
How many possible samples of size n can come from a group of N elements?
(N choose n) = N!/(n!(N-n)!)
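A quick check of the formula, assuming samples are drawn without replacement and order does not matter. `math.comb` needs Python 3.8 or later; the function name is hypothetical.

```python
import math

def n_possible_samples(N, n):
    """Number of distinct samples of size n from a population of N elements."""
    return math.comb(N, n)

print(n_possible_samples(5, 2))  # 10
# Same value from the factorial formula N! / (n! (N-n)!)
print(math.factorial(5) // (math.factorial(2) * math.factorial(3)))  # 10
```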
When do we use the finite population correction? What happens if we do not?
When calculating the sampling variance in a simple random sample drawn without replacement. We multiply the regular sampling variance formula by (1-f), where f is the sampling fraction, n/N. If we do not use the correction, the standard error of the parameter estimate will be overestimated.
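A minimal sketch of the correction, assuming the parameter being estimated is a population mean, so the regular sampling variance is s^2/n. The function name and the numbers are hypothetical.

```python
import math

def srs_standard_error(s2, n, N, use_fpc=True):
    """Standard error of a sample mean under simple random sampling.

    s2: sample element variance, n: sample size, N: population size.
    """
    f = n / N            # sampling fraction
    var = s2 / n         # regular sampling variance of the mean
    if use_fpc:
        var *= 1 - f     # finite population correction
    return math.sqrt(var)

# Hypothetical values: s^2 = 25, n = 100 drawn from N = 400
print(srs_standard_error(25, 100, 400))                 # with the correction
print(srs_standard_error(25, 100, 400, use_fpc=False))  # without: 0.5
```

Because 1-f is at most 1, the uncorrected standard error is never smaller than the corrected one, and the gap grows as the sampling fraction grows.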