Week 7 Flashcards
What is the goal of logistic regression?
To model the probability of a binary (0 or 1) response variable as a function of covariates.
What probability distribution is assumed for the binary response variable Y_i in logistic regression?
Bernoulli distribution with success probability π_i, P(Y_i = y_i) = π_i^{y_i} (1-π_i)^{1-y_i}.
What is the standard ‘link function’ used in logistic regression to connect the probability π_i to a linear combination of covariates x_i?
The logit function: logit(π_i) = log(π_i / (1-π_i)) = x_i^T θ.
What function maps the linear predictor x_i^T θ back to the probability π_i?
The logistic (or sigmoid) function: π_i = σ(x_i^T θ) = 1 / (1 + exp(-x_i^T θ)).
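A minimal Python sketch of these two maps (the names sigmoid and logit are chosen for illustration; scipy.special.expit is an equivalent, numerically robust sigmoid):
import numpy as np

def sigmoid(z):
    # logistic function sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    # log-odds log(p / (1 - p)), the inverse of sigmoid on (0, 1)
    return np.log(p) - np.log1p(-p)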
Write the log-likelihood function l(θ; X, y) for logistic regression.
l(θ; X, y) = Σ_{i=1}^n [y_i log(π_i) + (1-y_i) log(1-π_i)], where π_i = σ(x_i^T θ).
Write the simplified log-likelihood function using u_i = exp(x_i^T θ).
l(θ; X, y) = Σ_{i=1}^n [y_i log(u_i) - log(1+u_i)].
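A hedged Python sketch of this log-likelihood (the name log_likelihood is illustrative; np.logaddexp(0, eta) computes log(1 + exp(eta)) stably):
import numpy as np

def log_likelihood(theta, X, y):
    # l(theta; X, y) = sum_i [ y_i * eta_i - log(1 + exp(eta_i)) ] with eta_i = x_i^T theta,
    # i.e. the simplified form above with u_i = exp(eta_i)
    eta = X @ theta
    return np.sum(y * eta - np.logaddexp(0.0, eta))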
What is the score function S(θ) (gradient of the log-likelihood) for logistic regression?
S(θ) = ∇_θ l(θ; X, y) = Σ_{i=1}^n (y_i - π_i) x_i.
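A small sketch of the score in vectorised form, S(θ) = X^T (y - π), reusing numpy as above (names assumed):
def score(theta, X, y):
    # S(theta) = sum_i (y_i - pi_i) x_i
    pi = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (y - pi)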
How is the Maximum Likelihood Estimate (MLE) of θ typically found in logistic regression?
There is no closed-form solution, so the score equations S(θ) = Σ_{i=1}^n (y_i - π_i) x_i = 0 are solved numerically, typically with Newton-Raphson (equivalently, iteratively reweighted least squares).
What is the observed information matrix i(θ) (negative Hessian of the log-likelihood) for logistic regression?
i(θ) = -∇²l(θ; X, y) = Σ_{i=1}^n π_i(1 - π_i) x_i x_i^T.
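A sketch of Newton-Raphson for the MLE using the score and observed information above (assumes X has full column rank and no perfect separation; names are illustrative):
import numpy as np

def fit_logistic_newton(X, y, n_iter=25, tol=1e-8):
    # Iterate theta <- theta + i(theta)^{-1} S(theta) until the step is tiny
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ theta)))
        S = X.T @ (y - pi)                   # score S(theta)
        W = pi * (1.0 - pi)                  # weights pi_i (1 - pi_i)
        info = (X * W[:, None]).T @ X        # observed information i(theta) = X^T diag(W) X
        step = np.linalg.solve(info, S)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta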
What objective function is maximized in L2 regularized logistic regression?
l(θ; X, y) - λ θ^T θ (equivalently with penalty (λ/2) θ^T θ; only the scaling of λ changes).
What is the function g(θ) being minimized if we define θ̃ = arg max[-g(θ)] for L2 regularization?
g(θ) = -l(θ; X, y) + λ θ^T θ.
The L2 regularized logistic regression estimate θ̃ corresponds to what Bayesian point estimate?
The Maximum a Posteriori (MAP) estimate.
What prior distribution on θ corresponds to L2 regularization with penalty λ θ^T θ?
A zero-mean Gaussian prior θ ~ N(0, σ²I); if the penalty is exactly λθ^Tθ, then σ² = 1/(2λ), so larger λ corresponds to a tighter prior.
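A one-line check of this correspondence: the prior N(0, σ²I) has negative log-density (1/(2σ²)) θ^T θ + const, so matching (1/(2σ²)) θ^T θ with the penalty λ θ^T θ gives σ² = 1/(2λ).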
Let g(θ) = -log(π(θ|X, y)_kernel) for the posterior associated with L2 regularization. What is its gradient ∇_θ g(θ)?
∇_θ g(θ) = -Σ_{i=1}^n (y_i - π_i) x_i + λθ (assuming regularization (λ/2)θ^Tθ in the negative log posterior).
Let g(θ) = -log(π(θ|X, y)_kernel). What is its Hessian H(θ) = ∇²g(θ)?
H(θ) = Σ_{i=1}^n π_i(1 - π_i) x_i x_i^T + λI (assuming regularization (λ/2)θ^Tθ).
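A Python sketch of g(θ) and its derivatives under the (λ/2)θ^Tθ convention (the names neg_log_posterior, grad_g, hess_g are illustrative):
import numpy as np

def neg_log_posterior(theta, X, y, lam):
    # g(theta) = -l(theta; X, y) + (lam/2) theta^T theta  (up to an additive constant)
    eta = X @ theta
    return -np.sum(y * eta - np.logaddexp(0.0, eta)) + 0.5 * lam * theta @ theta

def grad_g(theta, X, y, lam):
    # -sum_i (y_i - pi_i) x_i + lam * theta
    pi = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return -X.T @ (y - pi) + lam * theta

def hess_g(theta, X, y, lam):
    # sum_i pi_i (1 - pi_i) x_i x_i^T + lam * I
    pi = 1.0 / (1.0 + np.exp(-(X @ theta)))
    W = pi * (1.0 - pi)
    return (X * W[:, None]).T @ X + lam * np.eye(X.shape[1])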
What quantities derived from g(θ) are needed to apply Laplace’s approximation to the posterior?
The mode θ̃ (which minimizes g(θ)) and the Hessian evaluated at the mode, H(θ̃).
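A sketch that computes both quantities with Newton steps on g(θ), reusing grad_g and hess_g from the previous sketch (names assumed):
def laplace_approximation(X, y, lam, n_iter=50, tol=1e-8):
    # Returns the mode theta_tilde and H(theta_tilde) for the approximation N(theta_tilde, H(theta_tilde)^{-1})
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        step = np.linalg.solve(hess_g(theta, X, y, lam), grad_g(theta, X, y, lam))
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta, hess_g(theta, X, y, lam)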
In frequentist prediction for logistic regression, how is the probability π* for a new observation x* estimated?
Using a plug-in estimate: π* ≈ σ(θ̃^T x*), where θ̃ is the MLE or regularized MLE.
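A minimal plug-in prediction sketch (X_new holds one new covariate vector x* per row; names assumed):
def predict_plugin(theta_hat, X_new):
    # pi* = sigma(theta_hat^T x*) for each row x* of X_new
    return 1.0 / (1.0 + np.exp(-(X_new @ theta_hat)))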
What shape do the contours of predicted probability have in the covariate space for frequentist logistic regression?
Linear (they form parallel hyperplanes).
What is the main drawback of using a single point estimate (like MLE or MAP) for prediction uncertainty?
It doesn’t fully capture uncertainty about θ; predictions might be overconfident, especially far from the training data.
How is Laplace approximation used to approximate the posterior distribution in Bayesian logistic regression?
The posterior π(θ|X, y) is approximated by a multivariate Gaussian distribution N(θ̃, H(θ̃)⁻¹), where θ̃ is the posterior mode and H(θ̃) is the Hessian of the negative log posterior at the mode.
How can we obtain samples from the approximate posterior using Laplace approximation?
By drawing samples θ_i ~ N(θ̃, H(θ̃)⁻¹).
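An illustrative sketch, assuming X, y, lam and the laplace_approximation sketch above are available:
rng = np.random.default_rng(0)
theta_tilde, H = laplace_approximation(X, y, lam)
Sigma = np.linalg.inv(H)                                          # approximate posterior covariance
samples = rng.multivariate_normal(theta_tilde, Sigma, size=1000)  # draws theta_i ~ N(theta_tilde, H^{-1})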
How does visualizing decision boundaries σ(θ^T x*) = 0.5 differ between frequentist and Bayesian (using posterior samples) approaches?
Frequentist shows one boundary based on θ̃. Bayesian shows multiple boundaries, one for each posterior sample θ_i, indicating uncertainty.
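A plotting sketch for two covariates with no intercept, so each boundary θ^T x = 0 is a line through the origin (assumes samples and theta_tilde from the previous sketch):
import matplotlib.pyplot as plt

x1 = np.linspace(-3, 3, 100)
for th in samples[:50]:                                    # one grey boundary per posterior sample
    if abs(th[1]) > 1e-12:
        plt.plot(x1, -(th[0] / th[1]) * x1, color="grey", alpha=0.3)
plt.plot(x1, -(theta_tilde[0] / theta_tilde[1]) * x1, color="black")   # single MAP boundary
plt.xlabel("x1"); plt.ylabel("x2")
plt.show()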
How is the posterior predictive probability π* = P(y=1 | x, X, y) calculated in Bayesian logistic regression?
By integrating over the posterior: π* = ∫ σ(θ^T x*) π(θ | X, y) dθ.
How is the posterior predictive probability π* approximated using M posterior samples {θ_i}?
Using Monte Carlo integration: π* ≈ (1/M) Σ_{i=1}^M σ(θ_i^T x*).
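A Monte Carlo sketch using the posterior samples above (samples has shape (M, p), X_new has one row per x*; names assumed):
def posterior_predictive(samples, X_new):
    # pi* ≈ (1/M) sum_i sigma(theta_i^T x*)
    probs = 1.0 / (1.0 + np.exp(-(X_new @ samples.T)))     # shape (n_new, M)
    return probs.mean(axis=1)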