C3 Flashcards
what are the two parts of two-part MDL?
first part: use some optimal code to specify the right Turing machine, x
second part: use some optimal code to specify the right content for the tape, y
“right” means that running the specified Turing machine on the specified tape content outputs the given string
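A toy illustration of the two-part idea with a much weaker repertoire than a universal TM (a hypothetical two-decoder scheme, purely illustrative):

```python
# Toy two-part description: part one names a "machine" from an assumed,
# fixed repertoire of two decoders; part two gives its tape content.
def decode(machine, tape):
    if machine == "literal":      # tape is the string itself
        return tape
    if machine == "repeat":       # tape is a (unit, count) pair
        unit, count = tape
        return unit * count

description = ("repeat", ("ab", 8))       # a short two-part description
assert decode(*description) == "ab" * 8   # decodes to "abababababababab"
```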
3 problems with Kolmogorov complexity
- uncomputability (halting problem)
- large constants (depends on the universal TM we choose)
- the universal TM is extremely expressive, and this expressiveness itself leads to serious problems
MDL
Minimum Description Length principle:
given a set of models, the best model M in the set is the one that minimises L(M) + L(D|M)
with L(M) the length in bits of the description of M and L(D|M) the length in bits of the description of the data when encoded with M
example: describing polynomials
L(M) is the length needed to encode the coefficients at a certain precision
L(D|M) is the length needed to encode the residuals: the differences between the actual values and the values given by the model
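A minimal sketch of this selection procedure, assuming 16 bits per coefficient and an idealised Gaussian code for the residuals (both choices are illustrative, not from the card):

```python
import numpy as np

def two_part_mdl(x, y, degree, bits_per_coef=16, sigma=1.0):
    coefs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coefs, x)
    L_M = (degree + 1) * bits_per_coef     # L(M): encode the coefficients
    # L(D|M): idealised code length of the residuals, -log2 Gaussian density
    L_D_given_M = np.sum(0.5 * np.log2(2 * np.pi * sigma**2)
                         + residuals**2 / (2 * sigma**2 * np.log(2)))
    return L_M + L_D_given_M

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 3 * x**2 - x + rng.normal(0, 0.5, size=x.size)   # data from a degree-2 model
best_degree = min(range(6), key=lambda k: two_part_mdl(x, y, k))
print(best_degree)   # MDL typically recovers the generating degree, 2
```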
model class (statistics)
all polynomials
model (statistics)
all polynomials of degree k
point hypothesis
one specific polynomial (with coefficients specified)
problem with crude two-part MDL
- L(M) needs to be defined
- multiple code words for a single dataset exist, which cannot be optimal (the two-part code is incomplete)
universal coding
associate a code not with a single model, but with all m in the set of models => construct a code L(D|M) that is small whenever there is an m in M that fits the data well
refined MDL
uses universal codes (guaranteed to be minimax optimal)
developed for three tasks:
1. model selection
2. point hypothesis selection
3. prediction
Kraft’s inequality
for an alphabet S of size s, any prefix code C must satisfy sum_{x in S} s^{-L_C(x)} <= 1
=> code lengths are probabilities, and a probability distribution is a code
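A small illustration of both directions of this correspondence, for an assumed binary code alphabet (s = 2):

```python
import math

code = {"a": "0", "b": "10", "c": "110", "d": "111"}   # a prefix code
kraft_sum = sum(2 ** -len(word) for word in code.values())
assert kraft_sum <= 1   # Kraft's inequality (here it holds with equality)

# the other direction: a distribution yields idealised code lengths -log2 p
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = {x: -math.log2(p) for x, p in probs.items()}
print(lengths)          # {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 3.0}
```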
Normalised Maximum Likelihood (NML) code
example of a universal code: the code given by the normalised distribution of maximum likelihoods
P_nml(D) = P_{θ̂(D)}(D) / sum_{D'} P_{θ̂(D')}(D'), where the sum runs over all possible datasets D'
the likelihood distribution P_θ is the probability of the data given parameter(s) θ, and the maximum likelihood estimator is θ̂(D) = argmax_{θ in Θ} P_θ(D)
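A minimal sketch of P_nml for an assumed Bernoulli model over binary sequences of length n; the normaliser sums over all 2^n possible datasets, so this is only feasible for small n:

```python
from itertools import product

def ml_prob(seq):
    """P_{θ̂(D)}(D): the likelihood of seq under its own ML estimate."""
    n, k = len(seq), sum(seq)
    theta = k / n
    return theta**k * (1 - theta)**(n - k)   # 0**0 == 1 covers k = 0 and k = n

def p_nml(seq):
    normaliser = sum(ml_prob(d) for d in product([0, 1], repeat=len(seq)))
    return ml_prob(seq) / normaliser

print(p_nml((1, 1, 0, 1)))   # NML probability of one length-4 sequence
```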
prequential plug-in code
we want to encode messages over some alphabet, but initially we don’t know anything
1. start with some low count and define probabilities accordingly (we don’t like probabilities of 0)
2. transmit symbols one by one and update counts, meanwhile compute probabilities
the plug-in probabilities converge to the optimal maximum likelihood estimate as more symbols are seen
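A sketch of this scheme as a code-length computation, assuming Laplace smoothing (start every count at 1) as the “low count”:

```python
import math
from collections import Counter

def prequential_code_length(message, alphabet):
    counts = Counter({a: 1 for a in alphabet})     # pseudo-counts avoid p = 0
    total_bits = 0.0
    for symbol in message:
        p = counts[symbol] / sum(counts.values())  # plug-in probability
        total_bits += -math.log2(p)                # bits to transmit this symbol
        counts[symbol] += 1                        # update counts after sending
    return total_bits

print(prequential_code_length("aababaa", alphabet="ab"))
```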
what does MDL give us?
MDL gives us the best model within the class of models considered (more modest than Kolmogorov complexity, but we can compute it). It does not assume one model to be the “true” model