Machine Learning Midterm Flashcards
What is True Error?
The probability, over instances drawn from the distribution, that the hypothesis and the target concept disagree. The hypothesis and target concept may disagree on some instances in the distribution.
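In symbols (standard PAC notation, with D the instance distribution, c the target concept, and h the hypothesis):

```latex
\mathrm{error}_{\mathcal{D}}(h) \;=\; \Pr_{x \sim \mathcal{D}}\left[\, c(x) \neq h(x) \,\right]
```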
PAC Learning - Learner definition
The learner is the algorithm that outputs an approximately correct hypothesis.
It is a consistent learner if it outputs hypotheses that perfectly fit the training data.
PAC - Probably definition
Our learner will probably (i.e., with probability at least 1 - delta) produce an approximately correct hypothesis.
PAC - Approximately Correct definition
The hypothesis has true error on the distribution of at most a small epsilon (the error tolerance we allow).
PAC - Hypotheses parameter
If the number of hypotheses |H| is finite, we can bound the number of training examples needed for the concept to be PAC learnable by our learner (see the bound below).
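The standard bound for a consistent learner with a finite hypothesis space, where m is the number of training examples, epsilon the error tolerance, and delta the failure probability:

```latex
m \;\geq\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
```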
What is Version Space?
Set of hypotheses in H that perfectly fit the training data.
How is a Version Space Epsilon Exhausted?
The version space is epsilon-exhausted if every hypothesis in it has true error less than epsilon with respect to the distribution, given the set of training examples.
What is PAC Learning?
With probability at least (1 - delta), the version space is epsilon-exhausted (i.e., a consistent learner will produce a hypothesis whose true error on the distribution is less than or equal to epsilon after training on the sample). This gives us the minimum number of training examples needed to PAC-learn a concept.
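A minimal sketch of using the finite-|H| bound above; the specific numbers (|H| = 973, epsilon = 0.1, delta = 0.05) are invented purely for illustration:

```python
import math

def sample_complexity(h_size: int, epsilon: float, delta: float) -> int:
    """Smallest integer m with m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

# Example: a finite hypothesis space with 973 hypotheses,
# tolerating 10% true error with 95% confidence.
print(sample_complexity(h_size=973, epsilon=0.1, delta=0.05))  # -> 99
```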
What is a VC Dimension?
For a given instance space X and hypothesis space H, it is the size of the largest finite subset of X that can be shattered by H.
What is Shattering?
A set of instances S from X is shattered by H if and only if for every possible dichotomy of S, there exists a hypothesis h from H that is consistent with the dichotomy.
S is shattered by H if there are enough hypotheses in H to agree with every possible labeling of S
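A small brute-force sketch of my own (not from the cards), assuming interval hypotheses [a, b] on the real line: intervals can shatter 2 points but not 3, since no interval can realize the labeling (+, -, +).

```python
from itertools import product

def interval_labels(points, a, b):
    """Label each point True if it falls inside [a, b]."""
    return tuple(a <= x <= b for x in points)

def shattered_by_intervals(points):
    """True if every dichotomy of the points is realized by some interval."""
    # Candidate endpoints: values at and just around the points are enough
    # for these well-separated examples.
    candidates = sorted(set(points))
    grid = [c + d for c in candidates for d in (-0.5, 0.0, 0.5)]
    realized = {interval_labels(points, a, b)
                for a in grid for b in grid if a <= b}
    # The all-negative labeling corresponds to an empty interval.
    realized.add(tuple(False for _ in points))
    return all(lab in realized
               for lab in product([True, False], repeat=len(points)))

print(shattered_by_intervals([1.0, 2.0]))       # True  -> 2 points shattered
print(shattered_by_intervals([1.0, 2.0, 3.0]))  # False -> (+, -, +) impossible
```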
VC Dimension and PAC Learning relationship
If VC(H) is finite, then we can bound the number of training examples needed to epsilon-exhaust the version space of H. A concept class is PAC learnable if and only if VC(H) is finite.
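A commonly cited upper bound on the required sample size in terms of VC(H), as given in Mitchell's Machine Learning (stated from memory, so treat the constants as approximate):

```latex
m \;\geq\; \frac{1}{\epsilon}\left(4\log_2\frac{2}{\delta} \;+\; 8\,VC(H)\,\log_2\frac{13}{\epsilon}\right)
```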
What does Bayes’ Rule help with?
It lets us integrate prior beliefs with observed data to produce an updated (posterior) belief about a hypothesis.
What is Posterior Probability?
The conditional probability that is assigned to a hypothesis after relevant evidence is taken into account
What is the Bayes Rule formula?
P(h|D) = P(D|h) * P(h) / P(D)
h represents a hypothesis and D the observed data
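A tiny numeric sketch (the two hypotheses, priors, and likelihoods below are invented for illustration): the evidence P(D) is the sum of P(D|h)P(h) over the hypotheses, and the posterior follows from the formula above.

```python
# Hypothetical two-hypothesis example: priors and likelihoods are made up.
prior = {"h1": 0.3, "h2": 0.7}
likelihood = {"h1": 0.8, "h2": 0.2}   # P(D | h)

# Evidence: P(D) = sum over h of P(D|h) * P(h)
p_data = sum(likelihood[h] * prior[h] for h in prior)

# Posterior: P(h|D) = P(D|h) * P(h) / P(D)
posterior = {h: likelihood[h] * prior[h] / p_data for h in prior}
print(posterior)  # {'h1': ~0.632, 'h2': ~0.368}
```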
How do we use Bayes' Rule to pick a hypothesis?
We find the hypothesis with the maximum posterior probability given the data, across all hypotheses (i.e., the Maximum a Posteriori, or MAP, hypothesis).
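In symbols (the evidence P(D) drops out because it does not depend on h):

```latex
h_{MAP} \;=\; \arg\max_{h \in H} P(h \mid D) \;=\; \arg\max_{h \in H} P(D \mid h)\,P(h)
```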
What is the Maximum Likelihood hypothesis?
It comes from assuming all priors P(h) are equal, so we choose the hypothesis that maximizes the likelihood P(D|h).
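With a uniform prior the P(h) term is constant, leaving:

```latex
h_{ML} \;=\; \arg\max_{h \in H} P(D \mid h)
```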
What is Occam’s Razor?
States that among competing hypotheses, the one with the fewest assumptions should be selected
How do we find the shortest hypothesis?
Find the Minimal Description Length
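One common formalization (following Mitchell's treatment, stated from memory): pick the hypothesis that minimizes the description length of the hypothesis plus the description length of the data given the hypothesis, under chosen encodings C1 and C2.

```latex
h_{MDL} \;=\; \arg\min_{h \in H} \; L_{C_1}(h) \;+\; L_{C_2}(D \mid h)
```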
How to find the most probable hypothesis?
Use MAP
How to find the most probable classifier?
Weight each hypothesis's prediction by its posterior probability and take the most probable classification across all hypotheses (the Bayes Optimal Classifier).
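In symbols, over candidate classification values v_j:

```latex
\arg\max_{v_j \in V} \;\sum_{h \in H} P(v_j \mid h)\, P(h \mid D)
```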
What is a con for Bayes Optimal Classifier?
It takes a long time to compute because the posterior needs to be computed for every hypothesis
Why Naive Bayes classifier?
It is more computationally efficient than the Bayes Optimal Classifier.
What is a Belief Network?
A graph with nodes and edges that show the probability distribution of a set of variables in terms of their conditional dependencies
How to compute Joint Probabilities?
They can be computed from the Belief Network by multiplying together, for each node, the probability of that node conditioned on its parents.
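In symbols, the joint probability factorizes over the network:

```latex
P(x_1, \ldots, x_n) \;=\; \prod_{i=1}^{n} P\big(x_i \mid \mathrm{Parents}(x_i)\big)
```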
How does a Naive Bayes Classifier work?
We find the most probable target value of the classification variable by multiplying the prior of each class by the conditional probabilities of each attribute value given that class, assuming the attributes are conditionally independent.
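In symbols, with class values v_j and attribute values a_1, ..., a_n:

```latex
v_{NB} \;=\; \arg\max_{v_j \in V} \; P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j)
```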
What is the benefit of Naive Bayes
Computational advantage: only a small number of terms must be estimated (i.e., the number of distinct attribute values multiplied by the number of distinct classification values).
What is a problem with Joint Probability?
It suffers from the Curse of Dimensionality: the number of probabilities to estimate grows exponentially with the number of variables.
What is a con of Naive Bayes?
It makes a strong conditional-independence assumption between attributes, so it cannot capture interactions such as XOR (see the sketch below).
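A minimal sketch of why (my own illustration, estimating the Naive Bayes terms directly from the four XOR rows): every per-attribute conditional comes out to 0.5, so both classes receive identical scores and the classifier cannot separate them.

```python
from collections import Counter, defaultdict

# XOR truth table: label is 1 iff exactly one input is 1.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

class_counts = Counter(label for _, label in data)
# cond[(i, value, label)] counts attribute i taking `value` within class `label`.
cond = defaultdict(int)
for (x1, x2), label in data:
    cond[(0, x1, label)] += 1
    cond[(1, x2, label)] += 1

def naive_bayes_score(x, label):
    """P(label) * product over i of P(x_i | label), estimated from the counts."""
    score = class_counts[label] / len(data)
    for i, value in enumerate(x):
        score *= cond[(i, value, label)] / class_counts[label]
    return score

for x, true_label in data:
    scores = {label: naive_bayes_score(x, label) for label in (0, 1)}
    print(x, true_label, scores)  # both classes score 0.125 on every row
```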