Machine Learning Midterm Flashcards

1
Q

What is True Error?

A

The error between the hypothesis and the target concept with respect to a distribution: the probability that the two disagree on an instance drawn from that distribution. Even a good hypothesis may disagree with the target concept on some instances in the distribution.
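
In standard notation (the symbols are mine, not part of the original card): error_D(h) = Pr_{x ~ D}[ c(x) ≠ h(x) ], where c is the target concept and D is the instance distribution.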

2
Q

PAC Learning - Learner definition

A

The learner is the algorithm that outputs an approximately correct hypothesis.
It is a consistent learner if it outputs hypotheses that perfectly fit the training data.

3
Q

PAC - Probably definition

A

Our learner will probably (i.e., with probability at least 1 − δ) produce an approximately correct hypothesis.

4
Q

PAC - Approximately Correct definition

A

The hypothesis's true error on the distribution is at most a small epsilon (the allowed error tolerance).

5
Q

PAC - Hypotheses parameter

A

If the number of hypotheses is finite, we can bound the number of training examples needed for the concept to be PAC-learnable by our learner.

6
Q

What is Version Space?

A

Set of hypotheses in H that perfectly fit the training data.
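
In standard notation (not from the card itself): VS_{H,D} = { h ∈ H : h(x) = c(x) for every training example x in D }.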

7
Q

How is a Version Space Epsilon Exhausted?

A

If every hypothesis in the version space has true error less than epsilon for the given set of training examples.

8
Q

What is PAC Learning?

A

The probability (at least 1 − δ) that the version space is epsilon-exhausted (i.e., a consistent learner will produce a hypothesis with true error on the distribution at most epsilon for the training set).
This gives us the minimum number of training samples needed to PAC-learn a concept.
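
One common form of this bound for finite hypothesis spaces is m ≥ (1/ε)(ln|H| + ln(1/δ)). A minimal Python sketch of the computation; the values of |H|, epsilon, and delta are my own illustration, not from the card:

    from math import ceil, log

    def pac_sample_bound(h_size, epsilon, delta):
        # m >= (1/epsilon) * (ln|H| + ln(1/delta))
        return ceil((log(h_size) + log(1 / delta)) / epsilon)

    # e.g., |H| = 2**10 hypotheses, epsilon = 0.1, delta = 0.05
    print(pac_sample_bound(2**10, 0.1, 0.05))  # -> 100
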
9
Q

What is a VC Dimension?

A

For a given instance space X and hypothesis space H, it is the size of the largest subset of X that can be shattered by H.

10
Q

What is Shattering?

A

A set of instances S from X is shattered by H if and only if for every possible dichotomy of S, there exists a hypothesis h from H that is consistent with the dichotomy.
S is shattered by H if there are enough hypotheses in H to agree with every possible labeling of S
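
Since shattering is a purely combinatorial condition, it can be checked by brute force. A minimal sketch; the threshold hypothesis class and sample points are my own illustration:

    from itertools import product

    def shatters(hypotheses, points):
        # S is shattered if every possible labeling of S is realized by some h
        achievable = {tuple(h(x) for x in points) for h in hypotheses}
        return all(lab in achievable
                   for lab in product([False, True], repeat=len(points)))

    # 1-D threshold hypotheses h_t(x) = (x >= t)
    thresholds = [lambda x, t=t: x >= t for t in (0, 1, 2, 3)]
    print(shatters(thresholds, [1.5]))        # True: one point gets both labels
    print(shatters(thresholds, [0.5, 2.5]))   # False: no h labels left True, right False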

11
Q

VC Dimension and PAC Learning relationship

A

If VC(H) is finite, then we can bound the number of training examples needed to epsilon-exhaust the version space of H.
A concept class is PAC-learnable if and only if VC(H) is finite.
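
One standard form of the resulting bound (Blumer et al., as stated in Mitchell) is m ≥ (1/ε)(4·log2(2/δ) + 8·VC(H)·log2(13/ε)).
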
12
Q

What does Bayes’ Rule help with?

A

It lets us integrate prior information with observed data to produce updated probabilities we can use to confirm or revise our beliefs.

13
Q

What is Posterior Probability?

A

The conditional probability that is assigned to a hypothesis after relevant evidence is taken into account

14
Q

What is the Bayes Rule formula?

A

P(h|D) = P(D|h) * P(h) / P(D)

h represents a hypothesis; D represents the observed data
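
A worked example in Python; the numbers are the classic illustrative medical-test values, not from the card:

    # h = "patient has the disease", D = "test came back positive"
    p_h = 0.008             # prior P(h)
    p_d_given_h = 0.98      # P(D|h): test detects the disease
    p_d_given_not_h = 0.03  # P(D|not h): false-positive rate

    p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)  # total probability P(D)
    p_h_given_d = p_d_given_h * p_h / p_d                  # Bayes' rule
    print(round(p_h_given_d, 3))  # ~0.208: the posterior is still small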

15
Q

What is Bayes Rule?

A

We find the maximum-probability hypothesis given the data across all hypotheses (i.e., the Maximum a Posteriori, or MAP, hypothesis).
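
In standard notation (not on the card): h_MAP = argmax_{h ∈ H} P(D|h) · P(h); the denominator P(D) is the same for every hypothesis, so it can be dropped from the argmax.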

16
Q

What is the Maximum Likelihood?

A

It comes from the assumption that all priors P(h) are equal, so we can compute it for a hypothesis from P(D|h) alone.
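
In standard notation: h_ML = argmax_{h ∈ H} P(D|h).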

17
Q

What is Occam’s Razor?

A

States that among competing hypotheses, the one with the fewest assumptions should be selected

18
Q

How do we go about finding the shortest hypothesis?

A

Find the hypothesis with the Minimum Description Length (MDL).
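
In standard notation (not on the card): h_MDL = argmin_{h ∈ H} [ L(h) + L(D|h) ], the description length of the hypothesis plus the description length of the data given the hypothesis; this falls out of MAP by taking −log2 of P(D|h) · P(h).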

19
Q

How to find the most probable hypothesis?

A

Use MAP

20
Q

How to find the most probable classification?

A

Take a weighted vote over all hypotheses: each hypothesis's predicted answer is weighted by its posterior probability.
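
In standard notation this is the Bayes Optimal Classifier: v = argmax_{v ∈ V} Σ_{h ∈ H} P(v|h) · P(h|D). A minimal Python sketch with made-up posteriors:

    # Three hypotheses vote on a binary label; each vote is weighted
    # by the hypothesis's posterior P(h|D).
    posteriors  = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
    predictions = {"h1": "+", "h2": "-", "h3": "-"}

    votes = {}
    for h, p in posteriors.items():
        votes[predictions[h]] = votes.get(predictions[h], 0.0) + p
    print(max(votes, key=votes.get))  # "-": the MAP hypothesis h1 is outvoted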

21
Q

What is a con for Bayes Optimal Classifier?

A

It takes a long time to compute because the posterior needs to be computed for every hypothesis

22
Q

Why Naive Bayes classifier?

A

It is more computationally efficient than the Bayes Optimal Classifier.

23
Q

What is a Belief Network?

A

A graph with nodes and edges that show the probability distribution of a set of variables in terms of their conditional dependencies

24
Q

How to compute Joint Probabilities?

A

They can be computed from the belief network by multiplying together, for each node, its conditional probability given its parents.
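
In standard notation: P(x1, ..., xn) = Π_i P(xi | Parents(xi)). A minimal sketch with a made-up three-node chain A → B → C:

    p_a = 0.3                              # P(A=true); A has no parents
    p_b_given_a = {True: 0.9, False: 0.2}  # P(B=true | A)
    p_c_given_b = {True: 0.7, False: 0.1}  # P(C=true | B)

    # P(A=true, B=true, C=false) = P(A) * P(B|A) * (1 - P(C|B))
    print(round(p_a * p_b_given_a[True] * (1 - p_c_given_b[True]), 3))  # 0.081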

25
Q

How does a Naive Bayes Classifier work?

A

We want to find the most probable target value of the classification variable by combining its prior with the conditional probabilities of each attribute (child node) given that value.
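
In standard notation: v_NB = argmax_{v} P(v) · Π_i P(a_i | v). A minimal counting-based sketch; the toy weather-style data is my own illustration, and no smoothing is applied:

    from collections import Counter, defaultdict

    def train(examples):  # examples: list of (attribute_tuple, label)
        class_counts = Counter(label for _, label in examples)
        attr_counts = defaultdict(Counter)  # (position, label) -> value counts
        for attrs, label in examples:
            for i, a in enumerate(attrs):
                attr_counts[(i, label)][a] += 1
        return class_counts, attr_counts

    def predict(model, attrs):
        class_counts, attr_counts = model
        total = sum(class_counts.values())
        def score(label):
            p = class_counts[label] / total  # prior P(v)
            for i, a in enumerate(attrs):    # times each P(a_i | v)
                p *= attr_counts[(i, label)][a] / class_counts[label]
            return p
        return max(class_counts, key=score)

    data = [(("sunny", "hot"), "no"), (("rainy", "cool"), "yes"),
            (("sunny", "cool"), "yes"), (("rainy", "hot"), "no")]
    print(predict(train(data), ("sunny", "cool")))  # -> yes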

26
Q

What is the benefit of Naive Bayes?

A

Computational advantage: only a small number of terms must be estimated (i.e., the number of attributes multiplied by the number of distinct classification values).
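
A rough count (my own numbers): with 10 binary attributes and 2 classes, Naive Bayes estimates roughly 10 · 2 = 20 conditional terms plus 2 priors, whereas the full joint distribution over the attributes has on the order of 2^10 = 1024 entries per class.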

27
Q

What is a problem with Joint Probability?

A

It suffers from the Curse of Dimensionality.

28
Q

What is a con of Naive Bayes?

A

Its assumption of strong conditional independence between attributes; for example, it can't predict XOR.
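
One way to see the XOR failure (my own illustration): for XOR of two balanced binary attributes, P(a1 = 1 | class) = P(a1 = 0 | class) = 1/2 for both classes, and likewise for a2, so every instance receives identical scores for both classes and the classifier can do no better than chance.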