Machine Learning Midterm Flashcards
What is True Error?
The probability, over instances drawn from the distribution, that the hypothesis and the target concept disagree. The hypothesis and target concept may disagree on some instances in the distribution.
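In symbols (standard PAC notation, with D the instance distribution, c the target concept, and h the hypothesis):

```latex
\mathrm{error}_{\mathcal{D}}(h) \;=\; \Pr_{x \sim \mathcal{D}}\left[\, c(x) \neq h(x) \,\right]
```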
PAC Learning - Learner definition
The learner is the algorithm that outputs an approximately correct hypothesis.
It is a consistent learner if it outputs hypotheses that perfectly fit the training data.
PAC - Probably definition
Our learner will probably (i.e., with probability at least 1 - delta) produce an approximately correct hypothesis.
PAC - Approximately Correct definition
The hypothesis has true error on the distribution of at most a small epsilon (the error tolerance we allow).
PAC - Hypotheses parameter
If the number of hypotheses |H| is finite, we can bound the number of training examples needed for the concept to be PAC learnable by our learner (see the bound below).
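The standard bound for a consistent learner with a finite hypothesis space, where m is the number of training examples, epsilon the error tolerance, and delta the failure probability:

```latex
m \;\geq\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
```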
What is Version Space?
Set of hypotheses in H that perfectly fit the training data.
How is a Version Space Epsilon Exhausted?
The version space is epsilon-exhausted if every hypothesis in it has true error less than epsilon with respect to the distribution, given the set of training examples.
What is PAC Learning?
With probability at least (1 - delta), the version space is epsilon-exhausted (i.e., a consistent learner will produce a hypothesis whose true error on the distribution is less than or equal to epsilon after training on the sample). This gives us the minimum number of training examples needed to PAC-learn a concept.
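A minimal sketch of using the finite-|H| bound above; the specific numbers (|H| = 973, epsilon = 0.1, delta = 0.05) are invented purely for illustration:

```python
import math

def sample_complexity(h_size: int, epsilon: float, delta: float) -> int:
    """Smallest integer m with m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

# Example: a finite hypothesis space with 973 hypotheses,
# tolerating 10% true error with 95% confidence.
print(sample_complexity(h_size=973, epsilon=0.1, delta=0.05))  # -> 99
```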
What is a VC Dimension?
For a given instance space X and hypothesis space H, it is the size of the largest finite subset of X that can be shattered by H.
What is Shattering?
A set of instances S from X is shattered by H if and only if for every possible dichotomy of S, there exists a hypothesis h from H that is consistent with the dichotomy.
S is shattered by H if there are enough hypotheses in H to agree with every possible labeling of S
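A small brute-force sketch of my own (not from the cards), assuming interval hypotheses [a, b] on the real line: intervals can shatter 2 points but not 3, since no interval can realize the labeling (+, -, +).

```python
from itertools import product

def interval_labels(points, a, b):
    """Label each point True if it falls inside [a, b]."""
    return tuple(a <= x <= b for x in points)

def shattered_by_intervals(points):
    """True if every dichotomy of the points is realized by some interval."""
    # Candidate endpoints: values at and just around the points are enough
    # for these well-separated examples.
    candidates = sorted(set(points))
    grid = [c + d for c in candidates for d in (-0.5, 0.0, 0.5)]
    realized = {interval_labels(points, a, b)
                for a in grid for b in grid if a <= b}
    # The all-negative labeling corresponds to an empty interval.
    realized.add(tuple(False for _ in points))
    return all(lab in realized
               for lab in product([True, False], repeat=len(points)))

print(shattered_by_intervals([1.0, 2.0]))       # True  -> 2 points shattered
print(shattered_by_intervals([1.0, 2.0, 3.0]))  # False -> (+, -, +) impossible
```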
VC Dimension and PAC Learning relationship
If VC(H) is finite, then we can bound the number of training examples needed to epsilon-exhaust the version space of H. A concept class is PAC learnable if and only if VC(H) is finite.
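A commonly cited upper bound on the required sample size in terms of VC(H), as given in Mitchell's Machine Learning (stated from memory, so treat the constants as approximate):

```latex
m \;\geq\; \frac{1}{\epsilon}\left(4\log_2\frac{2}{\delta} \;+\; 8\,VC(H)\,\log_2\frac{13}{\epsilon}\right)
```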
What does Bayes’ Rule help with?
It lets us integrate prior beliefs with observed data to produce an updated (posterior) belief about a hypothesis.
What is Posterior Probability?
The conditional probability that is assigned to a hypothesis after relevant evidence is taken into account
What is the Bayes Rule formula?
P(h|D) = P(D|h) * P(h) / P(D)
h represents a hypothesis and D the observed data
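A tiny numeric sketch (the two hypotheses, priors, and likelihoods below are invented for illustration): the evidence P(D) is the sum of P(D|h)P(h) over the hypotheses, and the posterior follows from the formula above.

```python
# Hypothetical two-hypothesis example: priors and likelihoods are made up.
prior = {"h1": 0.3, "h2": 0.7}
likelihood = {"h1": 0.8, "h2": 0.2}   # P(D | h)

# Evidence: P(D) = sum over h of P(D|h) * P(h)
p_data = sum(likelihood[h] * prior[h] for h in prior)

# Posterior: P(h|D) = P(D|h) * P(h) / P(D)
posterior = {h: likelihood[h] * prior[h] / p_data for h in prior}
print(posterior)  # {'h1': ~0.632, 'h2': ~0.368}
```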
How do we use Bayes' Rule to pick a hypothesis?
We find the hypothesis with the maximum posterior probability given the data, across all hypotheses (i.e., the Maximum a Posteriori, or MAP, hypothesis).
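In symbols (the evidence P(D) drops out because it does not depend on h):

```latex
h_{MAP} \;=\; \arg\max_{h \in H} P(h \mid D) \;=\; \arg\max_{h \in H} P(D \mid h)\,P(h)
```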
What is the Maximum Likelihood hypothesis?
It comes from assuming all priors P(h) are equal, so we choose the hypothesis that maximizes the likelihood P(D|h).
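With a uniform prior the P(h) term is constant, leaving:

```latex
h_{ML} \;=\; \arg\max_{h \in H} P(D \mid h)
```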
What is Occam’s Razor?
States that among competing hypotheses, the one with the fewest assumptions should be selected
How do we find the shortest hypothesis?
Find the Minimal Description Length
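One common formalization (following Mitchell's treatment, stated from memory): pick the hypothesis that minimizes the description length of the hypothesis plus the description length of the data given the hypothesis, under chosen encodings C1 and C2.

```latex
h_{MDL} \;=\; \arg\min_{h \in H} \; L_{C_1}(h) \;+\; L_{C_2}(D \mid h)
```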
How to find the most probable hypothesis?
Use MAP
How to find the most probable classifier?
Weight each hypothesis's prediction by its posterior probability and take the most probable classification across all hypotheses (the Bayes Optimal Classifier).
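In symbols, over candidate classification values v_j:

```latex
\arg\max_{v_j \in V} \;\sum_{h \in H} P(v_j \mid h)\, P(h \mid D)
```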
What is a con for Bayes Optimal Classifier?
It takes a long time to compute because the posterior needs to be computed for every hypothesis
Why Naive Bayes classifier?
It is more computationally efficient than the Bayes Optimal Classifier.
What is a Belief Network?
A graph with nodes and edges that show the probability distribution of a set of variables in terms of their conditional dependencies
How to compute Joint Probabilities?
They can be computed from the Belief Network by multiplying together, for each node, the probability of that node conditioned on its parents.
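In symbols, the joint probability factorizes over the network:

```latex
P(x_1, \ldots, x_n) \;=\; \prod_{i=1}^{n} P\big(x_i \mid \mathrm{Parents}(x_i)\big)
```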
How does a Naive Bayes Classifier work?
We find the most probable target value of the classification variable by multiplying the prior of each class by the conditional probabilities of each attribute value given that class, assuming the attributes are conditionally independent.
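In symbols, with class values v_j and attribute values a_1, ..., a_n:

```latex
v_{NB} \;=\; \arg\max_{v_j \in V} \; P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j)
```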
What is the benefit of Naive Bayes
Computational advantage: only a small number of terms must be estimated (i.e., the number of distinct attribute values multiplied by the number of distinct classification values).
What is a problem with Joint Probability?
It suffers from the Curse of Dimensionality: the number of probabilities to estimate grows exponentially with the number of variables.
What is a con of Naive Bayes?
It makes a strong conditional-independence assumption between attributes, so it cannot capture interactions such as XOR (see the sketch below).
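A minimal sketch of why (my own illustration, estimating the Naive Bayes terms directly from the four XOR rows): every per-attribute conditional comes out to 0.5, so both classes receive identical scores and the classifier cannot separate them.

```python
from collections import Counter, defaultdict

# XOR truth table: label is 1 iff exactly one input is 1.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

class_counts = Counter(label for _, label in data)
# cond[(i, value, label)] counts attribute i taking `value` within class `label`.
cond = defaultdict(int)
for (x1, x2), label in data:
    cond[(0, x1, label)] += 1
    cond[(1, x2, label)] += 1

def naive_bayes_score(x, label):
    """P(label) * product over i of P(x_i | label), estimated from the counts."""
    score = class_counts[label] / len(data)
    for i, value in enumerate(x):
        score *= cond[(i, value, label)] / class_counts[label]
    return score

for x, true_label in data:
    scores = {label: naive_bayes_score(x, label) for label in (0, 1)}
    print(x, true_label, scores)  # both classes score 0.125 on every row
```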