Week 1 Flashcards
What are the 5 components of a learning problem?
1) Input x in X.
2) Output y in Y.
3) Target function f: X -> Y.
4) The data set of examples (x1, y1)…
5) The learned hypothesis g: X -> Y.
When is ML applicable to a problem? Give 3 conditions.
1) There is a pattern.
2) The pattern cannot be pinned down by analyzing the problem.
3) There is data to learn from.
What is H in ML?
The hypothesis set: the set of candidate formulas under consideration.
What is the role of h(x)?
A functional form that assigns a weight to each component of the input vector.
When is a dataset linearly separable?
There is a choice of parameters that classifies all the training examples correctly.
PLA
perceptron learning algorithm
What is the goal of the perceptron learning algorithm?
Finding a hypothesis that classifies all the data points in data set D correctly.
Supervised learning setting
When the training data contains explicit examples of what the correct output should be for given inputs.
What does the hypothesis space consist of for k-nearest neighbors?
Nearly all functions from inputs to outputs.
Active learning
The data set is acquired by the learner through asking for a label for specific entries.
What is the standard formula for h(x)?
h(x) = sign(w^T x)
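As a minimal sketch (the weights and input are made-up illustrations), the hypothesis can be evaluated with NumPy, where x includes the bias coordinate x0 = 1:

```python
import numpy as np

def h(w, x):
    # perceptron hypothesis: the sign of the weighted sum w^T x
    return np.sign(np.dot(w, x))

# illustrative weights and input; x0 = 1 is the bias coordinate
w = np.array([-1.0, 0.5, 0.5])
x = np.array([1.0, 2.0, 2.0])
print(h(w, x))  # -1 + 1 + 1 = 1, so the sign is +1
```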
Online learning
The data set is given to the algorithm one example at a time. Learning takes place as data becomes available.
Transfer learning
When training an algorithm on data results in a model, and that model is used on a new problem or task. It uses the info learned on the first problem to improve on the second one.
What is the update formula for w?
w(t+1) = w(t) + y(t)*x(t), where (x(t), y(t)) is an example that w(t) misclassifies.
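The update rule can be sketched as a full PLA loop; this is a hypothetical implementation on made-up data, not the course's reference code:

```python
import numpy as np

def pla(X, y, max_iters=1000):
    """Perceptron Learning Algorithm (a sketch).

    X: (N, d) inputs with bias coordinate x0 = 1 prepended.
    y: (N,) labels in {-1, +1}. Assumes linearly separable data.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        misclassified = np.where(np.sign(X @ w) != y)[0]
        if misclassified.size == 0:
            return w  # every training point is classified correctly
        i = misclassified[0]
        w = w + y[i] * X[i]  # the update rule: w(t+1) = w(t) + y(t)*x(t)
    return w

# illustrative linearly separable data
X = np.array([[1.0, 2.0, 3.0], [1.0, 3.0, 1.0],
              [1.0, -1.0, -2.0], [1.0, -2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = pla(X, y)
print(np.all(np.sign(X @ w) == y))  # True: all points classified correctly
```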
Reinforcement learning
The training example does not contain the target output, but contains some possible output together with a measure of how good that output is.
in-sample error
E.in(h): The error rate within a sample: the fraction of the data set where h and f disagree.
Example: the mistakes on a practice test.
What is a strong point of the PLA?
It searches an infinitely large set of hypotheses.
What kind of data do you need to use the Perceptron Learning Algorithm?
Linearly separable data.
Give the formula for E.in(h):
In-sample error:
E.in(h) = 1/N * (the number of data points in the sample where h(x) and f(x) disagree).
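A minimal sketch of this formula (the example predictions and labels are illustrative):

```python
import numpy as np

def in_sample_error(h_preds, f_labels):
    # E.in(h) = (1/N) * number of points where h and f disagree
    h_preds = np.asarray(h_preds)
    f_labels = np.asarray(f_labels)
    return float(np.mean(h_preds != f_labels))

print(in_sample_error([1, -1, 1, 1], [1, 1, 1, -1]))  # 2 of 4 disagree -> 0.5
```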
What does the out-of-sample error denote?
How accurately the hypothesis function performs on data it hasn’t seen before.
Example: performance on exam.
What is the deterministic answer to ‘Does the data set D tell us anything outside of D that we didn’t know before?’?
No.
D tells us nothing certain about f outside of D.
What is the probabilistic answer to ‘Does the data set D tell us anything outside of D that we didn’t know before?’?
Yes.
D tells us something likely about f outside of D.
What are the two questions that present the feasibility of learning?
1) Can we make sure that Eout(g) is close enough to Ein(g)?
2) Can we make Ein(g) small enough?
What effect does a more complex H have?
It gives more flexibility in finding some g that fits the data well, leading to small Ein(g).
What effect does a complex f have?
We get a worse value for Ein(g).
Noisy function
A function where the output is not uniquely determined by the input.
How does neighbors-based classification learn?
It does not attempt to construct a general internal model; it simply stores instances of the training data.
What is mu in the marbles-and-vases model?
The proportion of red marbles in the vase.
What is the formula for the line that classifies the datapoints?
w1x1 + w2x2 + b = 0
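A sketch of using this line as a classifier (the coefficients and points are made up for illustration):

```python
def classify(w1, w2, b, x1, x2):
    # the line w1*x1 + w2*x2 + b = 0 splits the plane; the sign of the
    # left-hand side tells which side the point (x1, x2) falls on
    s = w1 * x1 + w2 * x2 + b
    return 1 if s > 0 else -1

print(classify(1, 1, -3, 2, 2))  # 2 + 2 - 3 = 1 > 0 -> class +1
print(classify(1, 1, -3, 0, 0))  # -3 < 0 -> class -1
```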
What is the unknown in the marbles/vases model?
Mu, the proportion of red marbles in the vase.
Unsupervised learning
The training data does not contain any output information at all.
Data mining
A practical field that focuses on finding patterns.
What is nu in the marbles-vases model?
The fraction of red marbles within a random sample of N marbles that you pick from the vase.
What does the Hoeffding Inequality denote?
The maximum probability that the sum of bounded independent random variables deviates from its expected value by more than a given amount.
Example: for a fixed N, allowing a larger tolerance epsilon between E.in & E.out makes the probability of exceeding it lower.
Why is the Hoeffding Inequality important for machine learning?
Through the Hoeffding Inequality, learning (generalizing to unknown data) is made possible without knowing the target function.
This is because the bound depends only on epsilon and N, not on the unknown mu.
What is the Hoeffding Inequality used for?
It quantifies the relationship between nu and mu.
Give the Hoeffding Inequality:
P[|nu - mu| > epsilon] <= 2e^(-2 * epsilon^2 * N)
The probability that the difference between the fraction of red marbles in the random sample and the actual fraction of red marbles in the vase is bigger than epsilon,
is smaller than or equal to the right-side.
Epsilon is a positive tolerance we choose: how much nu and mu are allowed to differ.
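The bound can be checked numerically with a small simulation of the vase model (mu, N and epsilon are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, N, eps, trials = 0.6, 100, 0.1, 20_000

# nu for each trial: the red fraction in a random sample of N marbles
nus = rng.binomial(N, mu, size=trials) / N
empirical = float(np.mean(np.abs(nus - mu) > eps))
bound = 2 * np.exp(-2 * eps**2 * N)  # right-hand side of the Hoeffding Inequality

print(empirical <= bound)  # True: the empirical frequency respects the bound
```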
What is always true about E.in(h), E.out(h) and epsilon?
P[|E.in(h) - E.out(h)| > epsilon] <= 2e^(-2 * epsilon^2 * N), for any fixed h and any epsilon > 0.
If event B1 implies event B2, thus
B1 -> B2,
then…
The probability of event B1 is smaller or equal to the probability of event B2, thus
P[B1] <= P[B2]
Describe the nearest-neighbors algorithm:
Given an input, search for the closest input we have seen and copy that output.
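A minimal 1-nearest-neighbor sketch of this idea (the training data is illustrative):

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, x):
    # store-and-lookup: find the closest stored input and copy its label
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

X_train = np.array([[0.0, 0.0], [5.0, 5.0]])
y_train = np.array([-1, 1])
print(nearest_neighbor_predict(X_train, y_train, np.array([1.0, 1.0])))  # closest to (0,0) -> -1
```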