Week 1 Flashcards
Name the 3 inputs to the learner
Domain Set, Label Set, Training Data
Define Domain set
An arbitrary set of objects we wish to label
Define Label Set
The set of possible labels, often encoded as numbers such as {0, 1} or {-1, +1}
Define Training Data
A finite sequence of labeled domain points (pairs of an object and its label); this is the input that the learner has access to
Measure of success
The error of a classifier: the probability that it does not predict the correct label on a random data point drawn from the underlying distribution
Define overfitting
When a hypothesis fits the training data too well: it has low error on the training set but high error on new data
Inductive bias
A bias toward a particular set of predictors (a hypothesis class), chosen before seeing the training data
Confidence parameter
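The probability of getting a nonrepresentative sample, written δ; (1 - δ) is the confidence of the prediction
Accuracy parameter
How far the output classifier is allowed to be from the optimal one, written ε; it bounds the error that counts as "approximately correct"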
ERM
Empirical Risk Minimization
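Worked example (ERM): a minimal sketch of the idea, assuming a tiny 1-D sample and a hypothesis class of threshold rules "predict 1 if x >= t". The names empirical_risk, erm_threshold, and the sample values are made up for illustration. The learner picks the hypothesis with the lowest error on the training data.

```python
def empirical_risk(h, sample):
    """Fraction of labeled examples (x, y) that predictor h gets wrong."""
    return sum(h(x) != y for x, y in sample) / len(sample)


def erm_threshold(sample, thresholds):
    """Return the threshold t whose rule 'x >= t -> 1' has the lowest empirical risk."""
    def make_h(t):
        return lambda x: 1 if x >= t else 0

    # Restricting the search to these thresholds is the inductive bias.
    return min(thresholds, key=lambda t: empirical_risk(make_h(t), sample))


# Hypothetical training data: (domain point, label) pairs.
sample = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
best_t = erm_threshold(sample, thresholds=[0.0, 0.25, 0.5, 0.75])
```

Here the empirical risk is only an estimate of the true error; a hypothesis with zero training error can still overfit.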
PAC
Probably Approximately Correct
K-NN
k-Nearest Neighbors
k-NN rule summary
Label a new point by majority vote among its k nearest training points, on the assumption that things that look alike must be alike
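Worked example (k-NN): a minimal sketch, assuming Euclidean distance and a plain majority vote. The function name knn_predict and the toy points are illustrative only; other distance measures and tie-breaking rules are possible.

```python
from collections import Counter
import math


def knn_predict(train, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    # Sort the (point, label) pairs by Euclidean distance to x, keep the k closest.
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two "a" points near the origin, three "b" points near (1, 1).
train = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.2), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 1.1), "b"),
    ((1.2, 0.8), "b"),
]
```

Calling knn_predict(train, (1.0, 0.9), k=3) returns the label shared by the three closest points, which are all "b" here.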
Why do we need parameters for linkage-based clustering algorithms?
Without a stopping criterion, the merging would keep going and eventually produce the trivial clustering: one large cluster containing every point
Name the two parameters needed for linkage-based clustering
A measure (definition) of the distance between clusters, and a criterion that determines when to stop merging
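Worked example (linkage-based clustering): a minimal sketch showing the two parameters in code. It assumes single linkage (minimum pairwise Euclidean distance) as the between-cluster distance and a fixed number of clusters as the stopping criterion; both are just one possible choice, and the function names are illustrative.

```python
import math


def single_linkage_distance(c1, c2):
    """Parameter 1: distance between clusters, here the closest pair of points."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def linkage_clustering(points, num_clusters, cluster_distance=single_linkage_distance):
    """Start with one cluster per point and repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]
    # Parameter 2: stopping criterion, here a target number of clusters.
    while len(clusters) > max(1, num_clusters):
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical points: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = linkage_clustering(points, num_clusters=3)
```

With num_clusters=1 the loop runs until everything is merged, which is the trivial single-cluster result that the stopping criterion is meant to avoid.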