Supervised Learning Flashcards
The agent is given input/output pairs and learns a function that maps from inputs to outputs
Supervised learning
Input output pairs
Data
One row of input data is a
Feature vector
The outputs corresponding to each feature vector are
Target labels
Data fed to the agent
Training data
Process of learning
Training
Given that everything else is equal, choose the less complex hypothesis
Occam’s (Ockham’s) razor
Downside of overgeneralizing
The function will group parts of the data that are not actually related
Downside of overfitting
The function will only be correct on the TRAINING DATA and not on new data
Document representation scheme that counts the frequency of words in a set of documents
Bag of words
Number of unique words across all samples
Dictionary size
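The two cards above can be sketched in code. This is a minimal, illustrative bag-of-words implementation (the function and variable names are my own, not from the cards): each document becomes a vector of word counts, and the dictionary size is the number of unique words across all samples.

```python
from collections import Counter

def bag_of_words(documents):
    """Map each document to a vector of word counts over the shared dictionary."""
    # Dictionary = all unique words across all samples; sorted for a stable order.
    dictionary = sorted({word for doc in documents for word in doc.split()})
    vectors = []
    for doc in documents:
        counts = Counter(doc.split())
        # One count per dictionary word; 0 for words absent from this document.
        vectors.append([counts.get(word, 0) for word in dictionary])
    return dictionary, vectors

docs = ["buy cheap pills", "cheap cheap offer"]
dictionary, vectors = bag_of_words(docs)
# dictionary -> ['buy', 'cheap', 'offer', 'pills']  (dictionary size = 4)
# vectors    -> [[1, 1, 0, 1], [0, 2, 1, 0]]
```

Note that word order within a document is deliberately discarded; only frequencies survive.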
Which is better for classifying email: matching against the EXACT EMAIL, or COMPUTING the probability of the words in an email?
The latter
Accommodating ONLY THE WORDS in the TRAINING DATA
Overfitting
Introduces k fake counts per word to prevent overfitting
Laplace smoothing
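A minimal sketch of the card above, assuming word probabilities estimated from counts (the function name and example data are illustrative): adding k fake counts to every dictionary word keeps unseen words from getting probability zero.

```python
def laplace_smoothed_prob(word, counts, k, dictionary_size):
    """P(word) with k fake occurrences added to every word in the dictionary."""
    total = sum(counts.values())
    # Numerator adds k fake counts for this word; denominator adds k
    # for each of the dictionary_size words so probabilities still sum to 1.
    return (counts.get(word, 0) + k) / (total + k * dictionary_size)

counts = {"cheap": 3, "offer": 1}        # observed word counts (total = 4)
p_seen = laplace_smoothed_prob("cheap", counts, k=1, dictionary_size=4)
p_unseen = laplace_smoothed_prob("hello", counts, k=1, dictionary_size=4)
# p_seen   -> (3 + 1) / (4 + 4) = 0.5
# p_unseen -> (0 + 1) / (4 + 4) = 0.125, not 0
```

With k = 0 this reduces to the raw (overfit) maximum-likelihood estimate; larger k pulls all words toward a uniform distribution.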
Used to find the best value for the smoothing factor by PARTITIONING the training data
Cross validation
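The partitioning idea behind the last card can be sketched as k-fold splitting (the function name is illustrative): each fold is held out once for validation while the remaining folds form the training data, and the smoothing factor that scores best across folds would be chosen.

```python
def k_fold_splits(data, k):
    """Partition data into k folds; yield (training, validation) per fold."""
    fold_size = len(data) // k
    for i in range(k):
        start, stop = i * fold_size, (i + 1) * fold_size
        validation = data[start:stop]          # this fold is held out
        training = data[:start] + data[stop:]  # all other folds train
        yield training, validation

data = list(range(6))
splits = list(k_fold_splits(data, 3))
# splits[0] -> ([2, 3, 4, 5], [0, 1])
# splits[1] -> ([0, 1, 4, 5], [2, 3])
# splits[2] -> ([0, 1, 2, 3], [4, 5])
```

Every sample appears in exactly one validation set, so the whole training set contributes to the evaluation without ever being validated on data it was trained with.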