Supervised Learning Flashcards
Agent is given input-output (IO) pairs and learns a function that maps from I to O
Supervised learning
Input-output pairs
Data
One row of input data is a
Feature vector
Outputs of each feature vector are
Target labels
Data fed to the agent
Training data
Process of learning
Training
Given that everything else is equal, choose the less complex hypothesis
Occam's / Ockham's razor
Downside of overgeneralizing
Function will group parts of data that are not actually related
Downside of overfitting
Function will only be correct on TRAINING DATA and not on new data
Document representation scheme that counts the frequency of words in a set of documents
Bag of words
Number of unique words across all samples
Dictionary size
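A minimal sketch of the bag-of-words representation and dictionary size from the two cards above (the function name, tokenization by whitespace, and example documents are illustrative assumptions, not from the cards):

```python
from collections import Counter

def bag_of_words(documents):
    """Count word frequencies per document and build the shared dictionary."""
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    # Dictionary size = number of unique words across all samples.
    dictionary = sorted(set(word for doc in tokenized for word in doc))
    # Each document becomes a vector of word counts over the dictionary.
    counts = [Counter(doc) for doc in tokenized]
    vectors = [[c[word] for word in dictionary] for c in counts]
    return dictionary, vectors

dictionary, vectors = bag_of_words(["offer secret offer", "play sport today"])
print(len(dictionary))  # dictionary size: 5 unique words
```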
Which is better for classifying spam: matching against the EXACT EMAIL, or COMPUTING the probability of the words in an email?
The latter
Accommodating ONLY THE WORDS in the TRAINING DATA
Overfitting
Introduces k fake data to prevent overfitting
Laplace smoothing
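As a hedged sketch of the card above (assuming a word-probability spam filter of the kind the earlier cards describe; the function and parameter names are illustrative), Laplace smoothing adds k fake counts for every dictionary word so that a word unseen in training never gets zero probability:

```python
def smoothed_word_prob(word_count, total_words, dictionary_size, k=1):
    """P(word | class) with Laplace smoothing: add k fake occurrences of every word."""
    return (word_count + k) / (total_words + k * dictionary_size)

# A word never seen in the spam class still gets a small nonzero probability.
print(smoothed_word_prob(word_count=0, total_words=9, dictionary_size=12, k=1))
```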
Used to find the best value for the smoothing factor by PARTITIONING the training data
Cross validation
What percentage of the data set will be used for training the agent
80
What percentage will be used to compute the value of the SMOOTHING FACTOR k
10
What percentage will be used to TEST whether the parameters are correct and check the accuracy of the agent
10
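A minimal sketch of the 80/10/10 split described in the cards above (the proportions follow the cards; the helper name and shuffling seed are illustrative assumptions):

```python
import random

def split_dataset(examples, seed=0):
    """Split labeled examples into 80% train, 10% cross-validation, 10% test."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    train = shuffled[: int(0.8 * n)]                    # used to train the agent
    cross_val = shuffled[int(0.8 * n): int(0.9 * n)]    # used to pick the smoothing factor k
    test = shuffled[int(0.9 * n):]                      # used once to check final accuracy
    return train, cross_val, test
```

Different values of k are compared on the cross-validation slice; the test slice is only used at the end to report accuracy.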
Examples of advanced spam filter features
Knowing the IP address of the sender; past email exchanges with the sender; whether the email is in all caps; whether the URLs point to where they claim; whether the body and other details are consistent
Spam filtering problem has a ___ number of target labels
Discrete
Fits a curve of a certain degree to a given set of TRAINING DATA
Regression
Fits a line to a given training data
Linear regression
Formula for linear regression
f(x) = w1x + w0
Linear regression chooses the weights w1 and w0 that minimize the
Loss function
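A hedged sketch of fitting f(x) = w1x + w0 by minimizing the squared-error loss, using the standard closed-form least-squares solution for one input feature (the function name and example data are illustrative assumptions):

```python
def fit_linear_regression(xs, ys):
    """Find w1, w0 minimizing the squared-error loss sum((y - (w1*x + w0))**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for a single input feature.
    w1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    w0 = mean_y - w1 * mean_x
    return w1, w0

w1, w0 = fit_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(w1, w0)  # 2.0 and 1.0 for training data generated by y = 2x + 1
```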