Supervised Learning Flashcards
The agent is given input/output pairs and learns a function that maps from I to O
Supervised learning
Input output pairs
Data
One row of input data is a
Feature vector
Outputs of each feature vector are
Target labels
Data fed to the agent
Training data
Process of learning
Training
Given that everything else is equal, choose the less complex hypothesis
Occam's / Ockham's razor
Downside of overgeneralizing
Function will group parts of data that are not actually related
Downside of overfitting
Function will only be correct on TRAINING DATA and not on new data
Document representation scheme that counts the frequency of words in a set of documents
Bag of words
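A minimal bag-of-words sketch; the two sample documents are illustrative assumptions, not from the flashcards:

```python
# Bag of words: count word frequencies across a set of documents,
# ignoring word order.
from collections import Counter

docs = ["offer secret offer", "play sport today"]
counts = Counter()
for doc in docs:
    counts.update(doc.split())

dictionary_size = len(counts)  # number of unique words across all samples
print(counts["offer"])         # 2
print(dictionary_size)         # 5
```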
Number of unique words across all samples
Dictionary size
Which is better: classifying by looking at the EXACT EMAIL, or by COMPUTING the probability of the words in an email?
The latter
Accommodating ONLY THE WORDS in the TRAINING DATA
Overfitting
Introduces k fake data points to prevent overfitting
Laplace smoothing
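A sketch of Laplace (add-k) smoothing; the word count, total, k, and dictionary size below are illustrative assumptions:

```python
# Laplace smoothing: add k fake occurrences of every dictionary word
# so an unseen word never gets probability zero.
def smoothed_prob(word_count, total_words, k, dict_size):
    return (word_count + k) / (total_words + k * dict_size)

# a word never seen in this class still gets a small nonzero probability
p = smoothed_prob(word_count=0, total_words=9, k=1, dict_size=12)
print(p)  # 1/21
```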
Used to find the best value for the smoothing factor by PARTITIONING the training data
Cross validation
What percent of the data set will be used for TRAINING the agent
80
What percent will be used to compute the value of the SMOOTHING FACTOR k
10
What percent will be used to TEST the parameters and check the accuracy of the agent
10
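The 80/10/10 partition above can be sketched as follows; the 100-item data list is a placeholder standing in for the real data set:

```python
# Sketch of the 80/10/10 split used for cross-validation.
# `data` is an assumed placeholder, not real training data.
data = list(range(100))

n = len(data)
train = data[: int(0.8 * n)]                    # 80% — train the agent
validation = data[int(0.8 * n): int(0.9 * n)]   # 10% — tune the smoothing factor k
test = data[int(0.9 * n):]                      # 10% — final accuracy check

print(len(train), len(validation), len(test))   # 80 10 10
```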
Examples of advanced spam filters
Knowing the IP address of the sender; past email exchanges with the sender; is the email in all caps; do the URLs point to where they say; are the body and other details consistent
Spam filtering problem has a ___ number of target labels
Discrete
Fits a curve to a certain degree to a given set of TRAINING DATA
Regression
Fits a line to a given training data
Linear regression
Formula for linear regression
f(x) = w1x + w0
The linear regression formula aims to minimize the
Loss function
Computes the RESIDUAL ERROR after fitting the linear function
Loss function
Algorithm that uses linear functions for classification
Perceptron algorithm
Earliest model of the human neuron
Perceptron
Perceptron algorithm aims to find a ___ that will divide inputs into classes
Separator
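A sketch of the perceptron update rule finding such a separator; the four labeled points are a made-up, linearly separable toy set:

```python
# Perceptron sketch: nudge the weights toward each misclassified point
# until a separating line w1*x1 + w2*x2 + b = 0 is found.
def train_perceptron(samples, epochs=10):
    # samples: list of ((x1, x2), label) with labels in {-1, +1}
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in samples:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:  # misclassified
                w[0] += y * x1
                w[1] += y * x2
                b += y
    return w, b

data = [((2, 1), 1), ((1, 3), 1), ((-1, -1), -1), ((-2, 0), -1)]
w, b = train_perceptron(data)
# check every point lands on the correct side of the separator
ok = all(y * (w[0] * x[0] + w[1] * x[1] + b) > 0 for x, y in data)
```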
Methods whose parameters are finite and independent of the training size
Parametric methods
Methods whose parameters grow as the number of training data increases
Nonparametric methods
Why is the k-nearest-neighbors algorithm tedious if the data set is large?
The complexity of the search also increases
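A brute-force k-nearest-neighbors sketch; the training points and labels are assumptions. The linear scan over all training data is exactly why the search grows more expensive as the data set grows:

```python
# k-NN sketch: classify a query point by majority vote among the
# k closest training points (brute-force distance search).
from collections import Counter

def knn_classify(train, query, k=3):
    # train: list of ((x1, x2), label)
    nearest = sorted(
        train,
        key=lambda p: (p[0][0] - query[0]) ** 2 + (p[0][1] - query[1]) ** 2,
    )
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b")]
print(knn_classify(train, (0.5, 0.5)))  # a
```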
One of the problems that can be solved using supervised learning wherein an agent analyzes example SPAM emails and HAM emails
Spam filtering
Number of occurrences of Ham messages in the data set over the total number of messages
P(Ham)
Probability that the email occurs in the Spam data set
P(email|Spam)
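Under the naive independence assumption, P(email|Spam) is the product of the per-word probabilities; the word probabilities below are made-up assumptions:

```python
# Naive Bayes sketch: P(email|Spam) = product of P(word|Spam)
# for each word in the email (assumes words are independent).
from math import prod

p_word_given_spam = {"offer": 0.5, "secret": 0.3, "sport": 0.1}

def p_email_given_spam(words):
    return prod(p_word_given_spam[w] for w in words)

p = p_email_given_spam(["offer", "secret"])  # 0.5 * 0.3
```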
A case of overfitting in spam filtering
Naive Bayes classification
Formula for w0
w0 = (1/N)(sum of y) - (w1/N)(sum of x)
Formula of w1
w1 = [N(sum of xy) - (sum of x)(sum of y)] / [N(sum of x^2) - (sum of x)^2]
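The two closed-form formulas above can be checked numerically; the (x, y) points are an illustrative assumption chosen to lie exactly on y = 2x + 1:

```python
# Closed-form linear regression: compute w1 and w0 from the sums
# in the flashcard formulas.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
N = len(xs)

sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)

w1 = (N * sxy - sx * sy) / (N * sxx - sx * sx)
w0 = sy / N - (w1 / N) * sx
print(w1, w0)  # 2.0 1.0
```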