Supervised Learning Flashcards

1
Q

An agent is given input-output pairs and learns a function that maps inputs to outputs

A

Supervised learning

2
Q

Input-output pairs

A

Data

3
Q

One row of input data is a

A

Feature vector

4
Q

Outputs of each feature vector are

A

Target labels

5
Q

Data fed to the agent

A

Training data

6
Q

Process of learning

A

Training

7
Q

Given that everything else is equal, choose the less complex hypothesis

A

Occam's (Ockham's) razor

8
Q

Downside of overgeneralizing

A

Function will group parts of data that are not actually related

9
Q

Downside of overfitting

A

Function will only be correct on TRAINING DATA and not on new data

10
Q

Document representation scheme that counts the frequency of words in a set of documents

A

Bag of words

11
Q

Number of unique words across all samples

A

Dictionary size
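
A minimal sketch of a bag of words and its dictionary size, assuming whitespace tokenization and lowercasing. The function name `bag_of_words` and the three sample documents are hypothetical, made up for illustration.

```python
from collections import Counter

def bag_of_words(documents):
    """Count how often each word occurs across a list of documents."""
    counts = Counter()
    for doc in documents:
        # Assumption: words are separated by whitespace; case is ignored.
        counts.update(doc.lower().split())
    return counts

# Hypothetical sample documents.
docs = ["offer is secret", "click secret link", "secret sports link"]
counts = bag_of_words(docs)

# Dictionary size = number of unique words across all samples.
dictionary_size = len(counts)
```

Word order is discarded: only the counts survive, which is why the representation is called a "bag".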

12
Q

Which is better: classifying by matching the EXACT EMAIL, or by COMPUTING the probability of the words in an email?

A

The latter

13
Q

Accommodating ONLY THE WORDS in the TRAINING DATA

A

Overfitting

14
Q

Introduces k fake data points to prevent overfitting

A

Laplace smoothing
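
A minimal sketch of Laplace smoothing, assuming the common form (count + k) / (total + k * number of outcomes). The helper name `laplace_prob` and the example counts are hypothetical.

```python
def laplace_prob(count, total, k, num_outcomes):
    """Smoothed estimate: add k fake observations to every outcome."""
    return (count + k) / (total + k * num_outcomes)

# Hypothetical example: 3 spam messages out of 8, two classes, k = 1.
p_spam = laplace_prob(3, 8, k=1, num_outcomes=2)    # (3 + 1) / (8 + 2)

# An outcome never seen in training no longer gets probability zero.
p_unseen = laplace_prob(0, 8, k=1, num_outcomes=2)  # (0 + 1) / (8 + 2)
```

With k = 0 the estimate falls back to the plain frequency count / total.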

15
Q

Used to find the best value for the smoothing factor by PARTITIONING the training data

A

Cross validation

16
Q

What percentage of the data set is used to train the agent?

A

80

17
Q

What percentage is used to compute the value of the SMOOTHING FACTOR k?

A

10

18
Q

What percentage is used to TEST whether the parameters are correct and to check the accuracy of the agent?

A

10
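
A minimal sketch of the 80/10/10 partition from the three cards above, assuming the data is shuffled first. The function name `split_80_10_10` and the fixed seed are hypothetical choices for illustration.

```python
import random

def split_80_10_10(samples, seed=0):
    """Shuffle, then split into training (80%), validation (10%), test (10%)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n = len(shuffled)
    n_train = n * 8 // 10
    n_val = n // 10
    train = shuffled[:n_train]                      # used to train the agent
    validation = shuffled[n_train:n_train + n_val]  # used to pick the smoothing factor k
    test = shuffled[n_train + n_val:]               # used only for the final accuracy check
    return train, validation, test

train, validation, test = split_80_10_10(range(100))
```

Keeping the test portion untouched until the end is what makes its accuracy figure an honest estimate for new data.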

19
Q

Examples of signals used by advanced spam filters

A
Known IP address of the sender
Past email exchanges with the sender
Is the email in all caps?
Do the URLs point to where they say they do?
Are the body and other details consistent?
20
Q

Spam filtering problem has a ___ number of target labels

A

Discrete

21
Q

Fits a curve of a certain degree to a given set of TRAINING DATA

A

Regression

22
Q

Fits a line to a given set of training data

A

Linear regression

23
Q

Formula for linear regression

A

f(x) = w1 * x + w0

24
Q

Linear regression aims to minimize the

A

Loss function

25
Q

Computes the RESIDUAL ERROR after fitting the linear function

A

Loss function
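
A minimal sketch of a loss function for the line f(x) = w1 * x + w0, assuming the loss is the sum of squared residuals (the card does not say which loss is meant). The name `squared_loss` is hypothetical.

```python
def squared_loss(xs, ys, w0, w1):
    """Sum of squared residuals between targets y and predictions w1 * x + w0."""
    return sum((y - (w1 * x + w0)) ** 2 for x, y in zip(xs, ys))

# A line that passes through every point has zero loss.
perfect = squared_loss([1, 2, 3], [3, 5, 7], w0=1, w1=2)

# One point lies 1 unit off the line, so the loss is 1 ** 2.
off_by_one = squared_loss([1, 2], [3, 6], w0=1, w1=2)
```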

26
Q

Algorithm that uses linear functions for classification

A

Perceptron algorithm

27
Q

Earliest model of the human neuron

A

Perceptron

28
Q

Perceptron algorithm aims to find a ___ that will divide inputs into classes

A

Separator
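
A minimal sketch of the perceptron update rule, assuming labels in {0, 1}, a fixed learning rate, and linearly separable data. The names `train_perceptron` and `predict`, and the toy data set, are hypothetical.

```python
def predict(w, b, x):
    """Classify x as 1 if it lies on the positive side of the separator, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(points, labels, lr=1.0, epochs=100):
    """Nudge the linear separator after every misclassified point."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            error = y - predict(w, b, x)  # -1, 0, or +1
            if error != 0:
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
                mistakes += 1
        if mistakes == 0:  # every point is on the correct side: done
            break
    return w, b

# Hypothetical toy data: only the point (1, 1) belongs to class 1.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
w, b = train_perceptron(points, labels)
```

If the classes cannot be separated by a line, this loop simply stops after `epochs` passes without finding a separator.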

29
Q

Methods whose number of parameters is finite and independent of the training set size

A

Parametric methods

30
Q

Methods whose parameters grow as the amount of training data increases

A

Nonparametric methods

31
Q

Why is the k-nearest neighbors algorithm tedious when the data set is large?

A

The complexity of the neighbor search also increases
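
A minimal sketch of brute-force k-nearest neighbors, assuming Euclidean distance and majority voting. It shows why the search gets expensive: every query measures its distance to every stored training point. The name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query by majority vote among its k closest training points."""
    # The whole training set is scanned for each query, so the cost of one
    # prediction grows with the number of stored points.
    distances = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters.
train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(train_points, train_labels, (0.5, 0.5))
near_b = knn_predict(train_points, train_labels, (5.5, 5.5))
```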

32
Q

One of the problems that can be solved using supervised learning wherein an agent analyzes example SPAM emails and HAM emails

A

Spam filtering

33
Q

Number of occurrences of Ham messages in the data set over the total number of messages

A

P(Ham)

34
Q

Probability that the email occurs in the Spam data set

A

P(email|Spam)
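
A minimal sketch of P(Ham) and P(email|Spam), assuming whitespace tokenization and that words are independent given the class (the naive Bayes assumption). The messages and the helper names `prior` and `email_likelihood` are hypothetical.

```python
from collections import Counter

# Hypothetical toy data set, invented for illustration only.
spam = ["offer is secret", "click secret link", "secret sports link"]
ham = ["play sports today", "went play sports", "secret sports event",
       "sports is today", "sports costs money"]

def prior(class_messages, all_messages):
    """P(class): messages of that class over the total number of messages."""
    return len(class_messages) / len(all_messages)

def email_likelihood(email, class_messages):
    """P(email | class) as a product of per-word frequencies in that class."""
    counts = Counter(w for m in class_messages for w in m.split())
    total = sum(counts.values())
    p = 1.0
    for word in email.split():
        p *= counts[word] / total  # an unseen word contributes a factor of 0
    return p

p_ham = prior(ham, spam + ham)                              # 5 / 8
p_email_given_spam = email_likelihood("secret link", spam)  # (3 / 9) * (2 / 9)
```

Without smoothing, a single word that never appeared in the spam messages drives the whole product to zero.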

35
Q

A case of overfitting in spam filtering

A

Naive bayes classification

36
Q

Formula for w0

A

w0 = (1/N)(sum of y) - (w1/N)(sum of x)

37
Q

Formula of w1

A

w1 = [N(sum of xy) - (sum of x)(sum of y)] / [N(sum of x^2) - (sum of x)^2]
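
A minimal sketch that applies the closed-form formulas for w1 and w0 to fit f(x) = w1 * x + w0. The function name `fit_line` and the sample points are hypothetical; the points were chosen to lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Return (w0, w1) for the least-squares line through the points."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # w1 = [N(sum of xy) - (sum of x)(sum of y)] / [N(sum of x^2) - (sum of x)^2]
    w1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # w0 = (1/N)(sum of y) - (w1/N)(sum of x)
    w0 = sum_y / n - w1 * sum_x / n
    return w0, w1

w0, w1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

The denominator is zero when all x values are identical, so this sketch assumes at least two distinct x values.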