18 - Semi-supervised learning Flashcards
What is the basis behind active learning?
The learner has access to unlabelled data and requests labels from an “oracle” (e.g. a human annotator)
What is query synthesis?
Generates (constructs) a new, synthetic instance based on existing data
Query the synthesised instance
Add it as labelled data
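A minimal sketch of one query-synthesis step, assuming synthesis by interpolating between two existing instances; the synthesis rule and the oracle function are illustrative assumptions, not part of the cards:

```python
import numpy as np

def query_synthesis_step(X_existing, oracle, rng=None):
    """Construct a synthetic instance from existing data, query its label
    from the oracle, and return it as a new labelled example."""
    rng = rng or np.random.default_rng(0)
    i, j = rng.choice(len(X_existing), size=2, replace=False)
    x_new = (X_existing[i] + X_existing[j]) / 2.0   # synthesise a new (pseudo) instance
    y_new = oracle(x_new)                           # query its label
    return x_new, y_new                             # add (x_new, y_new) as labelled data
```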
What is stream-based sampling?
Works on sequential data, one instance at a time; doesn’t use synthetic data
Observe an instance
Flag certain samples for query
Add it as labelled data
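A minimal sketch of stream-based sampling, assuming a model that exposes class probabilities and a fixed confidence threshold for flagging queries (both assumptions for illustration):

```python
import numpy as np

def stream_sample(instance_stream, predict_proba, oracle, threshold=0.6):
    """Observe instances one at a time; flag an instance for query when the
    model's top class probability falls below the threshold, then add the
    oracle's label to the labelled set."""
    labelled = []
    for x in instance_stream:
        probs = predict_proba(x)                  # class probabilities for this instance
        if np.max(probs) < threshold:             # model is unsure -> flag for query
            labelled.append((x, oracle(x)))       # query the oracle, add as labelled data
    return labelled
```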
What is pool-based sampling?
Review all data
Select the best instances from the entire pool of data to be queried
Add them as labelled data
What is the problem with query synthesis?
A human annotator might not recognise the synthetic (pseudo) instance and so may be unable to label it
What are active learning approaches?
Query synthesis
Stream-based sampling
Pool-based sampling
What are query strategies?
Uncertainty sampling:
- Least confident
- Margin sampling
Query by committee:
- Vote entropy
What are applications of uncertainty sampling?
Speech recognition
Machine translation
Text classification
Word segmentation
What is the least confident, uncertainty sampling method?
Query instances where the classifier is least confident in its prediction: x* = argmin_x P_theta(y_hat|x),
where y_hat = argmax_y P_theta(y|x) is the most likely label under the model
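A minimal pool-based sketch of least-confident sampling, assuming the classifier's probabilities P_theta(y|x) are already collected in a matrix (function and variable names are illustrative):

```python
import numpy as np

def least_confident_query(probs):
    """probs: (n_instances, n_classes) array of P_theta(y|x) over the pool.
    Returns the index of x* = argmin_x P_theta(y_hat|x), i.e. the instance
    whose most likely label has the lowest probability."""
    top_prob = probs.max(axis=1)      # P_theta(y_hat|x) for each instance
    return int(np.argmin(top_prob))   # least confident instance

# Instance 1 is queried: its best guess only has probability 0.55
probs = np.array([[0.9, 0.1],
                  [0.55, 0.45],
                  [0.8, 0.2]])
print(least_confident_query(probs))   # -> 1
```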
What is the margin sampling, uncertainty sampling method?
x* = argmin_x(P_theta(y_1|x) - P_theta(y_2|x))
where y_1 is the most likely label and y_2 is the second most likely label
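A corresponding sketch for margin sampling under the same assumed probability-matrix setup:

```python
import numpy as np

def margin_query(probs):
    """probs: (n_instances, n_classes) array of P_theta(y|x) over the pool.
    Returns the index of the instance with the smallest margin
    P_theta(y_1|x) - P_theta(y_2|x) between its two most likely labels."""
    sorted_probs = np.sort(probs, axis=1)[:, ::-1]     # probabilities, descending per row
    margin = sorted_probs[:, 0] - sorted_probs[:, 1]   # most likely minus second most likely
    return int(np.argmin(margin))                      # smallest margin -> most ambiguous
```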
What is the query by committee strategy?
Use multiple classifiers to predict on unlabelled data and select the instances with the highest disagreement.
Disagreement is measured with vote entropy: x* = argmax_x(-sum_i (V(y_i)/N) * log2(V(y_i)/N)), where V(y_i) is the number of committee votes for label y_i and N is the number of committee members
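A minimal sketch of vote entropy, assuming each committee member's predicted labels are available as an array (names are illustrative):

```python
import numpy as np
from collections import Counter

def vote_entropy_query(committee_votes):
    """committee_votes: (n_members, n_instances) array of predicted labels.
    Returns the index of the instance with the highest vote entropy,
    i.e. the one the committee disagrees on most."""
    n_members, n_instances = committee_votes.shape
    entropies = []
    for i in range(n_instances):
        counts = Counter(committee_votes[:, i].tolist())   # V(y_i) for each label
        p = np.array(list(counts.values())) / n_members    # V(y_i) / N
        entropies.append(-np.sum(p * np.log2(p)))          # vote entropy for instance i
    return int(np.argmax(entropies))
```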
What is semi-supervised learning?
Learning that utilises a combination of labelled and unlabelled data
What is the simple approach to semi-supervised learning?
Combine a supervised and an unsupervised learning model, e.g. cluster with k-means and assign each cluster the label of its most populous labelled class
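A sketch of that simple combination, assuming scikit-learn's KMeans for clustering and a majority vote over the labelled points in each cluster; everything apart from KMeans is an illustrative assumption:

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def kmeans_pseudo_label(X_labelled, y_labelled, X_unlabelled, n_clusters):
    """Cluster labelled and unlabelled data together, then give each
    unlabelled point the most common label among the labelled points in
    its cluster (y_labelled is a 1-D numpy array)."""
    X_all = np.vstack([X_labelled, X_unlabelled])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_all)
    labelled_clusters = km.labels_[:len(X_labelled)]
    unlabelled_clusters = km.labels_[len(X_labelled):]

    # Majority (most populous) labelled class within each cluster
    cluster_label = {}
    for c in range(n_clusters):
        members = y_labelled[labelled_clusters == c]
        if len(members) > 0:
            cluster_label[c] = Counter(members.tolist()).most_common(1)[0][0]

    # Pseudo-labels for the unlabelled points; clusters with no labelled points get None
    return [cluster_label.get(c) for c in unlabelled_clusters]
```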
What is the difference between active and semi-supervised learning?
In semi-supervised learning, the algorithm automatically generates new (pseudo-)labels for unlabelled data, whereas in active learning the algorithm selects unlabelled instances and queries an oracle for their labels
What is the main assumption of self-training?
Similar instances are likely to have the same label
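Building on that assumption, a minimal self-training (pseudo-labelling) loop; the scikit-learn-style classifier interface and the confidence threshold are assumptions for illustration:

```python
import numpy as np

def self_train(clf, X_lab, y_lab, X_unlab, threshold=0.95, max_iters=10):
    """Repeatedly fit on the labelled set, pseudo-label the unlabelled
    instances the model is most confident about (assumed similar to the
    labelled ones), and move them into the labelled set."""
    for _ in range(max_iters):
        if len(X_unlab) == 0:
            break
        clf.fit(X_lab, y_lab)
        probs = clf.predict_proba(X_unlab)
        confident = probs.max(axis=1) >= threshold                # high-confidence predictions
        if not confident.any():
            break
        pseudo_y = clf.classes_[probs[confident].argmax(axis=1)]  # pseudo-labels
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, pseudo_y])
        X_unlab = X_unlab[~confident]                             # remove newly labelled points
    return clf.fit(X_lab, y_lab)
```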