18-Semi supervised learning Flashcards

1
Q

What is the basis behind active learning?

A

The learner has access to unlabelled data and requests labels for chosen instances from an “oracle” (e.g. a human annotator)

2
Q

What is query synthesis?

A

The learner generates (constructs) a new, synthetic instance based on existing data
Query the oracle for its label
Add it as labelled data

3
Q

What is stream-based sampling?

A

Instances arrive sequentially and are considered one at a time. No synthetic data is used.

Observe an instance
Flag certain samples for query
Add it as labelled data
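
The three steps above can be sketched as a single pass over a stream. This is a minimal illustration, not a definitive implementation: the names `stream_based_sampling`, `uncertainty`, `oracle`, and `threshold` are hypothetical, and the toy model assumes a 1-D classifier whose decision boundary sits at 0.5.

```python
def stream_based_sampling(stream, uncertainty, oracle, threshold):
    """Observe instances one at a time and query only the uncertain ones."""
    labelled = []
    for x in stream:
        # Flag the instance for query if the model is uncertain enough about it.
        if uncertainty(x) >= threshold:
            # Ask the oracle for the label and add it as labelled data.
            labelled.append((x, oracle(x)))
    return labelled


# Toy setup (assumed for illustration): uncertainty peaks at the boundary 0.5.
def toy_uncertainty(x):
    return 1.0 - 2.0 * abs(x - 0.5)


def toy_oracle(x):
    return x >= 0.5


stream = [0.1, 0.48, 0.9, 0.55, 0.3]
queried = stream_based_sampling(stream, toy_uncertainty, toy_oracle, threshold=0.8)
```

Only the instances close to the boundary (0.48 and 0.55) are sent to the oracle; the rest are discarded without a query.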

4
Q

What is pool-based sampling?

A

Review all data
Select best instances from entire pool of data to be queried
Add it as labelled data
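
One round of the procedure above might look like the sketch below. The function `pool_based_round` and its `score` and `oracle` arguments are hypothetical names chosen for illustration; "best" is assumed here to mean the instance with the highest informativeness score.

```python
def pool_based_round(pool, labelled, score, oracle):
    """Run one query round: pick the best instance in the pool and label it."""
    # Review all data: score every instance still in the pool.
    best = max(pool, key=score)
    # Query the oracle and move the instance from the pool to the labelled set.
    pool.remove(best)
    labelled.append((best, oracle(best)))
    return best


# Toy setup (assumed): instances nearest the boundary 0.5 score highest.
def toy_score(x):
    return -abs(x - 0.5)


def toy_oracle(x):
    return x >= 0.5


pool = [0.1, 0.45, 0.9, 0.6]
labelled = []
chosen = pool_based_round(pool, labelled, toy_score, toy_oracle)
```

Unlike the stream-based setting, every instance in the pool is compared before a single query is made, which costs more computation per round but avoids spending queries on merely "uncertain enough" instances.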

5
Q

What is the problem with query synthesis?

A

Human annotator might not recognise the pseudo instance

6
Q

What are active learning approaches?

A

Query synthesis
Stream based sampling
Pool based sampling

7
Q

What are query strategies?

A

Uncertainty sampling:
- Least confident
- Margin sampling
Query by committee:
- Vote entropy

8
Q

What are applications of uncertainty sampling?

A

Speech recognition
Machine translation
Text classification
Word segmentation

9
Q

What is the least confident method of uncertainty sampling?

A

Query the instance whose most likely label the classifier is least confident about: x* = argmin_x(P_theta(y^|x)),
where y^ = argmax_y(P_theta(y|x)) is the most probable label for x
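
As a small sketch of the least confident rule: take each instance's highest class probability, then query the instance where that value is lowest. The function name `least_confident` and the probability table are made up for illustration; in practice the probabilities would come from the current classifier.

```python
def least_confident(probs):
    """Return the index of the instance whose top-class probability is lowest."""
    # Confidence of each instance = probability of its most likely label.
    top = [max(p) for p in probs]
    # Query the instance with the smallest such confidence.
    return min(range(len(probs)), key=lambda i: top[i])


# Hypothetical class probabilities for three unlabelled instances.
probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
]
query_index = least_confident(probs)
```

Here the second instance (index 1) is selected, because its best guess has a probability of only 0.40.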

10
Q

What is the margin sampling method of uncertainty sampling?

A

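Query the instance with the smallest gap between its two most likely labels: x* = argmin_x(P_theta(y_1|x) - P_theta(y_2|x))

where y_1 is the most likely label and y_2 is the second most likely label for x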
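
A minimal sketch of margin sampling, assuming the classifier's class probabilities are already available as plain lists. The name `margin_sampling` and the example numbers are illustrative only.

```python
def margin_sampling(probs):
    """Return the index of the instance with the smallest top-two margin."""
    def margin(p):
        # Difference between the most likely and second most likely labels.
        first, second = sorted(p, reverse=True)[:2]
        return first - second

    return min(range(len(probs)), key=lambda i: margin(probs[i]))


# Hypothetical class probabilities for three unlabelled instances.
probs = [
    [0.90, 0.05, 0.05],  # margin 0.85
    [0.40, 0.35, 0.25],  # margin 0.05
    [0.45, 0.44, 0.11],  # margin 0.01
]
query_index = margin_sampling(probs)
```

The third instance (index 2) is chosen. A least confident rule would instead pick index 1, since its top probability (0.40) is lower, which shows how the two strategies can disagree when there are more than two classes.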

11
Q

What is the query by committee strategy?

A

Use multiple classifiers to predict on unlabelled data and select the instances with the highest disagreement.

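Disagreement is measured with vote entropy: x* = argmax_x(-sum_i((v(y_i)/N) * log2(v(y_i)/N))), where v(y_i) is the number of committee members that vote for label y_i and N is the number of committee members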
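
Vote entropy can be computed directly from the committee's predicted labels. The sketch below assumes each instance comes with one vote per committee member; `vote_entropy`, `query_by_committee`, and the example votes are hypothetical names and data.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy (in bits) of the label votes cast by the committee."""
    n = len(votes)  # committee size N
    counts = Counter(votes)  # v(y_i): votes received by each label
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def query_by_committee(committee_votes):
    """Return the index of the instance the committee disagrees on most."""
    return max(range(len(committee_votes)),
               key=lambda i: vote_entropy(committee_votes[i]))


# Hypothetical votes from a committee of four classifiers on three instances.
committee_votes = [
    ["cat", "cat", "cat", "cat"],  # full agreement
    ["cat", "dog", "cat", "dog"],  # evenly split
    ["cat", "cat", "cat", "dog"],  # mild disagreement
]
query_index = query_by_committee(committee_votes)
```

The evenly split instance (index 1) has the highest vote entropy and would be sent to the oracle. Labels that receive no votes are simply absent from the count, so the undefined 0 * log2(0) term never arises.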

12
Q

What is semi-supervised learning?

A

Learning that utilises a combination of labelled and unlabelled data

13
Q

What is the simple approach to semi-supervised learning?

A

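Combine an unsupervised and a supervised learning step, e.g. cluster all the data with k-means, then give the unlabelled instances in each cluster the most common class among that cluster's labelled instances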
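
The labelling half of this cluster-then-label idea can be sketched as follows. The cluster assignments are assumed to have been produced already (for example by k-means), and `label_clusters` is a hypothetical helper, not part of any library.

```python
from collections import Counter


def label_clusters(cluster_ids, labels):
    """Fill in missing labels with the majority label of each cluster.

    labels[i] is None when instance i is unlabelled.
    """
    votes = {}
    for c, y in zip(cluster_ids, labels):
        if y is not None:
            votes.setdefault(c, Counter())[y] += 1
    # Most common known label per cluster.
    majority = {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}
    # Known labels are kept; a cluster with no labelled members stays None.
    return [y if y is not None else majority.get(c)
            for c, y in zip(cluster_ids, labels)]


# Assumed cluster assignments and partial labels for seven instances.
cluster_ids = [0, 0, 0, 0, 1, 1, 1]
labels = ["spam", "spam", "ham", None, "ham", None, None]
completed = label_clusters(cluster_ids, labels)
```

The unlabelled instance in cluster 0 becomes "spam" (two votes against one), and both unlabelled instances in cluster 1 become "ham".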

14
Q

What is the difference between active and semi-supervised learning?

A

In semi-supervised learning, the algorithm automatically generates new labels itself, whereas in active learning, the algorithm selects unlabelled instances and queries an oracle for their labels

15
Q

What is the main assumption of self-training?

A

Similar instances are likely to have the same label

16
Q

What is the main assumption of active learning?

A

Instances near class boundaries are most informative for learning