11.2 Semi-supervised learning Flashcards
Which of these methods would most likely require a human annotator to label new instances?
- Active learning
Active learning requires an “oracle” that can label the instances selected by the model; this is often done by a human. The other methods involve using existing labels and don’t require additional labeling.
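As a rough illustration of the oracle's role, here is a minimal sketch of one active learning round. It assumes uncertainty sampling as the query strategy, which the card does not specify, and every name in it (`most_uncertain`, `active_learning_round`, `toy_model`, `toy_oracle`) is hypothetical.

```python
# Sketch of one active learning round. All names here are hypothetical.

def most_uncertain(probabilities):
    """Index of the probability closest to 0.5 (the least confident prediction)."""
    return min(range(len(probabilities)), key=lambda i: abs(probabilities[i] - 0.5))


def active_learning_round(pool, predict_proba, oracle):
    """Remove the most uncertain instance from the pool and ask the oracle to label it."""
    probabilities = [predict_proba(x) for x in pool]
    index = most_uncertain(probabilities)
    instance = pool.pop(index)
    return instance, oracle(instance)


# Toy setup (assumed): 1-D instances, a stand-in model, and a stand-in "human" oracle.
pool = [0.1, 0.45, 0.9]
toy_model = lambda x: x               # pretends to output P(class = 1)
toy_oracle = lambda x: int(x >= 0.5)  # in practice, a human annotator

queried, label = active_learning_round(pool, toy_model, toy_oracle)
```

Only the queried instance costs an annotation; the rest of the pool stays unlabelled until a later round selects it.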
Which of these is an important assumption of self-training?
- Similar instances are likely to have the same label
The main assumption of self-training is that instances which are nearby in a feature space are likely to have the same label, so labels can be propagated from labelled instances to their unlabelled neighbours.
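The neighbour-based propagation described above can be sketched on toy 1-D data. This is only an illustration under assumptions: it uses a nearest-neighbour rule with a hypothetical `max_distance` threshold as the confidence test, whereas practical self-training usually relies on a classifier's confidence scores.

```python
# Sketch of neighbour-based self-training on 1-D points (hypothetical helper).

def self_train(labelled, unlabelled, max_distance):
    """labelled: list of (x, label) pairs; unlabelled: list of x values.

    Repeatedly copies the label of the nearest labelled point onto any
    unlabelled point within max_distance, until nothing more can be labelled.
    """
    labelled = list(labelled)
    remaining = list(unlabelled)
    changed = True
    while changed and remaining:
        changed = False
        for x in list(remaining):  # iterate over a copy while removing
            nearest_x, nearest_label = min(labelled, key=lambda pair: abs(pair[0] - x))
            if abs(nearest_x - x) <= max_distance:
                labelled.append((x, nearest_label))  # pseudo-label joins the training set
                remaining.remove(x)
                changed = True
    return labelled, remaining


seed = [(0.0, "a"), (10.0, "b")]
result, still_unlabelled = self_train(seed, [1.0, 2.0, 9.0, 5.0], max_distance=1.0)
```

Here the point 2.0 only receives a label because 1.0 was pseudo-labelled first, which is the propagation effect; 5.0 is too far from every labelled point and stays unlabelled.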
Which of these describes a “query by committee” strategy?
- Train several models on the same data and query the instances where they disagree
Query by committee is an active learning strategy that involves training more than one model (the “committee”) on the same labelled data, and querying the unlabelled instances where their predictions disagree.
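A minimal sketch of the disagreement check, assuming each committee member is simply a function that maps an instance to a predicted label. The function name `query_by_committee` and the toy threshold models are hypothetical.

```python
# Sketch of query-by-committee selection (hypothetical names, toy models).

def query_by_committee(committee, pool):
    """Return the pool instances on which the committee members do not all agree."""
    queries = []
    for x in pool:
        votes = {model(x) for model in committee}  # set of distinct predictions
        if len(votes) > 1:                         # more than one distinct vote: disagreement
            queries.append(x)
    return queries


# Two toy "models" that learned slightly different decision thresholds.
committee = [lambda x: x > 3, lambda x: x > 5]
to_label = query_by_committee(committee, [1, 4, 5, 6])
```

The instances 4 and 5 fall between the two thresholds, so they are the ones sent to the oracle; 1 and 6 get the same prediction from both members and are skipped.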
Suppose you are training a neural network to classify images of the digits 0-9, and you wish to use some data augmentation techniques to increase your training set size. Which of these manipulations would you probably not include?
- Rotate the image 180 degrees
180-degree rotation probably wouldn’t be included in a digit classification task because it would change the “6” images into “9”s and vice versa. A classifier trained on this augmented dataset probably wouldn’t be able to distinguish the “6” and “9” classes.
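The problem can be seen on a toy example. The sketch below assumes tiny 3x5 "seven-segment" style bitmaps for the two digits (hypothetical stand-ins for real images) and a hypothetical `rotate_180` helper.

```python
# Sketch: a 180-degree rotation turns a toy "6" bitmap into a toy "9" bitmap.

def rotate_180(image):
    """Rotate a bitmap (list of equal-length strings) by 180 degrees."""
    return [row[::-1] for row in reversed(image)]


SIX = [
    "111",
    "100",
    "111",
    "101",
    "111",
]

NINE = [
    "111",
    "101",
    "111",
    "001",
    "111",
]

rotated_six = rotate_180(SIX)
```

In this toy case `rotated_six` is identical to `NINE`, so an augmented training set would contain the same pixels under both the “6” and the “9” label.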