FB_InferSent Flashcards
1
Q
Goals
A
- use universal sentence representations as features for a wide range of transfer learning tasks
- show how Natural Language Inference (NLI) can be used to train a sentence encoder for universal embeddings
- investigate which network architecture to use for the sentence encoder
2
Q
SNLI dataset
A
Stanford Natural Language Inference (SNLI) dataset:
- 570k human-generated English sentence pairs
- manually labeled with 1 of 3 categories: entailment, contradiction, neutral
- e.g., premise "A soccer game with multiple males playing." and hypothesis "Some men are playing a sport." are labeled as entailment
3
Q
Model training methods
A
- sentence-based encoding model that encodes the two sentences (premise, hypothesis) separately
- model that uses the encodings of both sentences jointly, with cross-features and attention between them
The first method is selected in this paper, as only it yields standalone sentence embeddings
4
Q
Vector representation
A
Uses 3 matching methods for the 2 sentence encoding vectors u, v:
- concatenation (u, v)
- element-wise product u * v
- absolute value of element-wise difference | u - v |
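A minimal sketch of this matching step in PyTorch (assuming u and v are precomputed batches of sentence embeddings; the function name is illustrative, not from the paper's code):

import torch

def match_features(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # u, v: (batch, dim) embeddings of premise and hypothesis
    # concatenation, element-wise product, absolute element-wise difference
    return torch.cat([u, v, u * v, torch.abs(u - v)], dim=1)  # (batch, 4*dim)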
5
Q
Model layout
A
The vector representation feeds a 3-class classifier: multiple fully connected layers followed by a softmax output layer
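A hedged sketch of such a classifier in PyTorch, assuming the 4096-dimensional encoder of card 10 and the 512-unit hidden layer of card 8 (the choice of non-linearity is an assumption):

import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(4 * 4096, 512),  # input: matched features (u, v, u*v, |u-v|)
    nn.Tanh(),                 # assumed non-linearity
    nn.Linear(512, 3),         # logits for entailment / contradiction / neutral
)
# the softmax is typically folded into the loss, e.g. nn.CrossEntropyLoss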
6
Q
Models - network architectures
A
- standard LSTM, GRU
- BiLSTM with mean/max-pooling
- self-attentive BiLSTM
- hierarchical ConvNet
7
Q
BiLSTM with mean/max-pooling vector
A
- over T time-steps a sentence is represented by the hidden states of the forward and backward readings; the two hidden states are concatenated at each time-step, and the final sentence vector takes, for each dimension, either the max or the mean of that dimension across all T time-steps
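A minimal PyTorch sketch of the max-pooling variant (class and parameter names are illustrative; 2048 hidden units per direction give the 4096-dimensional embedding of card 10):

import torch
import torch.nn as nn

class BiLSTMMaxEncoder(nn.Module):
    def __init__(self, word_dim=300, hidden_dim=2048):
        super().__init__()
        # bidirectional=True concatenates forward and backward hidden states,
        # so each time-step yields a 2*hidden_dim vector
        self.lstm = nn.LSTM(word_dim, hidden_dim,
                            bidirectional=True, batch_first=True)

    def forward(self, x):
        # x: (batch, T, word_dim) word embeddings
        h, _ = self.lstm(x)         # h: (batch, T, 2*hidden_dim)
        u, _ = torch.max(h, dim=1)  # per-dimension max over the T time-steps
        return u                    # mean-pooling would use torch.mean(h, dim=1)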
8
Q
Training process
A
- SGD with learning rate 0.1, weight decay 0.99, mini-batch size 64
- whenever the dev accuracy decreases after an epoch, the learning rate is divided by 5
- classifier: multi-layer perceptron with 1 hidden layer of 512 hidden units
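A hedged sketch of the learning-rate rule in PyTorch (model, dev_set, max_epochs, train_one_epoch, and evaluate are assumed helpers, not from the paper's code):

import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=0.1)
best_dev_acc = 0.0
for epoch in range(max_epochs):           # max_epochs: assumed stopping bound
    train_one_epoch(model, optimizer)     # assumed helper: one pass over SNLI
    dev_acc = evaluate(model, dev_set)    # assumed helper: dev-set accuracy
    if dev_acc < best_dev_acc:
        for group in optimizer.param_groups:
            group['lr'] /= 5.0            # divide the learning rate by 5
    best_dev_acc = max(best_dev_acc, dev_acc)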
9
Q
Evaluation for transfer learning
A
- evaluated the sentence embeddings as features on 12 transfer tasks: binary and multiclass classification, entailment and semantic relatedness, paraphrase detection, caption-image retrieval
10
Q
Selected sentence encoder model
A
- BiLSTM with max-pooling, with embedding size 4096
11
Q
NLI task suitability hypothesis
A
- NLI is a task that requires high-level understanding of semantic relationships within sentences, which makes it well suited for learning universal sentence embeddings