FB_InferSent Flashcards

1
Q

Goals

A
  1. use universal sentence representations as features for a wide range of transfer learning tasks
  2. show how Natural Language Inference (NLI) can be used to train a sentence encoder for universal embeddings
  3. investigate which network architecture to use for the sentence encoder
2
Q

SNLI dataset

A

Stanford Natural Language Inference (SNLI) dataset:

  • 570k human-generated English sentence pairs
  • manually labeled as 1 of 3 categories: entailment, contradiction, neutral
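
An illustrative, made-up example (not taken from the dataset) for the three labels, given the premise "A man is playing a guitar on stage.":

  • entailment: "A man is playing an instrument."
  • contradiction: "The man is sleeping."
  • neutral: "The man is playing his favorite song."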
3
Q

Model training methods

A
  1. sentence-based encoding model that encodes each sentence (premise, hypothesis) separately
  2. model that uses joint encodings of both sentences, with cross-features and attention between them
    Method 1 is selected in this paper
4
Q

Vector representation

A

Uses 3 matching methods for the 2 sentence encoding vectors u, v:

  1. concatenation (u, v)
  2. element-wise product u * v
  3. absolute value of element-wise difference | u - v |
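
A minimal PyTorch sketch of this combination (tensor sizes are assumptions; 4096 matches the selected encoder below):

  import torch

  # u, v: batches of sentence embeddings from the shared encoder
  # (batch x embedding dim); random tensors stand in for real encodings
  u = torch.randn(64, 4096)
  v = torch.randn(64, 4096)

  # 1. concatenation, 2. element-wise product, 3. absolute difference
  features = torch.cat([u, v, u * v, torch.abs(u - v)], dim=1)  # 64 x (4 * 4096)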
5
Q

Model layout

A
The model that takes the combined vector representation as input is
- a 3-class classifier: multiple fully connected layers followed by a softmax output layer
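
A minimal sketch of such a classifier in PyTorch (layer sizes are assumptions; 512 hidden units follows the training card below, and the input size follows the combined feature vector above):

  import torch.nn as nn

  # fully connected layers ending in a softmax over the 3 classes;
  # in training, the softmax is usually folded into the cross-entropy loss
  classifier = nn.Sequential(
      nn.Linear(4 * 4096, 512),
      nn.Linear(512, 3),
      nn.Softmax(dim=1),
  )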
6
Q

Models - network architectures

A
  • standard LSTM, GRU
  • BiLSTM with mean/max-pooling
  • self-attentive BiLSTM
  • hierarchical ConvNet
7
Q

BiLSTM with mean/max-pooling vector

A
  • over T time-steps, a sentence is represented by the hidden states of the forward (direct) and backward (reverse) readings, concatenated at each time-step; for each dimension of this concatenation, either the maximum or the mean over the T time-steps is taken, yielding the fixed-size vector representation of the sentence
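
A minimal PyTorch sketch of this pooling, assuming 300-dimensional word vectors and 2048 hidden units per direction (which yields the 4096-dimensional sentence vector selected below):

  import torch
  import torch.nn as nn

  lstm = nn.LSTM(input_size=300, hidden_size=2048,
                 bidirectional=True, batch_first=True)

  words = torch.randn(64, 20, 300)  # batch x T time-steps x word vectors
  hidden, _ = lstm(words)           # 64 x 20 x 4096: both directions concatenated
  sentence, _ = hidden.max(dim=1)   # max-pooling over time-steps -> 64 x 4096
  # mean-pooling variant: sentence = hidden.mean(dim=1)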
8
Q

Training process

A
  • SGD with learning rate 0.1, weight decay 0.99, mini-batches of size 64
  • if dev accuracy decreases over an epoch, the learning rate is divided by 5
  • classifier: multi-layer perceptron with 1 hidden layer of 512 hidden units
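
A sketch of this schedule, under one reading (matching the released InferSent code) in which the 0.99 "weight decay" acts as a per-epoch multiplicative learning-rate decay; the model here is a stand-in so the snippet runs:

  import torch
  import torch.nn as nn

  model = nn.Linear(4 * 4096, 3)  # stand-in for encoder + classifier
  optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

  def end_of_epoch(dev_acc, best_dev_acc):
      for group in optimizer.param_groups:
          group["lr"] *= 0.99       # per-epoch decay
          if dev_acc < best_dev_acc:
              group["lr"] /= 5      # shrink after a drop in dev accuracy

  end_of_epoch(dev_acc=0.80, best_dev_acc=0.82)  # placeholder accuracies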
9
Q

Evaluation for transfer learning

A
  • used the sentence embeddings to evaluate 12 transfer tasks: binary and multi-class classification, entailment and semantic relatedness, paraphrase detection, caption-image retrieval
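
A sketch of this transfer setup, assuming the encoder stays frozen and only a simple classifier (here, logistic regression) is fit on the embeddings; random data stands in for real features and labels:

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  X_train = np.random.randn(1000, 4096)    # frozen sentence embeddings
  y_train = np.random.randint(0, 2, 1000)  # labels for a binary transfer task

  clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
  print(clf.score(X_train, y_train))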
10
Q

Selected sentence encoder model

A
  • BiLSTM with max-pooling and an embedding size of 4096
11
Q

NLI task suitability hypothesis

A
  • NLI is a task that requires high-level understanding of semantic relationships within sentences, which makes it a good candidate for learning universal sentence representations