NLP Flashcards
General methods to create Word embeddings?
- One Hot Encoding
- Bag of Words
- N-Grams
- Tf-Idf
- Integer Encoding
OHE : Advantages & Disadvantages
Advantages - Easy & Intuitive
Disadvantages -
1. Sparse vectors
2. Out of Vocabulary words
3. No semantic meaning
4. Fixed Size
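A minimal sketch of one-hot encoding, assuming a tiny hand-built vocabulary (the toy corpus and helper names are illustrative, not from the flashcards):

```python
import numpy as np

# Illustrative toy corpus
corpus = ["the cat sat", "the dog ran"]

# Build a vocabulary: each unique word gets its own index
vocab = sorted({word for sentence in corpus for word in sentence.split()})
word_to_idx = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a 0/1 vector with a single 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[word_to_idx[word]] = 1.0
    return vec

print(vocab)           # ['cat', 'dog', 'ran', 'sat', 'the']
print(one_hot("cat"))  # [1. 0. 0. 0. 0.]
# A word outside the vocabulary raises a KeyError -> the OOV problem listed above.
```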
BOW : Advantages & Disadvantages
Advantages - Easy & Intuitive
Disadvantages -
1. Sparse vectors
2. Out of Vocabulary words
3. No semantic meaning
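A short Bag-of-Words sketch using scikit-learn's CountVectorizer (the toy corpus is illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["today is a sunny day", "the weather is nice today"]

vectorizer = CountVectorizer()        # counts word occurrences per document
X = vectorizer.fit_transform(corpus)  # sparse document-term matrix

print(vectorizer.get_feature_names_out())
print(X.toarray())  # each row = one document, each column = one vocabulary word
```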
N-grams : Advantages & Disadvantages
Advantages - Easy & Intuitive; Captures semantic meaning to an extent
Disadvantages -
1. Out of Vocabulary words
2. Computationally expensive
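The same CountVectorizer can also produce n-gram features; below is a sketch with unigrams plus bigrams (ngram_range=(1, 2)), which shows why the feature space grows quickly:

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["today is a sunny day", "the weather is nice today"]

# Unigrams + bigrams capture a little word-order information,
# but the vocabulary (and hence the vectors) grows much faster.
bigram_vectorizer = CountVectorizer(ngram_range=(1, 2))
X = bigram_vectorizer.fit_transform(corpus)

print(bigram_vectorizer.get_feature_names_out())
print(X.shape)  # far more columns than with unigrams alone
```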
What is Tf-Idf?
Term Frequency-Inverse Document Frequency
TF measures the frequency of a word in a document.
TF(t, d) = (Number of times term t appears in document d) / (Total number of terms in document d)
IDF measures how rare a term is across the whole corpus.
IDF(t) = log(Total number of documents / Number of documents containing term t)
Why do we use Log in IDF?
Without the log, terms that occur in only a few documents would get an extremely large IDF value, and the contribution of the TF component would be drowned out. Taking the log dampens the IDF scale so the two factors stay balanced.
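A sketch of the same idea with scikit-learn's TfidfVectorizer (the corpus is illustrative; note that sklearn uses a smoothed variant of the IDF formula above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are friends",
]

# TfidfVectorizer multiplies TF by a (smoothed) log IDF, as described above.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)

# Rare words (e.g. "friends") get higher weights than common ones (e.g. "the").
print(dict(zip(vectorizer.get_feature_names_out(), X.toarray()[2].round(2))))
```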
Advanced methods to create Word embeddings?
- Word2Vec
- GloVe
- fastText
- Transformers
What are Word Embeddings?
Vector representation of words.
Embeddings capture the semantic meaning of the text.
Ex - The embeddings of the sentences “today is a sunny day” and “the weather is nice today” will be similar.
What is Word2Vec & GloVe?
Both are built on the idea that words appearing in similar contexts should have similar embeddings.
However, these techniques struggle when a word has multiple meanings, because Word2Vec and GloVe assign a single, fixed representation to each word.
Word2Vec
By Google
Trained on a very large corpus using a shallow neural network, with one of two objectives:
1. Predict the surrounding words given the center word. - Skip-Gram
2. Predict the center word given the surrounding words. - CBOW
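A minimal sketch of training Word2Vec with gensim on a toy corpus (the corpus and hyperparameters are illustrative; sg=1 selects Skip-Gram, sg=0 selects CBOW):

```python
from gensim.models import Word2Vec

# Toy tokenized corpus; real models are trained on billions of words.
sentences = [
    ["today", "is", "a", "sunny", "day"],
    ["the", "weather", "is", "nice", "today"],
    ["it", "is", "raining", "today"],
]

# sg=1 -> Skip-Gram (predict context from center word); sg=0 -> CBOW.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["today"][:5])           # first 5 dimensions of the embedding
print(model.wv.most_similar("sunny"))  # nearest neighbours in embedding space
```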
GloVe
By Stanford
Trained by looking at the co-occurrence matrix of words (how often words appear together within a certain distance) and then using that matrix to obtain the embeddings.
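Pretrained GloVe vectors can be loaded through gensim's downloader API; a sketch ("glove-wiki-gigaword-50" is one of the published pretrained sets, downloaded on first use):

```python
import gensim.downloader as api

# Downloads the pretrained 50-dimensional GloVe vectors on first call.
glove = api.load("glove-wiki-gigaword-50")

print(glove["king"][:5])                   # the embedding for "king"
print(glove.most_similar("king", topn=3))  # nearest neighbours in the embedding space
# Note: one fixed vector per word, so a polysemous word (e.g. "bank") gets a single embedding.
```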
fastText
By Facebook
Represents each word as a bag of character n-grams (subwords) in addition to the word itself, so it can build embeddings even for out-of-vocabulary words.
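A sketch with gensim's FastText implementation, showing that an out-of-vocabulary word still gets an embedding built from its character n-grams (the corpus and hyperparameters are illustrative):

```python
from gensim.models import FastText

sentences = [
    ["the", "weather", "is", "sunny"],
    ["the", "weather", "is", "rainy"],
]

# min_n / max_n control the character n-gram lengths used as subwords.
model = FastText(sentences, vector_size=50, window=2, min_count=1,
                 min_n=3, max_n=5, epochs=50)

# "sunniest" never appears in the corpus, but its character n-grams overlap
# with "sunny", so fastText can still produce a vector for it.
print(model.wv["sunniest"][:5])
```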
Embeddings using Transformers
Transformers learn embeddings in the context of their task.
For example, BERT learns word embeddings in the context of masked language modeling (predicting which word to fill in the blank) and next sentence prediction (whether sentence B follows sentence A).
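A sketch of extracting contextual embeddings from a pretrained BERT model with the Hugging Face transformers library (bert-base-uncased is just one common choice of checkpoint):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# The same word ("bank") gets different embeddings in different contexts.
sentences = ["I sat by the river bank", "I deposited money at the bank"]
inputs = tokenizer(sentences, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state: (batch_size, sequence_length, hidden_size=768)
token_embeddings = outputs.last_hidden_state
print(token_embeddings.shape)
```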
CLS token
Classification Token - Represents the entire sentence
SEP token
Separator Token - Separates sentences
Word embedding for a word that is split in more than one token
It can be obtained with a pooling strategy: take the embedding of each sub-token and average them to get the word embedding.
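A sketch of mean-pooling the sub-token embeddings of a single word, continuing with bert-base-uncased (the word "embeddings" is typically split into several WordPiece tokens):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("embeddings are useful", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)

# word_ids() maps each token position back to the word it came from
# (None for special tokens like [CLS] and [SEP]).
word_ids = inputs.word_ids()
positions = [i for i, wid in enumerate(word_ids) if wid == 0]  # pieces of the first word

# Average the sub-token vectors to get a single vector for the whole word.
# (If the word was not split, the mean is simply that one token's embedding.)
word_embedding = hidden[positions].mean(dim=0)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
print([tokens[i] for i in positions])  # the WordPiece pieces of "embeddings"
print(word_embedding.shape)            # torch.Size([768])
```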
What are Sentence Embeddings?
Vector representation of a sentence.
Sentence embeddings are compressions of the information in a sequence of text (words), and compression is inherently lossy. They are therefore representations at a lower level of granularity.
Three ways to obtain them (see the sketch after this list):
1. CLS Pooling
2. Max Pooling
3. Mean Pooling
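A sketch of the three pooling strategies applied to BERT's token embeddings (same bert-base-uncased setup as above; with a single sentence there is no padding, so no attention-mask handling is needed):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("today is a sunny day", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)

cls_embedding  = hidden[:, 0, :]            # 1. CLS pooling: the [CLS] position
max_embedding  = hidden.max(dim=1).values   # 2. Max pooling over all tokens
mean_embedding = hidden.mean(dim=1)         # 3. Mean pooling over all tokens

print(cls_embedding.shape, max_embedding.shape, mean_embedding.shape)  # all (1, 768)
```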
CLS Pooling
Embedding corresponding to the [CLS] token.
The [CLS] token captures the idea of the entire sentence.
Used when the transformer model has been fine-tuned on a specific downstream task that makes the [CLS] token very useful.
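For comparison, a sketch using the sentence-transformers library: its models (e.g. all-MiniLM-L6-v2) are fine-tuned so that the pooled sentence embeddings are directly comparable with cosine similarity (many of them use mean pooling rather than CLS pooling under the hood):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small pretrained sentence-embedding model

sentences = ["today is a sunny day", "the weather is nice today", "the stock market crashed"]
embeddings = model.encode(sentences)             # one fixed-size vector per sentence

# Semantically related sentences get a noticeably higher cosine similarity.
print(util.cos_sim(embeddings[0], embeddings[1]))
print(util.cos_sim(embeddings[0], embeddings[2]))
```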