Lecture 7 and 8 Flashcards
Neural Networks, Word Vectors
Introduction to Neural Nets
In 2018, Google introduced new text-processing techniques that depend heavily on deep learning neural networks. To understand deep learning, it is valuable to first understand how “regular” artificial neural networks (ANNs) work in a simpler form.
Deep Learning
Deep learning relies on representation learning: automatically learning good features or representations from the data.
Representation learning:
Learning representations of the data that make it easier to extract useful information when building classifiers or other predictors.
Overview of Neural Networks
- Weights: these are learned during the training process
- Bias: like an intercept value in a regression
- Inputs: the observed variables
Overview of Neural Networks
The activation function can be virtually any formula that will produce an output from the summed input, but for learning to work properly, the function must generally be differentiable. Here’s the original perceptron activation function (not differentiable):
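The formula itself did not survive the slide export; as a minimal sketch (the exact threshold and output convention on the original slide may differ), the classic perceptron combines the weighted inputs and bias from the previous slide and applies a hard threshold:

```python
import numpy as np

def perceptron(inputs, weights, bias):
    """Classic perceptron: weighted sum of the inputs plus a bias,
    passed through a hard threshold (step) activation.
    The jump at zero is what makes this function non-differentiable."""
    summed = np.dot(inputs, weights) + bias
    return 1 if summed >= 0 else 0

print(perceptron(np.array([1.0, 0.5]), np.array([0.8, -0.3]), bias=-0.2))  # -> 1
```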
Overview of Neural Networks
- Activation function: f(x)
- Output: y
Common Activation Functions
The activation function of a node defines the output of that node given an input or set of inputs. Programmers choose different activation functions based on the system performance they observe for various applications.
Common Activation Functions
- Hyperbolic Tangent (tanh) Function
- ReLU Function
- Sigmoid Function
- Identity Function
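As a minimal sketch (not taken from the slides), the four functions listed above can be written in a few lines of Python/NumPy:

```python
import numpy as np

def tanh(x):
    return np.tanh(x)                # hyperbolic tangent: squashes input to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)        # ReLU: passes positive values, zeroes out negatives

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # sigmoid: squashes input to (0, 1)

def identity(x):
    return x                         # identity: output equals the summed input
```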
ReLU Function
Perceptron neuron model (left) and activation function (right).
Neural Network Models with Hidden Layers
A typical neural network consists of a few layers: an input layer, an optional hidden layer, and an output layer. Using an identity activation function and no hidden layers, the analysis is equivalent to OLS regression.
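A minimal sketch of that equivalence, using assumed toy data: with no hidden layer and an identity activation, the network’s prediction is just Xw + b, which is exactly the model OLS fits:

```python
import numpy as np

# Toy data (assumed for illustration): 100 samples, 3 input features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=100)

# "Network" with no hidden layer and identity activation: y_hat = X w + b
X1 = np.column_stack([X, np.ones(len(X))])     # append a column of 1s for the bias
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)  # OLS solution for the weights and bias
w, b = coef[:-1], coef[-1]
y_hat = X @ w + b                              # identical to an OLS prediction
```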
Deep Learning:
Deep learning is simply a more complex neural network. There are often many hidden layers (sometimes dozens) and multiple output nodes to estimate multidimensional outputs. It is also possible to use different activation functions on different nodes.
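A minimal forward-pass sketch of such a network; the layer sizes, activation choices, and random weights below are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A small "deep" network: 4 inputs -> two hidden layers -> 2 output nodes,
# with a different activation function on the output layer.
layer_sizes = [4, 8, 8, 2]
activations = [relu, relu, sigmoid]

weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Pass an input vector through every layer in turn."""
    for W, b, act in zip(weights, biases, activations):
        x = act(x @ W + b)
    return x

print(forward(rng.normal(size=4)))   # two output values
```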
Training – Forward pass
The forward pass
Initially, the weights (filter values) are randomly assigned, so performance is expected to be (very) bad.
The loss function
E_total = Σ ½(target - output)²
Cost/error function (mean squared error)
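A small sketch tying these two slides together (toy example with assumed shapes): a randomly initialized network produces poor outputs, and the loss above measures how poor:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy example: 5 inputs, 2 outputs, randomly initialized weights
W = rng.normal(scale=0.5, size=(5, 2))
b = np.zeros(2)

x = rng.normal(size=5)            # one training example
target = np.array([1.0, 0.0])     # its desired output

output = x @ W + b                # forward pass (identity activation for simplicity)
loss = np.sum(0.5 * (target - output) ** 2)   # E_total = sum of 1/2 (target - output)^2
print(loss)                       # large at first, since the weights are random
```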
The backward pass
One way of visualizing this idea of minimizing the loss is to consider a 3-D graph where the weights of the neural net (there are obviously more than 2 weights, but let’s go for simplicity) are the independent variables and the dependent variable is the loss. The task of minimizing the loss involves adjusting the weights so that the loss decreases. In visual terms, we want to get to the lowest point in our bowl-shaped object. To do this, we have to take the derivative of the loss with respect to the weights (in visual terms: calculate the slope in every direction).
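A minimal gradient-descent sketch of that picture, using an assumed bowl-shaped loss over just two weights: compute the slope in every direction and step downhill:

```python
import numpy as np

# Assumed bowl-shaped loss over two weights, with its minimum at (3, -2)
def loss(w):
    return (w[0] - 3.0) ** 2 + (w[1] + 2.0) ** 2

def grad(w):
    # Derivative of the loss with respect to each weight (the slope in every direction)
    return np.array([2.0 * (w[0] - 3.0), 2.0 * (w[1] + 2.0)])

w = np.array([0.0, 0.0])           # start somewhere on the bowl
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * grad(w)   # step downhill

print(w, loss(w))                  # w ends up close to (3, -2), the bottom of the bowl
```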
Learning Through Backpropagation
Backpropagation takes the difference between the predicted value and the actual value and uses that error term to adjust each node’s weights.
Learning Through Backpropagation
The process works backwards from the final layers to earlier layers, one layer at a time, and computes the contribution that each weight in the given layer made to the loss value.
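A minimal backpropagation sketch under assumed shapes (one hidden layer, sigmoid activations, and the squared-error loss from above): gradients are computed at the output layer first, propagated back one layer at a time, and each weight is adjusted by its contribution to the loss:

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed tiny network: 3 inputs -> 4 hidden units -> 2 outputs
W1, b1 = rng.normal(scale=0.5, size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=0.5, size=(4, 2)), np.zeros(2)

x = rng.normal(size=3)
target = np.array([1.0, 0.0])
lr = 0.5

for _ in range(200):
    # Forward pass
    h = sigmoid(x @ W1 + b1)                      # hidden layer
    out = sigmoid(h @ W2 + b2)                    # output layer
    # Backward pass: start from the error at the output...
    d_out = (out - target) * out * (1 - out)      # dE/d(pre-activation of output)
    # ...then propagate it back one layer at a time
    d_h = (d_out @ W2.T) * h * (1 - h)            # dE/d(pre-activation of hidden)
    # Adjust each weight by its contribution to the loss
    W2 -= lr * np.outer(h, d_out)
    b2 -= lr * d_out
    W1 -= lr * np.outer(x, d_h)
    b1 -= lr * d_h

print(out)   # after training, close to the target [1, 0]
```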