Module 2: Chapter 10 - Generative Artificial Intelligence Flashcards
In GenAI, what are the five major content modalities?
(1) Text
(2) Image
(3) Audio
(4) Video
(5) Multimodal
What are the disadvantages of bag-of-words (BoW) approaches?
(1) The techniques have nothing to say about the interpretation of a particular word, so an analyst using a bag-of-words representation cannot identify when two words have a similar meaning, such as “amazing” and “fantastic.”
(2) This process creates vast vectors that are extremely sparse, containing mostly zeros. This is extremely inefficient and makes the information very difficult to store, let alone analyze (see the sketch after this list).
(3) Each word is treated independently of all other words in a document, so the context in which it occurs is lost. This causes issues when a specific word has more than one meaning (e.g., a call option and a phone call).
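A minimal Python sketch of points (2) and (3); the two-document corpus and its vocabulary are purely illustrative:

```python
# Minimal bag-of-words sketch: each document becomes a count vector over the
# whole vocabulary, so most entries are zero once the vocabulary grows.
from collections import Counter

docs = [
    "the call option expired worthless",
    "she missed the phone call",
]

# Build the vocabulary from every distinct word in the corpus.
vocab = sorted({word for doc in docs for word in doc.split()})

def bow_vector(doc: str) -> list[int]:
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]  # mostly zeros for large vocabularies

for doc in docs:
    print(bow_vector(doc))
# Note that "call" maps to the same single index in both documents,
# even though its meaning differs (option contract vs. telephone call).
```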
What is Word2Vec?
Word2Vec is an NLP model created in 2013 by a Google team led by Tomas Mikolov as an alternative to the conventional BoW method for the vector representation of words. In Word2Vec, each word in the vocabulary has its own vector, constructed by examining all the surrounding words in each document where that word occurs. The model can be used to find which other words are most similar to a particular word by measuring the degree of “co-occurrence” of a group of words, i.e., how likely they are to appear together in a sentence.
Word2Vec permits a compressed representation by using an embedding, which employs a neural network to effectively reduce the dimensionality by creating a dense vector representation of words for storage and analysis.
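As a rough illustration, the widely used gensim library provides a Word2Vec implementation; the toy corpus below is hypothetical and far too small to produce meaningful embeddings:

```python
# Sketch of training a Word2Vec model with gensim (pip install gensim).
# A real application would use a large corpus of tokenized documents.
from gensim.models import Word2Vec

sentences = [
    ["risk", "management", "reduces", "expected", "loss"],
    ["the", "bank", "manages", "credit", "risk"],
    ["portfolio", "loss", "depends", "on", "risk", "exposure"],
]

model = Word2Vec(
    sentences,
    vector_size=300,  # dimensionality of the dense embedding
    window=5,         # context window: words considered around the target
    min_count=1,      # keep every word in this toy example
    sg=0,             # 0 = CBoW architecture, 1 = skip-gram
)

vec = model.wv["risk"]                         # 300-dimensional dense vector
print(model.wv.most_similar("risk", topn=3))   # words with highest cosine similarity
```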
What are the two architectures used for Word2Vec?
(1) The continuous bag-of-words (CBoW) architecture is a “fill in the blank” technique that proposes a probability-ordered list of words to fill the gap, given a few words before and after the missing word.
(2) The skip-gram architecture, on the other hand, is like a “word association” technique that uses a particular word to predict the words that surround it within a certain number of places before and after that word (see the sketch after this list).
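The difference between the two tasks can be seen in the training pairs they generate; a minimal sketch, where the sentence and window size are illustrative:

```python
# Generate the (input, target) pairs used by CBoW and skip-gram
# for a single sentence and a symmetric window of 2 words.
sentence = ["model", "risk", "requires", "careful", "validation"]
window = 2

for i, target in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    # CBoW: predict the missing word from its surrounding context.
    print("CBoW:     ", context, "->", target)
    # Skip-gram: predict each surrounding word from the target word.
    for ctx_word in context:
        print("skip-gram:", target, "->", ctx_word)
```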
What is the size of the vectors for Word2Vec?
Typically, the embedding layer includes 300 neurons. This implies that we can measure how close every word is to every other word using a weight matrix of dimension |W| × 300 from the input layer to the hidden layer, rather than |W| × |W| for the conventional BoW representation, which is a considerable reduction in dimensionality. The input layer comprises a set of one-hot encoded vectors that take the value 1 in the position reflecting the index of the word and 0 everywhere else, while the output layer contains the probabilities associated with the predictions for the target word (for CBoW) or for the context words (for skip-gram). Estimation of the weights usually takes place via a gradient descent method like the one employed for training feedforward neural networks.
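A NumPy sketch of the dimensionality argument; the vocabulary size, word index, and random weights below are arbitrary placeholders:

```python
# A one-hot input of size |W| projected through a |W| x 300 weight matrix
# yields a dense 300-dimensional embedding.
import numpy as np

vocab_size = 10_000          # |W|, illustrative
embedding_dim = 300          # number of neurons in the embedding layer

rng = np.random.default_rng(0)
W_in = rng.normal(size=(vocab_size, embedding_dim))   # input-to-hidden weights

word_index = 42                                       # index of some word
one_hot = np.zeros(vocab_size)
one_hot[word_index] = 1.0

embedding = one_hot @ W_in    # equivalent to selecting row 42 of W_in
print(embedding.shape)        # (300,) instead of a 10,000-dimensional BoW vector
```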
How is word similarity measured?
To find word similarities, a measure such as the cosine similarity of two words in a word vector space is typically used. Cosine similarity measures the “distance” between words where similar words will be close together and dissimilar words will be far apart in the vector space. The model learns the relationships between words by looking for co-occurrences across sentences. For example, if the word is “risk,” it is likely that the words “loss” and “management” would be close to it, but “university” would probably be much further away. In addition to analyzing how close the words are to each other, the embeddings can also reveal other relationships between words. One can perform quasi-mathematical operations with the word vectors, which capture word associations in different dimensions of word vector space.
Word embeddings are useful because they can capture “pseudo-meanings” through similarities measured by distances between vector representations.
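Cosine similarity itself is just the normalized dot product; a minimal NumPy sketch, where the example vectors are random placeholders rather than trained embeddings:

```python
# Cosine similarity between two word vectors:
#   cos(a, b) = (a . b) / (||a|| * ||b||), ranging from -1 to 1.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
v_risk, v_loss = rng.normal(size=300), rng.normal(size=300)
print(cosine_similarity(v_risk, v_loss))

# Quasi-mathematical operations: with trained embeddings one can form
# combinations of vectors (e.g., subtracting one word vector from another
# and adding a third) and then look for the nearest word to the result.
```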
What are the limitations of Word2Vec?
Although word embedding techniques such as Word2Vec represented a major step forward in natural language processing, they are nonetheless limited by their inability to capture sentence-level context (where the same word has an entirely different meaning depending on the context in which it is used in a sentence). Word embeddings also ignore more distant words that fall outside of the CBoW or skip-gram window, and do not consider word order within that window. Word2Vec’s usefulness is also constrained by its requirement that the context window length be fixed and specified a priori by the analyst.
What is a Recurrent Neural Network (RNN)?
An alternative architecture that has been used to capture relationships between distant words is the recurrent neural network (RNN). A basic RNN contains a memory (or hidden) cell whose value at time t is a function of the current inputs as well as its own value at time step t − 1. At any time step, the RNN combines the inputs from the current time step with the values stored in memory to generate the output.
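A minimal sketch of a single recurrent update with NumPy; the dimensions and random weights are arbitrary placeholders:

```python
# One step of a basic (Elman-style) RNN:
#   h_t = tanh(W_x x_t + W_h h_{t-1} + b),  y_t = W_y h_t
import numpy as np

input_dim, hidden_dim, output_dim = 300, 128, 50
rng = np.random.default_rng(2)

W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_y = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b = np.zeros(hidden_dim)

x_t = rng.normal(size=input_dim)    # current input (e.g., a word embedding)
h_prev = np.zeros(hidden_dim)       # hidden state carried over from time t - 1

h_t = np.tanh(W_x @ x_t + W_h @ h_prev + b)   # new memory combines input and past state
y_t = W_y @ h_t                               # output at time t
```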
How does an RNN capture the notion of context?
RNNs do not treat words in a sentence as if they are independent of one another. Rather, RNNs are designed to handle ordered (sequential) data and they operate iteratively (i.e., in a loop) by combining the current inputs with the values stored in the hidden units to generate output for a time period. The output values are stored in the hidden units, which then become part of the output calculations in the next time period. For this reason, RNNs are sometimes known as autoregressive neural networks. They can capture the order of words in a sentence and, potentially, they can also handle dependencies between one word and another word any number of places away in a sentence. RNNs use data sequentially, which means that they process each observation one by one. Whereas Word2Vec requires a fixed-length context window, RNNs can handle input sequences of variable length.
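The sequential, variable-length behavior can be sketched as a loop over the tokens of a sentence; the weights, dimensions, and random “embeddings” here are again illustrative:

```python
# Process sentences of different lengths one token at a time, carrying the
# hidden state forward so each output depends on all earlier words.
import numpy as np

hidden_dim, embed_dim = 64, 300
rng = np.random.default_rng(3)
W_x = rng.normal(scale=0.1, size=(hidden_dim, embed_dim))
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))

def run_rnn(embeddings):
    h = np.zeros(hidden_dim)
    for x_t in embeddings:                 # one word per time step, in order
        h = np.tanh(W_x @ x_t + W_h @ h)   # hidden state feeds back into itself
    return h                               # final state summarizes the whole sentence

short_sentence = rng.normal(size=(3, embed_dim))    # 3-word sentence
long_sentence = rng.normal(size=(12, embed_dim))    # 12-word sentence
print(run_rnn(short_sentence).shape, run_rnn(long_sentence).shape)
```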