Generative AI Flashcards
1
Q
Encoder/Decoder Models
A
- Decoder-Only Architecture
- Definition: A model that only uses the decoder part of the transformer architecture.
- Characteristics:
- Processes data in a causal or autoregressive manner, meaning it generates outputs one token at a time, based on previously generated tokens.
- Focused on open-ended text generation tasks like story writing and conversational chatbots.
- Example: GPT (Generative Pre-trained Transformer).
- GPT generates text by predicting the next token in a sequence based on past tokens (see the sketch after the use cases below).
- Use Cases:
- Text completion.
- Dialogue systems.
- Creative writing.
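A minimal sketch of this autoregressive loop, assuming PyTorch and the Hugging Face transformers library with the public "gpt2" checkpoint; the greedy argmax step is just one simple decoding strategy among several:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Start from a prompt; causal masking means the model only sees tokens to its left.
input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                            # generate 20 new tokens, one at a time
        logits = model(input_ids).logits           # shape: (batch, seq_len, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1)  # greedy: most likely next token
        # Append the prediction and feed the longer sequence back in.
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

Note that each iteration re-runs the model on the growing sequence; real implementations cache past key/value states to avoid that recomputation.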
- Encoder/Decoder (Full Transformer) Architecture
- Definition: A model that uses both an encoder and a decoder, which work in tandem to process input and generate output.
- Encoder:
- Processes the input sequence and transforms it into a meaningful internal representation.
- Handles bidirectional context (i.e., looks at the entire sequence to understand the input fully).
- Decoder:
- Generates the output sequence, typically one token at a time, using the encoder’s output as context.
- Examples: T5 (Text-to-Text Transfer Transformer) and the original Transformer; a T5 sketch follows the use-case list below.
- Use Cases:
- Translation (e.g., converting English to French).
- Summarization (e.g., condensing a long document into a short summary).
- Question-answering (e.g., responding to queries based on provided context).
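A short sketch of encoder/decoder inference, assuming the Hugging Face transformers library and the public "t5-small" checkpoint; T5 frames every task as text-to-text, so the task is selected with a prompt prefix:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# The encoder reads the full input bidirectionally; the decoder then
# generates the output one token at a time, attending to the encoder states.
inputs = tokenizer("translate English to French: The house is small.",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The same pattern covers summarization by swapping the task prefix (e.g., "summarize: ...").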
2
Q
Dense Model
A
A dense model in the context of machine learning refers to a model in which all parameters (e.g., weights and biases) are actively used during computation for every input token or data point. Dense models are the traditional architecture for neural networks and are contrasted with sparse models, such as Mixture-of-Experts (MoE), where only a subset of parameters is activated per computation step.
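To make the contrast concrete, here is a toy PyTorch sketch; the class names and sizes are illustrative assumptions, not a real library API. The dense layer touches every weight on every call, while a top-1 MoE layer routes each token through only one expert sub-network:

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)  # all weights used on every call

    def forward(self, x):
        return self.linear(x)

class Top1MoE(nn.Module):
    def __init__(self, dim, num_experts=4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # picks one expert per token
        self.experts = nn.ModuleList(nn.Linear(dim, dim)
                                     for _ in range(num_experts))

    def forward(self, x):  # x: (tokens, dim)
        expert_idx = self.router(x).argmax(dim=-1)  # one expert index per token
        out = torch.empty_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():                   # only this expert's parameters
                out[mask] = expert(x[mask])  # run for its assigned tokens
        return out

x = torch.randn(8, 16)           # 8 tokens, hidden size 16
print(DenseLayer(16)(x).shape)   # every parameter participates
print(Top1MoE(16)(x).shape)      # ~1/4 of expert parameters run per token
```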