Book Flashcards

1
Q

What does LLM stand for?

A

Large Language Model

2
Q

What are some common applications of LLMs?

A
  • Machine translation
  • Generating novel texts
  • Sentiment analysis
  • Text summarization
  • Content creation (e.g., writing fiction, articles, code)
  • Powering chatbots and virtual assistants
  • Knowledge retrieval
3
Q

What are the three main stages of building and using LLMs?

A
  • Implementing the LLM architecture and data preparation process
  • Pretraining the LLM to create a foundation model
  • Fine-tuning the foundation model to become a personal assistant or text classifier
4
Q

What is the core architecture used in many modern LLMs?

A

The transformer architecture

5
Q

What are the two submodules of the transformer architecture?

A

Encoder and decoder

6
Q

What is the purpose of word embeddings in LLMs?

A

To represent words as continuous-valued vectors that can be processed by the model

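A minimal PyTorch sketch (the vocabulary size matches GPT-2's BPE vocabulary; the embedding dimension of 256 is an arbitrary illustrative choice):

```python
import torch

# An embedding layer is a lookup table mapping token IDs to trainable vectors.
embedding = torch.nn.Embedding(num_embeddings=50257, embedding_dim=256)

token_ids = torch.tensor([15496, 995])  # e.g., GPT-2 BPE IDs for "Hello world"
vectors = embedding(token_ids)          # continuous-valued vectors
print(vectors.shape)                    # torch.Size([2, 256])
```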
7
Q

What is tokenization?

A

The process of splitting text into individual units (tokens), which can be words or subword units

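A minimal word-level sketch in Python; production LLMs use subword schemes such as BPE instead (next card):

```python
import re

text = "Hello, world. This is a test."

# A simple word-level tokenizer: split on punctuation and whitespace,
# keeping the punctuation marks as their own tokens.
tokens = [t for t in re.split(r'([,.:;?_!"()\']|--|\s)', text) if t.strip()]
print(tokens)  # ['Hello', ',', 'world', '.', 'This', 'is', 'a', 'test', '.']
```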
8
Q

What is byte pair encoding (BPE)?

A

A tokenization scheme that iteratively merges frequent character pairs into subword units

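A sketch using the tiktoken library, which implements GPT-2's BPE tokenizer (assumes tiktoken is installed; the sample string and printed IDs are illustrative):

```python
import tiktoken

# GPT-2's BPE vocabulary was built by iteratively merging frequent character
# pairs, so common words map to single tokens and unfamiliar words split
# into subword units instead of becoming unknown tokens.
enc = tiktoken.get_encoding("gpt2")

ids = enc.encode("Akwirw ier")  # an unfamiliar string splits into subwords
print(ids)                      # e.g., [33901, 86, 343, 86, 220, 959]
print(enc.decode(ids))          # 'Akwirw ier' -- BPE round-trips losslessly
```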
9
Q

How are training examples sampled for LLM training?

A

Using a sliding window over the tokenized text to create input-target pairs, where each target sequence is the input sequence shifted by one token

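A minimal sketch in plain Python (toy token IDs; the context length and stride are arbitrary illustrative values):

```python
token_ids = [1, 2, 3, 4, 5, 6, 7, 8]  # toy token ID sequence
context_length = 4                    # tokens per training example
stride = 1                            # how far the window slides each step

# Each input chunk is paired with the same chunk shifted one token ahead,
# so the model learns to predict the next token at every position.
for i in range(0, len(token_ids) - context_length, stride):
    x = token_ids[i : i + context_length]
    y = token_ids[i + 1 : i + context_length + 1]
    print(x, "->", y)
```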
10
Q

What is the purpose of the attention mechanism in LLMs?

A

To allow the model to weigh the importance of different parts of the input sequence when making predictions

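A simplified self-attention sketch in PyTorch, with toy values and without the trainable query/key/value projections a real LLM would use:

```python
import torch

torch.manual_seed(123)
inputs = torch.randn(6, 4)  # 6 token embeddings of dimension 4 (toy values)

# Attention scores are pairwise dot products between tokens; softmax turns
# each row into weights expressing how important every token is to that one.
scores = inputs @ inputs.T                                   # (6, 6)
weights = torch.softmax(scores / inputs.shape[-1] ** 0.5, dim=-1)
context = weights @ inputs  # weighted sum of all token vectors: (6, 4)
print(weights[0])           # row sums to 1: importance of each token for token 0
```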
11
Q

What is the difference between causal attention and self-attention?

A

Causal attention is a masked form of self-attention: each token can attend only to itself and earlier tokens, whereas standard (unmasked) self-attention lets every token attend to all tokens in the sequence, including future ones

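A sketch of the causal mask in PyTorch (random stand-in attention scores for illustration):

```python
import torch

seq_len = 6
scores = torch.randn(seq_len, seq_len)  # stand-in attention scores

# Causal mask: positions above the diagonal (future tokens) are set to -inf,
# so softmax assigns them zero weight.
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
masked = scores.masked_fill(mask, float("-inf"))
weights = torch.softmax(masked, dim=-1)
print(weights)  # upper triangle is 0: each token sees only itself and the past
```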
12
Q

What is multi-head attention?

A

An extension of the attention mechanism that runs several attention heads in parallel, each with its own learned projections, so the model can attend to different parts of the input sequence simultaneously

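A minimal sketch of the head-splitting idea in PyTorch; it omits the trainable projections and the causal mask that a real implementation would include:

```python
import torch

batch, seq_len, d_model, num_heads = 1, 6, 8, 2
head_dim = d_model // num_heads
x = torch.randn(batch, seq_len, d_model)

# Split each embedding into num_heads smaller heads that attend independently
# and in parallel, then merge the per-head outputs back into one vector.
q = k = v = x.view(batch, seq_len, num_heads, head_dim).transpose(1, 2)
scores = q @ k.transpose(-2, -1) / head_dim ** 0.5          # (1, 2, 6, 6)
context = torch.softmax(scores, dim=-1) @ v                 # (1, 2, 6, 4)
out = context.transpose(1, 2).reshape(batch, seq_len, d_model)
print(out.shape)  # torch.Size([1, 6, 8])
```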
13
Q

What are the main components of a transformer block in an LLM?

A
  • Multi-head attention module
  • Feedforward network
  • Layer normalization
  • Shortcut connections
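A minimal sketch combining these components in PyTorch (the dimensions and the 4x feedforward expansion are common GPT-style choices, not requirements; the causal mask is omitted for brevity):

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Minimal sketch of a GPT-style transformer block."""
    def __init__(self, d_model=256, num_heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(           # feedforward network
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        # Shortcut connections: each sublayer's output is added to its input.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        x = x + self.ff(self.norm2(x))
        return x
```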
14
Q

What is pretraining in the context of LLMs?

A

Training the LLM on a large, unlabeled text dataset to develop a general understanding of language

15
Q

What is fine-tuning in the context of LLMs?

A

Further training the pretrained LLM on a smaller, labeled dataset to adapt it to a specific task

16
Q

What are the two most common types of fine-tuning for LLMs?

A

Instruction fine-tuning and classification fine-tuning

17
Q

What is the purpose of layer normalization in LLMs?

A

To stabilize training and improve performance by normalizing the activations of each layer

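A minimal PyTorch sketch with toy activations:

```python
import torch

x = torch.randn(2, 5)  # activations: 2 tokens, 5 features each

# LayerNorm rescales each row to zero mean and unit variance, then applies
# learnable scale and shift parameters (initialized to 1 and 0).
layer_norm = torch.nn.LayerNorm(5)
out = layer_norm(x)
print(out.mean(dim=-1))                 # ~0 per row
print(out.var(dim=-1, unbiased=False))  # ~1 per row
```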
18
Q

What is the purpose of shortcut connections in LLMs?

A

To improve training of deep networks by allowing gradients to flow more easily through the layers

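A minimal sketch; the ResidualBlock class is hypothetical, for illustration only:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """The input skips around the layer and is added to its output,
    giving gradients a direct path backward through deep networks."""
    def __init__(self, dim=64):
        super().__init__()
        self.layer = nn.Sequential(nn.Linear(dim, dim), nn.GELU())

    def forward(self, x):
        return x + self.layer(x)  # shortcut (residual) connection
```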
19
Q

What is the GELU activation function?

A

A smooth, non-monotonic activation function that has been found to be effective in LLMs

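A sketch of the tanh-based GELU approximation (used by GPT-2), compared against PyTorch's built-in version:

```python
import torch

def gelu(x):
    # Tanh approximation of GELU; the exact form is x * Phi(x),
    # where Phi is the standard normal CDF.
    return 0.5 * x * (1 + torch.tanh(
        (2.0 / torch.pi) ** 0.5 * (x + 0.044715 * x ** 3)))

x = torch.tensor([-1.0, 0.0, 1.0])
print(gelu(x))                      # close to the exact values below;
print(torch.nn.functional.gelu(x))  # note gelu(-1) < 0: non-monotonic
```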
20
Q

What is the purpose of the softmax function in text generation?

A

To convert model outputs (logits) into a probability distribution over the vocabulary

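A minimal PyTorch sketch with toy logits for a three-token vocabulary:

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.1])  # toy model outputs for 3 tokens

# Softmax exponentiates and normalizes, yielding probabilities that sum to 1.
probs = torch.softmax(logits, dim=-1)
print(probs)        # ~[0.659, 0.242, 0.099]
print(probs.sum())  # tensor(1.)
```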
21
Q

What is temperature scaling in text generation?

A

A technique used to control the randomness of the generated text by dividing the logits by a temperature value before applying the softmax function: temperatures below 1 sharpen the distribution (more deterministic output), while temperatures above 1 flatten it (more random output)

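A minimal PyTorch sketch; the temperatures shown are arbitrary illustrative values:

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.1])

# Dividing logits by the temperature before softmax reshapes the distribution:
# T < 1 concentrates probability on the top token, T > 1 spreads it out.
for temperature in (0.5, 1.0, 2.0):
    probs = torch.softmax(logits / temperature, dim=-1)
    print(temperature, probs)
```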
22
Q

What is top-k sampling in text generation?

A

A technique that restricts sampling during text generation to the k most likely next tokens, masking out all others before renormalizing
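A minimal PyTorch sketch with toy logits (k=3 is an arbitrary illustrative choice):

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.1, -0.5, -1.2])
k = 3

# Keep only the k highest logits; mask the rest to -inf so softmax zeroes
# them, then sample from the renormalized distribution.
top_logits, _ = torch.topk(logits, k)
masked = torch.where(logits < top_logits[-1],
                     torch.tensor(float("-inf")), logits)
probs = torch.softmax(masked, dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
print(probs, next_token)
```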