LLM - Long Contexts, RAG Flashcards

1
Q

What are the limitations of instruction tuning?

A
  1. Difficult to collect diverse labeled data
  2. Rote learning (token by token) —
    ▪ limited creativity
  3. Agnostic to model’s knowledge —
    ▪ may encourage hallucinations
2
Q

What is reinforcement learning?

A

Reinforcement learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment, receiving feedback in the form of rewards or penalties to maximize cumulative rewards over time.
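A minimal, self-contained toy example of this reward-driven loop (a 3-armed bandit with an epsilon-greedy agent; this illustrative setup is not from the original card):

```python
import random

# Toy RL example: an epsilon-greedy agent learns which of 3 "arms" (actions)
# pays off best, purely from reward feedback received after each action.
true_means = [0.2, 0.5, 0.8]     # hidden reward probabilities (the environment)
estimates = [0.0, 0.0, 0.0]      # the agent's estimated value of each action
counts = [0, 0, 0]
epsilon = 0.1                    # exploration rate

for step in range(10_000):
    # Explore with probability epsilon, otherwise exploit the current best estimate.
    if random.random() < epsilon:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: estimates[a])
    # The environment returns a stochastic reward for the chosen action.
    reward = 1.0 if random.random() < true_means[action] else 0.0
    # Update the running-average value estimate for that action.
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # approaches [0.2, 0.5, 0.8]; the agent learns to prefer arm 2
```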

3
Q

How does reinforcement learning help language models?

A

RL lets the LM be optimized directly against human preferences: for a given query, the LM produces two candidate responses and the user is asked which one is better (optionally also by how much, e.g. on a 1-5 scale). These preference judgments become the reward signal used to fine-tune the model.

4
Q

How can we estimate the reward for RL?

A

We want to estimate the reward because asking humans for feedback all the time is very costly. Instead, we can build a model that mimics human preferences.

  1. Collect human-annotated data:
    - One approach is to have humans provide absolute scores for each output. The challenge is that human judgments on different instances and by different people can be noisy and mis-calibrated!
    - Another approach is to ask for pairwise comparisons (is A or B better?)
  2. Using this data, train a reward model (see the sketch below this list)
    ▪ The reward model returns a scalar reward that numerically represents the human preference
  3. Using this model, train the LM against the rewards that the reward model returns.
  4. Periodically retrain the reward model with more samples and human feedback.
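A minimal sketch of step 2 above, assuming a PyTorch setup and the pairwise (Bradley-Terry style) loss commonly used for reward models; the tiny MLP and the random tensors are placeholders for a real LM-based reward model and encoded (prompt, response) pairs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Reward-model training step from pairwise comparisons (Bradley-Terry style).
reward_model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

chosen_enc = torch.randn(8, 768)    # encodings of the human-preferred responses
rejected_enc = torch.randn(8, 768)  # encodings of the dispreferred responses

r_chosen = reward_model(chosen_enc).squeeze(-1)     # scalar reward per example
r_rejected = reward_model(rejected_enc).squeeze(-1)

# The loss pushes the preferred response's reward above the dispreferred one's.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```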
5
Q

What would be a problem with training a reward model and training LM using it? How to solve it?

A

The LM will learn to produce outputs that get a high reward from the reward model but might be gibberish or irrelevant to the prompt (reward hacking).

Solution: add a penalty term (typically a KL-divergence term) that penalizes large deviations from the distribution of the pre-trained LM. This prevents the policy model from diverging too far from the pretrained model.
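A hedged sketch of how such a penalty is often combined with the reward, assuming per-token log-probabilities from the policy and the frozen pretrained (reference) model; all values below are placeholders:

```python
import torch

# Combine the reward-model score with a KL penalty that discourages the policy
# from drifting too far from the pretrained (reference) model.
logprobs_policy = torch.randn(20)  # per-token log-probs of the response under the policy
logprobs_ref = torch.randn(20)     # per-token log-probs under the frozen pretrained model
reward_model_score = torch.tensor(1.3)  # scalar reward for the whole response
beta = 0.1                              # strength of the KL penalty

# Single-sample estimate of the KL divergence between policy and reference.
kl_estimate = (logprobs_policy - logprobs_ref).sum()
total_reward = reward_model_score - beta * kl_estimate
print(total_reward)
```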

6
Q

What are some considerations when working with Transformer LMs and long inputs?

A

Length generalization: Do Transformers work accurately on long inputs?

Efficiency considerations: How efficient are LMs with long inputs?

7
Q

Why is scaling up LMs to longer context sizes not feasible?

A

Memory usage and the number of operations in self-attention increase quadratically with the input length.
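A tiny back-of-the-envelope illustration (not from the original card) of why the cost is quadratic: the attention score matrix has one entry per pair of tokens.

```python
# Q and K grow linearly with sequence length n, but the score matrix QK^T has
# n * n entries, so memory and compute for self-attention grow quadratically.
d = 64  # per-head dimension (arbitrary choice for illustration)
for n in (1_000, 4_000, 16_000):
    qk_entries = n * d        # entries in Q (and in K): linear in n
    score_entries = n * n     # entries in the score matrix: quadratic in n
    print(f"n={n:>6}: Q has {qk_entries:,} entries, "
          f"scores have {score_entries:,} entries "
          f"({score_entries * 4 / 1e9:.2f} GB as float32)")
```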

8
Q

What is Sparse Attention Pattern?

A

It is one of the solutions to the efficiency problem of LMs on long inputs.

The idea is to make the attention operation sparse by limiting which tokens can attend to which other tokens. For example, each token may only attend to nearby tokens; other, more random patterns have also been explored.
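A minimal sketch of one such pattern, a local (sliding-window) mask where each token may only attend to tokens within a fixed window; the mask construction is illustrative, not any particular library's API:

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """mask[i, j] is True iff token i may attend to token j,
    restricted to positions within +/- `window` of i (a simple sparse pattern)."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

mask = sliding_window_mask(seq_len=8, window=2)
print(mask.astype(int))
# Each row has at most 2*window + 1 ones, so the number of attended pairs grows
# linearly with sequence length instead of quadratically.
```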

9
Q

What are some Sparsity Patterns?

A

Examples include local patterns (each token attends only to nearby tokens within a fixed window) and more global/random patterns (a few tokens attend to everything, or each token attends to a random subset); these patterns are often combined.
10
Q

What are Retrieval-based Language Models?

A

They are LMs that retrieve information from an external datastore (at least during inference time)

11
Q

How can we solve the problem of LMs not having up-to-date information?

A

By using retrieval-based LMs.

12
Q

Why use Retrieval-based LMs?

A

LLMs can’t memorize all (long-tail) knowledge in their parameters

LLMs’ knowledge is easily outdated and hard to update

LLMs’ output is challenging to interpret and verify (retrieval lets us link to the source of the information)

13
Q

What is the process of RAG?

A
  1. Have an IR component (e.g. a neural retriever that fetches documents based on the query). Using (approximate) nearest-neighbour search, we retrieve the top-k chunks relevant to answering the query (see the sketch below this list).
  2. Include the retrieved chunks either in the input context of the LM, or somewhere in the middle or at the end of the model (a design decision made by the engineers).
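A minimal sketch of step 1 (and feeding the result into step 2), assuming unit-normalised embeddings and exact nearest-neighbour search; `embed` is a dummy stand-in for a real text encoder, so the similarity ranking here is arbitrary:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Dummy stand-in for a trained text encoder: returns a unit-norm vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

# A tiny "datastore" of pre-embedded chunks (in practice, an ANN index over millions).
chunks = ["Doc chunk about topic A.", "Doc chunk about topic B.", "Doc chunk about topic C."]
chunk_vecs = np.stack([embed(c) for c in chunks])

query = "Tell me about topic B."
q = embed(query)

scores = chunk_vecs @ q               # cosine similarity (vectors are unit-norm)
top_k = np.argsort(-scores)[:2]       # indices of the k most similar chunks

# Step 2: include the retrieved chunks in the LM's input context.
context = "\n".join(chunks[i] for i in top_k)
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
print(prompt)                         # this prompt is then fed to the LM
```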
14
Q

Give two variants of what a retrieval-augmented LM can look like

A
  1. Retrieve chunks of text (passages) once based on the user query, include them in the input layer, and run the rest of the model normally.
  2. Split the query into multiple parts, retrieve chunks for each part, and include the embeddings of those chunks in the decoder between self-attention and the FFN (a design decision made by the engineers).
15
Q

How can RAG be trained?

A
  • End-to-end: train both the retriever and the LM together
  • Freeze some parts and train the others (see the sketch below)
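A hedged sketch of the second option, assuming a PyTorch setup; the two `nn.Linear` modules are placeholders for a real retriever and language model:

```python
import torch
import torch.nn as nn

# Freeze the retriever, train only the LM (placeholder modules).
retriever = nn.Linear(128, 128)
language_model = nn.Linear(128, 128)

for p in retriever.parameters():
    p.requires_grad = False          # retriever stays fixed during training

trainable = [p for p in language_model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-5)  # only LM parameters are updated
```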