[NLP] Lecture 3: Efficiency, Efficiency, Efficiency (Max Müller-Eberstein) Flashcards
What are the three pillars of efficiency?
- Compute: expensive in electricity; in Denmark we are blessed with a lot of green energy
- Data: how efficiently can we use the data we have?
- Effort: how difficult is it to get started with NLP, for example?
How many times more power does a ChatGPT question use compared to a Google search?
10 times more
What percentage of electricity in Denmark is used by data centres?
Almost 20%
Where does the model live?
In the cloud, which ultimately runs on physical hardware. We use GPUs to train; if we have a really big model, we use more GPUs, and we can also spread the model across more servers.
Why does it work to split the model across servers?
Transformer models are built from blocks, so we can place blocks on different servers: one block finishes its computation and sends its output to the next server, which holds the next block.
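A minimal sketch of the idea, assuming PyTorch and two visible GPUs (the block sizes and devices are illustrative): each block lives on its own device, and the hidden states are shipped from one to the next, just as they would be shipped between servers.

```python
# Minimal pipeline sketch: two transformer blocks on two devices.
import torch
import torch.nn as nn

block1 = nn.TransformerEncoderLayer(d_model=512, nhead=8).to("cuda:0")
block2 = nn.TransformerEncoderLayer(d_model=512, nhead=8).to("cuda:1")

x = torch.randn(10, 4, 512, device="cuda:0")  # (seq_len, batch, d_model)
h = block1(x)        # block 1 computes on the first device...
h = h.to("cuda:1")   # ...its output is sent over to the next device/server...
out = block2(h)      # ...where block 2 continues the computation
```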
Explain what a transformer is made of
We feed word IDs (mapped to word vectors) into the transformer, and it outputs a vector of word probabilities.
Vectors -> attention head -> feed-forward
In attention, the input vector is multiplied with query, key, and value matrices. These names don't really mean anything by themselves; we do not fully know what goes on inside.
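To make the query/key/value multiplication concrete, here is a minimal single-head attention sketch in NumPy (the matrices `W_q`, `W_k`, `W_v` and all dimensions are illustrative):

```python
import numpy as np

def attention_head(X, W_q, W_k, W_v):
    """One attention head; X has shape (seq_len, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v           # project into query/key/value space
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # scaled dot-product similarities
    scores -= scores.max(axis=-1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ V                            # weighted sum of value vectors

rng = np.random.default_rng(0)
d_model, d_head, seq_len = 512, 64, 10
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
print(attention_head(X, W_q, W_k, W_v).shape)  # (10, 64)
```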
What is considered a small LLM?
32 blocks
Why is the transformer not efficient?
If you want to change something about the model's behaviour, you need to change the whole model, because we don't know which parameters do what.
How can we make it more efficient?
- Only train some parts of the network. The problem is that if we tweak something, we don't know how it will affect other things (see the sketch after this list).
- Parameter-efficient fine-tuning (PEFT): adding a small number of new parameters to the model, e.g. adapters (provide a new hidden state), prefix tuning (pretends there is more context to the sentence), and LoRA (like an adapter, but at a different location).
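A minimal sketch of the "only train some parts" idea, assuming PyTorch (the toy model is illustrative): freeze everything, then unfreeze only the final layer.

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
for param in model.parameters():
    param.requires_grad = False       # freeze the whole network...
for param in model[-1].parameters():
    param.requires_grad = True        # ...then unfreeze only the final layer

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # 5130 trainable out of 267,786 total parameters
```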
Explain how an adapter works, how prefix tuning works, and how LoRA works
- Adapter: a small bottleneck layer inserted inside each transformer block; it provides a new hidden state while the original weights stay frozen.
- Prefix tuning: trainable vectors are prepended to the input, pretending there is more context to the sentence.
- LoRA: a low-rank update added to existing weight matrices, like an adapter but at a different location (see the sketch below).
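A minimal LoRA sketch, assuming PyTorch (the rank `r=8` and layer sizes are illustrative): the pretrained weight stays frozen, and only two small low-rank matrices `A` and `B` are trained, so the effective weight is W + BA.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen pretrained Linear layer and adds a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # freeze pretrained W (and bias)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # B starts at zero, so the model is unchanged at first

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T     # W x + B(A x)

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8192: two small matrices instead of 512 * 512 weights
```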
What ways can we make efficient use of the data we have?
- Gather as much data as possible in your target language
- Cross-lingual transfer: train on all the data you can find, across languages, and hope for the best (see the sketch after this list)
- Learning dynamics: understand how and when models learn certain things; this also applies to cross-lingual transfer
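A conceptual cross-lingual transfer sketch, assuming the Hugging Face `transformers` library (the model name, label count, and Danish example sentence are illustrative): fine-tune a multilingual encoder on English data, then apply it zero-shot to another language.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=2)

# ... fine-tune on labelled English data with an ordinary training loop ...

# Zero-shot transfer: the shared multilingual representations let the
# classifier work on a language it was never fine-tuned on.
inputs = tokenizer("Denne film var fantastisk!", return_tensors="pt")  # Danish input
logits = model(**inputs).logits
```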