LLM Flashcards

1
Q

What is Byte-Pair Encoding (BPE)?

A

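Byte-Pair Encoding (BPE) is a subword tokenization algorithm. It starts from individual characters (or bytes) and repeatedly merges the most frequent adjacent pair of symbols into a new token, until the vocabulary reaches a target size.

Once you’ve built your token vocabulary with BPE, text can be encoded as sequences of token IDs from that vocabulary and passed to a neural network for training.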
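
As an illustration, the merge loop behind BPE can be sketched in a few lines of Python. This is a minimal sketch, not a production tokenizer: the function name `train_bpe`, its parameters, and the toy word list are assumptions made for this card, and it works on characters within words rather than on raw bytes.

```python
from collections import Counter


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy sketch)."""
    # Start with every word split into single characters, keeping word counts.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent pair of symbols occurs.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # Every word is already a single symbol.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the corpus, replacing each occurrence of the best pair
        # with one merged symbol.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges


merges = train_bpe(["low", "low", "lower", "lowest"], num_merges=2)
```

Each entry in `merges` is one learned rule. On this toy input, two merges are enough to turn the frequent sequence "low" into a single token; real vocabularies are built from far larger corpora with many thousands of merges.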

2
Q

What is tokenization?

A

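Tokenization is the process of splitting text into smaller units called tokens, such as words, subwords, or characters. It is essential for building a vocabulary that can be used to train a large language model. In practice, large language models are trained on hundreds of billions of tokens.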
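
To make the link between tokens and a vocabulary concrete, here is a minimal sketch that assumes the simplest possible scheme, lowercased whitespace splitting. The helper names `build_vocab` and `encode` and the `unk_id` placeholder are hypothetical, chosen for this card; large language models typically use subword tokenizers instead.

```python
def build_vocab(texts):
    """Assign an integer ID to each distinct whitespace-separated token."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            # A new token gets the next unused ID.
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab, unk_id=-1):
    """Map text to token IDs; tokens not in the vocabulary map to `unk_id`."""
    return [vocab.get(token, unk_id) for token in text.lower().split()]


vocab = build_vocab(["the cat sat", "the dog sat"])
ids = encode("the dog ran", vocab)
```

A model never sees the raw text, only ID sequences like `ids`. The word "ran" is not in this toy vocabulary, so it maps to the placeholder `unk_id`, which is one reason subword schemes that can split unseen words into known pieces are preferred.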
