LLM Flashcards
1
Q
What is Byte Pair Encoding
A
Byte-Pair Encoding is a tokenization algorithm that builds tokens from characters.
Once you’ve built your token vocabulary with BPE, it can be passed to a neural network for training.
2
Q
What is tokenization
A
Tokenization is essential for building a vocabulary that can be used to train a large language model. In practice, large language models are trained on hundreds of billions of tokens.