Lesson 2 Flashcards
Describe in 2 sentences how an LLM like ChatGPT generates text.
An LLM predicts the next token conditional on the context (the preceding tokens). It applies this prediction recursively: each generated token is appended to the context and the process repeats.
In the expression P(x_n | x_{n-1}, ..., x_{n-k}), identify
(a) the next token, (b) the context and (c) give an expression for the context length.
(a) The next token is x_n.
(b) The context is x_{n-1}, x_{n-2}, ..., x_{n-k}: the preceding tokens the model conditions on to predict x_n.
(c) The context length is k, as indicated by the subscripts: k is the number of preceding tokens considered when predicting the next token x_n.
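A minimal sketch of this recursive, context-conditioned generation loop (the model object and its predict_next method are hypothetical stand-ins for a real LLM):

```python
# Minimal sketch of recursive next-token generation.
# `model.predict_next` is a hypothetical stand-in for a real LLM's
# forward pass: given the current context, it returns the next token.

def generate(model, prompt_tokens, max_new_tokens, k):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        context = tokens[-k:]  # condition on at most k preceding tokens
        next_token = model.predict_next(context)  # x_n ~ P(x_n | x_{n-1}, ..., x_{n-k})
        tokens.append(next_token)  # the output becomes part of the next context
    return tokens
```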
An English text has a length of 3000 words.
(a) Calculate approximately how many tokens will represent the text.
(b) If the text is translated to Italian, will the number change? How?
(a) English averages about 1.3 tokens per word, so a 3000-word text corresponds to roughly 1.3 × 3000 = 3900 tokens.
(b) Yes, the number will be higher: the 1.3 tokens/word average holds for English, and tokenizers split Italian words into more pieces, roughly 2 tokens per word, so about 2 × 3000 = 6000 tokens.
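To count tokens exactly rather than estimate, OpenAI's tiktoken library can be used (a sketch; cl100k_base is the encoding used by the GPT-3.5/GPT-4 chat models, and the sentences are arbitrary examples):

```python
# Count tokens exactly with OpenAI's tiktoken library (pip install tiktoken).
# cl100k_base is the encoding used by the GPT-3.5/GPT-4 chat models.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

english = "The quick brown fox jumps over the lazy dog."
italian = "La veloce volpe marrone salta sopra il cane pigro."

print(len(enc.encode(english)))  # English: close to 1 token per word here
print(len(enc.encode(italian)))  # Italian: typically more tokens per word
```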
Approximately which Top-p value will you select to generate
(a) a poem and (b) a technical description?
(a) 0.9 (high Top-p: keeps more candidate tokens, giving more varied, creative text)
(b) 0.4 (low Top-p: restricts sampling to the most likely tokens, giving more precise, predictable text)
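For illustration, Top-p is typically passed as a sampling parameter in the API call; a sketch using the OpenAI Python client (model name, prompt, and a configured API key are assumptions of the example):

```python
# Sketch: passing Top-p as a sampling parameter via the OpenAI Python
# client (pip install openai). Model name and prompt are arbitrary
# examples; an API key is assumed to be configured in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a short poem about autumn."}],
    top_p=0.9,  # high Top-p for a poem; ~0.4 for a technical description
)
print(response.choices[0].message.content)
```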
State the elements of the P.R.O.P.E.R. framework.
Persona – which role should it take? E.g. professor, helpful assistant, critic, . . .
Request – what task should it fulfill?
Operation – in which way / using which method?
Presentation – which tone/style/format for the result? E.g. informal, short, table…
Examples – provide a template for the output.
Refinement – give feedback, iterate and improve.
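For illustration, a prompt assembled with the framework might read (topic and wording are arbitrary examples):

"You are a statistics professor (Persona). Explain the difference between correlation and causation (Request) using two everyday examples (Operation). Answer in under 150 words, in an informal tone (Presentation), formatted as 'Example 1: ... / Example 2: ...' (Examples)." A follow-up such as "Make the second example about sports" then provides the Refinement step.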
Which tasks will ChatGPT 3.5 (free version) be able to fulfill?
(a) Write a summary of the 1400-page bestseller classic “War and Peace” by Leo Tolstoy
(b) Integrate ∫ x² dx
(c) Summarize a 10-page PDF document (you have copied and pasted the text)
(d) Create a detailed plan for a friend’s wedding
(e) Summarize the political events of last year
(f) Propose a balanced portfolio of US stocks
(a) no
(b) yes
(c) no
(d) yes
(e) no
(f) yes
What are n-grams?
A contiguous sequence of n words in a text
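A minimal sketch of extracting n-grams from a text (whitespace tokenization is a simplifying assumption):

```python
# Sketch: extract all n-grams (contiguous n-word sequences) from a text.
# Splitting on whitespace is a simplifying assumption.
def ngrams(text, n):
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("the cat sat on the mat", 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
```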
What is an LLM?
Large Language Model:
a statistical model that predicts the next token, applied recursively
What is a token?
The basic unit of account for LLMs: a word, sub-word, or character sequence into which the tokenizer splits text
LLMs can . . . (at least 3 things)
▶ Produce convincing text
▶ Incorporate provided information (in-context learning)
▶ Reproduce standard facts / textbook knowledge
▶ Transform and translate information
▶ Present output in many forms
LLMs cannot . . . (at least 3)
▶ Analyze a problem like humans do
▶ Understand your cultural/implicit context
▶ Know everything (rare facts, news)
▶ Run code, reason logically, access the web, or do symbolic calculations
Procedure of Estimating P()
Procedure:
1. Data: entire Wikipedia, Stack Exchange, GitHub (40 TB for GPT-4)
2. Tokenization
3. Pretraining: token prediction (masking), via gradient descent
4. Finetuning: specific datasets or tasks
5. Parameter adjustment: based on evaluation of (3) and (4)
6. Iterate: repeat steps (3) to (5) until the model converges
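A toy sketch of step 3, next-token prediction trained by gradient descent (a tiny PyTorch model stands in for a real transformer; vocabulary, context size, and data are made up):

```python
# Toy sketch of pretraining (step 3): next-token prediction trained by
# gradient descent. A tiny embedding + linear model stands in for a
# real transformer; vocabulary, sizes, and data are made up.
import torch
import torch.nn as nn

vocab_size, embed_dim, context = 100, 16, 4
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),        # token ids -> vectors
    nn.Flatten(),                               # concatenate the context vectors
    nn.Linear(embed_dim * context, vocab_size)  # logits over the next token
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Fake corpus: 32 contexts of 4 token ids, plus the token that follows each.
contexts = torch.randint(0, vocab_size, (32, context))
next_tokens = torch.randint(0, vocab_size, (32,))

for step in range(100):                  # step 6: iterate until convergence
    logits = model(contexts)             # predicted next-token distribution
    loss = loss_fn(logits, next_tokens)  # how wrong the predictions are
    optimizer.zero_grad()
    loss.backward()                      # gradient of the loss
    optimizer.step()                     # gradient-descent update
```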
▶ “Top K”
= the set of the K tokens with the highest probabilities
▶ “Top p” (nucleus)
= the smallest set of tokens with Σ p_i > p_threshold
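A sketch of both truncation rules applied to a toy next-token distribution (NumPy only; the probabilities are made up):

```python
# Sketch: Top-K and Top-p (nucleus) truncation of a toy next-token
# distribution. The probabilities are made up.
import numpy as np

probs = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])  # sums to 1

def top_k(probs, k):
    """Keep the K highest-probability tokens."""
    return np.argsort(probs)[::-1][:k]

def top_p(probs, p_threshold):
    """Keep the smallest set of tokens whose probabilities sum past p_threshold."""
    order = np.argsort(probs)[::-1]  # most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p_threshold) + 1
    return order[:cutoff]

print(top_k(probs, 3))    # indices of the 3 most likely tokens: [0 1 2]
print(top_p(probs, 0.7))  # smallest nucleus with cumulative prob > 0.7: [0 1 2]
```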
Context length of ChatGPT 3.5 and 4.0:
▶ ChatGPT 3.5 – 4096 tokens
▶ ChatGPT 4.0 – 8192 tokens