CONVERSATIONAL AI LLMs Flashcards
4 challenging factors in speech recognition
- Variability in speech: accent, dialect, background noise
- Domain-specific lexicon
- Contextual understanding
- Speech disfluency: self-corrections, repetitions
What are important attributes of speech synthesis
Natural, emotional, and real-time
Task-oriented conversations vs open domain conversations
Task-oriented
-clear goal
-limited knowledge
-short conversation
-predictable
open domain
-no exact goal
-unrestricted
-long-session
-unpredictable
-emotions and empathy
For response selection we must pass not only the current question but also the conversation context, and rank candidate responses
How do we train LLMs
Train the base model and then
Use reinforcement learning from human feedback (RLHF)
What are challenges of conversational AI
- consistent conversations (stick with facts and persona)
- empathic agents for user experience
- poor at detecting misinformation
- using large contexts: information in the middle of the context is often underused ("lost in the middle")
What are Hallucinations
When an AI generates something that is not true
What are Intrinsic hallucinations
Response contradicts or is not grounded in the given context
usually obviously wrong
What are Extrinsic hallucinations
Response incorporates outside information not supported by the context
Harder to recognise
How can we improve generated responses
Few-shot prompting (in-context learning)
providing examples in context
Roleplay - adopting a role/function
Prompt engineering (chain of thought)
Ask the model to generate the reasoning steps leading to its answer, which improves performance
Finetuning
The model is trained on a huge dataset and then fine-tuned on smaller, relevant data
But it is hard to introduce new knowledge this way
Working within restricted context length: RAG
Retrieval augmented generation
Take query and only retrieve relevant context
computationally more expensive (extra retrieval step)
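The retrieval step can be sketched in a few lines. This is a toy illustration, not a real RAG pipeline: the `embed` function below is a bag-of-words stand-in for a real embedding model, and `retrieve` simply ranks documents by cosine similarity to the query.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Stand-in for a real embedding model: bag-of-words word counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=1):
    # Rank documents by similarity to the query; keep only the top-k as context.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "RLHF fine-tunes a base model with human preference data.",
    "RAG retrieves relevant context before generating an answer.",
    "Speech synthesis should sound natural and run in real time.",
]
context = retrieve("how does RAG retrieve relevant context", docs)
```

In a real system the retrieved chunks would then be inserted into the prompt alongside the question.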
RAG vs Fine tune
Start with prompt engineering
if you want to improve short-term memory (facts for the current query) -> RAG
if you want to improve long-term memory (persona or the language it uses) -> fine-tune
These methods are all additive - can use all
How to evaluate generative LLMs
- factuality of answer (faithfulness)
- relevancy to question
What are Foundation Models
Models capable of a wide range of general tasks
that we can build applications on top of
What do LLMs learn
Formal competence
- knowledge of linguistic rules and patterns;
- producing reasonable utterances
Functional competence
- understanding and using language in the world
- demonstrating that understanding in use
LLMs master formal well but functional less so
Prompting vs fine tuning
Fine tuning
Using a foundation model and fine-tuning it on specific data - the model's weights change
Prompting
Can involve an instruction alone, instruction + context, or instruction + context + input text
What is prompt engineering
the practice of developing and optimizing prompts to efficiently use LLMs for a variety of applications
What is in-context learning
Prompting a language model to perform a task just by conditioning on input-output examples (changing prompts), without optimising any parameters
1) pre-training
2) warming up (optional, e.g. fine-tuning)
3) scoring - pairing the input with candidate answers and selecting the most feasible one
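The scoring step (3) can be sketched as follows. This is a hypothetical illustration: `score` would normally be the language model's log-likelihood of the candidate answer given the prompt; here a crude word-overlap heuristic stands in.

```python
def build_prompt(demonstrations, query):
    # In-context learning: condition on input-output examples, no weight updates.
    lines = [f"Q: {q}\nA: {a}" for q, a in demonstrations]
    lines.append(f"Q: {query}\nA:")
    return "\n\n".join(lines)

def score(prompt, answer):
    # Stand-in scorer: a real system would use the LM's log-likelihood of
    # `answer` given `prompt`. Here: fraction of answer words seen in the prompt.
    prompt_words = set(prompt.lower().split())
    answer_words = answer.lower().split()
    return sum(w in prompt_words for w in answer_words) / len(answer_words)

def pick_answer(demonstrations, query, candidates):
    # Step 3 (scoring): pair the input with each candidate, keep the best one.
    prompt = build_prompt(demonstrations, query)
    return max(candidates, key=lambda a: score(prompt, a))

demos = [("Who wrote Hamlet?", "Shakespeare wrote Hamlet.")]
best = pick_answer(demos, "Who wrote Macbeth?",
                   ["Shakespeare wrote Macbeth.", "An unknown author."])
```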
What is Warming up
Optional fine tuning for a prompting model
- Supervised in-context training: fine-tune on a broad range of tasks
- Self-supervised in-context training: use the frozen PLM to generate synthetic training data and fine-tune on it
The model becomes better at ICL
But this requires updating weights
What is zero-shot learning
No examples of the expected behaviour are given to the model
Prompting tips
- Be specific, clear, and concise.
- Include context.
- Iterate: subsequent prompts are typically needed
- Use instructions that are as domain-specific as possible
- Roleplay
Prompting problems
- Can introduce bias (via the chosen context)
- Requires domain expertise, e.g. in a specific medical field
- Still lags behind SotA model-tuning results
- Sub-optimal and sensitive to prompt wording
How can we reduce Hallucinations
Training the model to say “I don't know”
Evidence-based response eg RAG: model has a “reality check” against real-world data
Context retrieval provides ‘evidence’: enables factual consistency
What is self reflective RAG
Run RAG, then validate the response to check it is grounded in the retrieved context
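A minimal sketch of that loop, with hypothetical `retrieve`/`generate` stand-ins and a simple word-overlap grounding check in place of a real validator:

```python
def grounded(response, context, threshold=0.5):
    # Validation step: what fraction of the response's words appear in the context?
    ctx_words = set(context.lower().split())
    words = response.lower().split()
    return sum(w in ctx_words for w in words) / len(words) >= threshold

def self_reflective_rag(query, retrieve, generate, max_tries=2):
    # Retrieve, generate, then check the answer is grounded in the retrieved
    # context; if not, retry with the next-best context before giving up.
    for attempt in range(max_tries):
        context = retrieve(query, attempt)
        response = generate(query, context)
        if grounded(response, context):
            return response
    return "I don't know."  # safer than returning an ungrounded answer

# Toy stand-ins for the retriever and generator:
contexts = ["Paris is the capital of France.", "unrelated text about cooking"]

def toy_retrieve(query, attempt):
    return contexts[min(attempt, len(contexts) - 1)]

def toy_generate(query, context):
    return "Paris is the capital of France." if "Paris" in context else "Rome, probably."

answer = self_reflective_rag("What is the capital of France?", toy_retrieve, toy_generate)
```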
How to evaluate Retrieval LLMs
- signal-to-noise ratio of retrieved context (context precision)
- did it retrieve all relevant information (context recall)
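Both metrics reduce to simple ratios once relevance judgements are available (in practice frameworks such as RAGAS use an LLM to judge relevance; here the relevant set is assumed to be given as labels):

```python
def context_precision(retrieved, relevant):
    # Signal-to-noise: how much of what we retrieved is actually relevant?
    return len(set(retrieved) & set(relevant)) / len(retrieved) if retrieved else 0.0

def context_recall(retrieved, relevant):
    # Coverage: how much of the relevant information did we retrieve?
    return len(set(retrieved) & set(relevant)) / len(relevant) if relevant else 0.0

retrieved = ["chunk1", "chunk2", "chunk3"]  # what the retriever returned
relevant = ["chunk1", "chunk4"]             # what was actually needed
```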