Acronyms Flashcards
Learn each of the main acronyms in AI
What is NER?
Named Entity Recognition
- Extracts predefined, general-purpose entities from text, such as people, places, organizations, dates, and other standard categories.
What is NLP?
Natural Language Processing
What are the main subsets of AI?
Machine Learning, Deep Learning, and Generative AI; each is a subset of the previous (AI ⊃ ML ⊃ DL ⊃ GenAI).
What is an LLM?
Large Language Model (ChatGPT, for example)
What does non-deterministic mean?
The generated text may be different on every run, even for users submitting the same prompt.
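This non-determinism usually comes from sampling the next token from a probability distribution rather than always picking the most likely one. A minimal sketch of temperature-based sampling (a generic illustration, not any specific model's implementation):

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Sample a token index from raw logits.
    Higher temperature flattens the distribution (more randomness);
    lower temperature makes the top token dominate."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

With a very low temperature the same prompt yields the same token almost every time; at higher temperatures, different runs diverge.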
What is Amazon Titan?
High-performing Foundation Model (FM) from Amazon. Offers Image, Text, and Multi-modal models.
What is Fine-Tuning a Model?
Adapting a copy of a foundation model with your own data. Must use “Provisioned Throughput”.
What is Instruction-based Fine Tuning?
Improves the performance of a pre-trained FM on domain-specific tasks. Uses labeled examples that are prompt-response pairs. Usually cheaper, as the computation is less intense and less data is required.
What is domain adaptation fine-tuning?
Tunes the model to be an expert in a specific domain (e.g., the entire AWS documentation).
What is Single-Turn Messaging?
Part of instruction-based fine-tuning.
- System (optional): context for the conversation
- Messages: an array of message objects, each containing:
  1) Role: user or assistant
  2) Content: the text content of the message
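The schema above can be sketched as a Python dict. Field names here follow the card's description and are illustrative; the exact payload shape varies by model and provider.

```python
# Illustrative single-turn training record matching the card's schema:
# an optional system context plus one user/assistant exchange.
single_turn_example = {
    "system": "You are a helpful AWS assistant.",  # optional
    "messages": [
        {"role": "user", "content": "What is Amazon S3?"},
        {"role": "assistant", "content": "Amazon S3 is an object storage service."},
    ],
}
```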
What is Multi-Turn Messaging?
Instruction-based fine-tuning for a full conversation (multiple user/assistant exchanges).
What is Transfer Learning?
The broader concept of reusing a pre-trained model to adapt it to a new related task. Fine-tuning is an example of Transfer Learning.
What does Automatic Evaluation provide when evaluating a model?
Built-in task types: text summarization, Q&A, text classification, and open-ended text generation. Scores are calculated automatically, providing quality control.
What are the automated metrics to evaluate an FM?
ROUGE: Recall-Oriented Understudy for Gisting Evaluation
BLEU: Bilingual Evaluation Understudy
BERTScore: Bidirectional Encoder Representations from Transformers
Perplexity: How well the model predicts the next token (lower is better)
What is ROUGE?
Recall-Oriented Understudy for Gisting Evaluation
- Evaluates automatic summarization and machine translation systems.
- ROUGE-N: number of matching n-grams between reference and generated text
- ROUGE-L: Longest common subsequence between reference and generated text.
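The ROUGE-N idea can be shown with a toy recall computation in plain Python (a simplified sketch, not the full ROUGE implementation, which also reports precision and F1):

```python
from collections import Counter

def rouge_n_recall(reference, candidate, n=1):
    """ROUGE-N recall: fraction of the reference's n-grams
    that also appear in the candidate (with clipped counts)."""
    def ngrams(text, n):
        tokens = text.lower().split()
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    ref_counts = Counter(ngrams(reference, n))
    cand_counts = Counter(ngrams(candidate, n))
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(c, cand_counts[g]) for g, c in ref_counts.items())
    return overlap / total
```

A candidate identical to the reference scores 1.0; a candidate covering only part of the reference scores proportionally lower.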
What is BLEU?
Bilingual Evaluation Understudy
- Evaluates the quality of generated text, especially for translations.
- Looks at a combination of n-gram precisions (n = 1, 2, 3, 4), with a brevity penalty for overly short outputs.
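A simplified BLEU sketch, combining clipped n-gram precisions with a brevity penalty (the real metric handles multiple references and smoothing, omitted here):

```python
import math
from collections import Counter

def bleu(reference, candidate, max_n=4):
    """Simplified BLEU: brevity penalty times the geometric mean
    of clipped n-gram precisions for n = 1..max_n."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    precisions = []
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        total = sum(cand_ngrams.values())
        if total == 0:
            return 0.0  # candidate too short to form n-grams
        clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean
```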
What is BERTScore?
Bidirectional Encoder Representations from Transformers
- Measures semantic similarity between generated and reference text using BERT embeddings
- Capable of detecting more nuance between the texts than simple n-gram overlap
What is Perplexity?
Measures how well the model predicts the next token.
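The standard formula: perplexity is the exponential of the average negative log-probability the model assigned to each actual next token, so a model that always predicts correctly scores 1.0 (lower is better):

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities the model assigned to the
    actual next tokens: exp(mean negative log-probability)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

For example, a model that assigns probability 0.25 to every correct token has perplexity 4, as if it were guessing uniformly among 4 options.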
What is RAG?
Retrieval-Augmented Generation
- Allows an FM to reference a data source outside of its training data
- Bedrock handles creating the Vector Embeddings in a DB of your choice
- Useful where real-time data needs to be fed into the FM
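The retrieve-then-generate flow can be sketched in a few lines. Here `embed` and `generate` are hypothetical callables standing in for an embedding model and an FM; retrieval is a cosine-similarity lookup over the document set:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rag_answer(question, documents, embed, generate):
    """Minimal RAG loop: embed the question, retrieve the most similar
    document, and generate with that document prepended as context."""
    q_vec = embed(question)
    best = max(documents, key=lambda d: cosine(q_vec, embed(d)))
    prompt = f"Context: {best}\n\nQuestion: {question}"
    return generate(prompt)
```

In a managed setup like Bedrock Knowledge Bases, the embedding, vector storage, and retrieval steps are handled by the service.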
What are the valid types of RAG Vector databases?
Amazon OpenSearch Service
Amazon DocumentDB (NoSQL)
Amazon Aurora (relational)
Amazon RDS for PostgreSQL (relational)
Amazon Neptune (graph DB)
What are the main RAG datasources?
S3, Confluence, SharePoint, Salesforce, Web pages
What is Tokenization?
Converting raw text into a sequence of tokens.
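A toy word-level tokenizer illustrates the idea; note that real LLM tokenizers use subword schemes (e.g. BPE), so rare words get split into smaller units:

```python
import re

def simple_tokenize(text):
    """Toy tokenizer: split on words and punctuation.
    Real LLM tokenizers operate on learned subword units instead."""
    return re.findall(r"\w+|[^\w\s]", text.lower())
```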
What is a “Context Window”?
The number of tokens an LLM can consider when generating text. The FIRST factor to evaluate when considering a model.
What are Embeddings?
Creation of vectors (arrays of numeric values) from text, images, or audio.
- Can capture lots of dimensions
- Embedding models can power search applications
- Words with a semantic relationship have similar embeddings.
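"Similar embeddings" is usually measured with cosine similarity, which compares the direction of two vectors (a generic sketch; real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors.
    Close to 1.0 means the vectors point in similar directions,
    i.e. the inputs are semantically related."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

This is the core operation behind embedding-powered search: rank documents by their cosine similarity to the query's embedding.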