Lecture 6 Flashcards

Question 1

Q

What is Natural Language Generation (NLG)?

Answer

A

NLG is the process of automatically producing human-readable text from structured data or other non-linguistic input; generation from structured data is also known as Data-to-Text.

Question 2

Q

Name three applications of NLG.

Answer

A

Structured Report Generation (e.g., BabyTalk project), Weather Reporting, and Question Answering Systems.

Question 3

Q

What are the main types of NLG systems?

Answer

A

Classical (rule-based), Template-Based, Statistical/Neural (data-driven), and Hybrid Systems.

Question 4

Q

What is a Template-Based NLG System?

Answer

A

A system that uses fixed text templates with slots for variable content, often used in predictable domains.
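As a minimal sketch of the idea, the snippet below fills a fixed weather template from one record. The template wording, the slot names (city, condition, high_c), and the realize helper are all hypothetical, chosen only for illustration.

```python
# Hypothetical template with named slots; the wording is illustrative only.
WEATHER_TEMPLATE = (
    "Today in {city} it will be {condition} "
    "with a high of {high_c} degrees Celsius."
)


def realize(template: str, record: dict) -> str:
    """Fill each named slot in the template with the matching record field."""
    return template.format(**record)


# One structured input record (made-up values).
record = {"city": "Oslo", "condition": "cloudy", "high_c": 12}
report = realize(WEATHER_TEMPLATE, record)
print(report)
```

The fixed wording is what makes the output predictable, and also what limits the approach to domains where the same sentence shapes recur.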

Question 5

Q

Describe the advantage and disadvantage of Rule-Based NLG Systems.

Answer

A

Advantage: Produces highly accurate text for specific domains. Disadvantage: Requires extensive manual effort to define rules.

Question 6

Q

What are the key components of a Classical NLG System?

Answer

A

Content determination, discourse structuring, aggregation, referring expression generation, lexical choice, realization, and fluency ranking.

Question 7

Q

What is Content Determination in NLG?

Answer

A

The process of deciding what information to include in the generated text.

Question 8

Q

What is Discourse Structuring in NLG?

Answer

A

Organizing the information into a coherent and logical flow within the generated text.

Question 9

Q

Name three Decoding Strategies in NLG.

Answer

A

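Greedy Decoding, Beam Search, and Top-K Sampling. Greedy decoding always picks the single most probable next word, so it is deterministic rather than a form of sampling.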
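The difference between picking the best word at each step and keeping several candidate sequences can be sketched on a toy example. TOY_MODEL below is a hypothetical hand-written probability table, not a real language model, and the function names are assumptions made for this sketch.

```python
import math

# Hypothetical toy "language model": maps a prefix to next-token probabilities.
TOY_MODEL = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"cat": 0.55, "dog": 0.45},
    ("a",): {"cat": 0.9, "dog": 0.1},
}


def next_probs(prefix):
    """Return the next-token distribution for a prefix (empty if unknown)."""
    return TOY_MODEL.get(tuple(prefix), {})


def greedy_decode(steps):
    """At every step, append the single most probable next token."""
    seq = []
    for _ in range(steps):
        probs = next_probs(seq)
        if not probs:
            break
        seq.append(max(probs, key=probs.get))
    return seq


def beam_search(steps, beam_width):
    """Keep the beam_width best partial sequences by total log-probability."""
    beams = [([], 0.0)]  # (sequence, summed log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for token, p in next_probs(seq).items():
                candidates.append((seq + [token], score + math.log(p)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]


greedy_result = greedy_decode(2)
beam_result = beam_search(2, beam_width=2)
print(greedy_result, beam_result)
```

In this table greedy decoding commits to "the" first and ends with a sequence of probability 0.33, while a beam of width 2 also keeps "a" alive and finds the sequence with probability 0.36.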

Question 10

Q

What is Top-K Sampling?

Answer

A

A decoding method where only the top k most probable words are considered for generation, adding diversity.
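A minimal sketch of top-k sampling over a single next-word distribution follows. The probabilities and the top_k_sample helper are made up for illustration; a real system would get the distribution from a model at each step.

```python
import random


def top_k_sample(probs: dict, k: int, rng: random.Random) -> str:
    """Sample one token from the k most probable entries of probs."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens = [token for token, _ in top]
    weights = [p for _, p in top]
    # random.choices treats the weights as relative, so the kept
    # probabilities are effectively renormalized over the top k.
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-word distribution.
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "xylophone": 0.05}
rng = random.Random(0)  # seeded only to make the demo repeatable
samples = [top_k_sample(probs, k=2, rng=rng) for _ in range(100)]
print(set(samples))
```

With k = 2 only "cat" and "dog" can ever be produced; with k = 1 the method reduces to always choosing the most probable word.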

Question 11

Q

How does Temperature Sampling work in NLG?

Answer

A

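Rescales the model's probability distribution with a temperature parameter to trade off creativity against focus: lower values sharpen the distribution and make generation more focused, while higher values flatten it and make output more diverse.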
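One common way to apply a temperature, sketched here under the assumption that the model exposes raw scores (logits), is to divide the logits by the temperature before the softmax. The logits and the function name are illustrative.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three words
sharp = softmax_with_temperature(logits, 0.5)   # more focused
neutral = softmax_with_temperature(logits, 1.0)
flat = softmax_with_temperature(logits, 2.0)    # more diverse
print(sharp, neutral, flat)
```

The top word's probability grows as the temperature drops below 1 and shrinks as it rises above 1, which is the "focused versus creative" trade-off described in the card.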

Question 12

Q

What types of data are commonly used to train LLMs?

Answer

A

Web text from Common Crawl, Colossal Clean Crawled Corpus (C4), Wikipedia, news sites, and patents.

Question 13

Q

Why is Data Quality important in NLG?

Answer

A

Low-quality data can introduce biases, toxicity, and unsafe content in the generated text.

Question 14

Q

What is Prompting in the context of LLMs?

Answer

A

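Using an input prompt to guide a language model to generate relevant text. When the prompt contains examples or instructions that the model learns from without any weight updates, this is known as in-context learning.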

Question 15

Q

What is the difference between Zero-Shot and Few-Shot Prompting?

Answer

A

Zero-shot prompting includes no examples, while few-shot prompting includes labeled examples to improve model performance.
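The contrast can be sketched by building both kinds of prompt as plain strings. The "Text:/Label:" layout, the sentiment task, and the build_prompt helper are assumptions made for this example, not a required format.

```python
def build_prompt(instruction, query, examples=()):
    """Build a prompt: instruction, optional labeled examples, then the query."""
    parts = [instruction]
    for text, label in examples:
        parts.append(f"Text: {text}\nLabel: {label}")
    # The final label is left blank for the model to complete.
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)


instruction = "Classify the sentiment as positive or negative."
query = "The plot was dull."

# Zero-shot: the instruction and the query only.
zero_shot = build_prompt(instruction, query)

# Few-shot: the same request preceded by labeled examples (made up here).
few_shot = build_prompt(
    instruction,
    query,
    examples=[("I loved it.", "positive"), ("Terrible acting.", "negative")],
)
print(few_shot)
```

Either string would then be sent to the model as its input; only the few-shot version shows the model what a completed example looks like.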

Question 16

Q

What is Instruction Tuning?

Answer

A

A finetuning phase where the model is trained on tasks framed as instructions, enhancing its ability to handle new tasks.

Question 17

Q

What are Intrinsic Evaluation Metrics?

Answer

A

Metrics that measure model quality directly, focusing on fluency, coherence, and grammaticality of generated text.

Question 18

Q

What is BLEU?

Answer

A

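A reference-based metric that scores generated text by its n-gram overlap with one or more reference texts, combined with a penalty for overly short output; used especially in machine translation.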
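The core of BLEU can be sketched with a simplified, single-reference version: clipped n-gram precision for n = 1 and 2, a geometric mean, and a brevity penalty. This is an illustrative simplification (no smoothing, one reference, sentence level), and the function names are assumptions of the sketch rather than any library's API.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each n-gram counts at most as often as in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def simple_bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_avg)


reference = "the cat sat on the mat".split()
repeated = "the the the the".split()
unigram_precision = modified_precision(repeated, reference, 1)
print(unigram_precision, simple_bleu(reference, reference))
```

Clipping is what stops a candidate from scoring well by repeating a common word: "the" appears only twice in the reference, so only two of the four repeats are credited.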

Question 19

Q

What does BERTScore evaluate in generated text?

Answer

A

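It evaluates semantic similarity between generated and reference text by comparing contextual BERT embeddings of their tokens, so it can credit valid paraphrases while capturing distant dependencies and word order.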

Question 20

Q

When is MAUVE used in evaluation?

Answer

A

For open-ended tasks, MAUVE measures how similar the distributions of human-written and model-generated texts are.

Question 21

Q

What is the purpose of Human Evaluation in NLG?

Answer

A

To assess generated text based on fluency, relevance, coherence, diversity, and usefulness in task-specific contexts.

(21 cards)