CS4051 Natural Language Processing Flashcards
What does it mean for an LLM to exhibit priming effects?
After seeing a sentence of a particular structure, the LLM is less surprised by an upcoming sentence of a similar structure
Define the following terms: Morphology, Pragmatics, Semantics.
Morphology is the study of the internal structure of words, i.e. how words are built up from smaller meaningful units such as stems and affixes.
Pragmatics is the study of meaning in context.
Semantics is the study of literal meaning, reference, and truth conditions.
What does Zipf’s Law state and what does it imply in model development?
Zipf’s law states that the frequency of a word within a corpus is inversely proportional to its frequency rank, implying that it is important for a model to be trained on a representative corpus, so that rarer words within the test set do not surprise the model.
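A minimal sketch of checking Zipf’s law empirically (corpus.txt is a hypothetical local file standing in for any large tokenised corpus):

```python
from collections import Counter

# Hypothetical corpus file; any large, representative tokenised corpus works.
tokens = open("corpus.txt").read().lower().split()
counts = Counter(tokens)

# Under Zipf's law, frequency is roughly proportional to 1/rank,
# so freq * rank should stay in the same order of magnitude.
for rank, (word, freq) in enumerate(counts.most_common(20), start=1):
    print(f"{rank:>3}  {word:<15} freq={freq:<8} freq*rank={freq * rank}")
```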
Define the following terms: Lexicon, Homonym, Word senses.
A lexicon is a collection of lexical entries (words).
Homonyms are words which have the same spelling/pronunciation but different meanings, for example “saw” (the cutting tool) and “saw” (the past tense of see).
Word senses are all the different meanings of a word, for example the senses of “saw” would include the meanings mentioned above.
What are stopwords (give an example) and how can they be identified in a corpus?
Stopwords are commonly occurring words which are usually irrelevant to most NLP tasks, for example “of”, “the”, “a”. They can usually be identified as the terms with the highest frequency in a corpus, or by consulting a pre-computed list of stopwords.
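A short sketch of both identification strategies (corpus.txt is a hypothetical corpus file; the NLTK list is optional):

```python
from collections import Counter

tokens = open("corpus.txt").read().lower().split()  # hypothetical corpus file

# Strategy 1: the highest-frequency terms are usually function words ("the", "of", "a").
stopword_candidates = {w for w, _ in Counter(tokens).most_common(30)}

# Strategy 2: a pre-computed list, e.g. NLTK's (requires nltk.download("stopwords")).
# from nltk.corpus import stopwords
# stopword_candidates = set(stopwords.words("english"))

content_tokens = [w for w in tokens if w not in stopword_candidates]
```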
Identify and explain the text pre-processing technique concerned with breaking down text into larger chunks of words?
Sentence splitting consists of breaking the corpus down into sentences. A good heuristic used is to look at punctuation symbols. Abbreviation dictionaries can also be used to determine whether a punctuation symbol marks the end of a sentence.
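A sketch of the punctuation heuristic combined with a small abbreviation dictionary (the abbreviation set here is illustrative, not exhaustive):

```python
text = "Dr. Smith arrived at 5 p.m. and left early. The meeting was over."

# Split after ., ! or ? unless the token is a known abbreviation.
abbreviations = {"dr.", "p.m.", "a.m.", "e.g.", "i.e."}
sentences, current = [], []
for token in text.split():
    current.append(token)
    if token[-1] in ".!?" and token.lower() not in abbreviations:
        sentences.append(" ".join(current))
        current = []
if current:
    sentences.append(" ".join(current))

print(sentences)
# ['Dr. Smith arrived at 5 p.m. and left early.', 'The meeting was over.']
```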
Identify and explain the text pre-processing technique concerned with breaking down text into words? Also define the vocabulary size.
Tokenisation is the task of converting a sentence/corpus into a sequence of tokens/features. Common methods include using spaces as token boundaries and handling edge cases using heuristics - for example “Mary’s” could be split into “Mary” and “s” or regarded as a single token.
The vocabulary size is the number of distinct token types in the corpus. It can vary based on how such edge cases are handled, as shown in the given example.
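A sketch of the two tokenisation choices and their effect on vocabulary size (the regexes are illustrative heuristics, not a standard tokeniser):

```python
import re

sentence = "Mary's sister didn't visit New York in 2023."

# Choice 1: keep clitics attached ("Mary's" stays one token), split off punctuation.
tokens_attached = re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

# Choice 2: split clitics off ("Mary's" becomes "Mary" and "'s").
tokens_split = re.findall(r"'\w+|\w+|[^\w\s]", sentence)

# Vocabulary size = number of distinct token types; it changes with the choice made.
print(len(set(tokens_attached)), len(set(tokens_split)))  # 9 vs 11
```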
Identify and explain the text pre-processing technique concerned with identifying common names, places, etc.?
Named Entity Recognition is concerned with identifying tokens which represent countries, people, titles, etc. Each named entity should be treated as a single token, regardless of the number of words it is made up of. For example “New Mexico” should not become “New” and “Mexico”.
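A minimal NER sketch, assuming spaCy and its small English model are installed (pip install spacy, then python -m spacy download en_core_web_sm):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Alice moved from New Mexico to the United Kingdom in March.")

# Multi-word names come back as single entity spans, e.g. "New Mexico".
for ent in doc.ents:
    print(ent.text, ent.label_)
```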
What is a common set-up for model development? Please explain the advantages and disadvantages of this setup in regard to model performance. Hint: three datasets are usually involved.
A common setup is the training, dev-test, and test dataset split. The training set is used for model training/fitting, the dev-test set is used to compute the prediction error during model selection, and the test set is used to compute the generalization error before model deployment.
An advantage of using the additional dev-test set is that the test set is kept hidden from the model for as long as possible, meaning that there is a lower risk of over-fitting.
A requirement of this approach is that the dev-test set must be disjoint from the test set to avoid over-fitting, but still be representative of it to avoid bias.
What is a useful technique for model evaluation when a good training, dev-test, test split is not achievable? Hint: give the answer in terms of “k-fold”.
Cross-validation allows a model to be evaluated when there is not enough quality data for a three-part split. The general approach is to conduct a k-fold cross-validation on the training dataset, where the training set is split into k equal chunks: k-1 chunks are used for model training and the remaining chunk for calculating the prediction error. The process is repeated k times and the k prediction errors are averaged together.
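A minimal sketch of k-fold cross-validation; train_and_score is a hypothetical callback that fits a model on the training chunks and returns its prediction error on the held-out chunk:

```python
import random

def k_fold_chunks(n_items, k=5, seed=0):
    """Shuffle item indices and split them into k roughly equal, disjoint chunks."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(items, labels, train_and_score, k=5):
    """Train on k-1 chunks, score on the held-out chunk, average the k scores."""
    folds = k_fold_chunks(len(items), k)
    scores = []
    for i, held_out in enumerate(folds):
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        scores.append(train_and_score(
            [items[j] for j in train_idx], [labels[j] for j in train_idx],
            [items[j] for j in held_out], [labels[j] for j in held_out],
        ))
    return sum(scores) / k
```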
What are 4 common intrinsic evaluation metrics when evaluating a machine learning model? Provide formulae for all of them and explain them.
Accuracy measures the percentage of correctly classified inputs in the test set: Accuracy = (correctly classified instances) / (total instances). The “baseline” accuracy is the number of occurrences of the majority class in the test set divided by the size of the test set. For unbalanced datasets, a model accuracy above the baseline suggests that the model is likely not just guessing the most probable class each time but is actually using the features for the classification task.
Precision is the ratio of items predicted to belong to a certain class which actually belong to that class. It is calculated by P = TP / (TP + FP).
Recall is the ratio of items of a class in the gold standard which the model correctly identifies. It is calculated by R = TP / (TP + FN).
F1 combines precision and recall into a single metric (their harmonic mean). It is computed as F1 = 2PR / (P + R).
How would you interpret a Confusion Matrix to extract the accuracy of the model, and precision and recall for a particular class?
Assuming rows are indexed by the gold labels and columns by the predicted labels:
The accuracy is given by sum(numbers along the top-left to bottom-right diagonal) / (total number of instances).
The precision for a class is the number appearing on the diagonal / the sum of the column indexed by that class.
The recall for a class is the number appearing on the diagonal / the sum of the row indexed by that class.
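A small sketch that extracts accuracy, per-class precision/recall, and F1 from a confusion matrix using the convention above (rows = gold labels, columns = predictions); the matrix values are made up:

```python
import numpy as np

cm = np.array([[50,  5,  0],    # rows: gold labels
               [10, 30,  5],    # columns: predicted labels
               [ 0,  5, 45]])

accuracy  = np.trace(cm) / cm.sum()
precision = np.diag(cm) / cm.sum(axis=0)   # per class: diagonal / column sum
recall    = np.diag(cm) / cm.sum(axis=1)   # per class: diagonal / row sum
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)
```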
Decision Trees are a popular machine learning classifier. Explain how they work, along with their advantages and disadvantages.
Decision trees are flowcharts composed of decision nodes, which check feature values, and leaf nodes, which assign labels to instances. The tree is built by repeatedly choosing the feature which best splits the data.
Advantages include high interpretability and suitability for classifying hierarchical data.
Disadvantages are that the amount of training data decreases at lower levels, making the model prone to over-fitting, and that it forces features to be checked in a particular order even if these are independent of each other.
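A toy sketch assuming scikit-learn is installed; the binary “features” (e.g. whether an email contains certain words) and labels are made up for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row: [contains "free", contains "offer", sender is known].
X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
y = ["spam", "spam", "ham", "ham", "spam", "ham"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(tree.predict([[1, 0, 1]]))

# High interpretability: the learned flowchart can be printed directly.
print(export_text(tree, feature_names=["free", "offer", "known_sender"]))
```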
Naive Bayes is a popular machine learning classifier. Explain how it works, including the underlying Naive Bayes assumption, how smoothing works, and what issue it solves.
The prior probability for each label is estimated from the label frequency, P(label). The contribution of each feature is then combined with the prior to obtain a likelihood estimate for each label: P(label) × P(feature1 | label) × … × P(featureN | label). Finally, the label with the highest resulting likelihood is chosen as the classification result.
The Naive Bayes assumption states that given a label, all features are statistically independent of each other, meaning that the classifier treats the corpus as a “bag of words”.
Smoothing solves the issue of zero counts, which occur when a feature never appears with a given label in the training data. By adding a small constant to every count (e.g. add-one/Laplace smoothing), no estimated probability is ever exactly 0.
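A compact sketch of a Naive Bayes classifier with add-one smoothing over bag-of-words features (the training data format is assumed: a list of token lists plus a parallel list of labels):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    label_counts = Counter(labels)                 # for the priors P(label)
    word_counts = defaultdict(Counter)             # for P(feature | label)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict_nb(tokens, label_counts, word_counts, vocab):
    total_docs = sum(label_counts.values())
    best_label, best_logp = None, float("-inf")
    for label, count in label_counts.items():
        logp = math.log(count / total_docs)                    # log P(label)
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for tok in tokens:
            logp += math.log((word_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label
```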
Explain what causal language modelling is.
Causal language modelling predicts the next token in a sequence of tokens incrementally, usually basing the prediction off of a limited history of tokens
Explain how n-gram language models work? Also mention the underlying assumption used to limit the required generated token history.
N-gram models select the most likely token based on the previous n-1 tokens in the sequence, using the Markov assumption, which implies that only a limited history of tokens is relevant for selecting the next one.
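A bigram (n=2) sketch: under the Markov assumption, only the single previous token is used to predict the next one. The tiny corpus is made up:

```python
from collections import Counter, defaultdict

corpus = ["<s> the cat sat </s>", "<s> the cat slept </s>", "<s> the dog sat </s>"]

bigram_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        bigram_counts[prev][nxt] += 1

def next_token_probs(prev):
    """MLE estimate of P(next | prev) = count(prev, next) / count(prev)."""
    total = sum(bigram_counts[prev].values())
    return {w: c / total for w, c in bigram_counts[prev].items()}

print(next_token_probs("cat"))  # {'sat': 0.5, 'slept': 0.5}
```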
Explain the problem of Sparsity in Language, relating your answer to n-gram models?
The capability of the model to generalize depends on how representative the training data is. According to Zipf’s law, many n-grams can appear in the test set but not in the training set. A possible solution for this issue could be to use smoothing.
Explain what the Maximum Likelihood Estimation (MLE) is, relating it back to the Markov assumption in language models?
MLE refers to the process by which a model estimates the probability of a sequence of tokens from relative counts in the training corpus, e.g. P(wn | wn-1) = count(wn-1 wn) / count(wn-1) for a bigram model. The Markov assumption reduces the number of parameters required in this computation.
Perplexity is a common evaluation metric for Language Models. Please explain what it is and how it can be interpreted?
Perplexity is the inverse probability of the test corpus according to the model, normalized by the corpus size: PP(W) = P(w1 … wN)^(-1/N). Lower perplexity means that the model can more easily predict the next word in the test corpus.
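A short sketch computing perplexity from per-token probabilities; the probabilities here are hypothetical outputs of some language model on a five-token test corpus:

```python
import math

token_probs = [0.2, 0.1, 0.25, 0.05, 0.3]   # P(token_i | history), per test token

n = len(token_probs)
cross_entropy = -sum(math.log2(p) for p in token_probs) / n   # bits per token
perplexity = 2 ** cross_entropy                               # = P(corpus) ** (-1/n)
print(cross_entropy, perplexity)
```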
Word embedding is a common prerequisite for auto-completion tasks. Name and explain two auto-completion tasks. Do these preserve word-order information? Finally, name a popular pre-trained word embedding model.
The two tasks are skip-gram (given a word, predict its surrounding context) and CBOW (given a context, predict the missing word). Neither of these preserves word-order information, which is a limitation. Popular pre-trained embeddings include GloVe and Word2Vec.
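A minimal sketch assuming gensim (4.x) is installed; the toy sentences are far too small for useful embeddings, but they show the skip-gram/CBOW switch:

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"],
             ["the", "cat", "chased", "the", "dog"]]

# sg=1 trains skip-gram; sg=0 (the default) trains CBOW.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["cat"][:5])           # first few dimensions of the embedding
print(model.wv.most_similar("cat"))  # nearest neighbours in the vector space
```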
Sentence embedding is used for clustering and retrieval tasks. Explain the two naive approaches used, as well as the 3 families of neural language models used for sentence embeddings.
The first naive approach is to treat each sentence in the corpus as a word and run a CBOW-style model on it to encode its context. This is challenging due to extreme data sparsity.
The second naive approach is to embed each sentence as the mean of the embeddings of all words within that sentence (sketched below). Limitations include the need to weight down stopwords, which otherwise dilute the average, and the fact that word ordering is ignored by the embedding.
The 3 families of models are Long Short-Term Memory networks (LSTMs), Transformers, and LLMs.
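A sketch of the second naive approach (mean of word embeddings), using made-up 2-dimensional vectors in place of real pre-trained embeddings; note how word order is lost:

```python
import numpy as np

# Hypothetical word -> vector lookup (in practice, e.g. loaded GloVe vectors).
embeddings = {"cats": np.array([0.2, 0.8]),
              "chase": np.array([0.5, 0.1]),
              "mice": np.array([0.3, 0.7])}

def sentence_embedding(tokens):
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vectors, axis=0)

# Both sentences receive exactly the same embedding: word order is ignored.
print(sentence_embedding(["cats", "chase", "mice"]))
print(sentence_embedding(["mice", "chase", "cats"]))
```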
Name and explain the main usage of two popular LLMs used for sentence embedding?
GPT is an autoregressive model used for text generation, while BERT is used for masked word and next-sentence prediction.
Define morphemes and lexemes?
A morpheme is the smallest meaningful unit of language, even if it cannot stand on its own. For example “un” in “unusual” is a morpheme.
Lexemes are units of lexical meaning underlying a set of related word forms. For example the lexeme of “running”, “ran”, and “runner” is “run”.
Explain what a Stemmer and Lemmatiser do?
A stemmer removes affixes from words, leaving the stem behind. For example stemming “unpleasantly” yields “pleasant”.
A lemmatiser maps a word to its lemma, i.e. its dictionary entry. For example lemmatising “leaving” yields “leave”.
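A quick sketch assuming NLTK is installed and its WordNet data has been downloaded (nltk.download('wordnet')):

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatiser = WordNetLemmatizer()

print(stemmer.stem("running"))               # 'run'   (suffix stripped, no dictionary used)
print(lemmatiser.lemmatize("leaving", "v"))  # 'leave' (mapped to its dictionary entry)
```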
Explain what POS tagging consists of? Explain the main challenge of POS tagging?
Part-of-speech tagging involves assigning a lexical category to each word in a corpus. POS tags can then be used to define rules about the syntax of a sentence and to generate other sentences of the same structure. The main challenge is ambiguity: many words can belong to several categories (e.g. “flies” as a noun or a verb), so the correct tag depends on context.
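A sketch assuming NLTK plus its tokeniser and tagger data (nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')):

```python
import nltk

tokens = nltk.word_tokenize("Time flies like an arrow")
print(nltk.pos_tag(tokens))
# "flies" could be a noun or a verb here; the tagger has to use context to decide,
# which is exactly the ambiguity that makes POS tagging hard.
```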
Explain what syntax trees are?
Syntax trees represent how grammar rules combine to form sentences
Explain what dependency parsing consists of?
Dependency parsing is the task of obtaining the grammatical structure of a sentence by analyzing the relationships (dependencies) between words. Verbs are often the root of the tree, since they are central to their clause (sentence).
What does it mean for a grammar to be context free?
In a context-free grammar, a production/rule can be applied to expand a non-terminal regardless of the surrounding context in which it appears.
Explain the most basic version of a language model? Include “cross-entropy” in your answer.
The idea behind language models is to assign probabilities to the symbols of a language: the probability of a sequence is the product of the probabilities of each token given the preceding tokens. Cross-entropy measures how close the distribution of tokens learned by the model is to the true token distribution of the language.
What is one major issue with word embedding models such as Word2Vec?
Word2Vec embeds all senses of a word to the same vector, so the past tense of “see” and the cutting tool “saw” are both embedded the same.
What are two desirable properties of word embeddings?
Words with similar meanings should be close in the vector space (for all languages), and syntactic/morphological relationships should ideally be preserved as well.
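A tiny sketch of the first property using cosine similarity; the 3-dimensional vectors are made-up stand-ins for real embeddings of “king”, “queen”, and “banana”:

```python
import numpy as np

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

king   = np.array([0.9, 0.8, 0.1])
queen  = np.array([0.85, 0.9, 0.05])
banana = np.array([0.1, 0.0, 0.95])

print(cosine_similarity(king, queen))   # close to 1: similar meanings, nearby vectors
print(cosine_similarity(king, banana))  # much lower: unrelated meanings
```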