P3 - Linguistic Nuances & Natural Language Processing Flashcards
What is Natural Language Processing (NLP)?
Definition: The ability of computers to understand and process human language.
Example: Chatbots, voice assistants, and translation tools.
NLP enables chatbots to interpret user input and generate meaningful responses.
Why do Linguistic Nuances Matter?
Challenges: Ambiguity, emotion, slang, and context.
Real-world issues: Misunderstanding user intent or tone (e.g., sarcasm).
The Five Stages of NLP
- Lexical Analysis
- Syntactic Analysis (Parsing)
- Semantic Analysis
- Discourse Integration
- Pragmatic Analysis
What is lexical analysis?
Breaking text into individual words (tokens) and identifying their roles.
Ex: Sentence: “I’ve just been in a car accident.”
- Break the sentence down into individual words (tokens).
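A minimal sketch of this stage (standard-library Python only; the regex is just one illustrative choice, not a prescribed tokeniser): split the example sentence into word tokens while keeping the contraction together and separating the full stop.

```python
import re

sentence = "I've just been in a car accident."

# Keep contractions such as "I've" together, but treat punctuation as its own token.
tokens = re.findall(r"[\w']+|[.,!?]", sentence)
print(tokens)
# ["I've", 'just', 'been', 'in', 'a', 'car', 'accident', '.']
```

Real systems use far more robust tokenisers; this only illustrates the idea of breaking the input into lexical units.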
What is syntactic analysis?
Understanding the structure of sentences (e.g., subject, verb, object).
Ex: Sentence: “I’ve just been in a car accident.”
- Identify grammar and relationships (e.g., subject = “I”).
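A hedged sketch of this stage using the spaCy library (an assumption; the flashcards do not prescribe a tool, and the `en_core_web_sm` model must be installed separately) showing how a dependency parser assigns grammatical roles:

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("I've just been in a car accident.")

for token in doc:
    # dep_ is the grammatical relation, e.g. "nsubj" (subject) for "I";
    # head is the word it attaches to, typically the verb "been".
    print(token.text, token.pos_, token.dep_, token.head.text)
```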
What is semantic analysis?
Interpreting the meaning of words and sentences.
Ex: Sentence: “I’ve just been in a car accident.”
- Determine the meaning of the words and sentence (the user is making a claim that an accident has just occurred).
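A toy sketch of the semantic step (the intent names and keyword lists are hypothetical, not from the flashcards): map the tokenised sentence to the most likely meaning for an insurance-style chatbot.

```python
# Hypothetical intents and cue words, purely for illustration.
INTENT_KEYWORDS = {
    "report_accident": {"accident", "crash", "collision", "damage"},
    "request_quote": {"quote", "price", "premium"},
}

def infer_meaning(tokens):
    words = {t.lower() for t in tokens}
    best = max(INTENT_KEYWORDS, key=lambda intent: len(INTENT_KEYWORDS[intent] & words))
    return best if INTENT_KEYWORDS[best] & words else "unknown"

print(infer_meaning(["I've", "just", "been", "in", "a", "car", "accident"]))
# report_accident
```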
What is discourse integration?
Understanding context within a conversation.
Ex: Sentence: “I’ve just been in a car accident.”
- Relate the statement to previous chatbot interactions (e.g., is the user an existing customer?)
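A sketch of discourse integration (the dialogue state and replies are hypothetical): the chatbot keeps track of what earlier turns established, so the accident statement is interpreted differently once the user is known to be a customer.

```python
# Minimal dialogue state carried between turns (illustrative only).
state = {"is_customer": False}

def handle_turn(utterance, state):
    text = utterance.lower()
    if "policy" in text:
        state["is_customer"] = True
    if "car accident" in text:
        if state["is_customer"]:
            return "Sorry to hear that. Shall I start a claim on your policy?"
        return "Sorry to hear that. Are you an existing customer?"
    return "How can I help?"

print(handle_turn("My policy number is 12345.", state))        # How can I help?
print(handle_turn("I've just been in a car accident.", state)) # claim offer, thanks to the earlier turn
```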
What is pragmatic analysis?
Considering cultural, social, and legal contexts.
Ex: Sentence: “I’ve just been in a car accident.”
- Consider the wider context (is the user the driver? are they upset? is there a legal situation?)
What is a core challenge in lexical analysis regarding punctuation and whitespace?
Punctuation and whitespace may or may not be treated as separate tokens. This decision affects how the next stage - syntactic analysis - interprets sentence structure.
How can hyphenated words, contractions, emoticons, and URLs complicate tokenisation?
Different tokenisation strategies might split these constructs in various ways. For instance, hyphenated words or contractions may be split into multiple tokens or kept intact, leading to potential inconsistencies in how the input is later processed.
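A small sketch (standard library only; the example sentence is made up) showing how two reasonable strategies tokenise the same input very differently:

```python
import re

text = "Don't worry, the state-of-the-art model is at https://example.com :-)"

# Strategy A: whitespace only -- contraction, hyphenated word, URL and emoticon
# each survive as a single token.
print(text.split())

# Strategy B: split on every non-alphanumeric character -- the same constructs
# are shattered into fragments ('Don', 't', 'state', 'of', 'the', 'art', ...),
# and the emoticon disappears entirely.
print([t for t in re.split(r"[^A-Za-z0-9]+", text) if t])
```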
Why is tokenisation especially challenging in languages like Chinese or agglutinative languages?
Languages written in a continuous script (like Chinese) lack clear word boundaries, making it difficult to define a “word.” Similarly, agglutinative or fusional languages (such as Korean or Spanish) have complex word formations—like varied verb conjugations or suffixes—that challenge standard tokenisation rules.
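A brief illustration of the boundary problem (the Chinese sentence, chosen for this sketch, roughly means “I had a car accident”): whitespace splitting has nothing to split on, and a character-level fallback breaks up multi-character words such as 车祸 (“car accident”).

```python
chinese = "我出了车祸"  # "I had a car accident"

print(chinese.split())  # ['我出了车祸'] -- no spaces, so no word boundaries
print(list(chinese))    # ['我', '出', '了', '车', '祸'] -- character-level fallback
```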
What problems do user typos and proper nouns introduce in lexical analysis?
Typos can distort the intended meaning of words, while proper nouns that include spaces, apostrophes, or hyphens may be incorrectly segmented, resulting in tokenisation errors that affect subsequent analysis stages.
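A short sketch (made-up sentence, simple illustrative regex) of how a naive tokeniser mishandles a typo and multi-word or apostrophised proper nouns:

```python
import re

text = "I had an accidnet near O'Connell Street in New York"

# "accidnet" fails any dictionary lookup, "O'Connell" is split at the apostrophe,
# and nothing marks "New York" as a single name.
print(re.findall(r"\w+|[^\w\s]", text))
# ['I', 'had', 'an', 'accidnet', 'near', 'O', "'", 'Connell', 'Street', 'in', 'New', 'York']
```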
What are some strategies to address tokenisation challenges in lexical analysis?
- Context-Aware Tokenisation
- Complex Heuristics
- Single-Character Tokenisation
How does Context-Aware Tokenisation address tokenisation challenges in lexical analysis?
Using language models (e.g., the Charformer model by Tay et al.) that learn from context to split tokens effectively.
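Charformer itself learns its tokenisation end to end; as a rough stand-in (an assumption, not something the flashcards prescribe), a pretrained subword tokeniser from the Hugging Face transformers library shows the general idea of data-driven token splitting:

```python
from transformers import AutoTokenizer

# Requires: pip install transformers (model files are downloaded on first use).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# The WordPiece vocabulary was learned from a large corpus, so unfamiliar words
# are split into subword units the model has seen before rather than by fixed rules.
print(tokenizer.tokenize("I've just been in a car accident."))
# e.g. ['i', "'", 've', 'just', 'been', 'in', 'a', 'car', 'accident', '.']
```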
How does Complex Heuristics address tokenisation challenges in lexical analysis?
Employing techniques like finite state machines (e.g., using capital letters as cues for proper nouns) despite their limitations.
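A simple heuristic in this spirit (illustrative only, not a full finite state machine): treat runs of capitalised words as a single proper-noun token, accepting that sentence-initial words will sometimes be caught by mistake.

```python
def merge_proper_nouns(tokens):
    merged, run = [], []
    for tok in tokens:
        if tok[:1].isupper():
            run.append(tok)                   # stay in the "proper noun" state
        else:
            if run:
                merged.append(" ".join(run))  # leave the state, emit the name
                run = []
            merged.append(tok)
    if run:
        merged.append(" ".join(run))
    return merged

print(merge_proper_nouns(["I", "crashed", "near", "New", "York", "yesterday"]))
# ['I', 'crashed', 'near', 'New York', 'yesterday']
```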
How does Single-Character Tokenisation address tokenisation challenges in lexical analysis?
For numbers, tokenising each character separately may improve mathematical processing, though it demands more processing power.
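A quick illustration of the trade-off (standard library only): digit-level tokens make every number representable, at the cost of more tokens to process.

```python
price = "1499.99"

word_level = [price]       # ['1499.99'] -- one opaque token
char_level = list(price)   # ['1', '4', '9', '9', '.', '9', '9'] -- seven explicit tokens

print(word_level, char_level)
```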
What are common syntactic analysis failures as seen in real-world chatbot applications?
Failures often arise from misinterpreting sentence structure. For example, small punctuation errors or ambiguous phrasing can cause chatbots to jumble words or misassign meaning—resulting in orders being misinterpreted, as seen in documented drive-thru chatbot mishaps.
How can syntactic analysis issues be mitigated?
Improvements can be achieved by:
- Training on diverse voice data and incorporating a wider range of tonal variations.
- Using autocorrection and grammar correction algorithms to pre-process text.
- Designing systems with restricted inputs (e.g., a fixed menu) where syntactic variability is reduced (see the sketch below).
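A sketch of the restricted-input idea (the menu, function name, and matching threshold are all hypothetical): with a fixed menu, noisy or misspelled input only has to be matched against a handful of known items.

```python
import difflib

MENU = ["cheeseburger", "chicken nuggets", "french fries", "vanilla milkshake"]

def match_order(utterance):
    words = utterance.lower().replace(",", " ").split()
    order = []
    for item in MENU:
        # Accept the item if any of its words roughly matches a word the user said.
        if any(difflib.get_close_matches(w, words, n=1, cutoff=0.6) for w in item.split()):
            order.append(item)
    return order

print(match_order("Can I have a cheesburger and some frys please"))
# ['cheeseburger', 'french fries']
```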
What semantic analysis challenges impact natural language understanding?
Idioms and Non-Literal Language: Phrases like “break a leg” that cannot be understood literally.
Homonyms: Words like “bass” (referring to a fish or a musical tone) require context to disambiguate (see the sketch after this list).
Ambiguous Company Names: Terms like “Apple” or “Target” that have common as well as corporate meanings.
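A toy disambiguation sketch (the sense labels and cue-word lists are hypothetical): choose the sense of “bass” whose cue words overlap most with the surrounding sentence.

```python
# Hypothetical senses and cue words, for illustration only.
SENSES = {
    "bass (fish)": {"fish", "river", "lake", "caught", "fishing"},
    "bass (music)": {"guitar", "band", "amp", "music", "drummer"},
}

def disambiguate(sentence):
    words = set(sentence.lower().split())
    return max(SENSES, key=lambda sense: len(SENSES[sense] & words))

print(disambiguate("He caught a huge bass in the lake"))  # bass (fish)
print(disambiguate("She plays bass in a rock band"))      # bass (music)
```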
What additional semantic challenges are highlighted in linguistic analysis?
Lexical Disambiguation: Determining the correct meaning of a word with multiple definitions based on sentence context.
Lexical Reversibles: Sentences where object positions may be interchanged yet retain meaning.
Neologisms: The need to interpret newly coined or unfamiliar words.
Idiomatic Usage and Metaphors: Understanding non-literal language relies on common cultural knowledge rather than direct interpretation.
What constitutes pragmatic analysis failures in language processing?
Pragmatic failures occur when systems misunderstand context, tone, or cultural nuances. This can lead to misinterpretations in customer support or other real-world scenarios, where the intended meaning (or sentiment) of a message is lost or misconstrued.