Natural Language Processing Flashcards

1
Q

What is Natural Language Processing (NLP)?

A

A subfield of linguistics, computer science, information engineering, and AI concerned with the interactions between computers and human languages.

2
Q

What are some goals of Natural Language Processing?

A
  • Improve human-computer communication
  • Allow people to program computers in natural language
  • Distill knowledge from texts
3
Q

Who developed the ELIZA program?

A

Joseph Weizenbaum in 1966

4
Q

What was ELIZA originally meant to be a parody of?

A

A Rogerian psychotherapist

5
Q

How does ELIZA generate responses?

A
  • Scans input sentences for keywords
  • Analyzes input according to transformation rules
  • Generates responses based on reassembly rules
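
The three steps above can be sketched as follows. This is a minimal illustration, not Weizenbaum's original script: the rules, the pronoun table, and the fallback reply are all hypothetical.

```python
import re

# Hypothetical rules: (keyword pattern with one capture group, reassembly template).
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]

# Pronoun swaps applied to the captured fragment before it is echoed back.
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are"}

FALLBACK = "Please go on."  # stock answer when no keyword matches


def reflect(fragment: str) -> str:
    """Swap first-person words for second-person ones."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(sentence: str) -> str:
    """Scan for a keyword, decompose the input, and reassemble a reply."""
    cleaned = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    return FALLBACK


print(respond("I feel sad about my job."))
```

The program only matches surface patterns and echoes fragments back; nothing in it models what the conversation is about.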
6
Q

True or False: ELIZA has a deep understanding of the conversation.

A

False

7
Q

What kind of responses does ELIZA use?

A

Stock answers based on previously mentioned keywords

8
Q

What is an example of a transformation rule in ELIZA?

A

Predefined responses are triggered by a sentence containing keywords (e.g., ‘I feel’)

9
Q

What was the impact of ELIZA?

A
  • One of the best-known AI chatbots
  • Claimed by some to have passed the Turing Test
10
Q

What is PARRY?

A

A program developed in 1972 to imitate a paranoid schizophrenic

11
Q

How did PARRY improve upon ELIZA?

A

It addressed the lack of internal world tracking in ELIZA

12
Q

What is the main function of SHRDLU?

A

A natural language interface to a blocks world, letting users instruct the program to perform tasks and answer questions

13
Q

What program was first able to solve simple high school math problems posed in natural language?

A

STUDENT. It assumed that every sentence is an equation and used trigger words to identify the parts of the equation:
* “is” → equates two entities
* “per” → divides two entities

(Bobrow 1964)
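
The trigger-word idea can be illustrated with a toy rewrite function. This is only a sketch under strong assumptions and far simpler than Bobrow's actual program; the function name `to_equation` and the example sentence are hypothetical.

```python
def to_equation(sentence: str) -> str:
    """Rewrite a sentence as an equation string using two trigger words."""
    text = sentence.strip().rstrip(".").lower()
    # "is" equates two entities: split once into left- and right-hand side.
    left, separator, right = text.partition(" is ")
    if not separator:
        raise ValueError("no 'is' trigger word found")
    # "per" divides two entities.
    left = left.replace(" per ", " / ")
    right = right.replace(" per ", " / ")
    return f"{left} = {right}"


print(to_equation("The speed is 120 miles per 2 hours."))
```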

14
Q

Why is Natural Language Processing considered hard?

A
  • Ambiguity at lexical, syntactic and semantic levels (e.g., hidden meanings, jokes, puns)
15
Q

What are traditional NLP tasks?

A
  1. Word Segmentation
  2. Part-of-Speech tagging
  3. Syntactic Analysis
  4. Semantic Analysis
16
Q

What does lexical ambiguity refer to?

A

The same word may have different meanings

17
Q

What is syntactic ambiguity?

A

The same sentence may have different grammatical structures, and therefore different interpretations

18
Q

What is semantic ambiguity?

A

The interpretation of a sentence may depend on its context

19
Q

What is the purpose of Word Segmentation in NLP?

A

It divides the input text into small semantic entities (tokenisation)

20
Q

What does Part-of-Speech (POS) tagging involve?

A

Assigning each word its grammatical role (part of speech) in a sentence.

This is a classification task, typically solved with probabilistic models.

21
Q

What is Syntactic Analysis in NLP?

A

Finding the most probable grammatical interpretation of a sentence

22
Q

Define Semantic Analysis.

A

Finding the most probable meaning of a sentence (and its words)

23
Q

True or False: Traditional NLP tasks are treated independently.

A

False.

They depend on each other.

24
Q

What is the main issue with tokenization?

A

Determining how to segment words and phrases correctly

Examples: possessives, hyphenated words
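
A small sketch of one possible segmentation policy, assuming we want to keep hyphenated words whole and split the possessive 's into its own token. The policy and the regular expression are illustrative choices, not a standard; other tokenizers decide differently.

```python
import re

# Alternatives are tried left to right at each position:
#   1. a possessive marker 's
#   2. a word, optionally with internal hyphens (state-of-the-art)
#   3. any single punctuation character
TOKEN_PATTERN = re.compile(r"'s\b|\w+(?:-\w+)*|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into tokens according to TOKEN_PATTERN."""
    return TOKEN_PATTERN.findall(text)


sentence = "Anna's state-of-the-art parser works."
print(sentence.split())    # whitespace only: "Anna's" and "works." stay glued
print(tokenize(sentence))  # possessive and final period become separate tokens
```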

25
Q

What are typical formats for numbers in NLP?

A

Different formats:
* 3/12/91
* 55 B.C.
* 100.2.86.144

These require special recognisers for dates, IP addresses, etc.
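
Such recognisers are often written as regular expressions. The sketch below covers only the three formats listed above; the labels and patterns are assumptions made for illustration, and the IP pattern does not check that each part is at most 255.

```python
import re

# Hypothetical recognisers, one (label, pattern) pair per number format.
RECOGNISERS = [
    ("IP_ADDRESS", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("YEAR_BC", re.compile(r"\b\d+ B\.C\.")),
]


def recognise(text: str) -> list[tuple[str, str]]:
    """Return (label, matched text) pairs, grouped by recogniser order."""
    found = []
    for label, pattern in RECOGNISERS:
        for match in pattern.finditer(text):
            found.append((label, match.group(0)))
    return found


print(recognise("Logged 3/12/91 from 100.2.86.144, dated 55 B.C."))
```

Recognising the format does not settle the meaning: 3/12/91 can be read as March 12 or as 3 December, depending on the locale.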

26
Q

What classification is performed in POS tagging?

A

Sorting of words in a sentence into grammatical categories: noun, verb, adjective, etc.

27
Q

What models can be used for POS tagging?

A
  • Hidden Markov Models (HMMs)
  • Conditional Random Fields (CRFs)
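
As an illustration of the HMM approach, here is a minimal Viterbi decoder over a toy two-tag model. All probabilities are invented for the example, and a real tagger would estimate them from a tagged corpus.

```python
# Hypothetical toy model with two tags: N (noun) and V (verb).
STATES = ["N", "V"]
START_P = {"N": 0.8, "V": 0.2}
TRANS_P = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT_P = {"N": {"time": 0.6, "flies": 0.4}, "V": {"time": 0.1, "flies": 0.9}}


def viterbi(words, states, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for a non-empty word list."""
    # best[t][s] = (probability of the best path ending in tag s at step t,
    #               tag chosen at step t - 1 on that path)
    best = [{s: (start_p[s] * emit_p[s].get(words[0], 0.0), None) for s in states}]
    for word in words[1:]:
        column = {}
        for s in states:
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s].get(word, 0.0), prev)
        best.append(column)
    # Backtrack from the most probable final tag.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


print(viterbi(["time", "flies"], STATES, START_P, TRANS_P, EMIT_P))
```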
28
Q

What is the structure of a simple grammar for syntactic analysis?

A

1) S → NP VP
2) NP → Det N | Det N PP
3) VP → V NP | VP PP
4) PP → P NP

Here S is a sentence, NP a noun phrase, VP a verb phrase, PP a prepositional phrase, Det a determiner, N a noun, V a verb, and P a preposition. Rules like these let a parser recover the structure of a sentence, which supports applications such as machine translation, information extraction, and question answering.
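
The four rules can be tried out with a small parse counter. This is a sketch rather than a production parser: the lexicon is hypothetical, and the function only counts how many parse trees the grammar allows for a sentence.

```python
from functools import lru_cache

# The four rules from the card, one tuple per right-hand side.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("Det", "N", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
}

# Hypothetical lexicon mapping each word to a single word class.
LEXICON = {
    "the": "Det", "a": "Det",
    "man": "N", "dog": "N", "telescope": "N",
    "saw": "V",
    "with": "P",
}


def count_parses(sentence: str) -> int:
    """Count the distinct parse trees of `sentence` rooted in S."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        """Number of ways `symbol` derives words[i:j]."""
        if symbol not in GRAMMAR:  # word class such as Det, N, V, P
            return int(j - i == 1 and LEXICON.get(words[i]) == symbol)
        return sum(split(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def split(rhs, i, j):
        """Number of ways the symbol sequence `rhs` derives words[i:j]."""
        if len(rhs) == 1:
            return derive(rhs[0], i, j)
        total = 0
        # Leave at least one word for every remaining symbol.
        for k in range(i + 1, j - len(rhs) + 2):
            left = derive(rhs[0], i, k)
            if left:
                total += left * split(rhs[1:], k, j)
        return total

    return derive("S", 0, len(words))


print(count_parses("the man saw the dog with a telescope"))
```

The example sentence has two parses because the PP can attach to the noun phrase (rule 2) or to the verb phrase (rule 3). Every split point lies strictly inside the span, so the left-recursive rule VP → VP PP always recurses on a shorter span and terminates.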

29
Q

What is an example of ambiguity in parsing?

A

The sentence: ‘Time flies like an arrow, fruit flies like a banana.’

30
Q

What are Semantic Embeddings?

A

Representations of an utterance* in a Euclidean space where distance captures similarity in meaning.

(*) An utterance represents a complete thought or piece of information and may consist of one or multiple words.
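
To make "distance captures similarity" concrete, here is a toy example with hand-made three-dimensional vectors. The numbers are invented for illustration; real embeddings are learned from data and have far more dimensions.

```python
import math

# Hypothetical embeddings, chosen so that "cat" and "dog" lie close together.
EMBEDDINGS = {
    "cat": (0.9, 0.8, 0.1),
    "dog": (0.8, 0.9, 0.2),
    "car": (0.1, 0.2, 0.9),
}


def distance(a: str, b: str) -> float:
    """Euclidean distance between the embeddings of two words."""
    return math.dist(EMBEDDINGS[a], EMBEDDINGS[b])


def nearest(word: str) -> str:
    """Return the other word whose embedding is closest to `word`."""
    others = (w for w in EMBEDDINGS if w != word)
    return min(others, key=lambda w: distance(word, w))


print(nearest("cat"))
```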

31
Q

What is the traditional approach to semantic analysis?

A

Bag-of-Words model

32
Q

What major development occurred in machine translation in the 2010s?

A

End-to-end deep learning for machine translation

33
Q

What is a document within the Bag-of-Words model?

A

A document is regarded as a vector in an n-dimensional space, each word representing one dimension.
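
A minimal sketch of this representation, assuming whitespace tokenization and raw word counts as vector entries. The two example documents are hypothetical.

```python
from collections import Counter


def build_vocabulary(documents: list[str]) -> list[str]:
    """One dimension per distinct word, in alphabetical order."""
    return sorted({word for doc in documents for word in doc.lower().split()})


def to_vector(document: str, vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


docs = ["the cat sat", "the dog sat on the cat"]
vocab = build_vocabulary(docs)
print(vocab)
print([to_vector(doc, vocab) for doc in docs])
```

Because only counts are kept, "the cat sat" and "sat cat the" map to the same vector.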

34
Q

What is the purpose of Text Categorization?

A

Assigning labels to each document based on topics or genres.

35
Q

Fill in the blank: The _______ model assumes that a document has been generated by repeatedly drawing one word out of a bag of words.

A

Bag-of-Words

36
Q

What is the goal of Probabilistic Text Classification within the BOW model?

A

To determine from which bag (class) a given document was generated.

37
Q

What does Bayes’ Theorem help to estimate?

A

The probability of a class given a document.

38
Q

What assumption does the Naïve Bayes Classifier make?

A

It assumes independence among tokens.
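
Under this assumption, Bayes' theorem gives a score proportional to P(class) multiplied by the product of P(token | class) over all tokens. The sketch below implements that with add-one smoothing and log probabilities; the training data, the function names, and the smoothing choice are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """Collect document counts per class and word counts per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in labelled_docs:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log P(class)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            # Add-one smoothing keeps unseen tokens from zeroing the product.
            p = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(p)  # independence: log probabilities add up
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training data.
model = train([
    ("goal match team win", "sports"),
    ("team player score goal", "sports"),
    ("election vote party", "politics"),
    ("party leader vote win", "politics"),
])
print(classify("team goal", model))
```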

39
Q

What is an N-gram model?

A

A language model that uses the N-1 preceding words as context to estimate the probability of the next word.
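
For N = 2 (a bigram model) the context is a single word, and the probability can be estimated by counting. A minimal sketch with a hypothetical three-sentence corpus and no smoothing:

```python
from collections import Counter


def train_bigram(sentences):
    """Count contexts and (context, next word) pairs, with boundary markers."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, nxt)] += 1
    return context_counts, bigram_counts


def next_word_probability(prev, nxt, model):
    """Maximum-likelihood estimate of P(nxt | prev)."""
    context_counts, bigram_counts = model
    if context_counts[prev] == 0:
        return 0.0  # unseen context: no estimate without smoothing
    return bigram_counts[(prev, nxt)] / context_counts[prev]


model = train_bigram(["i like tea", "i like coffee", "i drink tea"])
print(next_word_probability("i", "like", model))
```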

40
Q

What is GPT-3?

A

A language model trained by OpenAI with 175 billion parameters.

Generative Pre-Trained Transformer (GPT)

41
Q

What is the key idea behind Transformer Networks?

A

They cross-correlate all input elements with each other (the self-attention mechanism).
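
One way to read "cross-correlate" is scaled dot-product self-attention: every element is compared with every other element, and the comparison scores weight a sum over the inputs. The sketch below is a simplification that uses the inputs directly as queries, keys, and values. Real Transformers first apply learned projections and use several attention heads, which are omitted here.

```python
import math


def softmax(values):
    """Turn scores into positive weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(vectors):
    """Unprojected self-attention: queries = keys = values = inputs."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Compare this element with every element of the input.
        scores = [dot(query, key) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        # Each output is a weighted average of all input vectors.
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)
        ])
    return outputs


print(self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```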

42
Q

What does Machine Translation aim to achieve?

A

Translate text from one language to another.

43
Q

What is ChatGPT?

A

A chatbot introduced in 2022 by OpenAI that can perform various tasks

ChatGPT utilizes advanced natural language processing techniques.

44
Q

How does ChatGPT work?

A

It uses natural language processing and transformer networks to analyze input

Transformer networks allow for cross-correlation of input elements.

45
Q

What is the key idea behind transformer networks?

A

Cross-correlate all elements of the input with each other

Examples include BERT and GPT.

46
Q

What type of responses does ChatGPT generate?

A

Plausible, but not always correct responses.

Users should verify information provided by ChatGPT.

47
Q

What were the first versions of ChatGPT primarily based on?

A

Text-based interactions

Subsequent research aims to integrate different modalities.

48
Q

What types of modalities are being integrated into ChatGPT?

A

Images, video, spoken language, music

Google Gemini is an example of a project incorporating these features.

49
Q

How can ChatGPT’s knowledge be enhanced?

A

By integrating explicit symbolic knowledge

This can involve interfacing with linked open data or using neuro-symbolic AI.

50
Q

What are some limitations of ChatGPT?

A

Fails at simple tasks like counting

Solutions include interfacing with external tools like Wolfram for math.

51
Q

Can ChatGPT play chess?

A

Yes, it can play chess using algebraic notation

ChatGPT has been trained on chess strategies and rules.

52
Q

Does ChatGPT “understand” chess openings?

A

Yes, it responds to the move 1. e4 with 1…c5, playing the Sicilian Defense

53
Q

What would happen if the player performs an illegal move?

A

In some instances, ChatGPT would respond to the move, but not recognize it as illegal.

54
Q

What does Checkmate signify?

A

The end of the chess game: a king is in check and has no legal move to escape, so the attacking player wins.

55
Q

What fundamental problems need to be addressed in NLP?

A

Ambiguity across all levels of natural language (words, syntax, semantics, etc.)

56
Q

What is the intuition behind the vector-space model?

A

Representing semantic similarities by casting words as vectors in a multi-dimensional space

57
Q

What is the Bag of Words Model?

A

A text representation that describes the occurrence of words in a document (disregarding grammar and word order)

58
Q

How does probabilistic text categorization work?

A

It uses probability to classify text into predefined categories

This approach often employs machine learning techniques.

59
Q

What are semantic embeddings?

A

Representations of words in a continuous vector space that capture semantic meanings

They are learned through various models like Word2Vec.

60
Q

What are large language models (LLMs)?

A

Statistical models that predict the next word in a sentence

They are essential for tasks like text generation.

61
Q

What is end-to-end learning?

A

A machine learning approach where the model learns to map inputs directly to outputs

This method simplifies the modeling process by eliminating intermediate steps.