Natural Language Processing Flashcards
What is Natural Language Processing (NLP)?
A subfield of linguistics, computer science, information engineering, and AI concerned with the interactions between computers and human languages.
What are some goals of Natural Language Processing?
- Improve human-computer communication
- Allow people to program computers in natural language
- Distill knowledge from texts
Who developed the ELIZA program?
Joseph Weizenbaum in 1966
What was ELIZA originally meant to be a parody of?
A Rogerian psychoanalyst
How does ELIZA generate responses?
- Scans input sentences for keywords
- Analyzes input according to transformation rules
- Generates responses based on reassembly rules
True or False: ELIZA has a deep understanding of the conversation.
False
What kind of responses does ELIZA use?
Stock answers based on previously mentioned keywords
What is an example of a transformation rule in ELIZA?
Predefined responses are triggered by a sentence containing keywords (e.g., ‘I feel’)
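A minimal sketch of such a keyword/transformation rule in Python (the patterns and reassembly templates here are illustrative stand-ins, not Weizenbaum's original DOCTOR script):

```python
import re
import random

# Illustrative transformation rules: keyword pattern -> reassembly templates.
RULES = [
    (re.compile(r"\bI feel (.+)", re.IGNORECASE),
     ["Why do you feel {0}?", "Do you often feel {0}?"]),
    (re.compile(r"\bmy (.+)", re.IGNORECASE),
     ["Tell me more about your {0}."]),
]

def eliza_respond(sentence: str) -> str:
    """Scan the input for keywords and reassemble a stock response."""
    for pattern, templates in RULES:
        match = pattern.search(sentence)
        if match:
            return random.choice(templates).format(match.group(1).rstrip("."))
    return "Please go on."  # default when no keyword matches

print(eliza_respond("I feel sad about my work."))
# e.g. "Why do you feel sad about my work?"
```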
What was the impact of ELIZA?
- One of the best-known AI chatbots
- Claimed by some to have passed the Turing Test
What is PARRY?
A program developed in 1972 to imitate a paranoid schizophrenic
How did PARRY improve upon ELIZA?
It maintained an internal model of the conversation, addressing ELIZA's lack of internal world tracking
What is the main function of SHRDLU?
A natural language interface to a blocks world, allowing users to ask the system to perform tasks (e.g., moving blocks) and to answer questions about the scene
What program was first able to solve simple high school math problems posed in natural language?
STUDENT. It assumed that every sentence is an equation and used trigger words to identify the parts of the equation:
* "is" → equates two entities
* "per" → divides two entities
(Bobrow 1964)
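A toy sketch of the trigger-word idea (only "is" and "per" are handled, and only for trivially simple sentences; STUDENT's actual pattern matching was far more elaborate):

```python
def parse_simple(sentence: str) -> str:
    """Turn a trivially simple sentence into an equation using trigger words:
    'is' equates the two sides, 'per' divides two quantities (illustrative only)."""
    words = sentence.strip(" .?").split()
    if "is" in words:
        i = words.index("is")
        left, right = " ".join(words[:i]), " ".join(words[i + 1:])
        right = right.replace(" per ", " / ")
        return f"{left} = {right}"
    return sentence

print(parse_simple("The price is 30 dollars per 2 kilos."))
# -> "The price = 30 dollars / 2 kilos"
```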
Why is Natural Language Processing considered hard?
- Ambiguity at lexical, syntactic and semantic levels (e.g., hidden meanings, jokes, puns)
What are traditional NLP tasks?
- Word Segmentation
- Part-of-Speech tagging
- Syntactic Analysis
- Semantic Analysis
What does lexical ambiguity refer to?
The same word may have different meanings
What is syntactic ambiguity?
The same sentence may have different interpretations
What is semantic ambiguity?
The interpretation of a sentence may depend on its context
What is the purpose of Word Segmentation in NLP?
It divides the input text into small semantic entities (tokenisation)
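A minimal regex-based tokeniser sketch (the handling of punctuation, hyphens, and apostrophes here is deliberately naive):

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens (naive approach)."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", text)

print(tokenize("The state-of-the-art tokenizer isn't perfect."))
# -> ['The', 'state-of-the-art', 'tokenizer', "isn't", 'perfect', '.']
```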
What does Part-of-Speech (POS) tagging involve?
Assigning each word its grammatical role in a sentence;
i.e., a classification task, typically based on probability
What is Syntactic Analysis in NLP?
Finding the most probable grammatical interpretation of a sentence
Define Semantic Analysis.
Finding the most probable meaning of a sentence (and its words)
True or False: Traditional NLP tasks are treated independently.
False.
They depend on each other.
What is the main issue with tokenization?
Determining how to segment words and phrases correctly
Examples: possessives, hyphenated words
What are typical formats for numbers in NLP?
Different formats:
* 3/12/91
* 55 B.C.
* 100.2.86.144
These require special recognisers for dates, IP addresses, etc.
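A sketch of such special recognisers using regular expressions (the patterns are simplified and would over- or under-match in practice):

```python
import re

# Simplified recognisers; real systems need far more careful patterns.
RECOGNIZERS = {
    "DATE":    re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),       # e.g. 3/12/91
    "YEAR_BC": re.compile(r"\b\d{1,4}\s*B\.C\.", re.IGNORECASE),  # e.g. 55 B.C.
    "IP":      re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),        # e.g. 100.2.86.144
}

def recognize(text: str) -> list[tuple[str, str]]:
    """Return (label, match) pairs for every special token found."""
    hits = []
    for label, pattern in RECOGNIZERS.items():
        hits += [(label, m.group()) for m in pattern.finditer(text)]
    return hits

print(recognize("Caesar invaded in 55 B.C.; server 100.2.86.144 logged 3/12/91."))
```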
What classification is performed in POS tagging?
Sorting of words in a sentence into grammatical categories: noun, verb, adjective, etc.
What models can be used for POS tagging?
- Hidden Markov Models (HMMs)
- Conditional Random Fields (CRFs)
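An off-the-shelf example with NLTK (its default tagger is an averaged perceptron rather than an HMM or CRF, but the classification task is the same; assumes nltk is installed and the 'punkt' and 'averaged_perceptron_tagger' resources have been downloaded via nltk.download()):

```python
import nltk

tokens = nltk.word_tokenize("Time flies like an arrow.")
print(nltk.pos_tag(tokens))
# e.g. [('Time', 'NNP'), ('flies', 'VBZ'), ('like', 'IN'),
#       ('an', 'DT'), ('arrow', 'NN'), ('.', '.')]
```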
What is the structure of a simple grammar for syntactic analysis?
1) S → NP VP
2) NP → Det N | Det N PP
3) VP → V NP | VP PP
4) PP → P NP
These rules provide a basic framework for syntactic analysis, supporting applications such as machine translation, information extraction, and question answering.
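The same grammar can be written down and parsed with NLTK's CFG tools; the lexical rules (Det, N, V, P and the specific words) are illustrative additions needed only to make the example runnable:

```python
import nltk

# The grammar rules from above, plus made-up lexical rules so that a
# concrete sentence can actually be parsed.
grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> Det N | Det N PP
    VP  -> V NP | VP PP
    PP  -> P NP
    Det -> 'the' | 'a'
    N   -> 'man' | 'dog' | 'telescope'
    V   -> 'saw'
    P   -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "the man saw the dog with a telescope".split()
for tree in parser.parse(sentence):
    print(tree)  # two parses: 'with a telescope' attaches to the NP or to the VP
```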
What is an example of ambiguity in parsing?
The sentence: ‘Time flies like an arrow, fruit flies like a banana.’
What are Semantic Embeddings?
Representations of an utterance* in a Euclidean space where distance captures similarity in meaning.
(*) An utterance represents a complete thought or piece of information and may consist of one or more words.
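A sketch of the distance idea using cosine similarity on made-up 3-dimensional vectors (real embeddings are learned from data and have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors chosen by hand purely for illustration.
king  = np.array([0.9, 0.8, 0.1])
queen = np.array([0.85, 0.75, 0.2])
apple = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(king, queen))  # close to 1: similar meaning
print(cosine_similarity(king, apple))  # noticeably smaller: dissimilar
```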
What is the traditional approach to semantic analysis?
Bag-of-Words model
What major development occurred in machine translation in the 2010s?
End-to-end deep learning for machine translation
What is a document within the Bag-of-Words model?
A document is regarded as a vector in an n-dimensional space, each word representing one dimension.
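A minimal sketch of turning documents into such word-count vectors, here with scikit-learn's CountVectorizer (one of several equivalent ways to do it):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)        # one row per document, one column per word
print(vectorizer.get_feature_names_out()) # the dimensions (vocabulary)
print(X.toarray())                        # word counts; word order is ignored
```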
What is the purpose of Text Categorization?
Assigning labels to each document based on topics or genres.
Fill in the blank: The _______ model assumes that a document has been generated by repeatedly drawing one word out of a bag of words.
Bag-of-Words
What is the goal of Probabilistic Text Classification within the BOW model?
To determine from which bag (class) a given document was generated.
What does Bayes’ Theorem help to estimate?
The probability of a class given a document.
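In standard form: P(class | document) = P(document | class) · P(class) / P(document), where the terms on the right-hand side are estimated from labelled training data.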
What assumption does the Naïve Bayes Classifier make?
It assumes independence among tokens.
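A sketch of a Naïve Bayes text classifier under the BOW model (the tiny training set here is made up purely for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy training data; real classifiers need far more documents.
train_docs   = ["the match ended in a goal", "stocks fell on weak earnings",
                "the striker scored twice", "the market rallied after the report"]
train_labels = ["sports", "finance", "sports", "finance"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)

print(model.predict(["earnings report moved the market"]))  # expected: ['finance']
```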
What is an N-gram model?
Uses N-1 words of prior context to estimate the probability of the next word.
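A sketch of a bigram (N = 2) model estimated by simple counting over a toy corpus, with no smoothing:

```python
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams and the contexts (previous words) they condition on.
bigram_counts  = Counter(zip(corpus, corpus[1:]))
context_counts = Counter(corpus[:-1])

def bigram_prob(prev: str, word: str) -> float:
    """P(word | prev) estimated by relative frequency (no smoothing)."""
    return bigram_counts[(prev, word)] / context_counts[prev]

print(bigram_prob("the", "cat"))  # 0.25: 'the' is followed by cat/mat/dog/rug once each
print(bigram_prob("sat", "on"))   # 1.0: 'sat' is always followed by 'on' here
```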
What is GPT-3?
A language model trained by OpenAI with 175 billion parameters.
Generative Pre-Trained Transformer (GPT)
What is the key idea behind Transformer Networks?
They cross-correlate all input elements with each other.
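A numpy sketch of the underlying operation, scaled dot-product self-attention, in which every input element is compared against every other (toy dimensions, identity projections instead of learned Q/K/V weights):

```python
import numpy as np

def self_attention(X: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention, toy version without learned projections.
    Every row of X (one input element) is compared against every other row."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # pairwise similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ X                               # weighted mix of all elements

X = np.random.default_rng(0).normal(size=(4, 8))     # 4 tokens, 8-dim embeddings
print(self_attention(X).shape)                       # (4, 8)
```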
What does Machine Translation aim to achieve?
Translate text from one language to another.
What is ChatGPT?
A chatbot introduced in 2022 by OpenAI that can perform various tasks
ChatGPT utilizes advanced natural language processing techniques.
How does ChatGPT work?
It uses natural language processing and transformer networks to analyze input
Transformer networks allow for cross-correlation of input elements.
What is the key idea behind transformer networks?
Cross-correlate all elements of the input with each other
Examples include BERT and GPT.
What type of responses does ChatGPT generate?
Plausible, but not always correct responses.
Users should verify information provided by ChatGPT.
What were the first versions of ChatGPT primarily based on?
Text-based interactions
Subsequent research aims to integrate different modalities.
What types of modalities are being integrated into ChatGPT?
Images, video, spoken language, music
Google Gemini is an example of a project incorporating these features.
How can ChatGPT’s knowledge be enhanced?
By integrating explicit symbolic knowledge
This can involve interfacing with linked open data or using neuro-symbolic AI.
What are some limitations of ChatGPT?
Fails at simple tasks like counting
Solutions include interfacing with external tools like Wolfram for math.
Can ChatGPT play chess?
Yes, it can play chess using algebraic notation
ChatGPT has been trained on chess strategies and rules.
Does ChatGPT “understand” chess openings?
Yes, it responds to the move 1. e4 with 1…c5, playing the Sicilian Defense
What would happen if the player performs an illegal move?
In some instances, ChatGPT would respond to the move, but not recognize it as illegal.
What does Checkmate signify?
A successful end to the chess game.
What fundamental problems need to be addressed in NLP?
Ambiguity across all levels of natural language (words, syntax, semantics, etc.)
What is the intuition behind the vector-space model?
Representing semantic similarities by casting words as vectors in a multi-dimensional space
What is the Bag of Words Model?
A text representation that describes the occurrence of words in a document (disregarding grammar and word order)
How does probabilistic text categorization work?
It uses probability to classify text into predefined categories
This approach often employs machine learning techniques.
What are semantic embeddings?
Representations of words in a continuous vector space that capture semantic meanings
They are learned through various models like Word2Vec.
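A sketch using the gensim library's Word2Vec implementation, one common way to learn such embeddings (the tiny corpus here is only illustrative; assumes gensim is installed):

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus; real embeddings need millions of sentences.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "animals"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)
print(model.wv["cat"].shape)              # a 50-dimensional vector
print(model.wv.similarity("cat", "dog"))  # cosine similarity of the two vectors
```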
What are large language models (LLMs)?
Statistical models that predict the next word in a sentence
They are essential for tasks like text generation.
What is end-to-end learning?
A machine learning approach where the model learns to map inputs directly to outputs
This method simplifies the modeling process by eliminating intermediate steps.