NLP Flashcards
What is TRUE for Natural Language Processing?
improves human-computer communication = TRUE
improves human-human communication = TRUE
improves computer-computer communication = FALSE
distills knowledge from texts = TRUE
ELIZA
- was meant to be a parody of a Rogerian psychotherapist
- programmed by J. Weizenbaum (1966)
- works by very simple pattern matching on the user's input
- has no understanding of the conversation
- still one of the best-known AI chatbots
- Weizenbaum’s intention WAS NOT to demonstrate AI
How does ELIZA work?
- Scans input sentences for keywords
- Keywords have rank/precedence numbers
- Commas and periods serve as delimiters
- Analyzes the input according to transformation rules by decomposing sentences
- keyword + transformation rules = script
- Responses are generated by reassembly rules associated with the decomposition rules (e.g., replacing I with YOU); see the sketch below
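A minimal Python sketch of this keyword/decomposition/reassembly idea; the patterns, templates and stock answers here are invented for illustration and are not Weizenbaum's original script.

```python
import re

# Each rule pairs a decomposition pattern (keyword match) with a reassembly template.
RULES = [
    (re.compile(r".*\bI am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r".*\bmy (mother|father|family)\b.*", re.IGNORECASE), "Tell me more about your {0}."),
]
STOCK_ANSWERS = ["Please go on.", "I see."]

def respond(sentence: str) -> str:
    for pattern, template in RULES:
        match = pattern.match(sentence)
        if match:
            # Reassembly: reuse the decomposed fragment in the response.
            return template.format(*match.groups())
    return STOCK_ANSWERS[0]  # fall back to a stock answer when no keyword matches

print(respond("I am feeling sad"))   # Why do you say you are feeling sad?
print(respond("It rained today"))    # Please go on.
```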
If ELIZA gets stuck, what does it do?
- returns to keywords from the previous conversation
- has a few stock answers
- is sensitive to specific subjects (like family)
Mark the three properties which are true of ELIZA!
- can only respond with a question about the current input = FALSE
- scans input sentence for keywords = TRUE
- is not able to return to previous content = FALSE
- uses stock answers sometimes = TRUE
- uses artificial neurons, i.e. a neural net = FALSE
- can be sensitive to specific subjects (e.g. subject family) = TRUE
PARRY
counterpart to ELIZA
imitates a paranoid schizophrenic
takes advantage of being able to give evasive or silly responses
passed a restricted Turing test in the early 1970s
STUDENT
- was able to solve simple high-school math problems
- but the question had to be typed in natural language
- viewed every sentence as an equation
- used trigger words to identify the task
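A toy sketch of the STUDENT idea, far simpler than the original program: hypothetical trigger words map phrases to operators and the sentence is evaluated as an equation.

```python
# Trigger words translate natural-language phrases into arithmetic operators.
TRIGGERS = {"plus": "+", "minus": "-", "times": "*", "divided by": "/"}

def solve(question: str) -> float:
    expr = question.lower().rstrip("?").replace("what is", "").strip()
    for phrase, op in TRIGGERS.items():
        expr = expr.replace(phrase, op)
    # The sentence is now treated as an equation/expression; eval() is acceptable
    # only in a toy sketch like this one.
    return eval(expr)

print(solve("What is 3 plus 5 times 2?"))  # 13
```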
SHRDLU
- natural language interface to the blocks world
- could perform tasks, give names to objects, memorize operations & answer questions about the state of the world
Match the chatbots:
SHRDLU = NL interface to the blocks world
PARRY = passed a restricted Turing test
ELIZA = responses are based on reassembly rules
STUDENT = could solve high school math problems
Tokenization is a big problem in NLP.
TRUE
Word Segmentation/Tokenization
- dividing the input text into small semantic entities
- issues are e.g. “New York” vs. “New” and “York”: one token or two tokens?
- numbers have different formats
- also different abbreviation rules (US vs. U.S.); see the sketch below
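A naive tokenizer sketch (illustrative only) showing why splitting on whitespace and punctuation alone is not enough.

```python
import re

def naive_tokenize(text: str) -> list[str]:
    # Splits off every punctuation mark, so "U.S." falls apart and
    # "New York" always becomes two separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)

print(naive_tokenize("He moved to New York from the U.S. on 1.5.2020."))
# ['He', 'moved', 'to', 'New', 'York', 'from', 'the', 'U', '.', 'S', '.',
#  'on', '1', '.', '5', '.', '2020', '.']
```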
Part-Of-Speech-Tagging (POS)
- Each word in a sentence can be assigned to a category
- noun, verb, adjective etc.
- based on the definition of the words (thesaurus)
- POS uses definitions (thesaurus) and context (grammar rules) to decide the category
- can be formulated as a sequence labeling task and solved with models such as Hidden Markov Models (HMMs) or Conditional Random Fields (CRFs); see the toy sketch below
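A toy illustration (not an HMM or CRF) of combining word definitions (a small lexicon) with one context rule to pick a category; all words, tags and rules here are made up for the example.

```python
# Lexicon: possible categories per word; ambiguous words list several readings.
LEXICON = {"the": "DET", "dog": "NOUN", "barks": "VERB", "man": ["NOUN", "VERB"]}

def tag(tokens):
    tags = []
    for i, tok in enumerate(tokens):
        candidates = LEXICON.get(tok, "NOUN")          # unknown words default to NOUN
        if isinstance(candidates, list):
            # Context rule: after a determiner, prefer the noun reading.
            tags.append("NOUN" if i > 0 and tags[-1] == "DET" else "VERB")
        else:
            tags.append(candidates)
    return list(zip(tokens, tags))

print(tag(["the", "dog", "barks"]))  # [('the', 'DET'), ('dog', 'NOUN'), ('barks', 'VERB')]
print(tag(["the", "man", "barks"]))  # ambiguous 'man' disambiguated to NOUN by context
```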
What is NOT a fundamental NLP task?
lexical analysis = TRUE
syntactic analysis
semantic analysis
word segmentation
Modern NLP systems use knowledge-based MT systems instead of Deep Learning.
False
The key idea behind Word2Vec is that any word, represented as a vector, is mapped close to similar (familiar) words.
True
Which of the following is an advantage of using a system like Word2Vec?
capable of capturing syntactic and semantic relationships between different words = TRUE
can handle out-of-vocabulary words as well = FALSE
effort for humans to tag data is low, because it is an unsupervised technique = TRUE
vector size is not directly proportional to vocabulary size = TRUE
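A small, hypothetical example with gensim's Word2Vec (assumes gensim is installed; the corpus is a toy, so the nearest neighbours will not be meaningful).

```python
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["dogs", "and", "cats", "are", "animals"],
]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

print(model.wv["king"].shape)         # (50,) -- vector size independent of vocabulary size
print(model.wv.most_similar("king"))  # nearest neighbours in the embedding space
# model.wv["smartphone"] would raise a KeyError: classic Word2Vec has no vector
# for out-of-vocabulary words.
```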
Machine Translation
- has been studied early on
- very hard because of the ambiguity
- Neural Machine Translation allows end-to-end learning
- input is the raw input text (no grammatical analysis)
- Encoder = a neural network maps the input into the intermediate representation (–> embedding)
- Decoder = another neural network maps the embedding into a different language text
- Results are much better than previous systems, but still not perfect; see the sketch below
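A minimal encoder-decoder sketch in PyTorch (hypothetical sizes, no attention, untrained) just to show the data flow: source tokens → intermediate representation → scores over the target-language vocabulary.

```python
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1200, 64, 128

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(SRC_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
    def forward(self, src):
        _, h = self.rnn(self.embed(src))
        return h                              # intermediate representation (embedding)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TGT_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)
    def forward(self, tgt, h):
        o, _ = self.rnn(self.embed(tgt), h)
        return self.out(o)                    # scores over the target vocabulary

src = torch.randint(0, SRC_VOCAB, (2, 7))     # batch of 2 "sentences", 7 tokens each
tgt = torch.randint(0, TGT_VOCAB, (2, 9))
logits = Decoder()(tgt, Encoder()(src))
print(logits.shape)                           # torch.Size([2, 9, 1200])
```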
Transformer Networks
BERT = Transformer encoder (bidirectional)
GPT = Transformer decoder (autoregressive)
NLP
= maps the query to a structured representation of the content
3 out of 4 points are reasons why Natural Language Processing (NLP) is hard. Which point does not match the others?
The grammar is very complex for NLP = TRUE
Natural languages are semantically and at the discourse level highly ambiguous.
There are often hidden meanings that are not obvious from the message itself. (jokes, puns, sarcasm, …)
Natural languages are lexically and syntactically highly ambiguous.
Neural Machine Translation allows end-to-end learning. In which order?
1. Input of the raw input text (no grammatical analysis)
2. Encoder: a neural network maps the input into an intermediate representation (embedding)
3. Decoder: another neural network maps the embedding into a different-language text
- Results typically better than previous systems (but still not perfect)
The basic idea of ”Information Retrieval” is: a document is regarded as a vector in an n-dimensional space and is a linear combination of the base vectors. Linear algebra can be used for various computations.
True
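A short sketch of the vector space model with scikit-learn (toy documents, assumes scikit-learn is installed): each document becomes a vector over the term dimensions, and linear algebra (cosine similarity) ranks documents against a query.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "information retrieval finds relevant documents",
    "information extraction fills a database",
    "cats and dogs are animals",
]
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)           # documents as vectors
query_vector = vectorizer.transform(["retrieve relevant documents"])

print(cosine_similarity(query_vector, doc_vectors))    # similarity of the query to each document
```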
The definition (after Grishman 1997, Eikvil 1999) of ’Information Extraction’ is: ”The identification and extraction of instances of a particular class of events or relationships in a natural language text and their transformation into a structured representation (e.g. a database).”
True
Natural Language Processing (NLP) distills knowledge from texts in different ways. Among them are ”information retrieval” (IR) and ”information extraction” (IE):
a) IR retrieves relevant documents from collections (e.g., search engines).
b) IE retrieves relevant information from documents (e.g., comparison shoppers).
True