Task 1 Flashcards

1
Q

part-of-speech tagging

A

Part-of-speech tagging (POS-tagging, or simply tagging) is the process of classifying words into their parts of speech and labeling them accordingly. Parts of speech are also known as word classes or lexical categories. The collection of tags used for a particular task is known as a tagset.
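As a minimal sketch, tagging can be illustrated with a lexicon-lookup tagger; the tiny tagset and lexicon below are invented for illustration, and real taggers also use context rather than word identity alone:

```python
# Toy lexicon-lookup POS tagger. The tagset {"DET", "NOUN", "VERB"} and the
# lexicon entries are invented for illustration only.
TAG_LEXICON = {
    "the": "DET",
    "dog": "NOUN",
    "barks": "VERB",
}

def tag(sentence):
    """Label each word with its part-of-speech tag, or 'UNK' if unseen."""
    return [(word, TAG_LEXICON.get(word.lower(), "UNK"))
            for word in sentence.split()]

print(tag("The dog barks"))
# [('The', 'DET'), ('dog', 'NOUN'), ('barks', 'VERB')]
```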

2
Q

word sense disambiguation

A

Word sense disambiguation (WSD) is the process of determining which “sense” (meaning) of a word is activated by the use of the word in a particular context, a process which appears to be largely unconscious in people.

Sample of word sense disambiguation:
Sentence:
“Time flies like an arrow. Fruit flies like a banana” - Groucho Marx

Here “flies” is a verb in the first sentence but a noun in the second, and “like” switches from a preposition to a verb; a reader disambiguates both without noticing.

3
Q

lexical disambiguation

A

Lexical disambiguation is the disambiguation of the sense of a polysemous word. Lexical ambiguity occurs when a word has more than one meaning.

Sample of lexical disambiguation:
“How can a student deposit money into a bank?”

A human knows that “bank” here refers to a financial institution. Whereas, given the question

“Who is sitting on the bank of the river?”

the “bank” here refers to the sloping land beside the river. Unfortunately, it is very difficult for computers to do the same job. A word with more than one meaning can lead to matching irrelevant answers, which decreases the accuracy of answer retrieval.
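The overlap intuition can be sketched as a simplified Lesk-style disambiguator: choose the sense whose gloss shares the most words with the question. The two glosses below are invented for illustration, not taken from a real dictionary:

```python
# Simplified Lesk-style lexical disambiguation for "bank".
# The sense names and glosses are illustrative assumptions.
SENSES = {
    "financial_institution": "an institution where people deposit and withdraw money",
    "river_bank": "the sloping land beside a river",
}

def disambiguate(word_senses, context):
    """Return the sense whose gloss overlaps most with the context words."""
    context_words = set(context.lower().rstrip("?").split())
    return max(word_senses,
               key=lambda s: len(context_words & set(word_senses[s].split())))

print(disambiguate(SENSES, "How can a student deposit money into a bank?"))
# financial_institution
print(disambiguate(SENSES, "Who is sitting on the bank of the river?"))
# river_bank
```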

4
Q

syntactic disambiguation

A

Syntactic disambiguation is the process of resolving ambiguity by picking the most probable parse tree. Syntactic ambiguity happens when a sentence can be interpreted in more than one way due to ambiguous sentence structure.

Sample of syntactic disambiguation:
“Flying planes can be dangerous.”
Either the act of flying planes is dangerous, or planes that are flying are dangerous.

“Stolen painting found by tree.”
Either a tree found a stolen painting, or a stolen painting was found sitting next to a tree.
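The two readings correspond to two distinct parse structures over the same words; they can be sketched as bracketed parse skeletons (the labels `PP-agent` and `PP-loc` are illustrative, not a standard tagset):

```python
# Two parse skeletons for "Stolen painting found by tree": same words,
# different structure. Labels are illustrative only.
# Reading 1: "by tree" is the agent of "found" (the tree did the finding).
agentive = ("S", ("NP", "stolen painting"),
                 ("VP", "found", ("PP-agent", "by", "tree")))
# Reading 2: "by tree" is a location modifier (found next to a tree).
locative = ("S", ("NP", "stolen painting"),
                 ("VP", "found"), ("PP-loc", "by", "tree"))

print(agentive != locative)  # True: syntactic disambiguation must pick one
```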

5
Q

probabilistic parsing

A

Probabilistic parsing is the process of using knowledge of language gained from hand-parsed sentences to produce the most likely analysis of a new sentence. Probabilistic parsing can be done using Probabilistic Context-Free Grammars (PCFGs). A PCFG is simply a CFG with probabilities added to its rules, indicating how likely different rewritings are.
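A minimal sketch of scoring a parse under a toy PCFG, assuming invented rules and probabilities: the probability of a tree is the product of the probabilities of the rules used to build it, so the parser can prefer the highest-scoring tree.

```python
# Toy PCFG scoring. The grammar rules and probabilities below are invented
# for illustration; a real PCFG is estimated from a treebank.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("flights",)): 0.4,
    ("VP", ("leave",)): 0.3,
}

def tree_prob(tree):
    """tree = (label, child, ...); leaf children are plain strings.
    Multiply the probability of this node's rule by its subtrees'."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child)
    return p

parse = ("S", ("NP", "flights"), ("VP", "leave"))
print(tree_prob(parse))  # 1.0 * 0.4 * 0.3 ≈ 0.12
```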

6
Q

speech act interpretation

A

Speech act interpretation is the process of interpreting an utterance that conveys a message different from its literal meaning, often for reasons of politeness or subtlety. The problem of speech act interpretation is to determine, given an utterance, which speech act it realizes. For a computer to take part in a conversation, it is essential that it has the ability to understand indirect speech acts.

Sample of speech act interpretation:
“Can you switch on the computer?”

can be, depending on the circumstances, interpreted in a direct sense as a question about someone’s physical abilities or in an indirect sense as a request to actually switch on the computer. This depends largely on the context of the utterance.

“Can you open the door?”

might in context be a question, a request, or even an offer. Several kinds of information complicate the recognition process. Literal meaning, lexical and syntactic choices, agents’ beliefs, the immediate situation, and general knowledge about human behaviour all clarify what the ordinary speaker is after.

7
Q

state machines

A

State machines are a method of modelling systems whose output depends on the entire history of their inputs, not just on the most recent input. Compared to purely functional systems, in which the output is determined by the current input alone, a state machine’s output is determined by its input history.

State machines can be used in:
• conversations, in which, for example, the meaning of the word “it” depends on the history of things that have been said.
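The “it” example can be sketched as a toy state machine whose output depends on the conversation so far; the mini-lexicon of nouns is an invented assumption, not a real resolver:

```python
# Toy dialogue state machine: the referent of "it" depends on the history
# of utterances, not just the current one. KNOWN_NOUNS is an assumed
# mini-lexicon for illustration.
class DialogueState:
    KNOWN_NOUNS = {"laptop", "door", "computer"}

    def __init__(self):
        self.last_noun = None  # the machine's state

    def hear(self, utterance):
        """Update the state with any known noun; return the current referent."""
        for word in utterance.lower().split():
            if word in self.KNOWN_NOUNS:
                self.last_noun = word
        return self.last_noun

state = DialogueState()
state.hear("Alice bought a laptop")
print(state.hear("it was expensive"))  # laptop: resolved from history
```

The same input (“it was expensive”) would yield a different output after a different history, which is exactly the property the definition above describes.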

8
Q

sequence models

A

Sequence models are models that:
• can predict the likelihood of a sequence of text (e.g. a sentence);
• sometimes use latent state variables.

2 types of generative sequence models:

  1. N-gram models
  2. Hidden Markov Models

Sequence models are used in:

  1. speech recognition
  2. machine translation
  3. handwriting recognition
  4. spelling correction
  5. OCR

Sample of sequence models with N-gram models:
“Please turn your homework …”

Hopefully, most of you concluded that a very likely word is in, or possibly over, but probably not the. We formalize this idea of word prediction with probabilistic models called N-gram models, which predict the next word from the previous N - 1 words. An N-gram is an N-token sequence of words: a 2-gram (more commonly called a bigram) is a two-word sequence of words like “please turn”, “turn your”, or “your homework”, and a 3-gram (more commonly called a trigram) is a three-word sequence of words like “please turn your”, or “turn your homework”.
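The bigram case can be sketched as maximum-likelihood estimation on a toy corpus (the corpus below is invented for illustration): count adjacent word pairs, then divide by the count of the preceding word.

```python
from collections import Counter

# Toy maximum-likelihood bigram model: P(word | prev) = C(prev, word) / C(prev).
# The training corpus is invented for illustration.
corpus = "please turn your homework in please turn it in".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])  # counts of words in the "prev" position

def p_next(prev, word):
    """MLE probability that `word` follows `prev` in the toy corpus."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p_next("please", "turn"))  # 1.0: "please" is always followed by "turn"
print(p_next("turn", "your"))    # 0.5: "turn" precedes "your" and "it" equally
```

A trigram model would condition on the previous two words instead of one, following the same counting scheme.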

9
Q

vector-space model

A

In the vector space model of information retrieval, documents and queries are represented as vectors of features representing the terms (words) that occur within the collection.
The value of each feature is called the term weight and is usually a function of the term’s frequency in the document, along with other factors.
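A minimal sketch, assuming raw term frequencies as term weights and cosine similarity for matching (real systems typically use TF-IDF weighting instead):

```python
import math
from collections import Counter

# Toy vector space model: documents and queries become term-count vectors,
# compared by cosine similarity. Raw counts stand in for real term weights.
def tf_vector(text):
    """Term-frequency vector as a Counter over lowercased words."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

doc1 = tf_vector("deposit money in the bank")
doc2 = tf_vector("the river bank is steep")
query = tf_vector("bank deposit")

print(cosine(query, doc1) > cosine(query, doc2))  # True: doc1 matches better
```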
