MID Flashcards

Question 1

Q

What is Natural Language Processing?

Answer

A

The ability of a computer program to understand human speech as it is spoken.

Question 2

Q

Various kinds of knowledge of language?

Answer

A

Phonetics and Phonology: knowledge about linguistic sounds
Morphology: knowledge of the meaningful components of words
Syntax: knowledge of the structural relationships between words
Semantics: knowledge of meaning
Pragmatics: knowledge of the relationship of meaning to the goals and intentions of the speaker
Discourse: knowledge about linguistic units larger than a single utterance

Question 3

Q

What is Ambiguity?

Answer

A

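One word, phrase, or sentence can have multiple meanings: an input is ambiguous if more than one linguistic structure or interpretation can be built for it.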

Question 4

Q

Most important models?

Answer

A

state machines
rule systems
logic
probabilistic models (crucial one)
vector-space models

Question 5

Q

machine learning tools for language tasks?

Answer

A

1. classifiers
2. sequence models

Question 6

Q

What is a regular expression (RE)?

Answer

A

standard notation or language for specifying text sequences
a formula in a special language that is used for specifying simple classes of strings
an algebraic notation for characterizing a set of strings

Question 7

Q

Regex search requires?

Answer

A

a pattern to search for and a corpus of texts to search through
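A minimal Python sketch of a regex search, using the standard `re` module. The three-sentence corpus is a hypothetical toy example; the pattern `woodchucks?` matches woodchuck with an optional s, and the search reports every match together with the index of the text it was found in.

```python
import re

# Hypothetical toy corpus; in practice this would be a large collection of texts.
corpus = [
    "The woodchuck ate the woodchucks' food.",
    "A groundhog is also called a woodchuck.",
    "No rodents here.",
]

# The pattern: "woodchuck" followed by an optional "s".
pattern = re.compile(r"woodchucks?")

def search(pattern, corpus):
    """Return (text_index, matched_string) for every match of pattern in the corpus."""
    hits = []
    for i, text in enumerate(corpus):
        for m in pattern.finditer(text):
            hits.append((i, m.group()))
    return hits

print(search(pattern, corpus))
# [(0, 'woodchuck'), (0, 'woodchucks'), (1, 'woodchuck')]
```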

Question 8

Q

Simplest kind of regex?

Answer

A

a sequence of simple characters

Question 9

Q

most common anchors in regex?

Answer

A

1. caret symbol (^), matches the start of a line
2. dollar sign ($), matches the end of a line
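A short Python illustration with the standard `re` module (the sample text is made up). In Python, `^` and `$` match at every line boundary only when the `re.MULTILINE` flag is set; without it, `^` matches only at the start of the whole string (and `$` only at its end or just before a final newline).

```python
import re

text = "the cat sat\nThe end\nthe last line"

# ^ anchors the match to the start of a line (every line, because of re.MULTILINE).
line_initial = re.findall(r"^[Tt]he\b", text, flags=re.MULTILINE)

# $ anchors the match to the end of a line.
line_final = re.findall(r"\w+$", text, flags=re.MULTILINE)

# Without re.MULTILINE, ^ only matches at the very start of the string.
string_initial = re.findall(r"^[Tt]he\b", text)

print(line_initial)    # ['the', 'The', 'the']
print(line_final)      # ['sat', 'end', 'line']
print(string_initial)  # ['the']
```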

Question 10

Q

RE: \d

Answer

A

any digit

ex: [0-9]

Question 11

Q

RE: \D

Answer

A

any non-digit

ex: [^0-9]

Question 12

Q

RE: \w

Answer

A

any alphanumeric character or underscore

ex: [a-zA-Z0-9_]

Question 13

Q

RE: \W

Answer

A

any non-alphanumeric character (anything except a letter, digit, or underscore)

ex: [^\w]

Question 14

Q

RE: \s

Answer

A

whitespace (space, tab)

ex: [ \r\t\n\f]

Question 15

Q

RE: \S

Answer

A

non-whitespace

ex: [^\s]
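The shorthand classes above can be checked against their expansions in Python. This sketch assumes Python's `re` module with the `re.ASCII` flag, which restricts `\d`, `\w`, and `\s` to ASCII (by default Python 3 also matches Unicode digits, letters, and whitespace). Here `\w` is taken as letters, digits, and underscore, which is how Python defines it; Python's ASCII `\s` also includes the vertical tab `\v`, which the sample string does not contain.

```python
import re

sample = "Call 555-0199 at 9am, ask for J_Smith!\tThanks\n"

# Shorthand class -> explicit character class it abbreviates.
expansions = {
    r"\d": r"[0-9]",
    r"\D": r"[^0-9]",
    r"\w": r"[a-zA-Z0-9_]",
    r"\W": r"[^a-zA-Z0-9_]",
    r"\s": r"[ \r\t\n\f]",
    r"\S": r"[^ \r\t\n\f]",
}

for short, expansion in expansions.items():
    same = re.findall(short, sample, flags=re.ASCII) == re.findall(expansion, sample)
    print(f"{short} vs {expansion}: same matches = {same}")

print(re.findall(r"\w+", sample, flags=re.ASCII))
# ['Call', '555', '0199', 'at', '9am', 'ask', 'for', 'J_Smith', 'Thanks']
```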

Question 16

Q

Which special characters need to be written with a backslash?

Answer

A

\* (asterisk)
\. (period)
\? (question mark)
\n (newline)
\t (tab)
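A small Python check of why these characters need a backslash (the sample text is made up). An unescaped `.` matches any character, and an unescaped `*` or `?` is an operator rather than a literal; the standard function `re.escape` adds the backslashes automatically when a literal string has to be used as a pattern.

```python
import re

text = "Is it 3.5? Or 3x5? Use a * for emphasis."

# Unescaped, "." matches any character, so the pattern also finds "3x5".
print(re.findall(r"3.5", text))   # ['3.5', '3x5']
# Escaped, "\." matches only a literal period.
print(re.findall(r"3\.5", text))  # ['3.5']
# "\?" and "\*" match a literal question mark and a literal asterisk.
print(re.findall(r"\?", text))    # ['?', '?']
print(re.findall(r"\*", text))    # ['*']
# re.escape adds a backslash before every special character in a literal string.
print(re.escape("3.5?"))          # 3\.5\?
```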

Question 17

Q

A regular language can be described by?

Answer

A

regular expressions and finite-state automata

Question 18

Q

3 standard solutions to the problem of non-determinism in finite-state automata?

Answer

A

Backup: whenever we come to a choice point, we could put a marker to mark where we were in the input and what state the automaton was in. Then, if it turns out that we took the wrong choice, we could back up and try another path.
Look-ahead: we could look ahead in the input to help us decide which path to take.
Parallelism: whenever we come to a choice point, we could look at every alternative path in parallel.
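A minimal Python sketch of the backup strategy. The automaton is an assumed example: a nondeterministic FSA for the "sheeptalk" language baa+! (b, then two or more a's, then !), where state 2 has two possible moves on a. Each agenda entry is a saved marker (state, position in the input); popping from the end of the list makes the search depth-first, so a failed path is abandoned by backing up to the most recent choice point.

```python
# Assumed NFA for baa+!: on "a", state 2 may stay in 2 or move to 3.
NFA = {
    (0, "b"): [1],
    (1, "a"): [2],
    (2, "a"): [2, 3],
    (3, "!"): [4],
}
ACCEPT = {4}

def nd_recognize(tape, nfa=NFA, start=0, accept=ACCEPT):
    """Backup search: the agenda holds saved (state, input position) markers."""
    agenda = [(start, 0)]
    while agenda:
        state, pos = agenda.pop()          # back up to the most recent choice point
        if pos == len(tape):
            if state in accept:
                return True
            continue                       # input used up in a non-final state: dead end
        for nxt in nfa.get((state, tape[pos]), []):
            agenda.append((nxt, pos + 1))  # save every alternative to try later
    return False

for s in ["baa!", "baaaa!", "ba!", "baa"]:
    print(s, nd_recognize(s))
# baa! True, baaaa! True, ba! False, baa False
```

Replacing the stack with a queue (popping from the front) would explore the alternatives breadth-first instead.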

Question 19

Q

primitive operations of a regular expression?

Answer

A

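concatenation: connect the end (final) state of FSA1 to the start state of FSA2 (via an ε-transition)
closure: connect the start state to the end state, and the end state back to the start state
union: add a new start state connected to the start state of FSA1 and the start state of FSA2 (via ε-transitions)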
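A minimal Python sketch of the three constructions, written in the style of Thompson's construction (an assumption: the cards do not name a specific method). Every machine has one start and one final state, and `None` on an edge marks an ε-transition; union and closure each add a new start and a new final state so that property is preserved. The combined machine is run by tracking the set of all reachable states at once, i.e. the parallelism strategy for non-determinism.

```python
from itertools import count

_fresh = count()  # source of new, unique state numbers

def symbol(c):
    """Two-state FSA accepting exactly the one-character string c."""
    s, f = next(_fresh), next(_fresh)
    return {"start": s, "final": f, "edges": [(s, c, f)]}

def concat(a, b):
    """Concatenation: an epsilon edge from the final state of a to the start of b."""
    return {"start": a["start"], "final": b["final"],
            "edges": a["edges"] + b["edges"] + [(a["final"], None, b["start"])]}

def union(a, b):
    """Union: a new start state with epsilon edges to both old start states
    (plus a new final state reached from both old final states)."""
    s, f = next(_fresh), next(_fresh)
    return {"start": s, "final": f,
            "edges": a["edges"] + b["edges"] + [
                (s, None, a["start"]), (s, None, b["start"]),
                (a["final"], None, f), (b["final"], None, f)]}

def star(a):
    """Closure (Kleene star): the final state loops back to the start; the new start
    can jump straight to the new final state, so the empty string is accepted."""
    s, f = next(_fresh), next(_fresh)
    return {"start": s, "final": f,
            "edges": a["edges"] + [
                (s, None, a["start"]), (a["final"], None, a["start"]),
                (a["final"], None, f), (s, None, f)]}

def eps_closure(states, edges):
    """All states reachable from `states` using only epsilon edges."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, sym, dst in edges:
            if src == q and sym is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(fsa, text):
    """Run the FSA by following every possible path in parallel."""
    current = eps_closure({fsa["start"]}, fsa["edges"])
    for ch in text:
        moved = {dst for src, sym, dst in fsa["edges"] if src in current and sym == ch}
        current = eps_closure(moved, fsa["edges"])
    return fsa["final"] in current

# (a|b)*c built only from the three primitive operations
m = concat(star(union(symbol("a"), symbol("b"))), symbol("c"))
for s in ["c", "abbac", "ab", "ca"]:
    print(s, accepts(m, s))
# c True, abbac True, ab False, ca False
```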

Question 20

Q

Other operations that regular languages are closed under?

Answer

A

intersection
difference
complementation
reversal

Question 21

Q

closure known as?

Answer

A

Kleene star

Question 22

Q

process steps in NLP?

Answer

A

input
tokenization
syntactic analysis
semantic analysis
pragmatics
output

Question 23

Q

What are orthographic rules?

Answer

A

Orthographic rules tell us that English words ending in -y (after a consonant) are pluralized by changing the -y to -i- and adding an -es.

Question 24

Q

What are morphological rules?

Answer

A

Morphological rules tell us that fish has a null plural, and that the plural of goose is formed by changing the vowel.

Question 25

Q

What is Morphological Parsing?

Answer

A

The problem of recognizing that a word (like foxes) breaks down into component morphemes (fox and -es) and building a structured representation of this fact.

Question 26

Q

What is Parsing?

Answer

A

Parsing means taking an input and producing some sort of linguistic structure for it.

Question 27

Q

What is a finite-state transducer?

Answer

A

The key algorithm for morphological parsing

Question 28

Q

What is a morpheme?

Answer

A

A morpheme is often defined as the minimal meaning-bearing unit in a language.

Question 29

Q

Affixes divided into 4 types?

Answer

A

Prefixes precede the stem, suffixes follow the stem, circumfixes do both, and infixes are inserted inside the stem.

Question 30

Q

ways to combine morphemes to create words?

Answer

A

Inflection
Derivation
Compounding
Cliticization

Question 31

Q

What is Inflection?

Answer

A

Inflection is the combination of a word stem with a grammatical morpheme, usually resulting in a word of the same class as the original stem, and usually filling some syntactic function like agreement.

For example, English has the inflectional morpheme -s for marking the plural on nouns, and the inflectional morpheme -ed for marking the past tense on verbs.

Question 32

Q

What is Derivation?

Answer

A

Derivation is the combination of a word stem with a grammatical morpheme, usually resulting in a word of a different class, often with a meaning hard to predict exactly.

For example the verb computerize can take the derivational suffix -ation to produce the noun computerization.

Question 33

Q

What is Compounding?

Answer

A

Compounding is the combination of multiple word stems together.
For example the noun doghouse is the concatenation of the morpheme dog with the morpheme house.

Question 34

Q

What is Cliticization?

Answer

A

Cliticization is the combination of a word stem with a clitic. A clitic is a morpheme that acts syntactically like a word, but is reduced in form and attached to another word.
For example the English morpheme ’ve in the word I’ve is a clitic, as is the French definite article l’ in the word l’opera.

Question 35

Q

2 kinds of inflection on English nouns? An affix that marks … and an affix that marks …

Answer

A

plural
While the regular plural is spelled -s after most nouns, it is spelled -es after words ending in -s (ibis/ibises), -z (waltz/waltzes), -sh (thrush/thrushes), -ch (finch/finches), and sometimes -x (box/boxes). Nouns ending in -y preceded by a consonant change the -y to -i (butterfly/butterflies).
possessive
The possessive suffix is realized by apostrophe + -s for regular singular nouns (llama’s) and plural nouns not ending in -s (children’s) and often by a lone apostrophe after regular plural nouns (llamas’) and some names ending in -s or -z (Euripides’ comedies).
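The plural and possessive spelling rules above can be sketched as two small Python functions. This is an illustration only: irregular nouns (goose/geese, fish/fish), the "sometimes" in the -x rule (ox/oxen), and names such as Euripides’ are not handled, and the code uses a plain ASCII apostrophe.

```python
import re

def pluralize(noun):
    """Regular plural spelling (irregular nouns such as goose/geese are not handled)."""
    if re.search(r"(s|z|sh|ch|x)$", noun):
        return noun + "es"          # ibis -> ibises, waltz -> waltzes, finch -> finches
    if re.search(r"[^aeiou]y$", noun):
        return noun[:-1] + "ies"    # consonant + y: butterfly -> butterflies
    return noun + "s"               # llama -> llamas, boy -> boys

def possessive(noun, regular_plural=False):
    """Apostrophe + s, but a lone apostrophe after a regular plural ending in -s."""
    if regular_plural and noun.endswith("s"):
        return noun + "'"           # llamas -> llamas'
    return noun + "'s"              # llama -> llama's, children -> children's

for n in ["ibis", "waltz", "thrush", "finch", "box", "butterfly", "llama", "boy"]:
    print(n, "->", pluralize(n))
print(possessive("llama"), possessive("children"), possessive("llamas", regular_plural=True))
# llama's children's llamas'
```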

Question 36

Q

What is nominalization?

Answer

A

Nominalization is the formation of new nouns, often from verbs or adjectives; it is a very common kind of derivation in English.

Question 37

Q

What is a clitic?

Answer

A

A clitic is a unit whose status lies in between that of an affix and a word.

Clitics preceding a word are called proclitics, while those following are enclitics.

Question 38

Q

The kinds of morphology?

Answer

A

concatenative morphology
non-concatenative morphology
templatic morphology, also called root-and-pattern morphology (a kind of non-concatenative morphology)

Templatic morphology is very common in Arabic, Hebrew, and other Semitic languages.

Arabic and Hebrew combine this templatic morphology with concatenative morphology.

Question 39

Q

In order to build a morphological parser, we’ll need?

Answer

A

Lexicon: the list of stems and affixes, together with basic information about them (whether a stem is a Noun stem or a Verb stem, etc.).
Morphotactics: the model of morpheme ordering that explains which classes of morphemes can follow other classes of morphemes inside a word. For example, the fact that the English plural morpheme follows the noun rather than preceding it is a morphotactic fact.
Orthographic rules: these spelling rules are used to model the changes that occur in a word, usually when two morphemes combine (e.g., the y → ie spelling rule discussed above that changes city + -s to cities rather than citys).
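A toy Python sketch of how the three components fit together. The lexicon, feature labels, and function names are hypothetical, and parsing is done by generate-and-test (spell out every stem + suffix that the morphotactics allow, then compare with the input) rather than with the finite-state transducers used for real morphological parsing.

```python
# Lexicon (hypothetical, tiny): stems with their word class.
LEXICON = {"fox": "N", "cat": "N", "city": "N", "walk": "V"}

# Morphotactics: which suffixes may follow each stem class, and the features they add.
MORPHOTACTICS = {
    "N": [("", "+SG"), ("s", "+PL")],
    "V": [("", ""), ("s", "+3SG"), ("ed", "+PAST")],
}

def orthography(stem, suffix):
    """Orthographic rules applied where the two morphemes meet."""
    if suffix == "s" and stem.endswith(("s", "z", "x", "sh", "ch")):
        return stem + "es"                  # e-insertion: fox + -s -> foxes
    if suffix == "s" and len(stem) > 1 and stem[-1] == "y" and stem[-2] not in "aeiou":
        return stem[:-1] + "ies"            # y -> ie: city + -s -> cities
    return stem + suffix

def parse(word):
    """Return every analysis (stem, class, features) whose spelling equals `word`."""
    analyses = []
    for stem, cls in LEXICON.items():
        for suffix, feats in MORPHOTACTICS[cls]:
            if orthography(stem, suffix) == word:
                analyses.append(" ".join(filter(None, [stem, "+" + cls, feats])))
    return analyses

for w in ["foxes", "cities", "citys", "walked", "cat"]:
    print(w, parse(w))
# foxes ['fox +N +PL'], cities ['city +N +PL'], citys [], walked ['walk +V +PAST'], cat ['cat +N +SG']
```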

Question 40

Q

What is a lexicon?

Answer

A

A lexicon is a repository for words.

Question 41

Q

Most common way to model morphotactics?

Answer

A

Finite-state automaton

Question 42

Q

Morphological recognition can be solved by?

Answer

A

Finite-state automaton
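A minimal Python sketch of an FSA for English noun inflection used as a morphological recognizer. Assumptions: the input is already split into morphemes (e.g. fox + s), the word lists are tiny made-up sub-lexicons, and the arcs follow a common textbook design in which a regular noun may take the plural -s while irregular singular and plural nouns take no suffix. Because sheep is both singular and plural, the recognizer keeps a set of current states.

```python
# Hypothetical sub-lexicons (tiny, for illustration only).
REG_NOUN = {"fox", "cat", "dog"}
IRREG_SG_NOUN = {"goose", "mouse", "sheep"}
IRREG_PL_NOUN = {"geese", "mice", "sheep"}

# Arcs between states, labeled with morpheme classes; q1 and q2 are final.
ARCS = {
    ("q0", "reg-noun"): "q1",
    ("q1", "plural"): "q2",
    ("q0", "irreg-sg-noun"): "q2",
    ("q0", "irreg-pl-noun"): "q2",
}
FINAL = {"q1", "q2"}

def classes(morpheme):
    """All morpheme classes a morpheme belongs to (sheep belongs to two)."""
    found = set()
    if morpheme in REG_NOUN:
        found.add("reg-noun")
    if morpheme in IRREG_SG_NOUN:
        found.add("irreg-sg-noun")
    if morpheme in IRREG_PL_NOUN:
        found.add("irreg-pl-noun")
    if morpheme == "s":
        found.add("plural")
    return found

def recognize(morphemes):
    """Accept iff some path consumes every morpheme and ends in a final state."""
    states = {"q0"}
    for m in morphemes:
        states = {ARCS[(q, c)] for q in states for c in classes(m) if (q, c) in ARCS}
    return bool(states & FINAL)

for word in [["fox"], ["fox", "s"], ["geese"], ["sheep"], ["geese", "s"], ["s", "fox"]]:
    print("+".join(word), recognize(word))
# fox True, fox+s True, geese True, sheep True, geese+s False, s+fox False
```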

Question 43

Q

What is a transducer?

Answer

A

A transducer maps between one representation and another

Question 44

Q

What is a finite-state transducer (FST)?

Answer

A

A finite-state transducer or FST is a type of finite automaton which maps between two sets of symbols.

FST as recognizer: a transducer that takes a pair of strings as input and outputs accept if the string-pair is in the string-pair language, and reject if it is not.

FST as generator: a machine that outputs pairs of strings of the language. Thus the output is a yes or no, and a pair of output strings.

FST as translator: a machine that reads a string and outputs another string

FST as set relater: a machine that computes relations between sets.
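A minimal Python sketch of an FST used as a translator and as a recognizer. The machine is a hypothetical, deterministic example for regular nouns: it copies stem letters, deletes the +N tag, and rewrites +PL as ^s# and +SG as #, where ^ marks a morpheme boundary and # a word boundary. This yields an intermediate form such as fox^s#; a separate spelling-rule transducer, not shown, would be needed to produce foxes. Real FSTs are usually nondeterministic and would need one of the search strategies for non-determinism.

```python
# Hypothetical FST for regular English nouns. Arcs: (state, input) -> (output, next state).
# Stem letters are copied by a letter:letter loop on state 1 (handled in step()).
ARCS = {
    (1, "+N"): ("", 2),        # +N:ε    delete the category tag
    (2, "+SG"): ("#", 3),      # +SG:#   singular: just end the word
    (2, "+PL"): ("^s#", 3),    # +PL:^s# plural suffix s after a morpheme boundary
}
FINAL = {3}

def step(state, sym):
    """Return (output, next_state) for one input symbol, or None if there is no arc."""
    if state == 1 and len(sym) == 1 and sym.isalpha():
        return sym, 1
    return ARCS.get((state, sym))

def translate(symbols, start=1):
    """FST as translator: read the lexical tape, write the output tape (None = reject)."""
    state, out = start, []
    for sym in symbols:
        arc = step(state, sym)
        if arc is None:
            return None
        output, state = arc
        out.append(output)
    return "".join(out) if state in FINAL else None

def recognize(symbols, surface):
    """FST as recognizer: accept the string pair iff the machine maps one onto the other."""
    return translate(symbols) == surface

print(translate(list("fox") + ["+N", "+PL"]))           # fox^s#
print(recognize(list("cat") + ["+N", "+SG"], "cat#"))   # True
print(recognize(list("cat") + ["+N", "+SG"], "cats#"))  # False
print(translate(list("cat") + ["+PL"]))                 # None (the +N tag is missing)
```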

Question 45

Q

What is stemming?

Answer

A

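Stemming is a simpler alternative to full morphological parsing: it strips affixes to reduce a word to its stem (the Porter stemmer is a well-known example). It is used to improve the performance of information retrieval, especially with smaller documents (the larger the document, the higher the chance the keyword will occur in the exact form used in the query).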
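A deliberately crude Python sketch of stemming for retrieval: strip the longest matching suffix from a hypothetical suffix list, then compare query and document by stems instead of exact word forms. This is not the Porter stemmer or any standard algorithm; real stemmers use ordered rule sets with conditions on the remaining stem.

```python
# Hypothetical suffix list, tried longest first (NOT the Porter stemmer's rules).
SUFFIXES = ["ation", "ness", "ing", "ies", "ed", "es", "s"]

def crude_stem(word, min_stem=3):
    """Strip the first (longest) listed suffix, keeping at least min_stem letters."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            return word[: -len(suf)]
    return word

def matches(query, document):
    """IR-style match: every query stem must occur among the document stems."""
    doc_stems = {crude_stem(w) for w in document.lower().split()}
    return all(crude_stem(w) in doc_stems for w in query.lower().split())

doc = "The dog walked home"
print(crude_stem("walking"), crude_stem("walked"), crude_stem("walks"))  # walk walk walk
print("walking" in doc.lower().split())  # False: the exact form does not occur
print(matches("walking dogs", doc))      # True: the stems match
```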

Question 46

Q

What is tokenization?

Answer

A

Tokenization is the process of segmenting running text into words and sentences.
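A naive Python sketch of both kinds of segmentation with the standard `re` module. The patterns are assumptions chosen for illustration: sentences are split after ., ! or ? followed by whitespace (so an abbreviation like Mr. would be split wrongly), and word tokens are decimal numbers, runs of letters/digits that may contain internal hyphens or apostrophes, or single punctuation marks. Keeping I've as one token is a design choice; many tokenizers split off the clitic 've.

```python
import re

def sentences(text):
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def words(sentence):
    """Naive word tokenization: numbers, words with internal - or ', and punctuation."""
    return re.findall(r"\d+(?:\.\d+)?|\w+(?:[-']\w+)*|[^\w\s]", sentence)

text = "I've paid $3.50 for a state-of-the-art pen. Is that a lot? Probably."
for s in sentences(text):
    print(words(s))
# ["I've", 'paid', '$', '3.50', 'for', 'a', 'state-of-the-art', 'pen', '.']
# ['Is', 'that', 'a', 'lot', '?']
# ['Probably', '.']
```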

Question 47

Q

What is a language model?

Answer

A

A language model is a statistical model of word sequences.

Question 48

Q

What is an N-gram?

Answer

A

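An N-gram is a sequence of N words (a bigram is two words, a trigram three). N-gram models use them for word prediction with probabilistic models: the next word is predicted from the previous N-1 words.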
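A minimal Python sketch of a bigram (N = 2) model: the probability of the next word is estimated from the previous word only, by maximum likelihood, P(w | prev) = C(prev w) / C(prev). The three-sentence training corpus is a toy example, and <s> and </s> are assumed sentence-boundary markers.

```python
from collections import Counter

# Toy training corpus.
corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]

def bigram_model(sentences):
    """Maximum-likelihood bigram estimates: P(w | prev) = C(prev w) / C(prev)."""
    history, pairs = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        history.update(tokens[:-1])            # C(prev): every token that has a successor
        pairs.update(zip(tokens, tokens[1:]))  # C(prev w)

    def prob(prev, w):
        return pairs[(prev, w)] / history[prev] if history[prev] else 0.0

    return prob

P = bigram_model(corpus)
print(round(P("<s>", "I"), 3))     # 0.667
print(round(P("I", "am"), 3))      # 0.667
print(round(P("am", "Sam"), 3))    # 0.5
print(round(P("Sam", "</s>"), 3))  # 0.5
```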

Question 49

Q

What is a corpus (plural corpora)?

Answer

A

A corpus is a computer-readable (online) collection of text or speech.

Question 50

Q

What is an utterance?

Answer

A

An utterance is the spoken correlate of a sentence.

Question 51

Q

What are fillers or filled pauses?

Answer

A

words like uh and um

Question 52

Q

What is a wordform?

Answer

A

A wordform is the full inflected or derived form of a word.

Question 53

Q

What is a closed vocabulary?

Answer

A

The closed vocabulary assumption is the assumption that we have a fixed lexicon (vocabulary) in advance, and that the test set can only contain words from this lexicon.

Question 54

Q

What is an open vocabulary?

Answer

A

An open vocabulary system is one where we model potential unknown words in the test set by adding a pseudo-word called <UNK>.
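A minimal Python sketch of turning a closed vocabulary into an open one. Assumptions: the vocabulary is every training word seen at least twice (a hypothetical threshold), and any word outside it is mapped to the pseudo-word <UNK>. Applying the same mapping to the training data as well lets the model learn counts for <UNK> like any other word.

```python
from collections import Counter

def build_vocab(train_tokens, min_count=2):
    """Fixed vocabulary: the words seen at least min_count times in training."""
    counts = Counter(train_tokens)
    return {w for w, c in counts.items() if c >= min_count}

def replace_oov(tokens, vocab, unk="<UNK>"):
    """Map every out-of-vocabulary word to the pseudo-word <UNK>."""
    return [t if t in vocab else unk for t in tokens]

train = "the cat sat on the mat the cat ran".split()
vocab = build_vocab(train)
print(sorted(vocab))                                        # ['cat', 'the']
print(replace_oov("the dog sat on the cat".split(), vocab))
# ['the', '<UNK>', '<UNK>', '<UNK>', 'the', 'cat']
```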

Question 55

Q

What is perplexity?

Answer

A

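Perplexity is the most common intrinsic evaluation metric for N-gram language models. For a test set W = w1 w2 ... wN, PP(W) = P(w1 w2 ... wN)^(-1/N), so a lower perplexity means the model assigns the test set a higher probability.
Another way to think about perplexity is as the weighted average branching factor of a language.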
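A minimal Python sketch of perplexity for an unsmoothed bigram model, using PP(W) = P(w1 ... wN)^(-1/N) computed in log space. The training corpus is a toy example; N counts every predicted token, including the end marker </s>. For intuition about the branching-factor view: a model that assigns probability 1/10 to each of 10 possible next words at every step has perplexity exactly 10.

```python
import math
from collections import Counter

# Toy training corpus; <s> and </s> are sentence-boundary markers.
train = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]

history, pairs = Counter(), Counter()
for s in train:
    toks = ["<s>"] + s.split() + ["</s>"]
    history.update(toks[:-1])
    pairs.update(zip(toks, toks[1:]))

def p(prev, w):
    """Unsmoothed maximum-likelihood bigram probability P(w | prev)."""
    return pairs[(prev, w)] / history[prev] if history[prev] else 0.0

def perplexity(test_sentences):
    """PP(W) = P(w1 ... wN) ** (-1/N), computed with log probabilities."""
    log_prob, n = 0.0, 0
    for s in test_sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        for prev, w in zip(toks, toks[1:]):
            prob = p(prev, w)
            if prob == 0.0:
                return math.inf  # one unseen bigram gives probability 0, so PP is infinite
            log_prob += math.log(prob)
            n += 1
    return math.exp(-log_prob / n)

print(round(perplexity(["I am Sam"]), 4))  # 1.7321 (P = 1/9 over N = 4 tokens, so 9 ** 0.25)
print(perplexity(["Sam likes eggs"]))      # inf
```

The infinite result for an unseen bigram is the problem that smoothing addresses.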

Question 56

Q

What is an intrinsic evaluation?

Answer

A

An intrinsic evaluation metric is one which measures the quality of a model independent of any application.

Question 57

Q

What is smoothing?

Answer

A

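Smoothing is the term for modifications to the maximum-likelihood estimates that address the poor estimates due to variability in small data sets, for example by shaving some probability mass off more frequent events and giving it to N-grams never seen in training (which would otherwise get probability zero).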

Question 58

Q

2 kinds of Smoothing?

Answer

A

1. Laplace smoothing
2. Good-Turing discounting
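A minimal Python sketch of Laplace (add-one) smoothing for bigrams: add 1 to every bigram count and add the vocabulary size V to each denominator, P(w | prev) = (C(prev w) + 1) / (C(prev) + V). The training corpus is a toy example; V here counts every word type that can be predicted, including </s>.

```python
from collections import Counter

# Toy training corpus; <s> and </s> are sentence-boundary markers.
train = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]

history, pairs, vocab = Counter(), Counter(), set()
for s in train:
    toks = ["<s>"] + s.split() + ["</s>"]
    history.update(toks[:-1])
    pairs.update(zip(toks, toks[1:]))
    vocab.update(toks[1:])   # every word type that can be predicted, including </s>
V = len(vocab)               # 11 for this corpus

def p_mle(prev, w):
    """Unsmoothed estimate: zero for any bigram not seen in training."""
    return pairs[(prev, w)] / history[prev] if history[prev] else 0.0

def p_laplace(prev, w):
    """Add-one estimate: (C(prev w) + 1) / (C(prev) + V)."""
    return (pairs[(prev, w)] + 1) / (history[prev] + V)

print(p_mle("I", "like"))                               # 0.0
print(round(p_laplace("I", "like"), 4))                 # 0.0714 (= 1/14)
print(round(sum(p_laplace("I", w) for w in vocab), 6))  # 1.0
```

The last line checks that the smoothed probabilities after a given word still sum to 1 over the vocabulary.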

