MID Flashcards

1
Q

What is Natural Language Processing?

A

The ability of a computer program to understand human language as it is spoken and written.

2
Q

Various kinds of knowledge of language?

A
  1. Phonetics and Phonology
    knowledge about linguistic sounds
  2. Morphology
    knowledge of the meaningful components of words
  3. Syntax
    knowledge of the structural relationships between words
  4. Semantics
    knowledge of meaning
  5. Pragmatics
    knowledge of the relationship of meaning to the goals and intentions of the speaker
  6. Discourse
    knowledge about linguistic units larger than a single utterance
3
Q

What is Ambiguity?

A

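An input is ambiguous if multiple alternative linguistic structures can be built for it; for example, one phrase often has multiple meanings.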

4
Q

Most important models?

A
  1. state machines
  2. rule systems
  3. logic
  4. probabilistic models (crucial one)
  5. vector-space models
5
Q

machine learning tools for language tasks?

A
  1. classifiers
  2. sequence models

6
Q

What is a regular expression (RE)?

A
  1. standard notation or language for specifying text sequences
  2. a formula in a special language that is used for specifying simple classes of strings
  3. an algebraic notation for characterizing a set of strings
7
Q

Regex search requires?

A

a pattern to search for and a corpus of texts to search through

8
Q

Simplest kind of regex?

A

a sequence of simple characters

9
Q

most common anchors in regex?

A
  1. caret symbol (^), matches the start of a line
  2. dollar sign ($), matches the end of a line
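
A minimal Python sketch of the two anchors, using the standard `re` module. The sample text is made up for illustration. In Python, `^` and `$` match at every line boundary only when `re.MULTILINE` is set; other regex tools may behave differently.

```python
import re

text = "the cat\nthe dog\nsaw the bird"  # hypothetical sample text

# ^ anchors the match to the start of a line (re.MULTILINE applies it per line)
starts = re.findall(r"^the", text, flags=re.MULTILINE)

# $ anchors the match to the end of a line
ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(starts)
print(ends)
```

Only the first two lines begin with "the", so the third line contributes nothing to `starts`.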

10
Q

RE: \d

A

any digit

ex: [0-9]

11
Q

RE: \D

A

any non-digit

ex: [^0-9]

12
Q

RE: \w

A

any alphanumeric or underscore

ex: [a-zA-Z0-9_]

13
Q

RE: \W

A

a non-alphanumeric

ex: [^\w]

14
Q

RE: \s

A

whitespace (space, tab, newline, carriage return, form feed)

ex: [ \r\t\n\f]

15
Q

RE: \S

A

non-whitespace

ex: [^\s]
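
The shorthand classes from the last six cards can be tried out with a short Python sketch. The sample string is hypothetical. Note that in Python 3 these classes are Unicode-aware by default, so they match somewhat more than the ASCII bracket equivalents shown on the cards, and `\w` includes the underscore.

```python
import re

sample = "Room 42, floor_3!"  # hypothetical sample string

digits = re.findall(r"\d", sample)          # every single digit
word_runs = re.findall(r"\w+", sample)      # runs of alphanumerics/underscore
non_word = re.findall(r"\W", sample)        # everything \w does not match
spaces = re.findall(r"\s", sample)          # whitespace characters
non_space_runs = re.findall(r"\S+", sample) # runs of non-whitespace

print(digits, word_runs, non_word, spaces, non_space_runs)
```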

16
Q

Special characters that need a backslash?

A
\* (asterisk)
\. (period)
\? (question mark)
\n (newline)
\t (tab)
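
A short Python sketch of why the backslash matters: unescaped, `.`, `*`, and `?` are operators, while the escaped forms match the literal characters. The sample text is made up; `re.escape` is the standard-library helper that adds the backslashes for you.

```python
import re

text = "Is 3.5 * 2 = 7? Yes."  # hypothetical sample text

literal_dots = re.findall(r"\.", text)   # literal periods only
literal_stars = re.findall(r"\*", text)  # literal asterisks only
literal_qmarks = re.findall(r"\?", text) # literal question marks only

# re.escape backslash-escapes regex metacharacters in a plain string
escaped = re.escape("7?")

print(literal_dots, literal_stars, literal_qmarks, escaped)
```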
17
Q

A regular language can be described by?

A

regular expressions and finite-state automata

18
Q

3 standard solutions to the problem of non-determinism in finite-state automata?

A
  1. backup: whenever we come to a choice point, we put a marker to record where we were in the input and what state the automaton was in. If it turns out that we took the wrong choice, we back up and try another path.
  2. look-ahead: we look ahead in the input to help us decide which path to take.
  3. parallelism: whenever we come to a choice point, we look at every alternative path in parallel.
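
The parallelism strategy can be sketched in Python by tracking the set of all states the automaton could be in after each input symbol. This is an illustrative sketch, not a full NFA library: the function name, the transition-table layout, and the example automaton (a non-deterministic recognizer for strings like "baa!", "baaa!") are assumptions, and epsilon transitions are not handled.

```python
def nfa_accepts(transitions, start, finals, tape):
    """Recognize `tape` by following every possible path in parallel."""
    current = {start}
    for symbol in tape:
        following = set()
        for state in current:
            # a missing entry means there is no arc for this symbol
            following |= transitions.get((state, symbol), set())
        current = following
        if not current:
            return False  # every path has died
    return bool(current & finals)


# In state 2, reading "a" can either loop or advance: a choice point.
SHEEP = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}

print(nfa_accepts(SHEEP, 0, {4}, "baaa!"))
```

Because all alternatives are carried forward together, no marker or backup is needed; the cost is that the set of active states must be maintained at each step.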
19
Q

primitive operations of a regular expression?

A
  1. concatenation: the final state of FSA1 is connected to the start state of FSA2
  2. closure: the start state is connected to the end state, and the end state is connected back to the start state
  3. union: a new start state is connected to the start state of FSA1 and the start state of FSA2
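
In regular expression notation the same three operations look like this. A minimal sketch using Python's `re` module; the patterns are arbitrary examples, and `fullmatch` is used so that the whole string must belong to the language.

```python
import re

concatenation = re.compile(r"ab")    # a followed by b
union = re.compile(r"a|b")           # a or b
closure = re.compile(r"(?:ab)*")     # zero or more repetitions of ab

print(bool(concatenation.fullmatch("ab")))
print(bool(union.fullmatch("b")))
print(bool(closure.fullmatch("ababab")))
```

Closure also accepts the empty string, since zero repetitions are allowed.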
20
Q

Operations under which regular languages are closed?

A
  1. intersection
  2. difference
  3. complementation
  4. reversal
21
Q

closure known as?

A

Kleene star

22
Q

process steps in NLP?

A
  1. input
  2. tokenization
  3. syntactic analysis
  4. semantic analysis
  5. pragmatics
  6. output
23
Q

What are orthographic rules?

A

Orthographic rules tell us that English words ending in -y are pluralized by changing the -y to -i- and adding an -es.

24
Q

What are morphological rules?

A

Morphological rules tell us that fish has a null plural, and that the plural of goose is formed by changing the vowel.

25
Q

What is Morphological Parsing?

A

The problem of recognizing that a word (like foxes) breaks down into component morphemes (fox and -es) and building a structured representation of this fact.

26
Q

What is Parsing?

A

Parsing means taking an input and producing some sort of linguistic structure for it.

27
Q

What is a finite-state transducer?

A

The key algorithm for morphological parsing

28
Q

What is a morpheme?

A

A morpheme is often defined as the minimal meaning-bearing unit in a language.

29
Q

Affixes divided into 4 types?

A

Prefixes precede the stem, suffixes follow the stem, circumfixes do both, and infixes are inserted inside the stem.

30
Q

ways to combine morphemes to create words?

A
  1. Inflection
  2. Derivation
  3. Compounding
  4. Cliticization
31
Q

What is Inflection?

A

Inflection is the combination of a word stem with a grammatical morpheme, usually resulting in a word of the same class as the original stem, and usually filling some syntactic function like agreement.

For example, English has the inflectional morpheme -s for marking the plural on nouns, and the inflectional morpheme -ed for marking the past tense on verbs.

32
Q

What is Derivation?

A

Derivation is the combination of a word stem with a grammatical morpheme, usually resulting in a word of a different class, often with a meaning hard to predict exactly.

For example the verb computerize can take the derivational suffix -ation to produce the noun computerization.

33
Q

What is Compounding?

A

Compounding is the combination of multiple word stems together.
For example the noun doghouse is the concatenation of the morpheme dog with the morpheme house.

34
Q

What is Cliticization?

A

Cliticization is the combination of a word stem with a clitic. A clitic is a morpheme that acts syntactically like a word, but is reduced in form and attached to another word.
For example the English morpheme ’ve in the word I’ve is a clitic, as is the French definite article l’ in the word l’opera.

35
Q

2 kinds of inflection? an affix that marks … and …

A
  1. plural
    While the regular plural is spelled -s after most nouns, it is spelled -es after words ending in -s (ibis/ibises), -z (waltz/waltzes), -sh (thrush/thrushes), -ch (finch/finches), and sometimes -x (box/boxes). Nouns ending in -y preceded by a consonant change the -y to -i (butterfly/butterflies).
  2. possessive
    The possessive suffix is realized by apostrophe + -s for regular singular nouns (llama’s) and plural nouns not ending in -s (children’s) and often by a lone apostrophe after regular plural nouns (llamas’) and some names ending in -s or -z (Euripides’ comedies).
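
The plural spelling rules in item 1 can be sketched as a small Python function. This is a simplification under stated assumptions: the function name is hypothetical, every noun ending in -x is given -es even though the card says "sometimes", and irregular plurals (goose, fish) are not handled.

```python
import re


def pluralize(noun):
    """Spell the regular plural of an English noun (irregulars not handled)."""
    # -es after -s, -z, -sh, -ch, and (assumed here: always) -x
    if re.search(r"(s|z|sh|ch|x)$", noun):
        return noun + "es"
    # consonant + y: change -y to -i and add -es
    if re.search(r"[^aeiou]y$", noun):
        return noun[:-1] + "ies"
    return noun + "s"


for noun in ["ibis", "waltz", "thrush", "finch", "box", "butterfly", "llama"]:
    print(noun, pluralize(noun))
```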
36
Q

what is nominalization?

A

A very common kind of derivation in English is the formation of new nouns, often from verbs or adjectives.

37
Q

What is a clitic?

A

A clitic is a unit whose status lies in between that of an affix and a word.

Clitics preceding a word are called proclitics, while those following are enclitics.

38
Q

The kinds of morphology?

A
  1. concatenative morphology
  2. non-concatenative morphology
  3. templatic morphology, a kind of non-concatenative morphology
  4. root-and-pattern morphology, another name for templatic morphology

Templatic morphology is very common in Arabic, Hebrew, and other Semitic languages.

Arabic and Hebrew combine this templatic morphology with concatenative morphology.

39
Q

In order to build a morphological parser, we’ll need?

A
  1. Lexicon: the list of stems and affixes, together with basic information about them (whether a stem is a Noun stem or a Verb stem, etc.).
  2. Morphotactics: the model of morpheme ordering that explains which classes of morphemes can follow other classes of morphemes inside a word. For example, the fact that the English plural morpheme follows the noun rather than preceding it is a morphotactic fact.
  3. Orthographic rules: these spelling rules are used to model the changes that occur in a word, usually when two morphemes combine (e.g., the y → ie spelling rule discussed above that changes city + -s to cities rather than citys).
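
A toy Python sketch of how the three components fit together for English noun plurals. Everything here is an illustrative assumption: the lexicon entries, the `+N +SG` / `+N +PL` output format, and the function name. A real parser would use finite-state transducers rather than string slicing, and this sketch over-accepts (for example, it does not check which spelling rule a given stem actually requires).

```python
# Lexicon: stems with their word class (hypothetical toy entries)
LEXICON = {"fox": "N", "cat": "N", "city": "N"}


def parse_noun(word):
    """Return an analysis such as 'fox +N +PL', or None if none is found."""
    if LEXICON.get(word) == "N":
        return f"{word} +N +SG"
    # Morphotactics: the plural suffix follows the noun stem.
    # Orthographic rules: undo spelling changes at the morpheme boundary.
    candidates = []
    if word.endswith("ies"):
        candidates.append(word[:-3] + "y")  # cities -> city
    if word.endswith("es"):
        candidates.append(word[:-2])        # foxes -> fox
    if word.endswith("s"):
        candidates.append(word[:-1])        # cats -> cat
    for stem in candidates:
        if LEXICON.get(stem) == "N":
            return f"{stem} +N +PL"
    return None


print(parse_noun("foxes"))
print(parse_noun("cities"))
```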
40
Q

What is a lexicon?

A

A lexicon is a repository for words.

41
Q

way to model morphotactics?

A

Finite-state automaton

42
Q

morphological recognition can be solved by?

A

Finite-state automaton

43
Q

What is a transducer?

A

A transducer maps between one representation and another

44
Q

What is a finite-state transducer (FST)?

A

A finite-state transducer or FST is a type of finite automaton which maps between two sets of symbols.

FST as recognizer: a transducer that takes a pair of strings as input and outputs accept if the string-pair is in the string-pair language, and reject if it is not.

FST as generator: a machine that outputs pairs of strings of the language. Thus the output is a yes or no, and a pair of output strings.

FST as translator: a machine that reads a string and outputs another string

FST as set relater: a machine that computes relations between sets.
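
The "FST as translator" view can be sketched in Python with a transition table whose arcs carry both an input symbol and an output string. The table, the symbols `+N`, `+SG`, `+PL`, and the function name are hypothetical; the sketch is deterministic and covers a single word, which is far simpler than a real lexical transducer.

```python
# (state, input symbol) -> (next state, output string); toy machine for "cat"
ARCS = {
    (0, "c"): (1, "c"),
    (1, "a"): (2, "a"),
    (2, "t"): (3, "t"),
    (3, "+N"): (4, ""),    # the noun tag produces no surface output
    (4, "+SG"): (5, ""),   # singular: nothing is added
    (4, "+PL"): (5, "s"),  # plural: output the suffix -s
}
FINAL_STATES = {5}


def transduce(symbols):
    """Read input symbols and return the output string, or None on reject."""
    state, output = 0, []
    for symbol in symbols:
        if (state, symbol) not in ARCS:
            return None
        state, piece = ARCS[(state, symbol)]
        output.append(piece)
    return "".join(output) if state in FINAL_STATES else None


print(transduce(["c", "a", "t", "+N", "+PL"]))
```

Running the same arcs with input and output swapped would give the parsing direction (surface form to lexical form), which is the usual use in morphological parsing.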

45
Q

What is stemming?

A

Stemming strips affixes from words to reduce them to their stems. It is used to improve the performance of information retrieval, especially with smaller documents (the larger the document, the higher the chance the keyword will occur in the exact form used in the query).

46
Q

What is tokenization?

A

Tokenization is the process of segmenting running text into words and sentences.
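
A very small regex-based word tokenizer in Python, as a sketch of the word-segmentation half of this definition. The function name and pattern are assumptions for illustration; sentence segmentation, clitics (I've), abbreviations, and numbers with internal punctuation all need extra rules that are not shown here.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ grabs runs of word characters; [^\w\s] grabs one punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)


print(tokenize("Hello, world!"))
```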

47
Q

What is a language model?

A

A language model is a statistical model of word sequences.

48
Q

What is an N-gram?

A

An N-gram is a sequence of N words. An N-gram model is a probabilistic model that predicts the next word from the previous N-1 words.
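
For N = 2 (bigrams), the usual maximum likelihood estimate is P(word | prev) = count(prev word) / count(prev). A minimal Python sketch under stated assumptions: the function name and the tiny corpus are made up, and the text is treated as one flat token list without sentence boundary markers.

```python
from collections import Counter


def bigram_probability(tokens, prev, word):
    """MLE estimate of P(word | prev) from a list of tokens."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens)
    if unigrams[prev] == 0:
        return 0.0  # prev was never seen, so there is nothing to estimate from
    return bigrams[(prev, word)] / unigrams[prev]


tokens = "the cat sat on the mat the cat ran".split()  # hypothetical corpus
print(bigram_probability(tokens, "the", "cat"))
```

In this corpus "the" occurs three times and is followed by "cat" twice, so the estimate is 2/3.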

49
Q

What is a corpus (plural corpora)?

A

A corpus is an online collection of text or speech.

50
Q

What is an utterance?

A

An utterance is the spoken correlate of a sentence.

51
Q

What are fillers or filled pauses?

A

words like uh and um

52
Q

What is a wordform?

A

A wordform is the full inflected or derived form of the word.

53
Q

What is closed vocabulary?

A

The closed vocabulary assumption is the assumption that we know all the words that can occur (a fixed lexicon), and that the test set can only contain words from this lexicon.

54
Q

What is open vocabulary?

A

An open vocabulary system is one where we model potential unknown words in the test set by adding a pseudo-word called <UNK>.

55
Q

What is perplexity?

A

Perplexity is the most common intrinsic evaluation metric for N-gram language models.
Another way to think about perplexity: as the weighted average branching factor of a language.
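
Perplexity of a test sequence of N words is the inverse probability of the sequence, normalized by the number of words: P(w1 ... wN) to the power -1/N. A minimal Python sketch, assuming the model's per-word probabilities are already available as a list; the function name is hypothetical.

```python
import math


def perplexity(word_probs):
    """Perplexity from the model's probability for each word in the test set."""
    n = len(word_probs)
    # sum logs instead of multiplying probabilities, to avoid underflow
    log_prob = sum(math.log(p) for p in word_probs)
    return math.exp(-log_prob / n)


# Ten equally likely choices at every step: the branching factor is 10
print(perplexity([0.1] * 10))
```

With a uniform choice among ten words at every position, the perplexity comes out at 10 (up to floating-point rounding), which matches the branching-factor reading.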

56
Q

What is an intrinsic evaluation?

A

An intrinsic evaluation metric is one which measures the quality of a model independent of any application.

57
Q

What is smoothing?

A

Smoothing refers to modifications of the probability estimates that address the poor estimates that are due to variability in small data sets.

58
Q

2 kinds of Smoothing?

A
  1. Laplace smoothing
  2. Good-Turing Discounting
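
Laplace (add-one) smoothing for bigrams adds 1 to every bigram count and adds the vocabulary size V to the denominator: (count(prev word) + 1) / (count(prev) + V). A minimal Python sketch; the function name and toy corpus are assumptions, and V is taken to be the number of distinct tokens in the corpus. Good-Turing discounting is not shown.

```python
from collections import Counter


def laplace_bigram_probability(tokens, prev, word):
    """Add-one smoothed estimate of P(word | prev)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens)
    vocab_size = len(unigrams)  # V: number of distinct word types
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


tokens = "the cat sat on the mat the cat ran".split()  # hypothetical corpus
print(laplace_bigram_probability(tokens, "the", "cat"))
print(laplace_bigram_probability(tokens, "the", "sat"))
```

The unseen bigram "the sat" now gets a small non-zero probability (1/9 here), at the cost of lowering the estimate for the seen bigram "the cat" from 2/3 to 3/9.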