Class 10 Flashcards

1
Q

grammar

A

defines the syntax of legal sentences

2
Q

language model

A

probability distribution describing the likelihood of any string; no two people have exactly the same language model

3
Q

tokenization

A

process of dividing a text into a sequence of words
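
For illustration, a minimal Python sketch assuming the simplistic rule that a token is any run of letters, digits, or apostrophes (real tokenizers treat punctuation, hyphens, and multi-word expressions more carefully):

```python
import re

def tokenize(text):
    # lowercase, then pull out runs of letters, digits, or apostrophes
    return re.findall(r"[A-Za-z0-9']+", text.lower())

print(tokenize("The dog didn't bark."))   # -> ['the', 'dog', "didn't", 'bark']
```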

4
Q

n-gram model

A

Markov chain model that considers only the dependence between n adjacent words; works well for spam detection, sentiment analysis, etc.
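
A minimal bigram (n = 2) sketch, assuming a toy six-word corpus and plain maximum-likelihood estimates from counts (no smoothing yet):

```python
from collections import Counter

corpus = "the dog barks the dog sleeps".split()
unigrams = Counter(corpus)                     # single-word counts
bigrams  = Counter(zip(corpus, corpus[1:]))    # adjacent word-pair counts

def p_bigram(prev, word):
    # P(word | prev) estimated as count(prev, word) / count(prev)
    return bigrams[(prev, word)] / unigrams[prev]

print(p_bigram("the", "dog"))    # 2/2 = 1.0
print(p_bigram("dog", "barks"))  # 1/2 = 0.5
```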

5
Q

character-level model

A

alternative to the word n-gram model in which the probability of each character is determined by the n-1 previous characters

6
Q

skip-gram model

A

alternative to the n-gram model that counts words that are near each other but skips a word (or more) between them
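
A minimal counting sketch, assuming a 1-skip bigram means a word pair with at most one word between the two words:

```python
from collections import Counter

def skip_bigrams(words, max_skip=1):
    # count ordered pairs (w_i, w_j) with at most max_skip words between them
    pairs = Counter()
    for i, w in enumerate(words):
        for gap in range(1, max_skip + 2):   # gap 1 = ordinary bigram
            if i + gap < len(words):
                pairs[(w, words[i + gap])] += 1
    return pairs

print(skip_bigrams("the quick brown fox".split()))
# ordinary bigrams plus skip pairs such as ('the', 'brown') and ('quick', 'fox')
```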

7
Q

smoothing

A

process of reserving some probability mass for never-before-seen n-grams
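
A minimal add-one (Laplace) smoothing sketch, one of the simplest schemes, assuming the same style of toy bigram counts; every possible bigram gets a small nonzero probability:

```python
from collections import Counter

corpus = "the dog barks the dog sleeps".split()
vocab    = set(corpus)
unigrams = Counter(corpus)
bigrams  = Counter(zip(corpus, corpus[1:]))

def p_laplace(prev, word):
    # add 1 to every bigram count; add |V| to the denominator so each row still sums to 1
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

print(p_laplace("dog", "barks"))   # seen pair:   (1 + 1) / (2 + 4) = 0.33...
print(p_laplace("dog", "the"))     # unseen pair: (0 + 1) / (2 + 4) = 0.16...
```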

8
Q

backoff model

A

estimates n-gram counts, but for low (or zero) counts backs off to (n-1)-grams
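
A minimal sketch in the spirit of "stupid backoff", assuming a fixed weight of 0.4 for the backed-off estimate (proper Katz backoff redistributes the reserved probability mass more carefully):

```python
from collections import Counter

corpus = "the dog barks the dog sleeps".split()
unigrams = Counter(corpus)
bigrams  = Counter(zip(corpus, corpus[1:]))
total    = len(corpus)

def p_backoff(prev, word, alpha=0.4):
    if bigrams[(prev, word)] > 0:
        return bigrams[(prev, word)] / unigrams[prev]   # bigram was seen
    return alpha * unigrams[word] / total               # back off to the unigram estimate

print(p_backoff("dog", "barks"))     # seen bigram:   0.5
print(p_backoff("barks", "sleeps"))  # unseen bigram: 0.4 * 1/6
```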

9
Q

linear interpolation smoothing

A

backoff model that combines trigram, bigram, and unigram models by linear interpolation
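
A minimal sketch with hand-set weights that sum to 1 (in practice the lambdas are tuned on held-out data):

```python
from collections import Counter

corpus = "the dog barks the dog sleeps".split()
uni = Counter(corpus)
bi  = Counter(zip(corpus, corpus[1:]))
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))
total = len(corpus)

def p_interp(w1, w2, w3, l3=0.6, l2=0.3, l1=0.1):
    # mix trigram, bigram, and unigram estimates of P(w3 | w1, w2)
    p_uni = uni[w3] / total
    p_bi  = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
    p_tri = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    return l3 * p_tri + l2 * p_bi + l1 * p_uni

print(p_interp("the", "dog", "barks"))   # 0.6*0.5 + 0.3*0.5 + 0.1*(1/6) = 0.47 (approx.)
```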

10
Q

WordNet

A

open-source, hand-curated dictionary in machine-readable format that has proven useful for many natural language applications

11
Q

Penn Treebank

A

corpus of over 3M words of text annotated with part of speech (POS) tags

12
Q

beam search

A

compromise between a fast greedy search and the slower but more accurate Viterbi algorithm
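
A minimal beam-search tagger sketch over an assumed hand-set toy scoring table; it keeps only the beam_width best partial tag sequences at each step instead of tracking every sequence:

```python
STATES = ["N", "V"]
START  = {"N": 0.7, "V": 0.3}                                    # P(first tag)
TRANS  = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}  # P(tag | previous tag)
EMIT   = {"N": {"dogs": 0.6, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.7}}  # P(word | tag)

def beam_tag(words, beam_width=2):
    # each hypothesis is (tag_sequence, probability)
    beam = [([s], START[s] * EMIT[s].get(words[0], 1e-6)) for s in STATES]
    beam = sorted(beam, key=lambda h: h[1], reverse=True)[:beam_width]
    for w in words[1:]:
        candidates = [(tags + [s], p * TRANS[tags[-1]][s] * EMIT[s].get(w, 1e-6))
                      for tags, p in beam for s in STATES]
        beam = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam_width]
    return beam[0][0]   # best surviving hypothesis

print(beam_tag(["dogs", "bark"]))   # -> ['N', 'V']
```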

13
Q

hidden Markov model

A

common model for part-of-speech (POS) tagging; combined with the Viterbi algorithm it can achieve accuracy of around 97%
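
A minimal Viterbi sketch over an assumed hand-set two-tag HMM with a tiny vocabulary (a real tagger would estimate these tables from a labeled corpus such as the Penn Treebank):

```python
STATES = ["N", "V"]
START  = {"N": 0.7, "V": 0.3}                                    # P(first tag)
TRANS  = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}  # P(tag | previous tag)
EMIT   = {"N": {"dogs": 0.6, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.7}}  # P(word | tag)

def viterbi(words):
    # best[t][s]: probability of the best tag sequence ending in tag s at position t
    best = [{s: START[s] * EMIT[s].get(words[0], 1e-6) for s in STATES}]
    back = [{}]
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in STATES:
            prev, p = max(((r, best[t - 1][r] * TRANS[r][s]) for r in STATES),
                          key=lambda x: x[1])
            best[t][s] = p * EMIT[s].get(words[t], 1e-6)
            back[t][s] = prev
    # follow back-pointers from the most probable final tag
    tag = max(best[-1], key=best[-1].get)
    tags = [tag]
    for t in range(len(words) - 1, 0, -1):
        tag = back[t][tag]
        tags.append(tag)
    return list(reversed(tags))

print(viterbi(["dogs", "bark"]))   # -> ['N', 'V']
```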

14
Q

discriminative model

A

learns a conditional probability distribution P(C|W), meaning it can assign categories given a sequence of words but can’t generate random sentences – ex: logistic regression
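
A minimal sketch using scikit-learn (assumed to be installed) with a toy labeled set; the pipeline models P(C|W) directly from bag-of-words features:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts  = ["great movie", "loved it", "terrible film", "hated it"]
labels = ["pos", "pos", "neg", "neg"]

# bag-of-words features feeding a logistic regression classifier
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["loved the movie"]))   # assigns a category to new text
```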

15
Q

language

A

set of sentences that follow the rules laid out by a grammar

16
Q

syntactic categories

A

help to constrain the probable words at each point within a sentence – ex: noun phrase or verb phrase

17
Q

phrase structure

A

provides framework for meaning or semantics of the sentence

18
Q

overgenerate

A

when a grammar produces sentences that are not grammatical

19
Q

undergenerate

A

when a grammar rejects valid sentences

20
Q

lexicon

A

list of allowable words

21
Q

parsing

A

process of analyzing a string of words to uncover its phrase structure according to the rules of grammar

22
Q

CYK algorithm

A

chart parser that uses a grammar in Chomsky Normal Form
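
A minimal CYK recognizer sketch for an assumed toy grammar already in Chomsky Normal Form (unary rules map words to categories, binary rules combine two adjacent spans):

```python
from itertools import product

UNARY  = {"the": {"Det"}, "dog": {"N"}, "barks": {"VP"}}
BINARY = {("Det", "N"): {"NP"}, ("NP", "VP"): {"S"}}

def cyk(words, start="S"):
    n = len(words)
    # table[i][j] holds every category that can span words[i..j]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(UNARY.get(w, set()))
    for length in range(2, n + 1):            # increasing span length
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):             # split point between i..k and k+1..j
                for b, c in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n - 1]

print(cyk("the dog barks".split()))   # -> True
```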

23
Q

shift-reduce parsing

A

popular deterministic approach: go through the sentence word by word, choosing at each point whether to shift the word onto a stack of constituents or to reduce the top constituent(s) on the stack according to a grammar rule
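
A minimal sketch with an assumed toy grammar and lexicon, using a greedy reduce-first strategy (real shift-reduce parsers choose between shifting and reducing with an oracle or a learned scoring model):

```python
LEXICON = {"the": "Det", "dog": "N", "cat": "N", "saw": "V"}
RULES = {                      # right-hand side -> left-hand side
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
    ("NP", "VP"): "S",
}

def shift_reduce(words):
    stack, buffer = [], list(words)
    while buffer or len(stack) > 1:
        # reduce the top of the stack whenever a grammar rule matches
        for rhs, lhs in RULES.items():
            if tuple(stack[-len(rhs):]) == rhs:
                stack[-len(rhs):] = [lhs]
                break
        else:
            if not buffer:
                return None                          # stuck: no parse found
            stack.append(LEXICON[buffer.pop(0)])     # shift the next word's category
    return stack[0]                                  # "S" on success

print(shift_reduce("the dog saw the cat".split()))   # -> S
```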

24
Q

dependency grammar

A

assumes that syntactic structure is formed by binary relations between lexical items, without need for syntactic constituents
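
For illustration, an assumed hand-written dependency analysis of one sentence, represented purely as binary head-to-dependent relations:

```python
# each entry is (head, dependent, relation); no phrase-level constituents are needed
parse = [
    ("ROOT",  "barks", "root"),    # "barks" is the main verb
    ("barks", "dog",   "nsubj"),   # "dog" is the subject of "barks"
    ("dog",   "the",   "det"),     # "the" is the determiner of "dog"
]

for head, dep, rel in parse:
    print(f"{dep} --{rel}--> {head}")
```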

25
Q

unsupervised parsing

A

approach which learns a new grammar or improves an existing grammar using a corpus of sentences without trees

26
Q

inside-outside algorithm

A

algorithm that learns to estimate the probabilities in a probabilistic context-free grammar (PCFG) from example sentences without trees

27
Q

semi-supervised learning

A

type of learning that starts with a small number of trees as data to build an initial grammar and then adds a large number of unparsed sentences to improve the grammar

28
Q

curriculum learning

A

type of learning that starts with short (2-word) unambiguous sentences and works its way up to 3-, 4-, and 5-word sentences

29
Q

semantics

A

what gives meaning to words and sentences

30
Q

lexicalized PCFG

A

type of augmented grammar that allows us to assign probabilities based on properties of the words in a phrase other than just the syntactic categories

31
Q

indexicals

A

phrases that refer directly to the current situation

32
Q

lexical ambiguity

A

when a word has more than one meaning

33
Q

syntactic ambiguity

A

when a phrase has multiple parses

34
Q

semantic ambiguity

A

when a word or phrase has multiple meanings

35
Q

metonymy

A

figure of speech in which a word or phrase is replaced by another word or phrase that has a close association or relationship with the original

36
Q

disambiguation

A

process of resolving ambiguity or uncertainty in language