Class 10 Flashcards

1
Q

grammar

A

defines the syntax of legal sentences

2
Q

language model

A

probability distribution describing the likelihood of any string – no two people have exactly the same language model

3
Q

tokenization

A

process of dividing a text into a sequence of words (tokens)
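
Example (a minimal sketch in Python; treating tokens as runs of word characters is a simplifying assumption, and real tokenizers also handle punctuation, contractions, and subwords):

import re

def tokenize(text):
    # naive tokenization: lowercase the text and keep runs of word characters
    return re.findall(r"\w+", text.lower())

print(tokenize("The cat sat on the mat."))
# ['the', 'cat', 'sat', 'on', 'the', 'mat']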

4
Q

n-gram model

A

Markov chain model that considers only the dependence between n adjacent words; works well for spam detection, sentiment analysis, and similar tasks
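
Example (a minimal bigram, n = 2, sketch in Python; the toy corpus is made up for illustration):

from collections import Counter, defaultdict

words = "the cat sat on the mat the cat ate the fish".split()

bigrams = defaultdict(Counter)
for prev, cur in zip(words, words[1:]):
    bigrams[prev][cur] += 1

def p(cur, prev):
    # maximum-likelihood estimate of P(cur | prev)
    total = sum(bigrams[prev].values())
    return bigrams[prev][cur] / total if total else 0.0

print(p("cat", "the"))  # 0.5: "the" is followed by cat, mat, cat, fish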

5
Q

character-level model

A

alternative to the n-gram word model in which the probability of each character is determined by the n-1 previous characters
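
Example (the same counting idea at the character level with n = 3, so each character is conditioned on the previous two; the toy string is an assumption):

from collections import Counter, defaultdict

text = "banana bandana"
n = 3

counts = defaultdict(Counter)
for i in range(len(text) - n + 1):
    context, ch = text[i:i + n - 1], text[i + n - 1]
    counts[context][ch] += 1

# estimated P(next char | previous two chars) after the context "an"
total = sum(counts["an"].values())
print({ch: c / total for ch, c in counts["an"].items()})
# {'a': 0.75, 'd': 0.25}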

6
Q

skip-gram model

A

alternative to the n-gram model that counts words that are near each other but skips a word (or more) between them
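
Example (counting 1-skip bigrams, i.e. pairs of words with exactly one word skipped between them; the sentence is a toy assumption):

from collections import Counter

words = "the quick brown fox jumps over the lazy dog".split()

# pair each word with the word two positions ahead, skipping the one in between
skip_bigrams = Counter(zip(words, words[2:]))
print(skip_bigrams.most_common(3))
# pairs such as ('the', 'brown') and ('quick', 'fox'), each seen once here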

7
Q

smoothing

A

process of reserving some probability mass for never-before-seen n-grams
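
Example (add-one, i.e. Laplace, smoothing is one simple way to do this; the counts and vocabulary below are assumptions):

counts = {"the": 3, "cat": 2, "sat": 1}   # observed word counts
vocab = ["the", "cat", "sat", "dog"]      # "dog" was never observed

total = sum(counts.values())
V = len(vocab)

def p_smoothed(word):
    # every word gets one pseudo-count, so unseen words keep nonzero probability
    return (counts.get(word, 0) + 1) / (total + V)

print(p_smoothed("dog"))   # 0.1 instead of 0
print(p_smoothed("the"))   # 0.4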

8
Q

backoff model

A

estimates n-gram counts, but for low or zero counts backs off to (n-1)-grams
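
Example (a "stupid backoff" style sketch: use the bigram estimate when the bigram was seen, otherwise fall back to a discounted unigram estimate; the discount constant and toy corpus are assumptions):

from collections import Counter

words = "the cat sat on the mat".split()
unigrams = Counter(words)
bigrams = Counter(zip(words, words[1:]))
N = len(words)
ALPHA = 0.4   # assumed backoff discount

def score(cur, prev):
    if bigrams[(prev, cur)] > 0:
        return bigrams[(prev, cur)] / unigrams[prev]
    # unseen bigram: back off to the discounted unigram estimate
    return ALPHA * unigrams[cur] / N

print(score("cat", "the"))   # seen bigram: 0.5
print(score("mat", "cat"))   # unseen bigram: 0.4 * 1/6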

9
Q

linear interpolation smoothing

A

backoff model that combines trigram, bigram, and unigram models by linear interpolation
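
Example (the interpolated estimate is λ3·P(w | w1, w2) + λ2·P(w | w2) + λ1·P(w), with the λ values summing to 1; the particular weights and probabilities below are assumptions):

def interpolated(p_trigram, p_bigram, p_unigram, lambdas=(0.7, 0.2, 0.1)):
    # the weights must sum to 1 so the result is still a probability
    l3, l2, l1 = lambdas
    return l3 * p_trigram + l2 * p_bigram + l1 * p_unigram

print(interpolated(p_trigram=0.0, p_bigram=0.3, p_unigram=0.05))
# about 0.065: even with a zero trigram estimate the word keeps some probability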

10
Q

WordNet

A

open-source, hand-curated dictionary in machine-readable format that has proven useful for many natural language applications

11
Q

Penn Treebank

A

corpus of over 3 million words of text annotated with part-of-speech (POS) tags

12
Q

beam search

A

compromise between a fast greedy search and the slower but more accurate Viterbi algorithm
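
Example (a sketch of beam search over tag sequences, keeping only the best few partial sequences at each step instead of all of them as Viterbi does or just one as greedy search does; the scoring function and tag set are toy assumptions):

import math

def beam_search(words, tags, score, beam_width=3):
    beam = [(0.0, [])]   # each hypothesis is (log-probability, tag sequence so far)
    for word in words:
        candidates = []
        for logp, seq in beam:
            prev = seq[-1] if seq else None
            for tag in tags:
                candidates.append((logp + math.log(score(word, tag, prev)), seq + [tag]))
        beam = sorted(candidates, reverse=True)[:beam_width]   # keep the best few
    return beam[0][1]

def score(word, tag, prev):
    # purely illustrative word/tag affinities
    table = {("the", "DT"): 0.9, ("cat", "NN"): 0.8, ("sleeps", "VB"): 0.7}
    return table.get((word, tag), 0.01)

print(beam_search(["the", "cat", "sleeps"], ["DT", "NN", "VB"], score))
# ['DT', 'NN', 'VB']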

13
Q

hidden Markov model

A

common model for part-of-speech (POS) tagging – combined with the Viterbi algorithm it can achieve accuracy of around 97%
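
Example (a Viterbi sketch for a toy HMM tagger; the start, transition, and emission probabilities below are made-up assumptions, not real estimates):

def viterbi(words, tags, start_p, trans_p, emit_p):
    # best[i][t] = (probability of the best tag path ending in t at word i, backpointer)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], 1e-6), None) for t in tags}]
    for i in range(1, len(words)):
        row = {}
        for t in tags:
            row[t] = max(
                (best[i - 1][pt][0] * trans_p[pt][t] * emit_p[t].get(words[i], 1e-6), pt)
                for pt in tags)
        best.append(row)
    tag = max(tags, key=lambda t: best[-1][t][0])   # best final tag
    path = [tag]
    for row in reversed(best[1:]):                  # follow the backpointers
        tag = row[tag][1]
        path.append(tag)
    return list(reversed(path))

tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans_p = {"DT": {"DT": 0.1, "NN": 0.8, "VB": 0.1},
           "NN": {"DT": 0.1, "NN": 0.2, "VB": 0.7},
           "VB": {"DT": 0.4, "NN": 0.3, "VB": 0.3}}
emit_p = {"DT": {"the": 0.9}, "NN": {"cat": 0.9}, "VB": {"sleeps": 0.9}}

print(viterbi(["the", "cat", "sleeps"], tags, start_p, trans_p, emit_p))
# ['DT', 'NN', 'VB']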

14
Q

discriminative model

A

learns a conditional probability distribution P(C|W), meaning it can assign categories given a sequence of words but can’t generate random sentences – ex: logistic regression
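
Example (a logistic-regression sketch that estimates P(C | W) for a tiny made-up spam/ham dataset; assumes scikit-learn is installed):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["free money now", "win a free prize", "meeting at noon", "lunch with the team"]
labels = ["spam", "spam", "ham", "ham"]

vec = CountVectorizer()
X = vec.fit_transform(texts)            # bag-of-words features
clf = LogisticRegression().fit(X, labels)

test = vec.transform(["free prize money"])
print(clf.predict(test))                # the category assigned to the word sequence
print(clf.predict_proba(test))          # the estimated conditional distribution P(C | W)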

15
Q

language

A

set of sentences that follow the rules laid out by a grammar

16
Q

syntactic categories

A

help to constrain the probable words at each point within a sentence – ex: noun phrase or verb phrase

17
Q

phrase structure

A

provides framework for meaning or semantics of the sentence

18
Q

overgenerate

A

when a grammar produces sentences that are not grammatical

19
Q

undergenerate

A

when a grammar rejects valid sentences

20
Q

lexicon

A

list of allowable words

21
Q

parsing

A

process of analyzing a string of words to uncover its phrase structure according to the rules of grammar

22
Q

CYK algorithm

A

chart parser that uses a grammar in Chomsky Normal Form
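
Example (a CYK recognizer sketch; the toy grammar, already in Chomsky Normal Form, is an assumption):

# every rule is either A -> B C or A -> word
lexical = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}

def cyk_recognize(words):
    n = len(words)
    # chart[i][j] = set of nonterminals that can derive words[i..j]
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        chart[i][i] = set(lexical.get(w, ()))
    for length in range(2, n + 1):            # span length
        for i in range(n - length + 1):       # span start
            j = i + length - 1                # span end
            for k in range(i, j):             # split point
                for B in chart[i][k]:
                    for C in chart[k + 1][j]:
                        chart[i][j] |= binary.get((B, C), set())
    return "S" in chart[0][n - 1]

print(cyk_recognize("she eats fish".split()))   # True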

23
Q

shift reduce parsing

A

popular deterministic approach: go through the sentence word by word, choosing at each point whether to shift the word onto a stack of constituents or to reduce the top constituent(s) on the stack according to a grammar rule
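
Example (a greedy shift-reduce sketch that always reduces when it can, which is a simplification of the choice described above; the grammar rules and lexicon are toy assumptions):

rules = {("Det", "Noun"): "NP", ("NP", "Verb"): "S"}
lexicon = {"the": "Det", "dog": "Noun", "barks": "Verb"}

def shift_reduce(words):
    stack = []
    for word in words:
        stack.append(lexicon[word])          # shift the word's category onto the stack
        while len(stack) >= 2 and tuple(stack[-2:]) in rules:
            rhs = tuple(stack[-2:])
            stack[-2:] = [rules[rhs]]        # reduce by a grammar rule
        print(stack)
    return stack

shift_reduce("the dog barks".split())
# prints ['Det'], then ['NP'], then ['S']: the whole sentence reduces to S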

24
Q

dependency grammar

A

assumes that syntactic structure is formed by binary relations between lexical items, without need for syntactic constituents

25
Q

unsupervised parsing

A

approach which learns a new grammar or improves an existing grammar using a corpus of sentences without trees

26
Q

inside-outside algorithm

A

algorithm that learns to estimate the probabilities in a probabilistic context-free grammar (PCFG) from example sentences without trees

27
Q

semisupervised learning

A

type of learning that starts with a small number of trees as data to build an initial grammar and then adds a large number of unparsed sentences to improve the grammar

28
Q

curriculum learning

A

type of learning that starts with short (2-word) unambiguous sentences and works its way up to 3-, 4-, and 5-word sentences

29
Q

semantics

A

term for the meaning conveyed by words and sentences

30
Q

lexicalized PCFG

A

type of augmented grammar that allows us to assign probabilities based on properties of the words in a phrase other than just the syntactic categories

31
Q

indexicals

A

phrases that refer directly to the current situation

32
Q

lexical ambiguity

A

when a word has more than one meaning

33
Q

syntactic ambiguity

A

refers to a phrase that has multiple parses

34
Q

semantic ambiguity

A

when a word or phrase has multiple meanings

35
Q

metonymy

A

figure of speech in which a word or phrase is replaced by another word or phrase that has a close association or relationship with the original

36
Q

disambiguation

A

process of resolving ambiguity or uncertainty in language