Class 10 Flashcards
grammar
defines the syntax of legal sentences
language model
probability distribution describing the likelihood of any string – no two people share exactly the same language model
tokenization
process of dividing a text into a sequence of words (tokens)
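A minimal sketch of word tokenization using a simple regex (the pattern and example are illustrative, not from the source):

```python
import re

def tokenize(text):
    # Match runs of word characters, or any single non-space punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

# Splits words from punctuation; contractions come apart at the apostrophe.
tokens = tokenize("Don't panic!")
```

Real tokenizers handle contractions, hyphenation, and URLs with many more rules; this shows only the basic idea.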
n gram model
Markov chain model that considers only the dependence between n adjacent words; works well for spam detection, sentiment analysis, etc.
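A minimal bigram (n=2) sketch, estimating P(word | previous word) by maximum likelihood from counts (the toy corpus is invented for illustration):

```python
from collections import Counter

# Toy corpus; a real model would be trained on a large text collection.
corpus = "the cat sat on the mat the cat ran".split()

unigrams = Counter(corpus)                    # single-word counts
bigrams = Counter(zip(corpus, corpus[1:]))    # adjacent-pair counts

def bigram_prob(prev, word):
    # Maximum-likelihood estimate: count(prev word) / count(prev).
    return bigrams[(prev, word)] / unigrams[prev]
```

For example, `bigram_prob("the", "cat")` is 2/3 here, since "the" occurs three times and is followed by "cat" twice.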
character level model
alternative to the n-gram model in which the probability of each character is determined by the n-1 previous characters
skip gram model
alternative to the n-gram model: count words that are near each other but skip one or more words between them
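A minimal 1-skip bigram sketch: count pairs of words separated by exactly one intervening word (the toy sentence is invented for illustration):

```python
from collections import Counter

corpus = "the quick brown fox jumps".split()

# Pair each word with the word two positions later, skipping the one between.
skip_bigrams = Counter(
    (corpus[i], corpus[i + 2]) for i in range(len(corpus) - 2)
)
```

Here ("the", "brown") and ("quick", "fox") are counted even though the words are not adjacent, which helps capture relationships an ordinary bigram model misses.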
smoothing
process of reserving some probability mass for never-before-seen n-grams
backoff model
estimates n-gram counts, but for low (or zero) counts we back off to (n-1)-grams
linear interpolation smoothing
backoff model that combines trigram, bigram, and unigram models by linear interpolation
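A minimal sketch of linear interpolation: the trigram, bigram, and unigram maximum-likelihood estimates are mixed with fixed weights (the corpus and the lambda values are illustrative; in practice the weights are tuned on held-out data):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat sat".split()
N = len(corpus)
uni = Counter(corpus)
bi = Counter(zip(corpus, corpus[1:]))
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))

def p_interp(w1, w2, w3, lambdas=(0.6, 0.3, 0.1)):
    # P(w3 | w1, w2) as a weighted mix of trigram, bigram, unigram estimates.
    l3, l2, l1 = lambdas  # weights sum to 1; hand-picked here
    p3 = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    p2 = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
    p1 = uni[w3] / N
    return l3 * p3 + l2 * p2 + l1 * p1
```

Because the unigram term is always nonzero for known words, the interpolated probability never collapses to zero just because a particular trigram was unseen.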
wordnet
open-source, hand-curated dictionary in machine-readable format that has proven useful for many natural language applications
penn treebank
corpus of over 3M words of text annotated with part of speech (POS) tags
beam search
compromise between a fast greedy search and the slower but more accurate Viterbi algorithm
hidden markov model
common model for part of speech (POS) tagging – combined with the Viterbi algorithm it can achieve accuracy of around 97%
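A minimal Viterbi sketch over a toy HMM for POS tagging. All probabilities below (start, transition, emission) are invented for illustration; a real tagger would estimate them from a corpus such as the Penn Treebank:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = (probability of best path ending in state s at time t, backpointer)
    V = [{s: (start_p[s] * emit_p[s].get(obs[0], 0.0), None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p][0] * trans_p[p][s])
            V[t][s] = (V[t - 1][prev][0] * trans_p[prev][s]
                       * emit_p[s].get(obs[t], 0.0), prev)
    # Backtrack from the most probable final state.
    best = max(states, key=lambda s: V[-1][s][0])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

# Toy model: made-up probabilities for a three-tag example.
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
trans_p = {"DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
           "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
           "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2}}
emit_p = {"DET": {"the": 0.9},
          "NOUN": {"cat": 0.5, "sat": 0.1},
          "VERB": {"sat": 0.5, "cat": 0.1}}

tags = viterbi(["the", "cat", "sat"], states, start_p, trans_p, emit_p)
```

The algorithm keeps only the best path into each state at each step, which is what makes it exact yet efficient; beam search approximates this by keeping only the top few paths overall.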
discriminative model
learns a conditional probability distribution P(C|W), meaning it can assign categories given a sequence of words but can’t generate random sentences – ex: logistic regression
language
set of sentences that follow the rules laid out by a grammar