class 9 Flashcards

1
Q

what are 3 reasons for computers to do NLP?

A
  1. to communicate with humans
  2. to learn
  3. to have a better scientific understanding of language and language use
2
Q

what is a language model?

A

a probability distribution describing the likelihood of any string

3
Q

what is a grammar's purpose?

A

to define the syntax of legal sentences

4
Q

what is the purpose of semantic rules?

A

to define the meaning of the legal sentences

5
Q

what is the bag-of-words model?

A

the application of naive Bayes to a string of words, treating the text as an unordered collection of words and ignoring word order

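A minimal sketch of the idea, assuming a tiny invented spam/ham corpus and add-one smoothing (both are illustrative choices, not from the cards):

```python
from collections import Counter
import math

# Toy training data (invented for illustration).
docs = [("spam", "buy cheap pills now"),
        ("spam", "cheap pills cheap"),
        ("ham",  "meeting agenda for now"),
        ("ham",  "lunch meeting today")]

class_counts = Counter(label for label, _ in docs)
word_counts = {"spam": Counter(), "ham": Counter()}
for label, text in docs:
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def log_score(label, text):
    # log P(label) + sum of log P(word | label), with add-one smoothing.
    total = sum(word_counts[label].values())
    s = math.log(class_counts[label] / len(docs))
    for w in text.split():
        s += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return s

def classify(text):
    return max(word_counts, key=lambda lab: log_score(lab, text))

print(classify("cheap pills"))  # → spam
```

Word order never enters the score, which is exactly the "bag" assumption.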
6
Q

what is tokenization?

A

the process of dividing a text into a sequence of words

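A regex word tokenizer is the simplest version of this; the pattern below is an illustrative assumption (real tokenizers also handle punctuation, hyphens, and clitics):

```python
import re

def tokenize(text):
    # Lowercase, then pull out runs of letters/digits/apostrophes;
    # other punctuation is dropped.
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize("The quick brown fox jumped!")
# tokens == ['the', 'quick', 'brown', 'fox', 'jumped']
```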
7
Q

what is an n-gram model?

A

a Markov chain model that captures the dependence between n adjacent words: each word's probability is conditioned on the preceding n-1 words

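A bigram (n = 2) model estimated from counts can be sketched as follows; the toy corpus is invented:

```python
from collections import Counter

def bigram_probs(tokens):
    # Maximum-likelihood estimate: P(w2 | w1) = count(w1, w2) / count(w1)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

corpus = "i am sam sam i am i do not like green eggs".split()
probs = bigram_probs(corpus)
# probs[("i", "am")] == 2/3: "i" occurs 3 times, followed by "am" twice
```

A real model would also add start/end markers and smoothing for unseen bigrams.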
8
Q

in what cases would you use n-gram models?

A

in spam detection, author attribution, and sentiment analysis

9
Q

what are other alternatives to n-gram models?

A

character-level models or skip-gram models

10
Q

what is a structured model that is usually constructed through manual labor?

A

a dictionary

11
Q

what’s a common model for POS tagging?

A

the hidden Markov model (HMM)

12
Q

an HMM (hidden Markov model) combined with what algorithm can produce a tagging accuracy of ~97%?

A

Viterbi algorithm

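A sketch of Viterbi decoding over a toy HMM; the tag set, transition, and emission probabilities below are invented for illustration:

```python
import math

# Toy HMM with invented probabilities (not trained on real data).
tags = ["N", "V"]
start = {"N": 0.6, "V": 0.4}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"fish": 0.5, "dogs": 0.5},
        "V": {"fish": 0.3, "run": 0.7}}

def viterbi(words):
    # delta[t][tag] = best log-probability of any tag path ending in tag at step t
    delta = [{t: math.log(start[t]) + math.log(emit[t].get(words[0], 1e-9))
              for t in tags}]
    back = []
    for w in words[1:]:
        row, ptr = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: delta[-1][p] + math.log(trans[p][t]))
            row[t] = (delta[-1][prev] + math.log(trans[prev][t])
                      + math.log(emit[t].get(w, 1e-9)))
            ptr[t] = prev
        delta.append(row)
        back.append(ptr)
    # Trace the best path backwards through the stored pointers.
    best = max(tags, key=lambda t: delta[-1][t])
    path = [best]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["dogs", "run"]))  # → ['N', 'V']
```

Dynamic programming keeps one best path per tag per position, so the search is linear in sentence length rather than exponential.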
13
Q

what is the task of assigning a part of speech to each word in a sentence?

A

part of speech tagging

14
Q

what is the corpus of over 3M words of text annotated with POS tags?

A

the Penn Treebank

15
Q

what are some types of POS tagging?

A

logistic regression: fast, but uses a greedy search

Viterbi algorithm: accurate, but slow

beam search: a compromise between the two; it keeps several likely tag hypotheses at each step and drops the less likely ones, retaining most of Viterbi's accuracy

16
Q

what are examples of generative POS tagging models?

A

Naive Bayes and HMM

17
Q

what is the name for a list of allowable words?

A

lexicon

18
Q

what are open classes?

A

nouns, names, verbs, adjectives, and adverbs

they change rapidly (new words enter these classes all the time)

19
Q

what are closed classes?

A

pronouns, articles, prepositions, etc.

they change relatively slowly

20
Q

how can dynamic programming be used for parsing?

A

this method stores the result of every analyzed substring so it doesn't need to be reanalyzed later.

analyzed substrings are stored in a chart, and an algorithm that stores them this way is called a chart parser

21
Q

what is a chart parser algorithm that uses Chomsky Normal Form grammar?

A

CYK algorithm
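A minimal CYK recognizer over an invented CNF grammar (the grammar and sentence are illustrative assumptions):

```python
# Toy grammar in Chomsky Normal Form (invented for illustration):
#   S -> NP VP,  VP -> V NP,  NP -> 'fish' | 'rivers',  V -> 'like'
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
lexical = {"fish": {"NP"}, "rivers": {"NP"}, "like": {"V"}}

def cyk(words):
    n = len(words)
    # chart[i][j] = set of nonterminals that derive words[i..j] inclusive
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        chart[i][i] = set(lexical.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between the two halves
                for b in chart[i][k]:
                    for c in chart[k + 1][j]:
                        chart[i][j] |= binary.get((b, c), set())
    return "S" in chart[0][n - 1]

print(cyk("fish like rivers".split()))  # → True
```

Each cell is filled once and reused, which is the chart-parsing idea from the previous card.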

22
Q

can natural language be fully described by a context-free grammar?

A

it cannot; real language is highly dependent on context

23
Q

what are some tasks of NLP?

A

text-to-speech
machine translation
speech recognition
question answering

24
Q

what are some complications of real natural language?

A

when a word has more than one meaning: lexical ambiguity

when a phrase has multiple parses: syntactic ambiguity

25
Q

what’s the word for when a word or phrase is replaced by another word or phrase that has a close association or relationship with the original?

A

metonymy

26
Q

what’s the inside-outside algorithm?

A

it learns to estimate the rule probabilities of a PCFG from sentences without parse trees

27
Q

what is curriculum learning?

A

starts with short, unambiguous sentences and works its way up to longer ones (3-, 4-, 5-word sentences)

28
Q

what is semisupervised parsing?

A

starts with a small number of trees as data to build an initial grammar and then adds a large number of unparsed sentences to improve the grammar

29
Q

what is unsupervised parsing?

A

uses a corpus of sentences without trees to learn a new grammar or improve an existing grammar

30
Q

what is decoding?

A

the process of generating target words from source words

31
Q

what are sequence-to-sequence (seq2seq) models?

A

a neural network architecture that maps an input sequence to an output sequence, typically using RNNs (often LSTMs) as the encoder and decoder

32
Q

what’s the goal of machine translation?

A

translating a source language to a target language

33
Q

what are some examples of pretrained word embedding dictionaries?

A

word2vec, GloVe, fastText
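These dictionaries map words to dense vectors that are compared by cosine similarity; the tiny 3-dimensional vectors below are invented stand-ins for real pretrained embeddings (which typically have 100-300 dimensions):

```python
import math

# Hand-made toy vectors standing in for pretrained embeddings.
vecs = {"king":  [0.90, 0.80, 0.10],
        "queen": [0.88, 0.82, 0.15],
        "apple": [0.10, 0.20, 0.90]}

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Related words get vectors that point in similar directions.
print(cosine(vecs["king"], vecs["queen"]) > cosine(vecs["king"], vecs["apple"]))
```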

34
Q

what is Q learning?

A

it’s a model-free reinforcement learning algorithm: instead of building a model of the environment, it learns action values (Q-values) iteratively from experience. It is also off-policy, meaning it can learn the value of the optimal policy while following a different, more exploratory behavior policy
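A sketch of tabular Q-learning on an invented four-state chain environment (states, rewards, and hyperparameters are all illustrative assumptions):

```python
import random

# Toy chain: states 0..3, actions 0 (left) / 1 (right);
# reaching state 3 ends the episode with reward 1.
def step(state, action):
    nxt = min(state + 1, 3) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == 3 else 0.0
    return nxt, reward, nxt == 3  # next state, reward, done

random.seed(0)
alpha, gamma, eps = 0.5, 0.9, 0.1   # learning rate, discount, exploration
Q = [[0.0, 0.0] for _ in range(4)]  # Q[state][action]

for _ in range(200):  # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy behavior policy; the update below is off-policy
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Q-learning target uses max over next actions, not the action taken
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print(max(Q[0]) > 0)  # the start state has learned positive value
```

The `max(Q[s2])` in the update is what makes it off-policy: it evaluates the greedy policy even while the agent sometimes acts randomly.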
