Natural Language Processing Flashcards

1
Q

What is the Goal of Natural Language Processing?

A

To make machines understand and interpret human language as it is written or spoken.

2
Q

What are the two levels of Linguistic Analysis?

A

Syntax: whether the given text is grammatically correct

Semantics: what the meaning of the given text is

3
Q

What is Natural Language Understanding?

A

Trying to understand the meaning of the given text

4
Q

What are the four ambiguities that need to be resolved for NLU?

A

Lexical, Syntactic, Semantic, Anaphoric

5
Q

What is Lexical Ambiguity?

A

Words have multiple meanings, also known as Polysemy or Homonymy.

6
Q

What is Syntactic Ambiguity?

A

A sentence has multiple parse trees

7
Q

What is Semantic Ambiguity?

A

A sentence has multiple possible meanings.

8
Q

What is Anaphoric Ambiguity?

A

A word or phrase (often a pronoun) could refer to more than one thing mentioned earlier, so its antecedent is unclear.

9
Q

What are the four steps in the NLU process?

A

Syntax Analysis, Semantic Analysis, Named Entity Recognition, Intent Recognition.

10
Q

What are the 7 steps in the NLP Pipeline?

A

Sentence Segmentation, Tokenization, Stemming, Part-of-Speech Tagging, Parsing, Named Entity Recognition, Co-reference (Discourse) Resolution.

11
Q

What is Sentence Segmentation?

A

The process of identifying the sentence boundaries in the text.
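A minimal pure-Python sketch of rule-based sentence segmentation; the regex below is a naive boundary rule for illustration, and library tools such as NLTK's sent_tokenize handle abbreviations and other edge cases it misses.

```python
import re

def segment_sentences(text):
    # Naively split after ., ! or ? when followed by whitespace and a capital letter.
    # Fails on abbreviations like "Dr." or "e.g." -- a toy rule, not a real segmenter.
    return re.split(r'(?<=[.!?])\s+(?=[A-Z])', text.strip())

print(segment_sentences("NLP is fun. It has many steps! Do you agree?"))
# ['NLP is fun.', 'It has many steps!', 'Do you agree?']
```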

12
Q

What is Tokenization?

A

The process of identifying the individual words, numbers, and punctuation marks in a sentence.
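A small regex-based tokenizer sketch that pulls out words, numbers, and punctuation marks separately; real tokenizers (e.g. NLTK's word_tokenize) are more robust.

```python
import re

def tokenize(sentence):
    # Word/number runs (keeping contractions like "didn't" together),
    # or single punctuation marks.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

print(tokenize("I paid $4.50 for 2 coffees, didn't I?"))
# ['I', 'paid', '$', '4', '.', '50', 'for', '2', 'coffees', ',', "didn't", 'I', '?']
```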

13
Q

What is Stemming?

A

The process of stripping the ends of words to reduce them to a common stem.
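A short example with NLTK's Porter stemmer (assuming the nltk package is installed), showing how endings are chopped off; note the results are stems, not necessarily real words.

```python
from nltk.stem import PorterStemmer  # pip install nltk

stemmer = PorterStemmer()
for word in ["studies", "studying", "connected", "connection", "flies"]:
    print(word, "->", stemmer.stem(word))
# studies -> studi, studying -> studi, connected -> connect,
# connection -> connect, flies -> fli
```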

14
Q

What is Part of Speech (POS) Tagging?

A

The process of assigning each word in a sentence a part-of-speech tag, such as noun or verb.
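A minimal NLTK sketch (assuming nltk is installed and its tokenizer and tagger models have been downloaded); the tags follow the Penn Treebank tagset.

```python
import nltk

# One-time setup (resource names may vary slightly between NLTK versions):
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

tokens = nltk.word_tokenize("The cat sat on the mat")
print(nltk.pos_tag(tokens))
# roughly: [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]
```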

15
Q

What is Parsing?

A

The process of analyzing the grammatical structure of a sentence, grouping its words into constituents such as noun phrases and verb phrases (a parse tree).
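A small sketch using a toy context-free grammar with NLTK's chart parser; the grammar below is made up for illustration. An ambiguous sentence under a richer grammar would yield several trees, which is exactly the syntactic ambiguity from card 6.

```python
import nltk

# Toy grammar, for illustration only.
grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N  -> 'dog' | 'ball'
V  -> 'chased'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog chased the ball".split()):
    print(tree)
# (S (NP (Det the) (N dog)) (VP (V chased) (NP (Det the) (N ball))))
```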

16
Q

What is Named Entity Recognition?

A

The process of identifying entities such as persons, locations, or times.
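A brief spaCy sketch (assuming spaCy and its small English model en_core_web_sm are installed); the exact entity labels depend on the model.

```python
import spacy  # pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")
doc = nlp("Barack Obama visited Paris in June 2015.")
for ent in doc.ents:
    print(ent.text, ent.label_)
# typically: Barack Obama PERSON / Paris GPE / June 2015 DATE
```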

17
Q

What is Co-Reference (Discourse) Resolution?

A

The process of determining how a given word in a sentence relates to the previous and following sentences, for example resolving which earlier mention a pronoun refers to.

18
Q

What is the goal of Lemmatization and Stemming?

A

The goal is to reduce the inflectional forms and derivationally related forms of a word to a common base form

19
Q

What is the difference between Lemmatization and Stemming?

A

Stemming is a crude heuristic process that simply chops off the ends of words, whereas lemmatization does it properly, using a vocabulary and morphological analysis of words.
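A quick NLTK comparison (assuming nltk and its WordNet data are available): the stemmer just chops, while the lemmatizer returns real dictionary forms.

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()  # requires nltk.download('wordnet')

print(stemmer.stem("studies"), lemmatizer.lemmatize("studies"))    # studi study
print(stemmer.stem("was"), lemmatizer.lemmatize("was", pos="v"))   # wa be
```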

20
Q

What are stop words?

A

A list of the most common words in a language. This list is not universal and can change depending on the application.
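A short sketch filtering NLTK's English stop-word list (assuming nltk and its 'stopwords' corpus are downloaded); any application-specific list could be swapped in.

```python
from nltk.corpus import stopwords  # requires nltk.download('stopwords')

stops = set(stopwords.words("english"))
tokens = "this is a list of the most common words in a language".split()
print([t for t in tokens if t not in stops])
# ['list', 'common', 'words', 'language']
```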

21
Q

What is a “Bag-of-Words”?

A

A simple feature extraction technique that describes the occurrence of each word in a document while ignoring word order and position. The idea is that similar documents have similar content.
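A minimal pure-Python sketch of a bag-of-words representation: each document becomes a vector of word counts over a shared vocabulary, and word order is thrown away.

```python
from collections import Counter

docs = ["the cat sat on the mat", "the dog sat on the log"]
tokenized = [d.split() for d in docs]

# Shared vocabulary, sorted for a stable column order.
vocab = sorted({w for doc in tokenized for w in doc})

# One count vector per document.
vectors = [[Counter(doc)[w] for w in vocab] for doc in tokenized]

print(vocab)    # ['cat', 'dog', 'log', 'mat', 'on', 'sat', 'the']
print(vectors)  # [[1, 0, 0, 1, 1, 1, 2], [0, 1, 1, 0, 1, 1, 2]]
```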

22
Q

What is Term Frequency-Inverse Document Frequency (TF-IDF)?

A

A statistical measure used to evaluate the importance of a word to a document within a collection (corpus).
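A small pure-Python sketch of one common TF-IDF formulation (tf = raw count in the document, idf = log(N / df)); libraries such as scikit-learn apply extra smoothing, so their numbers differ slightly.

```python
import math
from collections import Counter

docs = [d.split() for d in ["the cat sat", "the dog sat", "the dog barked"]]
N = len(docs)

def tf_idf(term, doc):
    tf = Counter(doc)[term]                 # how often the term occurs in this document
    df = sum(1 for d in docs if term in d)  # how many documents contain the term
    return tf * math.log(N / df)

print(tf_idf("the", docs[0]))  # 0.0   -> appears in every document, carries no weight
print(tf_idf("cat", docs[0]))  # ~1.10 -> rare term, high weight
```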

23
Q

What is N-gram word prediction?

A

Using the probabilities of word sequences to choose the most likely next word or to correct spelling errors.
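A minimal bigram (N = 2) sketch on a toy corpus: count which words follow which, then predict the most probable next word.

```python
from collections import Counter, defaultdict

corpus = "i like green eggs . i like ham . i am sam .".split()

# Count how often each word follows each preceding word (bigram counts).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    counts = following[word]
    best, count = counts.most_common(1)[0]
    return best, count / sum(counts.values())   # estimated P(best | word)

print(predict_next("i"))     # ('like', 0.666...): "i" is followed by "like" 2 times out of 3
print(predict_next("like"))  # 'green' or 'ham', each with probability 0.5
```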

24
Q

What is the Markov Assumption for Language?

A

Only prior local context, i.e. the last few words, affects the next word. This means that the probability of a word depends only on the previous N-1 words.
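Written out (a standard formulation, in LaTeX notation), the assumption replaces the full history with only the last N-1 words:

```latex
% Markov / N-gram assumption
P(w_n \mid w_1, w_2, \ldots, w_{n-1}) \approx P(w_n \mid w_{n-N+1}, \ldots, w_{n-1})
% Bigram case (N = 2): the next word depends only on the single previous word,
% P(w_n \mid w_{n-1}).
```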

25
Q

What are the limitations of the N-gram model?

A

The higher the N, the better the model overall, but this leads to a lot of computational overhead.

N-grams are a sparse representation of a language.

Any word sequence that does not appear in the training corpus is assigned a probability of 0.