Text Analysis Flashcards

1
Q

What is logistic regression?

A

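A supervised machine learning model that predicts the probability that an input belongs to a category. In text analysis, it is used to classify text into a known set of categories.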

2
Q

Sentiment analysis/opinion mining

A

A text classification task that determines whether a text expresses a positive or negative opinion. An ML model is trained on labeled examples, such as restaurant reviews marked 1 (positive) or 0 (negative).
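
As a rough illustration of how the 1/0 labels are used, here is a minimal sketch that trains a logistic regression classifier on word-presence features with plain gradient descent. The reviews in `TRAIN`, the function names, and the learning rate and epoch count are all assumptions made up for this example; a real project would more likely use an ML library and far more data.

```python
import math

# Hypothetical toy reviews labeled 1 (positive) or 0 (negative).
TRAIN = [
    ("great food and friendly service", 1),
    ("delicious meal great staff", 1),
    ("terrible food and rude service", 0),
    ("awful meal rude staff", 0),
]

def featurize(text, vocab):
    """Return a 1.0/0.0 vector marking which vocabulary words occur in the text."""
    words = set(text.lower().split())
    return [1.0 if w in words else 0.0 for w in vocab]

def sigmoid(z):
    """Squash a linear score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=200, lr=0.5):
    """Fit weights and bias with stochastic gradient descent on the log loss."""
    vocab = sorted({w for text, _ in data for w in text.lower().split()})
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for text, label in data:
            x = featurize(text, vocab)
            pred = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = pred - label  # gradient of the log loss w.r.t. the linear score
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return vocab, weights, bias

def predict_proba(text, model):
    """Return the predicted probability that the text is positive."""
    vocab, weights, bias = model
    x = featurize(text, vocab)  # words outside the vocabulary are ignored
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

model = train(TRAIN)
print(predict_proba("great friendly staff", model))  # above 0.5: positive
print(predict_proba("rude awful service", model))    # below 0.5: negative
```

A probability above 0.5 is read as positive and below 0.5 as negative.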

3
Q

Named entity recognition

A

A text analysis function that identifies and categorizes entities mentioned in a text, such as people, places, and events.

4
Q

Key phrase extraction

A

A text analysis function that identifies the main ideas of an unstructured text

5
Q

Summarization

A

A text analysis function that condenses a text into a shorter version by identifying its key information

6
Q

What does it mean when a language detection service returns a language score of NaN and/or value of Unknown?

A

The service could not determine the language. The text is either ambiguous (for example, it consists only of emojis) or written in a language that the model was not trained on.

7
Q

In the context of conversational language understanding, what is the definition of an entity?

A

The specific item that the utterance refers to. For example, in “turn on the fan”, the entity is the fan.

8
Q

In the context of conversational language understanding, how is ‘intent’ defined?

A

The goal or purpose expressed in an utterance to a bot, such as “turn on” in “turn on the fan” or “turn off” in “turn off the light”

9
Q

What is ‘authoring’?

A

Defining your own entities and intents for a language model, rather than relying only on ones that were pre-built for common scenarios

10
Q

What is tokenization?

A

Breaking down a sentence or utterance into separate units called tokens, such as words, grammatical markings, and morphemes
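
A minimal sketch of word-level tokenization, assuming a simple regular expression is good enough. The `tokenize` function is a hypothetical helper for this card; it splits out words and punctuation marks but does not break words into morphemes.

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+(?:'\w+)? matches a word, optionally with an internal apostrophe ("don't");
    # [^\w\s] matches any single punctuation character.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

print(tokenize("Turn off the light, please!"))
# ['Turn', 'off', 'the', 'light', ',', 'please', '!']
```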

11
Q

How do you normalize a text?

A

By converting it to lower case and removing punctuation. This can discard useful information, since upper case may indicate a name or title.
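
The two steps can be sketched with the Python standard library. The `normalize` function is a hypothetical helper, and it only strips ASCII punctuation, so curly quotes and similar characters would pass through unchanged.

```python
import string

def normalize(text):
    """Lowercase the text and strip ASCII punctuation."""
    lowered = text.lower()
    # A translation table with a deletion set removes every character
    # listed in string.punctuation.
    return lowered.translate(str.maketrans("", "", string.punctuation))

print(normalize("Dr. Smith visited Paris!"))  # dr smith visited paris
```

Note how "Dr." and "Paris" lose the capitalization that marked them as a title and a name.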

12
Q

What is an n-gram?

A

A sequence of n consecutive words treated as a single unit. A bigram has two words, like “he walked” or “she drank”, and a trigram has three, like “I have the” or “I don’t want”.
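
A minimal sketch of extracting n-grams from a list of tokens with a sliding window. The `ngrams` function and the sample sentence are made up for illustration.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    # The window starts at each index that still leaves room for n tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "he walked home".split()
print(ngrams(tokens, 2))  # [('he', 'walked'), ('walked', 'home')]
print(ngrams(tokens, 3))  # [('he', 'walked', 'home')]
```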

13
Q

Define stop-word removal

A

Removing common words like “the”, “an”, and “a” that carry little meaning on their own and mainly serve the grammar of the sentence.
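
A minimal sketch of stop-word removal over a list of tokens. The `STOP_WORDS` set is a tiny illustrative list chosen for this example; NLP libraries ship much longer ones.

```python
# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to"}

def remove_stop_words(tokens):
    """Keep only the tokens that are not stop words, ignoring case."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words("The food at the restaurant is a delight".split()))
# ['food', 'at', 'restaurant', 'delight']
```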

14
Q

Why would you use stop-word removal?

A

To help the ML model focus on the key words and phrases that carry the meaning of the text

15
Q

What is ‘stemming’?

A

Reducing words to their root so that related words like “powerful”, “powered”, and “power” are grouped under the same token
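
A deliberately naive sketch of stemming by suffix stripping. The `SUFFIXES` tuple and the minimum root length are assumptions made for this example; established stemmers such as the Porter algorithm use far more rules.

```python
# Hypothetical, incomplete suffix list for illustration only.
SUFFIXES = ("ful", "ed", "ing", "s")

def stem(word):
    """Strip the first matching suffix, provided a root of 3+ letters remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([stem(w) for w in ["powerful", "powered", "power"]])
# ['power', 'power', 'power']
```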

16
Q

How is frequency analysis used in AI?

A

By counting how often each word appears, it quickly identifies the key words in a document so the model can predict what the document is about.
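
A minimal sketch of frequency analysis using `collections.Counter`. The `top_terms` function, the short `STOP_WORDS` set, and the sample text are all made up for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "is", "and", "a", "an"}

def top_terms(text, k=3):
    """Return the k most frequent non-stop-words with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(k)

text = "The fan is on. Turn the fan off. The fan is loud and the light is dim."
print(top_terms(text, 1))  # [('fan', 3)]
```

The most frequent remaining term, "fan", suggests what the text is about.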

17
Q

Explain term frequency-inverse document frequency (TF-IDF)

A

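A score that measures how important a term is to one document within a collection. It is high when the term appears frequently in that document but infrequently in the others. For example, the term ‘penikala’ would score high in a Hawaiian text because it rarely appears in any other context.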
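
A minimal sketch of one common TF-IDF variant: term frequency multiplied by the logarithm of the inverse document frequency. The `tf_idf` function and the toy documents are assumptions for illustration, and libraries often use smoothed formulas that give slightly different numbers.

```python
import math

def tf_idf(term, doc, corpus):
    """Score a term for one tokenized document within a corpus of tokenized documents."""
    tf = doc.count(term) / len(doc)            # share of the document made up by the term
    df = sum(1 for d in corpus if term in d)   # number of documents containing the term
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# Invented toy documents, already tokenized.
docs = [
    "the penikala is on the desk".split(),
    "the fan is on".split(),
    "the light is off".split(),
]

print(tf_idf("penikala", docs[0], docs))  # positive: appears in only one document
print(tf_idf("the", docs[0], docs))       # 0.0: appears in every document
```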