Text Analysis Flashcards

Question 1

Q

What is logistic regression?

Answer

A

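A supervised machine learning model for classification that estimates the probability that an input belongs to a category. In text analysis, it is trained on labeled examples to assign text to one of a known set of categories.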
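
As a minimal sketch of the idea, logistic regression turns a weighted sum of features into a probability using the sigmoid function. The words, weights, and bias below are made up for illustration and are not the output of a real trained model.

```python
import math

# Hypothetical weights a trained model might have learned (illustrative only).
WEIGHTS = {"great": 2.0, "terrible": -2.0}
BIAS = 0.0

def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def probability_positive(text: str) -> float:
    """Probability that the text belongs to the positive category."""
    score = BIAS + sum(WEIGHTS.get(word, 0.0) for word in text.lower().split())
    return sigmoid(score)

print(probability_positive("great food"))     # above 0.5, so classified as positive
print(probability_positive("terrible food"))  # below 0.5, so classified as negative
```

Words without a weight contribute nothing to the score, so an input made only of unknown words gets a probability of exactly 0.5 here.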

Question 2

Q

Sentiment analysis/opinion mining

Answer

A

A text classification task that scores how positive or negative a text is, such as a restaurant review. A model can be trained for it on reviews labeled 1 (positive) or 0 (negative).
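
The sketch below shows one way the 1/0 labeling could be used in practice: a tiny bag-of-words logistic regression trained with batch gradient descent. The four reviews, the learning rate, and the number of epochs are all assumptions chosen for illustration, and a real system would use far more data and a tested library.

```python
import math

# Made-up training reviews labeled 1 (positive) or 0 (negative).
REVIEWS = [
    ("great food and friendly service", 1),
    ("terrible food and rude service", 0),
    ("great friendly staff", 1),
    ("rude staff and terrible wait", 0),
]

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train(reviews, epochs=500, learning_rate=0.1):
    """Fit one weight per word plus a bias using batch gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for text, label in reviews:
            words = text.split()
            p = sigmoid(bias + sum(weights.get(w, 0.0) for w in words))
            error = p - label  # gradient of the log loss with respect to the score
            grad_b += error
            for w in words:
                grad_w[w] = grad_w.get(w, 0.0) + error
        # Apply the accumulated gradients once per pass over the data.
        bias -= learning_rate * grad_b
        for w, g in grad_w.items():
            weights[w] = weights.get(w, 0.0) - learning_rate * g
    return weights, bias

def predict(text, weights, bias):
    """Return the estimated probability that the text is positive."""
    return sigmoid(bias + sum(weights.get(w, 0.0) for w in text.lower().split()))

weights, bias = train(REVIEWS)
print(predict("great friendly", weights, bias))  # closer to 1 (positive)
print(predict("terrible rude", weights, bias))   # closer to 0 (negative)
```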

Question 3

Q

Named entity recognition

Answer

A

A text analysis function that identifies and categorizes the entities mentioned in a text, such as people, places, organizations, and events.

Question 4

Q

Key phrase extraction

Answer

A

A text analysis function that identifies the main ideas of an unstructured text

Question 5

Q

Summarization

Answer

A

A text analysis function that summarizes a text by identifying key information

Question 6

Q

What does it mean when a language detection service returns a language score of NaN and/or value of Unknown?

Answer

A

The service could not identify the language. The text is either ambiguous, such as emojis or punctuation alone, or written in a language the model was not trained on.

Question 7

Q

In the context of conversational language understanding, what is the definition of an entity?

Answer

A

The specific item that an utterance refers to, such as the fan in “turn on the fan”.

Question 8

Q

In the context of conversational language understanding, how is ‘intent’ defined?

Answer

A

The goal or purpose expressed in an utterance to a bot, such as “turn on” the fan or “turn off” the light.
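
To make the difference between intent and entity concrete, here is a small sketch of how labeled utterances might be stored. The class name, the intent names (TurnOn, TurnOff), and the "device" key are hypothetical and do not reflect any particular service's schema.

```python
from dataclasses import dataclass, field

@dataclass
class LabeledUtterance:
    """A hypothetical container for one labeled utterance (not a service schema)."""
    text: str
    intent: str                                    # what the user wants done
    entities: dict = field(default_factory=dict)   # what the request refers to

examples = [
    LabeledUtterance("turn on the fan", intent="TurnOn", entities={"device": "fan"}),
    LabeledUtterance("turn off the light", intent="TurnOff", entities={"device": "light"}),
]

for example in examples:
    print(f"{example.text!r}: intent={example.intent}, entities={example.entities}")
```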

Question 9

Q

What is ‘authoring’?

Answer

A

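The process of defining the entities, intents, and sample utterances used to train a conversational language understanding model. You can create your own or start from ones that were pre-built for common scenarios.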

Question 10

Q

What is tokenization?

Answer

A

Breaking down a sentence or utterance into smaller units called tokens, such as separate words, punctuation marks, and morphemes.
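
A minimal sketch of word-level tokenization using a regular expression. It assumes straight apostrophes and splits only into words and punctuation marks. It does not split words into morphemes, which real tokenizers may do.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into word tokens and single punctuation tokens."""
    # \w+(?:'\w+)* keeps contractions like "don't" together;
    # [^\w\s] matches one punctuation mark at a time.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)

print(tokenize("I don't want the fan on."))
# ['I', "don't", 'want', 'the', 'fan', 'on', '.']
```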

Question 11

Q

How do you normalize a text?

Answer

A

By converting it to lower case and removing punctuation. This can lose information, because upper case may indicate a name or title.
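
A minimal sketch of this kind of normalization, assuming only ASCII punctuation needs to be removed. The sample sentence is made up to show how the capital letter that marks a name disappears.

```python
import string

def normalize(text: str) -> str:
    """Delete ASCII punctuation characters, then lowercase the text."""
    no_punctuation = text.translate(str.maketrans("", "", string.punctuation))
    return no_punctuation.lower()

print(normalize("Mr. Banks opened the Bank!"))  # mr banks opened the bank
```

After normalization, the surname "Banks" and the noun "bank" can no longer be told apart by capitalization.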

Question 12

Q

What is an n-gram?

Answer

A

A contiguous sequence of n words (or tokens) taken from a text. A bigram is two words, such as “he walked” or “she drank”, and a trigram is three words, such as “I have the” or “I don’t want”.
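
A minimal sketch of extracting n-grams from a list of tokens, assuming the text has already been split on whitespace. The sample sentence is made up.

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous run of n tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I don't want the fan".split()
print(ngrams(tokens, 2))
# [('I', "don't"), ("don't", 'want'), ('want', 'the'), ('the', 'fan')]
print(ngrams(tokens, 3))
# [('I', "don't", 'want'), ("don't", 'want', 'the'), ('want', 'the', 'fan')]
```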

Question 13

Q

Define stop-word removal

Answer

A

Removal of common words like “the”, “an”, and “a” that carry little meaning on their own and mainly serve a grammatical role in the sentence.
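
A minimal sketch of stop-word removal. The stop-word list here is a tiny hand-picked assumption. Real lists are much longer and vary by language and task.

```python
# A tiny, hand-picked stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "is"}

def remove_stop_words(tokens: list[str]) -> list[str]:
    """Keep only the tokens that are not in the stop-word list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words("the fan is on the table".split()))
# ['fan', 'on', 'table']
```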

Question 14

Q

Why would you use stop-word removal?

Answer

A

To reduce noise so that the ML model can better identify the key words and phrases that carry meaning.

Question 15

Q

What is ‘stemming’?

Answer

A

Reducing words to their root so that related forms, like “powerful”, “powered”, and “power”, are grouped under the same token.
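
A deliberately naive sketch of stemming by suffix stripping. The suffix list and the three-letter minimum are assumptions made for this example. This is not the Porter stemmer or any other established algorithm, and it will mis-stem many real words.

```python
# Suffixes to strip, checked in this order. A toy rule set, not the Porter stemmer.
SUFFIXES = ("ful", "ing", "ed", "s")

def naive_stem(word: str) -> str:
    """Strip one known suffix, keeping at least three letters of the word."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([naive_stem(w) for w in ["powerful", "powered", "power", "powers"]])
# ['power', 'power', 'power', 'power']
```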

Question 16

Q

How is frequency analysis used in AI text analysis?

Answer

A

To count how often each word appears in a document, which quickly identifies likely key words and supports a prediction of what the document is about.
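
A minimal sketch of frequency analysis using Python's `collections.Counter`. The sample text and the small stop-word list are made up for illustration.

```python
from collections import Counter

# A tiny, hand-picked stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "is", "on"}

def top_terms(text: str, k: int = 3) -> list[tuple[str, int]]:
    """Return the k most frequent non-stop-words with their counts."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return Counter(words).most_common(k)

text = "the fan is loud and the light is on and the fan and the light and the fan"
print(top_terms(text, k=2))
# [('fan', 3), ('light', 2)]
```

The most frequent remaining terms suggest the text is about a fan and a light.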

Question 17

Q

Explain term frequency-inverse document frequency (TF-IDF)

Answer

A

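A score that weighs how frequently a term appears in one document against how many other documents also contain it. A term that is frequent in one document but rare in the others scores high, like finding the term ‘penikala’ in a Hawaiian text compared to finding it in any other context.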
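
A minimal sketch of one common TF-IDF variant, assuming tf = (count of the term in the document) / (document length) and idf = log(N / number of documents containing the term). Libraries often add smoothing, so their numbers will differ. The three documents are made up.

```python
import math

# Three tiny made-up documents, already tokenized.
DOCS = [
    "the penikala is on the desk".split(),
    "the fan is on the desk".split(),
    "the light is on".split(),
]

def tf_idf(term: str, doc: list[str], docs: list[list[str]]) -> float:
    """Term frequency in `doc` times log(N / number of docs containing the term)."""
    tf = doc.count(term) / len(doc)
    doc_freq = sum(1 for d in docs if term in d)
    if doc_freq == 0:
        return 0.0  # the term appears nowhere, so there is nothing to score
    return tf * math.log(len(docs) / doc_freq)

print(tf_idf("penikala", DOCS[0], DOCS))  # positive: appears in only one document
print(tf_idf("the", DOCS[0], DOCS))       # 0.0: appears in every document
```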
