Text Analysis Flashcards
What is logistic regression?
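A supervised machine learning model that estimates the probability that an input belongs to a category. In text analysis, it is trained on labeled examples to classify text into a known set of categories.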
Sentiment analysis/opinion mining
A text classification task that determines whether a text expresses a positive or negative opinion. A model can be trained on labeled examples, such as restaurant reviews marked 1 (positive) or 0 (negative).
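The two cards above can be tied together with a minimal sketch: a logistic regression sentiment classifier written in plain Python. The four reviews, the bag-of-words features, and the names `featurize`, `train`, and `predict_proba` are all assumptions made for illustration, not part of any particular service or library.

```python
import math

# Hypothetical toy reviews, labeled 1 (positive) or 0 (negative).
REVIEWS = [
    ("great food and friendly service", 1),
    ("delicious meal great atmosphere", 1),
    ("terrible food and rude service", 0),
    ("awful meal terrible atmosphere", 0),
]

def featurize(text, vocab):
    """Bag-of-words vector: how often each vocabulary word occurs in the text."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=200, lr=0.5):
    """Fit weights with stochastic gradient descent on the log loss."""
    vocab = sorted({w for text, _ in data for w in text.lower().split()})
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for text, label in data:
            x = featurize(text, vocab)
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - label  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return vocab, weights, bias

def predict_proba(text, model):
    """Probability that the text is positive (label 1)."""
    vocab, weights, bias = model
    x = featurize(text, vocab)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

model = train(REVIEWS)
print(predict_proba("great delicious food", model))    # above 0.5 means positive
print(predict_proba("terrible awful service", model))  # below 0.5 means negative
```

A real sentiment model would be trained on far more data and would usually rely on a library implementation; the sketch only shows how 1/0 labels turn into a probability.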
Named entity recognition
A text analysis function that identifies and categorizes the entities mentioned in a text, such as people, places, and events.
Key phrase extraction
A text analysis function that identifies the main ideas of an unstructured text
Summarization
A text analysis function that summarizes a text by identifying key information
What does it mean when a language detection service returns a language score of NaN and/or value of Unknown?
The service could not determine the language. Either the text is ambiguous (for example, it contains only emojis) or it is written in a language that the model was not trained on.
In the context of conversational language understanding, what is the definition of an entity?
The specific item that the utterance refers to, such as the “fan” in “turn on the fan”
In the context of conversational language understanding, how is ‘intent’ defined?
Intent is the goal or purpose expressed in an utterance given to a bot, such as “turn on” in “turn on the fan” or “turn off” in “turn off the light”
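To make the difference between entity and intent concrete, here is a toy sketch that pulls both out of an utterance with simple keyword matching. A conversational language understanding service uses a trained model, not keyword rules; the intent names `TurnOn` and `TurnOff`, the entity list, and the `parse` function are hypothetical.

```python
# Hypothetical intents and entities for a home-automation bot.
INTENT_PHRASES = {"TurnOn": "turn on", "TurnOff": "turn off"}
ENTITIES = ("fan", "light")

def parse(utterance):
    """Return the first matching intent and every entity found in the utterance."""
    text = utterance.lower()
    intent = next(
        (name for name, phrase in INTENT_PHRASES.items() if phrase in text),
        "None",  # fallback when no intent phrase matches
    )
    entities = [entity for entity in ENTITIES if entity in text]
    return {"intent": intent, "entities": entities}

print(parse("Turn on the fan"))
print(parse("Please turn off the light"))
```

The intent answers "what should happen?" and the entity answers "to what?".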
What is ‘authoring’?
Defining the entities, intents, and sample utterances that the model will be trained on. You can create your own or start from ones that were pre-built for common scenarios.
What is tokenization?
Breaking down a sentence or utterance into separate units called tokens, such as words, grammatical markings, and morphemes
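A minimal sketch of tokenization using a regular expression. It splits text into words and punctuation marks only; splitting words into morphemes would need a more sophisticated tokenizer. The function name `tokenize` is an assumption for illustration.

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+(?:'\w+)? keeps contractions such as "don't" together;
    # [^\w\s] emits each punctuation mark as its own token.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

print(tokenize("Turn on the fan, please!"))
# ['Turn', 'on', 'the', 'fan', ',', 'please', '!']
```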
How do you normalize a text?
By converting it to lower case and removing punctuation. This can discard useful information, because upper case may indicate a name or title.
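The normalization described above can be sketched with the Python standard library. The function name `normalize` is an assumption, and only ASCII punctuation is removed.

```python
import string

def normalize(text):
    """Lower-case the text and strip ASCII punctuation."""
    lowered = text.lower()
    # str.maketrans with a third argument builds a table that deletes those characters.
    return lowered.translate(str.maketrans("", "", string.punctuation))

print(normalize("Hello, World! It's Paris."))
# hello world its paris
```

Note that "Paris" becomes "paris", which is exactly the loss of the name signal that the card warns about.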
What is an n-gram?
A sequence of n consecutive words (tokens) in a text. A bigram has two words, like “he walked” or “she drank”, and a trigram has three, like “I have the” or “I don’t want”. Frequent n-grams show which words are commonly seen together.
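A short sketch of n-gram extraction using a sliding window over a list of tokens. The function name `ngrams` and the sample sentence are assumptions for illustration.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I don't want the soup".split()
print(ngrams(tokens, 2))
# [('I', "don't"), ("don't", 'want'), ('want', 'the'), ('the', 'soup')]
print(ngrams(tokens, 3))
# [('I', "don't", 'want'), ("don't", 'want', 'the'), ('want', 'the', 'soup')]
```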
Define stop-word removal
Removal of common words like “the”, “an”, and “a” that carry little meaning on their own. They help a sentence read naturally but add little to its main ideas.
Why would you use stop-word removal?
To reduce noise so that the ML model can focus on the key words and phrases that carry the meaning of the text
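A minimal sketch of stop-word removal. The stop-word set here contains only the three words from the card and is an assumption; real stop-word lists are much longer and vary by language and library.

```python
# Illustrative stop-word list taken from the card above; real lists are longer.
STOP_WORDS = {"a", "an", "the"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop any token whose lower-cased form is a stop word."""
    return [token for token in tokens if token.lower() not in stop_words]

tokens = "The waiter brought an appetizer and a drink".split()
print(remove_stop_words(tokens))
# ['waiter', 'brought', 'appetizer', 'and', 'drink']
```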
What is ‘stemming’?
Reducing words to their root so that related forms are grouped under the same token, for example “powerful”, “powered”, and “power” all becoming “power”
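A deliberately naive sketch of stemming by suffix stripping. The suffix list and the function name `stem` are assumptions chosen to fit the example on the card; established stemmers such as the Porter stemmer apply many more rules.

```python
# Illustrative suffixes only; a real stemmer handles many more cases.
SUFFIXES = ("ful", "ed", "ing", "s")

def stem(word):
    """Strip the first matching suffix, keeping a root of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([stem(w) for w in ["powerful", "powered", "power"]])
# ['power', 'power', 'power']
```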