NLP _ 01 Flashcards
What is most likely the first step of NLP?
Text preprocessing
What is noise removal?
Stripping text of formatting (e.g., HTML tags).
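A minimal sketch of noise removal using only the standard `re` module. The helper name `strip_html_tags` and the sample string are made up for illustration, and the pattern is a simple approximation, not a full HTML parser.

```python
import re

def strip_html_tags(text):
    # Remove anything that looks like an HTML tag: "<", some non-">" characters, ">".
    # This is a rough heuristic and will not handle every HTML edge case.
    return re.sub(r"<[^>]+>", "", text)

cleaned = strip_html_tags("<p>Hello <b>world</b>!</p>")
print(cleaned)
```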
What is tokenization?
Breaking text into individual words (tokens).
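A rough sketch of tokenization with a regular expression, as a stand-in for a library tokenizer such as NLTK's `word_tokenize`. The function name `simple_tokenize` is hypothetical, and the pattern splits on contractions like "don't", which a real tokenizer handles more carefully.

```python
import re

def simple_tokenize(text):
    # Match either a run of word characters or a single punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("The cat sat on the mat.")
print(tokens)
```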
What is normalization?
Cleaning text data in any way other than noise removal and tokenization.
What is stemming?
It is a blunt axe that chops off word prefixes and suffixes.
What is lemmatization?
It is a scalpel that brings words down to their root forms.
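To make the axe-versus-scalpel contrast concrete, here is a toy comparison. `crude_stem`, `LEMMA_TABLE`, and `toy_lemmatize` are hypothetical names invented for this card. This is not the Porter algorithm or WordNet; it only illustrates why blind suffix chopping can leave non-words while a dictionary lookup returns real root forms.

```python
def crude_stem(word):
    # Blunt axe: chop the first matching suffix, keeping at least 3 characters.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Scalpel: a tiny hand-written lookup table standing in for a real lexicon.
LEMMA_TABLE = {"running": "run", "studies": "study", "was": "be", "geese": "goose"}

def toy_lemmatize(word):
    # Fall back to the word itself when it is not in the table.
    return LEMMA_TABLE.get(word, word)

words = ["running", "studies", "was", "geese"]
stems = [crude_stem(w) for w in words]
lemmas = [toy_lemmatize(w) for w in words]
print(stems)
print(lemmas)
```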
What would I import to use regex?
import re
What Python package could I use for NLP?
import nltk
What function from nltk would I use to tokenize text?
from nltk.tokenize import word_tokenize
Give an example of a list comprehension:
lemmatized = [lemmatizer.lemmatize(token) for token in tokenized]
How would you import WordNetLemmatizer?
from nltk.stem import WordNetLemmatizer
How would you import PorterStemmer?
from nltk.stem import PorterStemmer
By default, lemmatize() treats every word as a …?
A noun
Language models are probabilistic machine models of …?
language used for NLP comprehension tasks
Language models learn a …?
probability of word occurrence over a sequence of words and use it to estimate the relative likelihood of different phrases.
Common language models include:
Statistical models:
- bag of words (unigram model)
- n-gram models
Neural Language Modeling (NLM)
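As a small illustration of the statistical models listed above, this sketch extracts n-grams from a token list. The helper name `ngrams` and the sample sentence are assumptions for the example; with n=1 it yields the unigrams that a bag-of-words model counts.

```python
from collections import Counter

def ngrams(tokens, n):
    # Slide a window of length n across the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
unigram_counts = Counter(ngrams(tokens, 1))
bigram_counts = Counter(ngrams(tokens, 2))
print(unigram_counts.most_common(1))
print(bigram_counts)
```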
What is text similarity in NLP?
Text similarity is a facet of NLP concerned with the similarity between texts.
What are two popular text similarity metrics?
- Levenshtein distance
- cosine similarity
How would you describe the metric Levenshtein distance?
It is defined as the minimum number of edit operations (deletions, insertions, or substitutions) required to transform one text into another.
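The definition above can be sketched with the standard dynamic-programming approach, keeping only one previous row of the edit table. The function name `levenshtein` is chosen for this example; each operation is assumed to cost 1.

```python
def levenshtein(a, b):
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))
```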
Define the metric : Cosine similarity
It is defined as the cosine of the angle between two vectors. To determine the cosine similarity, text documents need to be converted into vectors.
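A minimal sketch of cosine similarity, assuming the simplest possible vectorization: raw word counts from whitespace splitting. The function name `cosine_similarity` is chosen for this example; real pipelines often use TF-IDF or embeddings instead.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    # Turn each text into a word-count vector.
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Dot product over the words the two texts share.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)

print(cosine_similarity("the cat", "the dog"))
```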
What are common forms of language prediction?
- Auto-suggest and suggested replies
Natural language processing is concerned with …?
Enabling computers to interpret, analyze, and approximate the generation of human speech.
What is parsing w.r.t. NLP?
It is the process concerned with segmenting text based on syntax.
What is part-of-speech tagging?
It identifies parts of speech (verbs, nouns, adjectives, etc.).
What helps computers understand the relationship between the words in a sentence?
A dependency grammar tree
What does a Dependency grammar tree help you understand?
The relationship between the words in a sentence.
What does NER stand for?
Named entity recognition
What does NER help identify?
Proper nouns (e.g., “Natalia” or “Berlin”) in a text. This can be a clue to the topic of the text.
When you have ____ coupled with POS tagging, you can identify specific phrase chunks.
Regex parsing
When you couple regex parsing and POS tagging, you can …?
Identify specific phrase chunks.
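A toy sketch of how regex parsing over POS tags can pick out noun-phrase chunks. The tags are supplied by hand here, and `TAG_CODES` and `noun_phrase_chunks` are hypothetical names. Each tag is mapped to one letter so that string positions line up with token positions. NLTK's `RegexpParser` offers a fuller version of this idea.

```python
import re

# One-letter codes so that index i in the code string is token i.
TAG_CODES = {"DT": "D", "JJ": "J", "NN": "N"}

def noun_phrase_chunks(tagged_tokens):
    codes = "".join(TAG_CODES.get(tag, "O") for _, tag in tagged_tokens)
    chunks = []
    # Noun phrase: optional determiner, any adjectives, one or more nouns.
    for match in re.finditer(r"D?J*N+", codes):
        words = [word for word, _ in tagged_tokens[match.start():match.end()]]
        chunks.append(" ".join(words))
    return chunks

tagged = [("the", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumped", "VBD"),
          ("over", "IN"), ("a", "DT"), ("lazy", "JJ"), ("dog", "NN")]
chunks = noun_phrase_chunks(tagged)
print(chunks)
```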
A very common unigram model is a statistical language model commonly known as …?
Bag-of-words
Bag-of-words can be an excellent way of looking at language when you want to make predictions concerning …?
The topic or sentiment of a text.
When grammar and word order are irrelevant, this is a good model.
What would I import to get word counts for the bag-of-words model?
from collections import Counter
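A short sketch of building a bag of words with `Counter`. The helper name `bag_of_words` and the sample sentence are made up for this card, and the tokenization is a simple lowercase regex split, not a full preprocessing pipeline.

```python
import re
from collections import Counter

def bag_of_words(text):
    # Lowercase, pull out word-like runs, and count them; order is discarded.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)

bow = bag_of_words("The squid jumped out of the suitcase. The squid fled.")
print(bow.most_common(2))
```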
How would I import a part-of-speech function for lemmatization?
from part_of_speech import get_part_of_speech
For parsing entire phrases or conducting language prediction, you will want a model that …?
Pays attention to each word’s neighbors.
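One simple way to pay attention to neighbors is to count which word follows each word and predict the most frequent follower. This is a rough sketch; `build_bigram_model`, `predict_next`, and the training sentence are assumptions for illustration, not a production language model.

```python
from collections import Counter, defaultdict

def build_bigram_model(tokens):
    # For every word, count the words that appear immediately after it.
    followers = defaultdict(Counter)
    for current_word, next_word in zip(tokens, tokens[1:]):
        followers[current_word][next_word] += 1
    return followers

def predict_next(model, word):
    # Return the most frequent follower, or None for an unseen word.
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]

tokens = "i like tea and i like coffee and i drink tea".split()
model = build_bigram_model(tokens)
print(predict_next(model, "i"))
```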