NLP 01 Flashcards

1
Q

What is most likely the first step of NLP?

cutting board

A

Text preprocessing

2
Q

What is noise removal?

front of fridge

A

Stripping text of formatting (e.g., HTML tags).

3
Q

What is Tokenization?

under sink door

A

breaking text into individual words
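
A crude tokenizer can be sketched with the standard library's re module (an illustration only, not nltk's word_tokenize, which handles punctuation and contractions more carefully):

```python
import re

def tokenize(text):
    # Pull out runs of letters (and apostrophes) as word tokens
    return re.findall(r"[A-Za-z']+", text)

tokens = tokenize("Tokenization breaks text into words.")
print(tokens)  # ['Tokenization', 'breaks', 'text', 'into', 'words']
```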

4
Q

What is normalization?

A

Cleaning text data in any way other than noise removal and tokenization

5
Q

What is stemming?

A

It is a blunt axe used to chop off word prefixes and suffixes.
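
The "blunt axe" idea can be shown with a toy suffix stripper (purely illustrative; real stemmers such as nltk's PorterStemmer apply far more careful rules):

```python
def crude_stem(word):
    # Bluntly chop common suffixes -- a toy sketch, not the Porter algorithm
    for suffix in ("ization", "ational", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

print([crude_stem(w) for w in ["jumping", "jumped", "jumps", "token"]])
# ['jump', 'jump', 'jump', 'token']
```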

6
Q

What is lemmatization?

coat closet

A

It is a scalpel to bring words down to their root forms

7
Q

What would I import to use regex?

A

import re

8
Q

What python package could I use for NLP?

A

import nltk

9
Q

What method of nltk would I use to tokenize text?

A

from nltk.tokenize import word_tokenize

10
Q

Give an example of a list comprehension:

A

lemmatized = [lemmatizer.lemmatize(token) for token in tokenized]

11
Q

How would you import WordNetLemmatizer?

A

from nltk.stem import WordNetLemmatizer

12
Q

How would you import PorterStemmer?

A

from nltk.stem import PorterStemmer

13
Q

By default, lemmatize() treats every word as a…?

A

Noun

14
Q

Language models are probabilistic machine models of …?

A

language used for NLP comprehension tasks

15
Q

Language models learn a …?

A

probability of word occurrence over a sequence of words and use it to estimate the relative likelihood of different phrases.

16
Q

Common language models include:

A

Statistical models:
- bag of words (unigram model)
- n-gram models
Neural Language Modeling(NLM)

17
Q

What is text similarity in NLP?

A

Text similarity is a facet of NLP concerned with the similarity between texts.

18
Q

What are two popular text similarity metrics?

A
  • Levenshtein distance
  • cosine similarity
19
Q

How would you describe the metric: Levenshtein distance?

A

It is defined as the minimum number of edit operations (deletions, insertions, or substitutions) required to transform one text into another.
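
That definition translates directly into the classic dynamic-programming algorithm; a self-contained sketch:

```python
def levenshtein(a, b):
    # prev[j] holds the edit distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```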

20
Q

Define the metric: Cosine similarity

A

It is defined as the cosine of the angle between two vectors. To determine the cosine similarity, text documents need to be converted into vectors.

21
Q

What are common forms of language prediction?

A
  • Auto-suggest and suggested replies
22
Q

Natural Language processing is concerned with …?

A

enabling computers to interpret, analyze, and approximate the generation of human speech.

23
Q

What is Parsing w.r.t NLP?

A

it is the process concerned with segmenting text based on syntax

24
Q

What is Part-Of-Speech tagging?

A

It identifies parts of speech (verbs, nouns, adjectives, etc.).

25
Q

What helps computers understand the relationship between the words in a sentence?

A

A dependency grammar tree

26
Q

What does a Dependency grammar tree help you understand?

A

The relationship between the words in a sentence.

27
Q

What does NER stand for?

A

Named entity recognition

28
Q

What does NER help identify?

A

Proper nouns (e.g., “Natalia” or “Berlin”) in a text. This can be a clue to figure out the topic of the text.

29
Q

When you have ____ coupled with POS tagging, you can identify specific phrase chunks

A

Regex parsing

30
Q

When you couple Regex parsing and POS tagging you can…?

A

identify specific phrase chunks.

31
Q

A very common unigram model, a statistical language model, is commonly known as…?

front door

A

The Bag-Of-Words

32
Q

Bag-of-words can be an excellent way of looking at language when you want to make predictions concerning…?

A

the topic or sentiment of a text

When grammar and word order are irrelevant, this is a good model.

33
Q

What would I import to get word counts for the bag-of-words model?

A

from collections import Counter
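
For example, Counter turns a token list straight into bag-of-words counts:

```python
from collections import Counter

text = "the cat sat on the mat and the dog sat too"
bag_of_words = Counter(text.split())
print(bag_of_words.most_common(2))  # [('the', 3), ('sat', 2)]
```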

34
Q

How would I import a part-of-speech function for lemmatization?

A

from part_of_speech import get_part_of_speech

35
Q

For parsing entire phrases or conducting language prediction, you will want a model that…?

A

pays attention to each word’s neighbors.

36
Q

Unlike bag-of-words, the n-gram model considers a ….?

A

….sequence of some number (n) units and calculates the probability of each unit in a body of language given the preceding sequence of length n.

Because of this, n-gram probabilities with larger n values can be impressive at language prediction.
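
Extracting the n-gram sequences themselves can be sketched as:

```python
def ngrams(tokens, n):
    # Slide a window of length n over the token list
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I love to read books".split()
print(ngrams(tokens, 2))
# [('I', 'love'), ('love', 'to'), ('to', 'read'), ('read', 'books')]
```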

37
Q

What tactic can help adjust probabilities for unknown words but is not always ideal?

A

Language smoothing

38
Q

What is Language smoothing?

A

a tactic that can help adjust probabilities for unknown words, but it isn’t always ideal

39
Q

For a model that more accurately predicts human language patterns, you want n (your sequence length) …?

A

…to be as large as possible.

40
Q

What happens if you make your n-grams too long?

A

The number of examples to train off of shrinks and you won’t have enough to train on.

41
Q

What are the common neural language models (NLMs)?

A
  1. LSTMs
  2. Transformer models
42
Q

What is Topic Modeling?

A

It is an area of NLP dedicated to uncovering latent, or hidden, topics within a body of language.

43
Q

A common technique is to deprioritize the most common words and prioritize less frequently used terms as topics in a process known as …?

A

term frequency-inverse document frequency (tf-idf)
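
A minimal, smoothing-free sketch of the tf-idf computation (libraries such as sklearn and gensim use slightly different weighting and normalization formulas):

```python
import math

docs = [
    "the cat sat".split(),
    "the dog barked".split(),
    "the cat ran".split(),
]

def tf_idf(term, doc, docs):
    tf = doc.count(term) / len(doc)                 # term frequency in this doc
    df = sum(1 for d in docs if term in d)          # documents containing term
    idf = math.log(len(docs) / df)                  # inverse document frequency
    return tf * idf

# 'the' appears in every document, so its idf (and tf-idf) is 0;
# 'dog' is rarer, so it scores higher
print(tf_idf("the", docs[1], docs))  # 0.0
print(tf_idf("dog", docs[1], docs))
```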

44
Q

What libraries in Python have modules to handle tf-idf?

A

gensim and sklearn

45
Q

What is LDA or Latent Dirichlet allocation?

A

LDA is a statistical model that takes your documents and determines which words keep popping up together in the same contexts (i.e., documents).

46
Q

What is word embedding?

A

The process of word-to-vector mapping

47
Q

word-to-vector mapping is also called?

A

word embedding

48
Q

If you would like to visualize topic model results, you could use…?

A

word2vec:
* it is a great technique that can map out your topic model results spatially as vectors, so that similarly used words are closer together.

49
Q

How is the Levenshtein distance calculated?

A

The distance is calculated as the minimum number of insertions, deletions, and substitutions that would need to occur for one word to become another.

50
Q

Define: Levenshtein distance

A

the minimal edit distance between two words.

51
Q

What is phonetic similarity?

A

how much words or phrases sound the same.

52
Q

Define: Lexical Similarity

window over kitchen sink

A

the degree to which texts use the same vocabulary and phrases

53
Q

Define: Semantic similarity

A

the degree to which documents contain similar meaning or topics

54
Q

Addressing ________ _________ - including spelling correction - is a major challenge within natural language processing

A

Text similarity

55
Q

What is it called when documents/text contain similar meaning or topics?

A

Semantic similarity

56
Q

What is it called when documents/texts use the same vocabulary and phrases to the same degree?

Window over kitchen sink

A

Lexical similarity

57
Q

How would I import a tool to measure the Levenshtein distance?

A

from nltk.metrics import edit_distance

58
Q

what python module has a built-in function to check the levenshtein distance?

A

nltk

59
Q

What is the application of NLP concerned with predicting text given preceding text?

A

Language prediction

60
Q

What is the first step to language prediction?

A

It is picking your language model

61
Q

Bag of words alone is generally …?

A

not a great model for language prediction.

62
Q

W.r.t. language prediction, if you go the n-gram route, which model will you most likely pick?

Magnetic knife holder

A

Markov chains

63
Q

Define the language model: Markov chains

Magnetic knife holder

A

The model that predicts the statistical likelihood of each following word (or character) based on the training corpus.
Markov chains are memoryless and make statistical predictions based entirely on the current n-gram on hand.
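
A toy sketch of that idea: map each n-gram to the words that followed it in training, then sample one (the corpus here is made up for illustration):

```python
import random
from collections import defaultdict

def build_markov_chain(tokens, n=2):
    # Map each n-gram to the words observed to follow it in the corpus
    chain = defaultdict(list)
    for i in range(len(tokens) - n):
        chain[tuple(tokens[i:i + n])].append(tokens[i + n])
    return chain

corpus = "the cat sat on the mat the cat ran on the grass".split()
chain = build_markov_chain(corpus)
# Predict by sampling from what followed ('on', 'the') in training
print(random.choice(chain[("on", "the")]))  # 'mat' or 'grass'
```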

64
Q

What is a supervised machine learning algorithm that leverages a probabilistic theorem to make predictions and classifications?

A

Naive Bayes classifiers

65
Q

Define:

sentiment analysis

A

Determining whether a given block of language expresses negative or positive feelings.

66
Q

Text preprocessing is a stage of…?

A

NLP focused on cleaning and preparing text for other NLP tasks

67
Q

Parsing is an ….?

A

NLP technique concerned with breaking up text based on syntax

68
Q

What are two python libraries that can handle syntax parsing?

A

gensim & sklearn

69
Q

What are common text preprocessing steps?

A

Noise removal, tokenization, and normalization (including stemming, lemmatization, and stopword removal)

70
Q

Tokenization will… ?

A

break multi-word strings into smaller components

71
Q

Normalization is a ….?

A

A catch-all term for processing text data; this includes stemming and lemmatization.

72
Q

Noise removal is when we…?

A

remove unnecessary characters and formatting

73
Q

Stemming is….?

A

A text preprocessing normalization task concerned with bluntly removing word affixes (prefixes and suffixes)

74
Q

Lemmatization is a ….?

Coat closet

A

A text preprocessing normalization task concerned with bringing words down to their root forms.

https://www.codecademy.com/learn/paths/data-science-nlp/tracks/dsnlp-text-preprocessing/modules/nlp-text-preprocessing/cheatsheet

75
Q

Stopword Removal is the process of ….?

A

removing words from a string that don’t provide any information about the tone of a statement.

https://www.codecademy.com/learn/paths/data-science-nlp/tracks/dsnlp-text-preprocessing/modules/nlp-text-preprocessing/cheatsheet
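
For example, with a small hand-rolled stopword set (nltk.corpus.stopwords provides a much fuller list):

```python
# A tiny illustrative stopword set; real lists contain over a hundred words
stop_words = {"the", "a", "an", "is", "are", "of", "to"}

tokens = "the movie is a masterpiece of suspense".split()
filtered = [t for t in tokens if t not in stop_words]
print(filtered)  # ['movie', 'masterpiece', 'suspense']
```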

76
Q

Using part-of-speech tagging can…?

A

improve the results of lemmatization

77
Q

What are two common Python libraries used in text preprocessing?

A

NLTK and re

78
Q

_________ is a technique that developers use in a variety of domains

A

Text cleaning

79
Q

When you are text cleaning you may want to remove unwanted info such as:
1. ` ______??________`
2. Special Characters
3. Numeric digits
4. Leading, ending, and vertical whitespace
5. HTML formatting

A

Punctuation and accents
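
Those cleaning steps can be sketched with re (the sample string and patterns are illustrative; removing accents would additionally need unicodedata):

```python
import re

text = "  <b>Déjà vu!</b> Call 555-0123 now...  "

no_html = re.sub(r"<[^>]+>", "", text)            # strip HTML formatting
no_digits = re.sub(r"\d+", "", no_html)           # drop numeric digits
no_punct = re.sub(r"[^\w\s]", "", no_digits)      # drop punctuation/special chars
cleaned = re.sub(r"\s+", " ", no_punct).strip()   # collapse and trim whitespace
print(cleaned)  # Déjà vu Call now
```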

80
Q

When you are text cleaning you may want to remove unwanted info such as:
1. Punctuation and accents
2. ` ______??________`
3. Numeric digits
4. Leading, ending, and vertical whitespace
5. HTML formatting

A

Special Characters

81
Q

When you are text cleaning you may want to remove unwanted info such as:
1. Punctuation and accents
2. Special Characters
3. —??—
4. Leading, ending, and vertical whitespace
5. HTML formatting

A

Numeric digits

82
Q

When you are text cleaning you may want to remove unwanted info such as:
1. Punctuation and accents
2. Special Characters
3. Numeric digits
4. —??—
5. HTML formatting

A

Leading, ending, and vertical whitespace

83
Q

When you are text cleaning you may want to remove unwanted info such as:
1. Punctuation and accents
2. Special Characters
3. Numeric digits
4. Leading, ending, and vertical whitespace
5. —??—

A

HTML formatting

84
Q

The type of noise you need to remove from text usually depends on the ….?

A

source

(e.g., a marketing journal vs. a medical journal)

85
Q

You can use the _____ method in Python’s regular expression library for most of your noise removal needs.

A

.sub()

86
Q

The .sub() method has three required arguments:

  1. —?—
  2. replacement_text – text that replaces all matches in the input string
  3. input – the input string that will be edited by the .sub() method

Top of Fridge

A

pattern – a regular expression that is searched for in the input string. There must be an r preceding the string to indicate it is a raw string, which treats backslashes as literal characters.

87
Q

The .sub() method has three required arguments:

  1. pattern – a regular expression that is searched for in the input string. There must be an r preceding the string to indicate it is a raw string, which treats backslashes as literal characters.
  2. —?—
  3. input – the input string that will be edited by the .sub() method

Top of fridge - ingredients

A

replacement_text – text that replaces all matches in the input string

88
Q

The .sub() method has three required arguments:

  1. pattern – a regular expression that is searched for in the input string. There must be an r preceding the string to indicate it is a raw string, which treats backslashes as literal characters.
  2. replacement_text – text that replaces all matches in the input string
  3. —?—

top of fridge , ingrediants

A

input – the input string that will be edited by the .sub() method

89
Q

The method .sub() returns a ….?

A

a string with all instances of the pattern replaced by the replacement_text.

90
Q

How could you remove the HTML tag <p> from a string?

A

import re

text = "<p>This is a paragraph</p>"
result = re.sub(r'<.?p>', '', text)
print(result)  # This is a paragraph

91
Q

What is it common practice to replace HTML tags with?

A

empty string ''