NLP Flashcards
what is NLP + examples
designs algorithms that allow computers to “understand” natural language in order to perform useful tasks
- e.g. language translation, making appointments, spell checking, sentiment analysis, chatbots
steps: NLP tasks (hint: there are 8)
- Sentence segmentation
- Word tokenization
- Predicting parts of speech for each token
- Text lemmatization
- Eliminating stop words
- Dependency parsing, finding noun phrases
- Named Entity Recognition
- Coreference Resolution
what is step 1 of NLP tasks
Sentence segmentation
- break paragraph into individual sentences
- easier to understand each sentence separately
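A minimal sketch of this step using spaCy (one common library, not named in these notes; assumes the en_core_web_sm model is installed via `python -m spacy download en_core_web_sm`):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London is the capital of England. It has a long history.")

# doc.sents yields one span per detected sentence
for sent in doc.sents:
    print(sent.text)
# London is the capital of England.
# It has a long history.
```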
what is step 2 of NLP tasks
Word tokenization
- split sentence into individual words
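Sketch with spaCy (same setup as the segmentation example above); iterating over a Doc yields the tokens:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London is the capital of England.")

# Each Token is a single word or punctuation mark
print([token.text for token in doc])
# ['London', 'is', 'the', 'capital', 'of', 'England', '.']
```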
what is step 3 of NLP tasks
Predicting parts of speech for each token
- tag each token, e.g. London (noun), is (verb), the (determiner), etc.
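Sketch with spaCy: each token carries a predicted part-of-speech tag:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London is the capital of England.")

for token in doc:
    # typical output: London PROPN, is AUX, the DET, capital NOUN, ...
    print(token.text, token.pos_)
```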
what is step 4 of NLP tasks
Text lemmatization
- identify base form of each word
- e.g. “pony” is the base form of “ponies”
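Sketch with spaCy: the lemma is exposed on each token:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The two ponies were eating")

print([(token.text, token.lemma_) for token in doc])
# ponies -> pony, were -> be, eating -> eat
```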
what is step 5 of NLP tasks
Eliminating stop words
- removing common words (e.g. is, the, and)
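Sketch with spaCy: tokens carry an is_stop flag, so filtering is a one-line comprehension:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London is the capital of England.")

content = [t.text for t in doc if not t.is_stop and not t.is_punct]
print(content)  # ['London', 'capital', 'England']
```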
what is step 6 of NLP tasks
Dependency parsing
- find out how words in the paragraph relate to each other
Finding noun phrases
- group together the words that represent a single idea or thing
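Sketch with spaCy: every token points to its syntactic head, and doc.noun_chunks gives the noun phrases:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London is the capital of England.")

# Dependency parse: each token has a labelled arc to its head
for token in doc:
    print(token.text, token.dep_, "->", token.head.text)

# Noun phrases: words grouped into a single idea or thing
print([chunk.text for chunk in doc.noun_chunks])
# ['London', 'the capital', 'England']
```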
what is step 7 of NLP tasks
Named Entity Recognition
- Detect and label nouns with the real-world concepts they represent (e.g. geographic entity, person, organisation)
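Sketch with spaCy: detected entities sit on doc.ents:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London was founded by the Romans.")

for ent in doc.ents:
    print(ent.text, ent.label_)
# typical output: London GPE (geopolitical entity), Romans NORP
```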
what is step 8 of NLP tasks
Coreference Resolution
- associating pronouns with corresponding nouns
- e.g. “London …. It ….” (“It” == London)
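Small spaCy models do not include coreference; as a toy illustration only, the naive heuristic below links each pronoun to the most recent proper noun (real coref systems such as coreferee are far more sophisticated):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London is the capital of England. It has a long history.")

last_name = None
for token in doc:
    if token.pos_ == "PROPN":
        last_name = token.text
    elif token.pos_ == "PRON" and last_name:
        print(f'"{token.text}" -> {last_name}')
# prints: "It" -> England -- the naive heuristic grabs the nearest
# name, whereas a real coref model should resolve "It" to London
```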
what is WordNet and problems associated with it
a large lexical database of English
- Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept
problems:
- requires human labor to create and keep up-to-date
- hard to compute accurate word similarity
- shares the general problems of rule-based / grammar-based NLP
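A quick look at WordNet via NLTK (a sketch, assuming nltk is installed and the corpus downloaded with nltk.download("wordnet")):

```python
from nltk.corpus import wordnet as wn

# Synsets: sets of cognitive synonyms, one per concept
for synset in wn.synsets("pony"):
    print(synset.name(), "-", synset.definition())

# Similarity from distance in the hypernym hierarchy -- crude,
# which is one reason WordNet word similarity is often inaccurate
dog = wn.synset("dog.n.01")
cat = wn.synset("cat.n.01")
print(dog.path_similarity(cat))  # 0.2
```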
Word Vector Representation + problems/solution
words represented by one-hot vectors
- vector dimension = number of words in the vocabulary
problem: no natural notion of similarity for one-hot vectors (any two different words are orthogonal)
solution: represent words based on their surrounding words
- use a lower-dimensional dense vector rather than a one-hot vector
- known as a word embedding or word vector
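A NumPy sketch of why one-hot vectors have no notion of similarity, contrasted with made-up dense embeddings:

```python
import numpy as np

vocab = ["hotel", "motel", "banana"]
one_hot = np.eye(len(vocab))  # one axis per word in the vocabulary

# Any two different one-hot vectors are orthogonal:
print(one_hot[0] @ one_hot[1])  # hotel . motel  = 0.0
print(one_hot[0] @ one_hot[2])  # hotel . banana = 0.0 (same score!)

# Hypothetical 2-d dense embeddings (values invented for illustration):
embed = {"hotel":  np.array([0.9, 0.2]),
         "motel":  np.array([0.8, 0.3]),
         "banana": np.array([-0.5, 0.7])}
print(embed["hotel"] @ embed["motel"])   # 0.78  -> similar words
print(embed["hotel"] @ embed["banana"])  # -0.31 -> dissimilar words
```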
Vector Space Models + examples
represents words by their CONTEXT
- when a word appears, its context is the set of words that appear nearby (within a fixed-size window)
e.g. count-based methods, predictive methods (plus ways of evaluating their performance)
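A toy count-based method: co-occurrence counts within a fixed-size window (window = 1, three-sentence corpus invented for illustration):

```python
from collections import defaultdict

corpus = ["i like deep learning", "i like nlp", "i enjoy flying"]
window = 1

counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        # count every word within `window` positions of w
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                counts[w][words[j]] += 1

# A word's vector is its row of co-occurrence counts
print(dict(counts["like"]))  # {'i': 2, 'deep': 1, 'nlp': 1}
```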
Applications of word vectors
- find other similar words
- find associations
- add word vectors to get a vector for a paragraph (see the sketch after this list)
- feed word vectors to deep learners to accomplish complex NLP tasks
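Sketch of the paragraph-vector idea from the list above: average the word vectors (vector values invented for illustration):

```python
import numpy as np

# Hypothetical pre-trained word vectors
word_vecs = {"london": np.array([0.8, 0.1]),
             "is":     np.array([0.0, 0.0]),
             "great":  np.array([0.3, 0.9])}

def paragraph_vector(text):
    """Average the vectors of the known words in `text`."""
    vecs = [word_vecs[w] for w in text.lower().split() if w in word_vecs]
    return np.mean(vecs, axis=0)

print(paragraph_vector("London is great"))  # [0.3667 0.3333]
```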
Similarity between Vectors (How?)
Dot product: high value = high similarity
Cosine distance: similar vectors ≈ 0
Cosine similarity: similar vectors ≈ 1
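The three measures side by side in NumPy:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([2.0, 4.0])  # same direction as a, so maximally similar

dot = a @ b                                              # 10.0 (high)
cos_sim = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # ~1.0
cos_dist = 1.0 - cos_sim                                 # ~0.0
print(dot, cos_sim, cos_dist)
```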