NLP Flashcards

1
Q

what is NLP + examples

A

designing algorithms that allow computers to “understand” natural language and perform useful tasks

  • language translation, making appointments, spell checks, sentiment analysis, chatbots
2
Q

steps: NLP Tasks (hint: there are 8)

A
  1. Sentence segmentation
  2. Word tokenization
  3. Predicting parts of speech for each token
  4. Text lemmatization
  5. Eliminating stop words
  6. Dependency parsing, finding noun phrases
  7. Named Entity Recognition
  8. Coreference Resolution
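
A minimal sketch of all eight steps in code, using spaCy (this assumes spacy is installed and the en_core_web_sm model has been downloaded; the example sentence is illustrative):

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("London is the capital of England. It is a large city.")

    for sent in doc.sents:                  # 1. sentence segmentation
        print(sent.text)
    for token in doc:                       # 2-5. tokens, POS tags, lemmas, stop words
        print(token.text, token.pos_, token.lemma_, token.is_stop)
    for chunk in doc.noun_chunks:           # 6. dependency parse / noun phrases
        print(chunk.text, chunk.root.dep_)
    for ent in doc.ents:                    # 7. named entity recognition
        print(ent.text, ent.label_)
    # 8. coreference resolution needs an extra component (e.g. coreferee);
    #    it is not part of the base spaCy pipeline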
3
Q

what is step 1 of NLP tasks

A

Sentence segmentation
- break paragraph into individual sentences
- easier to understand each sentence separately

4
Q

what is step 2 of NLP tasks

A

Word tokenization
- split sentence into individual words

5
Q

what is step 3 of NLP tasks

A

Predicting parts of speech for each token
- e.g. London (noun), is (verb), the (determiner), etc.

6
Q

what is step 4 of NLP tasks

A

Text lemmatization
- identify base form of each word
- e.g. “pony” is the base form of “ponies”
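
The same pony example as a small sketch with NLTK's WordNet lemmatizer (assumes nltk is installed and nltk.download("wordnet") has been run once):

    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    print(lemmatizer.lemmatize("ponies"))        # -> pony
    print(lemmatizer.lemmatize("was", pos="v"))  # -> be (the POS tag matters)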

7
Q

what is step 5 of NLP tasks

A

Eliminating stop words
- removing common words (e.g. is, the, and)
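
A small sketch of stop-word removal with NLTK's built-in English list (assumes nltk.download("stopwords") has been run once):

    from nltk.corpus import stopwords

    stops = set(stopwords.words("english"))
    tokens = ["london", "is", "the", "capital", "and", "largest", "city"]
    print([t for t in tokens if t not in stops])
    # -> ['london', 'capital', 'largest', 'city']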

8
Q

what is step 6 of NLP tasks

A

Dependency parsing
- find out how words in the paragraph relate to each other

Finding noun phrases
- group together the words that represent a single idea or thing

9
Q

what is step 7 of NLP tasks

A

Named Entity Recognition
- Detect and label nouns with the real-world concepts they represent (e.g. geographic entity, person, organisation)

10
Q

what is step 8 of NLP tasks

A

Coreference Resolution
- associating pronouns with corresponding nouns
- e.g. “London … It …” (“It” == London)

11
Q

what is WordNet and problems associated with it

A

a large lexical database of English
- Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept

problems:
- requires human labor to create and keep up to date
- hard to compute accurate word similarity
- exemplifies the general problems of rule-based or grammar-based NLP
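
A quick way to browse synsets is NLTK's WordNet interface (assumes nltk.download("wordnet") has been run once):

    from nltk.corpus import wordnet as wn

    for synset in wn.synsets("bank")[:3]:        # each synset is one distinct concept
        print(synset.name(), "-", synset.definition())

    print(wn.synset("car.n.01").lemma_names())   # the synonyms grouped in one synset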

12
Q

Word Vector Representation + problems/solution

A

words represented by one-hot vectors
- vector dimension = number of words in the vocabulary

problem: one-hot vectors have no natural notion of similarity (every distinct pair is orthogonal)
solution: represent words based on their surrounding words, as a lower-dimensional dense vector rather than a one-hot vector; this is called a word embedding or word vector
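
A minimal numpy sketch of the problem (the vocabulary is a toy example):

    import numpy as np

    vocab = ["hotel", "motel", "cat"]
    one_hot = np.eye(len(vocab))     # vector dimension = vocabulary size

    hotel, motel = one_hot[0], one_hot[1]
    print(hotel @ motel)             # 0.0: every distinct pair is orthogonal,
                                     # so one-hot vectors encode no similarity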

13
Q

Vector Space Models + examples

A

represents words by their CONTEXT
- when a word appears, its context is the set of words that appear nearby (within a fixed-size window)

e.g. count-based methods, predictive methods, evaluating performance
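
A small sketch of extracting a word's fixed-window context (a window size of 2 and the sentence are toy choices):

    tokens = "government debt problems turning into banking crises".split()
    target = tokens.index("into")
    window = 2
    context = tokens[max(0, target - window):target] + tokens[target + 1:target + 1 + window]
    print(context)   # -> ['problems', 'turning', 'banking', 'crises']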

14
Q

Applications of word vectors

A
  • find other similar words
  • find associations
  • add word vectors to get vector for a paragraph
  • feed word vectors to deep learners to accomplish complex NLP tasks
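
A sketch of the first three applications using pretrained GloVe vectors via gensim (glove-wiki-gigaword-50 is one of gensim's downloadable models; any pretrained KeyedVectors works the same way):

    import gensim.downloader as api

    wv = api.load("glove-wiki-gigaword-50")

    # find other similar words
    print(wv.most_similar("london", topn=3))

    # find associations: king - man + woman ~ queen
    print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

    # crude vector for a "paragraph" by adding word vectors
    paragraph = wv["london"] + wv["is"] + wv["big"]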
15
Q

Similarity between Vectors (How?)

A

Dot product: high value = high similarity
Cosine distance: similar = 0
Cosine similarity: similar = 1
(cosine distance = 1 - cosine similarity)
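
A toy numpy comparison of the three measures:

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([2.0, 4.0, 6.0])    # same direction as a -> maximally similar

    dot = a @ b                                              # 28.0 (high)
    cos_sim = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # 1.0
    cos_dist = 1 - cos_sim                                   # 0.0
    print(dot, cos_sim, cos_dist)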

16
Q

what are Count based Methods

A

Employ co-occurrence counts
- frequencies of words occurring in the same documents or sentences, or within some window of each other
- the count for a word pair is incremented each time the 2 words appear within a window of each other in a sentence

17
Q

how do Count based Methods work

A

Input: a corpus of sentences

Step 1: Compute the co-occurrence matrix X over the entire vocabulary of the corpus
- X contains counts of how often different words occur together

Step 2: Compute the Singular Value Decomposition (SVD) of the co-occurrence matrix
- reduce dimensions to the desired level (the desired size of the word vectors)
- X = USV^T
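
Both steps as a numpy sketch on a toy corpus (a window of ±1 and k = 2 are illustrative choices):

    import numpy as np

    corpus = [["i", "like", "nlp"], ["i", "like", "deep", "learning"]]
    vocab = sorted({w for sent in corpus for w in sent})
    idx = {w: i for i, w in enumerate(vocab)}

    # Step 1: co-occurrence matrix X over the vocabulary
    X = np.zeros((len(vocab), len(vocab)))
    window = 1
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if i != j:
                    X[idx[w], idx[sent[j]]] += 1

    # Step 2: SVD, X = U S V^T; keep the top-k dimensions as word vectors
    U, S, Vt = np.linalg.svd(X)
    k = 2
    word_vectors = U[:, :k] * S[:k]
    print(dict(zip(vocab, word_vectors.round(2))))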

18
Q

what are Predictive Methods

A
  1. Continuous Bag of Words (CBOW)
  2. Skip gram
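
Both methods are available in gensim's Word2Vec; the sg flag switches between them (the toy corpus and parameters are illustrative):

    from gensim.models import Word2Vec

    sentences = [["i", "like", "nlp"], ["i", "like", "deep", "learning"]]
    cbow = Word2Vec(sentences, vector_size=10, window=2, min_count=1, sg=0)      # CBOW
    skipgram = Word2Vec(sentences, vector_size=10, window=2, min_count=1, sg=1)  # skip gram
    print(skipgram.wv["nlp"])   # the learned 10-dimensional vector for "nlp"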
19
Q

Skip gram

A
  • uses the center WORD to PREDICT the target context
  • shallow neural network to represent the probability model
  • input: one-hot representation of the center word
  • output: probability of each word being in the context
  • loss function: cross entropy or softmax loss
  • 1 input, 1 hidden and 1 output layer
  • output layer employs softmax to get probability
  • NO activation function for hidden layer
  • 0 bias for hidden, output layer nodes
  • weights in matrices W and W’

refer to slides for math working
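
A minimal numpy sketch of one forward pass matching the description above (toy sizes; the full derivation is in the slides):

    import numpy as np

    V, d = 5, 3                        # vocabulary size, embedding dimension
    rng = np.random.default_rng(0)
    W = rng.normal(size=(V, d))        # input -> hidden weights (word vectors)
    W_prime = rng.normal(size=(d, V))  # hidden -> output weights (context vectors)

    x = np.zeros(V); x[2] = 1          # one-hot input for center word 2

    h = x @ W                          # hidden layer: no activation, no bias
    scores = h @ W_prime               # one score per vocabulary word
    probs = np.exp(scores) / np.exp(scores).sum()   # softmax

    context = 4                        # index of an observed context word
    loss = -np.log(probs[context])     # cross entropy (softmax) loss
    print(probs.round(3), loss)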