NLP Flashcards
what is NLP + examples
designs algorithms that allow computers to “understand” natural language in order to perform useful tasks
- e.g. language translation, making appointments, spell checking, sentiment analysis, chatbots
steps: NLP tasks (hint: there are 8)
- Sentence segmentation
- Word tokenization
- Predicting parts of speech for each token
- Text lemmatization
- Eliminating stop words
- Dependency parsing, finding noun phrases
- Named Entity Recognition
- Coreference Resolution
what is step 1 of NLP tasks
Sentence segmentation
- break paragraph into individual sentences
- easier to understand each sentence separately
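A minimal sketch of this step using spaCy (one common library, not named in these notes; assumes the en_core_web_sm model is installed via `python -m spacy download en_core_web_sm`):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London is the capital of England. It has a long history.")

# doc.sents yields one span per detected sentence
for sent in doc.sents:
    print(sent.text)
# London is the capital of England.
# It has a long history.
```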
what is step 2 of NLP tasks
Word tokenization
- split sentence into individual words
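Sketch with spaCy (same setup as the segmentation example above); iterating over a Doc yields the tokens:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London is the capital of England.")

# Each Token is a single word or punctuation mark
print([token.text for token in doc])
# ['London', 'is', 'the', 'capital', 'of', 'England', '.']
```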
what is step 3 of NLP tasks
Predicting parts of speech for each token
- tag each token, e.g. London (noun), is (verb), the (determiner), etc.
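Sketch with spaCy: each token carries a predicted part-of-speech tag:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London is the capital of England.")

for token in doc:
    # typical output: London PROPN, is AUX, the DET, capital NOUN, ...
    print(token.text, token.pos_)
```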
what is step 4 of NLP tasks
Text lemmatization
- identify base form of each word
- e.g. “pony” is the base form of “ponies”
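Sketch with spaCy: the lemma is exposed on each token:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The two ponies were eating")

print([(token.text, token.lemma_) for token in doc])
# ponies -> pony, were -> be, eating -> eat
```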
what is step 5 of NLP tasks
Eliminating stop words
- removing common words (e.g. is, the, and)
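Sketch with spaCy: tokens carry an is_stop flag, so filtering is a one-line comprehension:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London is the capital of England.")

content = [t.text for t in doc if not t.is_stop and not t.is_punct]
print(content)  # ['London', 'capital', 'England']
```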
what is step 6 of NLP tasks
Dependency parsing
- find out how words in the paragraph relate to each other
Finding noun phrases
- group together the words that represent a single idea or thing
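Sketch with spaCy: every token points to its syntactic head, and doc.noun_chunks gives the noun phrases:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London is the capital of England.")

# Dependency parse: each token has a labelled arc to its head
for token in doc:
    print(token.text, token.dep_, "->", token.head.text)

# Noun phrases: words grouped into a single idea or thing
print([chunk.text for chunk in doc.noun_chunks])
# ['London', 'the capital', 'England']
```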
what is step 7 of NLP tasks
Named Entity Recognition
- Detect and label nouns with the real-world concepts they represent (e.g. geographic entity, person, organisation)
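Sketch with spaCy: detected entities sit on doc.ents:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London was founded by the Romans.")

for ent in doc.ents:
    print(ent.text, ent.label_)
# typical output: London GPE (geopolitical entity), Romans NORP
```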
what is step 8 of NLP tasks
Coreference Resolution
- associating pronouns with corresponding nouns
- e.g. “London …. It ….” (“It” == London)
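Small spaCy models do not include coreference; as a toy illustration only, the naive heuristic below links each pronoun to the most recent proper noun (real coref systems such as coreferee are far more sophisticated):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London is the capital of England. It has a long history.")

last_name = None
for token in doc:
    if token.pos_ == "PROPN":
        last_name = token.text
    elif token.pos_ == "PRON" and last_name:
        print(f'"{token.text}" -> {last_name}')
# prints: "It" -> England -- the naive heuristic grabs the nearest
# name, whereas a real coref model should resolve "It" to London
```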
what is WordNet and problems associated with it
a large lexical database of English
- Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept
problems:
- requires human labor to create and keep up-to-date
- hard to compute accurate word similarity
- shares the general problems of rule-based / grammar-based NLP
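A quick look at WordNet via NLTK (a sketch, assuming nltk is installed and the corpus downloaded with nltk.download("wordnet")):

```python
from nltk.corpus import wordnet as wn

# Synsets: sets of cognitive synonyms, one per concept
for synset in wn.synsets("pony"):
    print(synset.name(), "-", synset.definition())

# Similarity from distance in the hypernym hierarchy -- crude,
# which is one reason WordNet word similarity is often inaccurate
dog = wn.synset("dog.n.01")
cat = wn.synset("cat.n.01")
print(dog.path_similarity(cat))  # 0.2
```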
Word Vector Representation + problems/solution
words represented by one-hot vectors
- vector dimension = number of words in the vocabulary
problem: no natural notion of similarity for one-hot vectors (any two different words are orthogonal)
solution: represent words based on their surrounding words
- use a lower-dimensional dense vector rather than a one-hot vector
- known as a word embedding or word vector
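A NumPy sketch of why one-hot vectors have no notion of similarity, contrasted with made-up dense embeddings:

```python
import numpy as np

vocab = ["hotel", "motel", "banana"]
one_hot = np.eye(len(vocab))  # one axis per word in the vocabulary

# Any two different one-hot vectors are orthogonal:
print(one_hot[0] @ one_hot[1])  # hotel . motel  = 0.0
print(one_hot[0] @ one_hot[2])  # hotel . banana = 0.0 (same score!)

# Hypothetical 2-d dense embeddings (values invented for illustration):
embed = {"hotel":  np.array([0.9, 0.2]),
         "motel":  np.array([0.8, 0.3]),
         "banana": np.array([-0.5, 0.7])}
print(embed["hotel"] @ embed["motel"])   # 0.78  -> similar words
print(embed["hotel"] @ embed["banana"])  # -0.31 -> dissimilar words
```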
Vector Space Models + examples
represents words by their CONTEXT
- when a word appears, its context is the set of words that appear nearby (within a fixed-size window)
e.g. count-based methods, predictive methods (plus ways of evaluating their performance)
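A toy count-based method: co-occurrence counts within a fixed-size window (window = 1, three-sentence corpus invented for illustration):

```python
from collections import defaultdict

corpus = ["i like deep learning", "i like nlp", "i enjoy flying"]
window = 1

counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        # count every word within `window` positions of w
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                counts[w][words[j]] += 1

# A word's vector is its row of co-occurrence counts
print(dict(counts["like"]))  # {'i': 2, 'deep': 1, 'nlp': 1}
```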
Applications of word vectors
- find other similar words
- find associations
- add word vectors to get a vector for a paragraph (see the sketch after this list)
- feed word vectors to deep learners to accomplish complex NLP tasks
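Sketch of the paragraph-vector idea from the list above: average the word vectors (vector values invented for illustration):

```python
import numpy as np

# Hypothetical pre-trained word vectors
word_vecs = {"london": np.array([0.8, 0.1]),
             "is":     np.array([0.0, 0.0]),
             "great":  np.array([0.3, 0.9])}

def paragraph_vector(text):
    """Average the vectors of the known words in `text`."""
    vecs = [word_vecs[w] for w in text.lower().split() if w in word_vecs]
    return np.mean(vecs, axis=0)

print(paragraph_vector("London is great"))  # [0.3667 0.3333]
```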
Similarity between Vectors (How?)
Dot product: high value = high similarity
Cosine distance: similar vectors ≈ 0
Cosine similarity: similar vectors ≈ 1
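The three measures side by side in NumPy:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([2.0, 4.0])  # same direction as a, so maximally similar

dot = a @ b                                              # 10.0 (high)
cos_sim = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # ~1.0
cos_dist = 1.0 - cos_sim                                 # ~0.0
print(dot, cos_sim, cos_dist)
```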