NLP Basics Flashcards

1
Q

What is NLP?

A

Natural Language Processing

  • Natural Language (as opposed to Artificial Language)
  • Process with Computers
  • Combination of linguistics (the study of language), computer science, information engineering, and AI.
  • Text or spoken voice
  • Branch of AI that helps computers understand, interpret, and manipulate human language (SAS). Understanding and responding back.
2
Q

What do we mean by Natural Language?

A

Natural vs. Artificial Language

  • Natural language: evolved gradually over time, largely unconsciously, and is used for daily communication between people. Complex syntax, ambiguous semantics. Contains humor, irony, metaphor, connotation, neologisms.
  • Artificial language: designed, crafted, and invented consciously (e.g., programming languages, Elvish from The Lord of the Rings, Klingon from Star Trek, Esperanto, intended as an easy-to-learn international language for speakers of German, Italian, etc., and Interlingua). Usually rule-based. Readily parsed, unambiguous, subject to regular, consistent rules of interpretation.
  • Interesting fact: LISP and Prolog were created for AI, but are not used as much anymore.
  • Morse code is not an artificial language. It’s just a code for alphabet and numbers.
  • Flag semaphore is not an artificial language, it’s a code.
  • Braille is likewise a code, not an artificial language.
  • Sign language is not an artificial language.
  • The Oxford English Dictionary lists a few hundred thousand words.
  • Natural Languages were not designed to be processed by machine
3
Q

What are the two sides of NLP?

A

NLP is either NLU or NLG.

  • NLU: Natural Language Understanding. Trying to understand language that came from the ordinary world. The goal is a useful, structured interpretation of an input, such as speech or written text (e.g., producing topics). We don’t care whether the machine truly understands; the output just has to be something useful that we can understand. Search is an example of NLU.
  • NLG: Natural Language Generation. Producing natural language. We feed in something that is not sentences and want the machine to output sentences. Chatbots are examples of NLG.
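As a rough illustration of the NLG side, the sketch below turns a structured record into a sentence using a fixed template. The function name `describe_weather` and the record fields are hypothetical, chosen only for this example; real NLG systems are far more flexible than a template.

```python
def describe_weather(record):
    """Turn a structured record (not a sentence) into an English sentence."""
    # Compare today's high with yesterday's to choose a wording.
    trend = "warmer" if record["high_c"] > record["yesterday_high_c"] else "no warmer"
    return (
        f"{record['city']} will be {record['condition']} today with a high of "
        f"{record['high_c']} degrees Celsius, {trend} than yesterday."
    )

# Hypothetical input data, invented for illustration.
record = {"city": "Oslo", "condition": "cloudy", "high_c": 12, "yesterday_high_c": 9}
print(describe_weather(record))
```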
4
Q

Who was the first person to think about whether AI can exist?

A

Aristotle, over 2,000 years ago, considering toys and statues: could one create a creature so lifelike that it would fool people into thinking it was real? Aristotle said no. Leibniz took up the question as well, in the late 1600s.

5
Q

What are the applications of NLU?

A

Automated Text Annotation

  • Tagging (important words and phrases)
  • Metadata extraction/generation (data about the document, e.g., author of a document, date)
  • Classification (e.g., news article is sports, celebrities)
  • Document summarization

Corpus analytics (corpus is document collection)

  • Theme extraction
  • Clustering
  • Taxonomy mapping (mapping documents from one taxonomy to another taxonomy)
  • Sentiment analysis (usually across a corpus, e.g., look at billions of tweets and find the ones that have negative emotion, or cluster them)

Search applications

  • Query repair (e.g., for a typo, did you mean x?)
  • Query refinement (e.g., semantically ambiguous interpretation)
  • Result postprocessing (ranking, clustering, encapsulation)

Advanced applications

  • Machine translation
  • Knowledge discovery
  • Question handling (e.g., mapping questions to FAQs)
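The query repair item above ("did you mean x?") can be sketched with Python's standard `difflib` module. The vocabulary list and the `suggest` function are assumptions made up for this example; production search engines use much richer signals such as query logs.

```python
import difflib

# Hypothetical vocabulary of known query terms.
VOCABULARY = ["translation", "summarization", "classification", "sentiment"]

def suggest(term):
    """Return a 'did you mean' suggestion for a possibly misspelled term, or None."""
    matches = difflib.get_close_matches(term.lower(), VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else None

print(suggest("clasification"))  # a near miss of a known term
print(suggest("xyz"))            # nothing similar in the vocabulary
```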
6
Q

What are the applications of NLG?

A

Often a way of making NLU output more digestible to humans.

Text annotation (fancier version of NLU)

  • Document summarization, e.g., making a new sentence out of an old sentence and using it as the title of the article

Corpus analytics

  • Adding labels on top of clusters (e.g., cheesy recipes, spicy recipes)
  • Synopsizing corpus-wide topic and/or sentiment trends

Search applications

  • Advanced capsule generation
  • Advanced query refinement

Advanced applications

  • Machine translation
  • Knowledge discovery
  • Question handling (e.g., asking “Where’s a good view of the Golden Gate Bridge?” and getting back a clarifying question: “Do you want a vista, hotel, restaurant, etc.?”)
7
Q

What is a token?

A

A token is the technical name for a sequence of characters — such as hairy, his, or :) — that we want to treat as a group.
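A minimal tokenizer sketch, assuming a hand-written regular expression (the pattern below is hypothetical, not a standard): it keeps simple emoticons such as :) together, keeps words with an internal apostrophe whole, and treats any other symbol as its own token.

```python
import re

# Hypothetical pattern: emoticons first, then words (optionally with an
# internal apostrophe), then any other single non-space symbol.
TOKEN_PATTERN = re.compile(r"[:;]-?[)(DP]|\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Split text into tokens: emoticons, words, and single punctuation marks."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("His hairy dog can't sit :)"))
# ['His', 'hairy', 'dog', "can't", 'sit', ':)']
```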

8
Q

What is a collocation?

A

A collocation is a sequence of words that occur together unusually often. Thus red wine is a collocation, whereas the wine is not.

9
Q

What is a collocation?

A

A collocation is a sequence of words that occur together unusually often. Examples: red wine, machine learning, social media, flat screen. When you split them up, the words mean something different. NLTK Text objects have a collocations() method.
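One common way to make "unusually often" concrete is pointwise mutual information (PMI), which compares how often a pair occurs with how often its words occur separately. The sketch below is a standard-library illustration under that assumption; it is not how NLTK's collocations() is implemented, and the function name and sample text are made up.

```python
import math
from collections import Counter

def top_collocations(words, min_count=2, top_n=3):
    """Rank adjacent word pairs by pointwise mutual information (PMI)."""
    word_counts = Counter(words)
    pair_counts = Counter(zip(words, words[1:]))
    n_words = len(words)
    n_pairs = max(len(words) - 1, 1)
    scored = []
    for (w1, w2), count in pair_counts.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_pairs
        p_w1 = word_counts[w1] / n_words
        p_w2 = word_counts[w2] / n_words
        scored.append((math.log2(p_pair / (p_w1 * p_w2)), (w1, w2)))
    scored.sort(reverse=True)
    return [pair for _, pair in scored[:top_n]]

# Toy text, invented for illustration.
text = ("the red wine was good the wine list was long "
        "the red wine was cheap the food was good")
print(top_collocations(text.split(), top_n=1))
# [('red', 'wine')]
```

Frequent pairs such as "the red" score lower here because "the" also appears with many other words.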

10
Q

What is a bigram?

A

A pair of adjacent words in a text. The bigrams of a text are the list of all such pairs. NLTK has a bigrams() function.
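A minimal standard-library sketch of extracting bigrams from a list of words. The function below only mirrors the idea behind NLTK's bigrams() and is not its implementation.

```python
def bigrams(words):
    """Return the list of adjacent word pairs in a sequence of words."""
    return list(zip(words, words[1:]))

print(bigrams("more is said than done".split()))
# [('more', 'is'), ('is', 'said'), ('said', 'than'), ('than', 'done')]
```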

11
Q

What is a Turing test?

A

Can a dialogue system, responding to a user’s text input, perform so naturally that we cannot distinguish it from a human-generated response?

12
Q

What is RoBERTa?

A

An optimized method for pretraining self-supervised NLP systems. RoBERTa (Robustly Optimized BERT Pretraining Approach) builds on BERT.

13
Q

What is NER?

A

Named Entity Recognition. Locating and classifying named entities (e.g., people, organizations, locations) in unstructured text.
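The simplest possible illustration of NER is a lookup against a list of known names (a gazetteer). The sketch below assumes a tiny hand-made gazetteer and does plain substring matching, so it would also match inside longer words; modern NER systems are statistical models rather than lookups.

```python
# Hypothetical gazetteer mapping known names to entity types.
GAZETTEER = {
    "golden gate bridge": "LOCATION",
    "amazon": "ORGANIZATION",
    "aristotle": "PERSON",
}

def find_entities(text):
    """Return (start, end, surface text, entity type) for each match, in text order."""
    lowered = text.lower()
    found = []
    for name, entity_type in GAZETTEER.items():
        start = lowered.find(name)
        while start != -1:
            end = start + len(name)
            found.append((start, end, text[start:end], entity_type))
            start = lowered.find(name, end)
    return sorted(found)

for entity in find_entities("Amazon sells prints of the Golden Gate Bridge."):
    print(entity)
```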

14
Q

What is Document Classification?

A

An NLU application. An example is a spam filter. Amazon does massive-scale classification of products into categories.
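A spam filter can be sketched as a tiny naive Bayes classifier. This is one common technique, chosen here as an assumption (the card does not say which method is used); the training data and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Count words per label; labeled_docs is a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability under naive Bayes."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the probability.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, invented for illustration.
docs = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(docs)
print(classify("claim your free money", *model))  # spam
print(classify("monday meeting", *model))         # ham
```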

15
Q

What is Search: Query Understanding?

A

Inferring the intent and meaning of a search engine user’s queries.

  • Query Segmentation: partition the query into semantic units
  • Query Scoping (NER): map query segments to entity types
  • Query Expansion: broaden the query by adding phrases/tokens (usually synonyms and abbreviations, e.g., ML or Machine Learning, developer or engineer)
  • Query Relaxation: make the query less restrictive by removing tokens (e.g., “black propane grill” becomes “propane grill”)
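Query expansion and query relaxation can be sketched in a few lines. The synonym table and both function names are hypothetical; a real system would learn or curate synonyms and choose which tokens to drop more carefully.

```python
# Hypothetical synonym table; a real system would learn or curate these.
SYNONYMS = {
    "ml": ["machine learning"],
    "developer": ["engineer"],
}

def expand_query(query):
    """Query expansion: return the query plus variants with synonyms swapped in."""
    tokens = query.lower().split()
    variants = [" ".join(tokens)]
    for i, token in enumerate(tokens):
        for synonym in SYNONYMS.get(token, []):
            variants.append(" ".join(tokens[:i] + [synonym] + tokens[i + 1:]))
    return variants

def relax_query(query):
    """Query relaxation: return less restrictive queries, each dropping one token."""
    tokens = query.lower().split()
    return [" ".join(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]

print(expand_query("ML developer"))
# ['ml developer', 'machine learning developer', 'ml engineer']
print(relax_query("black propane grill"))
# ['propane grill', 'black grill', 'black propane']
```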
16
Q

What are Dialog Systems?

A

Conversational agents, also called conversational AI or chatbots.

17
Q

What is machine translation?

A

An NLG application. Translates source text in one language into another. Google is considered a leader in translation at scale. Uses ML techniques and NLP.

18
Q

What is Document Summarization?

A

Generating accurate summaries of longer text, e.g., a model-written headline.

19
Q

What is lexical diversity?

A

A measure of the range of different words used in a text, commonly computed as the number of distinct words divided by the total number of words. A greater range means more lexical diversity.
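A minimal sketch of one common way to compute lexical diversity, assuming it is measured as distinct words divided by total words. The function name is chosen for this example.

```python
def lexical_diversity(words):
    """Ratio of distinct words to total words (0.0 for an empty text)."""
    if not words:
        return 0.0
    return len(set(words)) / len(words)

words = "the cat sat on the mat".lower().split()
print(lexical_diversity(words))  # 5 distinct words out of 6 total
```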

20
Q

What is WSD?

A

Word sense disambiguation: identifying which meaning (sense) of a word with multiple meanings is intended in a given context (see levels of NLP for more).
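A rough sketch of WSD in the spirit of the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the sentence. The mini-lexicon for "bank", the stopword list, and the function names are all hypothetical and written only for this example.

```python
# Hypothetical mini-lexicon: each sense of "bank" with a short gloss.
SENSES = {
    "bank": {
        "financial institution": "an institution that accepts deposits and lends money",
        "river edge": "sloping land beside a river or other body of water",
    }
}

STOPWORDS = {"a", "an", "and", "the", "of", "or", "that", "this", "will", "be"}

def content_words(text):
    """Lowercase the words, strip basic punctuation, and drop stopwords."""
    return {w.strip(".,!?").lower() for w in text.split()} - STOPWORDS

def disambiguate(word, sentence):
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(disambiguate("bank", "The river overflowed the bank."))          # river edge
print(disambiguate("bank", "She went to the bank to deposit money."))  # financial institution
```

When no gloss overlaps with the sentence, this sketch falls back to the first listed sense, which is why lexicons that order senses from most to least common are useful.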

21
Q

What is a polysemic word?

A

A word that has multiple meanings depending on context, as in “The bank will be closed this Saturday” and “The river overflowed the bank.” A lexicon tries to list word senses in order from most common to least common.