Lecture 1 Flashcards

1
Q

What are some applications of NLP?

A

Question answering, information extraction, sentiment analysis, machine translation, and other language technology

2
Q

What are hot topics in NLP that are still rather hard to solve?

A

Question answering, paraphrasing and summarization

3
Q

What are some (unresolved) issues in NLP?

A
  • ambiguity within sentences or questions
  • non-standard English, e.g. in tweets
  • idioms, e.g. “get cold feet”
  • neologisms
  • tricky entity names
4
Q

Data mining vs. Text mining

A

Data mining is the process of finding and extracting patterns in a large set of data. It is often done as a first step of a project, to prepare the data for further analysis.
Data mining is all about finding the connections between different data points.

Text mining is an automated technique used in natural language processing that converts unstructured text into structured data that a computer can process and understand. Once the text has been converted, further analysis can be applied to the data to extract useful information.

5
Q

What is the difference between a bag-of-words and a string of words?

A

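A bag of words represents a text as an unordered collection of its words: word order is discarded, and each distinct word is listed once, usually together with a count of how often it occurs.

A string of words is the text in its original order, so repeated words appear wherever they occur in the sentence.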
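
A minimal Python sketch of the contrast, assuming whitespace tokenization and lowercasing (both are simplifications chosen here, and `bag_of_words` is a hypothetical helper name). The string keeps order and repetition; the bag keeps each distinct word once with a count and ignores order.

```python
from collections import Counter

def bag_of_words(text):
    """Map each distinct word to its count; word order is not kept."""
    # Assumption: lowercase and split on whitespace as a stand-in tokenizer.
    return Counter(text.lower().split())

sentence = "the cat saw the other cat"   # the string of words: ordered, with repeats
bag = bag_of_words(sentence)             # the bag: distinct words with counts

print(sentence.split())
print(sorted(bag.items()))
```

Two sentences with the same words in a different order produce the same bag, which is exactly the information a bag-of-words representation gives up.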

6
Q

What are regular expressions?

A

A formal language for specifying text search strings

  • It requires a pattern that we want to search for, and a corpus of texts to search through
  • A regular expression search function will search through the corpus returning all texts that contain the pattern.
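
As a small illustration of the pattern-plus-corpus idea, here is a sketch using Python's standard `re` module. The corpus contents, the pattern, and the helper name `search_corpus` are made up for the example.

```python
import re

# Hypothetical toy corpus: each entry is one text.
corpus = [
    "The woodchuck chucked wood.",
    "No rodents here.",
    "Two Woodchucks appeared.",
]

# Matches "woodchuck" or "Woodchuck", with an optional plural "s".
pattern = re.compile(r"[wW]oodchucks?")

def search_corpus(pattern, texts):
    """Return every text that contains at least one match of the pattern."""
    return [text for text in texts if pattern.search(text)]

matches = search_corpus(pattern, corpus)
print(matches)
```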
7
Q

What is text normalization?

A

Text normalization is the task of putting words/tokens into a standard format, i.e. transforming text into a single canonical form. Normalizing text before processing it means that later operations can rely on consistent input.

When we normalize a natural language resource, we attempt to reduce the randomness in it, bringing it closer to a predefined “standard”.

8
Q

What are common steps in text normalization?

A
  1. Segmenting/tokenizing words from running text
  2. Normalizing word formats
  3. Segmenting sentences in running text
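
Step 1 above can be sketched with a single regular expression. This is only one possible tokenization scheme, assumed here for illustration: it keeps simple contractions together and emits punctuation marks as separate tokens. The helper name `tokenize` is hypothetical.

```python
import re

def tokenize(text):
    """Split running text into word and punctuation tokens."""
    # \w+(?:'\w+)?  a word, optionally followed by an apostrophe part ("don't")
    # [^\w\s]       any single character that is neither word nor whitespace
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

tokens = tokenize("Don't panic, it's fine.")
print(tokens)
```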
9
Q

What is the difference between stemming and lemmatization?

A

The goal of both stemming and lemmatization is to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form. For instance:

am, are, is → be
car, cars, car’s, cars’ → car

The result of this mapping on a text will be something like:

the boy’s cars are different colors → the boy car be differ color

Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of achieving this goal correctly most of the time, and it often includes the removal of derivational affixes.

Lemmatization is the task of determining that two words have the same root despite their surface differences. It usually refers to doing things properly, using a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.
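
The contrast can be made concrete with a toy sketch. Everything here is invented for illustration: the suffix list stands in for a real stemmer's rules, and the tiny lookup table stands in for the vocabulary and morphological analysis a real lemmatizer would use.

```python
# Hypothetical suffix rules for a crude stemmer (not a real algorithm).
SUFFIXES = ("ent", "s")

# Hypothetical mini-vocabulary mapping word forms to their lemma.
LEMMAS = {"am": "be", "are": "be", "is": "be", "cars": "car", "colors": "color"}

def crude_stem(word):
    """Chop off the first matching suffix, with no knowledge of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Look the word up in the vocabulary; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)

for word in ["cars", "are", "different"]:
    print(word, crude_stem(word), lemmatize(word))
```

The stemmer turns "different" into "differ" by chopping a derivational suffix and cannot relate "are" to "be", while the lookup-based lemmatizer maps "are" to "be" and leaves "different" alone.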

10
Q

What is case folding?

A

Case folding reduces all letters to lower case. Applications such as speech recognition and information retrieval commonly apply it.
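
A minimal sketch of case folding over a token list, using Python's built-in `str.lower`. The helper name `case_fold` and the sample tokens are assumptions for the example.

```python
def case_fold(tokens):
    """Lowercase every token so that 'The' and 'the' count as the same word."""
    return [token.lower() for token in tokens]

print(case_fold(["The", "cat", "and", "the", "Cat"]))
print(case_fold(["US", "us"]))
```

The second call shows the trade-off: after folding, the country abbreviation and the pronoun can no longer be told apart, so whether to fold case depends on the task.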

11
Q

What is the most common English stemming algorithm?

A
The Porter stemmer, which applies a sequence of rewrite rules in steps. For example:
Step 1a
sses → ss    caresses → caress
ies  → i     ponies   → poni
ss   → ss    caress   → caress
s    → ø     cats     → cat
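
The four Step 1a rules listed above can be written out directly. This sketch covers only that one step, not the rest of the Porter algorithm, and the function name is chosen here for illustration. The rules are tried in the listed order, so the longest matching suffix wins.

```python
def porter_step_1a(word):
    """Apply only Step 1a of the Porter stemmer to a lowercase word."""
    if word.endswith("sses"):
        return word[:-2]      # sses -> ss   (caresses -> caress)
    if word.endswith("ies"):
        return word[:-2]      # ies  -> i    (ponies -> poni)
    if word.endswith("ss"):
        return word           # ss   -> ss   (caress -> caress)
    if word.endswith("s"):
        return word[:-1]      # s    -> (nothing)  (cats -> cat)
    return word

for word in ["caresses", "ponies", "caress", "cats"]:
    print(word, "->", porter_step_1a(word))
```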
12
Q

What is sentence segmentation and what is a common problem with it?

A

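Sentence segmentation is the process of dividing written text into meaningful units, namely sentences.
“!” and “?” are relatively unambiguous sentence boundaries, but “.” is very ambiguous because it also marks abbreviations inside a sentence, e.g. “Dr. Claas”.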
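
One simple way to handle the ambiguous period is a rule-based splitter with an abbreviation list. The sketch below is a rough heuristic under stated assumptions, not a robust segmenter: the abbreviation set is a small hypothetical sample, and a boundary is only considered when the punctuation mark is followed by whitespace or the end of the text.

```python
import re

# Hypothetical, deliberately incomplete abbreviation list.
ABBREVIATIONS = {"dr", "mr", "mrs", "prof"}

def split_sentences(text):
    """Split text at '.', '!' or '?', skipping periods that end a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?](?=\s|$)", text):
        if match.group() == ".":
            words = text[start:match.start()].split()
            last_word = words[-1].lower() if words else ""
            if last_word in ABBREVIATIONS:
                continue  # the period belongs to an abbreviation, not a boundary
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    remainder = text[start:].strip()
    if remainder:
        sentences.append(remainder)  # trailing text without final punctuation
    return sentences

print(split_sentences("Dr. Claas is here. Is he late? No!"))
```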

13
Q

What are a few kinds of classifiers?

A
  • Logistic regression
  • Neural networks
  • SVMs
14
Q

How can you determine how similar two text entities are?

A

Minimum edit distance algorithm

15
Q

What is the minimum edit distance and what are its operations?

A
The minimum edit distance between two strings is the minimum number of editing operations needed to transform one into the other. The operations are:
  • Insertion
  • Deletion
  • Substitution
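
The distance is typically computed with dynamic programming. The sketch below assumes a cost of 1 for each of the three operations; some formulations charge 2 for a substitution, which would change the results. The function name is chosen here for illustration.

```python
def min_edit_distance(source, target):
    """Minimum number of insertions, deletions and substitutions (each cost 1)."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = distance between the first i chars of source and first j of target
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i          # delete all i characters
    for j in range(cols):
        dist[0][j] = j          # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # substitution or match
            )
    return dist[-1][-1]

print(min_edit_distance("kitten", "sitting"))  # 3
```

A smaller distance means the two strings are more similar, which is how this measure answers the similarity question on the previous card.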