Lecture 1 Flashcards

Question 1

Q

What are some applications of NLP?

Answer

A

Question answering, information extraction, sentiment analysis, machine translation, and language technology

Question 2

Q

What are hot topics in NLP that are still rather hard to solve?

Answer

A

Question answering, paraphrasing and summarization

Question 3

Q

What are some (unresolved) issues in NLP?

Answer

A

Ambiguity within sentences or questions
Non-standard English, e.g. in tweets
Idioms, e.g. “get cold feet”
Neologisms
Tricky entity names

Question 4

Q

Data mining vs. Text mining

Answer

A

Data mining is a process used to find and extract patterns within a large set of data. It is often done as a first step of a project to prepare the data for further analysis.
Data mining is all about finding the connection between the different data points.

Text mining is one of the automated techniques used in natural language processing. It converts unstructured text into structured data that a computer can process, so that further analysis can be applied to extract useful information.

Question 5

Q

What is the difference between a bag-of-words and a string of words?

Answer

A

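A bag of words represents a text (e.g. a particular string or a whole corpus) as the collection of its unique words, usually with a count for each word, ignoring word order.

A string of words is the text as written: an ordered sequence in which words can repeat.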
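
A minimal sketch of the difference, assuming whitespace tokenization and lowercasing (both are simplifications chosen for illustration). It keeps a count per unique word, which is one common way to store a bag of words. Two strings with different word order end up with the same bag:

```python
from collections import Counter

def bag_of_words(text):
    """Count each unique word in text, discarding word order."""
    # Assumption: words are separated by whitespace and case is ignored.
    return Counter(text.lower().split())

a = "the dog bites the man"
b = "the man bites the dog"

print(a == b)                              # False: the strings differ
print(bag_of_words(a) == bag_of_words(b))  # True: the bags are identical
print(bag_of_words(a)["the"])              # 2
```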

Question 6

Q

What are regular expressions?

Answer

A

A formal language for specifying text search strings

A regular expression search requires a pattern that we want to search for and a corpus of texts to search through.
The search function goes through the corpus and returns all texts that contain the pattern.
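
A small sketch of such a search using Python's built-in `re` module. The three-text corpus and the pattern are made up for illustration:

```python
import re

# Hypothetical toy corpus; each entry is one "text".
corpus = [
    "The woodchuck chucked wood.",
    "Woodchucks are rodents.",
    "No rodents here.",
]

# Pattern: "woodchuck" or "Woodchuck", optionally followed by "s".
pattern = re.compile(r"[wW]oodchucks?")

# Keep every text in which the pattern occurs at least once.
matches = [text for text in corpus if pattern.search(text)]
print(matches)
```

Only the first two texts are returned, since the third contains no match for the pattern.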

Question 7

Q

What is text normalization?

Answer

A

Task of putting words/tokens in a standard format.
Text normalization is the process of transforming text into a single canonical form. Normalizing text before processing it ensures that the input is consistent before any operations are performed on it.

When we normalize a natural language resource, we attempt to reduce the randomness in it, bringing it closer to a predefined “standard”.

Question 8

Q

What are common steps in text normalization?

Answer

A

Segmenting/tokenizing words from running text
Normalizing word formats
Segmenting sentences in running text
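
A rough sketch of the first step (tokenizing words from running text). The regular expression is an assumption chosen for illustration, not a standard tokenizer: it keeps a word together with an apostrophe suffix and splits off punctuation as separate tokens.

```python
import re

# Assumed token pattern: a word, optionally followed by an apostrophe part
# (boy's, aren't), or a single punctuation character.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Split running text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)

tokens = tokenize("The boy's cars aren't red.")
print(tokens)  # ['The', "boy's", 'cars', "aren't", 'red', '.']
```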

Question 9

Q

What is the difference between stemming and lemmatization?

Answer

A

The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form. For instance:

am, are, is → be
car, cars, car’s, cars’ → car

The result of this mapping of text will be something like:

the boy’s cars are different colors → the boy car be differ color

However, the two differ in how they get there.

Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of achieving this goal correctly most of the time, and often includes the removal of derivational affixes.

Lemmatization is the task of determining that two words have the same root despite their surface differences. It usually refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.
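
A toy contrast between the two approaches. Both functions are hypothetical illustrations: the stemmer is a two-suffix heuristic (not a real stemming algorithm), and the lemmatizer is a lookup in a tiny hand-written vocabulary instead of a full morphological analysis.

```python
# Hypothetical mini vocabulary mapping word forms to their lemma.
LEMMAS = {"am": "be", "are": "be", "is": "be", "cars": "car", "colors": "color"}

def crude_stem(word):
    """Chop a known suffix off the end of the word, using no vocabulary."""
    for suffix in ("ent", "s"):
        # Only strip if a stem of more than two letters remains.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Look the word up in the vocabulary; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)

words = ["cars", "are", "different", "colors"]
stems = [crude_stem(w) for w in words]
lemmas = [lemmatize(w) for w in words]
print(stems)   # ['car', 'are', 'differ', 'color']
print(lemmas)  # ['car', 'be', 'different', 'color']
```

The stemmer cannot relate "are" to "be" and cuts the derivational ending off "different", while the lookup returns dictionary forms only.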

Question 10

Q

What is case folding?

Answer

A

Case folding reduces all letters to lower case.
It is common in applications like speech recognition and information retrieval.
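
A one-line sketch using Python's built-in `str.lower`. The token list is made up for illustration:

```python
tokens = ["The", "US", "asked", "us"]

# Fold every token to lower case.
folded = [token.lower() for token in tokens]
print(folded)  # ['the', 'us', 'asked', 'us']
```

Note that "US" and "us" become the same token after folding, so case folding can discard information that some tasks need.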

Question 11

Q

What is the most common English stemming algorithm?

Answer

A

Porter's algorithm, which applies a series of rewrite rules in steps, for example:
Step 1a
sses → ss    caresses → caress
ies  → i     ponies   → poni
ss   → ss    caress   → caress
s    → ø     cats     → cat
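
The four Step 1a rules can be sketched as a chain of suffix checks. This covers only the rules listed on this card, not the full Porter stemmer, and the function name is made up for illustration:

```python
def porter_step_1a(word):
    """Apply the first matching Step 1a rule, checking longer suffixes first."""
    if word.endswith("sses"):
        return word[:-2]   # sses -> ss
    if word.endswith("ies"):
        return word[:-2]   # ies -> i
    if word.endswith("ss"):
        return word        # ss -> ss (unchanged)
    if word.endswith("s"):
        return word[:-1]   # s -> (removed)
    return word

for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

The order of the checks matters: testing for "s" first would turn "caress" into "cares".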

Question 12

Q

What is sentence segmentation and what is a common problem with it?

Answer

A

The process of dividing written text into meaningful units, i.e. sentences.
“!” and “?” are relatively unambiguous sentence boundaries, but “.” is very ambiguous because it also appears inside a sentence, e.g. in “Dr. Claas”.
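
A sketch of the problem and one simple workaround. The abbreviation list is a hypothetical, hand-picked set, and the rule-based splitter is only an illustration, not a robust segmenter:

```python
import re

text = "Dr. Claas arrived late. Was he tired? No!"

# Naive approach: split after every ".", "!" or "?" that is followed by whitespace.
naive = re.split(r"(?<=[.!?])\s+", text)
print(naive)  # ['Dr.', 'Claas arrived late.', 'Was he tired?', 'No!']

# Hypothetical abbreviation list used to decide that a "." does not end a sentence.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "prof."}

def split_sentences(text):
    """Split on ! and ?, and on . unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        is_abbreviation = token.lower() in ABBREVIATIONS
        if token.endswith(("!", "?")) or (token.endswith(".") and not is_abbreviation):
            sentences.append(" ".join(current))
            current = []
    if current:  # keep trailing text that has no closing punctuation
        sentences.append(" ".join(current))
    return sentences

print(split_sentences(text))  # ['Dr. Claas arrived late.', 'Was he tired?', 'No!']
```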

Question 13

Q

What are a few kinds of classifiers?

Answer

A

Logistic regression
Neural networks
SVMs

Question 14

Q

How can you determine how similar two text entities are?

Answer

A

Minimum edit distance algorithm

Question 15

Q

What is the minimum edit distance and what are its operations?

Answer

A

The minimum edit distance between two strings is the minimum number of editing operations
• Insertion
• Deletion
• Substitution
needed to transform one into the other.
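
A sketch of the usual dynamic-programming computation. The costs are an assumption: insertion and deletion cost 1, and the substitution cost is a parameter that defaults to 1 (some formulations charge 2 for a substitution).

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimum total cost of insertions, deletions and substitutions
    needed to turn source into target."""
    n, m = len(source), len(target)
    # dist[i][j] = distance between source[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i                      # delete all i characters
    for j in range(1, m + 1):
        dist[0][j] = j                      # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                             # deletion
                dist[i][j - 1] + 1,                             # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost)  # substitution or match
            )
    return dist[n][m]

print(min_edit_distance("intention", "execution"))              # 5
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8
```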