Week 2 Flashcards

Question 1

Q

What is lexical analysis?

Answer

A

Figure out the basic meaningful units of a language (e.g., words) and their corresponding meanings or categories, such as part-of-speech tags.

Question 2

Q

What is syntactic analysis?

Answer

A

Figure out how the words in a sentence relate to each other, i.e., decode the structure of the sentence.

Question 3

Q

What is semantic analysis?

Answer

A

Figure out the meaning of a sentence from the meanings of its words and its syntactic structure.

Question 4

Q

What is pragmatic analysis?

Answer

A

Figure out the meaning in context, e.g., the speech act a sentence performs and the purpose of the communication.

Question 5

Q

What is discourse analysis?

Answer

A

Analyze a large chunk of text with many sentences, taking into account the connections between the sentences and their context.

Question 6

Q

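What are NLP and TIS, and how do they relate?

Answer

A

NLP = natural language processing; TIS = text information system

- A TIS can often bypass advanced (deep) NLP and still perform well, e.g., by relying on shallow but robust representations such as bag of words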

Question 7

Q

What are examples of easy and hard tasks in TIS?

Answer

A

Easy - text classification and retrieval

Hard - machine translation and question answering

Question 8

Q

How can text be represented?

Answer

A

From shallow to deep:
String of characters
Sequence of words (+ POS tags)
Entities and relations (entity-relation recognition)
Logic predicates
Speech acts
Deeper NLP -> needs more human intervention -> less robust

Question 9

Q

What are statistical language models?

Answer

A

A probability distribution over word sequences
Context-dependent; a generative model (it can be used to generate text)
Different sequences get different probabilities

Question 10

Q

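What is the unigram LM?

Answer

A

A language model that generates each word independently, so p(w1 ... wn) = p(w1) × ... × p(wn). Maximum likelihood estimate: p(w|d) = c(w, d) / |d|, i.e., the count of the word in the document divided by the total number of words in the document.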
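A minimal sketch of the maximum likelihood unigram estimate, assuming the document is already tokenized into a list of words. The function names `unigram_mle` and `sequence_prob` and the toy document are hypothetical, chosen only for illustration.

```python
from collections import Counter

def unigram_mle(doc_tokens):
    """Maximum likelihood unigram LM: p(w|d) = c(w, d) / |d|."""
    counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)  # |d| = total number of words in the document
    return {w: c / doc_len for w, c in counts.items()}

def sequence_prob(model, tokens):
    """Unigram assumption: words are independent, so multiply per-word probabilities.
    Words missing from the model get probability 0."""
    p = 1.0
    for w in tokens:
        p *= model.get(w, 0.0)
    return p

doc = "text mining and text retrieval".split()  # hypothetical 5-word document
model = unigram_mle(doc)
print(model["text"])                                # 2 / 5 = 0.4
print(sequence_prob(model, ["text", "mining"]))     # 0.4 * 0.2 (about 0.08)
print(sequence_prob(model, ["text", "database"]))   # 0.0, "database" is unseen
```

The last line shows the zero-probability problem for unseen words that smoothing is meant to fix.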

Question 11

Q

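What are the challenges of the unigram LM?

Answer

A

Words unseen in the document get zero probability, so any sequence containing an unseen word also gets zero probability.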

Question 12

Q

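How can a unigram LM be smoothed?

Answer

A

Add one to each word count and add the vocabulary size |V| to the document length (add-one smoothing). More generally, add k to each count and k|V| to the length (add-k).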

Filter out stop words
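A minimal sketch of add-k smoothing, assuming a fixed, known vocabulary that contains every word of the document (otherwise the probabilities would not sum to one). The function name `add_k_unigram` and the vocabulary are hypothetical.

```python
from collections import Counter

def add_k_unigram(doc_tokens, vocab, k=1.0):
    """Add-k smoothed unigram LM over a fixed vocabulary.

    p(w|d) = (c(w, d) + k) / (|d| + k * |V|); k = 1 gives add-one smoothing.
    Assumes every token in doc_tokens is in vocab, so the probabilities sum to 1.
    """
    counts = Counter(doc_tokens)  # Counter returns 0 for unseen words
    denom = len(doc_tokens) + k * len(vocab)
    return {w: (counts[w] + k) / denom for w in vocab}

doc = "text mining and text retrieval".split()
# Hypothetical vocabulary: the document's 4 distinct words plus one it never uses.
vocab = set(doc) | {"database"}
model = add_k_unigram(doc, vocab, k=1.0)
print(model["database"])  # (0 + 1) / (5 + 1 * 5) = 0.1, no longer zero
print(model["text"])      # (2 + 1) / 10 = 0.3
```

Adding k|V| to the denominator is what keeps the smoothed values a proper probability distribution.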

Question 13

Q

Pull vs. push

Answer

A

Pull (search engine)

User takes the initiative
Ad hoc information needs
Querying and browsing

Push (recommendation system)

System takes the initiative
Stable information need; system knows the user's need

Question 14

Q

Query vs. browsing

Answer

A

Query

User enters a set of keywords
System returns relevant documents
Works well when the user knows which keywords to use

Browsing

User navigates to relevant info by following the structure/organization of documents
Works well when the user doesn't know which keywords to use

Question 15

Q

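What are the issues with document selection?

Answer

A

The classifier is unlikely to be accurate:
- Over-constrained query: no relevant documents are returned
- Under-constrained query: too many documents are returned (over-delivery)
Even if it were accurate, not all relevant docs are equally relevant, so prioritization (ranking) is needed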
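A toy contrast between selection and ranking, assuming documents are plain strings and matching is exact word overlap. The collection, `select`, and `rank` are hypothetical; real systems use much richer scoring functions.

```python
def select(docs, query_terms):
    """Document selection: a binary keep/drop decision.
    Keeps only documents containing ALL query terms (a Boolean AND query)."""
    return [doc_id for doc_id, text in docs.items()
            if all(t in text.split() for t in query_terms)]

def rank(docs, query_terms):
    """Document ranking: score every document by the number of query terms
    it contains and return document ids from best to worst."""
    scores = {doc_id: sum(t in text.split() for t in query_terms)
              for doc_id, text in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical three-document collection.
docs = {
    "d1": "text retrieval and search engines",
    "d2": "text mining of news articles",
    "d3": "cooking recipes for beginners",
}
query = ["text", "retrieval", "ranking"]
print(select(docs, query))  # [] : over-constrained, no doc contains "ranking"
print(rank(docs, query))    # ['d1', 'd2', 'd3'] : 2, 1, and 0 matching terms
```

Selection returns nothing for the over-constrained query, while ranking still surfaces the partially relevant documents in order.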

Question 16

Q

What are the issues of tokenization?

Answer

A

Deciding whether a string is one token or two, e.g., "state-of-the-art", "don't", or "New York"
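A small illustration of the one-or-two-tokens problem, comparing two simple tokenizers on a made-up sentence. Neither tokenizer is from the source; they are just two plausible rules that disagree.

```python
import re

text = "Don't index state-of-the-art New York data."

# Tokenizer A: split on whitespace only, so punctuation stays attached.
ws_tokens = text.split()

# Tokenizer B: keep runs of letters/digits, so "Don't" and
# "state-of-the-art" break into several tokens and the final "." is dropped.
alnum_tokens = re.findall(r"[A-Za-z0-9]+", text)

print(ws_tokens)
# ["Don't", 'index', 'state-of-the-art', 'New', 'York', 'data.']
print(alnum_tokens)
# ['Don', 't', 'index', 'state', 'of', 'the', 'art', 'New', 'York', 'data']
```

Note that neither rule keeps "New York" as a single token; that would need extra knowledge such as a phrase list.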

Question 17

Q

What are stop words?

Answer

A

A list of very common words, e.g., "the", "a", and "be"
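A minimal sketch of stop-word filtering, assuming a tiny hypothetical stop list; real systems use much longer lists.

```python
# Hypothetical stop list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "be", "is", "of", "to"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop list (case-insensitive match)."""
    return [t for t in tokens if t.lower() not in stop_words]

tokens = "The cat is on the mat".split()
print(remove_stop_words(tokens))  # ['cat', 'on', 'mat']
```

"on" survives only because this toy list omits it, which shows how the choice of list affects the result.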

Question 18

Q

What is lemmatization?

Answer

A

Reduce inflectional variant forms of a word to its base form (lemma)

cars, car’s -> car
am, is, are -> be
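A toy lookup-based lemmatizer covering only the card's examples. The table `LEMMAS` is hypothetical; a real lemmatizer relies on a full dictionary and usually part-of-speech information, which is how it can map irregular forms like "am" to "be".

```python
# Hypothetical lemma table built from the card's examples only.
LEMMAS = {
    "cars": "car",
    "car's": "car",
    "am": "be",
    "is": "be",
    "are": "be",
}

def lemmatize(tokens):
    """Lowercase each token and map it to its base form if the table knows it."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]

print(lemmatize(["Cars", "am", "is", "are", "car"]))
# ['car', 'be', 'be', 'be', 'car']
```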

Question 19

Q

What is stemming?

Answer

A

Reduce a term to its root (stem) before indexing, typically by stripping suffixes
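A crude suffix-stripping stemmer to show the idea. This is NOT the Porter stemmer or any standard algorithm; the suffix list and minimum stem length are hypothetical choices for illustration.

```python
# Hypothetical suffix list, checked in order; not a real stemming algorithm.
SUFFIXES = ["ing", "es", "ed", "s"]

def crude_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w

words = ["retrieving", "retrieved", "retrieves", "cars", "is"]
print([crude_stem(w) for w in words])
# ['retriev', 'retriev', 'retriev', 'car', 'is']
```

All three forms of "retrieve" map to one index term, but the stem "retriev" is not a real word, and "is" is not mapped to "be". Those are the usual differences between stemming and lemmatization.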
