Week 2 Flashcards

1
Q

What is lexical analysis?

A

Identify the basic meaning units of a language and their corresponding meanings.

2
Q

What is syntactic analysis?

A

Determine how the words in a sentence relate to one another, i.e. decode the structure of the sentence.

3
Q

What is semantic analysis?

A

Determine the meaning of a sentence from the meanings of its words and its syntactic structure.

4
Q

What is pragmatic analysis?

A

Determine meaning in context: the speech act being performed and the purpose of the communication.

5
Q

What is discourse analysis?

A

Analyze a large chunk of text made up of many sentences, taking the connections between sentences and the context into account.

6
Q

What are NLP and TIS?

A

NLP: natural language processing
TIS: text information system

- A TIS can often bypass advanced NLP and still perform well

7
Q

What are easy and hard tasks for a TIS?

A

Easy - text classification and retrieval

Hard - machine translation and question answering

8
Q

How is text represented?

A
  • String of characters
  • Word sequence and POS tags
  • Entities and relations (entity/relation recognition)
  • Logic predicates
  • Speech acts
Deeper NLP -> more human intervention -> less robust

9
Q

What are statistical language models?

A
  • Represent a word sequence by a probability distribution
  • Context-dependent, generative models
  • Different sequences = different probabilities

10
Q

What is the unigram LM?

A

Each word is generated independently. Estimated probability of a word = frequency of the word in the document / total number of words in the document
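
A minimal sketch of that estimate, assuming whitespace tokenization and a made-up one-sentence document (the function name unigram_mle is hypothetical, chosen for illustration):

```python
from collections import Counter

def unigram_mle(tokens):
    """Maximum-likelihood unigram estimate: p(w) = count(w) / total tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}

# Made-up example document, split on whitespace (an assumption).
doc = "text mining is fun and text retrieval is useful".split()
model = unigram_mle(doc)

print(model["text"])          # "text" occurs 2 times out of 9 tokens
print(model.get("nlp", 0.0))  # a word not in the document gets probability 0
```

The probabilities sum to one over the words that occur in the document, and any word that never occurs gets zero.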

11
Q

Challenges of the unigram LM

A
  • Unseen words = zero probability
12
Q

Smoothing methods for the unigram LM

A

Add one to every word's frequency and add the vocabulary size to the document length. Or add k instead of one.

Filter out stop words
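
A rough sketch of add-k smoothing, assuming a known vocabulary and whitespace tokens. The function name smoothed_unigram and the tiny vocabulary are hypothetical; k=1 gives add-one smoothing.

```python
from collections import Counter

def smoothed_unigram(tokens, vocabulary, k=1.0):
    """Add-k estimate: p(w) = (count(w) + k) / (len(tokens) + k * |V|)."""
    counts = Counter(tokens)
    denominator = len(tokens) + k * len(vocabulary)
    return {word: (counts[word] + k) / denominator for word in vocabulary}

# Made-up document plus one word that never occurs in it.
doc = "text mining is fun".split()
vocab = set(doc) | {"retrieval"}

model = smoothed_unigram(doc, vocab, k=1.0)
print(model["retrieval"])  # unseen word now gets (0 + 1) / (4 + 5)
print(model["text"])       # seen word gets (1 + 1) / (4 + 5)
```

Adding k times the vocabulary size to the denominator keeps the probabilities summing to one.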

13
Q

Pull Vs Push

A

Pull (Search engine)

  • User takes initiative
  • Ad hoc information needs
  • Querying and browsing

Push (recommendation system)

  • System takes initiative
  • Stable information need, or the system knows the user's needs

14
Q

Query vs Browsing

A

Query

  • User enters a set of terms
  • System returns relevant documents
  • Works well when the user knows which keywords to use

Browsing

  • User navigates to relevant information, guided by the structure/organization of the documents
  • Works well when the user does not know which keywords to use

15
Q

Issues of document selection

A

The classifier is unlikely to be accurate
- over-constrained query: no relevant documents are returned
- under-constrained query: too many documents are returned
Not all relevant docs are equally relevant
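
A toy illustration of the over- and under-constrained cases, assuming a Boolean selector over three made-up documents. The select function and the document texts are hypothetical, not taken from the course material.

```python
# Three made-up documents, keyed by a hypothetical document id.
docs = {
    "d1": "text retrieval with language models",
    "d2": "machine translation of text",
    "d3": "image retrieval",
}

def select(query_terms, docs, require_all):
    """Boolean selection: keep a doc if it has all (AND) or any (OR) query terms."""
    match = all if require_all else any
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if match(term in text.split() for term in query_terms)
    )

query = ["text", "retrieval", "translation"]
over_constrained = select(query, docs, require_all=True)    # AND of all terms
under_constrained = select(query, docs, require_all=False)  # OR of all terms

print(over_constrained)   # no document contains all three terms
print(under_constrained)  # every document contains at least one term
```

Either way the selector only answers yes or no, so it cannot say which of the returned documents is more relevant.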

16
Q

Issues of tokenization

A

Deciding whether a string is one token or two (e.g. hyphenated words, contractions, multi-word names)
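
A small sketch of how the token count depends on the tokenization rule, assuming a made-up sentence and two simple rules (whitespace splitting and letters-only matching); neither is presented as the method used in the course.

```python
import re

# Made-up sentence with a multi-word name, a contraction and a hyphenated word.
text = "New York isn't state-of-the-art."

# Rule 1: split on whitespace only.
whitespace_tokens = text.split()

# Rule 2: keep only runs of ASCII letters.
word_tokens = re.findall(r"[A-Za-z]+", text)

print(whitespace_tokens)  # "isn't" and "state-of-the-art." each stay one token
print(word_tokens)        # the same strings break into several tokens
```

Neither rule treats "New York" as a single token, which would need extra knowledge such as a list of names.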

17
Q

What are stop words?

A

A list of very common words, e.g. the, a, be
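
A minimal sketch of stop-word filtering, assuming a tiny illustrative stop list built from the examples on this card (real lists are much longer) and case-insensitive matching:

```python
# Tiny illustrative stop list (an assumption); real lists have many more entries.
STOP_WORDS = {"the", "a", "be"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens whose lowercase form is in the stop list."""
    return [token for token in tokens if token.lower() not in stop_words]

tokens = "The cat wants to be a lion".split()
filtered = remove_stop_words(tokens)
print(filtered)  # "The", "be" and "a" are dropped
```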

18
Q

What is lemmatization?

A

Reduce inflectional variant forms to the base form

  • cars, car’s -> car
  • am, is, are -> be
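
A toy sketch of the idea, assuming a hand-written lookup table that covers only the examples on this card. The table and the lemmatize function are hypothetical; real lemmatizers use a full dictionary and morphological analysis.

```python
# Hand-written lookup table covering only the card's examples (an assumption).
LEMMAS = {
    "cars": "car",
    "car's": "car",
    "am": "be",
    "is": "be",
    "are": "be",
}

def lemmatize(token, lemmas=LEMMAS):
    """Return the base form if known, otherwise the lowercased token."""
    lowered = token.lower()
    return lemmas.get(lowered, lowered)

words = ["cars", "car's", "am", "is", "are", "text"]
print([lemmatize(word) for word in words])
```
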
19
Q

What is stemming?

A

Reduce each term to its root form (stem) before indexing
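
A deliberately crude sketch of suffix stripping, assuming a short hand-picked suffix list and a minimum stem length of three letters. This is not the Porter stemmer or any other standard algorithm; toy_stem is a hypothetical name.

```python
# Hand-picked suffixes, longest first (an assumption for illustration only).
SUFFIXES = ("ing", "ed", "es", "s")

def toy_stem(word):
    """Strip the first matching suffix if at least three letters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

variants = ["index", "indexes", "indexed", "indexing"]
print({toy_stem(word) for word in variants})  # all variants map to one stem
print(toy_stem("is"))                         # too short to strip
```

Mapping all variants to one stem before indexing lets a query for any of them match the same index entry.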