Session 6.1 Flashcards

1
Q

Bag of words

A

treats every individual word as a term in the document, ignoring word order.
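A minimal sketch of the idea, assuming simple lowercase-and-split-on-whitespace tokenization (real pipelines usually clean the text first). The function name `bag_of_words` is hypothetical.

```python
from collections import Counter

def bag_of_words(doc: str) -> Counter:
    # Lowercase and split on whitespace; each word becomes one term and word order is discarded
    return Counter(doc.lower().split())

bow = bag_of_words("the cat sat on the mat")
print(bow["the"])  # 2
```

Because only counts survive, "dog bites man" and "man bites dog" produce the same bag.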

2
Q

Bag of N-grams

A

treats every sequence of N adjacent words as a term in the document.
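A short sketch of extracting N-grams with a sliding window, again assuming whitespace tokenization; `bag_of_ngrams` is a hypothetical helper name.

```python
from collections import Counter

def bag_of_ngrams(doc: str, n: int) -> Counter:
    tokens = doc.lower().split()
    # Slide a window of width n over the tokens; each window becomes one term
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)

bigrams = bag_of_ngrams("the cat sat on the mat", 2)
# Counter({'the cat': 1, 'cat sat': 1, 'sat on': 1, 'on the': 1, 'the mat': 1})
```

A document of k tokens yields k - N + 1 N-grams, so six tokens give five bigrams.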

3
Q

term frequency (TF)

A

One common choice is the raw count:

TF(t, d) = number of times term t appears in document d
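The raw-count definition translates almost directly into code. This sketch assumes case-insensitive matching on whitespace-separated tokens; `term_frequency` is a hypothetical name.

```python
def term_frequency(term: str, doc: str) -> int:
    # Raw count: how many of the document's tokens equal the term (case-insensitive)
    return doc.lower().split().count(term.lower())

print(term_frequency("the", "The cat sat on the mat"))  # 2
```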

4
Q

TF-IDF

A

TF-IDF(t, d) = TF(t, d) × IDF(t), i.e., the product of the term frequency and the inverse document frequency
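The card does not define IDF(t). A sketch assuming the common unsmoothed form IDF(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. Libraries vary here (scikit-learn's `TfidfVectorizer`, for example, smooths IDF by default), so exact values will differ between tools. The corpus and function names are made up for illustration.

```python
import math

docs = ["the cat sat", "the dog sat", "the cat ran"]  # hypothetical toy corpus

def tf(term, doc):
    # Raw count of the term in one document
    return doc.split().count(term)

def idf(term, docs):
    # Unsmoothed IDF: log(N / df); returns 0.0 for terms absent from every document
    df = sum(1 for d in docs if term in d.split())
    return math.log(len(docs) / df) if df else 0.0

def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

print(tfidf("the", docs[0], docs))  # 0.0, since "the" is in every document
print(tfidf("dog", docs[1], docs))  # log(3), since "dog" is in only one document
```

Terms that occur in every document get weight zero, while rare terms are boosted.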

5
Q

Document Term Matrix (DTM)

A

Each document is a row and each term is a column
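A sketch of building a DTM of raw counts from a toy corpus, assuming whitespace tokenization and a vocabulary sorted alphabetically (the column order is an arbitrary choice).

```python
from collections import Counter

docs = ["the cat sat", "the dog sat", "the cat ran"]  # hypothetical toy corpus

def document_term_matrix(docs):
    counts = [Counter(d.split()) for d in docs]
    # Columns: every distinct term across the corpus, in sorted order
    vocab = sorted(set().union(*counts))
    # Rows: one per document; missing terms count as 0
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix

vocab, dtm = document_term_matrix(docs)
# vocab -> ['cat', 'dog', 'ran', 'sat', 'the']
# dtm   -> [[1, 0, 0, 1, 1], [0, 1, 0, 1, 1], [1, 0, 1, 0, 1]]
```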

6
Q

Term Document Matrix (TDM)

A

Each term is a row and each document is a column
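A TDM holds the same counts as a DTM with rows and columns swapped. This sketch builds one directly from a hypothetical toy corpus.

```python
docs = ["the cat sat", "the dog sat", "the cat ran"]  # hypothetical toy corpus

vocab = sorted({w for d in docs for w in d.split()})
# One row per term, one column per document: the transpose of the DTM
tdm = [[d.split().count(t) for d in docs] for t in vocab]
# vocab -> ['cat', 'dog', 'ran', 'sat', 'the']
# tdm[0] (row for 'cat') -> [1, 0, 1]
```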

7
Q

Disadvantage of bag of words/N-grams & solutions

A

Massive number of features, which makes the document-term matrix very large and sparse (mostly zeros)
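To see how quickly features pile up, this sketch counts distinct terms in a two-sentence toy corpus when unigrams, bigrams and trigrams are combined. The corpus and the `ngram_vocab` helper are hypothetical.

```python
docs = ["the cat sat on the mat", "the dog sat on the log"]  # hypothetical toy corpus

def ngram_vocab(docs, n):
    # Set of all distinct n-grams across the corpus
    vocab = set()
    for d in docs:
        tokens = d.split()
        vocab.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return vocab

unigrams = ngram_vocab(docs, 1)
features = set().union(*(ngram_vocab(docs, n) for n in (1, 2, 3)))
print(len(unigrams), len(features))  # 7 22
```

Two short sentences already produce 22 features. On a real corpus the vocabulary reaches tens of thousands of terms, and each document uses only a small fraction of them.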

8
Q

To decrease the number of words/terms/features in documents:

A
➢ Cleaning and preprocessing text
• Case normalization
• Removing punctuation
• Removing numbers
• Removing stopwords
• Word stemming and stem completion

➢ Feature selection

➢ Special consideration for computational storage space (e.g., storing the matrix in a sparse format)
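The cleaning steps above can be sketched as a single function. Two assumptions: the stopword list is a tiny made-up set (real lists are much longer), and `crude_stem` is a naive suffix stripper, not the Porter stemmer or any library's algorithm. Stem completion is omitted.

```python
import re

# Hypothetical, deliberately tiny stopword list
STOPWORDS = {"the", "a", "an", "and", "is", "are", "on", "of", "to"}

def crude_stem(word):
    # Naive suffix stripping for illustration only; keeps at least 3 characters
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    text = text.lower()                                       # case normalization
    text = re.sub(r"[^\w\s]", " ", text)                      # remove punctuation
    text = re.sub(r"\d+", " ", text)                          # remove numbers
    tokens = [t for t in text.split() if t not in STOPWORDS]  # remove stopwords
    return [crude_stem(t) for t in tokens]                    # stemming

print(preprocess("The 3 cats are jumping on the mats!"))  # ['cat', 'jump', 'mat']
```

Seven raw tokens shrink to three terms, and "cats"/"cat" or "mats"/"mat" collapse into single features.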

9
Q

Sentiment analysis technique

A

Detecting the sentiment of the text, e.g.,
• positive/negative/neutral
• urgent/not urgent
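One simple way to detect positive/negative/neutral sentiment is a lexicon-based score. This is a minimal sketch that assumes a tiny made-up word list. Practical systems use curated lexicons or trained classifiers, and they handle negation, which this sketch does not.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "terrible": -1, "hate": -1}

def sentiment(text):
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(w, 0) for w in words)  # unknown words contribute 0
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great phone!"))      # positive
print(sentiment("Terrible battery, bad screen"))  # negative
```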

10
Q

Three main levels of sentiment analysis

A
  • Document-level
  • Sentence-level
  • Aspect-level
11
Q

How does topic modelling differ from clustering?

A

Topic modelling:
A document can be associated with more than one topic

Clustering:
A document only shows up in one of the clusters
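The difference shows up in the shape of the output. This sketch uses made-up topic proportions of the kind a topic model such as LDA might produce. Collapsing each mixture to its largest component is only a way to mimic the one-label-per-document output of hard clustering. It is not how clustering algorithms actually compute their labels. The 0.2 threshold is an arbitrary assumption.

```python
# Hypothetical document-topic proportions (each row sums to 1)
doc_topics = {
    "doc1": {"sports": 0.7, "politics": 0.3},
    "doc2": {"sports": 0.1, "politics": 0.9},
}

def topics_for(doc, threshold=0.2):
    # Topic modelling: a document can carry several topics at once
    return sorted(t for t, p in doc_topics[doc].items() if p >= threshold)

def single_label(doc):
    # Hard-clustering-style output: exactly one label per document
    return max(doc_topics[doc], key=doc_topics[doc].get)

print(topics_for("doc1"), single_label("doc1"))  # ['politics', 'sports'] sports
```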
