Guest Lecture 2 Flashcards
1
Q
Name some advantages of using LDA Topic Modeling
A
- Getting an overview of document content.
- Organizing documents thematically.
- Relaxing keyword search (weighting by high word counts).
2
Q
List some disadvantages of using LDA Topic Modeling
A
- Training topic models is generally slow.
- Exploring topics is not very fast either.
- Preprocessing takes domain knowledge.
- No easy-to-implement extensions in any standard DF tools.
3
Q
How does LDA Topic Modeling work?
A
- Documents are transformed into a “bag-of-words” representation before being given to the algorithm.
Modeling attempts to learn latent (hidden) variables:
- The topics (word-topic distributions)
- Per-document topic distributions
Requires some variables that cannot be learned (hyperparameters):
- “Alpha”: controls how evenly topics are distributed within each document.
- “Beta”: controls how evenly words are distributed within each topic.
Estimating the hyperparameters is still an open problem, but methods such as grid search and Monte Carlo methods can be used for this purpose.
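The pieces above can be sketched in a few lines of plain Python. This is a toy illustration, not the lecture's implementation: it uses collapsed Gibbs sampling (one Markov chain Monte Carlo method) to infer the latent variables while alpha and beta are held fixed, so it does not estimate the hyperparameters themselves. The names `bag_of_words` and `lda_gibbs`, the default values of `alpha` and `beta`, and the example documents are all assumptions made for illustration.

```python
import random
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count words (word order is discarded)."""
    return Counter(text.lower().split())


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    Returns (doc_topic, topic_word, vocab), where doc_topic holds the
    per-document topic distributions and topic_word the word-topic distributions.
    alpha and beta are fixed hyperparameters (illustrative defaults).
    """
    rng = random.Random(seed)
    bows = [bag_of_words(d) for d in docs]
    vocab = sorted({w for bow in bows for w in bow})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    # Expand each bag-of-words into a list of word ids, one entry per token.
    tokens = [[word_id[w] for w, c in bow.items() for _ in range(c)] for bow in bows]

    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [[0] * vocab_size for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, toks in enumerate(tokens):
        z_d = []
        for w in toks:
            z = rng.randrange(num_topics)
            z_d.append(z)
            doc_topic_counts[d][z] += 1
            topic_word_counts[z][w] += 1
            topic_totals[z] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, toks in enumerate(tokens):
            for i, w in enumerate(toks):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic_counts[d][z] -= 1
                topic_word_counts[z][w] -= 1
                topic_totals[z] -= 1
                # Weight of each topic given all other assignments.
                weights = [
                    (doc_topic_counts[d][k] + alpha)
                    * (topic_word_counts[k][w] + beta)
                    / (topic_totals[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic_counts[d][z] += 1
                topic_word_counts[z][w] += 1
                topic_totals[z] += 1

    # Turn the final counts into smoothed probability distributions.
    doc_topic = [
        [(c + alpha) / (len(toks) + num_topics * alpha) for c in counts]
        for counts, toks in zip(doc_topic_counts, tokens)
    ]
    topic_word = [
        [(c + beta) / (topic_totals[k] + vocab_size * beta) for c in topic_word_counts[k]]
        for k in range(num_topics)
    ]
    return doc_topic, topic_word, vocab


# Hypothetical example documents, chosen only to exercise the function.
example_docs = [
    "disk image hash disk sector",
    "hash image sector disk",
    "email sender email header reply",
    "reply header sender email",
]
doc_topic, topic_word, vocab = lda_gibbs(example_docs, num_topics=2)
for row in doc_topic:
    print([round(p, 2) for p in row])
```

A simple grid search over the hyperparameters would call `lda_gibbs` once per candidate `(alpha, beta)` pair and compare the results with a quality measure of your choosing. The repeated training runs this requires are one reason the process is slow.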