Latent Dirichlet Allocation Flashcards

1
Q

LDA in 1 Sentence

A

A generative, probabilistic topic-modeling algorithm that assumes each document is a mixture (distribution) over topics, and each topic is a distribution over words.
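That two-level generative process can be sketched with the standard library alone (all sizes and hyperparameter values below are illustrative):

```python
import random

random.seed(0)

def dirichlet(alphas):
    """Sample from a Dirichlet via normalized Gamma draws (stdlib only)."""
    g = [random.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [x / s for x in g]

# Toy sizes: K topics, a V-word vocabulary, one document of doc_len words.
K, V, doc_len = 3, 8, 20
alpha, beta = 0.5, 0.1

# Each topic is a distribution over words, drawn from Dirichlet(beta).
phi = [dirichlet([beta] * V) for _ in range(K)]

# Each document is a distribution over topics, drawn from Dirichlet(alpha).
theta = dirichlet([alpha] * K)

# Generate one document: for each word slot, pick a topic, then a word from it.
doc = []
for _ in range(doc_len):
    z = random.choices(range(K), weights=theta)[0]   # topic for this slot
    w = random.choices(range(V), weights=phi[z])[0]  # word from that topic
    doc.append(w)

print(doc)  # a list of word ids produced by the two-level process
```

LDA inference runs this story in reverse: given only the observed words, it recovers plausible theta and phi.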

2
Q

LDA Steps

A

1) Randomly assign each word in each document to one of K topics
2) Based on the current (initially random) assignments, compute p(topic|document) and p(word|topic)
3) Reassign each word W to a new topic with probability proportional to p(topic|document) * p(word|topic) – essentially the probability that the topic generated word W
4) Repeat for N iterations

In other words, at each step we assume that every topic assignment except the one for the current word is correct, then update the current word's assignment using our model of how documents are generated.
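The steps above can be sketched as a minimal collapsed Gibbs sampler (toy corpus of word ids, illustrative hyperparameters):

```python
import random

random.seed(0)

# Toy corpus: each document is a list of word ids from a V-word vocabulary.
docs = [[0, 1, 2, 0], [2, 3, 3, 4], [0, 4, 4, 1]]
K, V = 2, 5
alpha, beta = 0.1, 0.01

# Step 1: randomly assign every word occurrence to a topic.
z = [[random.randrange(K) for _ in doc] for doc in docs]

# Count tables: doc-topic counts, topic-word counts, total words per topic.
ndk = [[0] * K for _ in docs]
nkw = [[0] * V for _ in range(K)]
nk = [0] * K
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        t = z[d][i]
        ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1

# Steps 2-4: repeatedly resample each word's topic, holding all other
# assignments fixed, with weight p(topic|doc) * p(word|topic).
for _ in range(50):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1  # remove current word
            weights = [
                (ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                for k in range(K)
            ]
            t = random.choices(range(K), weights=weights)[0]
            z[d][i] = t
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1  # add it back

print(z)  # per-word topic assignments after sampling
```

The subtraction/re-addition of counts around each draw is what implements "assume all other assignments are correct."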

3
Q

Alpha (LDA)

A

The Dirichlet prior on the per-document topic distribution.

A high alpha means each document is expected to contain a mixture of most of the topics; a low alpha means each document is dominated by only a few topics.
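The effect of alpha can be seen by sampling document-topic distributions at two settings (values are illustrative; the same intuition applies to beta and the topic-word distributions):

```python
import random

random.seed(0)

def dirichlet(alphas):
    """Sample from a Dirichlet via normalized Gamma draws (stdlib only)."""
    g = [random.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [x / s for x in g]

K = 5  # number of topics (illustrative)

# High alpha: topic weights are spread across most of the K topics.
high = [dirichlet([10.0] * K) for _ in range(500)]
# Low alpha: each draw is dominated by one or a few topics.
low = [dirichlet([0.1] * K) for _ in range(500)]

# Average largest topic weight per "document":
avg_max_high = sum(max(t) for t in high) / len(high)
avg_max_low = sum(max(t) for t in low) / len(low)
print(avg_max_high)  # modest, not far above 1/K: documents mix most topics
print(avg_max_low)   # close to 1: each document concentrates on one topic
```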

4
Q

Beta (LDA)

A

The Dirichlet prior on the per-topic word distribution.

A high beta means each topic is expected to contain a mixture of most of the words; a low beta means each topic is concentrated on only a few words.

5
Q

Coherence Score

A

A method for assessing the quality of the learned topics. For each topic, it aggregates co-occurrence scores (e.g., joint or conditional probabilities) over pairs (or triplets, quadruplets, ...) of the topic's top words.

It can help with choosing K: compute topic coherence on a held-out data set for a range of K values and look for where coherence peaks or levels off.
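One common instance is UMass coherence, which sums log conditional co-occurrence probabilities over ordered pairs of a topic's top words. A toy sketch (the documents and word lists here are illustrative):

```python
import math
from itertools import combinations

# Toy collection: each document is its set of words. In practice this is
# the (held-out) corpus, and top_words are a topic's highest-weight words.
docs = [
    {"cat", "dog", "pet"},
    {"cat", "pet", "food"},
    {"dog", "pet", "walk"},
    {"stock", "market", "price"},
]

def umass_coherence(top_words, docs):
    """UMass coherence: sum over word pairs of log((D(wi, wj) + 1) / D(wj)),
    where D counts documents containing the given words."""
    def d(*words):
        return sum(1 for doc in docs if all(w in doc for w in words))
    score = 0.0
    for wi, wj in combinations(top_words, 2):
        # +1 smoothing avoids log(0) for pairs that never co-occur.
        score += math.log((d(wi, wj) + 1) / d(wj))
    return score

coherent = umass_coherence(["pet", "cat", "dog"], docs)
incoherent = umass_coherence(["pet", "stock", "price"], docs)
print(coherent, incoherent)  # words that co-occur often score higher
```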
