Latent Dirichlet Allocation Flashcards
LDA in 1 Sentence
A generative, probabilistic topic-modeling algorithm that assumes each document is a mixture (distribution) over topics, and each topic is a distribution over words.
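The generative story behind that sentence can be sketched directly. This is a minimal, stdlib-only illustration with made-up values: the vocabulary, topic count, and hyperparameters are all invented for the example, and the Dirichlet draw is built from Gamma samples since `random` has no Dirichlet function.

```python
import random

random.seed(0)

def dirichlet(alphas):
    """Sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def sample(weights):
    """Draw an index proportionally to the given weights."""
    r = random.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if acc >= r:
            return i
    return len(weights) - 1

# Toy setup: 3 topics over a 5-word vocabulary (all values illustrative).
vocab = ["ball", "game", "vote", "law", "tax"]
K, alpha, beta = 3, 0.5, 0.1

# Each topic is a distribution over words, drawn from Dirichlet(beta).
topics = [dirichlet([beta] * len(vocab)) for _ in range(K)]

# Generate one document: draw its topic mixture, then one topic per word slot.
theta = dirichlet([alpha] * K)   # this document's distribution over topics
doc = []
for _ in range(8):
    z = sample(theta)            # pick a topic for this word slot
    w = sample(topics[z])        # pick a word from that topic
    doc.append(vocab[w])
print(doc)
```

Inference in LDA runs this story in reverse: given only the documents, recover plausible `topics` and per-document `theta` values.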
LDA Steps
1) Randomly assign each word in each document to one of K topics
2) Based on the current word-topic assignments (initially random, so roughly uniform), calculate p(topic|document) and p(word|topic)
3) Reassign word W to a new topic with probability proportional to p(topic|document) * p(word|topic) – essentially the probability that the topic generated word W
4) Repeat for N iterations
In other words, at each step we assume that all topic assignments except the current word's are correct, and then update the current word's assignment using our model of how documents are generated.
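The steps above are essentially collapsed Gibbs sampling, which can be sketched end-to-end in plain Python. The corpus, K, and hyperparameters below are invented for illustration; the smoothing with alpha and beta matches the priors described on the next cards.

```python
import random
from collections import defaultdict

random.seed(1)

# Toy corpus: each document is a list of word tokens (all illustrative).
docs = [
    ["ball", "game", "ball", "score"],
    ["vote", "law", "vote", "tax"],
    ["game", "score", "ball", "game"],
    ["law", "tax", "law", "vote"],
]
vocab = sorted({w for d in docs for w in d})
K, alpha, beta = 2, 0.5, 0.1
V = len(vocab)

# Step 1: random initial topic assignment for every word occurrence.
z = [[random.randrange(K) for _ in doc] for doc in docs]

# Count tables from which p(topic|document) and p(word|topic) are computed.
doc_topic = [[0] * K for _ in docs]
topic_word = [defaultdict(int) for _ in range(K)]
topic_total = [0] * K
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        t = z[d][i]
        doc_topic[d][t] += 1
        topic_word[t][w] += 1
        topic_total[t] += 1

# Steps 2-4: repeatedly resample each token's topic proportionally to
# p(topic|document) * p(word|topic), holding all other assignments fixed.
for _ in range(50):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            # Remove the current word's assignment from the counts.
            doc_topic[d][t] -= 1
            topic_word[t][w] -= 1
            topic_total[t] -= 1
            # Smoothed p(topic|doc) * p(word|topic) for each topic.
            weights = [
                (doc_topic[d][k] + alpha) *
                (topic_word[k][w] + beta) / (topic_total[k] + beta * V)
                for k in range(K)
            ]
            # Sample the new topic proportionally to those weights.
            r = random.random() * sum(weights)
            acc, new_t = 0.0, K - 1
            for k, wt in enumerate(weights):
                acc += wt
                if acc >= r:
                    new_t = k
                    break
            # Record the new assignment.
            z[d][i] = new_t
            doc_topic[d][new_t] += 1
            topic_word[new_t][w] += 1
            topic_total[new_t] += 1

# Inspect the most frequent word in each learned topic.
for k in range(K):
    print(k, max(topic_word[k], key=topic_word[k].get))
```

On this tiny corpus the sampler typically separates the sports-like words from the politics-like words into the two topics, though with so little data any single run can vary.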
Alpha (LDA)
The Dirichlet prior on the per-document topic distribution.
A high alpha indicates each document contains a mixture of most of the topics; a low alpha indicates each document is dominated by just a few topics.
Beta (LDA)
The Dirichlet prior on the per-topic word distribution.
A high beta indicates each topic spreads its probability over most of the words; a low beta indicates each topic concentrates on a few words.
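The effect of these priors is easy to demonstrate numerically. The sketch below (stdlib only, with an arbitrary K and draw count chosen for illustration) samples Dirichlet vectors at a low and a high concentration and compares how peaked they are: the average size of the largest component is near 1.0 when one topic dominates, and near 1/K when the mixture is flat.

```python
import random

random.seed(0)

def dirichlet(alphas):
    """Dirichlet sample via normalized Gamma draws (stdlib only)."""
    draws = [random.gammavariate(a, 1.0) for a in alphas]
    s = sum(draws)
    return [d / s for d in draws]

K = 10    # number of topics (illustrative)
n = 200   # number of sampled distributions to average over

def avg_max(alpha):
    """Average largest component over n draws from Dirichlet(alpha)."""
    return sum(max(dirichlet([alpha] * K)) for _ in range(n)) / n

sparse = avg_max(0.1)   # low alpha: each document concentrates on few topics
flat = avg_max(10.0)    # high alpha: each document mixes most topics evenly
print(round(sparse, 2), round(flat, 2))
```

The same experiment read with beta instead of alpha describes topics over words: low beta gives topics peaked on a few words, high beta gives topics that spread across the vocabulary.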
Coherence Score
A method for assessing the quality of the learned topics. It scores each topic by summing a co-occurrence-based probability score over pairs (or triplets, or quadruplets) of that topic's top words.
It can help with choosing K (the number of topics): compute topic coherence on a held-out data set for a range of K and look for where coherence peaks or levels off.
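One simple variant of this idea is UMass-style coherence, sketched below with stdlib Python on an invented toy corpus. This is a simplified illustration (the standard formulation orders a topic's top words by frequency before pairing them): for each pair of top words it adds log((D(w1, w2) + 1) / D(w2)), where D counts documents containing the word(s), so topics whose top words co-occur score higher.

```python
import math
from itertools import combinations

# Toy corpus of documents as word sets (all illustrative).
docs = [
    {"ball", "game", "score"},
    {"ball", "game", "team"},
    {"vote", "law", "tax"},
    {"vote", "law", "court"},
]

def doc_freq(word):
    """Number of documents containing the word."""
    return sum(1 for d in docs if word in d)

def co_doc_freq(w1, w2):
    """Number of documents containing both words."""
    return sum(1 for d in docs if w1 in d and w2 in d)

def coherence(top_words):
    """Sum of smoothed log co-occurrence probabilities over word pairs."""
    score = 0.0
    for w1, w2 in combinations(top_words, 2):
        score += math.log((co_doc_freq(w1, w2) + 1) / doc_freq(w2))
    return score

good = coherence(["ball", "game", "score"])  # top words that co-occur
bad = coherence(["ball", "law", "score"])    # top words from mixed topics
print(round(good, 2), round(bad, 2))
```

Sweeping K, fitting a model per value, and averaging this score over topics on held-out documents gives the curve used to pick K.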