LDA2VEC Flashcards
What is Latent Dirichlet Allocation (LDA)?
LDA is a probabilistic topic model that treats each document as a bag-of-words: a document is modeled as a mixture of topics, and each topic as a distribution over words.
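A minimal sketch of fitting LDA with gensim's LdaModel; the toy corpus and parameter values are illustrative assumptions, not part of the flashcards:

```python
# Minimal LDA sketch with gensim (toy corpus; order of words is ignored).
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    ["cat", "dog", "pet", "dog"],
    ["stock", "market", "trade", "stock"],
    ["dog", "pet", "vet"],
    ["market", "trade", "price"],
]

dictionary = Dictionary(docs)                       # token <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in docs]  # (word_id, count) pairs per document

lda = LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)

# Each document is a mixture of topics; each topic is a distribution over words.
print(lda.get_document_topics(corpus[0]))
print(lda.show_topic(0))
```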
What is Latent Dirichlet Allocation (LDA) to vector (lda2vec)?
lda2vec builds document representations on top of word embeddings.
What is topic modeling?
Topic models often assume that word usage is correlated with topic occurrence. They divide documents into clusters (topics) according to word usage.
What is Bag-of-words?
A document is represented as a vector whose dimension equals the vocabulary size; each dimension of this vector corresponds to the count or occurrence of a word in the document.
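A minimal bag-of-words sketch using scikit-learn's CountVectorizer; the example documents are illustrative:

```python
# Bag-of-words sketch: each document becomes a vector of word counts.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # shape: (n_docs, vocabulary_size)

print(vectorizer.get_feature_names_out())  # the vocabulary (one dimension per word)
print(X.toarray())                         # word counts per document
```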
How is LDA a general Machine Learning (ML) technique?
Used in unsupervised ML problems where the input is a collection of fixed-length vectors and the goal is to explore the structure of this data.
What is the biggest disadvantage of LDA?
The LDA model learns a document vector that predicts words inside that document while disregarding any structure, i.e. how these words interact on a local level.
When does LDA produce value?
[1] A good estimate of the number of topics is available.
[2] A distinct name/‘topic’ is manually assigned to each topic vector.
[3] Under [1] and [2], the topic vectors will be interpretable.
What is the problem with the bag-of-words representation?
It is hard to figure out which dimensions of the document vectors are semantically related. (Solution: word embeddings.)
What are word embeddings?
Dense vector representations of words: words that occur in the same context are represented by vectors in close proximity to each other.
How do you visualize the word embedding space?
t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction method that can be used to visualize high-dimensional data in 2D.
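A minimal t-SNE sketch using scikit-learn; the random vectors stand in for real word embeddings:

```python
# t-SNE sketch: project high-dimensional vectors down to 2D for plotting.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
vectors = rng.normal(size=(100, 50))  # stand-in for 100 word embeddings of dim 50

# perplexity must be smaller than the number of samples
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
coords = tsne.fit_transform(vectors)  # shape: (100, 2), ready for a scatter plot

print(coords.shape)
```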
What is CBOW in word2vec embedding?
In the continuous bag-of-words (CBOW) architecture, the pivot word is predicted from a set of surrounding context words (e.g. given ‘thank’, ‘such’, ‘you’, ‘top’, the model has to predict ‘awesome’).
What is skip-gram in word2vec embedding?
In the skip-gram architecture, the pivot word is used to predict the surrounding context words (e.g. given ‘awesome’, predict ‘thank’, ‘such’, ‘you’, ‘top’).
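A minimal word2vec sketch using gensim 4.x, where the sg flag selects CBOW (sg=0) or skip-gram (sg=1); the toy sentences and parameters are illustrative:

```python
# word2vec sketch with gensim: the sg flag switches between CBOW and skip-gram.
from gensim.models import Word2Vec

sentences = [
    ["thank", "you", "for", "such", "an", "awesome", "top"],
    ["thank", "you", "so", "much"],
    ["such", "an", "awesome", "day"],
]

cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)      # CBOW
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # skip-gram

print(skipgram.wv["awesome"].shape)         # the learned word vector
print(skipgram.wv.most_similar("awesome"))  # nearest words in embedding space
```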
What is bad about word2vec?
The word2vec model learns a word vector that predicts context words across different documents. As a result, document-specific information is mixed together in the word embeddings.
What is lda2vec?
Inspired by Latent Dirichlet Allocation (LDA), the word2vec model is expanded to simultaneously learn word, document, and topic vectors.
What is the lda2vec process?
The pivot word vector and a document vector are summed to obtain a context vector. This context vector is then used to predict the surrounding context words.
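A minimal numpy sketch of this step, assuming (as in the lda2vec paper) that the document vector is a softmax-weighted mixture of topic vectors and is added to the pivot word vector; all dimensions and values below are illustrative:

```python
# lda2vec context-vector sketch in numpy (dimensions and weights are illustrative).
import numpy as np

embed_dim, n_topics = 50, 5
rng = np.random.default_rng(0)

topic_matrix = rng.normal(size=(n_topics, embed_dim))  # one vector per topic
doc_weights = rng.normal(size=n_topics)                # unnormalized per-document topic weights
pivot_vec = rng.normal(size=embed_dim)                 # embedding of the pivot word

# Document vector: a mixture of topic vectors (softmax over the document weights).
proportions = np.exp(doc_weights) / np.exp(doc_weights).sum()
doc_vec = proportions @ topic_matrix

# Context vector = pivot word vector + document vector;
# it is used to predict the surrounding context words.
context_vec = pivot_vec + doc_vec
print(context_vec.shape)
```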