Lecture 7 Flashcards
Old methods, new data - challenges
› In general, modeling social influence is complex
› Observations are not independent
› What is the relevant network of a consumer?
Text analysis - Two approaches
› Information directly observed (≈ counting)
- Counting words
- # verbs, nouns, etc.
- # positive and negative words
- Word cloud
› Information latent (≈ intelligence)
- Groups of words/sentences that relate to a certain latent topic.
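A minimal sketch of the counting approach, using only the Python standard library. The two word lists and the example review are made up for illustration; a real analysis would use an established sentiment lexicon.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons, invented for this example only.
POSITIVE = {"good", "great", "friendly"}
NEGATIVE = {"bad", "slow", "rude"}

def count_sentiment_words(text):
    """Return (word frequencies, # positive words, # negative words)."""
    # Lowercase the text and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    freqs = Counter(tokens)
    # A Counter returns 0 for words that never occur.
    n_pos = sum(freqs[w] for w in POSITIVE)
    n_neg = sum(freqs[w] for w in NEGATIVE)
    return freqs, n_pos, n_neg

review = "Great food and friendly staff, but the service was slow. Great value."
freqs, n_pos, n_neg = count_sentiment_words(review)
print(n_pos, n_neg)          # number of positive and negative words
print(freqs.most_common(3))  # the raw frequencies a word cloud would be drawn from
```

Everything here is directly observed in the text; nothing is inferred. That is what separates it from the latent approach.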
Latent Dirichlet Allocation (LDA)
“Latent topics are defined by a collection of words with a relatively high probability of usage and not from the prevalence or significance of single words”
LDA assumptions
› Assumptions:
- Each document is characterized by a mixture of topics.
- Each topic is characterized by a discrete probability distribution over words.
› Think of a dictionary of all words in all documents.
› Each topic is a unique set of probabilities of potential word use.
› Words that are likely to occur ‘in a topic’ are used to label/identify the topics.
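The assumptions above can be read as a recipe for generating a document. The sketch below simulates that recipe with NumPy; it does not estimate a model from data. The six-word dictionary, the number of topics, and the Dirichlet parameters (0.5) are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dictionary of all words in all documents.
vocab = ["price", "cheap", "room", "clean", "staff", "friendly"]
n_topics = 2

# Each topic is a discrete probability distribution over the dictionary
# (one row per topic, each row sums to 1).
topic_word = rng.dirichlet(np.full(len(vocab), 0.5), size=n_topics)

def generate_document(n_words, alpha=0.5):
    """Simulate one document under the LDA assumptions."""
    # Each document is a mixture of topics.
    theta = rng.dirichlet(np.full(n_topics, alpha))
    # Every word gets its own topic, drawn from the document's mixture.
    topics = rng.choice(n_topics, size=n_words, p=theta)
    # The word itself is drawn from the distribution of its topic.
    words = [vocab[rng.choice(len(vocab), p=topic_word[z])] for z in topics]
    return theta, topics, words

theta, topics, words = generate_document(8)

# Label each topic by its most probable words.
top_words = [[vocab[i] for i in np.argsort(row)[::-1][:3]] for row in topic_word]
print(theta, words, top_words)
```

Estimation runs in the opposite direction: given only the observed words, it recovers plausible values for `theta` and `topic_word`.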
Büschken and Allenby (2016)
Bag of sentences instead of bag of words.
› Piece of text typically contains multiple topics.
› But a single sentence typically pertains to one topic.
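A minimal sketch of how the data representation differs between the two views. The regex sentence splitter and the example review are simplifying assumptions for illustration, not the preprocessing used in the paper.

```python
import re

def bag_of_words(text):
    """Flat list of lowercase tokens; sentence boundaries are discarded."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_sentences(text):
    """List of sentences, each itself a list of tokens."""
    # Naive split after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [bag_of_words(s) for s in sentences if s]

review = "The room was clean and quiet. Staff were rude at check-in."
words = bag_of_words(review)
sentences = bag_of_sentences(review)

# Word-level LDA: one latent topic per word.
print(len(words), "topic assignments")
# Sentence-level view: all words in a sentence share one latent topic.
print(len(sentences), "topic assignments")
```

In this made-up review the first sentence is about the room and the second about the staff, so one topic per sentence fits the text better than one topic per word.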