C8 Flashcards
topic modelling
assumptions: each document consists of a mixture of topics and each topic consists of a mixture of words
unsupervised technique:
- topic labels are not given
- number of topics needs to be pre-specified
- think about it as clustering
most used technique: LDA
generative probabilistic model: LDA
Latent Dirichlet Allocation
Topic = probability distribution over a fixed vocabulary: every topic assigns a probability to every word in the vocabulary
Each document is a distribution over topics (drawn from a Dirichlet distribution; the prior set on this distribution is sparse)
Generate a document as a bag of words:
- draw a topic from the document's topic distribution (e.g. the "yellow" topic), look up that topic's word distribution, and draw a word from it; repeat for every word
- order of words doesn’t matter; the words are drawn independently of each other
We only observe the words in the documents. The topics are latent
goal of LDA
infer the underlying topic structure by only observing the documents
LDA: learn distributions
- What are the topics, what are the distributions over terms?
- For each document, what is the distribution over topics?
LDA: learn the topics from the data
Goal: learn β (the topics, i.e. the distributions over terms) and θ (the topic proportions per document)
- we only observe the words W
- start with random probability distributions of words in topics and of topics in documents
- update the probability distributions while observing the words in the documents (Bayesian framework)
- until β converges, or the maximum number of epochs has been reached
LDA: challenges
- choose the number of topics
- random initialization of the clustering => non-deterministic outcome
- interpreting the outputs: what do the topics mean?
evaluation of topic modelling
- topic coherence: measure similarity of words inside a topic and between topics
- human evaluation: word intrusion = given these 5 high-probability topic words + 1 random word, can you find the intruder?
single-document summarization examples
- news articles
- scientific articles
- meeting reports
multi-document summarization examples
- output of a search engine
- news about a single topic from multiple sources
- summarization of discussion threads
extractive summarization
a summary composed completely of material from the source
abstractive summarization
a summary that contains material not present in the source, e.g. shorter paraphrases of the source content
describe the extractive summarization method and its pros and cons
Select the most important nuggets (sentences)
Classification or ranking task:
- classification: for each sentence, select it: yes/no
- ranking: assign a score to each sentence, then select the top-k
+ feasible / easy to implement
+ reliable (literal re-use of text)
- but limited in terms of fluency
- (fixes required after sentence selection)
strong baseline: take first three sentences from the document
extractive summarization: sentence selection methods
Unsupervised methods:
- centrality-based
- (graph-based)
Supervised methods:
- feature-based
- (embeddings based)
unsupervised sentence selection (centrality-based)
- measure the cosine similarity between each sentence and the document (use either sparse or dense vectors)
- Select the sentences with the highest similarity (the most representative sentences)
supervised sentence selection
feature engineering + classifier (e.g. SVM)
features: position in the document, word count, word lengths, word frequencies, punctuation, representativeness (similarity to full document/title)