C8 Flashcards

1
Q

topic modelling

A

assumptions: each document consists of a mixture of topics and each topic consists of a mixture of words

unsupervised technique:
- topic labels are not given
- number of topics needs to be pre-specified
- think about it as clustering

most widely used technique: LDA

2
Q

generative probabilistic model LDA

A

Latent Dirichlet Allocation

Topic = probability distribution over a fixed vocabulary. Every topic assigns a probability to every word in the vocabulary

Each document is a distribution over topics (drawn from a Dirichlet distribution; the prior on this distribution is sparse)

Generate a document as a bag of words:
- draw a topic from the distribution (e.g. yellow), look up the yellow topic's word distribution, and draw a word from it, etc. (see the sketch below)
- the order of words doesn't matter; the words are drawn independently of each other

We only observe the words in the documents. The topics are latent
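
A minimal sketch of this generative process (toy vocabulary; the number of topics and the Dirichlet hyperparameters are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["gene", "dna", "cell", "ball", "team", "match"]  # toy vocabulary
K = 2                                                     # number of topics (pre-specified)

# Each topic is a probability distribution over the fixed vocabulary (beta).
beta = rng.dirichlet(alpha=[0.1] * len(vocab), size=K)

# Each document is a sparse distribution over topics (theta).
theta = rng.dirichlet(alpha=[0.1] * K)

# Generate a document as a bag of words: draw a topic, then draw a word
# from that topic's word distribution; word order does not matter.
doc = []
for _ in range(10):
    z = rng.choice(K, p=theta)        # draw a topic
    w = rng.choice(vocab, p=beta[z])  # draw a word from that topic
    doc.append(w)

print(doc)
```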

3
Q

goal of LDA

A

infer the underlying topic structure by observing only the documents

4
Q

LDA: learn distributions

A
  1. What are the topics, what are the distributions over terms?
  2. For each document, what is the distribution over topics?
5
Q

LDA: learn the topics from the data

A

Goal: learn β (the topics: probability distributions over words) and θ (the topic proportions per document); see the sketch below
- we only observe the words W
- start with random probability distributions of words in topics and of topics in documents
- update the probability distributions while observing the words in the documents (Bayesian framework)
- until β converges, or the maximum number of epochs has been reached
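
A minimal sketch of learning the topics with gensim (assuming gensim is installed; the toy corpus and number of topics are made up):

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy corpus: each document is already tokenised.
texts = [["gene", "dna", "cell", "gene"],
         ["ball", "team", "match", "team"],
         ["dna", "cell", "protein"],
         ["match", "goal", "team"]]

dictionary = Dictionary(texts)                   # word <-> id mapping
corpus = [dictionary.doc2bow(t) for t in texts]  # bag-of-words counts

# num_topics must be pre-specified; random_state fixes the random initialization.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               passes=10, random_state=0)

print(lda.print_topics())                  # beta: top words per topic
print(lda.get_document_topics(corpus[0]))  # theta: topic proportions of document 0
```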

6
Q

LDA: challenges

A
  • choose the number of topics
  • random initialization of the clustering => non-deterministic outcome
  • interpreting the outputs: what do the topics mean?
7
Q

evaluation of topic modelling

A
  • topic coherence: measure the similarity of words inside a topic and between topics (see the sketch below)
  • human evaluation: word intrusion = given these 5 high-probability topic words + 1 random word, can you find the intruder?
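
A minimal sketch of computing topic coherence with gensim's CoherenceModel (toy corpus; the 'c_v' measure is just one of several coherence measures):

```python
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

texts = [["gene", "dna", "cell", "gene"],
         ["ball", "team", "match", "team"],
         ["dna", "cell", "protein"],
         ["match", "goal", "team"]]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, random_state=0)

# Higher coherence = the top words of a topic are more semantically similar.
cm = CoherenceModel(model=lda, texts=texts, dictionary=dictionary, coherence="c_v")
print(cm.get_coherence())
```
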
8
Q

single-document summarization examples

A
  • news articles
  • scientific articles
  • meeting reports
9
Q

multi-document summarization examples

A
  • output of a search engine
  • news about a single topic from multiple sources
  • discussion threads summarization
10
Q

extractive summarization

A

a summary composed completely of material from the source

11
Q

abstractive summarization

A

a summary that contains material not present verbatim in the source, e.g. shorter paraphrases of the source content

12
Q

describe the extractive summarization method and its pros and cons

A

Select the most important nuggets (sentences)

Classification or ranking task:
- classification: for each sentence, select it: yes/no
- ranking: assign a score to each sentence, then select the top-k

+ feasible / easy to implement
+ reliable (literal re-use of text)
- but limited in terms of fluency
- (fixes required after sentence selection)

strong baseline: take the first three sentences of the document

13
Q

extractive summarization: sentence selection methods

A

Unsupervised methods:
- centrality-based
- (graph-based)

Supervised methods:
- feature-based
- (embeddings based)

14
Q

unsupervised sentence selection (centrality-based)

A
  1. measure the cosine similarity between each sentence and the full document (using either sparse or dense vectors)
  2. select the sentences with the highest similarity (the most representative sentences); see the sketch below
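
A minimal sketch of centrality-based selection with sparse TF-IDF vectors from scikit-learn (toy sentences; the helper name centrality_summary is made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def centrality_summary(sentences, k=2):
    """Score each sentence by its cosine similarity to the full document."""
    vectorizer = TfidfVectorizer()
    sent_vecs = vectorizer.fit_transform(sentences)        # one sparse vector per sentence
    doc_vec = vectorizer.transform([" ".join(sentences)])  # vector for the whole document
    scores = cosine_similarity(sent_vecs, doc_vec).ravel()
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]             # keep original sentence order

sentences = ["The cat sat on the mat.",
             "Cats are popular pets in many countries.",
             "The weather was sunny."]
print(centrality_summary(sentences, k=1))
```
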
15
Q

supervised sentence selection

A

feature engineering + classifier (e.g. SVM); see the sketch below
features: position in the document, word count, word lengths, word frequencies, punctuation, representativeness (similarity to the full document/title)
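
A minimal sketch of the feature-based approach with an SVM from scikit-learn (the feature set, toy sentences and labels are made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

def sentence_features(sentence, position, n_sentences):
    words = sentence.split()
    return [
        position / n_sentences,            # relative position in the document
        len(words),                        # word count
        np.mean([len(w) for w in words]),  # average word length
        sentence.count(","),               # punctuation
    ]

# Toy training data: one feature vector per sentence, gold label 1 = keep in summary.
sentences = ["First sentence of the article.",
             "Some background detail.",
             "A key finding is reported here.",
             "Closing remark."]
X = [sentence_features(s, i, len(sentences)) for i, s in enumerate(sentences)]
y = [1, 0, 1, 0]

clf = SVC().fit(X, y)
print(clf.predict(X))  # yes/no decision per sentence
```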

16
Q

problems with sentence selection

A
  • selected sentences may contain unresolved references to sentences that are not included in the summary, or to information that is not explicit in the original document
  • improvements might be needed after sentence selection (sentence ordering, revision, fusion, compression)
17
Q

abstractive summarization: method and pros/cons

A

Learn a text-to-text transformation model (cf. translation)
- training data: pairs of longer and shorter texts, e.g. full scientific articles and their abstracts, or editor-written summaries of comment threads (NY Times)
- sequence-to-sequence models: learning a mapping between an input sequence and an output sequence

+ more natural/fluent result
- but a lot of training data needed
- and risk of untrue content

18
Q

Pegasus

A

encoder-decoder pre-training for abstractive summarization

pre-training objectives (self-supervised):
1. Masked Language Modelling (like BERT)
2. Gap Sentences Generation (GSG)

motivation:
- large-scale document-summary datasets (for supervised learning) are rare
- creating training data is expensive (‘low-resource summarization’)
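
A minimal inference sketch with the Hugging Face transformers library (assuming a fine-tuned checkpoint such as "google/pegasus-xsum"; this shows usage after pre-training/fine-tuning, not the pre-training objectives themselves):

```python
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = "google/pegasus-xsum"  # assumed checkpoint; others exist per dataset
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

text = "Long source document goes here ..."
inputs = tokenizer(text, truncation=True, return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=4, max_new_tokens=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```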

19
Q

challenges in summarization

A
  • Factual consistency (for abstractive summarization)
  • Task subjectivity/ambiguity
  • Training data bias
  • Evaluation
20
Q

factual consistency

A

main challenge of abstractive summarization models

research showed that the majority of generated summaries contained non-faithful content => human judgement is still crucial for this kind of evaluation, as automatic metrics do not correlate strongly with summary faithfulness

21
Q

training data bias

A

The most widely used datasets for training and evaluating summarization models are based on news data

In newspaper articles, the most important information is in the first paragraph, but in other domains this does not always apply

22
Q

evaluation of summarization

A
  • compare to reference summaries
  • ask human judges
23
Q

compare to reference summaries

A

compute overlap with human reference summary

ROUGE metrics: measure the quality of a summary by literal comparison with reference summaries

24
Q

ROUGE

A

the proportion of n-grams from the reference summaries that occur in the automatically created summary ('recall-oriented'); see the sketch below

ROUGE-N = (# n-grams in both the automatic and the reference summary) / (# n-grams in the reference summary)

(also count beginning and end-of markers)
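
A simplified sketch of recall-oriented ROUGE-N (set-based matching without clipped counts; real evaluations typically use a dedicated ROUGE implementation):

```python
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also occur in the candidate summary."""
    cand = set(ngrams(candidate.split(), n))
    ref = ngrams(reference.split(), n)
    if not ref:
        return 0.0
    return sum(1 for g in ref if g in cand) / len(ref)

reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n_recall(candidate, reference, n=1))  # 5/6 unigrams recalled
print(rouge_n_recall(candidate, reference, n=2))  # 3/5 bigrams recalled
```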

25
Q

rating criteria for summary judgement by humans

A

rating criteria:
- relevance/importance: selection of important content from the source
- consistency: factual alignment between the summary and the source
- fluency: quality of individual sentences
- coherence: collective quality of all sentences

ask multiple judges per summary

26
Q

challenges in evaluation (abstractive summarization)

A
  • ROUGE often has weak correlation with human judgements
  • but human judgements of relevance (importance) and fluency are strongly correlated with each other