lecture 4 Flashcards

1
Q

sources of bias

A
  1. selection phase (influences data)
  2. annotation (influences data)
  3. input representation: how language is encoded and fed to models
  4. models
  5. research design
2
Q

importance of data

A
  1. datasets form the basis of model training, evaluating, and benchmarking
  2. the ways in which we collect/construct/share these datasets inform the kinds of problems the field pursues and the methods explored in algorithm development
  3. good-quality data ensures models perform well, are fair, and generalize across various contexts
3
Q

text classification

A

corpora help us with text classification

goal: assign a label or category to a specific piece of text

4
Q

why use text classification

A
  1. categorize language at word, sentence, and document level
  2. predict future outcomes
  3. find patterns
5
Q

sentiment analysis

A

goal: predict the sentiment expressed in a piece of text (+, -, scale rating)

6
Q

why is sentiment analysis hard

A
  1. sentiment is a measure of a speaker’s private state, which is unobservable
  2. sometimes words are a good indicator of sentiment, but many times it requires deep world + contextual knowledge
7
Q

other text classification problems

A
  1. language identification: which language the text is in
  2. spam classification
  3. authorship attribution
  4. genre classification
  5. sentiment analysis: understanding public opinion
8
Q

questions when building a sentiment classifier

A
  1. what is the input for each prediction (e.g., sentence, text, etc.)
    –> requires substantial data
  2. what are the possible outputs (e.g., +, -, scale)
  3. how will the model decide (model decision mechanism)
  4. how to measure effectiveness (evaluation metrics)
    –> requires substantial data
9
Q

data-driven evaluation

A

choose a dataset for evaluation before you build a system

10
Q

why is data-driven evaluation important

A
  1. controlled experimentation
  2. benchmarks: serve as reference points to evaluate the performance of a system
  3. your intuitions about inputs are probably wrong
11
Q

where to get a corpus

A
  1. many corpora are prepared specifically for linguistic/NLP research with text from providers
  2. collect a new one by scraping websites
12
Q

gold labels

A

annotations used to evaluate and compare sentiment analyzers

these can be
1. derived automatically from the original data artifact (metadata such as star ratings)
2. added by a human annotator who reads the text (raising the question of how annotators decide on labels and how disagreements between them are resolved)

13
Q

sentiment analysis training data

A

(X,Y) pairs to learn h(X)
–> (input, output)
–> relies heavily on accurately labeled data
–> this is text classification

14
Q

accuracy

A
  • #correct / #total
  • simplest measure
  • not a good measure under class imbalance: a classifier that always predicts the majority class looks accurate but is ineffective in practice
  • doesn't show the quality of predictions
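A minimal sketch of the class-imbalance pitfall, with made-up labels (the data is purely illustrative):

```python
# Made-up toy data: 90 negative reviews, 10 positive ones.
gold = ["neg"] * 90 + ["pos"] * 10

# A classifier that always predicts the majority class...
preds = ["neg"] * len(gold)

# ...still scores 90% accuracy while never finding a single positive.
accuracy = sum(p == g for p, g in zip(preds, gold)) / len(gold)
print(accuracy)  # 0.9
```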
15
Q

confusion matrix

A
  • gives more detailed insight into classification
  • used for precision, recall, F1 score
16
Q

precision

A
  • accuracy of positive predictions (how often is my prediction correct)
  • TP / (TP + FP)
  • measure of quality
17
Q

precise model

A

might not find all positives, but the ones that the model does classify as positive are very likely to be correct

18
Q

not precise model

A

may find a lot of positives, but its selection is noisy: it wrongly flags many instances that aren't true positives.

19
Q

recall (sensitivity)

A
  • how well are we capturing positive instances (how many of the positive instances do i find)
  • TP / (TP + FN)
  • measure of quantity
20
Q

model with high recall

A

succeeds in finding all positive cases, even though it might also wrongly identify some negative cases as positive cases

21
Q

model with low recall

A

unable to find all, or even a large share of, the positive cases

22
Q

when to use precision vs recall

A
  • precision: when we prioritize the quality of positive predictions over finding all positive instances
  • recall: when the aim is to capture all positive cases, even if it leads to some false positives
23
Q

tuning for high precision

A

the system should not make a mistake

24
Q

tuning for high recall

A

the system should not miss a case

25
Q

F1 measure

A
  • balance between precision and recall (harmonic mean)
  • offers better insight about model performance based on quality
  • especially important for class imbalance
  • F1 = 2 · (precision · recall) / (precision + recall)
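A small sketch of how all three metrics fall out of the confusion-matrix cells; the counts are hypothetical:

```python
# Hypothetical confusion-matrix cells for the positive class.
tp, fp, fn = 40, 10, 20

precision = tp / (tp + fp)                            # quality: 0.8
recall = tp / (tp + fn)                               # quantity: ~0.67
f1 = 2 * (precision * recall) / (precision + recall)  # harmonic mean: ~0.73
print(precision, recall, f1)
```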
26
Q

F1 score

A

1. high: both P and R are high
2. low: both P and R are low
3. medium: one of P and R is low and the other is high

27
Q

random baseline

A
  • method to provide a reference point for evaluating classification model performance
  • labels are assigned to observations at random
  1. fix random seed
  2. repeat n times
  3. average results
  • serves as benchmark against which better models are evaluated
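A sketch of that three-step procedure on made-up binary labels:

```python
import random

# Made-up gold labels with some class imbalance.
gold = ["pos"] * 30 + ["neg"] * 70
labels = ["pos", "neg"]

random.seed(42)        # 1. fix random seed
accuracies = []
for _ in range(1000):  # 2. repeat n times
    preds = [random.choice(labels) for _ in gold]
    accuracies.append(sum(p == g for p, g in zip(preds, gold)) / len(gold))

print(sum(accuracies) / len(accuracies))  # 3. average results (~0.5)
```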
28
Q

majority baseline

A
  • assign most frequent class label to all instances, calculate results
  • results in high accuracy when one class significantly outweighs the other, but no ability to identify the minority class
  • comparing against this baseline ensures that models not only achieve high overall accuracy, but can also correctly identify less frequent classes
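A minimal sketch with made-up labels, showing the high-accuracy/zero-recall effect:

```python
from collections import Counter

# Made-up gold labels: heavy class imbalance.
gold = ["neg"] * 90 + ["pos"] * 10

# Assign the most frequent class label to every instance.
majority = Counter(gold).most_common(1)[0][0]
preds = [majority] * len(gold)

accuracy = sum(p == g for p, g in zip(preds, gold)) / len(gold)
print(majority, accuracy)  # 'neg' 0.9 -- yet recall on 'pos' is zero
```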
29
Q

evaluation for multiple classes

A
  1. calculate precision and recall for every class separately
  2. average the results over classes
    –> macro average: does not take class imbalance into account
    –> weighted average: weighted by class size
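A sketch of the two averaging modes, assuming scikit-learn is available; the labels are made up:

```python
from sklearn.metrics import f1_score  # assumes scikit-learn is installed

gold  = ["a", "a", "a", "a", "b", "b", "c"]
preds = ["a", "a", "a", "b", "b", "c", "c"]

# macro: unweighted mean of per-class scores (ignores class imbalance)
print(f1_score(gold, preds, average="macro"))
# weighted: per-class scores weighted by class size
print(f1_score(gold, preds, average="weighted"))
```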
30
Q

sentiment lexicon

A
  1. predefined list of words classified as positive/negative
  2. count positive and negative words within the text. predict whichever is greater.
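A minimal counting sketch; the word lists are illustrative stand-ins for a real lexicon:

```python
# Tiny made-up lexicon (a real one would have thousands of entries).
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "awful", "terrible", "hate"}

def lexicon_sentiment(text: str) -> str:
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    # Predict whichever count is greater (ties default to positive here).
    return "pos" if pos >= neg else "neg"

print(lexicon_sentiment("a great film with an awful ending but great acting"))  # 'pos'
```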
31
Q

problems with sentiment lexicon

A
  1. hard to know if words that seem pos/neg are actually used that way
  2. opinion words might describe a character's attitude rather than an evaluation of the film
  3. some words are semantic modifiers that change the polarity of surrounding words (e.g., negation: "not good")
32
Q

solutions for sentiment lexicon problems

A

data-driven method: use frequency counts to ascertain which words in corpora tend to be positive or negative

33
Q

h(x)

A
  • for text classification
  • a mapping h from input data x to a label y
  • two components
    1. representation of the data
    2. formal structure of the learning method
34
Q

representation of data for text classification

A
  1. sentiment analysis: only positive and negative words
  2. only words in isolation (BoW)
  3. conjunctions of words (sequential, ngrams, other nonlinear combinations)
  4. higher order linguistic structure (syntax)
35
Q

bag of words

A
  • simplest representation
  • text represented as counts of words that it contains
  • frequency of occurrence of each word is used as a feature for training a classifier
36
Q

BoW process

A
  1. tokenize
  2. count
  3. vectorize: each dimension represents a unique word in the entire corpus, and the value in each dimension is the word’s frequency in the document
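A toy sketch of the three steps on two made-up documents:

```python
from collections import Counter

docs = ["the movie was good", "the movie was bad bad"]

# 1. tokenize
tokenized = [d.split() for d in docs]

# 2. count
counts = [Counter(tokens) for tokens in tokenized]

# 3. vectorize: one dimension per unique word in the whole corpus,
#    holding that word's frequency in the document
vocab = sorted({t for tokens in tokenized for t in tokens})
vectors = [[c[w] for w in vocab] for c in counts]

print(vocab)    # ['bad', 'good', 'movie', 'the', 'was']
print(vectors)  # [[0, 1, 1, 1, 1], [2, 0, 1, 1, 1]]
```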
37
Q

why is BoW not sufficient for modeling language

A
  1. insensitive to word order or semantics
  2. vectors are sparse and high-dimensional
  3. ‘words’ are not always the most meaningful units of information
38
Q

ngrams

A
  • assign probabilities to sentences
  • looking at more than one word at a time
  • estimate P(S = w1…wn). this is a joint probability over all the words in S.
39
Q

ngrams: chain rule

A

P(S = w1…wn) = P(w1) · P(w2|w1) · P(w3|w1,w2) · … · P(wn|w1…wn-1)

–> the joint probability decomposes into a product of conditional probabilities

40
Q

problem with chain rule + solution

A
  • problem: conditional probabilities with long histories are just as sparse, because of the vast number of possible word combinations
  • solution: independence assumption. the probability of a word only depends on a fixed number of previous words (the history)

P(mast | i spent three years before the) ≈ P(mast | before, the)
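A toy sketch of estimating such a conditional probability from counts (maximum-likelihood estimation with a trigram history); the one-sentence "corpus" is purely illustrative:

```python
from collections import Counter

corpus = "i spent three years before the mast".split()

trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigrams = Counter(zip(corpus, corpus[1:]))

def trigram_prob(word: str, w1: str, w2: str) -> float:
    # MLE: count(w1, w2, word) / count(w1, w2)
    return trigrams[(w1, w2, word)] / bigrams[(w1, w2)]

print(trigram_prob("mast", "before", "the"))  # 1.0 on this tiny corpus
```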

41
Q

ngrams usefulness for sentiment analysis

A

ngrams capture sentiment beyond the word level, since they incorporate the surrounding context

42
Q

why do we need text corpora

A
  1. to evaluate our systems
    - good science requires controlled experimentation
    - good engineering requires benchmarks
  2. to help our systems work well
    - data-driven methods instead of rule-based
    - learning
43
Q

learning

A

collecting statistics or patterns from corpora to govern the system’s behavior

  1. supervised learning
  2. training establishes the core behavior; tuning refines it