Sentiment Analysis: Sentiment and Rhetoric Flashcards

1
Q

Sentiment Analysis (SA)

A

Computational study of opinions, sentiments, and emotions expressed in text.
A kind of semantic analysis: feeling, emotion, and judgment in language.
-Came into play with the rise of user-generated content and social media
-Reviews are the most common use case
-The current state of the art focuses on feature-level objects

2
Q

Sentiment Scoring

A

The most basic scoring is positive/negative. It doesn’t say how something is positive or negative, but it is “table stakes”: what’s needed to get into the game.

  • The simplest scale is usually -1 to 1
  • 0 can mean neutral, or it can mean no polarity was detected, because the system only knows how to detect positive or negative. Sometimes 0 is the net score over a document, where positive and negative clues average out. Always find out what 0 means.
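A minimal sketch of why the meaning of 0 matters, assuming a hypothetical toy lexicon (`LEXICON`, `score_document`, and the valences are illustrative, not taken from any real lexicon). It averages the valences of the matched clues and also returns how many clues matched, so “cancelled out” and “nothing detected” can be told apart:

```python
# Hypothetical toy lexicon: clue -> valence in [-1, 1] (illustrative values only).
LEXICON = {"great": 0.8, "love": 0.9, "broke": -0.7, "slow": -0.5, "terrible": -0.9}


def score_document(tokens):
    """Return (score, hits): the mean valence of matched clues and how many matched.

    Averaging valences that are already in [-1, 1] keeps the score in [-1, 1].
    """
    matched = [LEXICON[t] for t in tokens if t in LEXICON]
    if not matched:
        return 0.0, 0  # 0 here means "no polarity detected", not "neutral"
    return sum(matched) / len(matched), len(matched)
```

Both `score_document("i love it but shipping was terrible".split())` and `score_document("it arrived on tuesday".split())` score 0.0, but the first has 2 hits (mixed sentiment netting out) and the second has none (no polarity detected).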
3
Q

Two approaches to Sentiment

A
  • Supervised ML
  • Unsupervised sentiment lexical knowledge base, in which each clue carries a valence score (how positive or negative it is, and how strongly)

4
Q

Supervised ML

A

Apply classification to sentences or documents.

  • Binary classifier trained with SGD (stochastic gradient descent) in sklearn; with hinge loss this is a linear SVM implementation.
  • Requires training data to build the model.

Pros

  • Can be ready quickly if you have a lot of training data
  • Don’t need to develop a coded vocabulary with valence scores

Cons

  • It’s opaque, not explainable (not XAI)
  • It’s only as granular as the training data

Process

  1. Establish training set
  2. Normalize text (expand contractions, spelling corrections, etc.)
  3. Extract feature vectors. You might decide to stem, but with sentiment we sometimes decide not to: the past tense can be negative where the present tense is positive (“worked fine” vs. “working fine”). You may also not want to remove stop words.
  4. Train a binary classifier, e.g. an SVM trained with SGD.
  5. After QA, decide if more training data is needed.
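A dependency-free sketch of steps 3 and 4, assuming whitespace tokenization and toy training sentences. It trains a linear SVM by SGD on the regularized hinge loss, which is the idea behind sklearn’s `SGDClassifier(loss="hinge")` (usually paired with a vectorizer such as `CountVectorizer`); the function names and hyperparameters here are hypothetical. Features are not stemmed and stop words are kept, so “worked” and “working” stay separate:

```python
import random
from collections import Counter


def featurize(text):
    """Bag-of-words counts. No stemming and no stop-word removal, so
    "worked" and "working" (and words like "not") remain distinct features."""
    return Counter(text.lower().split())


def decision(w, b, x):
    """Linear score w·x + b for a sparse feature Counter x."""
    return sum(w.get(f, 0.0) * v for f, v in x.items()) + b


def train_linear_svm(docs, labels, epochs=50, lr=0.1, lam=0.01, seed=0):
    """Fit a linear SVM by SGD on the L2-regularized hinge loss.

    labels must be +1 (positive) or -1 (negative).
    """
    rng = random.Random(seed)
    data = [(featurize(d), y) for d, y in zip(docs, labels)]
    w, b = {}, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            margin = y * decision(w, b, x)
            for f in w:  # gradient step for the L2 penalty: shrink all weights
                w[f] *= 1 - lr * lam
            if margin < 1:  # hinge loss is active: step toward the correct side
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + lr * y * v
                b += lr * y
    return w, b


def predict(w, b, text):
    return 1 if decision(w, b, featurize(text)) >= 0 else -1
```

With stemming, “worked” and “working” would collapse into one feature and the tense signal would be lost.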
5
Q

Unsupervised sentiment lexical knowledge base

A

The biggest choice is which sentiment lexicon to use.

  • There are many out there. A lot of people use AFINN (“Affective lexicon by Finn Nielsen”), 2,477 clues, or Liu’s lexicon, 6,800 clues.
  • Pick up clues in the text and combine them to measure overall sentiment
  • MPQA (“Multi-Perspective Question Answering”) subjectivity lexicon: 8,222 clues.
  • SentiWordNet: created by a machine; labels all 100k+ WordNet synsets.
  • VADER (Valence Aware Dictionary and sEntiment Reasoner): 7,500 clues. A rule-based framework built for social media, with scores for words, emoticons, and slang.
  • Pattern library lexicon: about 3,000 clues, mostly adjectives, hand-coded with valences. A plus is that its words are mapped to WordNet. Generally works pretty well. Specializes in the area of mood.
  • Custom lexicon: as many clues as you want. This is the best option because every domain is different.
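A minimal sketch of layering a custom lexicon on top of a general one, assuming both are plain word-to-valence dictionaries. The entries and valences are hypothetical, chosen only to show a domain flip: a word such as “unpredictable” might be negative in a general lexicon but praise in movie reviews.

```python
# Hypothetical general-purpose lexicon (illustrative valences, not from a real lexicon).
GENERAL = {"unpredictable": -0.5, "cheap": -0.4, "great": 0.8}

# Hypothetical movie-review overrides: an "unpredictable" plot is usually praise.
MOVIE_OVERRIDES = {"unpredictable": 0.6, "gripping": 0.7}


def build_lexicon(base, overrides):
    """Copy the base lexicon, then let domain entries replace or extend it."""
    merged = dict(base)
    merged.update(overrides)
    return merged


movie_lexicon = build_lexicon(GENERAL, MOVIE_OVERRIDES)
```

Keeping the overrides separate from the base makes it easy to see, and explain, exactly where the domain lexicon departs from the general one.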
6
Q

Does size matter with unsupervised sentiment KBs?

A

Not necessarily: some of the smaller ones outperform larger ones. What matters is whether the lexicon’s domain is similar to yours and how carefully it was constructed.

Pros

  • Does not require training data
  • Very explainable (XAI)

Cons

  • Needs a coded vocabulary (lexical KB)
  • Can be cumbersome to maintain in the face of new tropes (words or figures of speech)

Process

  1. Establish valence-weighted vocabularies
  2. Normalize text.
  3. Extract feature vectors.
  4. Execute a scoring algorithm: essentially, add up the sentiment (valence) of the clues in each chunk of text.
  5. QA the results, tweak the vocabulary, and rerun until it passes. You might have to adjust weights.
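The five steps above can be sketched as follows, assuming a hypothetical hand-weighted vocabulary (`VOCAB`), a tiny contraction map, and sentence-level chunks split on punctuation; all names and weights are illustrative:

```python
import re

# Step 1: hypothetical valence-weighted vocabulary (weights are illustrative).
VOCAB = {"love": 3, "great": 2, "fine": 1, "slow": -2, "broke": -3}

# Step 2 helper: a tiny contraction map; a real one would be much larger.
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot"}


def normalize(text):
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    return text


def extract_features(chunk):
    # Step 3: plain word tokens; stop words kept, no stemming.
    return re.findall(r"[a-z]+", chunk)


def score_chunks(text):
    # Step 4: split into sentence-like chunks and add up clue valences in each.
    chunks = [c.strip() for c in re.split(r"[.!?]+", normalize(text)) if c.strip()]
    return [(c, sum(VOCAB.get(tok, 0) for tok in extract_features(c))) for c in chunks]
```

For step 5, QA means reading scored chunks like these, then editing `VOCAB` (adding clues or adjusting weights) and rerunning. Every score can be traced back to the clues that produced it.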
7
Q

More advanced techniques (hard)

A
  • Determining referents and/or topics to which sentiment attaches
  • Classifying into more categories than positive/negative
  • Picking up on non-sentiment vocabulary differences that align with sentiment around a topic: rhetoric analysis
8
Q

Straightforward approach with a chunker

A

Run a chunker and send NP-chunks and VP-chunks, instead of whole sentences, into a sentiment analyzer. You can then presume that the main (head) noun in a noun phrase is the object of the sentiment. This will be correct a lot of the time. It doesn’t work well for negation or double negation. Part of text normalization is to rip apart or transform negation, so that “not unappealing” becomes “appealing”, and to handle nullifiers such as “hardly”. But don’t worry too much: with big enough numbers you might be able to ignore negation entirely and still get a directionally correct result.

Another vulnerability:
-Some sentiment is attached outside the NP-chunk instead of within it
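A minimal sketch of the chunker approach, assuming the tokens are already POS-tagged with Penn Treebank tags. The chunk grammar (optional determiner, adverb/adjective modifiers, then nouns), the clue list, and the negation rule are hypothetical simplifications; treating “hardly” as a sign flip is one crude choice among several.

```python
# Hypothetical clue lexicon and negators (illustrative only).
CLUES = {"great": 1, "appealing": 1, "unappealing": -1, "flimsy": -1}
NEGATORS = {"not", "never", "hardly"}  # crude: each one flips the next clue


def np_chunks(tagged):
    """Greedy NP chunker over (word, Penn tag) pairs.

    Hypothetical grammar: optional DT, any RB/JJ modifiers, then one or more NN* nouns.
    """
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":
            j += 1
        while j < len(tagged) and tagged[j][1] in ("RB", "JJ"):
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:  # found at least one noun: accept tagged[i:k] as a chunk
            chunks.append(tagged[i:k])
            i = k
        else:
            i += 1
    return chunks


def chunk_sentiment(chunk):
    """Return (head noun, score); the head noun is presumed to be the target."""
    score, flip = 0, False
    for word, _ in chunk:
        w = word.lower()
        if w in NEGATORS:
            flip = not flip  # so "not unappealing" ends up positive
        elif w in CLUES:
            score += -CLUES[w] if flip else CLUES[w]
            flip = False
    head = [w for w, t in chunk if t.startswith("NN")][-1]  # last noun = head
    return head, score
```

For “A not unappealing design” (tagged DT, RB, JJ, NN) this yields `("design", 1)`. For “The screen is not great”, the only NP-chunk is “The screen”, so it yields `("screen", 0)`: the sentiment sits outside the NP-chunk, which is the vulnerability noted above.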

9
Q

Dependency parser

A
  • Run a dependency parser
  • Follow dependency paths from a sentiment trigger until an object is found
  • Most of the time this reaches the target of the sentiment
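A minimal sketch of following dependency paths, assuming the parse is already available as (word, POS, head index) triples with -1 marking the root; this hand-written format and `find_target` are hypothetical, and in practice the parse would come from a dependency parser. It walks outward from the trigger along dependency arcs in both directions (breadth-first) and returns the nearest noun:

```python
from collections import deque


def find_target(parse, trigger_idx):
    """parse: list of (word, pos, head), head = index of the governor, -1 for root.

    Breadth-first walk from the trigger over dependency arcs (both directions);
    returns the nearest noun (NN*), or None if there is none.
    """
    neighbors = {i: [] for i in range(len(parse))}
    for i, (_, _, head) in enumerate(parse):
        if head >= 0:
            neighbors[i].append(head)
            neighbors[head].append(i)
    seen, queue = {trigger_idx}, deque([trigger_idx])
    while queue:
        i = queue.popleft()
        if i != trigger_idx and parse[i][1].startswith("NN"):
            return parse[i][0]
        for j in neighbors[i]:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return None
```

For “I love the new camera”, encoded as `[("I", "PRP", 1), ("love", "VBP", -1), ("the", "DT", 4), ("new", "JJ", 4), ("camera", "NN", 1)]`, starting from “love” (index 1) skips the pronoun and returns “camera”.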
10
Q

Hierarchical Sentiment Scoring

A

Dimensionality of sentiment: what kind of positive or negative sentiment?
Build a taxonomy of different types of sentiment. Put positive/negative at the top, then break each of those down.
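A minimal sketch of rolling counts up a sentiment taxonomy, assuming the taxonomy is stored as a child-to-parent map; the category names are hypothetical examples for product reviews:

```python
from collections import Counter

# Hypothetical two-level taxonomy: child -> parent (None marks a top-level node).
PARENT = {
    "praise_quality": "positive", "praise_value": "positive",
    "complaint_defect": "negative", "complaint_price": "negative",
    "positive": None, "negative": None,
}


def roll_up(leaf_labels):
    """Count each label and every ancestor, so totals exist at every level."""
    counts = Counter()
    for label in leaf_labels:
        node = label
        while node is not None:
            counts[node] += 1
            node = PARENT[node]
    return counts
```

`roll_up(["praise_quality", "complaint_defect", "complaint_defect", "complaint_price"])` gives 1 positive and 3 negative at the top level while keeping the finer split underneath (2 defect complaints, 1 price complaint).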

11
Q

Typologies of Emotion

A
William James (psychology, 1890) tried to break emotion down into 4 emotions
Watt Smith (2015): 154 emotions
Over time, typologies list more and more emotions
Sentiment is broader than emotion
Shaver has ~135 emotions, organized in a hierarchy under 6:
Positive: love, joy, surprise
Negative: anger, sadness, fear
The 2nd level has 25 or 30 emotions
Another level can be built beneath that until all 135 are used
-A ready-made starter vocabulary
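A minimal sketch of using Shaver’s hierarchy as a starter vocabulary. The top level follows the card above; the 2nd-level entries are a small illustrative subset and should be checked against the published hierarchy before relying on them. `emotion_path` is a hypothetical helper:

```python
# Top of the hierarchy as listed above: polarity -> basic emotions.
BASIC = {"positive": ["love", "joy", "surprise"], "negative": ["anger", "sadness", "fear"]}

# A few 2nd-level entries -> basic emotion (illustrative subset, not the full level).
SECOND_LEVEL = {
    "affection": "love", "longing": "love",
    "contentment": "joy", "pride": "joy", "relief": "joy",
    "irritation": "anger", "rage": "anger",
    "disappointment": "sadness", "shame": "sadness",
    "nervousness": "fear", "horror": "fear",
}


def emotion_path(term):
    """Return (polarity, basic emotion, term) for a known term, else None."""
    basic = SECOND_LEVEL.get(term, term)  # a basic emotion maps to itself
    for polarity, emotions in BASIC.items():
        if basic in emotions:
            return polarity, basic, term
    return None
```

A lexicographer can extend `SECOND_LEVEL` (and add a third level beneath it) instead of starting from a blank slate.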
12
Q

Hybrid Approach (recommended approach)

A

Semiautomated feature engineering and sentiment lexicography

  • A way to bootstrap manually edited lexicons
  • Differential frequency analysis: compare term frequencies in negative vs. positive text
  • User reviews as training data (ratings of 4 or 5 = positive, 1 or 2 = negative; ignore 3s because they are ambiguous)
  • A lexicographer is engineering features when putting them into the lexicon. We can semi-automate this with a bootstrap: extract features by differential frequency as candidate clues, then hand them over to a person, who can attach a label to each one while reviewing.
  • The machine can automatically suggest a weight.
  • The lexicographer can assign dimensions to these candidates and is not starting from a blank slate.
  • Maintains explainable AI because we can point back to the vocabulary that was built.
  • Saves time compared with building a custom lexicon by hand: a lot less labor than a 100% manual process.
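A minimal sketch of the bootstrap step, assuming reviews arrive as (rating, text) pairs on a 1-5 scale. It applies the rating rule above and ranks terms by a smoothed log ratio of their relative frequency in positive vs. negative reviews; this statistic is one simple choice for differential frequency, not a prescribed one, and `candidate_clues` and its parameters are hypothetical. The output is a list of candidate clues with suggested weights for a lexicographer to review:

```python
import math
from collections import Counter


def candidate_clues(reviews, min_count=2, top_n=10):
    """reviews: (rating, text) pairs, rating 1-5.

    4-5 = positive, 1-2 = negative, 3 = skipped as ambiguous. Returns
    (term, suggested_weight) pairs ranked by |weight|, where the weight is the
    log ratio of add-one-smoothed relative frequencies (positive / negative).
    """
    pos, neg = Counter(), Counter()
    for rating, text in reviews:
        if rating >= 4:
            pos.update(text.lower().split())
        elif rating <= 2:
            neg.update(text.lower().split())
    pos_total, neg_total = sum(pos.values()), sum(neg.values())
    vocab = set(pos) | set(neg)
    scored = []
    for term in vocab:
        if pos[term] + neg[term] < min_count:
            continue  # too rare to suggest
        p = (pos[term] + 1) / (pos_total + len(vocab))
        q = (neg[term] + 1) / (neg_total + len(vocab))
        scored.append((term, round(math.log(p / q), 3)))
    scored.sort(key=lambda tw: -abs(tw[1]))
    return scored[:top_n]
```

On small samples, function words such as “the” can surface as candidates; catching those is exactly the lexicographer’s job in the loop.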
13
Q

Sentiment and Insight

A

Actionable insight: what did consumers hate about the product? The presentation needs to be user-friendly so that non-technical people can understand it.

Product
-Pull out themes, pull out sentences, and highlight the trigger that matches the clue
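A minimal sketch of the presentation step, assuming themes are defined by hypothetical keyword lists and clues come from the sentiment lexicon. It keeps sentences that mention a theme and contain a clue, and marks the trigger with brackets so non-technical readers can see why each sentence was flagged:

```python
import re


def _word_pattern(words):
    # Whole-word, case-insensitive match; longest first so multi-word entries win.
    alts = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    return re.compile(r"\b(" + alts + r")\b", re.IGNORECASE)


def evidence_by_theme(sentences, themes, clues):
    """themes: theme -> keywords. Returns theme -> sentences that mention the
    theme and contain a clue, with each clue wrapped in [brackets]."""
    clue_re = _word_pattern(clues)
    out = {}
    for theme, keywords in themes.items():
        theme_re = _word_pattern(keywords)
        out[theme] = [
            clue_re.sub(lambda m: "[" + m.group(0) + "]", s)
            for s in sentences
            if theme_re.search(s) and clue_re.search(s)
        ]
    return out
```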

14
Q

Rhetoric and Sentiment

A

Words that don’t necessarily show up in a sentiment lexicon but have certain connotations:
“pro-life” vs. “pro-choice”
“second amendment” vs. “assault weapons”
“Eastwood” vs. “Mr. Eastwood”

Map and correlate these to sentiments. For example, people who liked American Sniper referred to Clint Eastwood as “Mr. Eastwood” instead of “Eastwood.”
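A minimal sketch of correlating rhetorical variants with sentiment, assuming documents already carry a "pos"/"neg" label (e.g. from ratings) and each variant is a regular expression. The negative lookbehind keeps “Eastwood” from also counting every “Mr. Eastwood”; `variant_sentiment` and the patterns are hypothetical:

```python
import re

# Hypothetical variant patterns; the lookbehind excludes "Mr. Eastwood" from "Eastwood".
VARIANTS = {"Mr. Eastwood": r"\bmr\. eastwood\b", "Eastwood": r"(?<!mr\. )\beastwood\b"}


def variant_sentiment(docs, variants):
    """docs: (label, text) pairs with label "pos" or "neg".

    Returns name -> (number of documents using the variant, share that are positive).
    """
    result = {}
    for name, pattern in variants.items():
        rx = re.compile(pattern, re.IGNORECASE)
        labels = [label for label, text in docs if rx.search(text)]
        share = labels.count("pos") / len(labels) if labels else None
        result[name] = (len(labels), share)
    return result
```

A variant whose positive share differs sharply from the corpus baseline is a candidate rhetorical signal, even though it would never appear in a sentiment lexicon.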

15
Q

Opinion

A

A quintuple: (1) the target object, (2) a feature of the object, (3) the opinion holder’s sentiment on the feature or the object, (4) the opinion holder (who is giving the opinion), and (5) the time the opinion is expressed.
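The quintuple maps naturally onto a record type. A minimal sketch, assuming sentiment is a number in [-1, 1]; the field names and the `feature_summary` helper (which averages sentiment per object feature) are hypothetical:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Opinion:
    target: str       # the object the opinion is about
    feature: str      # the feature of the object (or the object itself)
    sentiment: float  # the holder's sentiment on the feature/object, in [-1, 1]
    holder: str       # who is giving the opinion
    time: date        # when the opinion was expressed


def feature_summary(opinions):
    """Average sentiment for each (target, feature) pair."""
    groups = defaultdict(list)
    for op in opinions:
        groups[(op.target, op.feature)].append(op.sentiment)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}
```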

16
Q

SA requires

A

Named entity extraction, information/data extraction, and sentiment determination

17
Q

Facts can have sentiment

A

“The phone broke in two days”