Sentiment Analysis: Sentiment and Rhetoric Flashcards
Sentiment Analysis (SA)
Computational study of opinions, sentiments, and emotions expressed in text.
A kind of semantic analysis concerned with feeling, emotion, and judgment in language.
- Comes into play with the rise of user-generated content and social media
- Reviews are the most common use case
- The current state of the art focuses on feature-level objects (sentiment toward individual features of an object, not just the object as a whole)
Sentiment Scoring
The most basic score is positive/negative. It doesn’t say how something is positive or negative, but it is “table stakes”: what’s needed to get into the game.
- The simplest, and usual, scale runs from -1 to 1.
- 0 can mean neutral, or it can mean no polarity was detected (because the system only knows how to detect positive or negative). Sometimes the score is netted over a whole document, so positives and negatives average out. Always find out what 0 means.
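To make the “what does 0 mean” question concrete, here is a minimal Python sketch. It assumes a hypothetical toy lexicon (`LEXICON`) with valences in [-1, 1] and naive whitespace tokenization. It returns `None` when no clue is found, which keeps “no polarity detected” distinct from a net 0 where positives and negatives cancel.

```python
from typing import Optional

# Hypothetical toy lexicon: clue -> valence in [-1, 1].
LEXICON = {"great": 0.8, "love": 0.9, "terrible": -0.9, "slow": -0.4}


def score_text(text: str) -> Optional[float]:
    """Mean valence of matched clues, or None if no clue matched.

    Returning None instead of 0.0 keeps "no polarity detected" distinct
    from "neutral", i.e. positives and negatives netting out to 0.
    """
    # Naive whitespace tokenization; punctuation is not stripped.
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not hits:
        return None
    return sum(hits) / len(hits)


print(score_text("the box is brown"))      # None: no clue detected
print(score_text("love it but terrible"))  # 0.0: positive and negative cancel out
```

Whether a system reports `None`, 0, or a separate “undetected” flag is a design choice. The point is that a document-level 0 is ambiguous unless the convention is known.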
Two approaches to Sentiment
- Supervised ML
- Unsupervised sentiment lexical knowledge base: a vocabulary in which each clue carries a valence score (how positive or negative it is).
Supervised ML
Apply classification to sentences or documents.
- Binary classifier, e.g., scikit-learn’s SGDClassifier (stochastic gradient descent, SGD), which with its default hinge loss trains a linear SVM.
- Requires training data to build the model.
Pros
- Can be ready quickly if you have a lot of training data
- Don’t need to develop a coded vocabulary with valence scores
Cons
- It’s opaque, not explainable (not XAI)
- It’s only as granular as the training data
Process
- Establish training set
- Normalize text (expand contractions, spelling corrections, etc.)
- Extract feature vectors. You might decide to stem, but with sentiment we sometimes decide not to: the past tense can be negative where the present tense is positive (“worked fine” vs. “working fine”). You may also not want to remove stop words, since many stop-word lists include negators such as “not”.
- Train a binary classifier (SVM/SGD).
- After QA, decide if more training data is needed.
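The steps above can be sketched with scikit-learn. This is a minimal example on a hypothetical six-sentence training set, far too small for real use. It skips stemming and stop-word removal, so “worked” vs. “working” and “not” survive, and it uses `SGDClassifier(loss="hinge")`, which fits a linear SVM by stochastic gradient descent. Text normalization is assumed to have happened already.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy training set; a real one needs far more labeled text.
train_texts = [
    "works great and I love it",
    "excellent battery, works fine",
    "really happy with this purchase",
    "stopped working after a week",
    "terrible screen, not worth it",
    "worked fine at first, now broken",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

model = make_pipeline(
    # No stemmer and stop_words=None, so tense and "not" are kept as features.
    # Bigrams let phrases like "not worth" get their own weight.
    TfidfVectorizer(lowercase=True, stop_words=None, ngram_range=(1, 2)),
    # With loss="hinge", SGDClassifier fits a linear SVM via SGD.
    SGDClassifier(loss="hinge", random_state=0),
)
model.fit(train_texts, train_labels)

predictions = model.predict(["love the battery", "not worth the money"])
print(predictions)
```

After QA on held-out data, the usual lever is more (or better-balanced) labeled data rather than tweaking the vocabulary, since the model is opaque.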
Unsupervised sentiment lexical knowledge base
The biggest choice is which sentiment lexicon to use.
- There are many out there. A lot of people use AFINN (“Affective lexicon by Finn Nielsen”), 2,477 clues, or Liu’s lexicon, 6,800 clues.
- Pick up clues and put them together to measure overall sentiment
- MPQA (“Multi-Perspective Question Answering”) subjectivity lexicon: 8,222 clues.
- SentiWordNet: created automatically (by machine); labels all 100k+ WordNet synsets.
- VADER (Valence Aware Dictionary and sEntiment Reasoner): about 7,500 clues. Rule-based framework built for social media, with scores for words, emoticons, and slang.
- Pattern library lexicon: about 3,000 clues, mostly adjectives, hand-coded with valences. Entries are mapped to WordNet, which is a plus. Generally works pretty well. Specializes in the area of mood.
- Custom lexicon: as many clues as you want. Often the best option, because every domain is different.
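One way to build a custom lexicon is to start from a general one and override domain-specific clues. A minimal Python sketch follows. It assumes an AFINN-style text layout (one `clue<TAB>integer valence` per line). The three entries and their values are made up for illustration and are not taken from AFINN.

```python
def parse_afinn_style(lines):
    """Parse 'clue<TAB>integer valence' lines into a dict."""
    lexicon = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # rsplit so multi-word clues (which contain spaces) stay intact.
        clue, valence = line.rsplit("\t", 1)
        lexicon[clue] = int(valence)
    return lexicon


def build_custom_lexicon(base, domain_overrides):
    """Copy a general lexicon; domain entries win on conflicts."""
    merged = dict(base)
    merged.update(domain_overrides)
    return merged


# Hypothetical general-purpose entries (values made up for illustration).
general = parse_afinn_style(["good\t3", "unpredictable\t-2", "killer\t-3"])

# Hypothetical overrides for movie/game reviews, where an "unpredictable"
# plot or a "killer" feature is praise.
movies = build_custom_lexicon(general, {"unpredictable": 2, "killer": 2})
print(movies)
```

Keeping the overrides as a separate dict makes it easy to see, and explain, exactly where the custom lexicon departs from the general one.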
Does size matter with unsupervised sentiment kbs?
Not necessarily: some of the smaller lexicons outperform larger ones. What matters more is how similar the lexicon’s domain is to yours and how carefully it was constructed.
Pros
- Does not require training data
- Very explainable (XAI)
Cons
- Needs a coded vocabulary (lexical KB)
- Can be cumbersome to maintain in the face of new tropes (words or figures of speech)
Process
- Establish valence-weighted vocabularies
- Normalize text.
- Extract feature vectors.
- Execute a scoring algorithm: essentially, add up the valence of the clues found in each chunk of text.
- QA, tweak the vocabulary, and rerun until the results pass. You might have to adjust weights.
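The lexicon-based process can be sketched in a few functions. This is a minimal Python illustration, not a production scorer. The vocabulary weights and the contraction map are hypothetical. Sentences serve as the chunks, and a chunk’s score is the plain sum of its clue valences. The QA step is left to a person.

```python
import re

# Step 1: hypothetical valence-weighted vocabulary (weights are illustrative).
VOCAB = {"love": 3, "great": 3, "fine": 1, "broken": -3, "slow": -2}

# Hypothetical, deliberately tiny contraction map for normalization.
CONTRACTIONS = {"isn't": "is not", "don't": "do not", "it's": "it is"}


def normalize(text):
    """Step 2: lowercase, unify apostrophes, expand contractions."""
    text = text.lower().replace("\u2019", "'")
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return text


def features(chunk):
    """Step 3: count vocabulary clues in a chunk."""
    counts = {}
    for tok in re.findall(r"[a-z']+", chunk):
        if tok in VOCAB:
            counts[tok] = counts.get(tok, 0) + 1
    return counts


def score_document(text):
    """Step 4: split into sentence chunks and sum clue valences per chunk."""
    chunks = [c for c in re.split(r"[.!?]+", normalize(text)) if c.strip()]
    return [sum(VOCAB[w] * n for w, n in features(c).items()) for c in chunks]


print(score_document("I love it. The app is slow and the case is broken!"))
# [3, -5]
```

Because every score traces back to named vocabulary entries, QA can point at exactly which clue or weight to change before rerunning.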
More advanced techniques (hard)
- Determining referents and/or topics to which sentiment attaches
- Classifying into more categories than positive/negative
- Picking up on non-sentiment vocabulary differences that align with sentiment around a topic - rhetoric analysis
Straightforward approach with a chunker
Run a chunker and send NP-chunks and VP-chunks, instead of whole sentences, into a sentiment analyzer. You can then presume that the main (head) noun in a noun phrase is the object of the sentiment. This will be correct a lot of the time. It doesn’t work well for negation or double negation. Part of text normalization is to pull apart or transform negation, so that “not unappealing” becomes “appealing”, and to handle nullifiers such as “hardly”. But don’t worry too much: with big enough numbers you might not need to handle negation at all (just ignore it) and still get a directionally correct result.
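A minimal Python sketch of scoring NP-chunks with simple negation handling follows. The chunks are hand-written (word, Penn Treebank tag) pairs standing in for chunker output. The valences are hypothetical. The head noun is assumed to be the last noun in the chunk. The rules are also assumptions: a negator flips the next clue, and a nullifier cancels it.

```python
# Hypothetical valences; "not unappealing" should come out positive.
VALENCE = {"appealing": 2, "unappealing": -2, "sturdy": 2, "flimsy": -2}
NEGATORS = {"not", "never", "no"}
NULLIFIERS = {"hardly", "barely"}  # assumption: a nullifier cancels the next clue


def score_np_chunk(chunk):
    """Score one NP-chunk given as (word, Penn Treebank tag) pairs.

    Sentiment is attributed to the head noun, taken here to be the
    last noun (NN, NNS, NNP, NNPS) in the chunk.
    """
    head, score, modifier = None, 0, 1
    for word, tag in chunk:
        w = word.lower()
        if tag.startswith("NN"):
            head = word
        elif w in NEGATORS:
            modifier = -1  # flip the next clue
        elif w in NULLIFIERS:
            modifier = 0   # cancel the next clue
        elif w in VALENCE:
            score += modifier * VALENCE[w]
            modifier = 1   # a negator or nullifier affects one clue only
    return head, score


# Chunks as a chunker might emit them (hand-written, not real parser output).
chunks = [
    [("a", "DT"), ("not", "RB"), ("unappealing", "JJ"), ("design", "NN")],
    [("the", "DT"), ("flimsy", "JJ"), ("hinge", "NN")],
    [("a", "DT"), ("hardly", "RB"), ("sturdy", "JJ"), ("case", "NN")],
]
for chunk in chunks:
    print(score_np_chunk(chunk))
# ('design', 2)
# ('hinge', -2)
# ('case', 0)
```

Treating “hardly sturdy” as 0 rather than mildly negative is a simplification. A real rule set would tune how each nullifier behaves.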
Another vulnerability
- Some sentiment is attached outside the NP-chunk instead of within it (e.g., in “The screen looks gorgeous”, “gorgeous” is not inside the NP “the screen”).
Dependency parser
- Run a dependency parser
- Follow dependency paths from a sentiment trigger until an object is found
- Most of the time, this reaches the target of the sentiment
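Following dependency paths can be sketched as a breadth-first walk over the parse. In this minimal Python example, the parse of “The screen looks gorgeous” is hand-written in a Universal Dependencies style (coarse POS tags, head indices). It is not the output of any particular parser. The stopping rule (the nearest noun) is one simple policy. A real system might prefer specific arc labels such as `nsubj`.

```python
from collections import deque

# Hand-written parse: index -> (word, coarse POS, head index; -1 = root).
PARSE = {
    0: ("The", "DET", 1),
    1: ("screen", "NOUN", 2),
    2: ("looks", "VERB", -1),
    3: ("gorgeous", "ADJ", 2),
}


def neighbors(parse, i):
    """Tokens one dependency arc away from i: its dependents plus its head."""
    out = [j for j, (_, _, head) in parse.items() if head == i]
    head = parse[i][2]
    if head != -1:
        out.append(head)
    return out


def find_target(parse, trigger):
    """Walk outward from a sentiment trigger until a noun is found."""
    seen = {trigger}
    queue = deque([trigger])
    while queue:
        i = queue.popleft()
        if i != trigger and parse[i][1] in ("NOUN", "PROPN"):
            return parse[i][0]
        for j in neighbors(parse, i):
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return None


print(find_target(PARSE, 3))  # trigger "gorgeous" -> screen
```

Here the path is gorgeous → looks → screen, which crosses the NP-chunk boundary that a chunker-only approach would miss.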
Hierarchical Sentiment Scoring
Dimensionality of sentiment: What kind of positive or negative sentiment?
Build a taxonomy of different types of sentiment: positive/negative at the top, then break each of those down.
Typologies of Emotion
- William James (psychology, 1890) tried to break emotion into 4 emotions.
- Watt Smith (2015): 154 emotions. As time goes by, there are more emotions.
- Sentiment is broader than emotion.
- Shaver has ~135 emotions, but in a hierarchy with 6 at the top. Positive: love, joy, surprise. Negative: anger, sadness, fear.
- The 2nd level has 25 or 30 emotions; you can build another level until all ~135 are used.
- A ready-made starter vocabulary.
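Hierarchical scoring can be sketched as counting emotions and rolling them up to their parent polarity. This minimal Python example encodes only the top two levels of Shaver’s hierarchy as listed in these notes. The clue-to-emotion mapping is hypothetical, and deeper levels would be added the same way.

```python
from collections import Counter

# Top two levels: polarity -> Shaver's six basic emotions (as listed above).
TAXONOMY = {
    "positive": ["love", "joy", "surprise"],
    "negative": ["anger", "sadness", "fear"],
}
PARENT = {child: parent for parent, kids in TAXONOMY.items() for child in kids}

# Hypothetical clue -> emotion mapping; a real lexicon would be much larger.
CLUE_EMOTION = {"adore": "love", "thrilled": "joy", "furious": "anger",
                "scared": "fear", "wow": "surprise"}


def hierarchical_scores(tokens):
    """Count emotions found in the tokens, then roll them up to polarity."""
    emotions = Counter(CLUE_EMOTION[t] for t in tokens if t in CLUE_EMOTION)
    polarity = Counter()
    for emotion, n in emotions.items():
        polarity[PARENT[emotion]] += n
    return emotions, polarity


emotions, polarity = hierarchical_scores("wow i adore it but i am scared".split())
print(dict(emotions))  # {'surprise': 1, 'love': 1, 'fear': 1}
print(dict(polarity))  # {'positive': 2, 'negative': 1}
```

This structure answers both “positive or negative?” and “what kind of positive or negative?” from the same pass over the text.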
Hybrid Approach (recommended approach)
Semiautomated feature engineering and sentiment lexicography
- A way to bootstrap manually edited lexicons
- Differential frequency analysis: compare word frequencies in negative vs. positive text
- User reviews as training data (4 or 5 stars = positive, 1 or 2 stars = negative; ignore 3s because they are ambiguous)
- The lexicographer is engineering features when putting them into the lexicon. We can semi-automate this with a bootstrap: extract features by differential frequency as candidate clues, then hand them over to a person, who could attach a label to each while reviewing.
- The machine could automatically suggest a weight.
- The lexicographer can assign dimensions to these clues and doesn’t start from a blank slate.
- Remains explainable (XAI), because we can point back to the vocabulary that was built.
- Saves time when building a custom lexicon: a lot less manual labor than a 100% manual approach.
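The bootstrap step can be sketched as follows. This minimal Python example is not a specific published tool. It uses hypothetical star-rated reviews, treats 4 to 5 stars as positive and 1 to 2 as negative, and skips 3s. It then ranks words by an add-one-smoothed log ratio of positive to negative relative frequency, which is one common way to do differential frequency analysis; the notes don’t fix a formula. The ratio doubles as the machine-suggested weight that a lexicographer would review.

```python
import math
from collections import Counter

# Hypothetical (stars, text) reviews.
REVIEWS = [
    (5, "love the battery life"),
    (4, "battery lasts and screen is sharp"),
    (5, "sharp screen love it"),
    (3, "battery is ok"),  # 3 stars: ignored as ambiguous
    (1, "battery died and screen cracked"),
    (2, "screen cracked after a week"),
    (1, "refund please it died"),
]


def candidate_clues(reviews, min_count=2):
    """Rank words by smoothed log-ratio of positive vs. negative frequency."""
    pos, neg = Counter(), Counter()
    for stars, text in reviews:
        words = text.lower().split()
        if stars >= 4:
            pos.update(words)
        elif stars <= 2:
            neg.update(words)
    pos_total, neg_total = sum(pos.values()), sum(neg.values())
    vocab = set(pos) | set(neg)
    scored = []
    for w in vocab:
        if pos[w] + neg[w] < min_count:
            continue  # too rare to suggest
        # Add-one smoothing gives one-sided words a finite score.
        p = (pos[w] + 1) / (pos_total + len(vocab))
        q = (neg[w] + 1) / (neg_total + len(vocab))
        scored.append((w, round(math.log(p / q), 2)))  # suggested weight
    # Highest (most positive) first; ties broken alphabetically.
    return sorted(scored, key=lambda x: (-x[1], x[0]))


for word, weight in candidate_clues(REVIEWS):
    print(word, weight)
```

On this toy data, words like “and”, “it”, and “screen” land at 0 and would be discarded during review. “love” and “sharp” come out as positive candidates, and “cracked” and “died” as negative ones.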
Sentiment and Insight
Actionable Insight: What did consumers hate about the product? The results need a user-friendly presentation that non-technical people can understand.
Product
- Pull out themes; pull out sentences and highlight the trigger that matches the clue
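Pulling out sentences and highlighting triggers can be as simple as a regex pass. This is a minimal Python sketch with hypothetical function names. It uses a naive sentence splitter and square-bracket markers, which a real report would likely replace with HTML or UI styling.

```python
import re


def highlight(sentence, clues, mark=("[", "]")):
    """Wrap each whole-word, case-insensitive clue match in markers."""
    if not clues:
        return sentence
    # Longest clues first, so multi-word clues win over their parts.
    alternatives = "|".join(re.escape(c) for c in sorted(clues, key=len, reverse=True))
    pattern = r"\b(" + alternatives + r")\b"
    return re.sub(pattern, lambda m: mark[0] + m.group(0) + mark[1],
                  sentence, flags=re.IGNORECASE)


def pull_sentences(text, clues):
    """Return only the sentences that contain a clue, with triggers highlighted."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [highlight(s, clues) for s in sentences
            if any(re.search(r"\b" + re.escape(c) + r"\b", s, re.IGNORECASE)
                   for c in clues)]


print(pull_sentences("Love the color. The battery died fast. Shipping was quick.",
                     {"died"}))
# ['The battery [died] fast.']
```

Showing the matched trigger in context is what keeps the output explainable to non-technical readers.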
Rhetoric and Sentiment
Words that don’t necessarily show up in a sentiment lexicon but have certain connotations.
“pro-life” vs. “pro-choice”
“second amendment” vs. “assault weapons”
“Eastwood” vs. “Mr. Eastwood”
Map and correlate these to sentiments. For example, people who liked American Sniper referred to Clint Eastwood as “Mr. Eastwood” instead of “Eastwood.”
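Correlating phrasing with sentiment can be sketched by measuring, for each variant, the share of documents using it that carry a positive label. In this minimal Python example, the documents and labels are made up. They illustrate the mechanics only and are not evidence for the American Sniper observation. The fixed-width negative lookbehind keeps bare “Eastwood” from also matching “Mr. Eastwood”.

```python
import re

# Hypothetical labeled documents: (sentiment label, text).
DOCS = [
    ("pos", "Mr. Eastwood directed a moving film"),
    ("pos", "a tribute by Mr. Eastwood"),
    ("neg", "Eastwood glorifies war"),
    ("neg", "another Eastwood mess"),
    ("pos", "Eastwood got the pacing right"),
]

VARIANTS = {
    "Mr. Eastwood": r"\bMr\.\s+Eastwood\b",
    # Bare "Eastwood", not preceded by "Mr. ".
    "Eastwood": r"(?<!Mr\. )\bEastwood\b",
}


def variant_sentiment(docs, variants):
    """Map each phrasing to (share of its documents that are positive, doc count)."""
    result = {}
    for name, pattern in variants.items():
        labels = [label for label, text in docs if re.search(pattern, text)]
        if labels:
            result[name] = (labels.count("pos") / len(labels), len(labels))
    return result


print(variant_sentiment(DOCS, VARIANTS))
```

With real data, the document counts matter as much as the shares. A variant seen in only a handful of documents is not a reliable rhetorical signal.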
Opinion
A quintuple: the target object, a feature of the object, the sentiment of the opinion holder on the feature or the object, the opinion holder (who is giving the opinion), and the time the opinion is expressed.
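The quintuple maps naturally onto a record type. In this minimal Python sketch, the field names, the -1 to 1 sentiment scale, and the sample opinions are all assumptions for illustration. The helper shows one use of the structure: averaging sentiment per (object, feature) while ignoring holder and time.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Opinion:
    """Opinion quintuple (field names are illustrative)."""
    target: str             # the object the opinion is about
    feature: Optional[str]  # feature of the object; None = the object as a whole
    sentiment: float        # holder's sentiment, assumed here on a -1..1 scale
    holder: str             # who is giving the opinion
    time: date              # when the opinion was expressed


def feature_summary(opinions):
    """Average sentiment per (target, feature), ignoring holder and time."""
    totals = {}
    for o in opinions:
        s, n = totals.get((o.target, o.feature), (0.0, 0))
        totals[(o.target, o.feature)] = (s + o.sentiment, n + 1)
    return {key: s / n for key, (s, n) in totals.items()}


# Hypothetical opinions about a hypothetical product.
ops = [
    Opinion("phone X", "battery", -0.5, "user1", date(2015, 3, 1)),
    Opinion("phone X", "battery", 0.5, "user2", date(2015, 3, 2)),
    Opinion("phone X", None, 0.8, "user3", date(2015, 3, 3)),
]
print(feature_summary(ops))
```

Keeping holder and time in the record, even when a summary ignores them, makes it possible to later slice opinions by who said them or when.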