6 - Supervised learning, classification Flashcards
Customer Relationship Analytics
Sentiment analysis
Sentiment analysis can offer direct insights on the firm’s relationship to (potential) customers
Also called: opinion analysis
What is new:
- automated analysis
- of large amounts of text data
- collected over the web
- intended to deliver objective predictive power (but still at risk of bias)
Where can I find data on sentiments?
Traditional: surveys, interviews, written complaints
- Advantages: sentiment expressions are somewhat structured or at least targeted
- Disadvantages: limited responses, possibly sugar-coated, possibly too negative
New: “Harvesting” social networks
- Advantage: customers freely express their sentiments in large quantities
- Disadvantages: expressions are not structured, difficult to control, probably still biased
Excursus: Collecting Data from a population
Population
The complete set of elements we study
- closely related: the sampling frame, i.e. the list of population members from which a sample can actually be drawn
- e.g. all people in the UK, all BMW cars, all patients going through a medical treatment, …
Census
Collecting data from every population member
- expensive and lengthy
- in the US: every 10 years, every household is contacted
Referendum
Collection of data from every population member on a voluntary basis
- expensive
- e.g.: 2016: “Should the UK remain a member of the EU or leave?”
Collecting data from sample
Sample
Subset of members selected from a population
- Exhibits characteristics typical of those possessed by the population of interest
- data is collected from the sample with the objective to analyse and make inferences about the population
- sample must be “well-selected” to “well represent” the population (we are always interested in the population)
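The idea of drawing a sample to make inferences about the population can be sketched in a few lines of Python. This is only an illustration: the population of satisfaction scores, its size, and the sample size of 500 are invented assumptions, not data from the course.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: satisfaction scores (1-10) of 10,000 customers.
population = [random.randint(1, 10) for _ in range(10_000)]

# Simple random sample: every member has the same chance of being selected.
sample = random.sample(population, k=500)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# The sample mean serves as an estimate of the (usually unknown) population mean.
print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

In practice the population mean is not available; the sketch computes it only to show that a well-selected sample lands close to it.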
Bias
What is bias?
Also: tendency, prejudice
can be positive or negative
Biased sample
- measurements, observations or responses are likely to be unrepresentative of the population as a whole, because of the way the sample is chosen
- e.g. testing Whiskas only on cats owned by Whiskas employees
Analysis of a biased sample
- results may be very misleading
- a severely biased sample is usually worthless
What does that mean for sentiment analysis?
- different channels can offer different messages
- people are more likely to share negative experiences than positive ones
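How such a bias distorts results can be illustrated with a small simulation. It is a sketch under invented assumptions: the satisfaction scores, the posting probabilities in `post_probability`, and the helper `posts_online` are all hypothetical and only model the claim that unhappy customers speak up more often.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 customers with a satisfaction score from 1 to 5.
population = [random.choice([1, 2, 3, 4, 5]) for _ in range(10_000)]

def posts_online(score: int) -> bool:
    """Assumed behaviour: the unhappier a customer is, the more likely they post."""
    post_probability = {1: 0.50, 2: 0.30, 3: 0.10, 4: 0.05, 5: 0.05}
    return random.random() < post_probability[score]

# Biased sample: only customers who chose to post (self-selection).
online_posts = [score for score in population if posts_online(score)]

# Unbiased reference: a simple random sample of the same size.
random_sample = random.sample(population, k=len(online_posts))

print(f"population mean:    {statistics.mean(population):.2f}")
print(f"random sample mean: {statistics.mean(random_sample):.2f}")
print(f"online posts mean:  {statistics.mean(online_posts):.2f}")
```

Under these assumptions the mean of the online posts falls well below the population mean, while the random sample of the same size stays close to it.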
Web scraping: extracting sentiments from online sources
- still involves manual effort to varying extents
- can be automated via a variety of tools or with custom code, e.g. in Python (for instance with the Scrapy framework)
- efficiency depends on the fit of the search pattern
- ethical and legal “grey” area: customers do not direct their expressions toward the company, but toward an audience of peers
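A minimal sketch of the extraction step, using only Python's standard library. The HTML snippet in `PAGE`, the `review` class name, and the search pattern are made up for illustration; a real scraper would first download the page and would have to respect the site's terms of use.

```python
import re
from html.parser import HTMLParser

# Hypothetical page snippet; a real scraper would download this HTML first.
PAGE = """
<html><body>
  <div class="review">Love the new phone, battery lasts all day!</div>
  <div class="ad">Buy two, get one free</div>
  <div class="review">The delivery was late and support never answered.</div>
  <div class="review">Great shoes, fast shipping.</div>
</body></html>
"""

class ReviewParser(HTMLParser):
    """Collects the text of every <div class="review"> element (no nesting support)."""

    def __init__(self):
        super().__init__()
        self.in_review = False
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "div" and ("class", "review") in attrs:
            self.in_review = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_review = False

    def handle_data(self, data):
        if self.in_review and data.strip():
            self.reviews.append(data.strip())

parser = ReviewParser()
parser.feed(PAGE)

# Search pattern: keep only reviews that mention the product of interest.
pattern = re.compile(r"\b(phone|battery)\b", re.IGNORECASE)
matches = [review for review in parser.reviews if pattern.search(review)]

print(parser.reviews)
print(matches)
```

The advertisement is skipped because it does not carry the assumed `review` class, and only the reviews that fit the search pattern end up in `matches`. A pattern that is too narrow misses relevant texts, and one that is too broad collects noise.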
Approaches to sentiment analysis
Which perspective should be taken?
needs to be defined beforehand
Focus
- domain
- document or sentence
- context
- aspect or holistic
Question
- quality
- reason
- intensity
- subjectivity
- seriousness (e.g. irony)
- originality (customer vs. competitor)
What is an opinion?
opinion = (entity, aspect, sentiment, owner, date)
- aspects can be addressed directly or indirectly
- sentiments can be stated in a subjective or objective way
- the statement’s author is not always the sentiment’s owner
- date information can be used to trace trends
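The five-part definition maps naturally onto a small record type. The following Python sketch is one possible representation, not a prescribed one: the class name `Opinion`, the string labels for sentiment, and the three example opinions are assumptions made for illustration.

```python
import datetime
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Opinion:
    entity: str          # what the opinion is about, e.g. a product
    aspect: str          # the feature of the entity being judged
    sentiment: str       # here simply "positive", "negative" or "neutral"
    owner: str           # who holds the opinion (not necessarily the author)
    date: datetime.date  # when the opinion was expressed

# Invented example data.
opinions = [
    Opinion("Phone X", "battery", "positive", "alice", datetime.date(2024, 1, 15)),
    Opinion("Phone X", "battery", "negative", "bob", datetime.date(2024, 2, 3)),
    Opinion("Phone X", "camera", "negative", "bob's sister", datetime.date(2024, 2, 20)),
]

# Use the date field to trace a trend: negative opinions per month.
negatives_per_month = Counter(
    o.date.strftime("%Y-%m") for o in opinions if o.sentiment == "negative"
)
print(negatives_per_month)
```

The third opinion shows why owner and author are kept apart: the statement could have been written by bob while the sentiment belongs to his sister.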