Lecture 1: Introduction to Causal Inference Flashcards
Basic tenets of Marketing as a Research Field
- Generate value
- Inform customer-oriented decision-making
- Appropriate deployment of marketing resources
3 V’s of Big Data
- Volume
- Velocity
- Variety
Expanded list of Vs
→ Veracity: accuracy and truthfulness of the data
→ Variability: inconsistency (e.g., outlier, anomaly)
→ Visualization: ways to visualize data
→ Value: business value of the data
→ Volatility: storage and retrieval of data
Big data
- Digitalization
- Consumer behaviors
- Root cause of big data—digitalization—is more informative than its physical size or complexity
Problem recognition
- Dissatisfaction
- Valuable information
- Transaction pattern
- e-word-of-mouth
- Dialogue, shopping, and use behaviours (customer behaviour)
Shopping cart abandonment
Customers compare prices and products at other sites and leave without checking out their shopping cart
Collaborative filtering
Recommendation based on other users with similar profiles -> User-based and Item-based filtering
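A minimal sketch of the user-based variant, assuming a tiny made-up ratings table (the users, items, ratings, and the helper names `cosine` and `recommend` are all hypothetical). It scores items the target user has not rated using the ratings of other users, weighted by how similar those users are to the target. Cosine similarity is one common choice of similarity measure, not the only one.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}. All values are invented.
ratings = {
    "ana":   {"shoes": 5, "bag": 4, "hat": 1},
    "ben":   {"shoes": 5, "bag": 5, "scarf": 4},
    "chloe": {"hat": 5, "scarf": 1, "bag": 1},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def recommend(target, ratings):
    """Predict ratings for items the target has not rated yet."""
    scores, weights = {}, {}
    for other, their in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], their)
        for item, r in their.items():
            if item in ratings[target]:
                continue  # only recommend unseen items
            scores[item] = scores.get(item, 0.0) + sim * r
            weights[item] = weights.get(item, 0.0) + sim
    preds = {i: scores[i] / weights[i] for i in scores if weights[i] > 0}
    return sorted(preds.items(), key=lambda kv: kv[1], reverse=True)

recommendations = recommend("ana", ratings)
print(recommendations)
```

Here "ana" rates items much like "ben" and unlike "chloe", so the predicted rating for the scarf lands closer to ben's 4 than to chloe's 1. Item-based filtering applies the same idea with the roles swapped: similarity is computed between items rather than between users.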
Query variation index
- Identification of a user’s information need
- Keyword performance
Post consumption evaluation
- Reviews
- Satisfaction
- Veracity, Volume, and Variance
Issues with Big data
- Descriptive (fail to understand why)
- Difficult to manage, store, and ensure quality
- Big data ≠ good data
- Susceptible to various biases
- Solution: complement Big data with traditional research methods
Definition Causality
- Changes in X lead to changes in Y while keeping everything else constant
- An explanation; a focus on the ‘why’ question
Why should we care? (Causality)
- One must explain and use the information
- Most business decisions involve counterfactual reasoning
- Increase the credibility of your argument with data and statistics
Why is drawing causal inference so hard?
- If individual i is assigned to the treatment group t, then Y_{i,c} is not observable
- If individual i is assigned to the control group c, then Y_{i,t} is not observable
- Potential outcomes: each individual has one outcome per condition; the outcome under the non-received condition is never observed
- Ideal scenario: The existence of a parallel universe with a difference only in the treatment = Goal of most causal inference methods
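The missing-outcome problem above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the effect size, the outcome distribution, and the 50/50 random assignment are invented for illustration. In a simulation we can generate both potential outcomes for every individual (the "parallel universe"), but the analyst only ever sees the one that matches the assigned group.

```python
import random
from statistics import mean

random.seed(0)
n = 10_000
true_effect = 2.0  # assumed treatment effect, chosen for illustration

observed = []  # what the analyst actually gets: (treated?, outcome)
for _ in range(n):
    y_control = random.gauss(10, 3)      # Y_{i,c}
    y_treated = y_control + true_effect  # Y_{i,t}
    treated = random.random() < 0.5      # random assignment
    # Only one of the two potential outcomes is ever recorded.
    observed.append((treated, y_treated if treated else y_control))

treated_mean = mean(y for t, y in observed if t)
control_mean = mean(y for t, y in observed if not t)
estimate = treated_mean - control_mean
print(f"true effect = {true_effect}, estimated effect = {estimate:.2f}")
```

Because assignment is random here, the two groups are comparable on average and the difference in group means comes out close to the assumed effect, even though no individual-level effect is ever observed.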
Threats to classical assumptions in regression
- Omitted variables
- Measurement Error
- Sample selection bias
- Misspecification or Wrong Functional Form
- Simultaneous causality
Omitted variables
- A determinant of the outcome variable is omitted
- The omitted variable must be: (1) a determinant of the outcome variable Y and (2) correlated with the regressor X but unobserved
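A hedged simulation of the two conditions above, with all coefficients invented for illustration: a variable Z drives both X and Y, the true effect of X on Y is set to 1.0, and leaving Z out inflates the estimated slope. The helpers `slope` and `residuals` are hypothetical names. The adjusted estimate removes Z from both X and Y before regressing, which gives the same coefficient as including Z as a control.

```python
import random
from statistics import mean

random.seed(1)
n = 20_000

def slope(x, y):
    """OLS slope of y on x (with intercept): cov(x, y) / var(x)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def residuals(y, x):
    """Residuals from a simple regression of y on x."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Assumed data-generating process (illustrative numbers only).
z = [random.gauss(0, 1) for _ in range(n)]                # omitted variable
x = [0.8 * zi + random.gauss(0, 1) for zi in z]           # Z is correlated with X
y = [1.0 * xi + 2.0 * zi + random.gauss(0, 1)             # Z also determines Y
     for xi, zi in zip(x, z)]

naive = slope(x, y)                                       # Z left out
adjusted = slope(residuals(x, z), residuals(y, z))        # Z controlled for
print(f"naive slope = {naive:.2f}, adjusted slope = {adjusted:.2f}")
```

Under these assumptions the naive slope is pulled well above the true value of 1.0, while the adjusted slope recovers it approximately. In practice the difficulty is that Z is unobserved, so this adjustment is not available.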
Measurement Error
- Administrative process
- Recollection of memory
- Ambiguous questions
- False response
Sample selection bias
When a selection process influences the availability of data and that process is related to the dependent variable (e.g., analysing only the customers with the highest spending)
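The highest-spenders example can be sketched as a simulation. Everything here is an assumption made for illustration: the variable names, the linear relation between ad exposure and spending (true slope 5), the noise level, and the cut-off of 90. The same regression is run once on all customers and once on only those whose spending exceeds the cut-off.

```python
import random
from statistics import mean

random.seed(2)
n = 20_000

def slope(x, y):
    """OLS slope of y on x (with intercept): cov(x, y) / var(x)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

# Assumed data-generating process: spending rises by 5 per unit of exposure.
exposure = [random.uniform(0, 10) for _ in range(n)]
spending = [50 + 5 * e + random.gauss(0, 20) for e in exposure]

full_slope = slope(exposure, spending)

# Selection on the dependent variable: keep only high spenders.
top = [(e, s) for e, s in zip(exposure, spending) if s > 90]
top_slope = slope([e for e, _ in top], [s for _, s in top])

print(f"all customers: {full_slope:.2f}, high spenders only: {top_slope:.2f}")
```

In the selected sample, low-exposure customers appear only if they happened to draw a large positive noise term, which flattens the estimated relationship. The slope among high spenders therefore understates the assumed true effect.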
Misspecification or Wrong Functional Form
- Polynomial term is incorrectly omitted
- Irrelevant variable included
- Transform a non-linear variable
Simultaneous causality
- Reverse causality
- Confounding variables
How to assess if a method is able to draw causality?
- Temporal sequencing
- Non-spurious relationship
- Eliminate alternate causes
Temporal sequencing
Independent variable should occur before the dependent variable
Non-spurious relationship
Effect on the dependent variable should be caused by the independent variable
Eliminate alternate causes
No other (confounding) variable
An explanation; a focus on the ‘why’ question
- Continuous model evaluation
- Improve transparency and explainability
- Increase trust and credibility
- Improve compliance to regulation
- Minimize risk of bias and discrimination
- Complement ML and AI (explainable and responsible AI)