2. Multivariate Data Analysis Flashcards
types of multivariate statistical techniques
regression analysis
factor analysis
cluster analysis
discriminant analysis
canonical correlation analysis
structural equation modeling
cluster analysis
used to group data points with similar characteristics together
= segment a population
common uses of clustering in marketing
customer segmentation
product recommendation
market analysis
= targeted campaigns
= more effective marketing efforts + increased customer satisfaction
hierarchical clustering
popular clustering algorithm that groups similar data points together in a hierarchical structure
Ward’s method
type of linkage criteria used to determine the similarities between clusters when merging them
- agglomerative (bottom-up approach)
- attempts to create more evenly sized clusters
Squared Euclidean Distance
distance metric to determine the similarity between data points or clusters
- squared in order to place greater weight on objects that are farther apart
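The two cards above (Ward's method on squared Euclidean distances) can be sketched with scipy. This is a minimal illustration on made-up toy data, not the dataset from the course; all variable names are assumptions:

```python
# Sketch: agglomerative (bottom-up) hierarchical clustering with
# Ward's linkage, which merges the pair of clusters giving the
# smallest increase in total within-cluster variance
# (i.e. it works on squared Euclidean distances).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two well-separated groups of 10 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (10, 2)),
               rng.normal(5, 0.5, (10, 2))])

# Build the dendrogram with Ward's method
Z = linkage(X, method="ward")

# Cut the dendrogram into 2 clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Cutting at `t=2` recovers the two groups; Ward's tendency toward evenly sized clusters is visible when the groups have similar counts, as here.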
factor analysis
used to explore the underlying structure of a set of variables
- by identifying groups of variables that are highly correlated and have a shared variance
= simplify data + identify key drivers of consumer behavior
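The idea of the card above, groups of correlated variables sharing variance with a few underlying drivers, can be sketched with scikit-learn's `FactorAnalysis`. The data is simulated for illustration (two latent factors driving six observed variables); nothing here comes from the course dataset:

```python
# Sketch: factor analysis recovering 2 latent drivers behind 6
# observed, correlated variables.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))                 # 2 hidden drivers
loadings = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.0],
                     [0.0, 1.0], [0.1, 0.9], [0.0, 0.8]])
# Observed variables = latent factors x loadings + small noise
X = latent @ loadings.T + 0.1 * rng.normal(size=(300, 6))

fa = FactorAnalysis(n_components=2).fit(X)
print(fa.components_.shape)   # (2, 6): estimated factor loadings
```

The fitted `components_` (loadings) show which observed variables group together, which is how factor analysis "simplifies data" into a few key drivers.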
uses of factor analysis in marketing
used to identify consumer preferences and brand perceptions
when is factor analysis valid?
when it passes both:
1. Kaiser-Meyer-Olkin (KMO) Test
2. Bartlett’s Test of Sphericity
Bartlett’s Test of Sphericity
used to determine if data is appropriate for factor analysis
- tests the null hypothesis that the variables are uncorrelated, i.e. that the correlation matrix equals the identity matrix
Bartlett’s Test of Sphericity - interpretation
p-value at 0.00 (significant)
- smaller than significance level (0.05 or 0.001)
- null hypothesis can be rejected
- items correlate w/ each other
Kaiser-Meyer-Olkin (KMO) Test
measure of sampling adequacy
- shows the extent to which underlying dimensions can be extracted (e.g. in SPSS)
- indicates how much of the variance in the variables may be attributable to underlying factors (if correlations are sufficiently high)
Kaiser-Meyer-Olkin (KMO) Test - interpretation
values range from 0 to 1
- pass if > 0.5
- should be above 0.6 for a good fit
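The KMO measure can also be computed from its definition: squared correlations relative to squared correlations plus squared partial correlations. A minimal sketch on simulated data (function name and data are illustrative assumptions):

```python
# Sketch: Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy.
# KMO = sum(r_ij^2) / (sum(r_ij^2) + sum(partial_ij^2)),
# over off-diagonal entries; partial correlations come from the
# inverse of the correlation matrix.
import numpy as np

def kmo(X):
    R = np.corrcoef(X, rowvar=False)
    inv_R = np.linalg.inv(R)
    # Partial correlations: -a_ij / sqrt(a_ii * a_jj)
    d = np.sqrt(np.outer(np.diag(inv_R), np.diag(inv_R)))
    partial = -inv_R / d
    mask = ~np.eye(R.shape[0], dtype=bool)   # off-diagonal only
    r2 = np.sum(R[mask] ** 2)
    p2 = np.sum(partial[mask] ** 2)
    return r2 / (r2 + p2)

# Toy data: 5 items driven by one shared factor
rng = np.random.default_rng(3)
base = rng.normal(size=(200, 1))
X = base + 0.3 * rng.normal(size=(200, 5))

score = kmo(X)
print(score)   # high (> 0.6) -> good fit for factor analysis
```

High shared variance pushes KMO toward 1 (pass); items with mostly unique variance push it toward 0 (fail the > 0.5 threshold).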
why did we use the Squared Euclidean distance?
data barrier: only 2 segments emerged instead of 4
- two metrics were symmetrical, so they cancelled each other out = segments merged
- squared Euclidean: squaring makes all values positive, so symmetrical values no longer nullify each other
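The card's rationale, that symmetric (opposite-sign) values cancel unless squared, can be shown with a two-line numeric example. The numbers are invented purely for illustration:

```python
# Sketch: raw differences of opposite sign cancel when summed;
# squared differences are all positive and do not.
import numpy as np

a = np.array([2.0, -2.0])    # one profile on two symmetric metrics
b = np.array([-2.0, 2.0])    # the mirror-image profile

diff = a - b                 # [4.0, -4.0]
print(diff.sum())            # 0.0  -> raw sums make them look identical
print(np.sum(diff ** 2))     # 32.0 -> squared distance separates them
```

This is why the squared Euclidean distance kept the mirror-image segments apart instead of merging them.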
ChatGPT survey results
study showed:
- very high level for WillUseUni
- generally positive attitude towards using it (except 2)
(correlation between finding it unethical and higher WillUseUni)