Unsupervised Learning Flashcards
Unsupervised methods
there is NO specific target variable
1. affinity grouping (associations, market-basket analysis): which items are commonly purchased together?
2. similarity matching: which other companies are similar to ours?
3. clustering: do my customers form natural groups, with the members of each group behaving in a similar way?
4. sentiment analysis: what is the sentiment of my users?
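Affinity grouping usually starts by counting how often items appear together in the same transaction (the "support" of an item pair). Below is a minimal Python sketch, assuming each transaction is a list of item names. The function name `pair_support` and the baskets are hypothetical toy data made up for illustration. No target variable is involved; the code only counts co-occurrences.

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets):
    """Return the fraction of baskets that contain each unordered item pair."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so ("bread", "milk") and ("milk", "bread") count as one pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items()}

# Hypothetical toy transactions, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "chips"],
    ["bread", "butter"],
]

support = pair_support(baskets)
# Print pairs from most to least frequently purchased together.
for pair, s in sorted(support.items(), key=lambda kv: -kv[1]):
    print(pair, s)
```

Full association-rule algorithms such as Apriori go further and also compute measures like confidence and lift; this sketch stops at pair support.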
Supervised methods:
there is a specific target variable
1. predictive modeling
2. causal modeling
clustering vs classification
Clustering: finding groups in data, i.e. organizing the data into groups with high similarity within each group and low similarity across groups
Classification: attempts to predict which of a small set of classes an individual belongs to
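The contrast can be sketched in Python: k-means clustering receives only unlabeled points and discovers the groups itself, while a classifier learns from points whose class labels are already known. The nearest-centroid classifier here is just one simple classifier picked for illustration, not the only option; the function names and data points are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Clustering: split unlabeled points into k groups (no target variable)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

def nearest_centroid_classifier(labeled_points):
    """Classification: learn one centroid per known class, then predict the nearest class."""
    by_class = {}
    for point, label in labeled_points:
        by_class.setdefault(label, []).append(point)
    centroids = {
        label: tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for label, pts in by_class.items()
    }

    def predict(point):
        return min(centroids, key=lambda label: math.dist(point, centroids[label]))

    return predict

# Hypothetical toy data: two well-separated groups in 2-D.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # → [3, 3]

predict = nearest_centroid_classifier(
    [((1, 1), "small"), ((1.5, 2), "small"), ((8, 8), "large"), ((9, 8), "large")]
)
print(predict((2, 2)), predict((7, 9)))  # → small large
```

Note that `kmeans` never sees the words "small" or "large": it only returns groups, and naming them is left to the analyst. The classifier, by contrast, can only predict labels it was given.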
methods to measure distance
- Euclidean distance: the straight-line (physical) distance between two data points with numeric attributes
- Manhattan distance: the sum of the absolute differences along each attribute, e.g. walking distance on a city street grid
- Jaccard distance: treats two objects as sets of characteristics. Useful for problems that involve large sets of characteristics that may not be “symmetrically” important (sharing a characteristic counts, while both objects lacking it does not), e.g. text mining
- cosine distance: compares the direction of two vectors while ignoring their magnitude; commonly encountered in text mining and recommendation engines
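- Levenshtein distance (edit distance): the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another; used in text mining. Applications: autocorrect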
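The five distances above can be sketched in plain Python as follows. These are minimal illustrative implementations; the function names are hypothetical choices for this sketch. In practice, a library such as SciPy's `scipy.spatial.distance` module provides tested versions of several of them.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two numeric vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute per-attribute differences ("city block" distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def jaccard_distance(a, b):
    # 1 - |A ∩ B| / |A ∪ B|: only characteristics that are present count,
    # so characteristics both objects lack have no effect.
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)

def cosine_distance(a, b):
    # 1 - cosine similarity: compares direction, ignores vector length.
    # Undefined (division by zero) if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

def levenshtein(s, t):
    # Minimum number of single-character insertions, deletions or substitutions,
    # computed row by row with dynamic programming.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]

print(euclidean((0, 0), (3, 4)))  # → 5.0
print(manhattan((0, 0), (3, 4)))  # → 7
print(jaccard_distance({"data", "mining", "text"}, {"text", "mining", "model"}))  # → 0.5
print(cosine_distance((1, 2), (2, 4)))  # close to 0: same direction, different length
print(levenshtein("kitten", "sitting"))  # → 3
```

The two vectors (0, 0) and (3, 4) show how Euclidean and Manhattan differ for the same pair of points: cutting straight across (5) versus walking along the grid (3 + 4 = 7).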