Lecture 9 - Unsupervised Learning II Flashcards
What is anomaly detection?
The detection of rare outliers that differ significantly from the rest of the data. Such outliers can indicate, for example, medical problems or mechanical failures
What are point anomalies?
Single data points whose values are unusually small or large compared with the rest of the data
—–/—-
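A minimal sketch of how point anomalies could be flagged, assuming a simple z-score rule. The function name `point_anomalies`, the threshold of 2.5 standard deviations, and the sample readings are illustrative assumptions, not part of the lecture.

```python
from statistics import mean, stdev


def point_anomalies(values, threshold=2.5):
    """Return (index, value) pairs whose distance from the mean,
    measured in sample standard deviations, exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing can stand out.
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings: flat around 10 with a single spike.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.7, 10.3, 10.0, 9.9]
print(point_anomalies(readings))
```

Note that a large spike also inflates the mean and standard deviation it is compared against, so on short series a lower threshold or a robust statistic such as the median may work better.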
What are contextual anomalies?
Sequences that are unexpected given the history of the series
/\/\/\/\___/\/\
What are collective anomalies?
Neither single points nor individual patterns look strange on their own, but viewed globally something looks off
For example, all three lines drop at the same time
What are concept drifts?
A slow and steady drift to some new state
For example, values that slowly climb upwards
What is change point detection?
The detection of an abrupt step where a shift occurs, for example a line that is first high and then suddenly drops
———-_________
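One simple way to locate a single step like the one sketched above is to try every split position and keep the one where both segments are best described by their own mean. This is only a sketch under that assumption; `best_split`, `min_size`, and the example series are hypothetical names and data.

```python
from statistics import mean


def best_split(values, min_size=2):
    """Return the index k at which splitting values into values[:k] and
    values[k:] gives the smallest total squared deviation from each
    segment's own mean. Needs at least 2 * min_size values."""

    def sse(segment):
        m = mean(segment)
        return sum((x - m) ** 2 for x in segment)

    candidates = range(min_size, len(values) - min_size + 1)
    return min(candidates, key=lambda k: sse(values[:k]) + sse(values[k:]))


# Hypothetical series: first a high line, then a sudden drop.
series = [5.0, 5.1, 4.9, 5.0, 5.2, 1.0, 1.1, 0.9, 1.0, 1.2]
print(best_split(series))  # index where the low segment starts
```

The function always returns some split, even for a series without any shift, so in practice the size of the mean difference would still need to be checked.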
What is supervised anomaly detection?
Uses labeled data for both normal and abnormal cases. Unbalanced data (anomalies are rare) makes this a difficult approach
What is semi-supervised anomaly detection?
Based on a model of normal behaviour, which is used to estimate the probability that a data point lies within the normal range
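As an illustration, the model of normal behaviour could be a single Gaussian fitted only to examples known to be normal. This is an assumption made for the sketch; the lecture does not name a specific model. The helper names, the cut-off `alpha`, and the temperature values are hypothetical.

```python
from statistics import NormalDist


def fit_normal_model(normal_samples):
    """Fit a Gaussian to samples that are known to be normal."""
    return NormalDist.from_samples(normal_samples)


def tail_probability(model, x):
    """Two-sided probability of a value at least as far from the mean as x."""
    distance = abs(x - model.mean)
    return 2.0 * (1.0 - model.cdf(model.mean + distance))


def is_anomaly(model, x, alpha=0.01):
    """Flag x when it is very unlikely under the model of normal behaviour."""
    return tail_probability(model, x) < alpha


# Hypothetical body temperatures of healthy patients (normal cases only).
normal_temperatures = [36.5, 36.7, 36.6, 36.8, 36.4, 36.6, 36.7, 36.5, 36.6, 36.9]
model = fit_normal_model(normal_temperatures)
print(is_anomaly(model, 36.6), is_anomaly(model, 39.5))
```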
What is unsupervised anomaly detection?
The most widely used approach; it requires no labeled data
What is deviation analysis?
A form of subgroup discovery. Usually a target property is given, and the goal is, for example, to find statistically interesting subgroups of the population
What are the ingredients of deviation analysis?
- A target measure and a verification test serving as a filter for patterns
- A quality measure to rank subgroups
- A search method that enumerates candidate subgroups systematically
How much must p deviate from p0 to be significant?
- z-score
- statistical tests such as the chi-squared (χ²) test
- weighted relative accuracy
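The sketch below shows two of these measures in code. It assumes that p is the share of the target property inside a subgroup of size n and that p0 is the share in the whole population of size N. The cards do not spell out these definitions, so treat them, the function names, and the numbers as assumptions.

```python
import math


def z_score(p, p0, n):
    """z-score of the subgroup share p against the population share p0
    for a subgroup of n members (binomial standard error under p0)."""
    return (p - p0) / math.sqrt(p0 * (1.0 - p0) / n)


def wracc(p, p0, n, N):
    """Weighted relative accuracy: subgroup coverage n/N times the deviation p - p0."""
    return (n / N) * (p - p0)


# Hypothetical numbers: 20% of 1000 people have the target property,
# but 35% of a subgroup of 100 people do.
print(z_score(0.35, 0.20, 100))
print(wracc(0.35, 0.20, 100, 1000))
```

The z-score answers whether the deviation is significant, while weighted relative accuracy also rewards larger subgroups, which makes it usable as a quality measure for ranking.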
What is Association rule mining?
Originally designed for market basket analysis, it aims at finding patterns in the shopping behaviour of customers.
It finds sets of products that are frequently bought together (if a customer buys bread and wine, they probably also buy cheese)
If the rule A -> B holds, does that mean B -> A also holds?
No. For example, sausage -> mustard holds, but not necessarily mustard -> sausage
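The asymmetry can be checked on a toy data set by measuring, for each direction, the fraction of baskets containing the left-hand item that also contain the right-hand item (commonly called the confidence of a rule, a term these cards do not introduce). The function name `rule_strength` and the baskets are made up for illustration.

```python
def rule_strength(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also
    contain the consequent."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    return sum(consequent in t for t in with_antecedent) / len(with_antecedent)


# Hypothetical baskets: mustard is bought with many things, sausage only with mustard.
baskets = [
    {"sausage", "mustard"},
    {"sausage", "mustard", "bread"},
    {"mustard", "pretzel"},
    {"mustard", "bread"},
    {"mustard"},
]
print(rule_strength(baskets, "sausage", "mustard"))  # 1.0
print(rule_strength(baskets, "mustard", "sausage"))  # 0.4
```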
What is support of an item set?
Proportion of transactions that contain the item set
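A minimal sketch of this definition, assuming transactions are stored as Python sets. The function name `support` and the example baskets are hypothetical.

```python
def support(transactions, itemset):
    """Proportion of transactions that contain every item of the item set."""
    if not transactions:
        return 0.0
    wanted = frozenset(itemset)
    # wanted <= t is True when every wanted item is in transaction t.
    return sum(wanted <= t for t in transactions) / len(transactions)


# Hypothetical shopping baskets.
baskets = [
    {"bread", "wine", "cheese"},
    {"bread", "wine"},
    {"bread", "cheese"},
    {"wine"},
]
print(support(baskets, {"bread", "wine"}))            # 0.5
print(support(baskets, {"bread", "wine", "cheese"}))  # 0.25
```

Adding an item to an item set can never increase its support, because every basket that contains the larger set also contains the smaller one.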