Lecture 9 - Unsupervised Learning II Flashcards
What is anomaly detection?
The detection of rare outliers that differ significantly from the rest of the data. Such outliers can indicate, for example, medical problems or mechanical failures
What are point anomalies?
Single data points whose values are unusually small or large compared with the rest of the data
—–/—-
What are contextual anomalies?
Sequences that are unexpected given the history of the data
/\/\/\/\___/\/\
What are collective anomalies?
Neither single points nor individual patterns look strange, but in a global sense something looks off
For example, all 3 lines drop at the same time
What are concept drifts?
A slow and steady drift to some new state
For example, values slowly climbing upwards
What is change point detection?
Detecting a sudden step where a shift occurs, for example a line that stays high and then suddenly drops
———-_________
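A minimal sketch of one simple way to locate a single change point, assuming the series has exactly one step between two roughly constant levels. It tries every split position and keeps the one with the smallest total squared error around the two segment means. The function names are hypothetical and this least-squares approach is an illustration, not necessarily the method from the lecture.

```python
def sse(values):
    """Sum of squared deviations from the mean of `values`."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def single_change_point(series):
    """Return the index where the second segment starts.

    Assumption: exactly one step change. Every split is scored by the
    squared error of both segments around their own means.
    """
    best_index, best_cost = None, float("inf")
    for i in range(1, len(series)):
        cost = sse(series[:i]) + sse(series[i:])
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


# A high line followed by a sudden drop, as in the sketch above.
series = [10, 10, 10, 10, 10, 2, 2, 2, 2]
change_index = single_change_point(series)
```

Here `change_index` points at the first value after the drop.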
What is supervised anomaly detection?
Labeled data is available for both normal and abnormal cases. Unbalanced data makes this approach difficult
What is semi-supervised anomaly detection?
Based on a model of normal behaviour, which gives the probability that a data point lies within the normal range
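A minimal sketch of the idea, assuming normal behaviour can be summarised by a mean and a standard deviation estimated from data known to be normal. The threshold of 3 standard deviations and the function names are assumptions chosen for illustration.

```python
import statistics


def fit_normal_model(normal_values):
    """Estimate mean and standard deviation from data assumed to be normal."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def is_anomaly(x, mean, std, threshold=3.0):
    """Flag x if it lies more than `threshold` standard deviations from the mean.

    The threshold is an assumption, not a value given in the lecture.
    """
    return abs(x - mean) / std > threshold


# Hypothetical sensor readings recorded during normal operation.
normal_readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.1, 19.7]
mean, std = fit_normal_model(normal_readings)

# Score new points against the model of normal behaviour.
flags = [is_anomaly(x, mean, std) for x in [20.0, 25.0]]
```

Only normal data is needed to fit the model; abnormal examples are never required.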
What is unsupervised anomaly detection?
Most widely used approach, no labeled data
What is deviation analysis?
A form of subgroup discovery. Usually a target property is given, and the goal is, for example, to find statistically interesting subgroups of the population
What are the ingredients of deviation analysis?
- A target measure and a verification test serving as a filter for patterns
- A quality measure to rank subgroups
- A search method that enumerates candidate subgroups systematically
How much must p deviate from p0 to be significant?
- z-score
- tests like chi2
- weighted relative accuracy
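A minimal sketch of two of these measures, assuming p is the share of the target property inside a subgroup of size n, p0 is the share in the whole population of size N, and the z-score uses the usual one-sample proportion test. These interpretations of p and p0, and the function names, are assumptions.

```python
import math


def proportion_z_score(p, p0, n):
    """z-score of a subgroup share p against the population share p0.

    Assumes the standard error of a proportion, sqrt(p0 * (1 - p0) / n).
    """
    return (p - p0) / math.sqrt(p0 * (1 - p0) / n)


def weighted_relative_accuracy(p, p0, n, N):
    """Relative subgroup size times the gain in target share."""
    return (n / N) * (p - p0)


# Hypothetical numbers: 30% target share in a subgroup of 100 cases,
# 20% target share in a population of 1000 cases.
z = proportion_z_score(p=0.3, p0=0.2, n=100)
wracc = weighted_relative_accuracy(p=0.3, p0=0.2, n=100, N=1000)
```

A larger absolute z-score means the deviation is less likely to be due to chance, while the weighted relative accuracy also rewards larger subgroups.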
What is association rule mining?
Originally designed for market basket analysis; it aims at finding patterns in the shopping behaviour of customers.
Find sets of products that are frequently bought together (if a customer buys bread and wine, then they probably also buy cheese)
If the rule A -> B holds (customers who buy A also buy B), does B -> A hold as well?
No. For example, sausage -> mustard but not necessarily mustard -> sausage
What is the support of an item set?
The proportion of transactions that contain the item set
What is the support of an association rule X -> Y?
The proportion of transactions in which X and Y are bought together
What is the confidence of an association rule X -> Y?
The percentage of all transactions satisfying X that also satisfy Y (an estimate of the probability that Y is bought given X)
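A minimal sketch of both measures, assuming each transaction is represented as a Python set of item names. The transaction data and function names are made up for illustration.

```python
def support(itemset, transactions):
    """Proportion of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Support of X and Y together, divided by the support of X."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Hypothetical shopping baskets.
transactions = [
    {"bread", "wine", "cheese"},
    {"bread", "wine"},
    {"bread", "cheese"},
    {"wine", "cheese"},
    {"bread", "wine", "cheese"},
]

supp_bread_wine = support({"bread", "wine"}, transactions)           # 3 of 5 baskets
conf_xy = confidence({"bread", "wine"}, {"cheese"}, transactions)    # about 0.67
conf_yx = confidence({"cheese"}, {"bread", "wine"}, transactions)    # about 0.5
```

The two confidence values differ, which illustrates that a rule X -> Y does not imply Y -> X.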
How can the item set tree be pruned?
- Structural pruning: make sure that there is only one counter for each possible item set (so no ab and ba)
- Size based pruning: prune the tree if a certain depth is reached
- Support based pruning: no superset of an infrequent item set can be frequent, so if no one buys mustard, then no one buys mustard and milk either
What is Apriori?
Breadth-first search for frequent item sets
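A minimal sketch of the level-wise (breadth-first) idea, assuming transactions are sets of items and an item set counts as frequent when its support is at least `min_support`. It is a simplified illustration, not an optimised or complete Apriori implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return a dict mapping each frequent item set to its support."""
    n = len(transactions)

    def supp(items):
        return sum(items <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = sorted({i for t in transactions for i in t})
    level = [s for s in (frozenset([i]) for i in items) if supp(s) >= min_support]

    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = supp(s)
        level_set = set(level)
        # Join frequent sets of size k into candidates of size k + 1.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Support based pruning: every subset of size k must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level_set for sub in combinations(c, k))
        }
        level = [c for c in candidates if supp(c) >= min_support]
        k += 1
    return frequent


# Hypothetical transactions over the items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = apriori(transactions, min_support=0.5)
```

All item sets of size k are processed before any item set of size k + 1, which is what makes the search breadth-first.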
What is Eclat?
Depth-first search for frequent item sets
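A minimal sketch of the depth-first idea, assuming a vertical data layout in which every item maps to the set of transaction ids that contain it, and a minimum support given as an absolute count. It is a simplified illustration, not a complete Eclat implementation.

```python
def eclat(transactions, min_count):
    """Return a dict mapping each frequent item set to its absolute support."""
    # Vertical representation: item -> ids of the transactions containing it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def recurse(prefix, candidates):
        # `candidates` holds (item, tidset) pairs that extend `prefix`.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect tid sets to get the support of longer item sets.
            extensions = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_count:
                    extensions.append((other, shared))
            # Go deeper before moving on to the next sibling.
            recurse(itemset, extensions)

    start = sorted(
        ((i, t) for i, t in tidsets.items() if len(t) >= min_count),
        key=lambda pair: pair[0],
    )
    recurse(frozenset(), start)
    return frequent


# Hypothetical transactions over the items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = eclat(transactions, min_count=3)
```

Each branch is followed to its longest frequent item set before the next branch is started, and each candidate is listed only after the current item, so no item set is counted twice.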
What is a free item set?
Any frequent item set, that is, any item set whose support is higher than the minimal support
What is a closed item set?
A frequent item set is closed if no superset has the same support.
For example, c has 70% support and its superset cd has only 40% support, so cd does not stop c from being closed
What is a maximal item set?
A frequent item set is maximal if no superset is frequent, meaning that it cannot be extended to a larger frequent item set
For example, a,d can be extended to a,d,e; if a,d,e cannot be extended to any larger frequent item set, then a,d,e is a maximal item set
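A minimal sketch that labels closed and maximal item sets, assuming the frequent item sets and their supports are already known and stored in a dictionary. The support values below are illustrative; the 40% support of d is an assumption added to complete the example.

```python
def closed_and_maximal(frequent):
    """Split frequent item sets into closed and maximal ones.

    `frequent` maps frozenset -> support. Only frequent supersets need
    checking, because an infrequent superset always has lower support.
    """
    closed, maximal = set(), set()
    for itemset, supp in frequent.items():
        supersets = [other for other in frequent if itemset < other]
        if not supersets:
            maximal.add(itemset)   # no frequent superset exists
        if all(frequent[other] < supp for other in supersets):
            closed.add(itemset)    # no superset has the same support
    return closed, maximal


# Hypothetical supports: c 70%, d 40%, cd 40%.
frequent = {
    frozenset({"c"}): 0.7,
    frozenset({"d"}): 0.4,
    frozenset({"c", "d"}): 0.4,
}
closed, maximal = closed_and_maximal(frequent)
```

In this example c and cd are closed, d is not closed because cd has the same support, and only cd is maximal. Every maximal item set is also closed.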
How is the confidence of an association rule calculated?
Example: a,c,e are bought together 30% of the time. a,e are bought together 40% of the time.
conf(a,e -> c) = 30%/40% = 75%. Given that a and e are bought, there is a 75% confidence that c is bought as well
What should be taken into account before a model is applied in practice?
Being aware that predictions made by the model might change people’s behaviour
Why should you monitor the system during operation?
To detect a drop in performance. The world might change over time, and so might the objectives and assumptions
What are the 5 types of anomalies?
- Single point anomaly
- Collective anomaly
- Contextual anomaly
- Concept drift
- Change point detection