Lecture 9 - Unsupervised Learning II Flashcards

1
Q

What is anomaly detection?

A

The detection of rare outliers that differ significantly from the rest of the data. Such outliers can indicate, for example, medical problems or mechanical failures.

2
Q

What are point anomalies?

A

Single data points whose values are unusually small or large compared with the rest of the data

—–/—-
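One simple way to flag point anomalies is a z-score rule. The sketch below is an illustration only: the function name `point_anomalies`, the threshold of 2.5 and the sample readings are assumptions, not part of the lecture.

```python
import statistics

def point_anomalies(values, threshold=2.5):
    """Return indices of values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]

# Hypothetical sensor readings with one spike at index 6
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 25.0, 10.1, 9.9, 10.0]
print(point_anomalies(readings))  # [6]
```

Note that a single extreme value also inflates the mean and standard deviation, so on short series a lower threshold (or a median-based rule) may be needed.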

3
Q

What are contextual anomalies?

A

Sequences that are unexpected given the history of the series

/\/\/\/\___/\/\

4
Q

What are collective anomalies?

A

Neither single points nor individual patterns look strange, but viewed globally something looks off

For example, all three series drop at the same time

5
Q

What are concept drifts?

A

A slow and steady drift to some new state

For example, a series that slowly climbs upwards

6
Q

What is change point detection?

A

Detecting an abrupt step where the level of the series shifts, for example a high, flat line followed by a sudden drop

———-_________
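A minimal sketch of finding a single change point, assuming the series consists of two roughly constant levels. The split is chosen so that the squared deviation from each segment's mean is smallest. The function name `best_change_point` and the sample series are illustrative assumptions.

```python
def best_change_point(values):
    """Return the index k that best splits values into two constant-level segments.

    The chosen k minimises the summed squared deviation of each segment
    from its own mean; values[k] is the first point of the second segment.
    """
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)

    candidates = range(1, len(values))  # both segments must be non-empty
    return min(candidates, key=lambda k: sse(values[:k]) + sse(values[k:]))

# Hypothetical series: a high level followed by a sudden drop
series = [5.0, 5.1, 4.9, 5.0, 5.2, 1.0, 1.1, 0.9, 1.0]
print(best_change_point(series))  # 5
```

This always returns some split, so in practice one would also check that the difference between the two segment means is large enough to matter.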

7
Q

What is supervised anomaly detection?

A

Uses labeled data for both normal and abnormal cases. Because anomalies are rare, the data is unbalanced, which makes this a difficult approach

8
Q

What is semi-supervised anomaly detection?

A

Based on a model of normal behaviour, which is used to estimate the probability that a data point lies within the normal range
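As a rough illustration, the model of normal behaviour can be as simple as a Gaussian fitted to known-normal measurements. This is a sketch under that assumption; the function names, the significance level `alpha` and the sample values are hypothetical.

```python
from statistics import NormalDist

def fit_normal_model(normal_values):
    """Fit a Gaussian to examples of normal behaviour (needs at least two values)."""
    return NormalDist.from_samples(normal_values)

def tail_probability(model, x):
    """Two-sided probability of a value at least as far from the mean as x."""
    z = abs(x - model.mean) / model.stdev  # assumes model.stdev > 0
    return 2 * (1 - NormalDist().cdf(z))

def is_anomaly(model, x, alpha=0.01):
    """Flag x when it is very unlikely under the model of normal behaviour."""
    return tail_probability(model, x) < alpha

# Hypothetical body-temperature readings from healthy patients
model = fit_normal_model([36.5, 36.7, 36.6, 36.8, 36.4, 36.6, 36.7, 36.5])
print(is_anomaly(model, 38.5), is_anomaly(model, 36.7))
```

Only normal examples are needed for fitting, which is what distinguishes this from the supervised setting.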

9
Q

What is unsupervised anomaly detection?

A

The most widely used approach; it requires no labeled data

10
Q

What is deviation analysis?

A

A form of subgroup discovery. Usually a target property is given, and the goal is to find statistically interesting subgroups of the population with respect to that property

11
Q

What are the ingredients of deviation analysis?

A
  • A target measure and a verification test serving as a filter for patterns
  • A quality measure to rank subgroups
  • A search method that enumerates candidate subgroups systematically
12
Q

How much must p deviate from p0 to be significant?

A
  • z-score
  • statistical tests such as the chi-squared (χ²) test
  • weighted relative accuracy
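
Two of these measures can be sketched as follows, assuming p is the share of the target property in a subgroup of size n and p0 is the share in the whole population of size N. The formulas are common textbook forms and may differ in detail from the lecture's; the function names and numbers are illustrative.

```python
import math

def z_score(p, p0, n):
    """Standardised deviation of the subgroup share p from the population share p0."""
    return (p - p0) / math.sqrt(p0 * (1 - p0) / n)

def wracc(p, p0, n, N):
    """Weighted relative accuracy: subgroup coverage times the deviation p - p0."""
    return (n / N) * (p - p0)

# Hypothetical numbers: 20% of 1000 people have the target property,
# but 35% of a 100-person subgroup do.
print(z_score(0.35, 0.2, 100))      # about 3.75
print(wracc(0.35, 0.2, 100, 1000))  # about 0.015
```

The z-score answers whether the deviation is significant, while weighted relative accuracy trades off the size of the deviation against the size of the subgroup, which makes it usable for ranking.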
13
Q

What is Association rule mining?

A

Originally designed for market basket analysis, it aims at finding patterns in the shopping behaviour of customers.

The goal is to find sets of products that are frequently bought together (if a customer buys bread and wine, they probably buy cheese too)

14
Q

If the rule A -> B holds, does it mean that B -> A holds too?

A

No. For example, sausage -> mustard may hold, but not necessarily mustard -> sausage

15
Q

What is the support of an item set?

A

Proportion of transactions that contain the item set

16
Q

What is the support of an association rule X -> Y?

A

The proportion of transactions that contain both X and Y, i.e. how often X and Y are bought together

17
Q

What is the confidence of an association rule X -> Y?

A

The percentage of all transactions satisfying X that also satisfy Y, i.e. an estimate of the probability that Y is bought given that X is bought
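Support and confidence can be computed directly from a list of transactions. The following is a minimal sketch; the function names and the toy baskets are assumptions made for illustration.

```python
def support(transactions, items):
    """Proportion of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    """Confidence of the rule x -> y: support(x and y together) / support(x).

    Assumes x occurs in at least one transaction.
    """
    return support(transactions, set(x) | set(y)) / support(transactions, x)

# Hypothetical shopping baskets
transactions = [
    {"bread", "wine", "cheese"},
    {"bread", "wine"},
    {"bread", "cheese"},
    {"wine", "cheese"},
    {"bread", "wine", "cheese"},
]

print(support(transactions, {"bread", "wine"}))                 # 0.6
print(confidence(transactions, {"bread", "wine"}, {"cheese"}))  # about 0.67
print(confidence(transactions, {"cheese"}, {"bread", "wine"}))  # about 0.5
```

The last two lines show that a rule and its reverse generally have different confidence, even though both have the same support.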

18
Q

How can the item set tree be pruned?

A
  • Structural pruning: make sure that there is only one counter for each possible item set (so ab and ba are not counted separately)
  • Size-based pruning: stop expanding the tree once a certain depth is reached
  • Support-based pruning: no superset of an infrequent item set can be frequent, so if mustard is not bought often enough, neither is mustard together with milk
19
Q

What is Apriori?

A

Breadth-first search for frequent item sets
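A compact sketch of the level-wise idea behind Apriori, assuming transactions are given as Python sets and support is measured as a proportion. It is a simplified illustration, not an optimised or reference implementation; the function name and toy data are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise (breadth-first) search; returns {frozenset: support}."""
    n = len(transactions)

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: all single items
    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while level:
        # Count the candidates of size k and keep the frequent ones
        current = {}
        for candidate in level:
            s = supp(candidate)
            if s >= min_support:
                current[candidate] = s
        frequent.update(current)
        # Join frequent k-sets into candidates of size k + 1
        k += 1
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Support-based pruning: every (k-1)-subset must itself be frequent
        level = {c for c in joined
                 if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
    return frequent

# Hypothetical shopping baskets
baskets = [
    {"bread", "wine", "cheese"},
    {"bread", "wine"},
    {"bread", "cheese"},
    {"wine", "cheese"},
    {"bread", "wine", "cheese"},
]
result = apriori(baskets, min_support=0.5)
print(sorted(sorted(s) for s in result))
```

With a minimum support of 0.5, all single items and all pairs are frequent here, while the full triple (support 0.4) is counted once and then discarded.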

20
Q

What is Eclat?

A

Depth-first search for frequent item sets
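A minimal sketch of the depth-first idea behind Eclat, assuming each item is represented by the set of transaction ids that contain it, so that the support of a larger item set is the size of an intersection. The function name, the absolute threshold `min_count` and the toy data are illustrative assumptions.

```python
def eclat(transactions, min_count):
    """Depth-first search using transaction-id sets; returns {tuple: count}."""
    # Map each item to the ids of the transactions that contain it
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, ids of transactions containing prefix + item)
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_count:
                continue  # support-based pruning: do not go deeper
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # Combine only with later items, so "ab" is visited but never "ba"
            deeper = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]]
            extend(itemset, deeper)

    extend((), sorted(tidsets.items(), key=lambda pair: pair[0]))
    return frequent

# Hypothetical shopping baskets
baskets = [
    {"bread", "wine", "cheese"},
    {"bread", "wine"},
    {"bread", "cheese"},
    {"wine", "cheese"},
    {"bread", "wine", "cheese"},
]
print(eclat(baskets, min_count=3))
```

The search follows one branch (for example everything starting with bread) to its end before moving to the next item, in contrast to the level-by-level order of a breadth-first search.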

21
Q

What is a free item set?

A

Any frequent item set, that is, any item set whose support is higher than the minimal support

22
Q

What is a closed item set?

A

A frequent item set is closed if no superset has the same support.

For example, c has 70% support, and its superset cd exists but has only 40% support

23
Q

What is a maximal item set?

A

A frequent item set is maximal if no superset is frequent, meaning that it cannot be extended into a larger frequent item set

For example, a,d can be extended to a,d,e, but if a,d,e cannot be extended into any larger frequent item set, then a,d,e is a maximal item set
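Given all frequent item sets with their supports, closed and maximal item sets can be picked out with a direct check of the two definitions. This is a sketch; the function name is an assumption, and the supports of d and cd in the example are hypothetical values chosen to extend the c / cd example from the closed item set card.

```python
def closed_and_maximal(frequent):
    """Split out closed and maximal item sets.

    `frequent` maps frozenset -> support and must contain ALL frequent item sets.
    """
    closed, maximal = set(), set()
    for itemset, supp in frequent.items():
        supersets = [s for s in frequent if itemset < s]  # proper supersets
        if not supersets:
            maximal.add(itemset)  # no frequent superset at all
        if all(frequent[s] < supp for s in supersets):
            closed.add(itemset)   # no superset with the same support
    return closed, maximal

# Hypothetical supports: c = 70%, d = 40%, cd = 40%
frequent = {
    frozenset("c"): 0.7,
    frozenset("d"): 0.4,
    frozenset("cd"): 0.4,
}
closed, maximal = closed_and_maximal(frequent)
print(sorted("".join(sorted(s)) for s in closed))   # ['c', 'cd']
print(sorted("".join(sorted(s)) for s in maximal))  # ['cd']
```

Here d is not closed because cd has the same support, and every maximal item set is also closed.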

24
Q

How do you calculate the confidence of an association rule?

A

Example: a, c and e are bought together in 30% of transactions; a and e are bought together in 40% of transactions.

conf(a,e -> c) = 30% / 40% = 75%. Given that a and e are bought, there is an estimated 75% chance that c is bought as well

25
Q

What should be taken into account before a model is applied in practice?

A

Be aware that what the model predicts might itself change people’s behaviour

26
Q

Why should you monitor the system during operation?

A

To detect drops in performance: the world might change over time, and so might the objectives and assumptions

27
Q

What are the 5 anomalies?

A
  1. Single point anomaly
  2. Collective anomaly
  3. Contextual anomaly
  4. Concept drift
  5. Change point detection