outliers and text mining Flashcards

1
Q

what are outliers

A

an outlier is a data object that deviates significantly from the normal objects, as if it had been generated by a different mechanism

2
Q

what are the types of outliers

A

Global outlier: a point that deviates significantly from the rest of the data (e.g. a very high or very low speed)

Contextual outlier: a point that deviates only with respect to a selected context (e.g. is 30 °C an outlier? is 40 km/h on the highway an outlier? it depends on the season or the road)

Collective outlier: a subset of objects that collectively deviates from the rest of the data, even if the individual objects are not outliers on their own
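
A minimal sketch of the difference between a global and a contextual outlier. It assumes a simple z-score test with a hypothetical threshold of 2 standard deviations; the readings, the season labels and the helper name `z_outliers` are made up for illustration.

```python
from statistics import mean, pstdev

def z_outliers(values, threshold=2.0):
    """Return values whose |z-score| exceeds threshold (an assumed test)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical temperature readings in degrees Celsius, tagged with a context.
readings = [("winter", 1), ("winter", 3), ("winter", 2), ("winter", 0),
            ("winter", 2), ("winter", 1), ("winter", 30),
            ("summer", 29), ("summer", 31), ("summer", 30), ("summer", 28),
            ("summer", 32), ("summer", 30), ("summer", 31)]

# Global view: test all readings together, ignoring the season.
global_hits = z_outliers([t for _, t in readings])

# Contextual view: test each season separately.
contextual_hits = {
    season: z_outliers([t for s, t in readings if s == season])
    for season in ("winter", "summer")
}
```

With this data, 30 °C is not flagged globally but is flagged inside the winter context, which is the point of the contextual definition.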

3
Q

what are the challenges that might be faced in outlier detection

A

modelling normal objects and outliers properly
application-specific outlier detection
handling noise in outlier detection
understandability of the results

4
Q

explain briefly the distance-based outlier detection

A

given a distance threshold r and a fraction threshold pi, an object o is an outlier if the fraction of objects within distance r of o is at most pi

5
Q

explain briefly the distance-based outlier detection

A

given a distance threshold r and a fraction threshold pi, an object o is an outlier if the fraction of objects within distance r of o is at most pi. Equivalently, we can check the distance between o and its k-th nearest neighbour, where k = ceil(pi*n): o is an outlier if that distance is greater than r.
The downside of the straightforward nested-loop algorithm is that its worst-case complexity is O(n^2)
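
A minimal sketch of the nested-loop check, assuming the k-th nearest neighbour form of the definition with k = ceil(pi*n) and Euclidean distance. The function name `db_outliers` and the sample points are hypothetical.

```python
from math import ceil, dist

def db_outliers(points, r, pi):
    """Flag o as an outlier if fewer than k = ceil(pi * n) other
    objects lie within distance r of o (nested loop, O(n^2) worst case)."""
    n = len(points)
    k = ceil(pi * n)
    outliers = []
    for i, o in enumerate(points):
        neighbours = 0
        for j, other in enumerate(points):
            if i != j and dist(o, other) <= r:
                neighbours += 1
                if neighbours >= k:
                    break  # early exit: o already has k neighbours within r
        if neighbours < k:
            outliers.append(o)
    return outliers

# Hypothetical data: a tight group of four points and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
flagged = db_outliers(points, r=2.0, pi=0.5)
```

The early exit in the inner loop helps when most objects are normal, but the worst case remains quadratic.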

6
Q

explain briefly grid-based outlier detection

A

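the idea is to partition the data space into a grid in which each cell is a hypercube with a diagonal length of r/2, so that whole cells can be pruned without checking every object.
for a cell C, let a be the number of objects in C, b1 the number in its level-1 cells and b2 the number in its level-2 cells. then we have two pruning rules:
level-1 pruning: all objects in the level-1 cells are definitely within distance r of the objects in C, so if a + b1 > ceil(pi*n), no object in C is an outlier
level-2 pruning: if a + b1 + b2 < ceil(pi*n) + 1, all objects in C are outliers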
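
A small sketch of how the two pruning rules decide a whole cell from counts alone. The level-2 threshold follows the card; the level-1 threshold `a + b1 > ceil(pi*n)` is the form usually given in textbook treatments and is an assumption here. The function name `prune_cell` and the return strings are hypothetical.

```python
from math import ceil

def prune_cell(a, b1, b2, pi, n):
    """Decide the status of cell C from object counts.

    a  : objects in C
    b1 : objects in the level-1 cells of C
    b2 : objects in the level-2 cells of C
    """
    k = ceil(pi * n)
    if a + b1 > k:
        # Level-1 rule: every object in C already has enough close neighbours.
        return "no outliers"
    if a + b1 + b2 < k + 1:
        # Level-2 rule: not enough objects nearby for anyone in C.
        return "all outliers"
    # Neither rule applies: objects in C must be checked one by one.
    return "check individually"

# Hypothetical counts with n = 40 objects and pi = 0.25, so k = 10.
dense = prune_cell(a=8, b1=5, b2=0, pi=0.25, n=40)
sparse = prune_cell(a=1, b1=2, b2=3, pi=0.25, n=40)
unclear = prune_cell(a=2, b1=4, b2=9, pi=0.25, n=40)
```

Only cells that fall through both rules need the expensive object-by-object distance check.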

7
Q

explain density-based outlier detection

A

objects are compared with their local neighbourhood: the idea is that the density around an outlier is significantly different from the density around its neighbours. for this we rely on the relative density of an object against its neighbours as the indicator of the degree to which the object is an outlier
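
A simplified sketch of a relative-density score. It assumes density is the inverse of the mean distance to the k nearest neighbours, which is a simplification and not the full local outlier factor; it also assumes all points are distinct. The helper names and sample points are hypothetical.

```python
from math import dist

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding itself."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: dist(points[i], points[j]))[:k]

def density(points, i, k):
    """Inverse of the mean distance from points[i] to its k nearest neighbours."""
    return k / sum(dist(points[i], points[j]) for j in knn(points, i, k))

def relative_density_score(points, i, k):
    """Mean density of the neighbours divided by the object's own density.
    Values well above 1 suggest the object sits in a sparser region."""
    nbrs = knn(points, i, k)
    mean_nbr_density = sum(density(points, j, k) for j in nbrs) / k
    return mean_nbr_density / density(points, i, k)

# Hypothetical data: a tight group of four points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
score_inside = relative_density_score(points, 0, k=2)
score_isolated = relative_density_score(points, 4, k=2)
```

A point inside the group scores about 1, while the isolated point scores much higher because its neighbours are far denser than it is.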

8
Q

explain clustering-based outlier detection

A

An object is an outlier if
• (1) it does not belong to any cluster,
• (2) there is a large distance between the object and its closest cluster,
• or (3) it belongs to a small or sparse cluster
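
A minimal sketch of criteria (2) and (3), assuming the cluster centroids have already been produced by some clustering algorithm. The thresholds `max_dist` and `min_size`, the function name and the data are hypothetical. Criterion (1) is not covered; it arises with algorithms such as DBSCAN that leave some points unassigned as noise.

```python
from collections import Counter
from math import dist

def cluster_outliers(points, centroids, max_dist, min_size):
    """Flag points that are far from their closest centroid (criterion 2)
    or that fall in a cluster with fewer than min_size members (criterion 3)."""
    # Assign every point to the index of its closest centroid.
    assignment = [
        min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        for p in points
    ]
    sizes = Counter(assignment)
    flagged = []
    for p, c in zip(points, assignment):
        too_far = dist(p, centroids[c]) > max_dist
        too_small = sizes[c] < min_size
        if too_far or too_small:
            flagged.append(p)
    return flagged

# Hypothetical data and centroids.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (4, 4), (9, 9)]
centroids = [(0.5, 0.5), (9, 9)]
flagged = cluster_outliers(points, centroids, max_dist=2.0, min_size=2)
```

Here (4, 4) is flagged for being far from its closest cluster and (9, 9) for being the only member of its cluster.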

9
Q

what are the strengths and weaknesses of clustering-based methods

A

strengths: they can detect outliers without requiring any labelled data and work well for many types of data. weakness: the effectiveness depends highly on the chosen clustering algorithm

10
Q

explain classification-based outlier detection

A

Idea: train a classification model that can distinguish “normal” data from outliers
• Requires many abnormal samples.
• Abnormal samples might not cluster well.
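
A toy sketch of the idea, using a nearest-centroid classifier as a stand-in for whatever classification model is actually chosen. The labels, training samples and function names are hypothetical.

```python
from math import dist

def fit_centroids(samples):
    """samples: list of (point, label). Returns {label: centroid of that class}."""
    groups = {}
    for point, label in samples:
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for label, pts in groups.items()
    }

def predict(centroids, point):
    """Return the label whose centroid is closest to point."""
    return min(centroids, key=lambda label: dist(point, centroids[label]))

# Hypothetical labelled training data.
training = [((0, 0), "normal"), ((0, 1), "normal"), ((1, 0), "normal"),
            ((1, 1), "normal"), ((8, 8), "outlier"), ((9, 9), "outlier")]
model = fit_centroids(training)
label_a = predict(model, (0.2, 0.4))
label_b = predict(model, (7, 9))
```

The sketch also illustrates both caveats: with very few abnormal samples, or abnormal samples scattered in different directions, a single "outlier" centroid describes them poorly.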
