outliers and text mining Flashcards

Question 1

Q

What are outliers?

Answer

A

Outliers are data objects that deviate significantly from the normal objects in the data.

Question 2

Q

What are the types of outliers?

Answer

A

Global outlier: a point that deviates significantly from the rest of the data (e.g. a very high or very low speed).

Contextual outlier: a point that deviates only with respect to a selected context (e.g. is 30 °C an outlier? Is 40 km/h on the highway an outlier? It depends on the season or the road).

Collective outlier: a subset of objects that deviates from the rest of the data as a group, even if the individual objects are not outliers on their own.
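
To make the contextual case concrete, here is a minimal Python sketch. The z-score test, the threshold of 2 and the sample temperatures are all assumptions chosen for illustration; they are not part of the card. Each value is compared only against values that share its context.

```python
from collections import defaultdict
from statistics import mean, pstdev

def contextual_outliers(records, threshold=2.0):
    """records: list of (context, value). Flag values far from their own context's mean."""
    by_context = defaultdict(list)
    for context, value in records:
        by_context[context].append(value)
    flagged = []
    for context, value in records:
        values = by_context[context]
        mu, sigma = mean(values), pstdev(values)
        # Assumed rule: more than `threshold` standard deviations from the context mean.
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append((context, value))
    return flagged

# Hypothetical temperature readings in degrees Celsius.
readings = [("winter", t) for t in (2, 4, 3, 5, 30, 1, 3, 4)]
readings += [("summer", t) for t in (28, 31, 29, 30, 32)]
print(contextual_outliers(readings))
```

With these made-up readings, 30 is flagged in the winter context but not in the summer one.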

Question 3

Q

What challenges might be faced in outlier detection?

Answer

A

Modelling normal objects and outliers properly
Application-specific outlier detection
Handling noise in outlier detection
Understandability of the detected outliers

Question 4

Q

Briefly explain distance-based outlier detection.

Answer

A

Given a distance threshold r and a fraction threshold π, an object is an outlier if fewer than a fraction π of the objects in the data set lie within distance r of it.

Question 5

Q

Briefly explain distance-based outlier detection.

Answer

A

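Given a distance threshold r and a fraction threshold π, an object is an outlier if fewer than a fraction π of the objects lie within distance r of it. Equivalently, we can decide whether an object o is an outlier by checking the distance between o and its k-th nearest neighbour, where k = ⌈π·n⌉ and n is the number of objects: o is an outlier if that distance is greater than r.
The downside of the straightforward nested-loop algorithm is that its worst-case complexity is O(n^2).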
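
The nested-loop check can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. It assumes Euclidean distance, that an object is not counted as its own neighbour, and that k = ⌈π·n⌉; the function name `db_outliers` is hypothetical.

```python
from math import ceil, dist

def db_outliers(points, r, pi):
    """Return the points with fewer than k = ceil(pi * n) other points within distance r."""
    n = len(points)
    k = ceil(pi * n)
    outliers = []
    for i, o in enumerate(points):
        count = 0
        for j, other in enumerate(points):
            if i != j and dist(o, other) <= r:
                count += 1
                if count >= k:      # enough neighbours found: stop scanning early
                    break
        if count < k:
            outliers.append(o)
    return outliers

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(db_outliers(points, r=2, pi=0.3))
```

The two nested loops over the data are where the quadratic cost comes from; the early `break` helps only when most objects are not outliers.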

Question 6

Q

Briefly explain grid-based outlier detection.

Answer

A

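The data space is partitioned into a grid in which each cell is a hypercube with a diagonal of length r/2, so that whole cells can be accepted or rejected without comparing every pair of objects.
For a cell C, let a be the number of objects in C, b1 the number in its level-1 cells (the immediately adjacent cells) and b2 the number in its level-2 cells (the next layers out). Every object in a level-1 cell is within r of every object in C, and every object beyond the level-2 cells is farther than r from C. Two pruning rules follow:
Level-1 pruning: if a + b1 > ⌈π·n⌉, no object in C is an outlier.
Level-2 pruning: if a + b1 + b2 < ⌈π·n⌉ + 1, every object in C is an outlier.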
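
The two pruning rules can be sketched for 2-D data as follows. This is a hedged illustration under several assumptions: Euclidean distance, a cell side of r/(2√2) so that the diagonal is r/2, level-1 cells being the 8 adjacent cells, and level-2 cells being the next two layers (which holds in 2-D). Here a, b1 and b2 are the object counts in the cell, its level-1 cells and its level-2 cells. The function name `classify_cells` is hypothetical.

```python
from collections import Counter
from math import ceil, floor, sqrt

def classify_cells(points, r, pi):
    """Label each non-empty 2-D cell 'no outliers', 'all outliers' or 'undecided'."""
    n = len(points)
    k = ceil(pi * n)
    side = r / (2 * sqrt(2))        # 2-D cell side, so the diagonal is r/2
    counts = Counter((floor(x / side), floor(y / side)) for x, y in points)

    def ring_count(cx, cy, lo, hi):
        # Objects in cells whose Chebyshev distance from (cx, cy) lies in [lo, hi].
        total = 0
        for dx in range(-hi, hi + 1):
            for dy in range(-hi, hi + 1):
                if lo <= max(abs(dx), abs(dy)) <= hi:
                    total += counts.get((cx + dx, cy + dy), 0)
        return total

    labels = {}
    for (cx, cy), a in counts.items():
        b1 = ring_count(cx, cy, 1, 1)   # level-1 cells: immediate neighbours
        b2 = ring_count(cx, cy, 2, 3)   # level-2 cells: the next two layers in 2-D
        if a + b1 > k:
            labels[(cx, cy)] = "no outliers"     # level-1 pruning rule
        elif a + b1 + b2 < k + 1:
            labels[(cx, cy)] = "all outliers"    # level-2 pruning rule
        else:
            labels[(cx, cy)] = "undecided"       # objects must be checked one by one
    return labels

points = [(0.1, 0.1), (0.2, 0.2), (0.15, 0.1), (0.1, 0.2), (50, 50)]
print(classify_cells(points, r=2, pi=0.3))
```

Only objects in "undecided" cells still need an object-by-object distance check.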

Question 7

Q

Explain density-based outlier detection.

Answer

A

Objects are compared with their local neighbourhood rather than with the whole data set. The idea is that the density around an outlier differs significantly from the density around its neighbours, so the relative density of an object against its neighbours is used as the indicator of the degree to which the object is an outlier.
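
The card does not name a specific score. As one common choice, the sketch below computes a local outlier factor (LOF) style score in plain Python. It is a simplified illustration that assumes Euclidean distance, exactly k neighbours per point (ties in distance are broken by index) and no duplicate points.

```python
from math import dist

def local_outlier_factors(points, k):
    """Return one score per point: about 1 for inliers, much larger for outliers."""
    n = len(points)
    neighbours, k_dist = [], []
    for i in range(n):
        ranked = sorted((dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbours.append([j for _, j in ranked[:k]])   # indices of the k nearest points
        k_dist.append(ranked[k - 1][0])                 # distance to the k-th nearest

    def reach_dist(i, j):
        # Reachability distance of point i with respect to point j.
        return max(k_dist[j], dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in neighbours[i]) for i in range(n)]
    # Score: average density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
for p, score in zip(points, local_outlier_factors(points, k=2)):
    print(p, round(score, 2))
```

The isolated point gets a score far above 1 because its neighbours sit in a much denser region than it does.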

Question 8

Q

Explain clustering-based outlier detection.

Answer

A

An object is an outlier if
• (1) it does not belong to any cluster,
• (2) there is a large distance between the object and its closest cluster,
• or (3) it belongs to a small or sparse cluster
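
The three rules can be sketched in Python on top of labels produced by any clustering step. The thresholds `max_dist` and `min_size`, the use of centroids to measure distance to a cluster, and the convention that `None` marks an unclustered object are assumptions made for this illustration.

```python
from math import dist

def clustering_outliers(points, labels, max_dist, min_size):
    """labels[i] is a cluster id, or None if point i was left unclustered."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab is not None:
            clusters.setdefault(lab, []).append(p)
    # Centroid of each cluster: coordinate-wise mean of its members.
    centroids = [tuple(sum(c) / len(members) for c in zip(*members))
                 for members in clusters.values()]
    flagged = []
    for p, lab in zip(points, labels):
        if lab is None:                                       # rule 1: in no cluster
            flagged.append(p)
        elif len(clusters[lab]) < min_size:                   # rule 3: small cluster
            flagged.append(p)
        elif min(dist(p, c) for c in centroids) > max_dist:   # rule 2: far from closest cluster
            flagged.append(p)
    return flagged

points = [(0, 0), (0, 1), (1, 0), (5, 5), (9, 9), (3, 0)]
labels = [0, 0, 0, None, 1, 0]   # hypothetical output of a clustering algorithm
print(clustering_outliers(points, labels, max_dist=1.5, min_size=2))
```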

Question 9

Q

What are the strengths and weaknesses of clustering-based methods?

Answer

A

Strengths: they can detect outliers without requiring any labelled data, and they work well for many types of data.
Weakness: their effectiveness depends heavily on the clustering algorithm chosen.

Question 10

Q

Explain classification-based outlier detection.

Answer

A

Idea: Train a classification model that can distinguish “normal” data from outliers.
• Requires many abnormal samples.
• Abnormal samples might not cluster well.
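
The card does not name a classifier. As an illustration only, the sketch below uses a k-nearest-neighbour majority vote over labelled samples; the training data and the labels "normal" and "outlier" are hypothetical.

```python
from collections import Counter
from math import dist

def knn_classify(train, query, k=3):
    """train: list of (point, label). Return the majority label among the k nearest samples."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical labelled samples; note that abnormal examples must be available.
train = [((0, 0), "normal"), ((0, 1), "normal"), ((1, 0), "normal"), ((1, 1), "normal"),
         ((8, 8), "outlier"), ((9, 9), "outlier"), ((8, 9), "outlier")]
print(knn_classify(train, (0.5, 0.5)))
print(knn_classify(train, (8.5, 8.5)))
```

The sketch also shows the first limitation on the card: without enough labelled abnormal samples near a query, an abnormal object is voted "normal".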

(10 cards)