Chapter 12 Flashcards
outlier detection, anaomaly detection
The process of finding data objecrts with nehaviors that ar very differne from expectation
such objects are called outliers or anomalies
normal or expected data
data objexts that are not outliers which are abnormal data
outlier
a data object tjhat deviates significantly from the normal objects as if it were genrated by a differnet mechanism
nosie data
is random error or variance in a measured variable
noise should be removed before outlier detection
three types of outliers
global, contextual and collective outlires
global outlier or poit anaomaly
eobject is Og if it siginicatly deciates fiorm the rest of the data set
issue: find an appropriate measurement of deviation
contexutal outlier or conditional outlier
object isOc if it deviates ignifcantly based on a selected context
ex. 80 degrees in urban, outlier depends on if it is winter oe summer
contextual attributes and behavioral attributes
generalization of local outlires
issue: how to define or formualre meaningful context
contexual attibutes
deines the context, time or location
behavioral attributes
chartacteristics of the object, used in outlier evaluation, ex temperature
collective outliers
a subset of data objects collectively deciare significantly form the whole data ser even if the individual data objects may not be outliers
a data set may have multiple types of outliers
one object may belong to more than one type of outlier
two outlier detection methods
based on whether user labeled examples of outliers can be obtained: supervised, semi supervised unserpervised methods
based on assumptions about normal data and outliers: statistical proximity based and cluseting based methods
supervised methods
a classigfication problem
model normal objects ad report those not matching as otliers or treat those not matching the model as normal
ChallengesL imbalanced classes, catch as many outliers as possible, recall is more important than accuracy
unserpervide methods
assume the normal objects aresomwaht clustered into multiple groups
an outlier is expected to be far way
weakneees: can not detect collective outliers effectively
semi supervised methods
regarded as appluications of semi supervised learning
use labeled exampkse and the proximate unlabled object to train a model for normal objects
those not fitting the model are outliers
statistical methods, model based methosa
assume that the normal data follow some statistical model, those that do not follow are outliers