MLSEC 3 Flashcards
Numerical Features
Mapping of events to a vector space
Normalization of numerical features
Scaling and offset often differ between features
Standard normalization
Min-max normalization
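A minimal sketch of the two normalizations named above, assuming a single feature given as a list of numbers. The function names and the sample values are hypothetical, and mapping a constant feature to zeros is a choice made here only to avoid division by zero.

```python
from statistics import mean, pstdev

def standard_normalize(values):
    """Shift to zero mean and scale to unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: assumption, map to 0
    return [(v - mu) / sigma for v in values]

def min_max_normalize(values):
    """Rescale linearly so the smallest value is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: assumption, map to 0
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical feature, e.g. packet sizes in bytes
packet_sizes = [40, 1500, 576, 64]
print(standard_normalize(packet_sizes))
print(min_max_normalize(packet_sizes))
```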
Problems with Numerical Features
Defining a good feature set is hard
Not suitable for structured data
Kernel Function
Similarity measure for objects in a domain X
Kernel Function Requirements
Symmetric
Positive semi-definite (PSD)
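Written out, the two requirements read as follows. The notation is an assumption of this note: the kernel is taken to be a real-valued function $k$ on pairs of objects from $\mathcal{X}$.

```latex
% Symmetry
k(x, y) = k(y, x) \qquad \text{for all } x, y \in \mathcal{X}

% Positive semi-definiteness
\sum_{i=1}^{n} \sum_{j=1}^{n} c_i \, c_j \, k(x_i, x_j) \;\ge\; 0
\qquad \text{for all } n \in \mathbb{N},\; x_1, \dots, x_n \in \mathcal{X},\; c_1, \dots, c_n \in \mathbb{R}
```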
Bag-of-Words Features
Characterization using non-overlapping substrings
Suitable for analysis of strings with known structure
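A minimal sketch of bag-of-words extraction, assuming the known structure is expressed as a set of delimiter characters. The delimiter set and the sample request line are hypothetical and would have to be chosen for the actual data.

```python
import re
from collections import Counter

def bag_of_words(text, delimiters=r"[\s/?&=]+"):
    """Split the string at delimiter characters and count each word."""
    words = [w for w in re.split(delimiters, text) if w]  # drop empty pieces
    return Counter(words)

# Hypothetical HTTP request line
print(bag_of_words("GET /index.php?id=1&id=2"))
```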
N-gram Features
Characterization using “snippets” of length n (n-grams)
Suitable for analysis of strings with unknown structure
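A minimal sketch of n-gram extraction: a window of length n slides over the string one character at a time, so no delimiters or other structure need to be known. The function name is hypothetical.

```python
from collections import Counter

def ngrams(text, n):
    """Count every substring of length n (overlapping windows)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

print(ngrams("abab", 2))  # the 2-grams are "ab", "ba", "ab"
```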
All-Substring Kernel
Characterization using all possible substrings
More an academic exercise than a practically useful kernel
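A naive sketch of the idea, assuming the kernel value is the inner product of substring counts. The function names are hypothetical. A string of length n has n(n+1)/2 substring occurrences, so this direct enumeration only works for short strings.

```python
from collections import Counter

def all_substrings(text):
    """Count every non-empty contiguous substring of text."""
    return Counter(
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
    )

def all_substring_kernel(x, y):
    """Inner product of the two substring-count vectors."""
    fx, fy = all_substrings(x), all_substrings(y)
    return sum(count * fy[s] for s, count in fx.items())

print(all_substring_kernel("ab", "ba"))  # shared substrings: "a" and "b"
```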
Implementation of string features
Explicit but sparse representation of feature vectors
Implementation of string kernel
Implicit definition of feature space
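The two implementation styles can be contrasted in a small sketch, here for 2-grams. All names are hypothetical. The explicit variant stores only the non-zero dimensions in a dictionary; the kernel variant returns just the similarity value and never exposes the feature space to the caller. This sketch still builds sparse vectors internally, which a real string kernel might avoid, for example by using a suffix tree.

```python
from collections import Counter

def sparse_ngram_vector(text, n=2):
    """Explicit but sparse: only n-grams that occur get an entry."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def sparse_dot(u, v):
    """Inner product that touches only dimensions present in both vectors."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(count * v[s] for s, count in u.items() if s in v)

def ngram_kernel(x, y, n=2):
    """Kernel view: the caller sees a similarity value, not feature vectors."""
    return sparse_dot(sparse_ngram_vector(x, n), sparse_ngram_vector(y, n))

print(ngram_kernel("abab", "ab"))
```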
Suffix Tree
Construction: split the string into its suffixes of length 1, 2, 3, … taken from the end
Lookup: start at the root and follow the letters of the query from the front
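The two steps can be sketched with a simplified structure. Note the swap: this is an uncompressed suffix trie built from nested dictionaries, not a true suffix tree, which would merge chains of single-child nodes into one edge. The function names are hypothetical.

```python
def build_suffix_trie(text):
    """Insert every suffix of text, shortest first, into a nested-dict trie."""
    root = {}
    for start in range(len(text) - 1, -1, -1):  # suffixes of length 1, 2, 3, ...
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})  # create the child if it is missing
    return root

def contains(trie, pattern):
    """Start at the root and follow the pattern letter by letter."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True

trie = build_suffix_trie("banana")
print(contains(trie, "nan"))  # True: "nan" is a prefix of the suffix "nana"
print(contains(trie, "nab"))  # False
```

Every substring of a string is a prefix of one of its suffixes, which is why walking from the root answers substring queries.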