Metaheuristics for Data Mining Flashcards
Data Mining Tasks
- Clustering
- Association Rules
- Classification
- Feature Selection
Clustering
Description: Put items into clusters in such a way that similarity between elements of the same cluster and dissimilarity between elements of different clusters are both maximized.
The number of clusters can be fixed or variable.
Metaheuristics:
- Ant Colony Optimization
- Genetic or evolutionary algorithms
Solution representation:
- Matrix: adjacency matrix M(Cluster, Item) = Cluster.includes(Item) ? 1 : 0.
- Integer: vector of size = number of items => | C1 | C2 | C1 | C3 | C2 | C1 |
- Real encoding: vector of size = number of items. Elements are coordinates of cluster prototypes.
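A minimal sketch of how the matrix and integer encodings relate, assuming clusters are numbered from 0 and every item belongs to exactly one cluster. The function names labels_to_matrix and matrix_to_labels are hypothetical, chosen for illustration only.

```python
def labels_to_matrix(labels, n_clusters):
    """Integer encoding -> 0/1 matrix with one row per cluster, one column per item."""
    return [[1 if label == c else 0 for label in labels] for c in range(n_clusters)]


def matrix_to_labels(matrix):
    """0/1 matrix -> integer encoding (assumes exactly one 1 per column)."""
    n_items = len(matrix[0])
    return [
        next(c for c, row in enumerate(matrix) if row[i] == 1)
        for i in range(n_items)
    ]


# The card's example | C1 | C2 | C1 | C3 | C2 | C1 | with C1, C2, C3 mapped to 0, 1, 2.
labels = [0, 1, 0, 2, 1, 0]
matrix = labels_to_matrix(labels, 3)
```

The integer vector needs one value per item, while the matrix needs (number of clusters) x (number of items) cells, which is one reason the integer form is often the more compact choice.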
Association Rules
Description: obtain significant relationships among the items contained in different transactions. Following this first application, association rules may now be defined in a more general manner.
Objective functions: confidence, support, statistical correlation, comprehensibility.
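A short sketch of the two most common objectives, using their usual definitions: support is the fraction of transactions containing an itemset, and confidence of "condition => consequent" is support(condition and consequent) / support(condition). The transaction data below is made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, condition, consequent):
    """Estimated probability of the consequent given the condition."""
    condition_support = support(transactions, condition)
    if condition_support == 0:
        return 0.0  # convention chosen here: a rule that never fires has confidence 0
    both = set(condition) | set(consequent)
    return support(transactions, both) / condition_support


# Hypothetical transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

Here the rule "bread => milk" holds in 2 of 4 transactions (support 0.5) and in 2 of the 3 transactions that contain bread (confidence 2/3).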
Metaheuristics:
- Mono-objective: genetic algorithms
- Multi-objective
- Multi-objective Evolutionary Algo
- Hybrid metaheuristics
Solution encoding:
- Binary: indicates for each feature whether it belongs to the condition part, to the consequent part, or does not belong to the rule. Size of solution = 2k bits (where k is #features)
- Integer encoding: lists features part of the rule (e.g. |condition size| condition features | prediction features |)
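A possible way to decode both encodings into a (condition, consequent) pair of feature indices. The bit-pair meaning is an assumption, since the card only gives the 2k size: here (1, 0) means condition, (0, 1) means consequent, and any other pair means the feature is unused. The function names are hypothetical.

```python
def decode_binary(bits):
    """Decode a 2k-bit rule; two bits per feature (pair meaning is an assumption)."""
    if len(bits) % 2 != 0:
        raise ValueError("expected two bits per feature")
    condition, consequent = [], []
    for feature in range(len(bits) // 2):
        pair = (bits[2 * feature], bits[2 * feature + 1])
        if pair == (1, 0):
            condition.append(feature)
        elif pair == (0, 1):
            consequent.append(feature)
        # (0, 0) and (1, 1) are treated as "feature not in the rule"
    return condition, consequent


def decode_integer(genes):
    """Decode | condition size | condition features | prediction features |."""
    size = genes[0]
    return list(genes[1:1 + size]), list(genes[1 + size:])


# Both solutions encode the same rule over k = 4 features: {0, 3} => {1}.
binary_rule = decode_binary([1, 0, 0, 1, 0, 0, 1, 0])
integer_rule = decode_integer([2, 0, 3, 1])
```

The binary form has a fixed length of 2k whatever the rule size, while the integer form grows only with the number of features actually used in the rule.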
Classification
Description: build a model that predicts the value of one variable, called the class, from the known values of other variables. Selecting the set of features that participate in the model is an optimization problem.
Standard approaches: probabilistic classification, decision trees, k-nearest neighbors, neural networks, support vector machines.
Objective functions based on TN, FP, FN, TP: Accuracy, error, precision, sensitivity, specificity.
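A minimal sketch of these objectives using the standard confusion-matrix definitions for a binary problem. The function name classification_metrics and the counts used below are illustrative, and returning 0.0 for an empty denominator is a convention chosen here.

```python
def _ratio(numerator, denominator):
    """Safe division: returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Compute the listed objectives from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": _ratio(tp + tn, total),      # correct predictions / all predictions
        "error": _ratio(fp + fn, total),         # wrong predictions / all predictions
        "precision": _ratio(tp, tp + fp),        # predicted positives that are real
        "sensitivity": _ratio(tp, tp + fn),      # real positives that are found (recall)
        "specificity": _ratio(tn, tn + fp),      # real negatives that are found
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
```

With these counts, accuracy is 0.85 and error is 0.15; the two always sum to 1, so an optimizer only needs one of them as an objective.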
There are no general ways of representing solutions.
Metaheuristics:
- Decision trees: Evolutionary Algo, Ant Colony
- Random Forest: Hybrid methods
- KNN: Genetic Algo
- Neural networks: particle swarm optimization
- SVM: Evolutionary algos
Feature Selection
Description: aims at selecting an optimum relevant set of features or attributes that are necessary for classification.
Encoding:
- string of n bits (bit i = 1 if feature i is included), bad for big data
- variable length string, with selected attributes.
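The two encodings can be sketched as follows, assuming features are numbered from 0. The function names are hypothetical.

```python
def bits_to_indices(bits):
    """n-bit string -> variable-length list of selected feature indices."""
    return [i for i, bit in enumerate(bits) if bit == 1]


def indices_to_bits(indices, n_features):
    """Variable-length list of selected features -> n-bit string."""
    selected = set(indices)
    return [1 if i in selected else 0 for i in range(n_features)]


# Five features, of which features 0 and 3 are selected.
bit_string = [1, 0, 0, 1, 0]
selected_features = bits_to_indices(bit_string)
```

The bit string always has length n, even when only a handful of features are kept, whereas the variable-length form stores just the selected indices. That difference is what makes the bit string costly when n is very large.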
Metaheuristics: Genetic algorithms, Particle Swarm Optimization; particularly for Big Data: Swarm intelligence.