Metaheuristics for Data Mining Flashcards

1
Q

Data mining Tasks

A
  • Clustering
  • Association Rules
  • Classification
  • Feature Selection
2
Q

Clustering

A

Description: Put items into clusters in such a way that similarity is maximized between elements of the same cluster and dissimilarity is maximized between elements of different clusters.

The number of clusters can be fixed or variable.

Metaheuristics:

  • Ant Colony Optimization
  • Genetic or evolutionary algorithms

Solution representation:

  • Matrix: adjacency matrix M(Cluster, Item) = 1 if the cluster includes the item, 0 otherwise.
  • Integer: vector of size = number of items; element i holds the cluster of item i => | C1 | C2 | C1 | C3 | C2 | C1 |
  • Real encoding: vector of real values holding the coordinates of the cluster prototypes (size = number of clusters × number of dimensions).
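
The three representations above can be converted into one another. The following is a minimal Python sketch, not taken from the flashcards: the function names are hypothetical, clusters are numbered from 0, and decoding a real encoding by assigning each item to its nearest prototype (squared Euclidean distance) is an assumption.

```python
from typing import List, Sequence

def labels_to_matrix(labels: List[int], n_clusters: int) -> List[List[int]]:
    """Integer encoding -> binary matrix M[cluster][item]."""
    return [[1 if lab == c else 0 for lab in labels] for c in range(n_clusters)]

def matrix_to_labels(matrix: List[List[int]]) -> List[int]:
    """Binary matrix -> integer encoding (each item belongs to exactly one cluster)."""
    n_items = len(matrix[0])
    return [next(c for c, row in enumerate(matrix) if row[i] == 1)
            for i in range(n_items)]

def assign_to_prototypes(points: Sequence[Sequence[float]],
                         prototypes: Sequence[Sequence[float]]) -> List[int]:
    """Real encoding -> integer encoding: nearest prototype wins (assumed rule)."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return [min(range(len(prototypes)), key=lambda c: sq_dist(p, prototypes[c]))
            for p in points]

# | C1 | C2 | C1 | C3 | C2 | C1 | written with 0-based cluster ids
labels = [0, 1, 0, 2, 1, 0]
matrix = labels_to_matrix(labels, n_clusters=3)
round_trip = matrix_to_labels(matrix)

# Two 2-D prototypes decoded against three points
decoded = assign_to_prototypes([(0.0, 0.0), (5.0, 5.0), (0.5, 0.0)],
                               [(0.0, 0.0), (5.0, 5.0)])
```

The integer vector is the most compact of the three when the number of clusters is fixed, while the real encoding keeps the solution size independent of the number of items.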
3
Q

Association Rules

A

Description: obtain significant relationships among the items contained in different transactions. Beyond this first, transaction-based application, association rules may now be defined in a more general manner.

Objective functions: confidence, support, statistical correlation, comprehensibility.
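
As an illustration of the first two objective functions, here is a minimal sketch using their usual definitions: support is the fraction of transactions containing an itemset, and confidence of "condition => consequent" is support(condition ∪ consequent) / support(condition). The function names and the toy transactions are made up for this example.

```python
from typing import FrozenSet, Iterable, List

def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)

def confidence(condition: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """support(condition and consequent together) / support(condition)."""
    cond_support = support(condition, transactions)
    if cond_support == 0:
        return 0.0  # rule never fires; treated as zero confidence here
    both = frozenset(condition) | frozenset(consequent)
    return support(both, transactions) / cond_support

# Toy data (hypothetical)
transactions = [frozenset(t) for t in (
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
)]

rule_support = support({"bread", "milk"}, transactions)          # 2 of 4 transactions
rule_confidence = confidence({"bread"}, {"milk"}, transactions)  # 0.5 / 0.75
```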

Metaheuristics:

  • Mono-objective: genetic algorithms
  • Multi-objective
    • Multi-objective Evolutionary Algo
    • Hybrid metaheuristics

Solution encoding:

  • Binary: indicates for each feature whether it belongs to the condition part, to the consequent part, or does not belong to the rule. Size of solution = 2k bits (where k is the number of features)
  • Integer encoding: lists the features that are part of the rule (e.g. | condition size | condition features | prediction features |)
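
Both encodings can be decoded into a (condition, consequent) pair. The sketch below is illustrative only. The cards do not say how the two bits per feature are used, so it assumes that the first bit means "feature is in the rule" and the second bit means "feature is in the consequent"; the feature names and function names are hypothetical.

```python
from typing import List, Tuple

Rule = Tuple[List[str], List[str]]  # (condition features, consequent features)

def decode_binary_rule(bits: List[int], features: List[str]) -> Rule:
    """Decode 2k bits: assumed layout is (used, in_consequent) per feature."""
    if len(bits) != 2 * len(features):
        raise ValueError("expected 2 bits per feature")
    condition, consequent = [], []
    for i, name in enumerate(features):
        used, in_consequent = bits[2 * i], bits[2 * i + 1]
        if not used:
            continue  # feature does not belong to the rule
        (consequent if in_consequent else condition).append(name)
    return condition, consequent

def decode_integer_rule(code: List[int], features: List[str]) -> Rule:
    """Decode | condition size | condition features | prediction features |."""
    size = code[0]
    condition = [features[i] for i in code[1:1 + size]]
    consequent = [features[i] for i in code[1 + size:]]
    return condition, consequent

features = ["bread", "milk", "butter", "eggs"]

# bread: condition, milk: consequent, butter: unused, eggs: condition
from_bits = decode_binary_rule([1, 0, 1, 1, 0, 0, 1, 0], features)

# condition size 2, condition = features 0 and 3, prediction = feature 1
from_ints = decode_integer_rule([2, 0, 3, 1], features)
```

Under these assumptions both calls describe the same rule; the integer form grows with the rule size rather than with the total number of features.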
4
Q

Classification

A

Description: build a model that predicts the value of one variable, called the class, from the known values of other variables. Selecting the set of features that participate in the model is an optimization problem.

Standard approaches: probabilistic classification, decision trees, K-nearest neighbors, neural networks, Support Vector Machines.

Objective functions based on TN, FP, FN, TP: Accuracy, error, precision, sensitivity, specificity.
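
The measures listed above follow directly from the four confusion-matrix counts. A minimal sketch using the standard definitions is given below; the function name and the example counts are invented for illustration, and returning 0.0 for an empty denominator is a convention chosen here.

```python
from typing import Dict

def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute the usual objective functions from TP, FP, FN, TN."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # convention: 0.0 when undefined

    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    return {
        "accuracy": accuracy,                 # correct predictions / all predictions
        "error": 1.0 - accuracy,              # complement of accuracy
        "precision": ratio(tp, tp + fp),      # predicted positives that are correct
        "sensitivity": ratio(tp, tp + fn),    # actual positives that are found
        "specificity": ratio(tn, tn + fp),    # actual negatives that are found
    }

# Hypothetical confusion-matrix counts
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
```

A metaheuristic can use any one of these values as a mono-objective fitness, or several of them together in a multi-objective setting.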

There are no general ways of representing solutions.

Metaheuristics:

  • Decision trees: Evolutionary Algo, Ant Colony
  • Random Forest: Hybrid methods
  • KNN: Genetic Algo
  • Neural networks: particle swarm optimization
  • SVM: Evolutionary algos
5
Q

Feature Selection

A

Description: aims at selecting an optimal set of relevant features or attributes that are necessary for classification.

Encoding:

  • String of n bits (bit i = 1 if feature i is included); scales poorly when n is very large (big data).
  • Variable-length string listing only the selected attributes.
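
The two encodings carry the same information and differ mainly in size. The following sketch, with hypothetical helper names and 0-based feature indices, converts between them and shows why the variable-length form is shorter when few features out of many are selected.

```python
from typing import List

def mask_to_indices(mask: List[int]) -> List[int]:
    """n-bit string -> variable-length list of selected feature indices."""
    return [i for i, bit in enumerate(mask) if bit]

def indices_to_mask(indices: List[int], n_features: int) -> List[int]:
    """Variable-length list -> n-bit string."""
    selected = set(indices)
    return [1 if i in selected else 0 for i in range(n_features)]

# 3 features selected out of 10,000 (illustrative sizes)
n_features = 10_000
selected = [12, 480, 9_731]

mask = indices_to_mask(selected, n_features)
recovered = mask_to_indices(mask)

# The bit string always has n_features entries; the list has one per selected feature.
sizes = (len(mask), len(recovered))
```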

Metaheuristics: Genetic algorithms, Particle Swarm Optimization; particularly for Big Data: Swarm intelligence.
