Script 5 - Data Mining Flashcards

1
Q

Knowledge Discovery in Databases

A

Non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data

  • Process of applying data mining algorithms to extracted data in order to gain information
2
Q

What is data mining?

A

Sub-process of KDD, used to derive new patterns from a given data sample. The term "data mining" is often used as a synonym for KDD.

3
Q

Which criteria must the generated patterns satisfy?

A

Validity: The derived patterns should be applicable to new data sets.

Understandability: The derived patterns should be easy to understand.

4
Q

What is data?

A

Representation of facts

5
Q

What is information?

A

Communication of data

6
Q

What is knowledge?

A

Data in a specific context

7
Q

How are information, data and knowledge related?

A
8
Q

In a database, which characteristics can an attribute have?

A

Characteristics of attributes:

  • Type of the attribute
  • Distribution of values
  • Missing values
  • Quality of values
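A minimal Python sketch of profiling one attribute column along the first three characteristics. It assumes missing values are encoded as `None` and uses a crude type guess (numeric if every present value is an `int` or `float`); the function name `profile_attribute` is hypothetical. Quality of values depends on domain rules and is not checked here.

```python
from collections import Counter


def profile_attribute(values):
    """Summarise one attribute column: type, distribution and missing values.

    Assumption: missing values are encoded as None (real data may use NaN,
    empty strings or sentinel codes instead).
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    # Crude type guess: numeric if every present value is an int or float
    numeric = all(isinstance(v, (int, float)) for v in present)
    profile = {
        "type": "numeric" if numeric else "categorical",
        "missing": missing,
        "missing_ratio": missing / len(values) if values else 0.0,
    }
    if numeric and present:
        # Distribution of a numeric attribute summarised by range and mean
        profile["min"] = min(present)
        profile["max"] = max(present)
        profile["mean"] = sum(present) / len(present)
    else:
        # Distribution of a categorical attribute as value frequencies
        profile["distribution"] = dict(Counter(present))
    return profile


print(profile_attribute([23, 35, None, 41]))  # numeric, 1 missing, mean 33.0
print(profile_attribute(["red", "blue", "red", None]))
```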
9
Q

Which attribute types are there?

A
10
Q

Which components does a decision tree have?

A
11
Q

How do I choose the nodes in a decision tree?

A

Based on entropy! The next node splits on the attribute that reduces entropy the most (highest information gain).
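A minimal Python sketch of this selection rule, assuming the ID3-style criterion of information gain (parent entropy minus the size-weighted entropy of the child nodes). The rows, attribute names and labels are hypothetical toy data, and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction achieved by splitting the rows on attribute attr."""
    # Group the class labels by the value the attribute takes in each row
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


def best_attribute(rows, labels, attributes):
    """Choose the attribute with the highest information gain for the next node."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" not at all
rows = [
    {"outlook": "sunny", "windy": True},
    {"outlook": "sunny", "windy": False},
    {"outlook": "rainy", "windy": True},
    {"outlook": "rainy", "windy": False},
]
labels = ["no", "no", "yes", "yes"]
print(best_attribute(rows, labels, ["outlook", "windy"]))  # outlook
```

Splitting on `outlook` yields two pure child nodes (entropy 0), so its gain is the full 1 bit of the parent; `windy` leaves both children at 50/50 and gains nothing.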

12
Q

What is entropy?

A

Expected value of information contained in a node
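As a formula, assuming the usual Shannon entropy in bits, where $p_i$ is the fraction of the node's examples that belong to class $i$ out of $c$ classes (notation not taken from the card). The second line is a hypothetical worked example for a node with 9 positive and 5 negative examples:

```latex
H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i
\qquad
H(9{+},5{-}) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940
```

Each term $-\log_2 p_i$ is the information of one class outcome, weighted by its probability $p_i$, which is why entropy is the expected information of the node. A pure node has $H = 0$; a two-class node split 50/50 has $H = 1$ bit.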

13
Q

Which approaches to clustering are there?

A

Hierarchical clustering and partition-based clustering

14
Q

Which approaches are there in hierarchical clustering?

A

Two:

  • Agglomerative approach (partitions get bigger during the clustering process)
  • Divisive approach (partitions get smaller during the clustering process)
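A minimal Python sketch of the agglomerative approach, assuming Euclidean distance and single linkage (the distance between two clusters is the distance of their closest members); other linkage rules are equally common. Every point starts as its own cluster and the two closest clusters are merged until `k` remain. The divisive approach, which would instead split clusters top-down, is not shown.

```python
import math


def agglomerative(points, k):
    """Bottom-up clustering: merge the two closest clusters until k remain.

    Assumptions: points are coordinate tuples, distance is Euclidean and
    cluster distance is single linkage (closest pair of members).
    """
    clusters = [[p] for p in points]  # every point starts in its own cluster
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge j into i: the partition grows
        del clusters[j]
    return clusters


print(agglomerative([(0, 0), (0, 1), (10, 10), (10, 11)], k=2))
```

The nested loop recomputes all cluster distances in every round, which keeps the sketch short but is far slower than the distance-matrix updates used in practice.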
15
Q

Which distance measures are there?

A
16
Q

When do I stop k-means?

A
  • Maximum time or number of iterations reached
  • No data points change their cluster assignment and the cluster centers have not moved in an iteration
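A minimal Python sketch of k-means with both stopping rules: a cap on the number of iterations and a check that no assignment changed and no center moved. It assumes Euclidean distance, that the caller supplies the initial centers and `max_iter >= 1`, and that an empty cluster simply keeps its previous center; the function name and that empty-cluster policy are illustrative choices.

```python
import math


def k_means(points, centers, max_iter=100):
    """k-means that stops after max_iter iterations or when nothing changes.

    Assumptions: points and initial centers are tuples of floats,
    max_iter >= 1, and an empty cluster keeps its previous center.
    """
    centers = [tuple(c) for c in centers]
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center
        new_assignment = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster: keep old center
        # Stop rule 2: no point changed its cluster and no center moved
        converged = new_assignment == assignment and new_centers == centers
        assignment, centers = new_assignment, new_centers
        if converged:
            break
    # Stop rule 1 is the loop bound itself (max_iter reached)
    return centers, assignment, iteration


centers, assignment, iterations = k_means(
    [(1.0,), (2.0,), (10.0,), (11.0,)], [(1.0,), (2.0,)]
)
print(centers, assignment, iterations)  # [(1.5,), (10.5,)] [0, 0, 1, 1] 3
```

With this data the assignments stabilise after the second iteration, and the third iteration detects that nothing changed.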
17
Q

Which problems can occur in partition-based clustering?

A
  • Empty clusters can occur
  • Random initial cluster centers and the initial partition are a problem: the quality of the result can depend on the initial cluster centers
  • The number of cluster centers has to be chosen by hand
  • Each object belongs to exactly one cluster
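A small Python sketch, assuming the nearest-center assignment step of k-means, that illustrates the first two problems on the same hypothetical data: a poor choice of initial centers leaves one cluster empty, while a different initialisation yields two sensible clusters. The number of centers (2) is fixed by hand, and every point lands in exactly one list, matching the last two bullets. Names and data are illustrative.

```python
import math


def assign(points, centers):
    """One k-means assignment step: group the points by their nearest center.

    Returns one list of points per center; a list may be empty.
    """
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        clusters[nearest].append(p)  # hard assignment: exactly one cluster per point
    return clusters


points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
# Hypothetical poor initialisation: the second center is far from all the data
bad = assign(points, [(6.0,), (100.0,)])
# Hypothetical good initialisation: one center inside each group
good = assign(points, [(1.0,), (11.0,)])
print(bad)   # second list is empty
print(good)
```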
18
Q
A