W04 Unsupervised Learning Flashcards

1
Q

Applications of Data Mining

A
Product Placement
Fraud Detection
Traffic Forecasts
Web Optimization
Customer Relationship Management
2
Q

Data Mining Methods

A

pre-processing, description and reduction

visualization, correlation, association rule learning to show relationships

explanation through regression

classification, discriminant analysis

anomaly recognition

prognosis

segmentation through cluster analysis

3
Q

Terminology of a Dataset table

A

Attribute (columns)

Instance (rows)

4
Q

Evaluating Data Mining

Split?

A
Training Set
-generate model from the data
Validation Set
-validate and improve the model
Test Set
-test applicability and performance of model
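
A minimal sketch of such a split; the 60/20/20 ratio and the use of scikit-learn are illustrative assumptions, not part of the card:

```python
# Sketch: split instances into training, validation and test sets.
# The 60/20/20 ratio and scikit-learn usage are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 4)                      # 100 instances, 4 attributes (toy data)

# split off the test set first, then carve the validation set out of the rest
X_rest, X_test = train_test_split(X, test_size=0.2, random_state=0)
X_train, X_val = train_test_split(X_rest, test_size=0.25, random_state=0)  # 0.25 * 0.8 = 0.2

print(len(X_train), len(X_val), len(X_test))    # 60 20 20
```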
5
Q

Supervised Learning

A

For all sets, the solution (target value) is known a priori

6
Q

Unsupervised Learning

A

No structural solution is known a priori (clustering, association rules)

7
Q

Clustering
what?
Conditions?
Applications?

A

unsupervised segmentation of an instance set based on attributes

instances belong to different segments
described by multi-dimensional attributes
attribute values can be quantified

Instance segments are not known a priori but possibility of segmentation is expected

8
Q

Kinds of clusters

A

unique (one instance, one cluster)

overlapping (one instance, multiple clusters)

probabilistic (one instance, a membership probability for each cluster)
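
A minimal sketch of probabilistic cluster assignment, assuming scikit-learn's GaussianMixture; the toy data are made up:

```python
# Sketch: probabilistic clusters - each instance gets one probability per cluster.
# Assumes scikit-learn's GaussianMixture; the toy data are made up.
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.8]])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict_proba(X))     # rows sum to 1: cluster membership probabilities
```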

9
Q

Hierarchical Clustering

A

agglomerative: from n clusters to one cluster
divisive: from one cluster to n clusters
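
A minimal agglomerative sketch using SciPy; the toy data, the Ward linkage and the cut into 3 clusters are illustrative assumptions:

```python
# Sketch: agglomerative hierarchical clustering with SciPy.
# Toy data, Ward linkage and the cut into 3 clusters are illustrative assumptions.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.9], [9.0, 0.5]])

Z = linkage(X, method="ward")                    # merge from n singleton clusters upwards
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the dendrogram into 3 clusters
print(labels)
```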

10
Q

Offline vs Online Clustering

A

Offline: complete set of instances known prior to clustering

Online: instances are added iteratively

  • growing datasets
  • more efficient
  • streaming clustering processes only latest instances
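
A minimal sketch of online clustering on a growing dataset, assuming scikit-learn's MiniBatchKMeans; the simulated stream and parameters are illustrative:

```python
# Sketch: online clustering of a growing dataset with MiniBatchKMeans.
# The simulated stream of batches and all parameters are illustrative assumptions.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

model = MiniBatchKMeans(n_clusters=3, random_state=0)

for _ in range(10):                  # instances arrive batch by batch
    batch = np.random.rand(50, 2)    # 50 new instances, 2 attributes
    model.partial_fit(batch)         # update the centroids incrementally

print(model.cluster_centers_)
```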
11
Q

k-Means Clustering Algorithm

A
  • unique; flat; offline
  • k-means++: non-random initialization
  • how to choose the best k?
  • variations: different distance measures

1 choose k - the number of desired clusters

2 randomly select k cluster centroids

3 assign all instances to nearest cluster centroid

4 calculate means for attributes of instances in one cluster

5 set means as new centroids

6 assign again

if steps 6 and 3 yield the same assignment: convergence! otherwise repeat from step 4
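
A NumPy sketch that follows steps 1-6 above; the toy data and k = 2 are illustrative assumptions:

```python
# Sketch: k-means following steps 1-6 above, implemented with NumPy.
# The toy data and k = 2 are illustrative assumptions.
import numpy as np

def k_means(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]    # step 2: random initial centroids
    labels = None
    while True:
        # steps 3 / 6: assign every instance to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            return labels, centroids                             # same assignment twice: convergence
        labels = new_labels
        # steps 4 / 5: the attribute means of each cluster become the new centroids
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])
labels, centroids = k_means(X, k=2)
print(labels, centroids)
```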

12
Q

Cluster Evaluation Criteria

A

Elbow (compute the gain in explained variance per increase of k; plot and look for the elbow)

Dunn index (inter-cluster distance vs. intra-cluster distance)
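
A sketch of the elbow criterion, assuming scikit-learn's KMeans and its inertia_ attribute (within-cluster sum of squares) as the variance measure; the data and the range of k are made up:

```python
# Sketch: elbow criterion - plot within-cluster SSE against k and look for the bend.
# Data and the range of k are illustrative assumptions.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)          # toy data

ks = range(1, 10)
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(ks, sse, marker="o")
plt.xlabel("k")
plt.ylabel("within-cluster sum of squares")
plt.show()
```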

13
Q

Before and After Clustering

A
before:
pre-process the data
  • select relevant attributes
  • check for dependencies
  • normalize and weight values

cluster:

  • various number of clusters
  • different initial seeds
  • different distance measures and algorithms

process results:

  • label data
  • visualize clusters
  • predict cluster membership of new instances
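
A minimal end-to-end sketch of this workflow (normalize, cluster, label), assuming scikit-learn and pandas; the attributes and k = 2 are illustrative:

```python
# Sketch: pre-process (normalize), cluster, then label the data.
# The attributes, scaler and k = 2 are illustrative assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

df = pd.DataFrame({"income": [20, 22, 80, 85, 40],
                   "age":    [25, 27, 50, 52, 35]})

X = StandardScaler().fit_transform(df)                      # normalize attribute values
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

df["cluster"] = model.labels_                               # label each instance with its segment
print(df)
```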
14
Q

Association Rules

A

Derive rules between instance attributes based on sufficient observations, e.g. for market basket analyses

no a priori assumptions needed.
evaluate via 
confidence
support
lift
expected confidence
15
Q

Market Basket Analyses

A

transaction data recorded at checkout

record purchase instance, branch, article, quantity, date

lots of data

which articles are bought together?

e.g. shoes and socks

16
Q

Association Rule

A

Antecedent A leads to consequent B

-based on probability rather than logic

17
Q

Support

A

share of instances in dataset that fulfill the rule

18
Q

Confidence

A

Share of instances that include both A and B, relative to the set of instances that fulfill antecedent A

19
Q

Lift

A

ratio of the observed support to the support expected if A and B were independent
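
A small worked example for support, confidence and lift on made-up transactions, with the rule {shoes} → {socks}:

```python
# Sketch: support, confidence and lift for the rule A -> B on made-up transactions.
transactions = [
    {"shoes", "socks"},
    {"shoes", "socks", "shirt"},
    {"shoes"},
    {"socks"},
    {"shirt"},
]
A, B = {"shoes"}, {"socks"}

n = len(transactions)
support_A  = sum(A <= t for t in transactions) / n        # share containing A
support_B  = sum(B <= t for t in transactions) / n        # share containing B (= expected confidence)
support_AB = sum((A | B) <= t for t in transactions) / n  # support of the rule A -> B

confidence = support_AB / support_A                       # share of A-instances that also contain B
lift       = confidence / support_B                       # observed vs. expected confidence

print(support_AB, confidence, lift)                       # 0.4  0.666...  1.111...
```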

20
Q

Association Rule Learning
what?
challenge?
solution?

A

associate attribute values with each other through simple rules

many rules possible

filter by support and confidence, and visualize
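
A sketch of mining and filtering such rules by minimum support and confidence, assuming the mlxtend library; the transactions and thresholds are illustrative:

```python
# Sketch: mine association rules and filter them by support and confidence.
# Assumes the mlxtend library; transactions and thresholds are illustrative.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["shoes", "socks"], ["shoes", "socks", "shirt"],
                ["shoes"], ["socks"], ["shirt"]]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

frequent = apriori(onehot, min_support=0.3, use_colnames=True)                # filter by support
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)  # filter by confidence
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```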