Lecture 7: Visual Data Mining Principles Flashcards

1
Q

What is Data Mining?

A

Automatic algorithmic extraction of valuable information from raw data

Goal: Find interesting patterns/outliers/trends

2
Q

What are some common Data Mining tasks?

A

Anomaly detection
‣ Outlier/change/deviation detection
‣ Example: Fraud detection
Association rule learning
‣ Dependency modeling
‣ Correlations between variables
‣ Example: Find products that are bought together
Clustering
‣ Identify similar groups or structures without using known structures
‣ Example: Find groups of customers that behave similarly (Fayyad et al. 1996)
Classification
‣ Generalize known structure to apply to new data
‣ Example: Spam classification
Regression
‣ Find function which models data with least error
‣ Can be used for prediction
‣ Example: Stock market analysis
Summarization
‣ Find more compact representation
‣ Example: Report generation

3
Q

What is clustering?

A

Automatically group data instances into classes based on mutual similarity

Unsupervised: finding clusters

4
Q

What are the two types of Hierarchical Clustering?

A

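Agglomerative clustering (bottom-up): start with each data point as its own cluster, then repeatedly merge the closest clusters
Divisive clustering (top-down): start with one cluster containing all data, then repeatedly split it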
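
A minimal sketch of the agglomerative variant, assuming Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members). The card does not specify a linkage rule, so that choice, the function name `agglomerative`, and the toy data are illustrative assumptions:

```python
def agglomerative(points, target_clusters):
    """Single-linkage agglomerative clustering on a list of numeric tuples."""
    # Start: every point is its own cluster.
    clusters = [[p] for p in points]

    def dist(a, b):
        # Euclidean distance between two points.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def linkage(c1, c2):
        # Single linkage (assumed): distance between the two closest members.
        return min(dist(a, b) for a in c1 for b in c2)

    while len(clusters) > target_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy 1-D data with two obvious groups.
result = agglomerative([(0.0,), (0.5,), (1.0,), (10.0,), (10.4,)], target_clusters=2)
print(result)
```

Divisive clustering would run in the opposite direction: start from one cluster holding all points and split it until the desired number of clusters is reached.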

5
Q

How does K-Means Clustering work?

A

The K-Means algorithm for clustering consists of 5 steps:
Step 1: Choose the number K of clusters
Step 2: Select K random points as initial cluster centers (centroids)
Step 3: Assign each data point to the nearest centroid
Step 4: Recompute the centroid of each cluster as the mean of its assigned points
Step 5: Repeat steps 3 and 4 until no observations change cluster
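
The steps can be sketched in plain Python as follows. This is an illustrative version only: it assumes squared Euclidean distance, draws the initial centroids from the data points, and keeps the old centroid if a cluster ends up empty. The function name `kmeans`, the `seed` and `max_iter` parameters, and the toy data are assumptions, not part of the lecture material.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-Means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 2: pick K distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Step 5: stop once no point changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 4: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Step 1: choose K (here K = 2) for a toy data set with two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The cluster numbering depends on the random initialization, so only the grouping of points is meaningful, not which group is called 0 or 1.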

6
Q

Describe Decision Trees

A

Splitting dataset into branches
Task: Predict outcome of unseen observations
Build tree iteratively, starting at root
Need algorithm to decide which attribute to split on
‣ Goal: Find informative features
‣ Entropy concept = how mixed up something is
‣ Maximize information gain
Prune tree to avoid overfitting
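
To make "entropy" and "information gain" concrete, here is a small sketch that uses Shannon entropy in bits and picks the attribute with the highest gain for a single split. The toy spam data and the attribute names `has_link` and `all_caps` are hypothetical; a full decision-tree learner would apply this choice recursively and then prune.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels: 0 = pure, higher = more mixed."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the data on one attribute (a dict key)."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the branches after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: "has_link" separates the classes perfectly, "all_caps" not at all.
rows = [
    {"has_link": True, "all_caps": True},
    {"has_link": True, "all_caps": False},
    {"has_link": False, "all_caps": True},
    {"has_link": False, "all_caps": False},
]
labels = ["spam", "spam", "ham", "ham"]

# Choose the split attribute that maximizes information gain.
best = max(rows[0], key=lambda a: information_gain(rows, labels, a))
print(best)
```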

7
Q

Describe Dimensionality Reduction

A

Curse of dimensionality in many real-world problems
Transform high-dim data to space with fewer dimensions (2D/3D)
Powerful technique to look for hidden structure in high-dimensional data
Used in various domains
‣ Document categorization, drug discovery, machine learning model debugging, etc.
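
The card does not name a specific method. As one common example, the sketch below uses principal component analysis (PCA), finding the first principal component by power iteration on the covariance matrix and projecting 3-D points onto it. For brevity it reduces to a single dimension rather than 2D/3D; the function name, the iteration count, and the toy data are assumptions.

```python
def first_principal_component(data, iterations=200):
    """Return the direction of greatest variance and the 1-D projection of the data onto it."""
    n, d = len(data), len(data[0])
    # Center the data so that each dimension has mean zero.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Project every centered point onto the principal direction.
    projected = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, projected


# Toy 3-D data that varies almost entirely along the x = y diagonal.
data = [(1.0, 1.0, 0.1), (2.0, 2.1, 0.0), (3.0, 2.9, 0.1), (4.0, 4.1, 0.0), (5.0, 5.0, 0.1)]
direction, projected = first_principal_component(data)
print(direction)
print(projected)
```

The projected values are no longer x, y, or z but a mix of all three, so the meaning of the individual original dimensions is not preserved in the reduced space.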

8
Q

What are some disadvantages of Dimensionality Reduction?

A

‣ Hard to preserve semantics of single dimensions
‣ Hard to understand and interpret
‣ Error not visible, which can inspire false confidence
