Chapter 31 Flashcards
What are supervised mining techniques?
The user supplies the inputs that guide the mining, e.g. training data whose classes are already known.
What are unsupervised mining techniques?
The user does not supply the inputs. We do not know how many classes there are or what their properties are, and we do not guide how the data mining is done.
What is the similarity/dissimilarity ratio?
The match/mismatch ratio of the matrix, which sets the target evolutionary distance.
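What is the time complexity of building a similarity matrix?
O(n² × m), where n is the number of records (rows) and m is the number of attributes (columns): each of the n × n record pairs is compared across all m attributes.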
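A minimal sketch of where the n² × m cost comes from: every pair of the n records is compared, and each comparison scans all m attributes. Using Euclidean distance as the (dis)similarity measure and the name `distance_matrix` are assumptions for illustration only.

```python
import math

def distance_matrix(records):
    """Return the n x n matrix of Euclidean distances between records.

    Two nested loops over the n records give n * n pairs, and each distance
    sums over the m attributes, so the work grows as O(n^2 * m).
    """
    n = len(records)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            squared = sum((a - b) ** 2 for a, b in zip(records[i], records[j]))
            matrix[i][j] = math.sqrt(squared)
    return matrix

# Three records (rows) with two attributes (columns) each.
records = [[0, 0], [3, 4], [6, 8]]
for row in distance_matrix(records):
    print(row)
# [0.0, 5.0, 10.0]
# [5.0, 0.0, 5.0]
# [10.0, 5.0, 0.0]
```

Because the matrix is symmetric, computing only the upper triangle halves the work, but the growth is still O(n² × m).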
What are the main types of data mining?
1- Supervised
2- Unsupervised
What are some types of supervised data mining?
- Bayesian modeling
- Decision Tree
- Neural networks, etc.
What are the 2 types of unsupervised data mining?
1- One-way clustering
2- Two-way clustering
What is one-way clustering?
We cluster only the rows of a data matrix, comparing them across all attributes. This gives a global view of the data matrix.
What is two-way clustering?
We cluster both the rows and the columns of the data matrix. This gives a local view of the data matrix: a group of rows may be similar on only a subset of the columns.
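A deliberately naive sketch contrasting one-way and two-way clustering on a small binary data matrix. It assumes "clustering" simply means grouping identical profiles; a real method would use a similarity threshold or an algorithm such as k-means. The function names are hypothetical.

```python
from collections import defaultdict

def group_identical(vectors):
    """Group the indices of vectors that are exactly equal (naive clustering)."""
    groups = defaultdict(list)
    for index, vector in enumerate(vectors):
        groups[tuple(vector)].append(index)
    return list(groups.values())

def one_way(matrix):
    # Cluster rows only, comparing each row over all attributes.
    return group_identical(matrix)

def two_way(matrix):
    # Cluster rows (over all columns) and columns (over all rows).
    columns = list(zip(*matrix))
    return group_identical(matrix), group_identical(columns)

data = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
print(one_way(data))  # [[0, 2], [1, 3]]
row_groups, col_groups = two_way(data)
print(row_groups, col_groups)  # [[0, 2], [1, 3]] [[0, 1], [2, 3]]
```

Each (row group, column group) pair marks a block of the matrix; here rows {0, 2} and columns {0, 1} form a block of 1s, which is the kind of local pattern two-way clustering exposes.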
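What are min-max “distances” in clustering?
Records are grouped under a similarity constraint: the intra-cluster distance (between records in the same cluster) should be minimal, and the inter-cluster distance (between different clusters) should be maximal. For example, when clustering a company's employees, employees with similar salaries fall into the same cluster, and a cluster of young employees lies far away from a cluster of old employees.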
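A small sketch of the min-max criterion on the salary example, using made-up salary values already split into two hypothetical clusters. It assumes intra-cluster distance is the average pairwise difference inside a cluster and inter-cluster distance is the gap between cluster means; both are common choices, not the only ones.

```python
from itertools import combinations
from statistics import mean

def intra_distance(cluster):
    """Average absolute difference between every pair of values in one cluster."""
    return mean(abs(a - b) for a, b in combinations(cluster, 2))

def inter_distance(cluster_a, cluster_b):
    """Distance between the two cluster means."""
    return abs(mean(cluster_a) - mean(cluster_b))

# Hypothetical monthly salaries, already split into two clusters.
low = [30, 32, 35]
high = [90, 95, 100]

print(intra_distance(low), intra_distance(high))  # small: records within a cluster are close
print(inter_distance(low, high))                  # large: the clusters are far apart
```

A good clustering keeps the first line's values small relative to the second.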
How do we identify associations among records?
Map the associations into a distance matrix. This lets us quantify which records are more similar to each other (a smaller distance means more similar).
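Once associations are mapped into a distance matrix, the most similar pair of records is the one with the smallest off-diagonal distance. The matrix values and the function `closest_pair` are hypothetical.

```python
def closest_pair(dist):
    """Return (i, j, d) for the pair of distinct records with the smallest distance."""
    n = len(dist)
    best = None
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: the matrix is symmetric
            if best is None or dist[i][j] < best[2]:
                best = (i, j, dist[i][j])
    return best

# Hypothetical symmetric distance matrix for four records.
dist = [
    [0.0, 4.0, 1.5, 7.0],
    [4.0, 0.0, 3.0, 2.0],
    [1.5, 3.0, 0.0, 6.0],
    [7.0, 2.0, 6.0, 0.0],
]
print(closest_pair(dist))  # (0, 2, 1.5)
```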
What are numeric and non-numeric attributes?
Numeric attributes have numeric values (e.g. age, salary); non-numeric attributes have non-numeric values (e.g. city, gender).
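Why the distinction matters when comparing records: numeric attributes can be compared by arithmetic difference, while non-numeric attributes are usually compared by match/mismatch. The (age, city) record layout and the 0/1 mismatch rule below are assumptions for illustration; real data would normally also be scaled so one attribute does not dominate.

```python
def attribute_distance(a, b):
    """Distance between two attribute values.

    Numeric values: absolute difference.
    Non-numeric values: 0 if they match, 1 if they do not.
    """
    numeric = (int, float)
    if isinstance(a, numeric) and isinstance(b, numeric):
        return abs(a - b)
    return 0 if a == b else 1

def record_distance(r1, r2):
    """Sum the per-attribute distances of two records."""
    return sum(attribute_distance(a, b) for a, b in zip(r1, r2))

# Hypothetical employee records: (age, city)
print(record_distance((25, "London"), (30, "London")))  # 5
print(record_distance((25, "London"), (25, "Paris")))   # 1
```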
Can a graph be stored in matrix form?
Yes. A matrix (an adjacency matrix) is a data structure that can store a graph: entry (i, j) records whether there is an edge between vertices i and j.
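A minimal sketch of storing an undirected graph as an adjacency matrix: entry [i][j] is 1 when vertices i and j share an edge and 0 otherwise, so the result is also a binary matrix. The edge list is made up.

```python
def adjacency_matrix(num_vertices, edges):
    """Build the adjacency matrix of an undirected graph from an edge list."""
    matrix = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1  # undirected: store the edge in both directions
    return matrix

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
for row in adjacency_matrix(4, edges):
    print(row)
# [0, 1, 1, 0]
# [1, 0, 1, 0]
# [1, 1, 0, 1]
# [0, 0, 1, 0]
```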
What is a binary matrix?
A matrix whose entries are only 0 and 1.
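What are 2 methods to find clusters in a matrix?
- Graph partitioning (separate groups of vertices that are densely connected to each other from vertices with less connectivity)
- Clique detection (find groups of vertices in which every pair is connected)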
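A brute-force sketch of clique detection on a binary adjacency matrix: a clique is a set of vertices in which every pair is connected, so tightly knit groups show up as cliques. Checking every subset is exponential, so this only suits tiny graphs; practical tools use smarter algorithms such as Bron-Kerbosch. The matrix and function names are hypothetical.

```python
from itertools import combinations

def is_clique(adj, vertices):
    """True if every pair of the given vertices is connected in adj."""
    return all(adj[u][v] == 1 for u, v in combinations(vertices, 2))

def largest_cliques(adj):
    """Return all cliques of maximum size (brute force over vertex subsets)."""
    n = len(adj)
    for size in range(n, 1, -1):  # try the largest subsets first
        found = [set(c) for c in combinations(range(n), size) if is_clique(adj, c)]
        if found:
            return found
    return [{v} for v in range(n)]  # no edges: every vertex is its own clique

adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
print(largest_cliques(adj))  # [{0, 1, 2}]
```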
What is classification?
Classification is a data mining function that assigns items in a collection to target categories or classes.
How does classification work?
We split the data set into 2 sets:
1. Training set
2. Test set
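The model is built on the training set and then tested on the test set, whose classes are already known, to check how well it assigns records to classes (e.g. to one of 2 classes). Once tested, the model can classify new data.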
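A minimal sketch of the train/test workflow on a tiny made-up two-class data set. It assumes a nearest-centroid classifier, chosen only for brevity (it is not one of the supervised techniques listed earlier): the model is built from the training set, then its predictions on the test set are compared with the known labels.

```python
from statistics import mean

def train(training_set):
    """Compute one centroid (mean value) per class from labeled 1-D values."""
    by_class = {}
    for value, label in training_set:
        by_class.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_class.items()}

def predict(centroids, value):
    """Assign the class whose centroid is nearest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))

def accuracy(centroids, test_set):
    """Fraction of test records whose predicted class matches the known class."""
    correct = sum(predict(centroids, value) == label for value, label in test_set)
    return correct / len(test_set)

# Hypothetical labeled data: (salary, class)
data = [(30, "low"), (35, "low"), (32, "low"), (90, "high"), (95, "high"),
        (33, "low"), (92, "high")]
training_set, test_set = data[:5], data[5:]  # simple split: first 5 train, last 2 test

model = train(training_set)
print(accuracy(model, test_set))  # 1.0
print(predict(model, 40))         # low
```

In practice the split is usually random (e.g. 70/30) rather than positional.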
What is the difference between clustering and cluster detection?
First do clustering, then do cluster detection. (Note: once we have clusters, we can tell how many clusters exist in the system.)
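What is the k-means cluster detection technique?
K-means clustering uses mean points to sort values into clusters: each value goes to the cluster with the nearest mean point, and each mean point is then recomputed from its cluster's values. It is a fast technique.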
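A minimal 1-D sketch of k-means, assuming the first k values as the initial mean points (real implementations usually pick them randomly or with k-means++). Each pass assigns every value to its nearest mean point and recomputes the means; the loop stops once the means no longer change, which is how the algorithm converges. The salary values are made up.

```python
from statistics import mean

def k_means(values, k, max_iterations=100):
    """Cluster 1-D values into k clusters; returns (means, clusters)."""
    means = list(values[:k])  # assumption: first k values as initial mean points
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to the cluster with the nearest mean point.
            nearest = min(range(k), key=lambda i: abs(v - means[i]))
            clusters[nearest].append(v)
        # Recompute each mean; keep the old mean if a cluster ended up empty.
        new_means = [mean(c) if c else means[i] for i, c in enumerate(clusters)]
        if new_means == means:  # no change: the algorithm has converged
            break
        means = new_means
    return means, clusters

salaries = [30, 90, 32, 95, 35, 100]
means, clusters = k_means(salaries, k=2)
print(clusters)  # [[30, 32, 35], [90, 95, 100]]
```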
What is a mean point in clustering?
The mean (center) of a cluster. It determines which cluster a value falls into, since each value is assigned to the nearest mean point.
Is k-means clustering supervised?
No. K-means is unsupervised: it uses no class labels. The user only supplies the number of clusters, k.
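Does k-means clustering converge?
Yes. It converges after a finite number of iterations, though possibly to a local optimum rather than the best possible clustering.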