M5 Flashcards
Online Analytical Processing (OLAP) with data warehouses tells us what is happening and how, while data mining tells us what is likely to ___
happen
Data mining is ___ Discovery in (commercial) Databases
Knowledge
Data mining is a ____ rather than a product
process
This is the fastest-growing segment of the business intelligence market
Data Mining
This is what data mining in biology and medicine is called
Bioinformatics
What are the two broad groups of data mining?
Directed and Undirected Data Mining
In this type of data mining, we know what we are looking for and we aim to find the value of a pre-identified target variable in terms of a collection of input variables, e.g., classifying insurance claims
Directed Data Mining
This type of data mining finds patterns in data and leaves it to the user to find the significance of these patterns
Undirected Data Mining
Classification, Estimation, and ___ are under Directed Data Mining
Prediction
Affinity grouping & association rules, ___, and Description & Visualization are under Undirected Data Mining
Clustering
This data mining approach is particularly suitable for solving classification problems
Decision Trees
Each leaf node in a decision tree is labelled with a ___ label
class
In a decision tree, the class label of a leaf is decided by the class of the records that ended up in that ___ during training
leaf
In a decision tree, each edge originating from an internal node is labelled with a ___ predicate involving that node's splitting attribute
splitting
In a decision tree, the ___ forces any record to take a unique path from the root to exactly one leaf node
splitting predicate
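A minimal sketch of the cards above, assuming a toy tree: each internal node has a splitting attribute, each outgoing edge carries a splitting predicate, and each leaf carries a class label. The attribute names (`age`, `claims`), the thresholds, and the risk labels are hypothetical and chosen only for illustration.

```python
# Hypothetical toy tree. Each internal node names a splitting attribute and
# lists its edges as (splitting predicate, child node) pairs.
tree = {
    "attribute": "age",
    "edges": [
        (lambda v: v < 25, {"label": "high risk"}),
        (lambda v: v >= 25, {
            "attribute": "claims",
            "edges": [
                (lambda v: v == 0, {"label": "low risk"}),
                (lambda v: v > 0, {"label": "medium risk"}),
            ],
        }),
    ],
}


def classify(node, record):
    """Follow the single edge whose predicate holds until a leaf is reached."""
    while "label" not in node:
        value = record[node["attribute"]]
        matches = [child for predicate, child in node["edges"] if predicate(value)]
        # Exactly one predicate may hold, which is what makes the path unique.
        assert len(matches) == 1, "predicates must be exhaustive and mutually exclusive"
        node = matches[0]
    return node["label"]
```

For example, `classify(tree, {"age": 30, "claims": 2})` follows the `age >= 25` edge and then the `claims > 0` edge, ending at the leaf labelled "medium risk".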
To prevent overfitting, a test data set is used to ___ a decision tree once it has been built using the training data set.
prune
In decision trees, this is an iterative process of splitting the training data into partitions (regions of record space)
recursive partitioning
The most important task in building a decision tree is to decide which of the ___ gives the best split
attributes
In decision trees, a node becomes a ___ node when no split can be found that significantly decreases the diversity
leaf
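A rough sketch of how the best splitting attribute might be chosen by comparing the decrease in diversity each candidate gives. The cards do not name a diversity measure; the Gini index used here is an assumption, and the sample records and attribute names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini diversity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def diversity_decrease(records, attribute, target):
    """Parent diversity minus the size-weighted diversity of the partitions."""
    parent = gini([r[target] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[target])
    weighted = sum(len(g) / len(records) * gini(g) for g in groups.values())
    return parent - weighted


def best_attribute(records, attributes, target):
    """Pick the attribute whose split decreases diversity the most."""
    return max(attributes, key=lambda a: diversity_decrease(records, a, target))


# Hypothetical training records for illustration only.
records = [
    {"region": "north", "vehicle": "car", "fraud": "no"},
    {"region": "north", "vehicle": "truck", "fraud": "yes"},
    {"region": "south", "vehicle": "car", "fraud": "no"},
    {"region": "south", "vehicle": "truck", "fraud": "yes"},
]
```

On this sample, splitting on `vehicle` yields two pure partitions, while splitting on `region` leaves the diversity unchanged, so `best_attribute(records, ["region", "vehicle"], "fraud")` selects `vehicle`. A node for which no attribute gives a meaningful decrease would become a leaf.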
In decision trees, pruning is done by removing leaves and branches (edges leading to leaves) that fail to ___
generalize
This aims to discover structure in a complex data set as a whole in order to carve it up into simpler groups
Automatic Cluster Detection
Type of clustering that is available in a wide variety of commercial data mining tools. It divides the data set into a predetermined number, k, of clusters
K-Means Clustering
In K-Means Clustering, in the first step, k data points are selected to be the seeds. Each seed is an ___ cluster with only one element
embryonic
The ___ of a cluster of records is calculated by taking the average of each field for all the records in that cluster
centroid
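As a small illustration of the card above, a centroid can be computed by averaging each field across the records of a cluster. The sample cluster is hypothetical, and records are assumed to be numeric tuples of equal length.

```python
def centroid(records):
    """Average each field over all records in the cluster."""
    n = len(records)
    # zip(*records) groups the values field by field.
    return [sum(field) / n for field in zip(*records)]


# Hypothetical cluster of three two-field records.
cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 9.0)]
print(centroid(cluster))  # prints [3.0, 5.0]
```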
___ distance is the measure most commonly used for measuring distance by data mining software.
Euclidean
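For reference, Euclidean distance between two records is the square root of the sum of squared differences between corresponding fields. A minimal sketch, assuming records are numeric tuples of equal length:

```python
import math


def euclidean(a, b):
    """Square root of the summed squared field-by-field differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean((0, 0), (3, 4)))  # prints 5.0
```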
In the k-means method, the original choice of the value of k determines the number of ___ that will be found
clusters
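The k-means cards above can be sketched as a short loop: pick k seeds, assign every record to its nearest centroid, recompute the centroids, and repeat until they stop moving. Taking the first k points as seeds and capping the number of iterations are assumptions made to keep the example deterministic; commercial tools may choose seeds differently.

```python
import math


def kmeans(points, k, iterations=10):
    """Toy k-means: returns (centroids, clusters) for numeric tuples."""
    # Assumption: the first k points serve as seeds, each an embryonic cluster.
    centroids = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign the record to the nearest centroid by Euclidean distance.
            distances = [
                math.sqrt(sum((x - c) ** 2 for x, c in zip(p, centroid)))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(p)
        # Recompute each centroid as the field-wise average of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            [sum(field) / len(c) for field in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters
```

Calling `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)` always returns exactly two clusters, because k is fixed before the algorithm runs.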
___ are claimed to be often more effective than k-means for complex-shaped clusters
SOMs (self-organizing maps)
Four ways of utilizing data mining expertise in business:
1. By purchasing ready-made ___ from outside vendors
2. By purchasing software that embodies data mining expertise designed for a particular application
3. By hiring outside consultants to perform data mining for special projects
4. By developing its own data mining skills within the business organization
scores
___ automate the process of creating candidate models and selecting the ones that perform best
Model building software
Outside expertise for data mining is likely to be available in three possible places:
1. From a data mining software vendor
2. Data mining centers
3. ___ companies
Consulting