M5 Flashcards

1
Q

Online Analytical Processing (OLAP) with data warehouses tells us what is happening and how, while data mining tells us what is likely to ___

A

happen

2
Q

Data mining is ___ Discovery in (commercial) Databases

A

Knowledge

3
Q

Data mining is a ____ rather than a product

A

process

4
Q

This is the fastest-growing segment of the business intelligence market

A

Data Mining

5
Q

This is what data mining in biology and medicine is called

A

Bioinformatics

6
Q

What are the two broad groups of data mining?

A

Directed and Undirected Data Mining

7
Q

In this type of data mining, we know what we are looking for: we aim to find the value of a pre-identified target variable in terms of a collection of input variables, e.g., classifying insurance claims

A

Directed Data Mining

8
Q

This type of data mining finds patterns in data and leaves it to the user to determine the significance of these patterns

A

Undirected Data Mining

9
Q

Classification, Estimation, and ___ are under Directed Data Mining

A

Prediction

10
Q

Affinity grouping & association rules, ___, and Description & Visualization are under Undirected Data Mining

A

Clustering

11
Q

This data mining approach is particularly suitable for solving classification problems

A

Decision Trees

12
Q

Each leaf node in a decision tree is labelled with a ___ label

A

class

13
Q

In a decision tree, the class label is decided by the class of the records that ended up in that ___ during training

A

leaf

14
Q

In a decision tree, each edge originating from an internal node is labelled with a ___ predicate involving that node’s splitting attribute

A

splitting

15
Q

In a decision tree, the ___ forces any record to take a unique path from the root to exactly one leaf node

A

splitting predicate
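
A minimal sketch of the structure described on the decision tree cards above: internal nodes carry a splitting attribute, edges carry splitting predicates, and leaves carry class labels. The tree itself is hypothetical; the attribute names, thresholds, and labels are invented for illustration and do not come from the deck.

```python
# Hypothetical insurance-claim tree; every name and threshold here is made up.
tree = {
    "attribute": "claim_amount",
    "edges": [
        (lambda v: v <= 1000, {"label": "approve"}),
        (lambda v: v > 1000, {
            "attribute": "prior_claims",
            "edges": [
                (lambda v: v == 0, {"label": "approve"}),
                (lambda v: v > 0, {"label": "investigate"}),
            ],
        }),
    ],
}


def classify(node, record):
    """Follow the one edge whose predicate holds until a leaf is reached."""
    path = []
    while "label" not in node:
        value = record[node["attribute"]]
        matches = [child for predicate, child in node["edges"] if predicate(value)]
        # Mutually exclusive, exhaustive predicates mean exactly one edge matches,
        # which is what gives each record a unique root-to-leaf path.
        assert len(matches) == 1, "splitting predicates must partition the values"
        path.append(node["attribute"])
        node = matches[0]
    return node["label"], path
```

Calling `classify(tree, {"claim_amount": 5000, "prior_claims": 2})` walks both internal nodes and ends at the leaf labelled `"investigate"`.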

16
Q

To prevent overfitting in decision trees, a test data set is used to ___ a decision tree once it has been built using the training data set.

A

prune

17
Q

In decision trees, this is an iterative process of splitting the training data into partitions (regions of record space)

A

Recursive partitioning

18
Q

The most important task in building a decision tree is to decide which of the ___ gives the best split

A

attributes
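
One way to picture "best split" is to score each candidate attribute by how much it reduces class diversity. The sketch below assumes the Gini index as the diversity measure, which is a common choice but not one the deck names; the function names and the toy records are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini diversity: 0.0 for a pure group, larger for more mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def diversity_decrease(records, attribute, target):
    """Parent diversity minus the size-weighted diversity of the partitions."""
    parent = gini([r[target] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[target])
    weighted = sum(len(g) / len(records) * gini(g) for g in groups.values())
    return parent - weighted


def best_split(records, attributes, target):
    """Pick the attribute whose split decreases diversity the most."""
    return max(attributes, key=lambda a: diversity_decrease(records, a, target))


# Toy data, invented for illustration only.
records = [
    {"vehicle": "sports", "region": "north", "fraud": "yes"},
    {"vehicle": "sports", "region": "south", "fraud": "yes"},
    {"vehicle": "sedan", "region": "north", "fraud": "no"},
    {"vehicle": "sedan", "region": "south", "fraud": "no"},
]
```

On this toy data, splitting on `vehicle` yields two pure partitions while splitting on `region` leaves both partitions as mixed as the parent, so `best_split` selects `vehicle`.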

19
Q

In decision trees, a node becomes a ___ node when no split can be found that significantly decreases the diversity

A

leaf

20
Q

In decision trees, pruning is done by removing leaves and branches (edges leading to leaves) that fail to ___

A

generalize

21
Q

This aims to discover structure in a complex data set as a whole in order to carve it up into simpler groups

A

Automatic Cluster Detection

22
Q

Type of clustering that is available in a wide variety of commercial data mining tools. It divides the data set into a predetermined number, k, of clusters

A

K-Means Clustering

23
Q

In K-Means Clustering, in the first step, k data points are selected to be the seeds. Each seed is an ___ cluster with only one element

24
Q

The ___ of a cluster of records is calculated by taking the average of each field for all the records in that cluster

A

centroid
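
The field-by-field averaging on the card above gives the cluster's centroid. A minimal sketch, assuming each record is a sequence of numeric fields (the function name is hypothetical):

```python
def centroid(records):
    """Average each field across all records in the cluster."""
    n = len(records)
    # zip(*records) groups the values of each field together.
    return [sum(field) / n for field in zip(*records)]
```

For the two records `[1.0, 2.0]` and `[3.0, 6.0]`, the centroid is `[2.0, 4.0]`.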

25
Q

___ distance is the measure most commonly used for measuring distance by data mining software.

A

Euclidean
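
Euclidean distance between two records is the square root of the summed squared differences of their fields. A minimal sketch, assuming numeric fields of equal length (the function name is hypothetical):

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two records with numeric fields."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For example, the distance between `(0, 0)` and `(3, 4)` is `5.0`.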

26
Q

In the k-means method, the original choice of the value of k determines the number of ___ that will be found

A

clusters
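
The k-means cards above can be sketched end to end: pick k seeds, assign each record to its nearest seed, recompute each cluster's average, and repeat until assignments stop changing. This is an illustrative sketch, not any particular tool's implementation. It assumes the first k points are used as seeds (real tools often choose seeds differently), and the names `k_means` and `max_iter` are hypothetical.

```python
import math


def k_means(points, k, max_iter=100):
    """Return (centroids, assignment) for a fixed, predetermined k."""
    # Assumption: the first k points act as seeds, each a one-element cluster.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to the nearest centroid by Euclidean distance.
        new_assignment = []
        for p in points:
            distances = [
                math.sqrt(sum((x - c) ** 2 for x, c in zip(p, centroid)))
                for centroid in centroids
            ]
            new_assignment.append(distances.index(min(distances)))
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        # Recompute each centroid as the field-wise average of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(f) / len(members) for f in zip(*members)]
    return centroids, assignment
```

Because k is fixed up front, the function always reports k centroids, whether or not the data naturally contains that many groups.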

27
Q

___ is claimed to be often more effective than k-means for complex-shaped clusters

28
Q

Four ways of utilizing data mining expertise in business:
1. By purchasing ready-made ___ from outside vendors.
2. By purchasing software that embodies data mining expertise designed for a particular application
3. By hiring outside consultants to perform data mining for special projects
4. By developing own data mining skills within the business organization

29
Q

___ automates the process of creating candidate models and selecting the ones that perform best

A

Model building software

30
Q

Outside expertise for data mining is likely to be available in three possible places:
1. From a data mining software vendor
2. Data mining centers
3. ___ companies

A

Consulting