Ch. 4 Flashcards

1
Q

Which of the following reasons is responsible for the increase in the use of data mining techniques in business?

A

Ability to electronically warehouse data

2
Q

Observation refers to

A

Set of recorded values of variables associated with a single entity

3
Q

Category of data mining techniques that detect patterns and relationships in the data

A

Descriptive data mining

4
Q

Data mining method that can be used in market segmentation to divide consumers into different homogeneous groups is

A

Cluster analysis

5
Q

Which is true of bottom-up hierarchical clustering?

A

Starts with each observation in its own cluster, then iteratively combines the two most similar clusters
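
The merge loop described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the observations are one-dimensional, similarity is measured with single linkage, and the function name `agglomerate` and the sample values are hypothetical.

```python
def agglomerate(points, target_clusters):
    """Bottom-up clustering of 1-D points (single linkage is an assumed choice)."""
    clusters = [[p] for p in points]  # each observation starts in its own cluster
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the two closest members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the two most similar clusters
        del clusters[j]
    return clusters

# Hypothetical sample data
result = agglomerate([1.0, 1.5, 5.0, 5.2, 9.0], 3)
print(result)  # [[1.0, 1.5], [5.0, 5.2], [9.0]]
```

Stopping at a target number of clusters is one possible convention; running the loop down to a single cluster would produce the full nested sequence.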

6
Q

k-means clustering is the process of

A

Organizing observations into one of k groups based on a measure of similarity

7
Q

Simplest measure of similarity between observations consisting solely of categorical variables is given by

A

Matching coefficient

8
Q

Jaccard's coefficient differs from the matching coefficient in that the former

A

Doesn’t count matching zero entries, while the latter does
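
A small worked comparison may help here. The sketch below assumes each observation is a list of 0/1 values of equal length; the function names and sample vectors are hypothetical.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

def jaccard_coefficient(u, v):
    """Like the matching coefficient, but 0-0 agreements are left out entirely."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    # Assumes at least one variable is not a 0-0 match (otherwise division by zero)
    return both_one / (len(u) - both_zero)

# Hypothetical observations
u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]
print(matching_coefficient(u, v))  # 0.8 (four of five variables agree)
print(jaccard_coefficient(u, v))   # 0.5 (one 1-1 match out of two non-0-0 variables)
```

The three shared zeros raise the matching coefficient but have no effect on Jaccard's coefficient.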

9
Q

Single linkage measures dissimilarity between two clusters by considering

A

Only the two closest observations in these clusters

10
Q

Measures dissimilarity between two clusters by considering only the two most distant observations in clusters

A

Complete linkage
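
Single and complete linkage differ only in which pair of observations they look at, as the following sketch suggests. It assumes Euclidean distance between points (via `math.dist`, available in Python 3.8+); the cluster contents are hypothetical.

```python
from itertools import product
from math import dist

def single_linkage(c1, c2):
    """Distance between the two closest observations, one from each cluster."""
    return min(dist(a, b) for a, b in product(c1, c2))

def complete_linkage(c1, c2):
    """Distance between the two most distant observations, one from each cluster."""
    return max(dist(a, b) for a, b in product(c1, c2))

# Hypothetical clusters of 2-D observations
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]
print(single_linkage(cluster_a, cluster_b))    # 3.0
print(complete_linkage(cluster_a, cluster_b))  # 6.0
```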

11
Q

Avg group linkage measures dissimilarity between two clusters by considering

A

Avg distance over all pairs of observations between clusters

12
Q

Measures dissimilarity between two clusters by using the distance between cluster centroids

A

Centroid linkage

13
Q

The vector of the avgs computed for each variable across all cluster observations

A

Centroid
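
As a quick illustration of this definition, the sketch below averages each variable across the observations in a cluster. The function name and the sample cluster are hypothetical.

```python
def centroid(cluster):
    """Vector of per-variable averages across all observations in the cluster."""
    n = len(cluster)
    # zip(*cluster) groups the values variable by variable
    return tuple(sum(values) / n for values in zip(*cluster))

# Hypothetical cluster of three observations with two variables each
cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 9.0)]
center = centroid(cluster)
print(center)  # (3.0, 5.0)
```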

14
Q

Tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as

A

Dendrogram

15
Q

If the Euclidean distance were represented in a right triangle, which of the following would be the distance between two observations?

A

Hypotenuse
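
For two observations with two variables each, the differences along each variable form the legs of the right triangle and the Euclidean distance is the hypotenuse. A minimal check, using hypothetical values:

```python
from math import hypot, sqrt

# Hypothetical observations with two variables each
u = (1.0, 2.0)
v = (4.0, 6.0)

dx = v[0] - u[0]  # horizontal leg of the triangle
dy = v[1] - u[1]  # vertical leg of the triangle

distance = sqrt(dx ** 2 + dy ** 2)  # Pythagorean theorem
print(distance)       # 5.0
print(hypot(dx, dy))  # 5.0, same result from the standard library helper
```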

16
Q

Endpoint of k means clustering algorithm occurs when

A

No further changes are observed in the cluster structure and number
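
The stopping condition can be seen in a bare-bones k-means loop. The sketch below is an illustration only: it assumes Euclidean distance, caller-supplied starting centroids, and a `max_iter` safety cap, none of which come from the card; the function name and data are hypothetical.

```python
from math import dist

def k_means(points, initial_centroids, max_iter=100):
    centroids = list(initial_centroids)  # copy so the caller's list is untouched
    assignments = None
    for _ in range(max_iter):
        # Assign each observation to the cluster with the nearest centroid
        new_assignments = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # endpoint: cluster membership no longer changes
        assignments = new_assignments
        # Recompute each centroid as the average of its members
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids

# Hypothetical data and starting centroids
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
labels, centers = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [(1.0, 1.5), (8.5, 8.0)]
```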

17
Q

Analysis of items frequently co-occurring in transactions, such as purchases, is known as

A

Market basket analysis

18
Q

Refers to the number of times that a collection of items occurs together in transaction data

A

Support count

19
Q

In the theory of association rules in data mining, by confidence we mean an estimated probability that

A

The consequent occurs given that the antecedent occurs
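
Confidence can be estimated from support counts: the support count of the antecedent and consequent together, divided by the support count of the antecedent alone. The sketch below uses made-up transactions and hypothetical function names purely for illustration.

```python
# Hypothetical transaction data; each transaction is a set of items
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "milk"},
    {"bread", "jelly"},
    {"milk", "eggs"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)

print(support_count({"bread", "milk"}, transactions))  # 2
print(confidence({"bread"}, {"milk"}, transactions))   # 2 of the 3 bread transactions
```

The function assumes the antecedent appears in at least one transaction; otherwise the division is undefined.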