Ch. 4 Flashcards
Which of the following reasons is responsible for the increase in the use of data mining techniques in business
Ability to electronically warehouse data
Observation refers to
Set of recorded values of variables associated with a single entity
Category of data mining techniques that detect patterns and relationships in the data
Descriptive data mining
Data mining method that can be used in market segmentation to divide consumers into different homogeneous groups is
Cluster analysis
Which is true of bottom-up hierarchical clustering
Starts with each observation in its own cluster, then iteratively combines the two most similar clusters
K-means clustering is the process of
Organizing observations into one of k distinct groups based on a measure of similarity (nested groups are what hierarchical clustering produces)
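A minimal sketch of k-means, assuming Euclidean distance and starting centroids supplied by the caller. The function name k_means and the sample points are hypothetical and only for illustration. Each pass assigns every observation to its nearest centroid, then recomputes each centroid as the average of its members, stopping once the centroids no longer move.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, initial_centroids, max_iterations=100):
    """Partition points into len(initial_centroids) non-overlapping clusters."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each observation joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the average of its members.
        # An empty cluster keeps its previous centroid (a simplifying assumption).
        new_centroids = [
            tuple(sum(values) / len(members) for values in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical observations with two numeric variables each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, initial_centroids=[points[0], points[3]])
```

Every observation ends up in exactly one of the k groups, and the result can depend on the starting centroids.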
Simplest measure of similarity between observations consisting solely of categorical variables is given by
Matching coefficient
Jaccard's coefficient is different from matching coefficient in that the former
Doesn’t count matching zero entries while the latter does
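A small sketch of the two coefficients, assuming each observation is a list of 0/1 values. The function names and the two sample shoppers are hypothetical. The only difference is whether 0-0 agreements are counted.

```python
def matching_coefficient(u, v):
    """Share of variables on which two observations agree (1-1 and 0-0 both count)."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)


def jaccard_coefficient(u, v):
    """Same idea, but 0-0 agreements are left out of numerator and denominator."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    denominator = len(u) - both_zero
    # If every variable is a 0-0 match there is nothing left to compare;
    # returning 0.0 here is an assumption of this sketch.
    return both_one / denominator if denominator else 0.0


# Hypothetical binary observations (1 = bought the item, 0 = did not).
shopper_a = [1, 0, 0, 1, 0]
shopper_b = [1, 0, 0, 0, 0]

print(matching_coefficient(shopper_a, shopper_b))  # 4 agreements out of 5 variables -> 0.8
print(jaccard_coefficient(shopper_a, shopper_b))   # 1 shared purchase out of 2 non-zero-zero variables -> 0.5
```

The three items neither shopper bought raise the matching coefficient but are ignored by Jaccard's coefficient.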
Single linkage measures dissimilarity between two clusters by considering
Only the two closest observations in these clusters
Measures dissimilarity between two clusters by considering only the two most distant observations in clusters
Complete linkage
Group avg linkage measures dissimilarity between two clusters by considering
Avg distance over all pairs of observations between clusters
Measures dissimilarity between two clusters by using the distance between cluster centroids
Centroid linkage
The vector of the avgs computed for each variable across all cluster observations
Centroid
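The four linkage measures can be compared side by side. This is a sketch assuming Euclidean distance and numeric observations stored as tuples; the function names and the two sample clusters are hypothetical.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def centroid(cluster):
    """Vector of per-variable averages across all observations in the cluster."""
    n = len(cluster)
    return tuple(sum(values) / n for values in zip(*cluster))


def pair_distances(a, b):
    """Distances for every pair with one observation from each cluster."""
    return [dist(p, q) for p, q in product(a, b)]


def single_linkage(a, b):
    return min(pair_distances(a, b))   # two closest observations


def complete_linkage(a, b):
    return max(pair_distances(a, b))   # two most distant observations


def group_average_linkage(a, b):
    d = pair_distances(a, b)
    return sum(d) / len(d)             # average over all cross-cluster pairs


def centroid_linkage(a, b):
    return dist(centroid(a), centroid(b))  # distance between cluster centroids


# Hypothetical clusters with two numeric variables per observation.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (3.0, 4.0)]

print(single_linkage(cluster_a, cluster_b))    # 3.0
print(complete_linkage(cluster_a, cluster_b))  # 5.0
print(group_average_linkage(cluster_a, cluster_b))
print(centroid_linkage(cluster_a, cluster_b))
```

For the same two clusters the four measures generally give different values, which is why the choice of linkage can change which clusters get merged.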
Tree diagram used to illustrate sequence of nested clusters produced by hierarchical clustering is known as
Dendrogram
If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations
Hypotenuse
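A quick check of the right-triangle picture, assuming two observations with two numeric variables each. The sample values and the function name are hypothetical. The differences on each variable are the legs, and the Euclidean distance is the hypotenuse.

```python
from math import hypot, sqrt


def euclidean_distance(u, v):
    """Square root of the summed squared differences across variables."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical observations.
u = (1.0, 2.0)
v = (4.0, 6.0)

leg_x = abs(v[0] - u[0])  # horizontal leg: 3.0
leg_y = abs(v[1] - u[1])  # vertical leg: 4.0

print(euclidean_distance(u, v))  # 5.0
print(hypot(leg_x, leg_y))       # 5.0, the hypotenuse of the 3-4-5 triangle
```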