Descriptive Data Mining Flashcards
Antecedent
The item set corresponding to the "if" portion of an if-then association rule.
Association rule
An if-then statement describing the relationship between item sets.
Centroid linkage
Measure of dissimilarity between clusters defined as the distance between the cluster centroids, where each centroid is the average of the observations in its cluster.
Complete linkage
Measure of dissimilarity between clusters defined as the distance between the two most dissimilar observations, one from each cluster.
Confidence
The conditional probability that the consequent of an association rule occurs given the antecedent occurs.
Consequent
The item set corresponding to the "then" portion of an if-then association rule.
Dendrogram
A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering.
Dimension reduction
Process of reducing the number of variables to consider in a data-mining approach.
Euclidean distance
Geometric measure of dissimilarity between observations based on the Pythagorean theorem.
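Illustration: a minimal Python sketch of the definition above, assuming each observation is a tuple of numeric variables measured on comparable scales. The function name is hypothetical.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two observations with the same variables."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    # Pythagorean theorem generalized to any number of variables:
    # square the difference in each variable, sum, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

print(euclidean_distance((1, 2), (4, 6)))  # 5.0, the 3-4-5 right triangle
```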
Group average linkage
Measure of dissimilarity between clusters defined as the average distance over every pair of observations in which one observation comes from each cluster.
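Illustration: a rough Python sketch comparing the complete, group average, and centroid linkage measures defined in these cards. It assumes Euclidean distance as the underlying dissimilarity and that group average linkage averages the pairwise distances. The function names and the two small clusters are hypothetical.

```python
import math
from itertools import product

def complete_linkage(a, b):
    # Distance between the two most dissimilar observations, one per cluster.
    return max(math.dist(p, q) for p, q in product(a, b))

def group_average_linkage(a, b):
    # Average distance over every cross-cluster pair of observations.
    distances = [math.dist(p, q) for p, q in product(a, b)]
    return sum(distances) / len(distances)

def centroid(cluster):
    # Variable-by-variable average of the observations in one cluster.
    return tuple(sum(column) / len(cluster) for column in zip(*cluster))

def centroid_linkage(a, b):
    # Distance between the two cluster centroids.
    return math.dist(centroid(a), centroid(b))

cluster_a = [(0, 0), (0, 2)]
cluster_b = [(3, 0), (3, 2)]
print(complete_linkage(cluster_a, cluster_b))       # sqrt(13), about 3.61
print(group_average_linkage(cluster_a, cluster_b))  # about 3.30
print(centroid_linkage(cluster_a, cluster_b))       # 3.0
```

On the same pair of clusters the three measures give different values, which is why the choice of linkage can change which clusters are merged first.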
Hierarchical clustering
Process of agglomerating observations into a series of nested groups based on a measure of similarity.
Jaccard’s coefficient
Measure of similarity between observations consisting solely of binary categorical variables that considers only matches of nonzero entries.
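Illustration: a minimal Python sketch of the definition above, assuming each observation is a sequence of 0/1 values. Returning 0.0 when neither observation has a nonzero entry is an assumption made here for convenience; the coefficient is otherwise undefined in that case. The function name is hypothetical.

```python
def jaccard_coefficient(u, v):
    """Share of variables where both are 1, among variables where at least one is 1."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    both_nonzero = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    either_nonzero = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    if either_nonzero == 0:
        return 0.0  # assumption: treat two all-zero observations as having no similarity
    # Matching zeros are ignored: they appear in neither count.
    return both_nonzero / either_nonzero

print(jaccard_coefficient([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))  # 0.5
```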
k-means clustering
Process of organizing observations into one of k groups based on a measure of similarity.
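Illustration: a rough Python sketch of one common way to carry out k-means (repeatedly assign each observation to its nearest centroid, then recompute the centroids). It assumes Euclidean distance and caller-supplied starting centroids; the function name, parameters, and data are hypothetical, and this is not presented as the only or definitive procedure.

```python
import math

def k_means(points, initial_centroids, max_iter=100):
    """Return (centroids, clusters) for k = len(initial_centroids)."""
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: each observation joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the average of its members.
        # An empty cluster keeps its previous centroid (an assumption of this sketch).
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments will not change
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, initial_centroids=[(1, 1), (8, 8)])
print(clusters[0])  # [(1, 1), (1, 2), (2, 1)]
print(clusters[1])  # [(8, 8), (9, 8), (8, 9)]
```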
Lift ratio
The ratio of the confidence of an association rule to the benchmark confidence.
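Illustration: a minimal Python sketch of confidence and lift ratio for a single if-then rule. It assumes the benchmark confidence is the share of all transactions that contain the consequent, which these cards do not spell out. The function names and the five sample transactions are hypothetical.

```python
def count_containing(transactions, items):
    # Number of transactions that contain every item in the item set.
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))

def confidence(transactions, antecedent, consequent):
    # P(consequent | antecedent), estimated from transaction counts.
    both = count_containing(transactions, set(antecedent) | set(consequent))
    return both / count_containing(transactions, antecedent)

def lift_ratio(transactions, antecedent, consequent):
    # Assumed benchmark confidence: share of all transactions containing the consequent.
    benchmark = count_containing(transactions, consequent) / len(transactions)
    return confidence(transactions, antecedent, consequent) / benchmark

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
# Rule: if {bread} then {milk}
print(confidence(transactions, {"bread"}, {"milk"}))  # 0.75 (3 of the 4 bread transactions)
print(lift_ratio(transactions, {"bread"}, {"milk"}))  # below 1, since milk is in 4 of 5 transactions anyway
```

A lift ratio above 1 suggests the antecedent makes the consequent more likely than it is overall; here it falls slightly below 1.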
Market basket analysis
Analysis of items frequently co-occurring in transactions (such as purchases).