Descriptive Data Mining Flashcards by Chris Huskey

Antecedent

The item set corresponding to the "if" portion of an if-then association rule.

Association rule

An if-then statement describing the relationship between item sets.

Centroid linkage

Uses the averaging concept of cluster centroids to define between-cluster similarity.

Complete linkage

Measure of dissimilarity between clusters based only on the two most dissimilar observations, one from each cluster.

Confidence

The conditional probability that the consequent of an association rule occurs given the antecedent occurs.

Consequent

The item set corresponding to the "then" portion of an if-then association rule.

Dendrogram

A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering.

Dimension reduction

Process of reducing the number of variables to consider in a data-mining approach.

Euclidean distance

Geometric measure of dissimilarity between observations based on the Pythagorean theorem.
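
As a minimal sketch, assuming each observation is a plain list of numeric values (the function name `euclidean_distance` is illustrative only), the distance is the square root of the summed squared differences:

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length numeric observations."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# A 3-4-5 right triangle: the differences are 3 and 4, so the distance is 5.
print(euclidean_distance([1, 2], [4, 6]))  # prints 5.0
```

In practice, variables are often standardized first so that no single variable dominates the distance because of its units.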

Group average linkage

Measure of dissimilarity between clusters computed as the average distance over every pair of observations, one from each cluster.

Hierarchical clustering

Process of agglomerating observations into a series of nested groups based on a measure of similarity.

Jaccard’s coefficient

Measure of similarity between observations consisting solely of binary categorical variables that considers only matches of nonzero entries.
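
A short sketch of the calculation, assuming observations are equal-length lists of 0/1 values. The function name and the handling of two all-zero observations are assumptions made for illustration:

```python
def jaccard_coefficient(u, v):
    """Jaccard similarity for two equal-length 0/1 observations."""
    both_ones = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    either_one = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    # Assumed convention: two all-zero observations get similarity 0.0.
    return both_ones / either_one if either_one else 0.0

a = [1, 0, 1, 0, 0]
b = [1, 1, 0, 0, 0]
# One position is 1 in both; three positions are 1 in at least one.
print(jaccard_coefficient(a, b))  # 1/3, roughly 0.33
```

The two positions where both observations are 0 are ignored entirely, which is why this measure suits sparse data such as purchase indicators.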

k-means clustering

Process of organizing observations into one of k groups based on a measure of similarity.
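
The card does not name a specific algorithm. The sketch below assumes the common Lloyd-style procedure with Euclidean distance: assign each observation to its nearest centroid, recompute the centroids, and repeat until assignments stop changing. The function name, parameters, and sample points are all illustrative.

```python
import math

def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid-update steps until nothing changes."""
    centroids = [tuple(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the procedure has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = k_means(points, initial_centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centroids, so real analyses usually run the procedure from several starting points and keep the best result.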

Lift ratio

The ratio of the confidence of an association rule to the benchmark confidence.
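
To make the ratio concrete, here is a hedged sketch that takes the benchmark confidence to be the share of all transactions containing the consequent. The helper names and the toy transactions are invented for illustration, and the code assumes the antecedent and consequent each appear in at least one transaction.

```python
def support_count(transactions, items):
    """Number of transactions containing every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))

def confidence(transactions, antecedent, consequent):
    """Share of antecedent transactions that also contain the consequent."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    return both / support_count(transactions, antecedent)

def lift_ratio(transactions, antecedent, consequent):
    """Rule confidence divided by the benchmark confidence."""
    benchmark = support_count(transactions, consequent) / len(transactions)
    return confidence(transactions, antecedent, consequent) / benchmark

transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "milk"},
    {"bread", "jelly"},
    {"milk"},
    {"bread", "jelly", "peanut butter"},
]
# Rule: if bread, then jelly. Confidence is 3/4; benchmark is 3/5.
print(confidence(transactions, {"bread"}, {"jelly"}))  # prints 0.75
print(lift_ratio(transactions, {"bread"}, {"jelly"}))  # roughly 1.25
```

A lift ratio above 1 suggests the antecedent makes the consequent more likely than it is in the data overall.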

Market basket analysis

Analysis of items frequently co-occurring in transactions (such as purchases).

Matching coefficient

Measure of similarity between observations based on the number of matching values of categorical variables.
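
A minimal sketch, assuming the coefficient is the number of matching variables divided by the total number of variables (the function name is illustrative):

```python
def matching_coefficient(u, v):
    """Share of variables on which two equal-length observations agree."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

a = [1, 0, 1, 0, 0]
b = [1, 1, 0, 0, 0]
# The observations agree in positions 1, 4, and 5 (counting from 1).
print(matching_coefficient(a, b))  # prints 0.6
```

Unlike Jaccard's coefficient, shared zeros count as matches here, so two observations can look similar simply because neither has a given attribute.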

McQuitty’s method

Measure that computes the dissimilarity between a cluster AB (formed by merging clusters A and B) and a cluster C by averaging the distance between A and C and the distance between B and C.
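
The update described above can be sketched as a one-line function. The name `mcquitty_update` and the sample distances are made up for illustration:

```python
def mcquitty_update(d_ac, d_bc):
    """Dissimilarity between merged cluster AB and cluster C.

    A simple average of the two prior dissimilarities; the sizes of
    clusters A and B play no role in the calculation.
    """
    return (d_ac + d_bc) / 2

# Suppose d(A, C) = 4 and d(B, C) = 10 before A and B are merged.
print(mcquitty_update(4, 10))  # prints 7.0
```

Because cluster sizes are ignored, a one-observation cluster A and a hundred-observation cluster B contribute equally to the merged dissimilarity.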

Median linkage

Method that computes the similarity between two clusters as the median of the similarities between each pair of observations in the two clusters.

Missing at random

The case when data for a variable is missing due to a relationship between other variables.

Missing completely at random

The case when data for a variable is missing purely due to random chance.

Missing not at random

The case when data for a variable is missing due to its unrecorded value.

Observation

A set of observed values of variables associated with a single entity, often displayed as a row in a spreadsheet or database.

Single linkage

Measure of dissimilarity between clusters based only on the two most similar observations, one from each cluster.
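
For comparison, single, complete, and group average linkage can all be sketched from the same list of between-cluster pairwise distances. This assumes Euclidean distance and clusters stored as lists of coordinate tuples; the function names are illustrative.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Distances for every pair with one observation from each cluster."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    """Distance between the two most similar (closest) observations."""
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    """Distance between the two most dissimilar (farthest) observations."""
    return max(pairwise_distances(cluster_a, cluster_b))

def group_average_linkage(cluster_a, cluster_b):
    """Average distance over all between-cluster pairs."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

cluster_a = [(0, 0), (0, 1)]
cluster_b = [(3, 0), (4, 0)]
print(single_linkage(cluster_a, cluster_b))  # prints 3.0
print(complete_linkage(cluster_a, cluster_b))
print(group_average_linkage(cluster_a, cluster_b))
```

For any pair of clusters, the single linkage value is the smallest of the three and the complete linkage value is the largest, with the group average in between.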

Support count

The number of times that a collection of items occurs together in a transaction data set.
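
As an illustration, the sketch below tallies the support count of every pair of items in a toy transaction set. The function name and the data are invented, and counting only pairs is a simplification: the same idea applies to item sets of any size.

```python
from collections import Counter
from itertools import combinations

def pair_support_counts(transactions):
    """Count how many transactions contain each pair of items."""
    counts = Counter()
    for t in transactions:
        # Sort so that each pair always appears in the same order.
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return counts

transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "milk"},
    {"bread", "jelly"},
    {"milk"},
    {"bread", "jelly", "peanut butter"},
]
counts = pair_support_counts(transactions)
print(counts[("bread", "jelly")])  # prints 3
print(counts[("bread", "milk")])   # prints 2
```

Item sets whose support count falls below a chosen minimum are typically dropped before association rules are generated.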

Unsupervised learning

Category of data-mining techniques in which an algorithm explains relationships without an outcome variable to guide the process.

Ward's method

Procedure that partitions observations so as to obtain clusters with the least amount of information loss due to the aggregation.
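
The card does not say how information loss is measured. A common choice, assumed in this sketch, is the error sum of squares (ESS): the sum of squared deviations of each observation from its cluster centroid. The merge that increases total ESS the least is preferred. Function names and data are illustrative.

```python
def error_sum_of_squares(cluster):
    """Sum of squared deviations of each observation from the cluster centroid."""
    centroid = [sum(col) / len(cluster) for col in zip(*cluster)]
    return sum((x - c) ** 2 for p in cluster for x, c in zip(p, centroid))

def merge_cost(cluster_a, cluster_b):
    """Increase in total ESS caused by merging two clusters (lists of tuples)."""
    merged = cluster_a + cluster_b
    return (error_sum_of_squares(merged)
            - error_sum_of_squares(cluster_a)
            - error_sum_of_squares(cluster_b))

a = [(1.0, 1.0), (2.0, 1.0)]
b = [(1.0, 2.0)]
c = [(8.0, 8.0)]
# Merging a with the nearby cluster b loses far less information
# than merging a with the distant cluster c.
print(merge_cost(a, b) < merge_cost(a, c))  # prints True
```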

Descriptive Data Mining Flashcards

(26 cards)