Module 2B Notes Flashcards by Alex Young

What is the goal of clustering?

To segment observations into similar groups based on observed variables.

What type of machine learning is clustering classified as?

Unsupervised machine learning.

What is a common application of cluster analysis in marketing?

Market segmentation.

What is a challenge when clustering observations based on continuous variables?

Determining how many clusters there are and where their boundaries lie.

What is bottom-up hierarchical clustering?

A method that starts with each observation in its own cluster and merges similar clusters.

What is K-means clustering?

A method that assigns each observation to one of k clusters, with k chosen in advance, based on similarity.

What is the most common method to measure dissimilarity between observations with continuous numeric variables?

Euclidean Distance.

What does Euclidean Distance measure?

The straight-line distance (dissimilarity) between observations; it becomes smaller as observations become more similar.
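
A minimal sketch of the calculation, assuming each observation is a tuple of numeric variables. The function name and the sample values are illustrative only and do not come from the cards.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two observations with the same numeric variables."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    # Square each per-variable difference, sum them, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical observations: (age in years, number of children)
print(euclidean_distance((30, 2), (33, 6)))  # 5.0
print(euclidean_distance((30, 2), (30, 2)))  # 0.0 (identical observations)
```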

What is the effect of scale on Euclidean Distance?

It is highly influenced by the scale on which variables are measured.

What is a Z-score?

A measure of how many standard deviations a raw value lies above or below the mean of its variable. Converting each variable to z-scores puts variables measured on different scales onto a common one.
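
A short sketch of standardization, assuming the sample standard deviation (n - 1 denominator) is used; some courses and tools use the population version instead. The variable names and values are hypothetical.

```python
import statistics

def z_scores(values):
    """Standardize one variable: subtract its mean, divide by its standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return [(x - mean) / sd for x in values]

incomes = [40000, 50000, 60000]  # hypothetical, in dollars
ages = [25, 35, 45]              # hypothetical, in years

# On the raw scale, income differences would swamp age differences.
# After standardizing, both variables are on the same scale.
print(z_scores(incomes))  # each list is [-1.0, 0.0, 1.0]
print(z_scores(ages))
```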

What is Manhattan distance?

A dissimilarity measure that sums the absolute differences between two observations across their variables; it is more robust to outliers than Euclidean distance.
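
A minimal sketch, using the same hypothetical observation layout as a tuple of numeric variables. Because the differences are not squared, a single very large difference carries less weight than it would under Euclidean distance.

```python
def manhattan_distance(u, v):
    """Sum of absolute per-variable differences between two observations."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    return sum(abs(a - b) for a, b in zip(u, v))

# Hypothetical observations: (age in years, number of children)
print(manhattan_distance((30, 2), (33, 6)))  # 7, i.e. |30 - 33| + |2 - 6|
```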

What is the matching coefficient?

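A similarity measure for binary (0/1) categorical variables: the proportion of variables on which two observations have matching values. Two observations that both have a 0 entry count as a match, just as two 1 entries do.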

What limitation does the matching coefficient have?

‘0’ values can have many meanings and do not necessarily indicate similarity.

What is Jaccard’s coefficient?

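A similarity measure for binary (0/1) categorical variables that does not count matching zero entries: the number of variables where both observations are 1, divided by the number where at least one is 1.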
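
A side-by-side sketch of the matching coefficient and Jaccard's coefficient, assuming observations are equal-length lists of 0/1 values. The function names and sample data are illustrative.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (0-0 and 1-1 both count)."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

def jaccard_coefficient(u, v):
    """Share of 1-1 matches among variables where at least one observation is 1."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    either_one = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    if either_one == 0:
        raise ValueError("undefined when both observations are all zeros")
    return both_one / either_one

# Hypothetical purchase indicators for five products
u = [1, 0, 0, 0, 1]
v = [1, 0, 0, 0, 0]
print(matching_coefficient(u, v))  # 0.8, inflated by three shared zeros
print(jaccard_coefficient(u, v))   # 0.5, shared zeros ignored
```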

What does hierarchical clustering determine?

The similarity of two clusters by considering the similarity between the observations composing each cluster.

What are some linkage methods for comparing clusters in hierarchical clustering?

Single linkage
Complete linkage
Group average linkage
Median linkage
Centroid linkage
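
A rough sketch of how four of the listed linkage methods score the dissimilarity between two clusters, assuming Euclidean distance between observations. Median linkage is left out of the sketch. The function names and sample clusters are illustrative.

```python
import math
from itertools import product

def dist(u, v):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def centroid(cluster):
    """Per-variable mean of the observations in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def linkage_distances(c1, c2):
    # All distances between one observation from c1 and one from c2
    pairwise = [dist(u, v) for u, v in product(c1, c2)]
    return {
        "single": min(pairwise),                        # closest pair
        "complete": max(pairwise),                      # farthest pair
        "group_average": sum(pairwise) / len(pairwise), # mean over all pairs
        "centroid": dist(centroid(c1), centroid(c2)),   # distance between centroids
    }

c1 = [(0, 0), (1, 0)]
c2 = [(3, 0), (5, 0)]
print(linkage_distances(c1, c2))
# {'single': 2.0, 'complete': 5.0, 'group_average': 3.5, 'centroid': 3.5}
```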

What are the steps in K-means clustering?

Initialization
Update Step
Assignment Step
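
The three steps can be sketched as follows, in the order listed on the card. This is an illustration under stated assumptions, not a definitive implementation: initialization here deals observations into k starting clusters round-robin (real tools usually randomize), distance is Euclidean, and the loop stops when assignments no longer change.

```python
import math

def dist(u, v):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def k_means(points, k, max_iter=100):
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of observations")
    # Initialization (assumed): deal observations into k clusters round-robin.
    labels = [i % k for i in range(len(points))]
    centroids = [None] * k
    for _ in range(max_iter):
        # Update step: each centroid becomes the mean of its cluster's observations.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        # Assignment step: move each observation to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the clusters have converged
        labels = new_labels
    return labels, centroids

points = [(1, 1), (1, 2), (8, 8), (9, 8)]  # hypothetical observations
labels, centroids = k_means(points, k=2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(1.0, 1.5), (8.5, 8.0)]
```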

What is an association rule?

An if-then statement that conveys the likelihood of certain items being purchased together.

What is the antecedent in an association rule?

The collection of items corresponding to the if portion of the rule.

What is the consequent in an association rule?

The item set corresponding to the then portion of the rule.
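
One common way to put a number on the "likelihood" an association rule conveys is confidence: among transactions that contain the antecedent, the share that also contain the consequent. The cards do not name this measure, so treat the sketch below as an assumption; the function name and the transactions are made up for illustration.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Share of transactions containing the antecedent that also contain the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        raise ValueError("no transaction contains the antecedent")
    with_both = [t for t in with_antecedent if consequent <= set(t)]
    return len(with_both) / len(with_antecedent)

# Hypothetical shopping baskets
transactions = [
    {"doll", "candy bar"},
    {"doll", "candy bar", "batteries"},
    {"doll"},
    {"candy bar"},
]
# Rule: if {doll} then {candy bar}; 2 of the 3 baskets with a doll also have a candy bar
print(rule_confidence(transactions, {"doll"}, {"candy bar"}))
```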

What is a key factor in evaluating an association rule?

How actionable it is and how well it explains the relationship between item sets.

What is an example of an association rule from Walmart’s data mining?

‘If a customer purchases a Barbie doll, then a customer also purchases a candy bar.’

What are some algorithms used to generate association rules?

FP-growth algorithm
Apriori algorithm
Eclat algorithm

What do the algorithms for generating association rules differ in?

Their efficiency in large data sets.

Does KNIME disclose the algorithm used by the program?

No, KNIME does not disclose the algorithm.

(25 cards)