Module 2B Notes Flashcards

1
Q

What is the goal of clustering?

A

To segment observations into similar groups based on observed variables.

2
Q

What type of machine learning is clustering classified as?

A

Unsupervised machine learning.

3
Q

What is a common application of cluster analysis in marketing?

A

Market segmentation.

4
Q

What is a challenge when clustering observations based on continuous variables?

A

Determining how many clusters there are and where the boundaries between them lie.

5
Q

What is bottom-up hierarchical clustering?

A

A method that starts with each observation in its own cluster and merges similar clusters.
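
The merging process can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function name `agglomerate`, the one-dimensional data, and the choice to measure cluster similarity by the closest pair of members are all assumptions made for the example.

```python
def agglomerate(points, target_clusters):
    """Bottom-up clustering of 1-D values (illustrative sketch)."""
    # Start with every observation in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        best = None
        # Find the two clusters whose closest members are nearest to each other.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the most similar pair of clusters.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical data
print(agglomerate([1, 2, 9, 10, 25], 3))  # [[1, 2], [9, 10], [25]]
```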

6
Q

What is K-means clustering?

A

A method that assigns each observation to one of k clusters based on similarity, typically to the cluster whose centroid (mean) is nearest.

7
Q

What is the most common method to measure dissimilarity between observations with continuous numeric variables?

A

Euclidean Distance.

8
Q

What does Euclidean Distance measure?

A

The straight-line distance (dissimilarity) between two observations; it becomes smaller as the observations become more similar.
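
As a small worked example, Euclidean distance is the square root of the summed squared differences across variables. The function name and the sample observations below are hypothetical.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two observations of equal length."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical observations: (age in years, number of purchases)
print(euclidean_distance((30, 2), (33, 6)))  # sqrt(3^2 + 4^2) = 5.0
print(euclidean_distance((30, 2), (30, 2)))  # identical observations: 0.0
```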

9
Q

What is the effect of scale on Euclidean Distance?

A

It is highly influenced by the scale on which variables are measured.
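
A quick sketch of the scale effect, using made-up numbers: the same two people are compared with income recorded in dollars and then in thousands of dollars. The variable names and values are assumptions for illustration only.

```python
import math

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical observations: (age in years, income)
in_dollars = euclidean_distance((25, 50_000), (45, 51_000))
in_thousands = euclidean_distance((25, 50), (45, 51))

print(round(in_dollars, 2))    # 1000.2, driven almost entirely by the income gap
print(round(in_thousands, 2))  # 20.02, driven almost entirely by the age gap
```

Changing only the unit of one variable changes which variable dominates the distance.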

10
Q

What is a Z-score?

A

A measure of how many standard deviations a raw value lies above or below the mean of its variable. Converting each variable to z-scores puts all variables on a common scale.
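
A minimal sketch of the z-score calculation, z = (x - mean) / standard deviation. The function name and data are hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption; some tools divide by n instead.

```python
import statistics

def z_scores(values):
    """Standardize a list of values to z-scores."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(x - mean) / sd for x in values]

# Hypothetical incomes in dollars
print(z_scores([40_000, 50_000, 60_000]))  # [-1.0, 0.0, 1.0]
```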

11
Q

What is Manhattan distance?

A

The sum of the absolute differences between two observations across all variables. It is a dissimilarity measure that is more robust to outliers than Euclidean distance.
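
The sketch below compares the two measures on hypothetical points. Both pairs differ by a total of 20 units, but in the second pair the whole difference sits in one variable, as it would with an outlying value.

```python
import math

def manhattan_distance(u, v):
    """Sum of absolute differences across variables."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

origin = (0, 0)
spread = (10, 10)        # difference shared across both variables
concentrated = (0, 20)   # same total difference in a single variable

print(manhattan_distance(origin, spread))        # 20
print(manhattan_distance(origin, concentrated))  # 20
print(round(euclidean_distance(origin, spread), 2))        # 14.14
print(round(euclidean_distance(origin, concentrated), 2))  # 20.0
```

Because Euclidean distance squares each difference, a single large gap counts for more than the same total gap spread over several variables; Manhattan distance treats both cases alike.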

12
Q

What is the matching coefficient?

A

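A similarity measure for categorical (binary) variables: the proportion of variables on which two observations have matching values. Two observations both having a 0 entry counts as a match, and so as evidence of similarity.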

13
Q

What limitation does the matching coefficient have?

A

‘0’ values can have many meanings and do not necessarily indicate similarity.

14
Q

What is Jaccard’s coefficient?

A

A similarity measure for binary categorical variables that does not count matching zero entries; only shared nonzero entries count as evidence of similarity.
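
The difference between the two coefficients shows up on a small example. This sketch assumes 0/1 coded variables; the function names and the two observations are hypothetical.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (0-0 and 1-1)."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

def jaccard_coefficient(u, v):
    """Share of 1-1 matches among variables that are not 0-0 in both."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    not_both_zero = sum(1 for a, b in zip(u, v) if not (a == 0 and b == 0))
    return both_one / not_both_zero if not_both_zero else 0.0

# Hypothetical purchase indicators for five products (1 = bought)
a = (1, 0, 0, 0, 1)
b = (1, 0, 0, 0, 0)

print(matching_coefficient(a, b))  # 4 of 5 variables agree: 0.8
print(jaccard_coefficient(a, b))   # 1 shared purchase out of 2 non-zero positions: 0.5
```

Three of the four matches are shared zeros, which is why the matching coefficient is so much higher than Jaccard's coefficient here.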

15
Q

What does hierarchical clustering determine?

A

The similarity of two clusters by considering the similarity between the observations composing each cluster.

16
Q

What are some methods for measuring the similarity between clusters in hierarchical clustering?

A
  • Single linkage
  • Complete linkage
  • Group average linkage
  • Median linkage
  • Centroid linkage
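
Four of these linkage methods can be sketched with their usual definitions: single linkage uses the closest pair of observations, complete linkage the farthest pair, group average the mean over all pairs, and centroid linkage the distance between cluster centroids. Median linkage is left out of this sketch. The function names and the two clusters are hypothetical.

```python
import math
from itertools import product

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pairwise(c1, c2):
    """Distances between every observation in c1 and every observation in c2."""
    return [euclidean(u, v) for u, v in product(c1, c2)]

def single_linkage(c1, c2):
    return min(pairwise(c1, c2))

def complete_linkage(c1, c2):
    return max(pairwise(c1, c2))

def group_average_linkage(c1, c2):
    d = pairwise(c1, c2)
    return sum(d) / len(d)

def centroid(cluster):
    return tuple(sum(dim) / len(cluster) for dim in zip(*cluster))

def centroid_linkage(c1, c2):
    return euclidean(centroid(c1), centroid(c2))

cluster_a = [(0, 0), (0, 2)]
cluster_b = [(3, 0), (3, 2)]

print(single_linkage(cluster_a, cluster_b))                   # 3.0
print(round(complete_linkage(cluster_a, cluster_b), 3))       # 3.606
print(round(group_average_linkage(cluster_a, cluster_b), 3))  # 3.303
print(centroid_linkage(cluster_a, cluster_b))                 # 3.0
```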
17
Q

What are the steps in K-means clustering?

A
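  • Initialization: choose k starting centroids
  • Assignment Step: assign each observation to the nearest centroid
  • Update Step: recompute each centroid as the mean of its assigned observations
  • Repeat the assignment and update steps until the assignments stop changing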
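
The loop can be sketched as follows. This is a minimal illustration assuming Euclidean distance and starting centroids supplied by the caller; the function name `k_means` and the data are hypothetical, and real tools typically pick the starting centroids randomly.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (8, 8), (9, 8)]
initial = [(1, 1), (9, 8)]  # initialization: hypothetical starting centroids
centroids, clusters = k_means(points, initial)

print(centroids)  # [(1.0, 1.5), (8.5, 8.0)]
print(clusters)   # [[(1, 1), (1, 2)], [(8, 8), (9, 8)]]
```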
18
Q

What is an association rule?

A

An if-then statement that conveys the likelihood of certain items being purchased together.

19
Q

What is the antecedent in an association rule?

A

The collection of items corresponding to the if portion of the rule.

20
Q

What is the consequent in an association rule?

A

The item set corresponding to the then portion of the rule.

21
Q

What is a key factor in evaluating an association rule?

A

How actionable it is and how well it explains the relationship between item sets.

22
Q

What is an example of an association rule from Walmart’s data mining?

A

‘If a customer purchases a Barbie doll, then that customer also purchases a candy bar.’
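
To make the if-then structure concrete, the sketch below evaluates this rule on a handful of invented transactions. Support, confidence, and lift are standard rule measures that these cards do not name, so treat them as an assumption about how "likelihood" would be quantified; the transactions and function names are hypothetical.

```python
def count_containing(transactions, items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)

def rule_measures(transactions, antecedent, consequent):
    n = len(transactions)
    n_if = count_containing(transactions, antecedent)
    n_then = count_containing(transactions, consequent)
    n_both = count_containing(transactions, antecedent | consequent)
    support = n_both / n                # share of baskets with both item sets
    confidence = n_both / n_if          # share of "if" baskets that also have "then"
    lift = confidence / (n_then / n)    # confidence relative to the baseline rate
    return support, confidence, lift

# Invented transactions for illustration only
transactions = [
    {"barbie doll", "candy bar"},
    {"barbie doll", "candy bar", "batteries"},
    {"barbie doll"},
    {"candy bar"},
    {"batteries"},
]

support, confidence, lift = rule_measures(
    transactions, antecedent={"barbie doll"}, consequent={"candy bar"}
)
print(round(support, 2), round(confidence, 2), round(lift, 2))  # 0.4 0.67 1.11
```

A lift above 1 suggests the antecedent and consequent appear together more often than they would if purchased independently.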

23
Q

What are some algorithms used to generate association rules?

A
  • FP-growth algorithm
  • Apriori algorithm
  • Eclat algorithm
24
Q

What do the algorithms for generating association rules differ in?

A

Their computational efficiency on large data sets.

25
Q

Does KNIME disclose the algorithm used by the program?

A

No, KNIME does not disclose the algorithm.