5.4 Quiz: Descriptive Analytics (EN) Flashcards
Given the transaction database below, how many association rules satisfy a minimum support of 50% and a minimum confidence of 75%? Use the Apriori principle.
Transaction Item
1 A,B,C
2 A,B,E
3 A,B,C,D,E
4 A,B,C,D
5 A,D,E,F
6 B,C,F
7 C
8 A,B,C
7
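The count of 7 can be checked with a small level-wise search. The following is a minimal sketch, not a reference implementation of Apriori; the function names (`support`, `frequent_itemsets`, `rules`) are illustrative choices, and only the transactions and thresholds come from the question.

```python
from itertools import combinations

# Transaction database from the question above.
transactions = [
    {"A", "B", "C"},
    {"A", "B", "E"},
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D"},
    {"A", "D", "E", "F"},
    {"B", "C", "F"},
    {"C"},
    {"A", "B", "C"},
]
MIN_SUPPORT = 0.50
MIN_CONFIDENCE = 0.75


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets():
    """Level-wise search: only frequent item sets are extended."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]
    frequent = []
    while level:
        frequent.extend(level)
        current = set(level)
        # Join step: combine frequent k-item sets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        # Prune step (Apriori property): every k-subset must itself be frequent.
        level = [
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, len(c) - 1))
            and support(c) >= MIN_SUPPORT
        ]
    return frequent


def rules():
    """All rules X -> Y from frequent item sets that meet the confidence threshold."""
    found = []
    for itemset in frequent_itemsets():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = support(itemset) / support(lhs)
                if confidence >= MIN_CONFIDENCE:
                    found.append((sorted(lhs), sorted(itemset - lhs), confidence))
    return found


for lhs, rhs, confidence in rules():
    print(lhs, "->", rhs, f"confidence={confidence:.2f}")
print(len(rules()), "rules")
```

The frequent item sets are A, B, C, AB, AC, BC and ABC. The rules that pass the confidence threshold are A -> B, B -> A, B -> C, C -> B, AB -> C, AC -> B and BC -> A.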
Given that the support of association rule {A,B} -> {C} equals 37.5% and the confidence of the same rule equals 50%, which of the following transactions must be the eighth transaction in the database?
Transaction Item
1 A,B,F
2 A,B,C
3 A,B,D
4 C
5 A,D,E,F
6 A,B,C,D,E
7 A,B,C,D
8 ???
ABCD
AB
C
BCDE
AB
Support(X) = (Transactions containing X) / (Total transactions)
Confidence(X -> Y) = Support(X ∪ Y) / Support(X)
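Applying the two formulas: the first seven transactions contain {A,B,C} three times and {A,B} five times. A support of 37.5% = 3/8 means the eighth transaction cannot contain A, B and C together, and a confidence of 50% = 3/6 means it must contain both A and B. The sketch below tries each answer option in turn; the helper names (`count`, `rule_metrics`) are illustrative and not part of the quiz.

```python
from fractions import Fraction

# First seven transactions and the four answer options from the question.
known = ["ABF", "ABC", "ABD", "C", "ADEF", "ABCDE", "ABCD"]
candidates = ["ABCD", "AB", "C", "BCDE"]


def count(itemset, transactions):
    """Number of transactions containing every item of itemset."""
    return sum(set(itemset) <= set(t) for t in transactions)


def rule_metrics(lhs, rhs, transactions):
    """Return (support, confidence) of the rule lhs -> rhs as exact fractions."""
    both = count(lhs + rhs, transactions)
    support = Fraction(both, len(transactions))
    confidence = Fraction(both, count(lhs, transactions))
    return support, confidence


target = (Fraction(3, 8), Fraction(1, 2))  # 37.5% support, 50% confidence
matches = [c for c in candidates if rule_metrics("AB", "C", known + [c]) == target]
print(matches)  # ['AB']
```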
In association rule mining, the percentage of total transactions that contain an item set is called
the support
the confidence
the lift
the support
The support of an itemset is the percentage (or proportion) of total transactions in the dataset that contain that particular itemset.
The confidence of a rule (X -> Y) is the percentage of transactions containing X that also contain Y.
The lift of a rule (X -> Y) is a measure of how much more likely Y is to be bought when X is bought compared to the likelihood of Y being bought in general.
Sequence rules aim at finding
inter-transaction patterns.
intra-transaction patterns.
inter-transaction patterns.
In association rule mining, the Apriori property states:
every superset of a frequent item set is frequent.
every subset of a frequent item set is infrequent.
every superset of an infrequent item set is frequent.
every subset of a frequent item set is frequent.
every subset of a frequent item set is frequent.
The Apriori property is a fundamental concept in association rule mining, and it helps in efficiently discovering frequent itemsets in a dataset.
A dendrogram can be used to
measure the similarity between clusters.
decide upon the optimal number of clusters.
decide upon the optimal number of clusters.
To decide on the optimal number of clusters, a dendrogram can be used. This is a tree-like diagram that records the sequence of merges. As an example, assume that we want to cluster birds in terms of their characteristics, such as the way they look, the noise they make, what they eat, and where they live. First, we group chicken and duck, then parrot and canary. Step 3 adds pigeon to the chicken and duck cluster. Step 4 clusters owl and eagle. Step 5 merges the clusters obtained in steps 2 and 3, and step 6 merges everything together. An obvious question is where we should stop clustering.
Consider the association rule X -> Y. The measure support(X ∪ Y) / (support(X) × support(Y)) is called
the support.
the confidence.
the lift.
the lift.
The lift is the measure represented by the given formula. It assesses the significance of the association rule by comparing the observed support of the rule with the expected support under the assumption of independence.
Given the following transactions database:
Listener Artists
1 Tsjaikovski, Eminem, Måneskin, The Weeknd
2 Ariana Grande, Olivia Rodrigo, Beyoncé
3 Beyoncé, Ed Sheeran, Måneskin, Ariana Grande
4 Ed Sheeran, Beyoncé, Ariana Grande
5 Ed Sheeran, Måneskin
6 Eminem, The Weeknd, Måneskin
7 The Weeknd, Olivia Rodrigo
8 Måneskin, Ariana Grande, Eminem, The Weeknd
9 Ariana Grande, Ed Sheeran, Olivia Rodrigo
The association rule Beyoncé -> Ed Sheeran, Måneskin has:
a support of 3/9 and a confidence of 2/3.
a support of 3/9 and a confidence of 1/3.
a support of 1/9 and a confidence of 2/3.
a support of 1/9 and a confidence of 1/3.
a support of 1/9 and a confidence of 1/3.
Support(Beyoncé -> Ed Sheeran, Måneskin) = Number of transactions containing (Beyoncé and Ed Sheeran and Måneskin) / Total number of transactions
Support(Beyoncé -> Ed Sheeran, Måneskin) = 1 / 9
Confidence(Beyoncé -> Ed Sheeran, Måneskin) = Support(Beyoncé and Ed Sheeran and Måneskin) / Support(Beyoncé)
Support(Beyoncé) = Number of transactions containing Beyoncé / Total number of transactions = 3 / 9
Confidence(Beyoncé -> Ed Sheeran, Måneskin) = (1 / 9) / (3 / 9)
= 1 / 3
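The same calculation can be reproduced programmatically. This is a minimal sketch that uses exact fractions; the function names `support` and `confidence` are illustrative, and the data is the listener database from the question.

```python
from fractions import Fraction

# Listener database from the question above (one set of artists per listener).
listeners = [
    {"Tsjaikovski", "Eminem", "Måneskin", "The Weeknd"},
    {"Ariana Grande", "Olivia Rodrigo", "Beyoncé"},
    {"Beyoncé", "Ed Sheeran", "Måneskin", "Ariana Grande"},
    {"Ed Sheeran", "Beyoncé", "Ariana Grande"},
    {"Ed Sheeran", "Måneskin"},
    {"Eminem", "The Weeknd", "Måneskin"},
    {"The Weeknd", "Olivia Rodrigo"},
    {"Måneskin", "Ariana Grande", "Eminem", "The Weeknd"},
    {"Ariana Grande", "Ed Sheeran", "Olivia Rodrigo"},
]


def support(itemset):
    """Share of listeners whose artist set contains the whole itemset."""
    matching = sum(itemset <= artists for artists in listeners)
    return Fraction(matching, len(listeners))


def confidence(lhs, rhs):
    """Support of the combined item set divided by the support of the left-hand side."""
    return support(lhs | rhs) / support(lhs)


lhs, rhs = {"Beyoncé"}, {"Ed Sheeran", "Måneskin"}
print(support(lhs | rhs))    # 1/9
print(confidence(lhs, rhs))  # 1/3
```

Only listener 3 has all three artists, while listeners 2, 3 and 4 have Beyoncé, which gives (1/9) / (3/9) = 1/3.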
Association rules aim at finding
inter-transaction patterns.
intra-transaction patterns.
intra-transaction patterns.
Association rules are concerned with which items appear together at the same time (intra-transaction patterns), whereas sequence rules are concerned with which items appear at different times (inter-transaction patterns).
Association rules typically address co-occurrence patterns within individual transactions.
Sequence rules, on the other hand, address patterns that involve the order of events or items across different transactions.
Which statement is CORRECT?
In the single linkage method, the distance between two clusters is defined as the shortest distance between any two members in both clusters.
The complete linkage method defines the distance between two clusters as the maximum distance between any two members in both clusters.
The average linkage method calculates the average distance between all members in both clusters.
The centroid method calculates the distance between both cluster centroids.
All statements are correct.
All statements are correct.
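The four definitions can be compared side by side on a toy example. The sketch below assumes Euclidean distance and uses two hypothetical 2-D clusters chosen only for illustration; neither the points nor the function names come from the quiz.

```python
from itertools import product
from math import dist
from statistics import mean

# Hypothetical clusters, picked so that the distances are round numbers.
cluster_a = [(0.0, 0.0), (0.0, 3.0)]
cluster_b = [(4.0, 0.0), (4.0, 3.0)]


def pairwise(a, b):
    """Euclidean distances between every member of a and every member of b."""
    return [dist(p, q) for p, q in product(a, b)]


def single_linkage(a, b):
    """Shortest distance between any two members of the two clusters."""
    return min(pairwise(a, b))


def complete_linkage(a, b):
    """Largest distance between any two members of the two clusters."""
    return max(pairwise(a, b))


def average_linkage(a, b):
    """Mean of all cross-cluster member distances."""
    return mean(pairwise(a, b))


def centroid(points):
    """Coordinate-wise mean of a cluster."""
    return tuple(mean(coords) for coords in zip(*points))


def centroid_distance(a, b):
    """Distance between the two cluster centroids."""
    return dist(centroid(a), centroid(b))


print(single_linkage(cluster_a, cluster_b))     # 4.0
print(complete_linkage(cluster_a, cluster_b))   # 5.0
print(average_linkage(cluster_a, cluster_b))    # 4.5
print(centroid_distance(cluster_a, cluster_b))  # 4.0
```

On these points the average linkage (4.5) and the centroid distance (4.0) differ, which shows that the two methods are not interchangeable.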
Descriptive analytics is also referred to as
supervised learning.
unsupervised learning.
unsupervised learning.
Descriptive analytics is also referred to as unsupervised learning since there is no target variable available to steer the learning process. The idea is to find structure in an unlabeled data set. Common techniques are clustering, association rules and sequence rules.
Given the following transactions database:
Listener Artists
1 Tsjaikovski, Eminem, Måneskin, The Weeknd
2 Ariana Grande, Olivia Rodrigo, Beyoncé
3 Beyoncé, Ed Sheeran, Måneskin, Ariana Grande
4 Ed Sheeran, Beyoncé, Ariana Grande
5 Ed Sheeran, Måneskin
6 Eminem, The Weeknd, Måneskin
7 The Weeknd, Olivia Rodrigo
8 Måneskin, Ariana Grande, Eminem, The Weeknd
9 Ariana Grande, Ed Sheeran, Olivia Rodrigo
The association rule Eminem, The Weeknd -> Måneskin has:
a support of 3/9 and a confidence of 3/9.
a support of 3/9 and a confidence of 1.
a support of 1 and a confidence of 3/9.
a support of 1 and a confidence of 1.
a support of 3/9 and a confidence of 1.
Support(Eminem, The Weeknd -> Måneskin) = 3/9
Confidence(Eminem, The Weeknd -> Måneskin) = (3/9) / (3/9) = 1
Agglomerative and divisive clustering algorithms are examples of
hierarchical clustering.
non-hierarchical clustering.
hierarchical clustering.
Which of the following are post-processing activities in association rule mining?
Filter out the trivial rules that contain already known patterns.
Perform a sensitivity analysis by varying the minimum support and minimum confidence values.
Use appropriate visualization facilities (e.g., OLAP-based) to find the unexpected rules that might represent novel and actionable behavior in the data.
Measure the economic impact (e.g., profit, cost) of the association rules.
All of the above.
All of the above.
A lift value bigger than 1 indicates a
negative dependence or substitution effect.
positive dependence or complementary effect.
positive dependence or complementary effect.
The lift value is a measure used in association rule mining to assess the strength of the relationship between two items or sets of items in a dataset. Specifically, it indicates how much more likely the items are to be purchased together compared to what would be expected if their occurrence were independent.
Lift(X -> Y) = Support(X ∪ Y) / (Support(X) × Support(Y))
If the lift is equal to 1, it suggests that the occurrence of X and Y is independent of each other.
If the lift is greater than 1, it indicates a positive dependence or a complementary effect, suggesting that the presence of X increases the likelihood of the presence of Y (and vice versa).
If the lift is less than 1, it indicates a negative dependence or a substitution effect, suggesting that the presence of X decreases the likelihood of the presence of Y (and vice versa).
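To make the three cases concrete, the sketch below computes the lift for two rules on the listener database used in the earlier questions. The second rule (Ariana Grande -> The Weeknd) does not appear in the quiz; it is picked here only because its lift falls below 1. The function names `support`, `lift` and `interpret` are likewise illustrative.

```python
from fractions import Fraction

# Listener database from the earlier questions.
listeners = [
    {"Tsjaikovski", "Eminem", "Måneskin", "The Weeknd"},
    {"Ariana Grande", "Olivia Rodrigo", "Beyoncé"},
    {"Beyoncé", "Ed Sheeran", "Måneskin", "Ariana Grande"},
    {"Ed Sheeran", "Beyoncé", "Ariana Grande"},
    {"Ed Sheeran", "Måneskin"},
    {"Eminem", "The Weeknd", "Måneskin"},
    {"The Weeknd", "Olivia Rodrigo"},
    {"Måneskin", "Ariana Grande", "Eminem", "The Weeknd"},
    {"Ariana Grande", "Ed Sheeran", "Olivia Rodrigo"},
]


def support(itemset):
    """Share of listeners whose artist set contains the whole itemset."""
    return Fraction(sum(itemset <= artists for artists in listeners), len(listeners))


def lift(lhs, rhs):
    """Observed joint support divided by the support expected under independence."""
    return support(lhs | rhs) / (support(lhs) * support(rhs))


def interpret(value):
    """Map a lift value to the interpretation given in the text above."""
    if value > 1:
        return "positive dependence (complementary effect)"
    if value < 1:
        return "negative dependence (substitution effect)"
    return "independence"


rule_a = lift({"Eminem", "The Weeknd"}, {"Måneskin"})
rule_b = lift({"Ariana Grande"}, {"The Weeknd"})
print(rule_a, interpret(rule_a))  # 9/5 positive dependence (complementary effect)
print(rule_b, interpret(rule_b))  # 9/20 negative dependence (substitution effect)
```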