Weeks 8-9: Association Rules & Temporal Data Mining Flashcards

1
Q

Association Rule

A

An expression in if-then format.

The if part is the precondition or antecedent; it consists of a series of attribute tests.

The then part is the conclusion or consequent; it assigns values to one or more attributes.

Association rules may have any attribute, or combination of attributes, as the target.
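
As an illustration (a hypothetical example): the rule IF bread = yes AND butter = yes THEN milk = yes predicts that instances containing bread and butter also contain milk.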

2
Q

Bottom-up Generate-and-test

A

A procedure for generating association rules: candidate rules are generated bottom-up, starting from small item sets and growing them, and each candidate is tested against the data to check that it meets the minimum coverage and confidence requirements.

3
Q

Coverage/Support Count

A

The number of instances covered by the rule.

4
Q

Support

A

Proportion of instances covered by the rule.
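
In symbols (a standard formulation), for a rule A ⇒ B over N instances:

support(A ⇒ B) = (number of instances containing both A and B) / N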

5
Q

Confidence/Accuracy

A

The proportion of instances that the rule predicts correctly, out of all the instances satisfying the precondition.
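
Equivalently, in the standard formulation:

confidence(A ⇒ B) = support(A ∪ B) / support(A)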

6
Q

Apriori Algorithm

A

This is an algorithm for generating association rules.

  1. Find all 1-item sets, calculate their coverage, and discard any 1-item sets below the predefined minimum coverage value.
  2. Find all 2-item sets by pairing the frequent 1-item sets, and retain only the 2-item sets with coverage above the minimum coverage value.
  3. Similarly, find all 3-item sets, 4-item sets, and so on.
  4. Repeat until no larger item sets are possible.
  5. For each item set, produce as many association rules as there are ways to split its items between the precondition and the conclusion.
  6. For each rule, evaluate the confidence and retain the rule if it meets the minimum requirement.

Note that the algorithm exploits the property that if an item set is not frequent, then none of its supersets is frequent either. Consequently, it generates (k+1)-item sets only from frequent k-item sets. The worst-case running time of the algorithm is exponential in the number of attributes.
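
A minimal Python sketch of the frequent-item-set phase (steps 1-4). The function name, the set-based transaction format, and the brute-force candidate generation are illustrative choices, not from the source; a full implementation would also prune candidates with infrequent subsets:

def apriori_itemsets(transactions, min_count):
    # Step 1: frequent 1-item sets.
    counts = {}
    for t in transactions:
        for item in t:
            s = frozenset([item])
            counts[s] = counts.get(s, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    all_frequent = dict(frequent)
    k = 1
    while frequent:
        # Steps 2-4: build (k+1)-item candidates only from frequent k-item sets.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k + 1}
        counts = {}
        for t in transactions:
            items = frozenset(t)
            for cand in candidates:
                if cand <= items:
                    counts[cand] = counts.get(cand, 0) + 1
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent  # {item set: coverage (support count)}

For example, apriori_itemsets([{'bread', 'milk'}, {'bread', 'butter'}, {'bread', 'milk', 'butter'}], min_count=2) returns the item sets occurring in at least two transactions.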

7
Q

FP-Growth Algorithm

A
  1. Count the frequency of each item and sort the items in descending order of frequency.
  2. Perform a second pass over the data to create the FP-tree.
  3. Recursively process the tree to identify the frequent item sets.

The tree structure allows association rules to be extracted from a dataset more efficiently. It usually has a better run time than the Apriori algorithm.
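
In practice, both algorithms are available in libraries. A sketch using the mlxtend package (assuming it is installed; API details may vary between versions, and the toy transactions are hypothetical):

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

transactions = [['bread', 'milk'], ['bread', 'butter'],
                ['bread', 'milk', 'butter'], ['milk', 'butter']]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

frequent = fpgrowth(df, min_support=0.5, use_colnames=True)  # usually faster than apriori
rules = association_rules(frequent, metric='confidence', min_threshold=0.7)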

8
Q

Header Table

A

It appears on the left-hand side of the FP-tree diagram. It shows the frequencies of the individual items that satisfy the minimum coverage threshold, sorted in descending order.

9
Q

Lift

A

This metric measures the usefulness of a rule relative to random guessing. It compares the confidence of the rule with the confidence of the null rule, which has the same right-hand side but an empty left-hand side.
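
In the standard formulation, for a rule A ⇒ B:

lift(A ⇒ B) = confidence(A ⇒ B) / support(B)

A lift greater than 1 indicates that A and B occur together more often than expected if they were independent; a lift of 1 means the rule does no better than random guessing.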

10
Q

Contingency Table

A

There are two dimensions to this table.

Does the antecedent of the association rule appear in the transaction?
No: Absent
Yes: Present

Does the consequent of the association rule appear in the transaction?
No: Absent
Yes: Present

The number of transactions equals the sum of the four resulting cells. If the cell where both the antecedent and the consequent are present is larger than the other cells, this is a good indication that the association rule is a good one.
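
A hypothetical example over 100 transactions (the counts are illustrative only):

                     Consequent present   Consequent absent
Antecedent present           40                  10
Antecedent absent            15                  35

Here the present/present cell dominates its row and column, which supports the rule.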

11
Q

Sequential Analysis/Sequential Pattern Analysis

A

This method adds the element of time to association analysis. It considers sequences of items, not just associations, where the order in time is important.

As such, the support of an item set also depends on the order of the items. Generalised Sequential Patterns (GSP) is a suitable algorithm for sequence mining, based on the Apriori algorithm.
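
As an illustration (hypothetical): the sequential pattern ⟨{bread}, {milk}⟩ is supported only by customers who bought bread on one visit and milk on a later visit; a customer who bought milk before bread does not support it.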

12
Q

Link Analysis

A

An approach for analysing and exploiting relationships and connections between items. It uses graphs, with items as nodes and relationships as edges. It can be applied in both supervised and unsupervised settings.

Graphs can be directed, undirected, multigraphs (when multiple edges between the same pair of nodes are allowed), and connected (if there’s a path between every pair of nodes).

13
Q

Degree Centrality

A

This metric evaluates the relative importance of a node within the graph, based on the number of edges incident to it (its degree).

14
Q

Closeness Centrality

A

This metric measures how close a node is to all the other nodes in the graph, based on its shortest-path distances to them.
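
A sketch of both centrality measures using the networkx library (assuming it is installed; the toy graph is hypothetical):

import networkx as nx

# A small undirected graph: node 'a' is a hub.
G = nx.Graph([('a', 'b'), ('a', 'c'), ('a', 'd'), ('d', 'e')])

# Degree centrality: degree normalised by (n - 1).
print(nx.degree_centrality(G))

# Closeness centrality: based on shortest-path distances to all other nodes.
print(nx.closeness_centrality(G))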

15
Q

Time Series

A

A set of values obtained from measurements over time. A dataset may have a single or multiple time series.

T = (x_1,…,x_n)
x_t = time series value at time t

16
Q

Decomposition

A

Traditional approaches decompose a time series into additive components:

x_t = T_t + C_t + S_t + \epsilon_t

T_t: trend, variations of low frequency
C_t: cycle, periodic variations
S_t: seasonality, regular predictable changes
\epsilon_t: error, captures the uncertainty of the model
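
A sketch using the statsmodels library (assuming it is installed; the monthly series is synthetic):

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic series: linear trend + yearly seasonality + noise.
t = np.arange(120)
x = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + np.random.normal(0, 1, 120)
series = pd.Series(x, index=pd.date_range('2015-01-01', periods=120, freq='MS'))

result = seasonal_decompose(series, model='additive', period=12)
# result.trend, result.seasonal and result.resid hold the estimated components.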

17
Q

Moving Average (MA)

A

\hat{x}_{t+1} = \frac{x_{t-k+1} + … + x_{t-1} + x_t}{k}

k = window size (the number of most recent values averaged)

18
Q

AutoRegressive (AR)

A

\hat{x}_{t+1} = \beta_0 + \beta_1 x_t + \epsilon_t

This is the first-order model AR(1); higher-order models add more lagged terms.

19
Q

Exponential Smoothing

A

\hat{x}_{t+1} = \alpha x_t + (1 - \alpha)\hat{x}_t, for t \ge 1

\alpha = smoothing factor
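
A minimal numpy sketch of the three forecasts above (the AR coefficients would normally be fitted by least squares; here they are assumed given):

import numpy as np

def moving_average_forecast(x, k):
    # Average of the k most recent observations.
    return np.mean(x[-k:])

def ar1_forecast(x, beta0, beta1):
    # First-order autoregressive forecast with given coefficients.
    return beta0 + beta1 * x[-1]

def exponential_smoothing_forecast(x, alpha):
    # Applies the smoothing recursion over the series; returns the next forecast.
    s = x[0]
    for value in x[1:]:
        s = alpha * value + (1 - alpha) * s
    return s

x = np.array([3.0, 3.2, 3.1, 3.4, 3.8])
print(moving_average_forecast(x, k=3))              # (3.1 + 3.4 + 3.8) / 3
print(ar1_forecast(x, beta0=0.1, beta1=1.0))
print(exponential_smoothing_forecast(x, alpha=0.5))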

20
Q

Representation

A

This is a challenge with time series data: the goal is a representation that reduces the dimensionality of the data while preserving its essential characteristics.

21
Q

Similarity

A

This is a challenge with time series data, as one of the goals is to evaluate how similar a pair of time series are.

22
Q

Indexing

A

This is a challenge with time series data: the time series need to be organised so as to achieve low memory requirements and allow fast querying.

23
Q

Non-Adaptive Representation

A

Common representation for all parts of a time series.

24
Q

Adaptive

A

Local properties are used to construct a non-uniform representation of a time series.

25
Q

Piecewise Linear Approximation

A

Approximates a time series using piecewise linear segments.

26
Q

Discrete Fourier Transformation (DFT)

A

Any signal can be represented as a sum of sine/cosine waves; a time series of length n can be reconstructed exactly from n Fourier coefficients, and often approximated well by keeping only the first few.
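
A sketch of DFT-based approximation with numpy, keeping only the first few coefficients (the series is synthetic):

import numpy as np

t = np.linspace(0, 1, 128, endpoint=False)
x = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)

coeffs = np.fft.rfft(x)                   # DFT of a real-valued series
coeffs[8:] = 0                            # keep only the first 8 coefficients
approx = np.fft.irfft(coeffs, n=len(x))   # reconstruct an approximation
print(np.max(np.abs(x - approx)))         # near zero: both sinusoids survive truncation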

27
Q

Singular Value Decomposition (SVD)

A

Represents the shape of each series as a linear combination of basis shapes derived from the variance of the dataset. In contrast to DFT, which is local (applied to each series independently), SVD is a global transformation (computed over the whole dataset).

28
Q

Segmentation

A

Given a time series T with a large number n of time points, compute a model with k < n piecewise segments approximating T.
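
A minimal sketch with equal-width segments, each fitted by a least-squares line via numpy (real segmentation algorithms choose the breakpoints adaptively):

import numpy as np

def piecewise_linear(x, k):
    # Approximate series x with k equal-width linear segments.
    approx = np.empty(len(x))
    for idx in np.array_split(np.arange(len(x)), k):
        slope, intercept = np.polyfit(idx.astype(float), x[idx], deg=1)
        approx[idx] = slope * idx + intercept
    return approx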

29
Q

Measuring Similarity

A

Whole series matching: compare two time series T and T’ using their entire length.

Subsequence matching: given a long sequence T and a query subsequence Q, find the subsequences of T that match Q.

30
Q

Euclidean Distance for Time Series

A

It’s the Euclidean distance between corresponding points x_t of T and T’, for series of equal length.

The series are usually standardised first, but the method doesn’t account for acceleration and deceleration along the time axis.
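
In the standard formulation, for series T = (x_1,…,x_n) and T’ = (x’_1,…,x’_n):

D(T, T’) = \sqrt{\sum_{t=1}^{n} (x_t - x’_t)^2}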

31
Q

Dynamic Time Warping

A

Warp the time axis of one or both sequences for better alignment.
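
A minimal dynamic-programming sketch of DTW in Python (quadratic time; practical implementations add windowing constraints):

import numpy as np

def dtw_distance(a, b):
    # D[i, j] = cost of the best alignment of a[:i] with b[:j].
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw_distance([1, 2, 3, 4], [1, 1, 2, 3, 4]))  # 0.0: warping absorbs the shift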

32
Q

Longest Common Subsequences

A

Match two sequences with potentially unmatched elements.

33
Q

Indexing

A

Given a time series T and similarity measure D(T,T’), find the most similar time series to T among a set of different time series.

34
Q

Anomaly Detection

A

Given a time series and a model of its behaviour, identify subsequences with anomalies.

35
Q

Motif Discovery

A

Find subsequences, called motifs, that appear repeatedly in a longer time series.

36
Q

Survival Analysis or Time-to-Event Analysis

A

Focuses on estimating when an important event will happen.

Examples:
- Churn: customers who stop using a company’s service, typically moving to another company that offers them better services or prices.
- Tenure: the length of time that an individual remains a customer of a company.

37
Q

Survival Curve

A

This plot starts at 100% and keeps decreasing. It’s similar in concept to the half-life of an atom. It is used to predict the tenure of a customer. Survival curves may or may not reach 0%.

38
Q

Survival Function

A

S(t) = P(T > t)

T = survival time (the time until the event of interest)

39
Q

Discrete Survival Time Data

A

Time is partitioned into contiguous, non-overlapping time intervals specified by the time points t_0 = 0 < t_1 < … < t_{\ell}.

The time intervals are (t_0, t_1], (t_1, t_2], …, (t_{\ell - 1}, t_{\ell}].

40
Q

Hazard for Discrete Survival Time Data

A

The probability h_i that the subject of interest will fail during (t_{i-1}, t_i], given survival at least until t_{i-1}:

h_i = P(t_{i-1} < T \le t_i \mid T > t_{i-1})

Hazard functions may increase or decrease over time.

41
Q

Kaplan-Meier Estimator of the Survival Function

A

\hat{S}(t) = \prod_{i: t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)

d_i = number of events (failures) at time t_i
n_i = number of subjects at risk just before t_i

Kaplan-Meier survival curves decrease over time; when combined with survival trees, each curve models a different leaf of the tree.
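
A from-scratch numpy sketch of the estimator (the toy durations and event indicators are hypothetical; the lifelines library offers a production implementation):

import numpy as np

def kaplan_meier(durations, observed):
    # Returns the distinct event times t_i and the estimated survival S(t_i).
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    times = np.unique(durations[observed])
    s, survival = 1.0, []
    for t in times:
        n_i = np.sum(durations >= t)               # subjects at risk just before t_i
        d_i = np.sum((durations == t) & observed)  # events at t_i
        s *= 1.0 - d_i / n_i
        survival.append(s)
    return times, np.array(survival)

# Toy data: 1 = event observed, 0 = censored.
t, s = kaplan_meier([2, 3, 3, 5, 7, 8], [1, 1, 0, 1, 0, 1])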

42
Q

Survival Trees

A

A single tree groups subjects according to their survival behaviour, based on covariates.