Data Mining Flashcards
Anti-monotonicity
When an item set violates the constraint, so do all of its supersets
Monotonicity
When an item set satisfies the constraint, so do all of its supersets
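A small sketch may help keep the two properties apart. The item names, prices, and thresholds below are hypothetical, and prices are assumed to be non-negative: under that assumption an upper bound on total price behaves anti-monotonically, while a lower bound behaves monotonically.

```python
# Hypothetical item prices (assumed non-negative) used only for illustration.
prices = {"a": 5, "b": 10, "c": 20}

def total(itemset):
    """Sum of the prices of the items in the set."""
    return sum(prices[i] for i in itemset)

def within_budget(itemset, limit=12):
    """Anti-monotone constraint: once violated, every superset violates it too."""
    return total(itemset) <= limit

def at_least(itemset, floor=12):
    """Monotone constraint: once satisfied, every superset satisfies it too."""
    return total(itemset) >= floor

small = {"a", "b"}          # total price 15
large = {"a", "b", "c"}     # superset, total price 35

print(within_budget(small), within_budget(large))
print(at_least(small), at_least(large))
```

Here `small` already exceeds the budget, so `large` cannot fit it either; conversely `small` already meets the floor, so `large` must as well.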
Classification
Supervised learning
Clustering
Unsupervised learning
Apriori
Algorithm for frequent item set mining and association rule learning over transactional databases.
Identifies frequent individual items in the database and extends them to larger and larger item sets as long as those item sets appear sufficiently often.
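The level-wise idea can be sketched as follows. This is a minimal illustration rather than an optimised implementation; the function name `apriori`, the parameter `min_support` (an absolute count), and the sample transactions are assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {item set: support count} for every frequent item set."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join: merge pairs of frequent (k-1)-item sets into k-item candidates.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune: support is anti-monotone, so every (k-1)-subset
            # of a frequent k-item set must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical transactions for illustration.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
result = apriori(baskets, min_support=2)
```

With these baskets, `{"milk", "butter"}` occurs only once, so the three-item candidate is pruned without ever being counted.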
Naïve Bayesian Classifier
Assumes that features are conditionally independent of one another given the class (a strong, "naïve" independence assumption).
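Written out, the independence assumption lets the class posterior factor into per-feature terms. The symbols here are assumptions for illustration: $C$ is the class and $x_1, \dots, x_n$ are the feature values.

```latex
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

The predicted class is the one that maximises the right-hand side.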
K-means clustering
Partitioning method
Each cluster is represented by the centre of the cluster
Partitioning method
Constructing a partition of a database D of n objects into a set of k clusters such that the sum of squared distances is minimised
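One common way to write this criterion is shown below. The notation is an assumption for illustration: $C_i$ is the $i$-th cluster, $c_i$ its representative (for example its centroid), and $p$ an object of D.

```latex
E = \sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - c_i \rVert^{2}
```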
K-means steps
- Partition objects into k non-empty subsets (initial seed point arbitrarily chosen)
- Compute seed points as the centroids of the clusters of the current partition
- Assign each object to the cluster with the nearest seed point
- Go back to Step 2; stop when no object changes cluster
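The steps above can be sketched in plain Python. This is a minimal sketch, assuming points are tuples of numbers and Euclidean distance; the names `kmeans`, `sq_dist`, `seed`, and `max_iter` are hypothetical choices for the example.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed=0, max_iter=100):
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)

    # Step 1: arbitrary initial partition into k non-empty subsets.
    order = list(range(len(points)))
    rng.shuffle(order)
    labels = [0] * len(points)
    for pos, idx in enumerate(order):
        labels[idx] = pos % k

    centroids = []
    for _ in range(max_iter):
        # Step 2: compute the centroid of each cluster in the current partition.
        new_centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                dim = len(members[0])
                new_centroids.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)))
            else:
                # A cluster that became empty keeps its previous centroid.
                new_centroids.append(centroids[c])
        centroids = new_centroids

        # Step 3: assign each object to the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]

        # Step 4: stop when no object changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(data, k=2)
```

Which group receives label 0 depends on the initial partition, so only the grouping itself is meaningful, not the label values.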
PAM
Partitioning Around Medoids
Partitioning Around Medoids
Medoids - Representative objects
- Starts from an initial set of medoids and iteratively replaces one of them by one of the non-medoids if it improves the total distance of the resulting clustering
- Effective for small data sets
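The swap procedure can be sketched as below. This is a simplified greedy variant for illustration, not a faithful reproduction of any particular PAM implementation: it takes the first k objects as initial medoids (an arbitrary assumption) and accepts any swap that lowers the total distance. The names `pam`, `total_cost`, and `dist` are hypothetical.

```python
def total_cost(points, medoids, dist):
    """Total distance from every object to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k, dist):
    # Arbitrary initial medoids: indices of the first k objects (assumption).
    medoids = list(range(k))
    best = total_cost(points, medoids, dist)

    improved = True
    while improved:
        improved = False
        for i in range(k):                  # position of the medoid to replace
            for h in range(len(points)):    # candidate non-medoid object
                if h in medoids:
                    continue
                candidate = medoids[:i] + [h] + medoids[i + 1:]
                cost = total_cost(points, candidate, dist)
                # Keep the swap only if it improves the total distance.
                if cost < best:
                    medoids, best, improved = candidate, cost, True
    return medoids, best

# Hypothetical one-dimensional data with absolute difference as the distance.
values = [1, 2, 3, 10, 11, 12]
medoids, cost = pam(values, k=2, dist=lambda a, b: abs(a - b))
chosen = sorted(values[m] for m in medoids)
```

Every candidate swap re-evaluates the cost over all objects, which is why the method is practical mainly for small data sets.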
MOLAP
Multidimensional OLAP
ROLAP
Relational OLAP
Multidimensional OLAP
The MOLAP storage mode causes the aggregations of the partition and a copy of its source data to be stored in a multidimensional structure in Analysis Services when the partition is processed. This MOLAP structure is highly optimised to maximise query performance. The storage location can be on the computer where the partition is defined or on another computer running Analysis Services. Because a copy of the source data resides in the multidimensional structure, queries can be resolved without accessing the partition’s source data. Query response times can be decreased substantially by using aggregations. The data in the partition’s MOLAP structure is only as current as the most recent processing of the partition.
Relational OLAP
The ROLAP storage mode causes the aggregations of the partition to be stored in indexed views in the relational database that was specified in the partition’s data source. Unlike the MOLAP storage mode, ROLAP does not cause a copy of the source data to be stored in the Analysis Services data folders. Instead, when results cannot be derived from the query cache, the indexed views in the data source are accessed to answer queries. Query response is generally slower with ROLAP storage than with the MOLAP or HOLAP storage modes. Processing time is also typically slower with ROLAP. However, ROLAP enables users to view data in real time and can save storage space when you are working with large datasets that are infrequently queried, such as purely historical data.