Lecture 2 Flashcards
Data Mining Functionalities:
1-Class/concept description
2-Mining frequent patterns, associations, and correlations
3-Classification and regression for predictive analysis
4-Cluster analysis
5-Outlier analysis
Data characterization:
Summarization of the general characteristics or features of a target class of data.
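A minimal Python sketch of data characterization, assuming a hypothetical toy table of student records in which the target class is status == "graduate". The function name characterize and the chosen summary features (count, mean GPA, most common major) are illustrative assumptions, not a method prescribed by the lecture.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy records (illustrative only, not real data)
STUDENTS = [
    {"status": "graduate", "major": "CS", "gpa": 3.6},
    {"status": "graduate", "major": "CS", "gpa": 3.8},
    {"status": "graduate", "major": "Physics", "gpa": 3.4},
    {"status": "undergraduate", "major": "Biology", "gpa": 3.0},
    {"status": "undergraduate", "major": "CS", "gpa": 3.2},
]

def characterize(records, target):
    """Summarize general features of the records belonging to the target class."""
    rows = [r for r in records if r["status"] == target]
    return {
        "count": len(rows),
        "mean_gpa": round(mean(r["gpa"] for r in rows), 2),
        "top_major": Counter(r["major"] for r in rows).most_common(1)[0][0],
    }

print(characterize(STUDENTS, "graduate"))
```

The summary describes only the target class; nothing here compares it with other classes.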
Data discrimination:
Comparison of the general features of the target class against one or a set of contrasting classes.
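A minimal Python sketch of data discrimination, assuming the same kind of hypothetical student records: it pairs the share of each attribute value in the target class with its share in a contrasting class. The helper names shares and discriminate are hypothetical, and comparing value frequencies is just one simple way to contrast classes.

```python
from collections import Counter

# Hypothetical toy records (illustrative only, not real data)
STUDENTS = [
    {"status": "graduate", "major": "CS", "gpa": 3.6},
    {"status": "graduate", "major": "CS", "gpa": 3.8},
    {"status": "graduate", "major": "Physics", "gpa": 3.4},
    {"status": "undergraduate", "major": "Biology", "gpa": 3.0},
    {"status": "undergraduate", "major": "CS", "gpa": 3.2},
]

def shares(records, cls, attr):
    """Fraction of class-`cls` records taking each value of `attr`."""
    rows = [r for r in records if r["status"] == cls]
    counts = Counter(r[attr] for r in rows)
    return {value: n / len(rows) for value, n in counts.items()}

def discriminate(records, target, contrast, attr):
    """Pair the target-class share with the contrasting-class share for every value."""
    t = shares(records, target, attr)
    c = shares(records, contrast, attr)
    return {v: (t.get(v, 0.0), c.get(v, 0.0)) for v in sorted(t.keys() | c.keys())}

result = discriminate(STUDENTS, "graduate", "undergraduate", "major")
for major, (grad, undergrad) in result.items():
    print(f"{major}: graduate {grad:.0%} vs undergraduate {undergrad:.0%}")
```

Values whose shares differ sharply between the two classes (here, Physics appears only among graduates) are candidate discriminating features.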
Frequent Patterns:
Patterns that occur frequently in data.
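One standard way to find frequent itemsets is an Apriori-style level-wise search; the lecture card does not name an algorithm, so this is a swapped-in illustration. The sketch assumes hypothetical market-basket transactions, and the min_support threshold of 0.6 is an arbitrary choice. Support is the fraction of transactions that contain an itemset.

```python
from itertools import combinations

# Hypothetical toy market-basket data (illustrative only)
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search.

    Returns {itemset: support} for every itemset whose support (the fraction
    of transactions containing it) is at least min_support.
    """
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    candidates = [frozenset([item]) for item in sorted(set().union(*tsets))]
    frequent, k = {}, 1
    while candidates:
        # Count support of each size-k candidate and keep the frequent ones
        level = {}
        for cand in candidates:
            support = sum(1 for t in tsets if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        k += 1
        # Join: merge pairs of frequent (k-1)-itemsets into size-k candidates
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a frequent k-itemset must itself be frequent
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

for itemset, support in frequent_itemsets(TRANSACTIONS, 0.6).items():
    print(sorted(itemset), support)
```

The prune step is what makes the search efficient: a candidate with any infrequent subset is discarded without scanning the transactions.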
Association Analysis:
Mining frequent patterns leads to the discovery of interesting associations and correlations within data.
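A minimal Python sketch of evaluating one association rule X => Y on hypothetical market-basket data. Support and confidence are the standard rule measures; lift is included as one common correlation measure (lift > 1 suggests positive correlation). The function name rule_metrics is hypothetical.

```python
# Hypothetical toy market-basket data (illustrative only)
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent => consequent."""
    tsets = [set(t) for t in transactions]
    n = len(tsets)
    x, y = set(antecedent), set(consequent)
    count_x = sum(1 for t in tsets if x <= t)
    count_y = sum(1 for t in tsets if y <= t)
    count_xy = sum(1 for t in tsets if (x | y) <= t)
    support = count_xy / n                       # P(X and Y)
    confidence = count_xy / count_x              # P(Y | X)
    lift = (count_xy * n) / (count_x * count_y)  # P(X and Y) / (P(X) * P(Y))
    return support, confidence, lift

print(rule_metrics(TRANSACTIONS, {"diapers"}, {"beer"}))  # (0.6, 0.75, 1.25)
```

Here diapers => beer holds in 75% of the baskets containing diapers, and lift 1.25 indicates the two items co-occur more often than independence would predict.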
Applications of Frequent Patterns and Associations:
1-Marketing and Sales Promotion.
2-Supermarket shelf management.
3-Inventory Management.
Classification:
Construct a model (function) based on some training examples to describe and distinguish data classes or concepts for future prediction.
Classification predicts categorical (discrete) labels.
Typical methods for data classification:
Decision trees, naïve Bayesian classification, support vector machines, neural networks, classification rules (i.e., IF-THEN rules), logistic regression, …
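A minimal Python sketch of one method from the list, naïve Bayesian classification, for categorical features with add-one (Laplace) smoothing. The training set and the function names train_nb and predict_nb are hypothetical; the point is the workflow: build a model from labeled training examples, then predict a categorical label for a new example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set: (outlook, temperature) -> play?
ROWS = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
LABELS = ["no", "no", "yes", "yes", "yes"]

def train_nb(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    counts = defaultdict(lambda: defaultdict(Counter))  # counts[class][feature][value]
    values = defaultdict(set)  # distinct values seen for each feature
    for x, y in zip(rows, labels):
        for j, v in enumerate(x):
            counts[y][j][v] += 1
            values[j].add(v)
    return class_counts, counts, values

def predict_nb(model, x):
    """Return the class maximizing log P(class) + sum_j log P(x_j | class)."""
    class_counts, counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for j, v in enumerate(x):
            # Laplace (add-one) smoothing so unseen values never give log(0)
            score += math.log((counts[c][j][v] + 1) / (n_c + len(values[j])))
        if score > best_score:
            best, best_score = c, score
    return best

model = train_nb(ROWS, LABELS)
print(predict_nb(model, ("sunny", "hot")))   # no
print(predict_nb(model, ("rainy", "cool")))  # yes
```

Working in log space avoids numerical underflow when many small conditional probabilities are multiplied.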
Regression:
Used to predict numerical (continuous) values.
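A minimal Python sketch of regression using ordinary least squares for a single input variable; the data points and the function name fit_line are hypothetical. Unlike the classifier, the output is a continuous number.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical toy data lying exactly on y = 2x + 1
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)       # 2.0 1.0
print(slope * 5 + intercept)  # 11.0
```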
Applications of Classification and Prediction:
Classification: credit card fraud detection, direct marketing, classifying diseases, …
Regression: predicting wind velocity, temperature, the sales amount of a product, the stock market, …
Cluster analysis:
- Unsupervised learning (class label is unknown)
- Group data to form new categories (i.e., clusters)
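A minimal Python sketch of cluster analysis using k-means (Lloyd's algorithm), one common clustering method; the card does not name a specific algorithm. The points are hypothetical, and initializing with the first k points is a simplifying assumption (practical implementations usually use random or k-means++ initialization). No class labels are used: the groups emerge from distances alone.

```python
import math

# Hypothetical toy 2-D points forming two visible groups
POINTS = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 8.5), (8, 9)]

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: alternate assignment and centroid-update steps."""
    centroids = list(points[:k])  # simplistic init: first k points (assumption)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            tuple(sum(coords) / len(cl) for coords in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

centroids, clusters = kmeans(POINTS, 2)
print(clusters)
```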
Applications of Cluster Analysis:
1-Cluster houses to find distribution patterns.
2-Document clustering.
Outlier:
A data object that does not comply with the general behavior of the data (noise or exception).
Outlier analysis is useful in fraud detection and rare-event analysis.
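A minimal Python sketch of outlier detection using the interquartile-range rule (Tukey's fences), a simple statistical technique swapped in for illustration; the lecture does not specify a method. The 1.5 multiplier is a common convention, and the transaction amounts are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, factor=1.5):
    """Flag values outside [Q1 - factor*IQR, Q3 + factor*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily transaction amounts with one suspicious spike
print(iqr_outliers([10, 11, 9, 10, 12, 10, 11, 50]))  # [50]
```

Quartile-based fences are less distorted by the outlier itself than a mean and standard deviation rule, which matters for small samples like this one.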
Major Issues in Data Mining:
1-Mining Methodology:
- Mining various and new kinds of knowledge.
- Mining knowledge in multi-dimensional space.
- Data mining: An interdisciplinary effort.
- Handling noise, uncertainty, and incompleteness of data.
- Pattern evaluation.
2-User Interaction:
- Incorporation of background knowledge.
- Presentation and visualization of data mining results.
3-Efficiency and Scalability:
- Efficiency and scalability of data mining algorithms.
- Parallel, distributed, and incremental mining methods.
4-Diversity of data types:
- Handling complex types of data.
- Mining dynamic, networked, and global data repositories.
5-Data mining and society:
- Social impacts of data mining.
- Privacy-preserving data mining.