KDD Process Steps Flashcards
What are the steps in the KDD process? (1st)
Data Cleaning
What is Data Cleaning?
Removal of noisy and irrelevant data from datasets
- Handling missing values.
- Smoothing noisy data, where noise is random error or variance in a measured variable.
- Cleaning with data discrepancy detection and data transformation tools.
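A minimal sketch of the first two cleaning tasks, assuming missing values are stored as None and that mean imputation and bin-mean smoothing are acceptable choices. The function names and sample data are illustrative only.

```python
from statistics import mean

def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and
    replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

# Illustrative data only.
filled = fill_missing([10, None, 20, None, 30])
smoothed = smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3)
```

Other strategies (dropping incomplete rows, median imputation, regression-based smoothing) fit the same step; which one is appropriate depends on the data.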
What are the steps in the KDD process? (2nd)
Data Integration
What are the steps in the KDD process? (3rd)
Data Selection
What are the steps in the KDD process? (4th)
Data Transformation
What are the steps in the KDD process? (5th)
Data Mining
What are the steps in the KDD process? (6th)
Pattern Evaluation
What are the steps in the KDD process? (7th)
Knowledge Representation or Data Visualisation
What is Data Integration?
Heterogeneous data from multiple sources is combined into a common store (a data warehouse).
What does Data Integration consist of?
- Data Migration tools
- Data Synchronization tools
- ETL (Extract-Transform-Load) process
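A small ETL sketch, assuming two hypothetical sources with different field names and an in-memory SQLite database standing in for the data warehouse. All source data, table names, and column names are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export with its own column names.
CRM_CSV = "cust_id,full_name\n1,Ada\n2,Grace\n"
# Hypothetical source 2: records from another system with different field names.
SHOP_ROWS = [{"id": 3, "name": "Alan"}, {"id": 1, "name": "Ada"}]

def extract():
    """Read raw records from both sources."""
    crm = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return crm, SHOP_ROWS

def transform(crm, shop):
    """Map both sources onto one (customer_id, name) schema and drop duplicates."""
    unified = {(int(r["cust_id"]), r["full_name"]) for r in crm}
    unified |= {(int(r["id"]), r["name"]) for r in shop}
    return sorted(unified)

def load(rows):
    """Write the unified rows into the stand-in warehouse table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    conn.commit()
    return conn

warehouse = load(transform(*extract()))
rows = warehouse.execute(
    "SELECT customer_id, name FROM customer ORDER BY customer_id"
).fetchall()
```

The customer that appears in both sources ends up as a single row, which is the point of integrating before mining.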
What is Data Selection?
The process in which data relevant to the analysis is identified and retrieved from the data collection.
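As a rough illustration, selection can be seen as keeping only the rows and attributes that matter for the analysis. The records, attribute names, and the select helper below are hypothetical.

```python
# Illustrative data collection; field names are assumptions.
records = [
    {"customer_id": 1, "age": 34, "country": "DE", "spend": 120.0},
    {"customer_id": 2, "age": 51, "country": "FR", "spend": 80.5},
    {"customer_id": 3, "age": 27, "country": "DE", "spend": 42.0},
]

def select(records, attributes, predicate):
    """Keep the rows that satisfy the predicate, and only the requested attributes."""
    return [{a: r[a] for a in attributes} for r in records if predicate(r)]

# Task: analyse spend by age for German customers only.
relevant = select(records, ["age", "spend"], lambda r: r["country"] == "DE")
```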
What does Data Selection use?
- Neural networks
- Decision trees
- Naive Bayes
- Clustering, regression, etc.
What is Data Transformation?
The process of transforming data into the form required by the mining procedure.
What does the Data Transformation process consist of?
- Data mapping: assigning elements from the source to the destination to capture transformations.
- Code generation: creating the actual transformation program.
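The two steps can be sketched as follows, assuming the mapping is expressed as a dictionary and the "generated" program is simply a function built from it. The field names and converters are hypothetical.

```python
# Data mapping (hypothetical): source field -> (destination field, converter).
MAPPING = {
    "cust_id": ("customer_id", int),
    "dob": ("birth_year", lambda s: int(s[:4])),
    "spend_eur": ("spend", float),
}

def build_transformer(mapping):
    """Code generation, in miniature: return a function that applies
    the mapping to one source record."""
    def transform(record):
        return {
            dest: convert(record[src])
            for src, (dest, convert) in mapping.items()
        }
    return transform

transform = build_transformer(MAPPING)
row = transform({"cust_id": "7", "dob": "1990-05-17", "spend_eur": "19.90"})
```

Real transformation tools typically emit SQL or scripts from such a mapping; the closure here only stands in for that idea.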
What is Data Mining?
The application of intelligent techniques to extract potentially useful patterns.
- Extraction of interesting, potentially useful patterns or knowledge from huge amounts of data.
What does data mining do?
- Transforms task-relevant data into patterns.
- Decides the purpose of the model using classification or characterization.
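One simple way to picture "data into patterns" is counting item pairs that occur together often. This is a sketch of frequent-pair counting on made-up baskets, not a full mining algorithm; the function name and threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_count):
    """Count every unordered item pair per transaction and keep
    the pairs that occur at least min_count times."""
    counts = Counter()
    for items in transactions:
        # Sorting gives each pair one canonical order.
        counts.update(combinations(sorted(items), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Illustrative task-relevant data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
patterns = frequent_pairs(baskets, min_count=2)
```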
What is Pattern Evaluation?
Identifying the truly interesting patterns that represent knowledge, based on given interestingness measures.
What does Pattern Evaluation do?
- Finds the interestingness score of each pattern.
- Uses summarization and visualization to make the data understandable to the user.
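Support and confidence are two commonly used interestingness measures for association rules. The sketch below computes them on made-up transactions; the data and function names are illustrative, and other measures (such as lift) could be used instead.

```python
# Illustrative transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

A rule would then be kept only if both scores clear user-chosen thresholds.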
What is Knowledge representation?
It is a technique which utilizes visualization tools to represent data mining results.
How does knowledge representation work?
- Generate reports
- Generate tables
- Generate discriminant rules, classification rules, characterization rules, etc.
Additional Information
- KDD is an iterative process where evaluation measures can be enhanced, mining can be refined, new data can be integrated and transformed in order to get different and more appropriate results.
- Preprocessing of databases consists of Data cleaning and Data Integration.