Data Mining Introductions Flashcards
is the science of extracting useful knowledge from huge data repositories.
Data Mining
is an open standard process model.
CRISP-DM REFERENCE MODEL
(Cross Industry Standard Process for Data Mining)
6 PHASES IN CRISP-DM REFERENCE MODEL
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
2 DATA MINING METHODS
- Descriptive Method
- Predictive Method
is a method where we find human-interpretable patterns that describe the data.
Descriptive Method
is a method that uses some features (variables) to predict unknown or future values of other variables.
Predictive Method
5 DATA MINING TASKS
- Clustering
- Association Rule Discovery
- Regression
- Classification
- Deviation / Anomaly Detection
is a type of data mining task that predicts the value of a given continuous-valued variable based on the values of other variables.
Regression
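Example (regression): a minimal sketch of fitting a straight line with ordinary least squares, using only the Python standard library. The function name `fit_line` and the spend/sales figures are hypothetical and chosen only for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data: advertising spend (x) and sales (y).
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(spend, sales)
# Predict the continuous target for an unseen input value.
predicted = slope * 5.0 + intercept
```

Here the continuous variable (sales) is predicted from the value of another variable (spend). Real tools usually handle many predictors at once.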
is a type of data mining task that detects significant deviations from normal behavior.
Deviation / Anomaly Detection
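Example (anomaly detection): one simple way to flag deviations is a z-score rule, which marks values that lie far from the mean. This is only a sketch of one possible technique. The function name `find_anomalies`, the threshold of 2 standard deviations, and the readings are assumptions made for illustration.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Return values more than `threshold` sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]

# Hypothetical sensor readings with one obvious outlier.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
anomalies = find_anomalies(readings)
```

A rule like this assumes "normal behavior" is roughly captured by the mean and spread of the data. Other data may need a different notion of normal.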
5 CHALLENGES OF DATA MINING
- Scalability
- Dimensionality
- Complex and Heterogeneous Data
- Data Quality
- Data Ownership and Privacy
3 TYPES OF DATA MINING TOOLS
- Simple Graphical User Interface
- Process Oriented
- Programming Oriented
2 COMMON PROGRAMMING ORIENTED TOOLS
- R
- Python
4 CHARACTERISTICS OF A DATA WAREHOUSE
- Subject Oriented
- Integrated
- Nonvolatile
- Time Variant
data warehouses are designed to help you analyze data about a particular subject area, such as sales.
Subject Oriented
integrates data from disparate sources into a consistent format.
Integrated
data in the data warehouse are never overwritten or deleted.
Nonvolatile
maintains both historical and (nearly) current data.
Time Variant
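Example (nonvolatile and time variant): a rough sketch of how these two properties can look in practice. Rows are only appended, never overwritten, and each row carries a date so past states can still be queried. The names `history`, `record_price`, and `price_on`, and the product data, are hypothetical.

```python
from datetime import date

# Append-only store: existing rows are never updated or deleted.
history = []

def record_price(product, price, as_of):
    """Add a new dated row instead of overwriting the old price."""
    history.append({"product": product, "price": price, "as_of": as_of})

def price_on(product, day):
    """Return the price that was in effect for `product` on `day`, or None."""
    candidates = [
        row for row in history
        if row["product"] == product and row["as_of"] <= day
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row["as_of"])["price"]

record_price("widget", 10.0, date(2024, 1, 1))
record_price("widget", 12.0, date(2024, 6, 1))

old_price = price_on("widget", date(2024, 3, 15))      # historical value
current_price = price_on("widget", date(2024, 7, 1))   # latest value
```

Because the earlier row is kept, both the historical and the current price remain available.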
EXPLAIN EXTRACT, TRANSFORM, LOAD
- Extracting the data from outside sources
- Transforming data to fit analytical needs
- Loading data into the data warehouse.
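Example (ETL): the three steps above can be sketched with the Python standard library. The CSV text stands in for an outside source and an in-memory SQLite database stands in for the warehouse. The column names, the `orders` table, and the cleaning rules are assumptions for illustration only.

```python
import csv
import io
import sqlite3

# Hypothetical export from an outside source system.
RAW_CSV = """order_id,amount,country
1, 19.99 ,us
2, 5.00 ,ph
3,,us
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, fix types, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        cleaned.append(
            (int(row["order_id"]), float(amount), row["country"].strip().upper())
        )
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# The loaded data is now ready for analytical queries.
totals = conn.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall()
```

Keeping each step in its own function mirrors the extract, transform, load split and makes each stage easy to check separately.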
is a term for data sets that are so large or complex that traditional data processing applications are inadequate to deal with them.
Big Data
4 CHARACTERISTICS OF BIG DATA
- Variety
- Veracity
- Velocity
- Volume
is a characteristic of big data that means there are different forms of data.
Variety
is a characteristic of big data that means the uncertainty of the data.
Veracity
is a characteristic of big data that means the speed at which data arrives and must be analyzed (analysis of streaming data).
Velocity
is a characteristic of big data that means the scale of data.
Volume