Introduction Data Science Terminology Flashcards
1
Q
CRISP-DM
A
Cross-industry standard process for data mining
2
Q
CRISP-DM
6 steps
A
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
3
Q
KDD (Knowledge Discovery in Databases)
Process
A
Data --> (Selection) --> Target Data
--> (Preprocessing) --> Preprocessed Data
--> (Transformation) --> Transformed Data
--> (Data Mining) --> Patterns
--> (Interpretation/Evaluation) --> Knowledge
4
Q
PDCA
A
PDCA (Plan–Do–Check–Act)
methodology popularized by W. Edwards Deming
5
Q
DMAIC
A
DMAIC (Define, Measure, Analyze,
Improve and Control) methodology
used in Six Sigma projects
6
Q
Extract, Transform, Load (ETL)
A
IS --> (extract) --> raw data --> (transform, load) --> data warehouse (predefined structure) --> analytics
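Example (a minimal sketch, not a real pipeline): the rows, field names, and the functions extract, transform, and load below are hypothetical stand-ins, and a plain Python list plays the role of the data warehouse. The point it illustrates is that in ETL the data is cleaned and forced into the predefined structure before it is loaded.

```python
# Hypothetical export from a source information system (IS); all values are made up.
raw_rows = [
    {"id": "1", "amount": " 19.90 ", "country": "de"},
    {"id": "2", "amount": "5.00", "country": "FR"},
    {"id": "3", "amount": "n/a", "country": "de"},  # amount cannot be parsed
]


def extract(source):
    """Extract: copy the raw records out of the source unchanged."""
    return [dict(row) for row in source]


def transform(rows):
    """Transform: coerce each row into the warehouse's predefined structure."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # rows that do not fit the structure are dropped before loading
        clean.append(
            {"id": int(row["id"]), "amount": amount, "country": row["country"].upper()}
        )
    return clean


def load(rows, warehouse):
    """Load: append the already-structured rows to the warehouse table."""
    warehouse.extend(rows)
    return warehouse


# Order matters: extract, then transform, then load.
warehouse = load(transform(extract(raw_rows)), [])

# Analytics only ever sees structured rows.
total_by_country = {}
for row in warehouse:
    total_by_country[row["country"]] = (
        total_by_country.get(row["country"], 0.0) + row["amount"]
    )
```

In this sketch the unparseable row never reaches the warehouse, which is the trade-off of ETL: analytics gets clean data, but anything the transform step discards is gone unless the source is extracted again.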
7
Q
Extract, Load, Transform (ELT)
A
IS --> (extract & load) --> data lake (raw data) --> (transform on demand) --> prepared data --> analytics
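Example (a minimal sketch under stated assumptions): the rows, field names, and the functions extract_and_load and transform_on_demand are hypothetical, and a plain Python list stands in for the data lake. It illustrates that in ELT the raw data is stored first and is only prepared when an analysis asks for it.

```python
# Hypothetical export from a source information system (IS); all values are made up.
raw_rows = [
    {"id": "1", "amount": " 19.90 ", "country": "de"},
    {"id": "2", "amount": "5.00", "country": "FR"},
    {"id": "3", "amount": "n/a", "country": "de"},  # amount cannot be parsed
]


def extract_and_load(source, lake):
    """Extract & load: copy raw records into the lake without changing them."""
    lake.extend(dict(row) for row in source)
    return lake


def transform_on_demand(lake):
    """Transform: build prepared data from the raw lake contents when needed."""
    prepared = []
    for row in lake:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # skipped for this analysis, but still kept in the lake
        prepared.append(
            {"id": int(row["id"]), "amount": amount, "country": row["country"].upper()}
        )
    return prepared


# Order matters: extract and load first, transform later.
data_lake = extract_and_load(raw_rows, [])
prepared_data = transform_on_demand(data_lake)

# Analytics runs on the prepared data; the raw rows remain available.
total_amount = sum(row["amount"] for row in prepared_data)
```

Because the lake keeps every raw row, a later analysis can apply a different transformation (for example, one that repairs the unparseable amount) without going back to the source system.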
8
Q
Another 80/20 rule
A
• 80% of the data scientist’s time is spent on finding,
cleaning, preprocessing and organizing data, leaving
only 20% to actually perform an analysis.
• However, the 20% effort determines 80% of the final
result.