CRISP-DM Flashcards
Build an understanding of the 6 phases of CRISP-DM, a standard process for data analysis projects.
What does CRISP-DM stand for?
Cross-Industry Standard Process for Data Mining
It was first published in 1999 to standardize data mining processes.
What is the purpose of CRISP-DM?
It provides a structured, reproducible, and flexible approach to data mining and analytics projects.
By following CRISP-DM, you can reduce project risk, support better decision-making, and deliver actionable insights more effectively.
How many phases are in CRISP-DM?
6
These phases provide a structured, iterative approach to data science.
List the 6 phases of CRISP-DM.
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
These steps ensure a systematic approach to data mining projects.
Define:
Business Understanding
A phase that focuses on understanding the objectives and requirements of the project.
Any good project starts with a deep understanding of the customer’s needs.
Which phase involves exploring data distributions and relationships?
Data Understanding
This phase includes data collection, description, exploration, and quality verification.
What is the primary goal of the Data Preparation phase?
To:
- Clean data
- Transform data
- Structure the data
This phase is also known as data wrangling or data munging.
True or False:
The Data Preparation phase is often the shortest phase.
FALSE
It is usually the longest phase; a commonly cited estimate is that it can take up to 80% of project time.
What are the 4 tasks in the Data Understanding phase?
- Collect initial data
- Describe data
- Explore data
- Verify data quality
Each task ensures that the data is suitable for analysis.
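The describe, explore, and verify tasks above can be sketched in a few lines. This is a minimal illustration, not a prescribed CRISP-DM procedure: the `records` sample and the `describe_column` helper are hypothetical, and `None` is assumed to mark a missing value.

```python
import statistics

# Hypothetical sample records; None is assumed to mark a missing value.
records = [
    {"age": 34, "income": 52000},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000},
    {"age": 52, "income": 61000},
]

def describe_column(rows, column):
    """Summarize one numeric column: row count, missing count, mean, min, max."""
    values = [row[column] for row in rows if row[column] is not None]
    return {
        "count": len(rows),
        "missing": len(rows) - len(values),  # simple data quality check
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }

age_summary = describe_column(records, "age")
income_summary = describe_column(records, "income")
print(age_summary)
print(income_summary)
```

A non-zero `missing` count or an implausible `min` or `max` is the kind of finding the Verify data quality task is meant to surface.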
Fill in the blank:
The ________ phase involves selecting and applying machine learning techniques.
Modeling
This phase includes model selection, training, and assessment.
Which phase involves splitting the dataset into training and test sets?
Modeling
This is necessary for model validation and performance evaluation.
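A train/test split can be sketched with the standard library alone. The `train_test_split` helper below is a hypothetical function written for this card (it is not the scikit-learn function of the same name), and the 80/20 ratio and fixed seed are assumptions chosen for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)                      # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)      # seeded for reproducibility
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]

data = list(range(10))
train, test = train_test_split(data, test_fraction=0.2)
print(len(train), len(test))
```

The model is fitted on `train` only, and `test` is held back so that performance is measured on data the model has not seen.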
How does the Evaluation phase differ from Model Assessment?
Evaluation focuses on business goals, while Model Assessment focuses on technical performance
Evaluation determines if the model meets business objectives.
True or False:
Once a model is built, it does not need further refinement.
FALSE
Models often need iteration and tuning before deployment.
What are the 3 tasks in the Evaluation phase?
- Evaluate results
- Review process
- Determine next steps
This phase ensures that the best model is selected and validated.
Why is the Deployment phase important?
It ensures that the model is used in real-world applications
Deployment can be as simple as a report or as complex as an enterprise-wide system.
What is the main purpose of the ‘Assess Situation’ task in Business Understanding?
To evaluate resources, risks, and requirements
This helps create a realistic project plan.
Which CRISP-DM phase includes data integration and feature engineering?
Data Preparation
This phase prepares data for effective modeling.
Fill in the blank:
________ OR _________ scaling is often used in the Data Preparation phase for numerical features.
Normalization OR Standardization
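Which technique fits depends on the distribution of the data: normalization rescales values to a fixed range such as 0 to 1, while standardization rescales them to zero mean and unit variance.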
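The two scaling techniques can be compared side by side. This is a minimal sketch: the function names and the `incomes` sample are hypothetical, and standardization here assumes the population standard deviation (some tools use the sample standard deviation instead).

```python
import statistics

def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1].

    Assumes the values are not all identical (otherwise high == low).
    """
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

incomes = [30000, 45000, 60000, 90000]
normalized = min_max_normalize(incomes)
standardized = standardize(incomes)
print(normalized)
print(standardized)
```

Normalization keeps every value inside a fixed range, which makes it sensitive to extreme values; standardization has no fixed range but centers the data around zero.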
What is the final step before moving to Deployment?
Determine next steps (from the Evaluation phase)
This decision considers whether the model is ready or needs further refinement.
Which CRISP-DM phase involves defining business success criteria?
Business Understanding
Business success criteria ensure the project delivers value beyond technical metrics.
What is the key difference between Business Understanding and Data Understanding?
Business Understanding focuses on objectives; Data Understanding focuses on the dataset
Both phases are foundational to a successful data mining project.
What are the main challenges in the Data Preparation phase?
- Missing values
- Outliers
- Inconsistent data
Poor data quality can significantly impact model performance.
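The three challenges above can be illustrated on a tiny dataset. This is a sketch under stated assumptions, not a recommended recipe: the `raw` records and the `clean` helper are hypothetical, missing prices are imputed with the median, and outliers are flagged with a median-absolute-deviation rule whose factor of 5 is an arbitrary choice.

```python
import statistics

# Hypothetical raw records: inconsistent city labels, one missing price,
# and one extreme price.
raw = [
    {"city": "Boston ", "price": 200.0},
    {"city": "boston", "price": None},
    {"city": "BOSTON", "price": 220.0},
    {"city": "Denver", "price": 210.0},
    {"city": "denver", "price": 5000.0},
]

def clean(rows, mad_factor=5.0):
    """Standardize labels, impute missing prices, and flag outliers."""
    known = [r["price"] for r in rows if r["price"] is not None]
    median = statistics.median(known)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = statistics.median([abs(p - median) for p in known])
    cleaned = []
    for r in rows:
        price = median if r["price"] is None else r["price"]  # impute
        cleaned.append({
            "city": r["city"].strip().title(),                # fix labels
            "price": price,
            "is_outlier": abs(price - median) > mad_factor * mad,
        })
    return cleaned

cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Flagging outliers rather than deleting them keeps the decision visible, so it can be reviewed with someone who knows the business context.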
True or False:
The Modeling phase always produces one final model.
FALSE
Multiple models are often tested and compared before selecting the best one.
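Comparing candidate models on held-out data might look like the sketch below. Everything here is illustrative: the two candidates (a mean baseline and a least-squares line), the toy data, and the choice of mean absolute error as the metric are assumptions, not part of CRISP-DM itself.

```python
import statistics

def fit_mean_baseline(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mean_absolute_error(model, xs, ys):
    """Average absolute difference between predictions and true values."""
    return statistics.fmean([abs(model(x) - y) for x, y in zip(xs, ys)])

# Hypothetical toy data, already split into training and test sets.
train_x, train_y = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]
test_x, test_y = [7, 8], [14.0, 16.1]

candidates = {"mean_baseline": fit_mean_baseline, "linear": fit_linear}
scores = {
    name: mean_absolute_error(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best_name = min(scores, key=scores.get)  # lowest test error wins
print(scores, best_name)
```

Including a trivial baseline among the candidates shows whether a more complex model is actually earning its complexity.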
What are the 4 tasks in the Deployment phase?
- Plan deployment
- Plan monitoring & maintenance
- Produce final report
- Review project
Planning for monitoring and maintenance helps ensure that the model continues to work in production.
Which CRISP-DM phase helps ensure data reliability?
Data Understanding (Verify data quality task)
Data quality issues should be documented and corrected before modeling.
What is the purpose of the ‘Review Process’ task in the Evaluation phase?
To check if all steps were properly executed and nothing was overlooked
This ensures that the project is well-documented and meets business needs.
Which phase includes risk assessment and cost-benefit analysis?
Business Understanding
This phase helps justify resource allocation and feasibility.
What is the importance of Model Monitoring in Deployment?
To ensure the model continues to perform well over time
Model drift can degrade accuracy, requiring retraining or tuning.
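A very simple monitoring check compares recent accuracy against the accuracy measured at deployment time. This is a hedged sketch: the `needs_retraining` helper, the 0.90 baseline, the 0.05 tolerance, and the sample predictions are all hypothetical, and real monitoring usually tracks more than one metric.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def needs_retraining(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """Flag the model when recent accuracy drops more than `tolerance` below baseline."""
    return (baseline_accuracy - recent_accuracy) > tolerance

# Hypothetical values: accuracy at deployment, then a recent batch of
# production predictions with their eventually observed true labels.
baseline_accuracy = 0.90
recent_predictions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
recent_labels = [1, 0, 0, 1, 1, 1, 0, 1, 1, 0]

recent_accuracy = accuracy(recent_predictions, recent_labels)
retrain = needs_retraining(baseline_accuracy, recent_accuracy)
print(recent_accuracy, retrain)
```

A flagged model sends the project back to an earlier phase, typically Data Preparation or Modeling, rather than straight to a redeploy.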
How does CRISP-DM support iterative improvements?
By allowing teams to revisit previous phases as needed
Data science is rarely linear—feedback loops are essential for improvement.
True or False:
CRISP-DM applies only to machine learning projects.
FALSE
It applies to all data mining and analytics projects, even without ML.