CRISP-DM Flashcards

Gain an understanding of the 6 phases of the standard data analysis process.

1
Q

What does CRISP-DM stand for?

A

Cross-Industry Standard Process for Data Mining

It was developed in 1999 to standardize data mining processes.

2
Q

What is the purpose of CRISP-DM?

A

It provides a structured, reproducible, and flexible approach to data mining and analytics projects.

Following CRISP-DM, you can minimize risks, enhance decision-making, and deliver actionable insights more effectively.

3
Q

How many phases are in CRISP-DM?

A

6

These phases provide a structured, iterative approach to data science.

4
Q

List the 6 phases of CRISP-DM.

A
  1. Business Understanding
  2. Data Understanding
  3. Data Preparation
  4. Modeling
  5. Evaluation
  6. Deployment

These steps ensure a systematic approach to data mining projects.

5
Q

Define:

Business Understanding

A

A phase that focuses on understanding the objectives and requirements of the project.

Any good project starts with a deep understanding of the customer’s needs.

6
Q

Which phase involves exploring data distributions and relationships?

A

Data Understanding

This phase includes data collection, description, exploration, and quality verification.

7
Q

What is the primary goal of the Data Preparation phase?

A

To:

  • Clean data
  • Transform data
  • Structure data

This phase is also known as data wrangling or data munging.

8
Q

True or False:

The Data Preparation phase is often the shortest phase.

A

FALSE

It is usually the longest phase; a commonly cited estimate is that it takes up to 80% of a project's time.

9
Q

What are the 4 tasks in the Data Understanding phase?

A
  • Collect initial data
  • Describe data
  • Explore data
  • Verify data quality

Each task ensures that the data is suitable for analysis.
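
Example (a minimal sketch, not part of CRISP-DM itself): the "Describe data" and "Verify data quality" tasks can be approximated by summarizing each column and counting missing values. The column names and records below are made up for illustration.

```python
import statistics

# Hypothetical sample: each row is one customer record (values are invented).
rows = [
    {"age": 34, "income": 52000.0},
    {"age": 29, "income": None},
    {"age": 41, "income": 61000.0},
    {"age": None, "income": 48000.0},
]

def describe(rows, column):
    """Summarize one numeric column: count, missing values, mean, min, max."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }

# One summary per column; missing counts feed the data quality check.
summary = {col: describe(rows, col) for col in ("age", "income")}
```

A non-zero "missing" count is the kind of finding that would be recorded here and handled later in Data Preparation.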

10
Q

Fill in the blank:

The ________ phase involves selecting and applying machine learning techniques.

A

Modeling

This phase includes model selection, training, and assessment.

11
Q

Which phase involves splitting the dataset into training and test sets?

A

Modeling

This is necessary for model validation and performance evaluation.
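
Example (a minimal sketch using only the standard library): one simple way to split a dataset is to shuffle it with a fixed seed and hold out a fraction for testing. The function name `train_test_split` and the 20% default are assumptions chosen for illustration.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    shuffled = list(records)               # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(10))  # stand-in for ten labeled examples
train, test = train_test_split(records)
```

The model is fitted on `train` only, so that `test` gives an estimate of performance on unseen data.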

12
Q

How does the Evaluation phase differ from Model Assessment?

A

Evaluation focuses on business goals, while Model Assessment focuses on technical performance

Evaluation determines if the model meets business objectives.

13
Q

True or False:

Once a model is built, it does not need further refinement.

A

FALSE

Models often need iteration and tuning before deployment.

14
Q

What are the 3 tasks in the Evaluation phase?

A
  • Evaluate results
  • Review process
  • Determine next steps

This phase ensures that the best model is selected and validated.

15
Q

Why is the Deployment phase important?

A

It ensures that the model is used in real-world applications

Deployment can be as simple as a report or as complex as an enterprise-wide system.

16
Q

What is the main purpose of the ‘Assess Situation’ task in Business Understanding?

A

To evaluate resources, risks, and requirements

This helps create a realistic project plan.

17
Q

Which CRISP-DM phase includes data integration and feature engineering?

A

Data Preparation

This phase prepares data for effective modeling.

18
Q

Fill in the blank:

________ OR _________ scaling is often used in the Data Preparation phase for numerical features.

A

Normalization OR Standardization

Different techniques apply depending on the distribution of data.
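
Example (a minimal sketch): min-max normalization rescales values to the range [0, 1], while standardization (z-score) rescales them to mean 0 and standard deviation 1. The function names are illustrative, and both assume the values are not all identical.

```python
import statistics

def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

values = [10.0, 20.0, 30.0, 40.0, 50.0]
normalized = min_max_normalize(values)  # [0.0, 0.25, 0.5, 0.75, 1.0]
standardized = standardize(values)
```

Min-max scaling is sensitive to extreme values because they define the range; standardization is often preferred when the feature is roughly bell-shaped.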

19
Q

What is the final step before moving to Deployment?

A

Determine next steps (from the Evaluation phase)

This decision considers whether the model is ready or needs further refinement.

20
Q

Which CRISP-DM phase involves defining business success criteria?

A

Business Understanding

Business success criteria ensure the project delivers value beyond technical metrics.

21
Q

What is the key difference between Business Understanding and Data Understanding?

A

Business Understanding focuses on objectives; Data Understanding focuses on the dataset

Both phases are foundational to a successful data mining project.

22
Q

What are the main challenges in the Data Preparation phase?

A
  • Missing values
  • Outliers
  • Inconsistent data

Poor data quality can significantly impact model performance.
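
Example (a minimal sketch of one possible treatment for each challenge, not the only one): median imputation for missing values, the 1.5 × IQR rule for flagging outliers, and trimming plus lowercasing for inconsistent labels. The sample values and function names are made up for illustration.

```python
import statistics

def impute_median(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

def normalize_label(label):
    """Make categorical labels consistent: trim whitespace, lowercase."""
    return label.strip().lower()

ages = [31, 29, None, 35, 33, 30, 120]   # hypothetical column with a gap and an outlier
filled = impute_median(ages)
outliers = iqr_outliers(filled)
labels = {normalize_label(s) for s in [" Yes", "yes", "YES ", "no"]}
```

Whether a flagged outlier is an error or a genuine extreme value is a judgment call that depends on the business context.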

23
Q

True or False:

The Modeling phase always produces one final model.

A

FALSE

Multiple models are often tested and compared before selecting the best one.
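
Example (a minimal sketch with two deliberately trivial candidate "models"): each candidate is scored on the same held-out validation data, and the one with the lowest error is kept. The data, the candidate names, and the choice of mean absolute error are assumptions for illustration.

```python
import statistics

def mean_absolute_error(predictions, actuals):
    """Average absolute difference between predictions and actual values."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)

train_targets = [10.0, 12.0, 11.0, 13.0, 50.0]   # hypothetical training targets
validation_targets = [11.0, 12.0, 14.0]          # hypothetical held-out targets

# Two constant predictors fitted on the training targets.
candidates = {
    "predict_mean": statistics.mean(train_targets),
    "predict_median": statistics.median(train_targets),
}

# Score every candidate on the same validation set, then pick the lowest error.
scores = {
    name: mean_absolute_error([value] * len(validation_targets), validation_targets)
    for name, value in candidates.items()
}
best = min(scores, key=scores.get)
```

The same pattern applies to real models: fit each candidate on the training data, score all of them with one metric on the same validation data, then compare.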

24
Q

What are the 4 tasks in the Deployment phase?

A
  • Plan deployment
  • Plan monitoring & maintenance
  • Produce final report
  • Review project

Deployment ensures that the model continues to work in production.

25
Q

Which CRISP-DM phase helps ensure data reliability?

A

Data Understanding (Verify data quality task)

Data quality issues should be documented and corrected before modeling.

26
Q

What is the purpose of the ‘Review Process’ task in the Evaluation phase?

A

To check if all steps were properly executed and nothing was overlooked

This ensures that the project is well-documented and meets business needs.

27
Q

Which phase includes risk assessment and cost-benefit analysis?

A

Business Understanding

This phase helps justify resource allocation and feasibility.

28
Q

What is the importance of Model Monitoring in Deployment?

A

To ensure the model continues to perform well over time

Model drift can degrade accuracy, requiring retraining or tuning.
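
Example (a minimal sketch of one simple monitoring check): compare accuracy on recent labeled data with the accuracy measured at deployment, and flag the model when the drop exceeds a tolerance. The baseline figure, the tolerance, and the function names are hypothetical.

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that match the actual labels."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)

def needs_retraining(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """Flag the model when recent accuracy falls more than `tolerance` below baseline."""
    return (baseline_accuracy - recent_accuracy) > tolerance

baseline = 0.90  # hypothetical accuracy measured at deployment time
recent_predictions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
recent_actuals = [1, 0, 0, 1, 1, 1, 0, 1, 1, 0]

recent = accuracy(recent_predictions, recent_actuals)  # 6 of 10 correct
flag = needs_retraining(baseline, recent)
```

In practice the check would run on a schedule, and a raised flag would send the project back to an earlier phase such as Data Preparation or Modeling.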

29
Q

How does CRISP-DM support iterative improvements?

A

By allowing teams to revisit previous phases as needed

Data science is rarely linear—feedback loops are essential for improvement.

30
Q

True or False:

CRISP-DM applies only to machine learning projects.

A

FALSE

It applies to all data mining and analytics projects, even without ML.