Lecture 1 - CRISP-DM Overview Flashcards

Question 1

Q

What does CRISP-DM stand for?

Answer

A

Cross Industry Standard Process for Data Mining

Question 2

Q

What are the phases of CRISP-DM?

Answer

A

1. Project understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment

Question 3

Q

What is done during the Project Understanding phase?

Answer

A

Problem formulation (objectives, benefits, assumptions).
Mapping the problem formulation to a data analysis task
Understanding the situation

Question 4

Q

What does the 80-20 rule mean for project understanding and data understanding?

Answer

A

Project and data understanding take about 20% of the project time, but account for about 80% of the importance for success.

Question 5

Q

What problems arise during project understanding?

Answer

A

Communication problems between domain and data analysis experts
Project owner may not understand
Analysts often don’t understand the domain well
Project owner may not understand the produced models or how to use them
Formulating goals can be difficult

Question 6

Q

How is the project objective defined?

Answer

A

By stating an objective, a deliverable, and success criteria. Example:
Objective: Increase revenues
Deliverable: Software that automatically selects customers for emailing
Success criteria: revenue increases by 5%
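
A success criterion like the one above is only useful if it can be checked after deployment. The following is a minimal sketch of such a check, assuming revenue is compared between a baseline period and a later period. The function names and the example figures are hypothetical and not from the lecture.

```python
def revenue_uplift(baseline: float, current: float) -> float:
    """Relative revenue change versus the baseline period (0.05 means +5%)."""
    if baseline <= 0:
        raise ValueError("baseline revenue must be positive")
    return (current - baseline) / baseline


def meets_success_criterion(baseline: float, current: float,
                            target: float = 0.05) -> bool:
    """True if revenue grew by at least `target` (default: the 5% from the card)."""
    return revenue_uplift(baseline, current) >= target


# Hypothetical figures, for illustration only.
print(meets_success_criterion(100_000, 106_000))
print(meets_success_criterion(100_000, 104_000))
```

Note that a check like this attributes the whole change to the project. The presence of external factors (see the assumptions assessed in the next card) would have to be ruled out separately.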

Question 7

Q

What does assessing the situation mean?

Answer

A

Finding out the
- requirements and constraints (laws, ethical issues like gender and race, technical constraints)
- assumptions (representativeness, informativeness, data quality, presence of external factors)

Question 8

Q

How is the analysis goal determined?

Answer

A

The primary objective must be transformed into a more technical data mining goal.
- Determine the data mining task (classification, regression, clustering, association analysis)
- Predictive accuracy
- Model flexibility
- Interpretability
- Runtime
- Interestingness and use of expert knowledge
- Understandability for the user
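
As a rough illustration of the first step, the choice of task often follows from whether a target attribute exists and what type it has. The sketch below encodes that common rule of thumb. The function name and the `target_kind` labels are hypothetical, and real projects weigh the other criteria in the list as well.

```python
from typing import Optional


def suggest_task(target_kind: Optional[str]) -> str:
    """Suggest a data mining task from the kind of target attribute.

    target_kind: "categorical", "numeric", or None if there is no target.
    """
    if target_kind is None:
        # No target attribute: look for structure in the data instead
        # (clustering; association analysis is another option).
        return "clustering"
    if target_kind == "categorical":
        return "classification"
    if target_kind == "numeric":
        return "regression"
    raise ValueError(f"unknown target kind: {target_kind!r}")


# Example: predicting whether a customer responds to an email (yes/no).
print(suggest_task("categorical"))
```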

Question 9

Q

What are some ethical issues for data analysis?

Answer

A

Anonymizing data is difficult
Discrimination (by sex, religion, race)
Yet the same information can be acceptable to use in medical applications

Question 10

Q

What are the four risk levels in the European Artificial Intelligence Act (AI Act)?

Answer

A

Minimal risk: Systems such as spam filters face no obligations
Specific transparency risk: Systems such as chatbots must inform users that they are interacting with a machine
High risk: AI systems such as AI-based medical software or AI systems used for recruitment must comply with strict requirements
Unacceptable risk: AI systems that allow social scoring are considered a clear threat to people’s fundamental rights and are banned

Question 11

Q