Lecture 1 - CRISP-DM Overview Flashcards

1
Q

What does CRISP-DM stand for?

A

Cross Industry Standard Process for Data Mining

2
Q

What are the phases of CRISP-DM?

A
  1. Project understanding
  2. Data understanding
  3. Data preparation
  4. Modeling
  5. Evaluation
  6. Deployment
3
Q

What is done during the Project Understanding phase?

A
  • Problem formulation (objectives, benefits, assumptions).
  • Mapping the problem formulation to a data analysis task
  • Understanding the situation
4
Q

What does the 80-20 rule mean for project understanding and data understanding?

A

Project understanding and data understanding take only about 20% of the project time, but they account for about 80% of its success.

5
Q

What are typical problems in project understanding?

A
  • Communication problems between domain and data analysis experts
  • Project owner may not understand
  • Analysts often don’t understand the domain well
  • Project owner may not understand the produced models or how to use them
  • Formulating goals can be difficult
6
Q

How do you define the project objective?

A

State the objective, a concrete deliverable, and a measurable success criterion. Example:
Objective: Increase revenues
Deliverable: Software that automatically selects customers for emailing
Success criterion: Revenue increases by 5%
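The success criterion is what makes the objective testable. A minimal sketch, assuming revenue is compared against a baseline period measured before the emailing software was deployed (the function names and figures are hypothetical):

```python
def revenue_increase(baseline: float, current: float) -> float:
    """Relative revenue change; 0.05 means +5%."""
    if baseline <= 0:
        raise ValueError("baseline revenue must be positive")
    return (current - baseline) / baseline


def meets_success_criterion(baseline: float, current: float, target: float = 0.05) -> bool:
    """True if revenue grew by at least `target` (5% by default)."""
    return revenue_increase(baseline, current) >= target


# Hypothetical revenue figures, for illustration only
print(meets_success_criterion(100_000, 106_000))  # True (6% increase)
print(meets_success_criterion(100_000, 103_000))  # False (3% increase)
```

A plain before/after comparison like this ignores external factors (e.g. seasonal effects) that can also move revenue, which is why such assumptions are examined when assessing the situation.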

7
Q

What does assessing the situation mean?

A

Identifying the:
- requirements and constraints (laws, ethical issues such as discrimination by gender or race, technical constraints)
- assumptions (representativeness, informativeness, data quality, presence of external factors)

8
Q

How do you determine the analysis goal?

A

The primary objective must be translated into a more technical data mining goal by determining:
- the data mining task (classification, regression, clustering, associations)
- the required predictive accuracy
- the required model flexibility
- the need for interpretability
- runtime constraints
- interestingness and the use of expert knowledge
- whether the results must be understandable for the user
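As a rough illustration of the first step, the choice of data mining task often follows from whether there is a target variable and what kind of values it takes. A minimal sketch of that simplified rule of thumb (the function `suggest_task` and its labels are hypothetical, not part of CRISP-DM):

```python
from typing import Optional


def suggest_task(target_kind: Optional[str]) -> list[str]:
    """Suggest candidate data mining tasks from the kind of target variable.

    target_kind: "categorical", "numeric", or None when there is no target.
    A simplified rule of thumb, not a complete decision procedure.
    """
    if target_kind is None:
        # No target to predict: look for structure in the data itself
        return ["clustering", "associations"]
    if target_kind == "categorical":
        # Predict a class label (e.g. responds to email: yes/no)
        return ["classification"]
    if target_kind == "numeric":
        # Predict a numeric value (e.g. expected spending)
        return ["regression"]
    raise ValueError(f"unknown target kind: {target_kind!r}")
```

For example, predicting whether a customer will respond to an email (yes/no) gives `suggest_task("categorical")`, i.e. classification; the remaining criteria (accuracy, interpretability, runtime, ...) then narrow down which model to use for that task.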

9
Q

What are some ethical issues in data analysis?

A
  • Anonymizing data is difficult
  • Models may discriminate based on sex, religion, or race
  • Yet the same information can be acceptable to use in medical applications
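One reason anonymizing is difficult: deleting names is not enough, because combinations of the remaining attributes (quasi-identifiers such as ZIP code, birth year, sex) can still single out a person. The sketch below computes k-anonymity, a standard measure not covered on this card: the size of the smallest group of records sharing the same quasi-identifier values. The records and column names are made up for illustration.

```python
from collections import Counter


def k_anonymity(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Return k: the size of the smallest group of records that share
    identical values on all quasi-identifier columns.

    k == 1 means at least one person is unique on these columns and thus
    potentially re-identifiable, even though names were removed.
    """
    if not records:
        raise ValueError("no records")
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Made-up records with names already removed
records = [
    {"zip": "10115", "birth_year": 1980, "sex": "F", "diagnosis": "flu"},
    {"zip": "10115", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "10117", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "10117", "birth_year": 1975, "sex": "F", "diagnosis": "flu"},
]

print(k_anonymity(records, ["zip", "birth_year", "sex"]))  # 1
print(k_anonymity(records, ["zip", "birth_year"]))         # 2
```

Dropping the "sex" column raises k from 1 to 2, but at the cost of losing information; this trade-off between privacy and usefulness is what makes anonymization hard in practice.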
10
Q

What are the four risk levels in the European Artificial Intelligence Act (AI Act)?

A

Minimal risk: systems such as spam filters face no obligations.
Specific transparency risk: systems such as chatbots must inform users that they are interacting with a machine.
High risk: AI systems such as AI-based medical software or AI systems used for recruitment must comply with strict requirements.
Unacceptable risk: AI systems that allow social scoring are a clear threat to people's fundamental rights and are banned.
