Lecture 1 - CRISP-DM Overview Flashcards
What does CRISP-DM stand for?
Cross Industry Standard Process for Data Mining
What are the phases of CRISP-DM?
1. Project understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
What is done during the Project Understanding phase?
- Problem formulation (objectives, benefits, assumptions).
- Mapping the problem formulation to a data analysis task
- Understanding the situation
What does 80-20 mean for project understanding and data understanding?
Only about 20% of the project time is spent on project and data understanding, yet these phases carry about 80% of the importance for success.
What are the problems for project understanding?
- Communication problems between domain and data analysis experts
- Analysts often don’t understand the domain well
- Project owner may not understand the produced models or how to use them
- Formulating goals can be difficult
How to define the project objective?
Objective: Increase revenues
Deliverable: Software that automatically selects customers for emailing
Success criteria: revenue increases by 5%
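A success criterion like this is only useful if it can be checked after deployment. A minimal sketch of such a check, assuming revenue is compared between a baseline period and a later period; the class name, field names, and revenue figures are hypothetical and not part of the lecture:

```python
from dataclasses import dataclass


@dataclass
class ProjectObjective:
    """Hypothetical container for the three parts of a project objective."""
    objective: str
    deliverable: str
    min_relative_increase: float  # 0.05 encodes "revenue increases by 5%"

    def is_met(self, baseline_revenue: float, new_revenue: float) -> bool:
        """Return True if revenue grew by at least the required fraction."""
        if baseline_revenue <= 0:
            raise ValueError("baseline_revenue must be positive")
        relative_increase = (new_revenue - baseline_revenue) / baseline_revenue
        return relative_increase >= self.min_relative_increase


emailing_project = ProjectObjective(
    objective="Increase revenues",
    deliverable="Software that automatically selects customers for emailing",
    min_relative_increase=0.05,
)

# Made-up figures: a 6% increase meets the criterion, a 3% increase does not.
met_with_six_percent = emailing_project.is_met(1_000_000, 1_060_000)
met_with_three_percent = emailing_project.is_met(1_000_000, 1_030_000)
```

Writing the criterion as a number forces the project owner and the analyst to agree on what is measured and against which baseline.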
What does assessing the situation mean?
Finding out the
- requirements and constraints (laws, ethical issues such as discrimination by gender or race, technical constraints)
- assumptions (representativeness, informativeness, data quality, presence of external factors)
How to determine the analysis goal?
The primary objective must be transformed into a more technical data mining goal.
- Determine the data mining task (classification, regression, clustering, association analysis)
- Predictive accuracy
- Model flexibility
- Interpretability
- Runtime
- Interestingness and use of expert knowledge
- Should be understandable for the user
What are some ethical issues for data analysis?
- Anonymization is difficult
- Discrimination (by sex, religion, race)
- Yet the same information can be acceptable to use in medical applications
What are the four risk levels in the European Artificial Intelligence Act (AI Act)?
Minimal risk: Systems such as spam filters face no obligations
Specific transparency risk: Systems like chatbots must inform users that they are interacting with a machine
High risk: AI systems such as AI-based medical software or AI systems used for recruitment must comply with strict requirements
Unacceptable risk: AI systems that allow social scoring are a clear threat to people’s fundamental rights and are banned