Data Mining Flashcards
What is the first step in the CRISP-DM framework?
Business Understanding
What is the second step in the CRISP-DM framework?
Data Understanding
What is the third step in the CRISP-DM framework?
Data Preparation
What is the fourth step in the CRISP-DM framework?
Modelling
What is the fifth step in the CRISP-DM framework?
Evaluation
What is the final step in the CRISP-DM framework?
Deployment
What are the most important steps in the CRISP-DM model?
Data understanding and data preparation
What is a key aspect of data preparation?
Data reduction
What does data reduction do?
Removes unnecessary and misleading data
Reduces time taken for discovering knowledge
Improves quality of discovered knowledge
What are the main techniques in data reduction?
Feature selection
Instance selection
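The two techniques above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cards do not say which selection criteria are used, so the sketch assumes that feature selection drops constant (zero-variance) columns and that instance selection drops exact duplicate rows. The names select_features, select_instances and min_variance are hypothetical.

```python
from statistics import pvariance

def select_features(rows, min_variance=0.0):
    """Return the indices of columns whose variance exceeds min_variance.

    Assumption: a constant column carries no information for a model.
    """
    columns = list(zip(*rows))  # transpose rows into columns
    return [i for i, col in enumerate(columns) if pvariance(col) > min_variance]

def select_instances(rows):
    """Drop exact duplicate rows, keeping the first occurrence of each."""
    seen = set()
    kept = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            kept.append(row)
    return kept

# Toy data: column 1 is constant and row 2 duplicates row 0.
rows = [(1.0, 0.0, 3.0), (2.0, 0.0, 1.0), (1.0, 0.0, 3.0), (4.0, 0.0, 2.0)]
keep = select_features(rows)
reduced = [tuple(r[i] for i in keep) for r in select_instances(rows)]
```

Here keep holds the indices of the two informative columns and reduced holds three rows of two values each, so both the width and the length of the data set shrink.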
What is the definition of classification?
Given a set of (training) data, we find a model for the class feature as a function of the values of the other features
What is the goal of classification?
To assign new, previously unseen instances (i.e., real data) to a class as accurately as possible
What is an example of preprocessing?
When the data are images, separating the subject of interest from the background
What may cause errors in classification?
Insufficient training data
Too few or too many features
Overfitting (fitting the training data too closely, including its noise, so the model generalises poorly to new data)
How does the k-NN algorithm work?
First, locate the k nearest training instances to the new instance, typically using Euclidean distance
Then take a majority vote among them if the target is discrete; otherwise take the mean of their values
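The two k-NN steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name knn_predict, its parameters, and the toy data are all hypothetical, and ties in the vote are broken by whichever label was encountered first.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3, discrete=True):
    """Predict the target of `query` from its k nearest training instances."""
    # Step 1: rank training instances by Euclidean distance to the query.
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest = [y for _, y in ranked[:k]]
    # Step 2: majority vote for a discrete target, mean for a numeric one.
    if discrete:
        return Counter(nearest).most_common(1)[0][0]
    return sum(nearest) / len(nearest)

# Toy data (assumed for illustration): two clusters in two dimensions.
train_X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
labels = ["a", "a", "b", "b"]
values = [10.0, 12.0, 50.0, 55.0]

predicted_class = knn_predict(train_X, labels, (1.2, 1.4), k=3)
predicted_value = knn_predict(train_X, values, (1.2, 1.4), k=2, discrete=False)
```

With this data the three nearest neighbours of the query vote "a", "a", "b", so the predicted class is "a"; with k=2 and a numeric target, the prediction is the mean of 10.0 and 12.0.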