Data Mining - Lecture Introduction Flashcards
What is data mining?
The creative process of analysing data to produce results that are useful for decision making.
It can draw on statistics, machine learning and programming, among other fields, and is often applied to big data.
What are the 5 V’s of big data?
- Volume
- Velocity
- Variety
- Veracity
- Value
What is meant by volume in big data?
The amount of data being generated and stored.
What is meant by velocity in big data?
The speed at which data is being generated and changed.
What is meant by variety in big data?
The different types of data being generated (text, dates, numbers etc.)
What is meant by veracity in big data?
The accuracy or truthfulness of a dataset.
What is meant by value in big data?
Data only has value if it is turned into something useful.
What is an algorithm?
A specific procedure used to implement a particular data mining technique.
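To make the technique/algorithm distinction concrete: classification is a technique, and k-nearest neighbours (k-NN) is one algorithm that implements it (a decision tree would be another). The sketch below is a minimal, illustrative k-NN classifier; the function name `knn_classify`, the toy data and the choice k=3 are assumptions for illustration, not part of the lecture.

```python
from collections import Counter
from math import dist


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training records.

    `train` is a list of (features, label) pairs; features are tuples of numbers.
    """
    # Sort training records by Euclidean distance to the query point, keep the k closest
    neighbours = sorted(train, key=lambda rec: dist(rec[0], query))[:k]
    # Majority vote over the labels of those k records
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (height_cm, weight_kg) -> size label
train = [
    ((150, 50), "small"), ((155, 55), "small"), ((160, 58), "small"),
    ((180, 85), "large"), ((185, 90), "large"), ((190, 95), "large"),
]
print(knn_classify(train, (158, 56)))  # small
print(knn_classify(train, (183, 88)))  # large
```

Swapping `knn_classify` for a different algorithm would change how the classification is computed, but the technique (assigning a label to a new record) would stay the same.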
What is nominal data?
Data whose values serve only as labels (often textual). The values have no order or ranking.
What is ordinal data?
Like nominal data, but with a meaningful order between the values, e.g. hot > mild > cool.
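One way to see the difference in code is to model a nominal attribute as a plain `Enum` (labels that can only be tested for equality) and an ordinal attribute as an `IntEnum` (labels whose integer codes encode an order). This is a minimal sketch; the `Colour` and `Temperature` classes and their codes are hypothetical examples.

```python
from enum import Enum, IntEnum


# Nominal: values are labels only, so equality is the only meaningful comparison
class Colour(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


# Ordinal: values have an order (cool < mild < hot). The codes 1, 2, 3 only
# encode that order; arithmetic such as HOT - COOL is not meaningful,
# even though IntEnum would allow it.
class Temperature(IntEnum):
    COOL = 1
    MILD = 2
    HOT = 3


print(Colour.RED == Colour.RED)                                # True
print(Temperature.HOT > Temperature.MILD > Temperature.COOL)  # True
print(max([Temperature.MILD, Temperature.HOT, Temperature.COOL]).name)  # HOT

try:
    Colour.RED < Colour.BLUE  # plain Enum members do not support ordering
except TypeError:
    print("nominal values have no order")
```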
What is interval data?
Numerical data with an order and fixed, equal intervals between the values, but no true zero point: zero is arbitrary, so ratios are not meaningful (e.g. temperature in °C).
What is ratio data?
Numerical data with an order and fixed, equal intervals between the values, and also a true zero point meaning "none of the quantity", so ratios are meaningful (e.g. weight, temperature in kelvin).
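Temperature gives a compact illustration of interval versus ratio data: Celsius is an interval scale (0 °C is an arbitrary point), while kelvin is a ratio scale (0 K is a true zero). The sketch below assumes these two scales as the example; the helper `celsius_to_kelvin` is hypothetical.

```python
def celsius_to_kelvin(c):
    """Convert a temperature from degrees Celsius to kelvin."""
    return c + 273.15


# Interval scale (Celsius): differences are meaningful...
a_c, b_c = 10.0, 20.0
print(b_c - a_c)  # 10.0 degrees warmer

# ...but ratios are not, because 0 °C does not mean "no temperature".
# 20 °C is NOT twice as hot as 10 °C, even though the naive ratio says 2.0.
print(b_c / a_c)  # 2.0 (misleading on an interval scale)

# Ratio scale (kelvin): 0 K is a true zero, so the ratio is meaningful
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)
print(round(b_k / a_k, 3))  # 1.035
```

The same temperature difference shows up on both scales, but only the kelvin ratio reflects a real physical proportion.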
What are the 10 steps in the data mining process?
- Develop an understanding of the purpose of the project
- Obtain the dataset to be used in analysis
- Explore, clean and preprocess the data
- Reduce the data dimension, if necessary
- Determine the data mining task
- Partition the data into training, validation and test sets (for supervised tasks)
- Choose the data mining technique
- Use algorithms to perform the task
- Interpret the results of the algorithms
- Deploy the model (run it on real records)
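Step 6 (partitioning, for supervised tasks) is commonly done by randomly splitting the records so the model can be trained on one part and checked on data it has not seen. This is a minimal sketch; the function name `partition`, the 60/20/20 proportions and the fixed seed are assumptions chosen for illustration.

```python
import random


def partition(records, train_frac=0.6, valid_frac=0.2, seed=42):
    """Randomly split records into training, validation and test sets.

    Whatever remains after the training and validation shares becomes the test set.
    """
    shuffled = records[:]                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


records = list(range(100))  # hypothetical record IDs
train, valid, test = partition(records)
print(len(train), len(valid), len(test))  # 60 20 20
```

Shuffling before slicing matters: if the records were stored in some order (e.g. by date or by class), slicing without shuffling would give partitions that are not representative of the whole dataset.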
What is CRISP-DM?
Cross Industry Standard Process for Data Mining.
Like the 10-step process, it lays out the phases of a data mining project:
- Business understanding
- Data understanding
- Data preparation
- Model building
- Testing and evaluating
- Deployment
What is SEMMA?
Sample, Explore, Modify, Model, Assess: a process model developed by SAS that also gives steps for the process:
- Sample
- Explore
- Modify
- Model
- Assess
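It is a cycle. Its steps roughly correspond to CRISP-DM's steps 2-5 (data understanding through testing and evaluating); unlike CRISP-DM, it does not cover business understanding or deployment.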