Part 2: Chapter 1, 2, 3 Flashcards
Data
A collection of facts; the lowest level of abstraction.
Nominal vs Ordinal
Nominal: categories with no inherent order (e.g. colors). Ordinal: categories with a meaningful order (e.g. low/high).
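A minimal sketch of the difference, assuming Python and made-up sample values. The names `colors`, `levels`, and `LEVEL_RANK` are illustrative only: ordinal values can be sorted by rank, while nominal values can only be counted or compared for equality.

```python
from collections import Counter

# Nominal: labels only; equality and counting are the meaningful operations.
colors = ["red", "green", "blue", "green"]
most_common_color = Counter(colors).most_common(1)[0][0]

# Ordinal: labels with a rank. This rank mapping is an assumption for the example.
LEVEL_RANK = {"low": 0, "high": 1}
levels = ["high", "low", "high", "low"]

# Sorting by rank is meaningful for ordinal data.
sorted_levels = sorted(levels, key=LEVEL_RANK.__getitem__)

print(most_common_color)
print(sorted_levels)
```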
What is CRISP-DM?
Cross-Industry Standard Process for Data Mining.
Which six steps make up CRISP-DM?
- Business understanding
- Data understanding
- Data preparation
- Model building
- Testing and evaluation
- Deployment
(The first three steps take about 85% of the total time.)
What is SEMMA? (Last three steps of CRISP-DM)
Sample, Explore, Modify, Model, Assess.
What is data mining?
The process of discovering new, valuable knowledge in databases.
Which two types of data mining are there?
Hypothesis-driven (classical statistical)
Discovery-driven (exploring while looking at data)
What are two similarities between DM and statistics?
Scientific fields
Complex processes for learning from data
What are four differences between DM and statistics?
Purpose of data (mostly operational processes)
Amount of data (Big Data)
Data analysis (DM result should be easy to understand)
The 5 V's of Big Data
- Volume
- Variety
- Velocity
- Veracity
- Value
Three principles of logic.
- Deduction
- Abduction
- Induction
What is deduction?
From a rule and a case to a result: all birds fly; Koko is a bird; therefore Koko flies.
What is abduction?
From a rule and a result to a plausible case: all birds fly; Koko flies; therefore Koko is (probably) a bird.
What is induction?
From cases and results to a general rule: Koko is a bird and Koko flies; Tweety is a bird and Tweety flies; therefore all birds fly.
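The three kinds of reasoning can be sketched with the bird example. This is an illustrative toy, not a formal logic engine, and the function names are made up for the example. Note that only deduction gives a guaranteed conclusion; abduction and induction can both be wrong.

```python
# Observations: name -> (is_bird, can_fly)
observations = {"Koko": (True, True), "Tweety": (True, True)}

def induce_all_birds_fly(obs):
    """Induction: generalise from observed birds to a rule (may later be refuted)."""
    bird_results = [can_fly for is_bird, can_fly in obs.values() if is_bird]
    return bool(bird_results) and all(bird_results)

def deduce_can_fly(all_birds_fly, is_bird):
    """Deduction: rule + case gives a guaranteed result."""
    return all_birds_fly and is_bird

def abduce_is_bird(all_birds_fly, can_fly):
    """Abduction: rule + result gives a plausible, not certain, explanation."""
    return all_birds_fly and can_fly

rule = induce_all_birds_fly(observations)
print(rule, deduce_can_fly(rule, True), abduce_is_bird(rule, True))
```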
What is selection bias?
Rejecting inference in different communication channels.
Make sure that your dataset is a good representation of the real world.
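A small sketch of how selection bias distorts a result, assuming a hypothetical customer population reached through two channels. All names and numbers are invented for illustration: sampling only one channel gives a mean that does not represent the whole population.

```python
from statistics import mean

# Hypothetical population: (contact_channel, monthly_spend). Values are made up.
population = [
    ("online", 80), ("online", 90), ("online", 100),
    ("phone", 20), ("phone", 30), ("phone", 40),
]

# Mean over the whole population.
population_mean = mean(spend for _, spend in population)

# Biased sample: only customers reached through the online channel.
online_only = [spend for channel, spend in population if channel == "online"]
biased_mean = mean(online_only)

print(population_mean, biased_mean)
```

The gap between the two means is the effect of the unrepresentative sample.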