data mining Flashcards
1
Q
what is data mining?
A
extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases
2
Q
big data must account for:
A
- volume
- variety
- velocity
- veracity
3
Q
PAC (probably approximately correct) learning idea
A
any hypothesis h that is consistent with a sufficiently large number of training examples is unlikely to be seriously wrong
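"Sufficiently large" can be made concrete. A standard result for a learner that outputs a hypothesis consistent with the training data, drawn from a finite hypothesis space H, is that m >= (1/epsilon)(ln|H| + ln(1/delta)) examples are enough for the chosen hypothesis to have true error at most epsilon with probability at least 1 - delta. The sketch below just evaluates that bound; the function name `pac_sample_bound` and the example values are illustrative assumptions, and the bound does not cover infinite hypothesis spaces.

```python
import math

def pac_sample_bound(h_size: int, epsilon: float, delta: float) -> int:
    """Training examples sufficient for a consistent learner over a finite
    hypothesis space H: m >= (1/epsilon) * (ln|H| + ln(1/delta)).

    With at least this many i.i.d. examples, the probability that some
    hypothesis consistent with all of them has true error > epsilon
    is at most delta.
    """
    if h_size < 1:
        raise ValueError("|H| must be at least 1")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / epsilon)

# Hypothetical example: |H| = 3**10 (e.g. conjunctions over 10 boolean
# attributes), true error at most 10% with 95% confidence.
print(pac_sample_bound(3 ** 10, epsilon=0.1, delta=0.05))  # 140
```

Note how the required m grows only logarithmically in |H| and 1/delta but linearly in 1/epsilon, so demanding a smaller error is what makes the sample expensive.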
4
Q
what are the 9 steps of the KDD (knowledge discovery in databases) process?
A
- Developing an understanding of the application domain (the task, relevant prior knowledge, the end user's goals)
- Creating a target dataset on which the discovery is going to be performed (selection)
- Data cleaning and pre-processing (can take 60% of the effort)
- Data reduction and projection: finding the useful features to represent data (transformation)
- Matching the goals of the process with a data mining method (regression, classification, etc.)
- Exploratory analysis and model/hypothesis selection: choosing the data mining algorithm(s) and the method used to search for patterns
- Data mining: searching for patterns of interest
- Interpreting mined patterns
- Acting on / using the discovered knowledge
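The middle steps of this process can be sketched as a chain of small functions. This is a minimal illustration under assumptions: the records, field names and helper functions are hypothetical, and frequent item-pair counting is swapped in as one possible mining method (chosen because the assumed goal is finding items bought together). Steps 1 (domain understanding) and 9 (acting on the knowledge) are human activities and are not coded.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records; field names are illustrative assumptions.
RAW = [
    {"id": 1, "region": "north", "items": ["bread", "milk", "milk"]},
    {"id": 2, "region": "north", "items": ["bread", "butter", "milk"]},
    {"id": 3, "region": "south", "items": ["beer", "chips"]},
    {"id": 4, "region": "north", "items": None},  # missing value
    {"id": 5, "region": "north", "items": ["bread", "milk", "eggs"]},
    {"id": 2, "region": "north", "items": ["bread", "butter", "milk"]},  # duplicate
]

def select(records, region):
    """Step 2 (selection): build the target dataset."""
    return [r for r in records if r["region"] == region]

def clean(records):
    """Step 3 (cleaning): drop records with missing baskets or repeated ids."""
    seen, out = set(), []
    for r in records:
        if not r["items"] or r["id"] in seen:
            continue
        seen.add(r["id"])
        out.append(r)
    return out

def transform(records):
    """Step 4 (reduction/projection): keep only the set of items per basket."""
    return [frozenset(r["items"]) for r in records]

def mine(baskets, min_support):
    """Steps 5-7: count item pairs and keep those in >= min_support baskets."""
    counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    return {pair: n for pair, n in counts.items() if n >= min_support}

def interpret(patterns, n_baskets):
    """Step 8 (interpretation): report each pattern with its support."""
    return [
        f"{a} & {b}: support {n}/{n_baskets}"
        for (a, b), n in sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    ]

baskets = transform(clean(select(RAW, "north")))
for line in interpret(mine(baskets, min_support=2), len(baskets)):
    print(line)  # bread & milk: support 3/3
```

Keeping each step a separate function mirrors the iterative nature of KDD: if the interpreted patterns are unhelpful, one earlier step (e.g. the selection criterion or `min_support`) can be changed and the chain rerun.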