Data Mining (P2) Flashcards
Data mining
The analysis of large datasets to uncover previously unknown patterns, trends or relationships
3 pros of data mining
- Helps to understand behaviors and hidden trends/patterns
- Helps in detecting risk and fraud
- Helps analyze large amounts of data very quickly
3 cons of data mining
- Can reveal insights that compromise anonymity and privacy
- Expensive
- Requires a large amount of data
Cluster analysis
It uses unsupervised learning.
It divides the data into groups (“clusters”) whose members share similar characteristics, without making any assumptions about what the different clusters mean.
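The flashcard does not name a specific algorithm. k-means is one common clustering method and is used here purely as an illustration. This is a minimal sketch assuming 2-D numeric points and a number of clusters k chosen in advance: k-means does need k, even though it makes no assumptions about what each cluster means. The function name kmeans and the sample points are hypothetical.

```python
import math

def kmeans(points, k, max_iterations=100):
    """Group 2-D points into k clusters with plain k-means.

    Assumption: the first k points are used as the starting centroids so the
    result is deterministic; real tools usually use random or k-means++ starts.
    """
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the clusters are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


# Hypothetical unlabelled data: no "correct" group is given for any point
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
```

Calling kmeans(points, 2) splits the sample into the three points near (1, 1) and the three near (8.5, 8.5). The algorithm only finds the groups; deciding what each group means is left to the analyst.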
Classification
It uses supervised learning.
A classifier (model) is developed and trained on training datasets in which each record is labelled with its known output. When new data is entered into the model, it is compared against the patterns learned from those known outputs, and a prediction is made based on this comparison.
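k-nearest neighbours (k-NN) is one simple supervised classifier that makes the "compare new data against labelled training data" idea very literal. It is swapped in here for illustration and is not named by the flashcards. This is a minimal sketch assuming numeric feature tuples. The function knn_predict and the t-shirt-size training data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(training_data, new_point, k=3):
    """Predict the label of new_point by majority vote of its k nearest
    labelled training examples (k-nearest neighbours)."""
    # Compare the new data against every training example (Euclidean distance)
    nearest = sorted(training_data, key=lambda row: math.dist(row[0], new_point))[:k]
    # The known outputs of the closest examples vote on the prediction
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (height_cm, weight_kg) -> known t-shirt size
training = [
    ((150, 50), "small"), ((160, 55), "small"), ((155, 52), "small"),
    ((180, 85), "large"), ((185, 90), "large"), ((178, 80), "large"),
]
```

For example, knn_predict(training, (158, 53)) predicts "small" and knn_predict(training, (182, 86)) predicts "large". An odd k avoids tied votes when there are only two classes.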
Association Analysis
It uses unsupervised learning.
Breaks the dataset down into variables (e.g. gender, age) and looks for relationships and dependencies between those variables.
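The sketch below is a simplified form of association rule mining. It only considers pairs of variable values and scores each rule "A -> B" with the standard support (how often A and B occur together) and confidence (how often B occurs when A does). Full algorithms such as Apriori also handle larger item sets. It assumes each record is a dict of variable -> value. The function pair_rules, the thresholds and the sample records are hypothetical.

```python
from itertools import combinations

def pair_rules(records, min_support=0.3, min_confidence=0.7):
    """Find rules "A -> B" between variable values that often co-occur.

    Each record is a dict such as {"age": "18-25", "product": "games"};
    every "variable=value" pair is treated as one item.
    """
    transactions = [{f"{k}={v}" for k, v in r.items()} for r in records]
    n = len(transactions)

    def support(items):
        # Fraction of records that contain every item in `items`
        return sum(1 for t in transactions if items <= t) / n

    all_items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(all_items, 2):
        both = support({a, b})
        if both < min_support:
            continue  # the pair is too rare to be interesting
        # A dependency can hold in one direction but not the other
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, both, confidence))
    return rules


# Hypothetical customer records
records = [
    {"age": "18-25", "product": "games"},
    {"age": "18-25", "product": "games"},
    {"age": "18-25", "product": "games"},
    {"age": "40-60", "product": "books"},
    {"age": "40-60", "product": "games"},
]
```

On this sample, pair_rules(records) finds "age=18-25 -> product=games" with confidence 1.0 (every 18-25 customer bought games) and the reverse rule with confidence 0.75 (three of the four games buyers are 18-25).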
Link analysis
It uses unsupervised learning.
Essentially, it establishes relationships/associations between different entities within a dataset. It starts by deciding on the criteria for creating a link between entities, then determines which entities meet the criteria, which do not, and to what extent.
Often used in criminal investigations.
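The three steps on this card (choose the link criteria, check which entities meet them, measure to what extent) can be sketched as follows. This assumes entities are stored as dicts of attributes, treats "shares a value on a chosen attribute" as the link criterion, and takes link strength to be simply the number of criteria a pair meets. The function find_links and the sample suspect data are hypothetical.

```python
from itertools import combinations

def find_links(entities, criteria, min_strength=1):
    """Link pairs of entities that share values on the chosen criteria.

    entities maps a name to a dict of attributes; criteria lists the
    attribute names that count as evidence of a link. Returns
    (entity_a, entity_b, strength, shared_criteria) tuples, where strength is
    how many criteria the pair meets. Pairs below min_strength stay unlinked.
    """
    links = []
    for a, b in combinations(sorted(entities), 2):
        shared = [
            c for c in criteria
            if entities[a].get(c) is not None
            and entities[a].get(c) == entities[b].get(c)
        ]
        if len(shared) >= min_strength:
            links.append((a, b, len(shared), shared))
    return links


# Hypothetical investigation data
people = {
    "Alice": {"phone": "555-0101", "address": "12 High St", "car": "AB12 CDE"},
    "Bob": {"phone": "555-0101", "address": "9 Mill Rd"},
    "Carol": {"phone": "555-0199", "address": "12 High St", "car": "AB12 CDE"},
    "Dan": {"phone": "555-0142", "address": "3 Park Ln"},
}
```

With criteria ["phone", "address", "car"], Alice and Bob are linked by one criterion (a shared phone), Alice and Carol by two (address and car), and Dan is not linked to anyone. Raising min_strength keeps only the stronger links.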
Deviation Detection
It uses unsupervised learning.
This data mining technique’s purpose is to detect anomalies and outliers in a dataset.
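One common way to flag outliers is the z-score: how many standard deviations a value lies from the mean. The flashcard does not specify a method, so this is one illustrative choice. This is a minimal sketch using Python's standard statistics module. The function find_outliers, its threshold and the sample payment amounts are hypothetical.

```python
import statistics

def find_outliers(values, threshold=3.0):
    """Return (index, value) pairs whose z-score exceeds threshold.

    The z-score is how many (population) standard deviations a value lies
    from the mean of the dataset.
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # every value is identical, so nothing deviates
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Hypothetical payment amounts with one suspicious transaction
amounts = [20, 22, 19, 21, 23, 20, 500]
```

Here find_outliers(amounts, threshold=2.0) returns [(6, 500)]. With only seven values, the single outlier inflates the standard deviation so much that its own z-score is only roughly 2.4, so the default threshold of 3 would miss it. For small samples, median-based methods are often more robust.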