Data Mining (P2) Flashcards

1
Q

Data mining

A

The analysis of large datasets to uncover previously unknown patterns, trends or relationships

2
Q

3 pros of data mining

A
  • Helps to understand behaviors and hidden trends/patterns
  • Helps in detecting risk and fraud
  • Helps analyze large amounts of data very quickly
3
Q

3 cons of data mining

A
  • Can reveal insights that compromise anonymity and privacy
  • Expensive
  • Requires a large amount of data
4
Q

Cluster analysis

A

It uses unsupervised learning.

It divides the data into groups (“clusters”) that have similar characteristics, without making any assumptions about what the different clusters mean.
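
A minimal sketch of one common clustering method, k-means, is shown below to make the idea concrete. The card does not name a specific algorithm, so the choice of k-means, the sample points, and seeding the centroids with the first k points are all assumptions made for illustration.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Group points into k clusters; no labels are supplied (unsupervised)."""
    centroids = list(points[:k])  # assumption: seed with the first k points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return clusters

# Hypothetical 2-D data with two visibly separate groups
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
clusters = k_means(points, k=2)
```

The function only returns the groups; deciding what each group represents is left to the analyst, which matches the "no assumptions" point above.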

5
Q

Classification

A

It uses supervised learning.

A classifier (model) is developed and trained with training datasets. When new data is entered into the model, it is compared against the known outputs stored in the training datasets, and a prediction is made from that comparison.
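
As a rough illustration, a k-nearest-neighbours classifier follows this description quite literally: it keeps the labelled training examples and compares each new item against them. The card does not name an algorithm, so k-NN, the feature values, and the labels below are assumptions.

```python
from collections import Counter
from math import dist

def knn_predict(training, new_point, k=3):
    """Predict a label by majority vote among the k closest training examples."""
    nearest = sorted(training, key=lambda example: dist(example[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set: (features, known output)
# Features are (height_cm, weight_kg); the labels are made up.
training = [
    ((150, 50), "small"), ((155, 55), "small"), ((160, 58), "small"),
    ((180, 85), "large"), ((185, 90), "large"), ((190, 95), "large"),
]

prediction = knn_predict(training, (158, 54))  # "small"
```

The known labels in `training` are what make this supervised: without them there would be nothing to vote on.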

6
Q

Association Analysis

A

It uses unsupervised learning.

Breaks the dataset up into different variables (e.g. gender, age) and tries to find relationships and dependencies between them.
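
One common way to quantify such relationships is with support and confidence, the measures used in market-basket analysis. The sketch below assumes that framing; the transactions and item names are hypothetical and not taken from the card.

```python
# Hypothetical shopping baskets; each set is one transaction
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent) / support(antecedent)

# How often do bread and butter appear together, and how often does
# a basket with bread also contain butter?
bread_butter_support = support({"bread", "butter"})
bread_to_butter = confidence({"bread"}, {"butter"})
```

A rule such as "bread implies butter" is usually kept only when both numbers clear thresholds chosen by the analyst.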

7
Q

Link analysis

A

It uses unsupervised learning.

Essentially, it establishes relationships/associations between different entities within a dataset. It starts by deciding what the criteria are for creating a link between entities, then checks which entities meet the criteria, which do not, and to what extent.

Often used in criminal investigations.
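
The steps above can be sketched in a few lines. In this example the linking criterion is "shares at least one identifier" and the link strength is the number of shared identifiers; both choices, along with the names and records, are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical records: each entity and the identifiers associated with it
records = {
    "Alice": {"555-0101", "12 Oak St", "acct-77"},
    "Bob": {"555-0101", "acct-77"},
    "Carol": {"9 Elm Rd"},
    "Dave": {"12 Oak St"},
}

def find_links(records, min_shared=1):
    """Link two entities when they share at least min_shared identifiers."""
    links = {}
    for a, b in combinations(sorted(records), 2):
        shared = records[a] & records[b]
        if len(shared) >= min_shared:    # the linking criterion
            links[(a, b)] = len(shared)  # extent to which the pair meets it
    return links

links = find_links(records)
# links == {("Alice", "Bob"): 2, ("Alice", "Dave"): 1}
```

Carol shares nothing with anyone, so she ends up with no links, which is the "which do not" case from the card.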

8
Q

Deviation Detection

A

It uses unsupervised learning.

The purpose of this data mining technique is to detect anomalies and outliers in a dataset.
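
A simple way to flag outliers is the z-score: how many standard deviations a value lies from the mean. The sketch below assumes that method and a cut-off of 2.0, neither of which comes from the card; the readings are hypothetical.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical readings with one obviously unusual value
readings = [10, 12, 11, 13, 12, 11, 10, 95]
outliers = find_outliers(readings)  # [95]
```

On small samples a single extreme value inflates the standard deviation, so a more robust measure (for example one based on the median) may be preferable in practice.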
