Data Mining (P2) Flashcards

Question 1

Q

Data mining

Answer

A

The analysis of large datasets to uncover previously unknown patterns, trends or relationships

Question 2

Q

3 pros of data mining

Answer

A

Helps to understand behaviors and hidden trends/patterns
Helps in detecting risk and fraud
Helps analyze large amounts of data very quickly

Question 3

Q

3 cons of data mining

Answer

A

Can reveal insights that compromise anonymity and privacy
Can be expensive to set up and run
Requires a large amount of data

Question 4

Q

Cluster analysis

Answer

A

It uses unsupervised learning.

It divides the data into groups (“clusters”) that have similar characteristics, without making any assumptions about what the different clusters mean.
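As a rough illustration of grouping without labels, here is a minimal sketch of 1-D k-means, one common clustering method (the card does not name a specific algorithm). The `ages` data, the starting centers, and the function name `kmeans_1d` are all made up for this example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then re-average."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Index of the center closest to this value.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical data: no labels are given, only the raw values.
ages = [18, 20, 22, 45, 47, 50]
centers, clusters = kmeans_1d(ages, centers=[18, 50])
print(clusters)
```

The algorithm only finds the groups; deciding that one cluster means "students" and the other "mid-career" is an interpretation added afterwards.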

Question 5

Q

Classification

Answer

A

Classification is a supervised learning task where the goal is to assign labels or categories to new data based on patterns learned from training data.
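A minimal sketch of the idea, assuming a 1-nearest-neighbour rule (just one of many possible classifiers, chosen here for brevity). The training examples, the labels "low"/"high", and the function name `predict_1nn` are hypothetical.

```python
def predict_1nn(training, point):
    """Return the label of the training example closest to `point`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(training, key=lambda example: sq_dist(example[0], point))
    return label


# Labelled training data: (features, label). This is what makes it supervised.
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 9.0), "high"),
    ((9.0, 8.5), "high"),
]

# Assign a label to a new, unseen point.
prediction = predict_1nn(training, (7.5, 8.0))
print(prediction)
```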

Question 6

Q

Association Analysis

Answer

A

It uses unsupervised learning.

It breaks the data set up into different variables (e.g. gender, age) and tries to find relationships and dependencies between them.
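One common way to quantify such relationships is with co-occurrence counts and rule confidence, as in market-basket analysis. The sketch below assumes that setting; the `baskets` data and the helper `confidence` are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is the items seen together in one record.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

# How often each item, and each pair of items, appears.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def confidence(antecedent, consequent):
    """Share of records containing `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


print(confidence("butter", "bread"))  # every butter basket also has bread
print(confidence("bread", "butter"))  # but not every bread basket has butter
```

The asymmetry between the two results is the "dependency" part: butter predicts bread more strongly than bread predicts butter in this toy data.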

Question 7

Q

Link analysis

Answer

A

It uses unsupervised learning.

Essentially, it establishes relationships/associations between different entities within a data set. It starts by deciding on the criteria for creating a link between entities, then checks which entities meet the criteria, which do not, and to what extent.

Often used in criminal investigations.
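The steps above can be sketched as follows, assuming the link criterion is "two entities share at least one contacted phone number". The `contacts` records and the function name `find_links` are hypothetical.

```python
from itertools import combinations

# Hypothetical records: entity -> set of phone numbers it has contacted.
contacts = {
    "A": {"555-01", "555-02"},
    "B": {"555-02", "555-03"},
    "C": {"555-09"},
    "D": {"555-01", "555-02", "555-03"},
}


def find_links(records, min_shared=1):
    """Link two entities if they share at least `min_shared` numbers.

    The value stored for each link is the number of shared items,
    i.e. the extent to which the pair meets the criterion.
    """
    links = {}
    for a, b in combinations(sorted(records), 2):
        shared = len(records[a] & records[b])
        if shared >= min_shared:
            links[(a, b)] = shared
    return links


links = find_links(contacts)
print(links)
```

Entity C ends up with no links, which is itself a result: it does not meet the criterion with anyone.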

Question 8

Q

Deviation Detection

Answer

A

It uses unsupervised learning.

The purpose of this data mining technique is to detect anomalies and outliers in a dataset.
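A minimal sketch, assuming a simple z-score rule (flag values more than two standard deviations from the mean). This is only one possible approach; the `readings` data and the threshold of 2.0 are assumptions for the example.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds `threshold`."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical readings with one obviously unusual value.
readings = [10, 11, 9, 10, 12, 10, 95]
outliers = find_outliers(readings)
print(outliers)
```

A z-score rule is sensitive to the outliers themselves, since they inflate the mean and standard deviation, so more robust measures are often preferred on real data.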

(8 cards)