Lecture 5 (Data Mining) Flashcards

1
Q

Why Data Mining?

A

More intense competition
Recognition of the value in data sources
Availability of quality data on customers, vendors, transactions

2
Q

Definition of Data Mining?

A

The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases.

3
Q

Data Mining Characteristics and Objectives?

A

The source of data for DM is often a consolidated data warehouse
DM environment is usually a client-server or a Web-based information system architecture
Data is the most critical ingredient for DM, and it may include soft/unstructured data
The miner is the end user

4
Q

How does data mining work?

A

DM extracts patterns from data

5
Q

Types of patterns in data mining?

A

Association
Prediction
Cluster
Sequential

6
Q

Association methods?

A

Market-basket
Link analysis
Sequence analysis
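
Market-basket analysis is usually summarised with two measures, support and confidence. The sketch below is only an illustration: the transactions are invented, and the helper names `support` and `confidence` are hypothetical rather than taken from the lecture.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


# Count how often each pair of items appears in the same basket.
pair_counts = Counter(
    pair
    for basket in transactions
    for pair in combinations(sorted(basket), 2)
)

bread_milk_support = support({"bread", "milk"}, transactions)
bread_to_milk = confidence({"bread"}, {"milk"}, transactions)
```

A rule such as "bread implies milk" is typically kept only when both measures exceed thresholds chosen by the analyst.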

7
Q

Prediction methods?

A

Classification
Regression
Time Series
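
As a rough illustration of time-series prediction, the sketch below forecasts the next value as the mean of the most recent observations. The moving average is a deliberately simple stand-in, not a method named in the lecture, and the sales figures are made up.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]
    return sum(recent) / window


# Hypothetical monthly sales, invented for illustration only.
monthly_sales = [10.0, 12.0, 14.0, 16.0]
next_month = moving_average_forecast(monthly_sales, window=3)
```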

8
Q

Segmentation methods?

A

Clustering

Outlier analysis
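
One common way to perform clustering is k-means. The lecture does not name a specific algorithm, so treat the following as a minimal sketch on one-dimensional data with hand-picked starting centroids; the function name and the data are hypothetical.

```python
def k_means_1d(values, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(value - centroids[i]),
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical measurements forming two visible groups.
data = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
final_centroids, final_clusters = k_means_1d(data, centroids=[1.0, 2.0])
```

No labels are supplied here; the groups emerge only from the distances between values.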

9
Q

Supervised Learning problems?

A

Classification

  • The domain of the target is finite and categorical
  • A classifier must assign a class to an unseen example

Regression

  • The target attribute takes values from an infinite (continuous) domain
  • The goal is to fit a model that learns the output target attribute as a function of the input attributes

Time Series Analysis
- Making predictions in time
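
To make the regression case concrete, here is a minimal sketch that fits a straight line by ordinary least squares and then predicts a continuous target for an unseen input. The data and the function name `fit_line` are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training examples: input attribute and numeric target.
inputs = [1.0, 2.0, 3.0, 4.0]
targets = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(inputs, targets)
prediction = slope * 5.0 + intercept  # continuous output for an unseen input
```

A classifier would instead return one label from a finite set of classes.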

10
Q

Unsupervised Learning Problems?

A

Clustering
Association Rules
Pattern Mining
- It is adopted as a more general term than frequent pattern mining or association mining

Outlier Detection
- It is the process of finding data examples whose behaviour is very different from what is expected
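
A simple way to flag examples that differ strongly from the expectation is a z-score rule. This is only one possible technique, sketched under the assumption that "expectation" means the sample mean; the threshold of 2.0 and the readings are invented.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical sensor readings with one suspicious value.
readings = [10, 11, 9, 10, 12, 10, 50]
outliers = find_outliers(readings)
```

Because the outlier itself inflates the mean and the standard deviation, robust variants based on the median are often preferred in practice.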

11
Q

Data Mining Applications?

A
Customer Relationship Management
Banking and Other Financial
Retailing and Logistics
Manufacturing and Maintenance
Brokerage and Securities Trading
Insurance
Computer Hardware and Software
Science and Engineering
Government and Defense
Homeland security and law enforcement
Travel, entertainment, sports
Healthcare and medicine
Sports, virtually everywhere
12
Q

Customer Relationship Management?

A

Maximize return on marketing campaigns
Improve customer retention
Maximize customer value
Identify and treat most valued customers

13
Q

Banking and Other Financial?

A

Automate the loan application process
Detecting fraudulent transactions
Maximize customer value
Optimizing cash reserves with forecasting

14
Q

Retailing and Logistics?

A

Optimize inventory levels at different locations
Improve the store layout and sales promotions
Optimize logistics by predicting seasonal effects
Minimize losses due to limited shelf life

15
Q

Manufacturing and Maintenance?

A

Predict/prevent machinery failures
Identify anomalies in production systems to optimize the use of manufacturing capacity
Discover novel patterns to improve product quality

16
Q

Brokerage and Securities Trading?

A

Predict changes on certain bond prices
Forecast the direction of stock fluctuations
Assess the effect of events on market movements
Identify and prevent fraudulent activities in trading

17
Q

Insurance?

A

Forecast claim costs for better business planning
Determine the optimal rate plans
Optimize marketing to specific customers
Identify and prevent fraudulent claim activities

18
Q

Data mining process?

A

A manifestation of best practices
A systematic way to conduct DM projects
Moving from art to science for DM projects
Everybody has a different version of the process

19
Q

Most common standard processes of Data Mining?

A

CRISP-DM
SEMMA
KDD

20
Q

CRISP-DM?

A

Cross Industry Standard Process for Data Mining

Proposed in the 1990s by a European consortium

21
Q

Steps of CRISP-DM?

A
Business Understanding
Data Understanding
Data Preparation
Model Building
Testing and Evaluation
Deployment
22
Q

SEMMA?

A
Sample
Explore
Modify
Model
Assess
23
Q

KDD?

A

Knowledge Discovery in Databases

24
Q

Steps to KDD?

A
Data selection
Data cleaning
Data transformation
Data mining
Internalization
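
The first four KDD steps can be pictured as a chain of small functions. The sketch below is purely illustrative: the records, the field names, and the choice of min-max scaling and per-group averaging are all assumptions, and the final step (internalization) is left to the human reader of the result.

```python
# Hypothetical raw records, invented for illustration only.
raw = [
    {"customer": "a", "region": "north", "spend": 120.0, "note": "x"},
    {"customer": "b", "region": "south", "spend": None, "note": "y"},
    {"customer": "c", "region": "north", "spend": 80.0, "note": "z"},
    {"customer": "d", "region": "south", "spend": 200.0, "note": "w"},
]


def select(records, fields):
    """Data selection: keep only the fields relevant to the question."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records, field):
    """Data transformation: min-max scale one numeric field to [0, 1]."""
    values = [r[field] for r in records]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero when all values match
    return [{**r, field: (r[field] - low) / span} for r in records]


def mine(records, group_field, value_field):
    """Data mining: a very simple pattern, the average value per group."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_field], []).append(r[value_field])
    return {g: sum(v) / len(v) for g, v in groups.items()}


selected = select(raw, ["region", "spend"])
cleaned = clean(selected)
scaled = transform(cleaned, "spend")
pattern = mine(scaled, "region", "spend")
```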
25
Q

Examples of Classification Task?

A

Predicting tumor cells as benign or malignant
Classifying credit card transactions as legitimate or fraudulent
Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil

26
Q

Classification Techniques?

A
Decision tree based methods
Rule-based methods
Neural Networks
Naive Bayes and Bayesian Belief Networks
Support Vector Machines
27
Q

Pros of KNN?

A

Simple
Flexible
Excellent performance on a wide range of tasks

28
Q

Cons of KNN?

A

Time consuming: each prediction compares the query with all n training points
Memorization, not learning.
No insight into the domain
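
A minimal k-nearest-neighbours sketch makes both lists concrete: the code is short (simple), but every prediction scans all n stored points and there is no training step beyond storing them (memorization). The data, labels, and the name `knn_predict` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Distances to ALL training points are computed for every query.
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labelled points: (features, class label).
training_points = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B"),
]

label_near_a = knn_predict(training_points, (2, 2))
label_near_b = knn_predict(training_points, (7, 9))
```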

29
Q

Assessment Methods for Classification?

A

Predictive accuracy
- Hit rate

Speed
- Model building versus predicting/usage speed
Robustness
Scalability
Interpretability
30
Q

In classification problems, the primary source for accuracy estimation is the?

A

Confusion Matrix
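
For a two-class problem, the confusion matrix counts true positives, false positives, true negatives, and false negatives, and accuracy follows directly from those counts. The sketch below uses invented labels and a hypothetical helper name to show the arithmetic.

```python
def confusion_counts(actual, predicted, positive):
    """Return (tp, fp, tn, fn) for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive and a != positive:
            fp += 1
        elif p != positive and a != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Hypothetical ground truth and classifier output.
actual = ["fraud", "legit", "legit", "fraud", "legit", "legit", "fraud", "legit"]
predicted = ["fraud", "legit", "fraud", "legit", "legit", "legit", "fraud", "legit"]

tp, fp, tn, fn = confusion_counts(actual, predicted, positive="fraud")
accuracy = (tp + tn) / (tp + fp + tn + fn)  # the "hit rate"
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

Accuracy alone can mislead when classes are imbalanced, which is why precision and recall are usually read from the same matrix.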