Lecture 5 (Data Mining) Flashcards by Deivis D

Why Data Mining?

More intense competition
Recognition of the value in data sources
Availability of quality data on customers, vendors, transactions

Definition of Data Mining?

The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases.

Data Mining Characteristics and Objectives?

The source of data for DM is often a consolidated data warehouse
The DM environment is usually a client-server or a Web-based information system architecture
Data is the most critical ingredient for DM, and it may include soft/unstructured data
The miner is the end user

How does data mining work?

DM extracts patterns from data

Types of patterns in data mining?

Association
Prediction
Cluster
Sequential

Association methods?

Market-basket
Link analysis
Sequence analysis
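
As a rough illustration of market-basket analysis, the sketch below computes support and confidence for item pairs. The baskets and the helper names (`support`, `confidence`) are made up for this example and do not come from the lecture.

```python
from itertools import combinations
from collections import Counter

# Hypothetical transactions: each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support of both over support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# Count how often each pair of items is bought together.
pair_counts = Counter(
    pair for b in baskets for pair in combinations(sorted(b), 2)
)

print(pair_counts.most_common(3))
print(support({"bread", "milk"}, baskets))
print(confidence({"bread"}, {"milk"}, baskets))
```

A rule such as "bread implies milk" is usually kept only if both its support and its confidence exceed thresholds chosen by the analyst.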

Prediction methods?

Classification
Regression
Time Series

Segmentation methods?

Clustering

Outlier analysis
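
To make clustering concrete, here is a minimal one-dimensional k-means sketch. The function name `kmeans_1d`, the data, and the starting centroids are assumptions chosen for illustration; the lecture does not prescribe a particular clustering algorithm.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customer spend values forming two obvious groups.
centroids, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centroids=[0, 5])
print(centroids, clusters)
```

No labels are supplied: the two groups emerge from the distances alone, which is what makes this an unsupervised method.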

Supervised Learning problems?

Classification

The domain of the target is finite and categorical
A classifier must assign a class to an unseen example

Regression

The target attribute takes continuous values (infinitely many possible values)
A model is fitted to learn the output target attribute as a function of the input attributes

Time Series Analysis
- Making predictions in time
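
The regression card can be illustrated with a minimal ordinary least-squares fit of a straight line. The function name `fit_line` and the data points are hypothetical; this is one simple way to learn a continuous target as a function of a single input, not the only one.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: input attribute x, continuous target y.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
# Predict the target for an unseen input value.
print(slope * 5 + intercept)
```

A classifier would differ only in the kind of target: it would return one of a finite set of class labels instead of a number.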

Unsupervised Learning Problems?

Clustering
Association Rules
Pattern Mining
- It is adopted as a more general term than frequent pattern mining or association mining

Outlier Detection
- It is the process of finding data examples whose behaviour is very different from what is expected
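
One simple way to flag examples that differ strongly from the expectation is a z-score test, sketched below. The threshold of 2.0, the function name `zscore_outliers`, and the data are assumptions for illustration; many other outlier detection methods exist.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / sd > threshold]

# Hypothetical daily transaction amounts with one unusual value.
amounts = [10, 11, 9, 10, 12, 10, 50]
print(zscore_outliers(amounts))
```

Because the mean and standard deviation are themselves pulled by extreme values, this approach works best when outliers are rare.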

Data Mining Applications?

Customer Relationship Management
Banking and Other Financial
Retailing and Logistics
Manufacturing and Maintenance
Brokerage and Securities Trading
Insurance
Computer Hardware and Software
Science and Engineering
Government and Defense
Homeland security and law enforcement
Travel, entertainment, sports
Healthcare and medicine
Virtually everywhere else

Customer Relationship Management?

Maximize return on marketing campaigns
Improve customer retention
Maximize customer value
Identify and treat most valued customers

Banking and Other Financial?

Automate the loan application process
Detecting fraudulent transactions
Maximize customer value
Optimizing cash reserves with forecasting

Retailing and Logistics?

Optimize inventory levels at different locations
Improve the store layout and sales promotions
Optimize logistics by predicting seasonal effects
Minimize losses due to limited shelf life

Manufacturing and Maintenance?

Predict/prevent machinery failures
Identify anomalies in production systems to optimize the use of manufacturing capacity
Discover novel patterns to improve product quality

Brokerage and Securities Trading?

Predict changes in certain bond prices
Forecast the direction of stock fluctuations
Assess the effect of events on market movements
Identify and prevent fraudulent activities in trading

Insurance?

Forecast claim costs for better business planning
Determine the optimal rate plans
Optimize marketing to specific customers
Identify and prevent fraudulent claim activities

Data mining process?

A manifestation of best practices
A systematic way to conduct DM projects
Moving from Art to Science for DM projects
Everybody has a different vision

Most common standard processes of Data Mining?

CRISP-DM
SEMMA
KDD

CRISP-DM?

Cross Industry Standard Process for Data Mining

Proposed in the 1990s by a European consortium

Steps of CRISP-DM?

Business Understanding
Data Understanding
Data Preparation
Model Building
Testing and Evaluation
Deployment

SEMMA?

Sample
Explore
Modify
Model
Assess

KDD?

Knowledge Discovery in Databases

Steps to KDD?

Data selection
Data cleaning
Data transformation
Data mining
Internalization

Examples of Classification Task?

Predicting tumor cells as benign or malignant
Classifying credit card transactions as legitimate or fraudulent
Classifying secondary structures of protein as alpha-helix

Classification Techniques?

Decision tree based methods
Rule-based methods
Neural Networks
Naive Bayes and Bayesian Belief Networks
Support Vector Machines

Pros of KNN?

Simple
Flexible
Excellent performance on a wide range of tasks

Cons of KNN?

Time consuming with n training points
Memorization, not learning
No insight into the domain
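
The KNN trade-offs are easier to see in code. The sketch below is a minimal k-nearest-neighbours classifier; the name `knn_predict`, the choice of k=3, Euclidean distance, and the training points are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples.

    `train` is a list of (point, label) pairs. There is no training step:
    the stored examples are the model, and every prediction compares the
    query against all n of them.
    """
    neighbours = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labelled points in two well-separated groups.
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]

print(knn_predict(train, (1.5, 1.5)))
print(knn_predict(train, (6.5, 6.5)))
```

The code is short (simple) and makes no assumption about the shape of the classes (flexible), but each call scans all training points and the result explains nothing about why the classes differ.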

Assessment Methods for Classification?

Predictive accuracy - Hit rate
Speed - Model building versus predicting/usage speed
Robustness
Scalability
Interpretability

In classification problems, the primary source for accuracy estimation is the?

Confusion Matrix
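
As a small worked example, the sketch below builds a two-class confusion matrix and derives accuracy (hit rate) from it. The labels, the predictions, and the helper names `confusion_matrix` and `accuracy` are hypothetical.

```python
def confusion_matrix(actual, predicted, positive):
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

def accuracy(cm):
    """Share of all examples that were classified correctly."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())

# Hypothetical ground truth and classifier output for eight transactions.
actual    = ["fraud", "legit", "legit", "fraud", "legit", "legit", "fraud", "legit"]
predicted = ["fraud", "legit", "fraud", "legit", "legit", "legit", "fraud", "legit"]

cm = confusion_matrix(actual, predicted, positive="fraud")
print(cm)
print(accuracy(cm))
```

Other measures such as precision and recall can be read off the same four counts, which is why the confusion matrix is the usual starting point for accuracy estimation.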