Chapter 2 Flashcards

1
Q

What is the difference between data mining algorithms and the (canonical) data mining tasks?

A
Data mining tasks are the descriptive and predictive tasks, such as:
Classification and class probability estimation
Regression (value estimation)
Similarity matching
Clustering
Co-occurrence grouping
Profiling (behavior description)
Link prediction
Data reduction
Causal modeling
Data mining algorithms are developed to address these tasks, for example:
Support vector machine
K-means algorithm
Naive Bayes algorithm
Tree induction
2
Q

What is the goal of the classification task?

A

It predicts, for each individual in a population, which of a set of classes that individual belongs to.

Q. Among all the customers of MegaTelCo, which are likely to respond to a given offer?
Will respond is class 1
Will not respond is class 2
3
Q

What is the difference between classification and the scoring (class probability estimation) task?

A

Instead of a class prediction, a scoring model produces for each individual a score representing the probability (or likelihood) that the individual belongs to each class.
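
A minimal Python sketch of the distinction, assuming made-up customer names, scores, and a 0.5 threshold (none of these come from the chapter). It shows that a score can always be turned into a class label, but a label alone cannot be turned back into a ranking.

```python
# Hypothetical scores: estimated probability that each customer responds.
scores = {"alice": 0.88, "bob": 0.35, "carol": 0.61}


def classify(probability, threshold=0.5):
    """Turn a class probability estimate into a hard class prediction."""
    return "respond" if probability >= threshold else "not respond"


# Classification keeps only the label; scoring keeps the probability itself.
labels = {name: classify(p) for name, p in scores.items()}

# Scores also let us rank customers from most to least likely to respond.
ranking = sorted(scores, key=scores.get, reverse=True)

print(labels)
print(ranking)
```

The threshold is a business choice rather than part of the model, which is one reason scores are often preferred to bare class labels.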

4
Q

What is the goal of the regression task?

A

It predicts, for each individual, the numerical value of some variable for that individual.

Q. How much will a given customer use the service?
The numerical value to be predicted is service usage.
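
As an illustration only, the sketch below fits a straight line by ordinary least squares to invented data (months as a customer versus hours of service usage) and predicts usage for a new customer. The data, the choice of a single input variable, and the function names are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical historical data: months as a customer -> hours of usage.
months = [1, 2, 3, 4]
usage_hours = [12, 22, 32, 42]

model = fit_line(months, usage_hours)
print(predict(model, 5))
```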

5
Q

What is the difference between classification and regression?

A

Classification predicts whether something will happen.

Regression predicts how much something will happen.

6
Q

What is the goal of the similarity matching task?

A

It attempts to identify similar individuals based on data known about them.
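
One common way to make "similar" concrete is a distance between feature vectors. The sketch below uses Euclidean distance on invented customer profiles; the feature choice and the names are assumptions, and in practice the features would usually be rescaled first so that no single one dominates the distance.

```python
import math

# Hypothetical customer profiles: (monthly spend, years as a customer).
customers = {
    "A": (100.0, 2.0),
    "B": (105.0, 2.5),
    "C": (20.0, 10.0),
}


def euclidean(u, v):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def most_similar(target, profiles):
    """Return the name of the profile closest to the target profile."""
    others = (name for name in profiles if name != target)
    return min(others, key=lambda name: euclidean(profiles[target], profiles[name]))


print(most_similar("A", customers))
```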

7
Q

What is the goal of the clustering task?

A

It attempts to group individuals in a population together by their similarity.

Q. Do our customers form any natural groups or segments?

8
Q

What is the goal of the co-occurrence grouping task?

A

It attempts to find associations between entities based on the transactions involving them.

Q. What items are commonly purchased together?

9
Q

What is the difference between co-occurrence grouping and clustering?

A

Clustering looks at the similarity between objects based on the objects’ attributes.
Co-occurrence grouping considers the similarity of objects based on their appearing together in transactions.
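
A small sketch of the co-occurrence idea, assuming a handful of invented shopping baskets. It never looks at item attributes; it only counts how often each pair of items appears in the same transaction.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: the set of items in each basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ("bread", "butter").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(transactions)
most_common_pair, support = counts.most_common(1)[0]
print(most_common_pair, support)
```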

10
Q

What is the goal of the profiling task?

A

It attempts to characterize the typical behavior of an individual, group, or population.
Profiling is used to establish behavioral norms, so that we can tell whether a given action fits the profile or not.

Q. What is the typical cell phone usage of this customer segment?
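
A rough sketch of one possible profile, assuming invented monthly usage figures: the norm is summarized by a mean and a standard deviation, and a new observation "fits" if it lies within a chosen number of standard deviations. The three-deviation cutoff is an arbitrary assumption for illustration, not a rule from the chapter.

```python
import statistics

# Hypothetical monthly usage (minutes) for one customer segment.
usage_minutes = [200, 220, 210, 190, 205, 215]


def build_profile(values):
    """Summarize typical behavior as (mean, sample standard deviation)."""
    return statistics.mean(values), statistics.stdev(values)


def fits_profile(value, profile, max_deviations=3.0):
    """True if the value lies within max_deviations of the profile mean."""
    mean, stdev = profile
    return abs(value - mean) <= max_deviations * stdev


profile = build_profile(usage_minutes)
print(fits_profile(212, profile))
print(fits_profile(600, profile))
```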

11
Q

What is the goal of the link prediction task?

A

It attempts to predict connections between data items, usually by suggesting that a link should exist.

Q. Since you and Jack share 10 friends, maybe you would like to be Jack’s friend?
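
The shared-friends example can be sketched as a count of common neighbors in a friendship graph. The graph, the names other than Jack, and the `min_common` cutoff below are all invented for illustration; real link prediction may also estimate the strength of a link.

```python
# Hypothetical friendship graph: person -> set of current friends.
friends = {
    "you": {"ann", "ben", "cat"},
    "jack": {"ann", "ben", "cat", "dan"},
    "zoe": {"dan"},
    "ann": {"you", "jack"},
    "ben": {"you", "jack"},
    "cat": {"you", "jack"},
    "dan": {"jack", "zoe"},
}


def suggest(person, graph, min_common=2):
    """Suggest non-friends who share at least min_common friends with person."""
    suggestions = []
    for other in graph:
        if other == person or other in graph[person]:
            continue  # skip yourself and people you are already linked to
        shared = len(graph[person] & graph[other])
        if shared >= min_common:
            suggestions.append((other, shared))
    # Strongest candidates (most shared friends) first.
    return sorted(suggestions, key=lambda item: item[1], reverse=True)


print(suggest("you", friends))
```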

12
Q

What is the goal of the data reduction task?

A

It attempts to take a large set of data and replace it with a smaller set that contains much of the important information in the larger set.

For example, a massive data set on consumers’ movie-viewing preferences can be replaced with a much smaller data set of consumers’ genre preferences.
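
The movie example can be sketched as an aggregation: per-movie ratings are collapsed into one average per genre for each consumer. The movies, genres, and ratings below are invented, and averaging is just one simple reduction assumed here; the reduced data is smaller but loses the per-movie detail.

```python
from collections import defaultdict

# Hypothetical lookup table and raw ratings (user, movie, rating).
movie_genre = {"Alien": "sci-fi", "Blade Runner": "sci-fi", "Heat": "crime"}
ratings = [
    ("u1", "Alien", 5), ("u1", "Blade Runner", 4), ("u1", "Heat", 2),
    ("u2", "Heat", 5), ("u2", "Alien", 1),
]


def genre_preferences(rating_rows, genre_of):
    """Reduce per-movie ratings to an average rating per genre for each user."""
    grouped = defaultdict(lambda: defaultdict(list))
    for user, movie, rating in rating_rows:
        grouped[user][genre_of[movie]].append(rating)
    return {
        user: {genre: sum(vals) / len(vals) for genre, vals in genres.items()}
        for user, genres in grouped.items()
    }


preferences = genre_preferences(ratings, movie_genre)
print(preferences)
```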

13
Q

What is the goal of the causal modeling task?

A

It attempts to help us understand what events or actions actually influence others.

Suppose we use predictive modeling to target advertisements at consumers who are more likely to buy, and the targeted consumers do go on to buy.
We want to know whether this was because the advertisements influenced them, or because the predictive model simply did a good job of identifying consumers who would have bought anyway.

14
Q

What is an unsupervised data mining problem?

A

A data mining problem is unsupervised when no specific purpose or target has been specified for the grouping.

Q. Do our customers naturally fall into different groups?
Target: none

15
Q

What is a supervised data mining problem?

A

A data mining problem is supervised when there is a specific target defined.

Q. Will a customer leave when the contract expires?
Target: Yes or No

Q. Can we find groups of customers who have a high likelihood of cancelling their service when the contract expires?

16
Q

What is the difference between supervised and unsupervised data mining problems?

A

If a specific target can be provided, the problem is supervised.
The results of supervised data mining are often much more useful.
In a supervised problem there is a specific purpose for the grouping: predicting the target.
There must also be data on the target.

An unsupervised task produces groupings based on similarities, but there is no guarantee that these similarities are useful for any particular purpose.

17
Q

Give examples of classification and regression problems.

A

Will the customer purchase service Y if given offer X?
Classification problem with a binary target.

Which service package (X1, X2, or none) will the customer purchase if given offer Y?
Classification problem with three classes.

How much will this customer use the service?
Regression problem with a numeric target.

18
Q

What is the distinction between data mining and using data mining results?

A

Data mining uses historical data to find patterns and build a model (e.g., a class probability estimation model).

Historical data --> Data mining --> Model

Using the data mining result means applying the model to a new, unseen case to generate a probability estimate for it.

New data item --> Model --> Class: Yes, Probability 0.88
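
The two phases can be sketched in a few lines, assuming invented historical records with a single attribute (contract type) and a very simple frequency-based probability model. The function names `mine` and `apply_model` are made up for this illustration.

```python
from collections import defaultdict

# Hypothetical historical data: (contract type, did the customer leave?).
historical = [
    ("monthly", True), ("monthly", True), ("monthly", False), ("monthly", True),
    ("annual", False), ("annual", False), ("annual", True), ("annual", False),
]


def mine(records):
    """Mining phase: estimate P(leave | contract type) from historical data."""
    counts = defaultdict(lambda: [0, 0])  # contract -> [leavers, total]
    for contract, left in records:
        counts[contract][0] += int(left)
        counts[contract][1] += 1
    return {contract: leavers / total for contract, (leavers, total) in counts.items()}


def apply_model(model, contract):
    """Use phase: score a new, unseen customer with the mined model."""
    probability = model[contract]
    return ("Yes" if probability >= 0.5 else "No", probability)


model = mine(historical)
print(apply_model(model, "monthly"))
```

Mining runs once over the historical data; the resulting model can then be applied to each new case without touching that data again.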

19
Q

Explain the CRISP-DM (Cross Industry Standard Process for Data Mining) process.

A

It is a process that places a structure on the problem, allowing reasonable consistency, repeatability, and objectiveness.
The process diagram emphasizes that iteration is the rule rather than the exception.

The CRISP-DM cycle is based around exploration; it iterates on approaches and strategy rather than on software design.

Business understanding <--> Data understanding --> Data preparation <--> Modeling --> Evaluation --> Deployment
Evaluation --> Business understanding (the cycle repeats)

20
Q

Explain the Business understanding step of the CRISP-DM (Cross Industry Standard Process for Data Mining) process.

A

Business understanding is about how to cast a business problem as one or more data science problems.
This means structuring the problem so that one or more subproblems involve building models for classification, regression, and so on.
The design team should think carefully about the problem to be solved and about the use scenario.

21
Q

Explain the Data understanding step of the CRISP-DM (Cross Industry Standard Process for Data Mining) process.

A

The data are the available raw material for solving the business problem.
In data understanding, we need to dig beneath the surface to uncover the structure of the business problem and the data that are available, and then match them to one or more data mining tasks.

22
Q

Explain the Data preparation step of the CRISP-DM (Cross Industry Standard Process for Data Mining) process.

A

The analysis often requires data in a form different from how the data are naturally provided, so some conversion is needed.
Data preparation proceeds along with data understanding.

23
Q

What is a leak in data preparation?

A

A leak occurs when a variable collected in the historical data gives information on the target variable, but that information is not actually available at the time the decision has to be made.
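
One way to guard against leaks is to record, for every variable, whether it is known before the decision point, and to drop the rest before modeling. The sketch below assumes an invented feature catalogue for a churn problem; the variable names and the `before_decision` / `after_decision` labels are hypothetical.

```python
# Hypothetical catalogue: when each variable becomes known relative to the decision.
FEATURES = {
    "monthly_usage": "before_decision",
    "contract_length": "before_decision",
    # Only recorded once the customer has already left, so it leaks the target.
    "cancellation_call_logged": "after_decision",
}


def usable_features(catalogue):
    """Names of variables that are available when the decision is made."""
    return sorted(name for name, known in catalogue.items() if known == "before_decision")


def strip_leaks(row, catalogue):
    """Remove leaking variables from one historical record."""
    allowed = set(usable_features(catalogue))
    return {name: value for name, value in row.items() if name in allowed}


row = {"monthly_usage": 40, "cancellation_call_logged": 1, "contract_length": 12}
print(strip_leaks(row, FEATURES))
```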

24
Q

Explain the Modeling step of the CRISP-DM (Cross Industry Standard Process for Data Mining) process.

A

Modeling is the primary place where data mining techniques are applied to the data.

25
Q

Explain the Evaluation step of the CRISP-DM (Cross Industry Standard Process for Data Mining) process.

A

The goal of evaluation is to assess the data mining results. It also helps to ensure that the model meets the original business goals.
The evaluation stage may reveal that the results are not good enough to deploy, in which case we need to adjust the problem definition or get new data.

26
Q

Explain the Deployment step of the CRISP-DM (Cross Industry Standard Process for Data Mining) process.

A

In deployment, the results of data mining and the data mining techniques are put into real use in order to realize some return on investment (ROI).

27
Q

What is the main focus of data mining?

What is the difference between data mining and other analytics techniques?

A

Data mining focuses on the automated search for knowledge, patterns, or regularities from data.