Chapter 2 Flashcards
What is the difference between a data mining algorithm and a (canonical) data mining task?
Data mining tasks are the general descriptive and predictive tasks, such as: classification and class probability estimation, regression (value estimation), similarity matching, clustering, co-occurrence grouping, profiling (behavior description), link prediction, data reduction, and causal modeling.
Data mining algorithms are the methods developed to address those tasks, for example: support vector machines, the k-means algorithm, the naive Bayes algorithm, and tree induction.
What is the goal of the classification task?
It predicts, for each individual in a population, which of a set of classes that individual belongs to.
Q. Among all the customers of MegaTelCo, which are likely to respond to a given offer? "Respond" is class 1; "Not respond" is class 2.
What is the difference between classification and scoring (class probability estimation)?
Instead of predicting a class, scoring gives each individual a score representing the probability that the individual belongs to each class.
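As a rough illustration of the difference, the sketch below assumes some model has already produced a response probability for each customer. The customer IDs, the scores, and the 0.5 cut-off are all made up for the example.

```python
# Hypothetical scores: estimated probability that each customer responds.
scores = {"cust_a": 0.82, "cust_b": 0.35, "cust_c": 0.57}

THRESHOLD = 0.5  # assumed cut-off, chosen only for illustration


def classify(probability, threshold=THRESHOLD):
    """Turn a class-probability estimate into a hard class label."""
    return "Respond" if probability >= threshold else "Not respond"


# Classification keeps only the label; scoring keeps the probability,
# which also lets us rank customers from most to least likely to respond.
labels = {cust: classify(p) for cust, p in scores.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

print(labels)
print(ranking)
```

Note that cust_a and cust_c get the same label, but the scores still say cust_a is the better prospect.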
What is the goal of the regression task?
It predicts, for each individual, the numerical value of some variable for that individual.
Q. How much will a given customer use the service?
The numerical value to be predicted is service usage.
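A minimal sketch of value estimation, assuming a straight-line relationship between how long someone has been a customer and their usage. The data points and the choice of a simple least-squares line are assumptions for illustration only.

```python
# Made-up training data: (months as a customer, monthly usage in minutes).
history = [(1, 120.0), (2, 150.0), (3, 180.0), (4, 210.0)]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(history)
predicted_usage = slope * 5 + intercept  # numeric prediction for month 5
print(predicted_usage)
```

The output is a number (how much), not a class label (whether).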
What is the difference between classification and regression?
Classification predicts whether something will happen.
Regression predicts how much something will happen.
What is the goal of the similarity matching task?
It attempts to identify similar individuals based on data known about them.
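One simple way to picture similarity matching is a nearest-neighbor lookup. The sketch below assumes each individual is described by two made-up numeric features and uses Euclidean distance; in practice the features would usually be scaled first, and other similarity measures are possible.

```python
import math

# Hypothetical customer profiles: (monthly minutes, monthly data in GB).
customers = {
    "ann": (300.0, 2.0),
    "bob": (320.0, 2.5),
    "cat": (50.0, 10.0),
}


def most_similar(name, profiles):
    """Return the other individual whose profile is closest (Euclidean)."""
    target = profiles[name]
    others = (other for other in profiles if other != name)
    return min(others, key=lambda other: math.dist(target, profiles[other]))


print(most_similar("ann", customers))
```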
What is the goal of the clustering task?
It attempts to group individuals in a population together by their similarity.
Q. Do our customers form any natural groups or segments?
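A toy sketch of clustering using a one-dimensional version of k-means. The usage numbers and the starting centroids are made up, and real data would normally have many attributes per customer.

```python
def k_means_1d(values, centroids, iterations=10):
    """Very small k-means for one-dimensional data."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids


# Made-up monthly usage (minutes); the starting centroids are arbitrary.
usage = [40, 55, 60, 400, 420, 450]
segments, centers = k_means_1d(usage, centroids=[0.0, 500.0])
print(segments)
```

No target is given anywhere: the groups come only from how similar the values are to each other.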
What is the goal of the co-occurrence grouping task?
It attempts to find associations between entities based on transactions involving them.
Q. What items are commonly purchased together?
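A minimal sketch of co-occurrence counting: tally how often each pair of items appears in the same basket. The transactions are made up, and plain pair counts are only the simplest possible measure of association.

```python
from collections import Counter
from itertools import combinations

# Made-up transactions: each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

pair_counts = Counter()
for basket in transactions:
    # Sort so that each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(1))
```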
What is the difference between co-occurrence grouping and clustering?
Clustering looks at similarity between objects based on the objects' attributes.
Co-occurrence grouping considers similarity of objects based on their appearing together in transactions.
What is the goal of the profiling task?
It attempts to characterize the typical behavior of an individual, group, or population.
Profiling helps to know if an action fits the profile or not. It is used to establish behavioral norms.
Q. What is the typical cell phone usage of this customer segment?
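A rough sketch of using a profile as a behavioral norm: summarize typical usage, then check whether a new observation fits. The usage figures and the three-standard-deviation tolerance are assumptions chosen for illustration.

```python
import statistics

# Made-up daily call minutes for one customer segment.
typical_usage = [30, 35, 28, 32, 31, 29, 33, 34]

mean = statistics.mean(typical_usage)
spread = statistics.stdev(typical_usage)


def fits_profile(value, tolerance=3.0):
    """True if value lies within `tolerance` standard deviations of the mean."""
    return abs(value - mean) <= tolerance * spread


print(fits_profile(31), fits_profile(240))
```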
What is the goal of the link prediction task?
It attempts to predict connections between data items, usually by suggesting a link that should exist.
Q. Since you and Jack share 10 friends, maybe you would like to be Jack’s friend.
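The shared-friends idea can be sketched by counting common neighbors in a friendship graph. The graph and the minimum of two shared friends are made up; common-neighbor counting is just one simple heuristic for suggesting links.

```python
# Made-up friendship graph: person -> set of friends.
friends = {
    "you": {"amy", "ben", "cal"},
    "jack": {"amy", "ben", "cal", "dee"},
    "dee": {"jack"},
    "amy": {"you", "jack"},
    "ben": {"you", "jack"},
    "cal": {"you", "jack"},
}


def suggest_links(person, graph, min_shared=2):
    """Suggest non-friends who share at least `min_shared` friends."""
    suggestions = {}
    for other, their_friends in graph.items():
        if other == person or other in graph[person]:
            continue  # skip the person themselves and existing friends
        shared = len(graph[person] & their_friends)
        if shared >= min_shared:
            suggestions[other] = shared
    return suggestions


print(suggest_links("you", friends))
```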
What is the goal of the data reduction task?
It attempts to take a large set of data and replace it with a smaller set that contains much of the important information in the larger set.
A massive data set on consumer movie-viewing preferences can be replaced with a much smaller data set describing consumers' genre preferences.
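The movie example can be sketched as a simple aggregation from per-movie records to per-genre counts. The viewing log is made up, and counting by genre is only one crude form of data reduction.

```python
from collections import Counter

# Made-up viewing log: (consumer, movie title, genre).
views = [
    ("u1", "Movie A", "comedy"),
    ("u1", "Movie B", "comedy"),
    ("u1", "Movie C", "drama"),
    ("u2", "Movie D", "horror"),
    ("u2", "Movie E", "horror"),
]

# Reduce per-movie records to one small genre-count profile per consumer.
genre_profile = {}
for consumer, _title, genre in views:
    genre_profile.setdefault(consumer, Counter())[genre] += 1

print(genre_profile)
```

The individual titles are lost, but the genre preferences survive in a much smaller table.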
What is the goal of the causal modeling task?
It attempts to help us understand what events or actions actually influence others.
Suppose we use predictive modeling to target consumers who are more likely to buy an offer, and the targeted consumers do buy it.
We want to know whether this was because of the advertisements or because the predictive model was good at identifying consumers who would have purchased anyway.
What is an unsupervised data mining problem?
An unsupervised data mining problem is one where no specific purpose or target has been specified for the grouping.
Q. Do our customers naturally fall into different groups?
NO TARGET
What is a supervised data mining problem?
A supervised data mining problem is one where a specific target is defined.
Q. Will a customer leave when the contract expires?
Target: Yes or No
Q. Can we find a group of customers who have a high likelihood of cancelling their service when the contract expires?