Syllabus Flashcards
What are the two main machine learning techniques for data mining?
Supervised machine learning and unsupervised machine learning
What are examples of supervised machine learning?
Decision trees, linear classifiers, linear regression
What are examples of unsupervised machine learning?
Clustering
What are the characteristics of unsupervised machine learning?
Unsupervised methods have no specific target value. The system just looks for patterns in the data, with nothing acting like “a teacher” that supplies the right answers. Often the data can be grouped neatly into a small number of categories, and we just have to inspect the result.
What is the goal of clustering?
The goal is to group similar instances together using some metric of similarity, that is, to create groupings whose members are similar to each other. For example, group similar customers together and design a different campaign for each group.
What are the characteristics of clustering?
It is like classification, but the groupings are not predefined. It is more open-ended than classification and regression. It could find a way to group similar customers together, and the resulting groups may or may not relate to the churn question.
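As a rough illustration of grouping without predefined labels, here is a minimal sketch of a one-dimensional k-means-style loop. The function name `kmeans_1d`, the starting centroids, and the spending figures are all made up for this example; real clustering would normally use several attributes and a library implementation.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group 1-D values around the nearest centroid (a bare-bones k-means)."""
    centroids = list(centroids)
    groups = []
    for _ in range(iterations):
        # Assignment step: put each value in the group of its nearest centroid.
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return groups, centroids


# Hypothetical monthly spend per customer; note that no target labels are given.
spend = [12, 15, 14, 90, 95, 88]
groups, centroids = kmeans_1d(spend, centroids=[0, 100])
print(groups)
```

The two groups that come out (low spenders and high spenders) were never named in advance. Whether they are useful for a campaign, or related to churn, is something we have to judge afterwards.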
What is the fundamental goal of data mining techniques?
Exploration to find patterns in a dataset.
What is similarity matching?
Instances are compared based on their attributes to determine how similar they are. For example, Amazon finds books that are similar to a book you have read. If the book you read has three attributes, the most similar book is one that shares all three.
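The book example can be sketched by counting shared attributes. This is only one possible similarity measure, and the titles and attributes below are hypothetical.

```python
def shared_attributes(a, b):
    """Count the attributes two items have in common."""
    return len(set(a) & set(b))


# Hypothetical attributes of the book already read and of a small catalog.
read_book = {"fantasy", "paperback", "series"}
catalog = {
    "Book A": {"fantasy", "paperback", "series"},
    "Book B": {"fantasy", "hardcover", "standalone"},
    "Book C": {"cooking", "paperback", "standalone"},
}

# Rank catalog titles from most to least similar to the book already read.
ranked = sorted(
    catalog,
    key=lambda title: shared_attributes(read_book, catalog[title]),
    reverse=True,
)
print(ranked[0])
```

"Book A" shares all three attributes with the book already read, so it ranks first.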
When do we use similarity matching?
The general idea of similarity matching is applicable to many different forms of data mining, including classification, regression, and clustering.
What is important in similarity matching?
It is important to have information about the relevant attributes, and about which of those attributes matters most.
What is regression?
Regression predicts a numerical value. It is related to classification, but there is a difference: classification predicts whether something will happen, while regression predicts how much.
Give an example of when we would use regression.
How much will a customer spend? That question is solved with regression.
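A minimal sketch of the "how much" question, using ordinary least squares on a single attribute. The attribute (store visits), the numbers, and the function name `fit_line` are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training data: store visits per month and amount spent.
visits = [1, 2, 3, 4]
spend = [15, 25, 35, 45]

intercept, slope = fit_line(visits, spend)
# Predicted spend for a customer with 5 visits: a number, not a class.
predicted_spend = intercept + slope * 5
print(predicted_spend)
```

The output is a numerical amount, which is what separates this from a classification answer such as "will churn".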
What does supervised machine learning do in general?
A target value is specified for each instance. By examining the instances one by one, we can simply compute how often the system makes the right choice.
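Computing how often the system makes the right choice can be sketched as a simple accuracy calculation. The labels and predictions below are made up for illustration.

```python
def accuracy(predicted, actual):
    """Fraction of instances where the prediction equals the target value."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical target values and the system's predictions for five customers.
actual = ["churn", "stay", "stay", "churn", "stay"]
predicted = ["churn", "stay", "churn", "churn", "stay"]

score = accuracy(predicted, actual)
print(score)
```

Four of the five predictions match the target value, so the score is 0.8.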
What is classification?
Classification involves defining a small number of classes and then trying to predict, for each instance, which class it belongs to. In the churn example, classification is a natural fit: one class for “will churn” and one for “will not churn”. Each instance is labelled with a target value indicating which class it belongs to.
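A minimal sketch of a two-class churn classifier, assuming a single numeric attribute and a simple threshold rule (a one-level decision tree). The attribute (number of complaints), the data, and the function names are hypothetical.

```python
def fit_threshold(values, labels):
    """Pick the threshold that classifies the most training instances correctly."""
    best_threshold, best_correct = None, -1
    for t in sorted(set(values)):
        # Rule under test: predict "churn" when the value is at least t.
        correct = sum((v >= t) == (label == "churn") for v, label in zip(values, labels))
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


def classify(value, threshold):
    """Assign one of the two predefined classes."""
    return "churn" if value >= threshold else "stay"


# Hypothetical labelled instances: complaints filed, and the target class.
complaints = [0, 1, 0, 4, 5, 3]
labels = ["stay", "stay", "stay", "churn", "churn", "churn"]

threshold = fit_threshold(complaints, labels)
print(classify(2, threshold), classify(6, threshold))
```

Each training instance carries its target class, and the learned rule then assigns a new instance to exactly one of the predefined classes.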
What is data preparation?
Data preparation is about constructing a dataset from one or more data sources, to be used for exploration and modeling.
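As a small sketch of building one dataset from two sources, the example below combines a customer list with an order list. The field names and records are assumptions for illustration only.

```python
def build_dataset(customers, orders):
    """Combine two sources into one row per customer, ready for modeling."""
    # Aggregate the order source: total amount per customer id.
    totals = {}
    for order in orders:
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + order["amount"]
    # Join the aggregate onto the customer source.
    return [
        {"id": c["id"], "age": c["age"], "total_spend": totals.get(c["id"], 0.0)}
        for c in customers
    ]


# Two hypothetical data sources.
customers = [{"id": 1, "age": 34}, {"id": 2, "age": 51}]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 10.0},
]

dataset = build_dataset(customers, orders)
print(dataset)
```

The result has one row per customer with attributes drawn from both sources, which is the shape most exploration and modeling steps expect.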