Data Mining Flashcards
What is Data Mining
The process of extracting information from large databases and using it to make decisions
Name two methods of Data Mining
Predictive and Descriptive
What is the prediction method
use some variables to predict unknown or future values of other variables
What is the description method
Find human-interpretable patterns that describe the data
What are the 4 basic tasks of data mining
Classification, Regression, Clustering and Association rule discovery
What is Classification
maps data into predefined groups or classes
What is Regression
maps a data item to a real valued prediction variable
What is Clustering
maps data into groups or classes which are defined by the data
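A minimal sketch of clustering, to show groups emerging from the data rather than from predefined labels. It uses a simple one-dimensional k-means; the function name `k_means_1d`, the choice of starting centroids, and the sample values are illustrative assumptions, not part of the cards.

```python
def k_means_1d(values, k=2, iterations=10):
    """Group numbers into k clusters defined by the data itself (no labels)."""
    # Assumption: start from the k smallest values as initial centroids.
    centroids = sorted(values)[:k]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assign each value to its nearest centroid.
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
clusters = k_means_1d(values, k=2)
```

With these sample values the two groups separate the small numbers from the large ones, even though no class was given in advance.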
What is Association rule discovery
uncover relationships among data
What are the 5 stages of the Data Mining Process
Data Gathering; Data Preparation and Cleansing; Pattern Extraction and Discovery; Visualisation of the Data; Analysis and Evaluation of Results
Name two types of learning
deductive and inductive
What is deductive learning
Uses existing knowledge to deduce new knowledge. It goes from general rules to specific cases
What is inductive learning
Uses many examples to produce a generalisation of the examples that were given
Name the 3 types of inductive learning
Supervised, Unsupervised and Reinforcement
What is Supervised learning?
Training examples are input-output pairs with informative output
Classification learning is sometimes called supervised, because, in a sense, the
scheme operates under supervision by being provided with the actual outcome for
each of the training examples—the play or don’t play judgment, the lens recommendation, the type of iris, the acceptability of the labor contract.
What is Unsupervised learning?
Training examples are input patterns with no associated output patterns
What is Reinforcement learning?
Training examples are input-output pairs with evaluative output only
Name two types of data values
nominal and real
What is included in data preparation
data selection, data transformation
What is included in data cleansing
Check that the data is free from errors, missing data, outliers and duplicates
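A minimal sketch of three of these checks (missing data, duplicates, outliers), assuming records are simple dictionaries. The function name `cleanse`, the `amount` field, and the "more than ten times the median" outlier rule are hypothetical choices for illustration; real data sets usually need domain-specific rules, and general error checking is not covered here.

```python
from statistics import median


def cleanse(records, key="amount", max_ratio=10.0):
    """Drop records with missing values, exact duplicates, and crude outliers."""
    # 1. Missing data: drop records where any field is None.
    complete = [r for r in records if all(v is not None for v in r.values())]

    # 2. Duplicates: keep only the first occurrence of each identical record.
    seen, unique = set(), []
    for r in complete:
        fingerprint = tuple(sorted(r.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(r)

    # 3. Outliers: drop values more than max_ratio times the median (a crude rule).
    if not unique:
        return []
    mid = median(r[key] for r in unique)
    return [r for r in unique if abs(r[key]) <= max_ratio * abs(mid)]


records = [
    {"id": 1, "amount": 20.0},
    {"id": 2, "amount": None},    # missing value
    {"id": 3, "amount": 25.0},
    {"id": 3, "amount": 25.0},    # duplicate of the previous record
    {"id": 4, "amount": 30.0},
    {"id": 5, "amount": 9000.0},  # outlier
]
clean = cleanse(records)
```

Here only the records with ids 1, 3 and 4 survive all three checks.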
Name two techniques used for finding patterns in data
Classification and Association rule discovery
Give an example of where classification is used
Fraud Detection
predict fraudulent cases in credit card transactions
Describe the Classification steps
Given a collection of records (the training set), each record contains a set of attributes, one of which is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
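The steps above can be sketched with a one-nearest-neighbour classifier, which is only one of many possible models. The training records, the two attributes (transaction amount and hour of day), and the function name `nearest_neighbour` are made-up assumptions for illustration; the attributes are not rescaled, which a real fraud model would normally do.

```python
import math


def nearest_neighbour(training_set, record):
    """Predict the class of `record` as the class of the closest training record."""
    # The "model" here is simply the stored training set plus a distance measure.
    _, label = min(training_set, key=lambda row: math.dist(row[0], record))
    return label


# Hypothetical training set: (amount, hour of day) -> class attribute.
training_set = [
    ((12.0, 14.0), "legitimate"),
    ((30.0, 10.0), "legitimate"),
    ((25.0, 18.0), "legitimate"),
    ((950.0, 3.0), "fraud"),
    ((870.0, 2.0), "fraud"),
]

# Previously unseen records are assigned a class by the model.
prediction_a = nearest_neighbour(training_set, (900.0, 4.0))
prediction_b = nearest_neighbour(training_set, (20.0, 12.0))
```

In this toy data the large night-time transaction is labelled as fraud and the small daytime one as legitimate.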
Describe the Association rule discovery steps
Given a set of records, each of which contains some number of items from a given collection;
Produce dependency rules which will predict the occurrence of an item based on occurrences of other items.
Goal: previously unseen dependencies in a collection should be identified properly.
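The steps above can be sketched by counting how often pairs of items occur together. Support and confidence thresholds are a common way to decide which rules to keep, but they are an assumption here, as are the function name `pair_rules`, the threshold values, and the shopping-basket data. The sketch only considers rules with a single item on each side.

```python
from itertools import permutations


def pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return (lhs, rhs, support, confidence) for rules of the form lhs -> rhs."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for lhs, rhs in permutations(items, 2):
        both = sum(1 for t in transactions if lhs in t and rhs in t)
        lhs_count = sum(1 for t in transactions if lhs in t)
        support = both / n                                   # how often both occur
        confidence = both / lhs_count if lhs_count else 0.0  # how often rhs follows lhs
        if support >= min_support and confidence >= min_confidence:
            rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical records: each is a set of items from a shop's collection.
transactions = [
    {"bread", "milk"},
    {"bread", "nappies", "beer"},
    {"milk", "nappies", "beer"},
    {"bread", "milk", "nappies", "beer"},
    {"bread", "milk", "nappies"},
]
rules = pair_rules(transactions)
```

With these thresholds the only rule that survives is beer -> nappies: every basket containing beer also contains nappies.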