Chapter 8 Flashcards
Supervised learning (classification)
The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations. New data is classified based on the training set.
Unsupervised learning (clustering)
The class labels of the training data are unknown. Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.
Classification
Predicts categorical class labels (discrete or nominal). Constructs a model from the training set and the values (class labels) of a classifying attribute, then uses it to classify new data.
Numeric prediction
Models continuous-valued functions, i.e., predicts unknown or missing values. Typical applications: credit/loan approval, medical diagnosis, fraud detection, web page categorization.
Learning step (model construction)
describing a set of predetermined classes
each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
the set of tuples used for model construction is the training set
Classification step (model usage)
for classifying future or unknown objects
Estimate accuracy
the known label of a test sample is compared with the classified result from the model
accuracy rate
the percentage of test set samples that are correctly classified by the model
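The accuracy rate above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation; the function name `accuracy_rate` and the sample labels are made up for the example.

```python
def accuracy_rate(true_labels, predicted_labels):
    """Percentage of test samples whose predicted label matches the known label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("test set is empty")
    # Compare each known label with the label the model produced.
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Hypothetical test-set labels, made up for illustration.
known = ["yes", "no", "yes", "yes"]
predicted = ["yes", "no", "no", "yes"]
rate = accuracy_rate(known, predicted)  # 3 of the 4 samples match
```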
test set
independent of the training set (otherwise overfitting)
classify new data
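One common way to keep the test set independent of the training set is a hold-out split. The sketch below assumes a simple random split; the function name `holdout_split`, the 30% test fraction, and the fixed seed are illustrative choices, not something the cards prescribe.

```python
import random


def holdout_split(samples, test_fraction=0.3, seed=0):
    """Shuffle once, then cut into disjoint training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    # No sample appears in both sets, so the test set stays independent.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data: ten sample ids.
train, test = holdout_split(range(10))
```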
if the accuracy is acceptable
validation (test) set
what the test set is called if it is used to select models
decision tree induction
the learning of decision trees from class-labeled training tuples
internal node
denotes a test on an attribute
branch
represents an outcome of the test
leaf node or terminal node
holds a class label
root node
topmost node in a tree
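The four node terms above fit together as in the following sketch. The `Node` class, the `classify` helper, and the example tree (attributes "student" and "credit") are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    # Internal node: `attribute` names the attribute being tested and
    # `branches` maps each test outcome to a child node.
    # Leaf node: `label` holds the class label and `branches` stays empty.
    attribute: Optional[str] = None
    label: Optional[str] = None
    branches: Dict[str, "Node"] = field(default_factory=dict)

    def is_leaf(self) -> bool:
        return self.label is not None


def classify(root: Node, sample: Dict[str, str]) -> str:
    """Start at the root and follow one branch per test until a leaf is reached."""
    node = root
    while not node.is_leaf():
        node = node.branches[sample[node.attribute]]
    return node.label


# Hypothetical tree: the root tests "student"; a second internal node tests "credit".
tree = Node(attribute="student", branches={
    "yes": Node(label="buys"),
    "no": Node(attribute="credit", branches={
        "fair": Node(label="does_not_buy"),
        "excellent": Node(label="buys"),
    }),
})
```

Calling `classify(tree, {"student": "no", "credit": "fair"})` walks root, then the "no" branch, then the "fair" branch, and returns the label stored in that leaf.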
D (data partition)
initially, the complete set of training tuples and their associated class labels
attribute-list
list of attributes describing the tuples
attribute_selection_method
a heuristic procedure for selecting the attribute that best discriminates the given tuples according to class
Gini index
forces the resulting tree to be binary
information gain
allows multiway splits
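The cards name the two measures without giving formulas. The sketch below uses the standard textbook definitions: Gini impurity of a partition, 1 - sum(p_i^2), and information gain as the entropy of the whole set minus the weighted entropy of its partitions. The function names, the `select_attribute` helper, and the toy tuples are assumptions for illustration, and label lists are assumed to be non-empty.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity of one partition: 1 - sum(p_i^2) over class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Expected information needed to classify a tuple: -sum(p_i * log2(p_i))."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, partitions):
    """Entropy of the whole set minus the size-weighted entropy of its partitions."""
    n = len(labels)
    weighted = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(labels) - weighted


def select_attribute(tuples, attribute_list):
    """One possible attribute_selection_method: pick the highest information gain."""
    labels = [label for _, label in tuples]

    def gain(attr):
        # Multiway split: group class labels by the attribute's value.
        groups = {}
        for row, label in tuples:
            groups.setdefault(row[attr], []).append(label)
        return information_gain(labels, list(groups.values()))

    return max(attribute_list, key=gain)


# Hypothetical training tuples: (attribute values, class label).
tuples = [
    ({"student": "yes", "age": "young"}, "buys"),
    ({"student": "yes", "age": "old"}, "buys"),
    ({"student": "no", "age": "young"}, "no"),
    ({"student": "no", "age": "old"}, "no"),
]
best = select_attribute(tuples, ["age", "student"])
```

In this toy data, splitting on "student" yields two pure partitions while splitting on "age" leaves both partitions evenly mixed, so "student" is selected.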
determine splitting criterion
ideally, the resulting partitions at each branch are as pure as possible
pure partition
when all the tuples in it belong to the same class
discrete-valued partitioning
a branch (partition) is created for each known value of the attribute
continuous-valued partitioning
there is a split point, resulting in two partitions: values at or below the split point and values above it
discrete-valued and a binary tree must be produced
a yes/no question is asked (is the attribute's value in a chosen subset of its known values?) and each outcome gets its own partition (branch)
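The three partitioning scenarios above can be sketched as follows. The function names and the sample rows are hypothetical, and the continuous case assumes the usual "at or below" versus "above" convention for the split point.

```python
from collections import defaultdict


def split_discrete(rows, attr):
    """Multiway split: one partition per known value of a discrete attribute."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[attr]].append(row)
    return dict(parts)


def split_continuous(rows, attr, split_point):
    """Two partitions: attr <= split_point and attr > split_point."""
    low = [r for r in rows if r[attr] <= split_point]
    high = [r for r in rows if r[attr] > split_point]
    return low, high


def split_binary_subset(rows, attr, subset):
    """Binary split on a discrete attribute: is the value in `subset`?"""
    yes = [r for r in rows if r[attr] in subset]
    no = [r for r in rows if r[attr] not in subset]
    return yes, no


# Hypothetical tuples, made up for illustration.
rows = [
    {"income": "low", "age": 25},
    {"income": "medium", "age": 40},
    {"income": "high", "age": 58},
]
by_income = split_discrete(rows, "income")
younger, older = split_continuous(rows, "age", 40)
in_subset, not_in_subset = split_binary_subset(rows, "income", {"low", "medium"})
```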
attribute selection measures
also known as splitting rules, because they determine how the tuples at a given node are split