Yet Another Deck Flashcards
Regression is a data mining task of predicting the value of the target (_) by building a model based on one or more predictors
numerical variable
regression options
* decision tree (frequency table)
* multiple linear regression (covariance matrix)
* k-nearest neighbor (similarity functions)
* artificial neural networks (other)
* support vector machine (other)
Mnemonic: Dinosaurs made Kites and Vikings. Natural theory of regression.
The ID3 algorithm can be used to construct a decision tree for regression by
replacing information gain with standard deviation reduction
the standard deviation reduction for ID3 regression is based on the
decrease in standard deviation after a dataset is split on an attribute
constructing an ID3 decision tree is all about finding the attribute that returns the
highest standard deviation reduction
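A minimal sketch of the standard deviation reduction calculation described above; the function name and the toy data are illustrative, not from the deck:

```python
# Sketch: standard deviation reduction (SDR) for one candidate split,
# as used by ID3-style regression trees.
import numpy as np

def sdr(target, attribute_values):
    """SDR = SD(target) - weighted SD of the target within each attribute value."""
    target = np.asarray(target, dtype=float)
    before = target.std()                      # SD of the full (parent) node
    after = 0.0
    for value in set(attribute_values):
        subset = target[[a == value for a in attribute_values]]
        after += (len(subset) / len(target)) * subset.std()
    return before - after                      # larger reduction = better split

# Toy example: hours played vs. outlook (illustrative golf-style data)
hours   = [25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30]
outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
           "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"]
print(sdr(hours, outlook))   # the attribute with the highest SDR is chosen
```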
when building an ID3 regression decision tree, a branch with a standard deviation of more than zero _
requires further splitting
Decision trees. To stop splitting forever we need some termination criteria, for example, when the _ becomes smaller than a certain fraction of the _
standard deviation, standard deviation of the full dataset (e.g. 5%)
Decision trees. To stop splitting forever we need some termination criteria, for example, when too _
few instances remain in the branch (e.g. 3)
Decision trees. To stop splitting forever we need some termination criteria. Then when the number of instances is more than one at a leaf node we _
calculate the average as the final value for the target
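A small sketch of the two termination criteria and the leaf-value rule from the cards above; the thresholds (5%, 3) come from the cards, everything else is illustrative:

```python
import numpy as np

def should_stop(branch_targets, full_sd, cv_threshold=0.05, min_instances=3):
    """Stop splitting when too few instances remain, or when the branch SD
    drops below a fraction (e.g. 5%) of the full-dataset SD."""
    branch_targets = np.asarray(branch_targets, dtype=float)
    if len(branch_targets) < min_instances:            # too few instances remain
        return True
    if branch_targets.std() < cv_threshold * full_sd:  # SD < 5% of full-dataset SD
        return True
    return False

def leaf_value(branch_targets):
    """With more than one instance at a leaf, predict the average target value."""
    return float(np.mean(branch_targets))
```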
Logistic regression predicts
the probability of an outcome that can only have two values (i.e. a dichotomy)
The prediction for logistic regression is based on the use of one or several predictors (_ & _)
numerical, categorical
A linear regression is not appropriate for predicting the value of a binary variable for two reasons (1) linear regression will
predict values outside the acceptable range (0 to 1)
A linear regression is not appropriate for predicting the value of a binary variable for two reasons (2) since dichotomous experiments can only have one of two possible values for each experiment, the residuals will not
be normally distributed about the predicted line
Logistic regression produces a _ which is _
logistic curve, limited to the values between 0 and 1
logistic regression is similar to linear regression but the curve is constructed using the natural logarithm of the _ of the target variable, rather than the probability
odds
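A brief sketch of the logistic curve and its link to the log of the odds; the coefficients are made up for illustration:

```python
import numpy as np

# Logistic regression models the log of the odds, ln(p / (1 - p)), as a linear
# function of the predictors; inverting that gives the logistic curve.
b0, b1 = -1.5, 0.8   # illustrative coefficients

def predicted_probability(x):
    log_odds = b0 + b1 * x                  # linear in the predictor
    return 1.0 / (1.0 + np.exp(-log_odds))  # logistic curve, always in (0, 1)

xs = np.linspace(-10, 10, 5)
print(predicted_probability(xs))            # values never leave the 0..1 range
```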
logistic regression is similar to linear regression but the predictors do not
have to be normally distributed or have equal variance in each group
Just as ordinary least squares regression is the method used to estimate coefficients for the best-fit line in linear regression, logistic regression uses _ to obtain the model coefficients that relate predictors to the target
maximum likelihood estimation (MLE)
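A toy maximum likelihood estimation sketch using plain gradient ascent on the log-likelihood; real implementations use more sophisticated optimizers, and the data here is made up:

```python
import numpy as np

def fit_logistic_mle(X, y, lr=0.1, n_iter=5000):
    """Estimate logistic regression coefficients by maximizing the likelihood."""
    X = np.column_stack([np.ones(len(X)), X])   # add an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # current predicted probabilities
        beta += lr * X.T @ (y - p) / len(y)     # gradient of the log-likelihood
    return beta                                  # coefficients that maximize the likelihood

X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]])
y = np.array([0, 0, 1, 0, 1, 1])
print(fit_logistic_mle(X, y))
```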
An association rule is a pattern that states that when an event occurs, _
another event occurs with a certain probability
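A tiny sketch of that probability (the rule's confidence, P(consequent | antecedent)) computed from a made-up transaction list:

```python
# Rule "if bread then milk": confidence = fraction of bread transactions
# that also contain milk. Transactions are illustrative.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def confidence(antecedent, consequent, transactions):
    with_a = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_a if consequent <= t]
    return len(with_both) / len(with_a)

print(confidence({"bread"}, {"milk"}, transactions))  # P(milk | bread) = 0.75
```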
Most instance-based learners use:
Euclidean distance
Alternative to Euclidean distance
Manhattan (city-block) distance
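A quick sketch contrasting the two distance measures on made-up instances:

```python
import numpy as np

# Euclidean vs. Manhattan (city-block) distance between two instances.
a = np.array([1.0, 4.0, 2.0])
b = np.array([3.0, 1.0, 2.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # sqrt(4 + 9 + 0) ~= 3.61
manhattan = np.sum(np.abs(a - b))           # 2 + 3 + 0 = 5
print(euclidean, manhattan)
```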
It is usual to normalize all attribute values to:
values between 0 and 1
Normalizing Euclidean distance: symbolic attributes (non-numeric). The difference between two different values is usually expressed:
one (mismatch), zero (match)
normalizing Euclidean distance formula - missing attributes are:
taken to be 1 (maximally different)
normalizing Euclidean distance formula: For numeric attributes, the difference between two missing values is also taken as 1. However, if just one value is missing, the distance can be:
taken as the (normalized) size of the other value, X, or 1 - X, whichever is larger (i.e. as large as possible)
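A consolidated sketch of the per-attribute difference rules from the cards above (min-max normalization to 0..1, mismatch = 1 / match = 0 for symbolic values, missing values treated as maximally different); the attribute ranges and instances are illustrative:

```python
import math

def numeric_diff(x, y, lo, hi):
    """Difference between two numeric attribute values, normalized to 0..1."""
    if x is None and y is None:
        return 1.0                               # both missing: maximally different
    if x is None or y is None:
        known = x if x is not None else y
        v = (known - lo) / (hi - lo)             # normalize the known value to 0..1
        return max(v, 1.0 - v)                   # as large as possible
    return abs(x - y) / (hi - lo)                # min-max normalized difference

def symbolic_diff(x, y):
    """Difference between two symbolic (non-numeric) attribute values."""
    if x is None or y is None:
        return 1.0                               # missing: maximally different
    return 0.0 if x == y else 1.0                # match = 0, mismatch = 1

def distance(instance_a, instance_b, specs):
    """Euclidean distance over mixed attributes; specs holds (numeric, lo, hi)."""
    total = 0.0
    for (xa, xb), (numeric, lo, hi) in zip(zip(instance_a, instance_b), specs):
        d = numeric_diff(xa, xb, lo, hi) if numeric else symbolic_diff(xa, xb)
        total += d * d
    return math.sqrt(total)

# Toy instances: (temperature, outlook); temperature ranges 0..40, outlook is symbolic
specs = [(True, 0.0, 40.0), (False, None, None)]
print(distance((21.0, "sunny"), (None, "rainy"), specs))
```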
instance-based learning is slow because you are:
calculating distance from every member of the training set
Nearest neighbors can be found using a:
kD-tree
kD-Tree
Binary tree which divides the input space with a hyperplane at each node. k = number of attributes.
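A short sketch of a kD-tree nearest-neighbor lookup, here using SciPy's KDTree as one readily available implementation (an assumption; the deck does not name a library), on made-up data:

```python
import numpy as np
from scipy.spatial import KDTree

# k in "kD" is the number of attributes, i.e. the dimensionality of the points.
rng = np.random.default_rng(0)
points = rng.random((100, 3))          # 100 training instances, k = 3 attributes

tree = KDTree(points)                  # recursively splits the space with hyperplanes
dist, idx = tree.query([0.5, 0.5, 0.5], k=1)   # nearest training instance
print(idx, dist)
```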
kD-trees: note that the hyperplanes are not _
decision boundaries
Kd-tree: choosing the median value may yield skinny hyperrectangles. Rather,
use the mean and the point closest to it.
Kd-tree: instance-based learning advantage; can update it incrementally. To do this for a kd-tree, determine which _ contains the new point and find its _.
leaf node, hyperrectangle
Kd-tree: corners of rectangular regions awkward? Use:
hyperspheres, rather than hyperrectangles
Unlike kD-trees, this structure does not depend on regions being disjoint: the _ defines k-dimensional hyperspheres (“_”) that cover the data points, and _
ball tree, balls, arranges them into a tree
Ball Tree: regions can overlap, but points in the overlap are assigned to _
only one of the overlapping balls
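The same lookup with a ball tree, here using scikit-learn's BallTree as one available implementation (again an assumption, not named in the deck):

```python
import numpy as np
from sklearn.neighbors import BallTree

# Regions are hyperspheres ("balls") that may overlap, but each data point
# is stored in only one ball. Data is made up.
rng = np.random.default_rng(0)
points = rng.random((100, 3))

tree = BallTree(points)
dist, ind = tree.query([[0.5, 0.5, 0.5]], k=3)   # 3 nearest training instances
print(ind, dist)
```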
Nearest-neighbor instance-based learning: k-nearest-neighbor uses the k nearest neighbors and determines the class using:
a majority vote
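A minimal majority-vote k-nearest-neighbor classifier sketch on made-up data:

```python
from collections import Counter

def knn_classify(query, training, k=3):
    """Classify a query instance by majority vote over its k nearest neighbors.
    training is a list of (attribute_vector, class_label) pairs."""
    def euclidean(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    neighbors = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]            # majority class wins

training = [((1.0, 1.0), "yes"), ((1.2, 0.9), "yes"),
            ((5.0, 5.0), "no"), ((4.8, 5.1), "no")]
print(knn_classify((1.1, 1.0), training, k=3))   # -> "yes"
```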
kD Trees: worthwhile only when the number of attributes is small:
up to 10