Machine Learning Flashcards
Inductive learning
generalize from a given set of (training) examples so that accurate predictions can be made about future examples; learn an unknown function
how to represent a “thing” in machine learning
x: an example or instance of a specific object, represented by a feature vector; each dimension of the vector is a feature
feature vector representation
extract a feature vector x that describes all attributes relevant to an object; each x is a list of (attribute, value) pairs (see the sketch after the feature types below)
types of features
numerical features - discrete or continuous
categorical features - no intrinsic ordering
ordinal features - similar to categorical but clear ordering
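A minimal sketch of encoding one example as a feature vector x; the attribute names and values are made up for illustration, and the encodings (rank for ordinal, one-hot for categorical) are one common choice among several:

```python
import numpy as np

# one example as (attribute, value) pairs; all names/values are hypothetical
example = {"height_cm": 170.0,   # numerical (continuous)
           "num_rooms": 3,       # numerical (discrete)
           "size": "medium",     # ordinal: small < medium < large
           "color": "red"}       # categorical: no intrinsic ordering

size_rank = {"small": 0, "medium": 1, "large": 2}  # ordinal -> rank
colors = ["red", "green", "blue"]                  # categorical -> one-hot

x = np.array([example["height_cm"],
              example["num_rooms"],
              size_rank[example["size"]]]
             + [float(example["color"] == c) for c in colors])
print(x)  # a point in D-dimensional feature space, here D = 6
```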
point in feature vector representation
each example can be interpreted as a point in a D-dimensional feature space, where D is the number of features/attributes
Training set
A training set is a collection of examples (instances) and is the input to the learning process; instances are assumed to be independent and identically distributed; the training set is the experience given to the learning algorithm
i.i.d.
independent and identically distributed
Unsupervised learning
training set = {x1, …, xn}; no “teacher” to show how examples should be handled; tasks: clustering, discovery, novelty detection, dimensionality reduction
goal of clustering
group training samples into clusters such that examples in the same cluster are similar, and examples in different clusters are different
Clustering methods
Hierarchical Agglomerative Clustering
K-means Clustering
Mean Shift Clustering
Hierarchical Clustering general idea
initially every point is in its own cluster
find the pair of clusters that are the closest
merge the two into a single cluster
repeat
end result: binary tree
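A minimal NumPy sketch of the merge loop above, using single linkage and stopping at k clusters (a naive O(n^3) version for illustration, not an efficient implementation):

```python
import numpy as np

def agglomerative(X, k):
    clusters = [[i] for i in range(len(X))]  # every point starts in its own cluster
    while len(clusters) > k:
        best = (0, 1, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: shortest distance between any pair of members
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] += clusters.pop(b)       # merge the closest pair
    return clusters

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(agglomerative(X, 2))                   # -> [[0, 1], [2, 3]]
```

Running the loop all the way down to one cluster and recording each merge yields the binary tree (the dendrogram).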
How to measure closeness between 2 clusters (hierarchical clustering)
single linkage, complete linkage, average linkage
single linkage
the shortest distance from any member of one cluster to any member of the other cluster
complete linkage
the largest distance from any member of one cluster to any member of the other cluster
average linkage
the average distance between all pairs of members, one from each cluster
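All three linkage criteria are available in SciPy (assumed installed here); only the `method` argument changes:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.random.rand(10, 2)
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)  # (n-1) x 4 merge table encoding the binary tree
    print(method, Z[-1, 2])        # distance at which the final merge happened
```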
How to measure the distance between a pair of examples?
Euclidean, Manhattan (city block), Hamming
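Minimal sketches of the three distances (Hamming shown for discrete-valued vectors):

```python
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))   # straight-line (L2) distance

def manhattan(a, b):
    return np.sum(np.abs(a - b))           # city-block (L1) distance

def hamming(a, b):
    return np.sum(a != b)                  # number of positions that differ

a, b = np.array([1.0, 2.0, 3.0]), np.array([1.0, 0.0, 0.0])
print(euclidean(a, b), manhattan(a, b), hamming(a, b))  # ~3.606, 5.0, 2
```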
Dendrogram
binary tree resulting from hierarchical clustering; the tree can be cut at any level to produce different numbers of clusters
What factors affect the outcome of hierarchical clustering?
features used, range of values for each feature, linkage method, distance metric used, weight of each feature
K-Means Clustering
Specify the desired number of clusters and use an iterative algorithm to find them
K-means clustering general idea
if we have cluster centers, assign each point to its closest center; if we have point assignments, set each cluster center to the mean (centroid) of the points in its cluster; repeat until convergence
K-means algorithm
input: x1, …, xn
select k initial cluster centers c1, …, ck
for each point x, determine its cluster by finding the closest cluster center; then update each cluster center to the centroid (mean) of its points; repeat until convergence
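A minimal NumPy sketch of these iterations; choosing the initial centers at random from the data is an assumption, and the sketch assumes no cluster ever becomes empty:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random starts
    for _ in range(iters):
        # assignment step: each point joins its closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each center moves to the centroid of its points
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers, labels
```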
Distortion
Sum of squared distances of each data point to its cluster center; the optimal clustering minimizes distortion (over all possible cluster locations/assignments)
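A one-line sketch of the distortion, given the centers and assignments produced by a k-means run:

```python
import numpy as np

def distortion(X, centers, labels):
    # sum of squared distances from each point to its assigned cluster center
    return float(np.sum((X - centers[labels]) ** 2))
```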
How to pick k in k-means clustering
distortion keeps dropping as k grows, so don't simply minimize it over k; plot distortion against k and pick a k close to the elbow of the distortion curve, where adding more clusters stops helping much
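A sketch of this elbow heuristic using scikit-learn's KMeans (assumed available; its `inertia_` attribute is the distortion):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)
for k in range(1, 8):
    d = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(d, 2))  # pick k where the curve's drop flattens (the elbow)
```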
Does K-means always terminate?
Yes (there is a finite number of ways of partitioning a finite number of points into k groups)
Is K-means guaranteed to find the “optimal clustering”?
No, but each iteration will reduce the error/distortion of the clustering
Which local optimum k-means goes to is determined by:
solely the starting cluster centers
Ways to pick good starting cluster centers
- run multiple times with different random starting cluster centers
- pick starting points that are farthest apart from each other
How to pick number of clusters
a difficult problem; heuristics depend on the number of points/dimensions
mean shift clustering
choose a search window size and initial location; compute the centroid (mean) of the points in the window; re-center the search window at that mean; repeat until convergence
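A minimal sketch with a flat (uniform) kernel; the window radius and starting location are assumptions, and the start is assumed to be close to at least one data point:

```python
import numpy as np

def mean_shift(X, start, radius=1.0, tol=1e-5):
    center = np.asarray(start, dtype=float)
    while True:
        window = X[np.linalg.norm(X - center, axis=1) <= radius]
        new_center = window.mean(axis=0)           # centroid of points in window
        if np.linalg.norm(new_center - center) < tol:
            return new_center                      # converged at a density mode
        center = new_center
```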
mean shift clustering objective:
find the densest region
Supervised learning
learn a function h: X -> Y from some function family H, such that h(x) predicts the true label y on future data (classification if y is discrete, regression if y is continuous)
what is a label (supervised learning)?
the desired prediction on an instance x
what is a class?
discrete label
concept learning
determine from a given set of examples if a given example is or is not an instance of the concept/class/category
positive/negative example (concept learning)
an example that is (positive) or is not (negative) an instance of the concept/class/category
Supervised Concept Learning by Induction
Given a training set of positive and negative examples of a concept, construct a description that accurately classifies whether future examples are positive or negative
Nearest Neighbor Classification
Save each training example in feature space; classify a new example by giving it the same classification as its nearest neighbor
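A minimal 1-nearest-neighbor sketch: “training” just stores the examples, and prediction copies the label of the closest stored example:

```python
import numpy as np

def nn_classify(X_train, y_train, x):
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every stored example
    return y_train[dists.argmin()]               # copy nearest neighbor's label

X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
y_train = np.array([0, 1])
print(nn_classify(X_train, y_train, np.array([0.9, 0.8])))  # -> 1
```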
How to pick k for nearest neighbor?
split data into training/tuning sets
classify the tuning set with different k values
pick the k that produces the least tuning-set error
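A sketch of that procedure; the split fraction and the candidate k values are assumptions for illustration, and class labels are assumed to be non-negative integers:

```python
import numpy as np

def knn_predict(X_tr, y_tr, x, k):
    idx = np.argsort(np.linalg.norm(X_tr - x, axis=1))[:k]  # k closest examples
    return np.bincount(y_tr[idx]).argmax()                  # majority vote

def pick_k(X, y, candidates=(1, 3, 5, 7), frac=0.8, seed=0):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    cut = int(frac * len(X))
    train, tune = perm[:cut], perm[cut:]
    errors = {}
    for k in candidates:
        preds = [knn_predict(X[train], y[train], X[i], k) for i in tune]
        errors[k] = np.mean(np.array(preds) != y[tune])
    return min(errors, key=errors.get)  # k with the least tuning-set error
```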
When doesn’t k-NN generalize well?
If the examples in each class are not well clustered
How is the training-set distribution important?
if we train on a set drawn from a different distribution than the future data, the classifier will not be as accurate
Inductive bias
Any knowledge created by generalization from specific facts cannot be proven true, only false; the inductive bias is the set of assumptions used to choose one function over another
Inductive biases commonly used in ML?
restricted hypothesis space bias
preference bias
Restricted hypothesis space bias
allow only certain types of h's, not arbitrary ones
preference bias
define a metric for comparing h’s so as to determine whether one is better than another