week 6 Flashcards
chp 6
How do humans and machines learn?
(similarities and differences)
Similarities
- Learning - For humans, it's a natural cognitive process, while for machines, it's a function of their programming.
- Experience - Humans learn from life experiences, while machines learn from data they are exposed to.
- Feedback - Humans improve based on feedback from teachers, peers, or results, and machines improve based on feedback loops programmed into their algorithms (like reinforcement learning).
Differences
- General vs. specific learning - Humans are capable of general learning and can apply knowledge in varied contexts, whereas machines typically learn specific tasks and are not as flexible.
- Small vs. big data - Humans can learn effectively from small amounts of data, while machines require large datasets to learn effectively.
- Learning speed - Machines can process vast amounts of information quickly once they are properly trained, while human learning is a slower, more gradual process.
Designing a Learning System
choosing training experience -> choosing target function -> choosing representation of target function -> choosing function approximation -> final design
ML models:
1) Supervised learning
2) Unsupervised learning
3) Reinforcement learning
1) Supervised learning
- Learns from labeled data
- Examples: decision trees, SVM
2) Unsupervised learning
- Learns from unlabeled data
- Example: K-Means clustering
3) Reinforcement learning
- Learns to make decisions through rewards, by continuous trial and error
- Markov decision process
Types of supervised learning: classification, regression
TYPES OF SUPERVISED LEARNING
Classification
*It involves a type of problem where the output variable is a category (categorical), such as "positive or negative" or "living things or non-living things".
Problems that involve predicting a
category output:
● Image classification
○ Medical image classification
● Text classification
○ Sentiment analysis
○ Spam detection
Regression
*It involves a type of problem where the output variable is a real value (quantitative), such as weight or budget.
Problems that involve predicting a numerical output:
● Predicting stock price
● Predicting a house's sale price based on its square footage
● Revenue forecasting
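A minimal scikit-learn sketch contrasting the two problem types (scikit-learn assumed installed; the toy data is made up): a classifier predicts a category, a regressor predicts a real value.

# Classification vs. regression: same API, different output types.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: output is a category (e.g. spam = 1, not spam = 0). Toy features, made up.
X_cls = [[0, 1], [1, 1], [1, 0], [0, 0]]
y_cls = [0, 1, 1, 0]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[1, 1]]))   # -> a class label, e.g. [1]

# Regression: output is a real value (sale price vs. square footage). Toy numbers, made up.
X_reg = [[800], [1000], [1500], [2000]]
y_reg = [150000.0, 180000.0, 250000.0, 320000.0]
reg = DecisionTreeRegressor().fit(X_reg, y_reg)
print(reg.predict([[1200]]))   # -> a real number, e.g. [180000.]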
Today: Support Vector Machine (SVM)
A classifier derived from statistical learning theory by Vapnik et al. in 1992.
SVM became famous when, using images as input, it gave accuracy comparable to neural networks with hand-designed features in a handwriting recognition task.
Currently, SVM is widely used in object detection & recognition, content-based image retrieval, text recognition, biometrics, speech recognition, etc.
Also used for regression (not covered today).
Outline
Linear Discriminant function
Large Margin Linear Classifier
Nonlinear SVM: The Kernel Trick
Demo of SVM
Linear SVMs: Overview
- The classifier is a separating hyperplane.
- The most "important" training points are the support vectors; they define the hyperplane.
- Quadratic optimization algorithms can identify which training points x_i are support vectors with non-zero Lagrangian multipliers α_i.
- Both in the dual formulation of the problem and in the solution, training points appear only inside inner products: x_i^T x_j.
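A minimal linear-SVM sketch with scikit-learn (assumed installed; the two toy point clouds are made up), showing that the fitted hyperplane is defined by a few support vectors with non-zero multipliers α_i:

import numpy as np
from sklearn.svm import SVC

# Two linearly separable point clouds (toy data, made up for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_)       # the "important" training points that define the hyperplane
print(clf.dual_coef_)             # y_i * alpha_i for each support vector (non-zero multipliers)
print(clf.coef_, clf.intercept_)  # w and b of the hyperplane w^T x + b = 0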
Non-linear SVMs
- Datasets that are linearly separable with some noise work out great.
- But what are we going to do if the dataset is just too hard?
- How about mapping the data to a higher-dimensional space?
Non-linear SVMs: Feature Spaces
- General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable.
The “Kernel Trick”
- The linear classifier relies on the inner product between vectors:
K(x_i, x_j) = x_i^T x_j
- If every datapoint is mapped into a high-dimensional space via some transformation φ: x → φ(x), the inner product becomes:
K(x_i, x_j) = φ(x_i)^T φ(x_j)
- A kernel function is a function that is equivalent to an inner product in some feature space. Example:
2-dimensional vectors x = [x_1, x_2]; let K(x_i, x_j) = (1 + x_i^T x_j)^2
Need to show that K(x_i, x_j) = φ(x_i)^T φ(x_j):
K(x_i, x_j) = (1 + x_i^T x_j)^2
= 1 + x_i1^2 x_j1^2 + 2 x_i1 x_j1 x_i2 x_j2 + x_i2^2 x_j2^2 + 2 x_i1 x_j1 + 2 x_i2 x_j2
= [1, x_i1^2, √2 x_i1 x_i2, x_i2^2, √2 x_i1, √2 x_i2]^T [1, x_j1^2, √2 x_j1 x_j2, x_j2^2, √2 x_j1, √2 x_j2]
= φ(x_i)^T φ(x_j), where φ(x) = [1, x_1^2, √2 x_1 x_2, x_2^2, √2 x_1, √2 x_2]
- Thus, a kernel function implicitly maps data to a high-dimensional space (without the need to compute each φ(x) explicitly).
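The identity can be checked numerically; a small NumPy sketch (the two vectors are made up):

import numpy as np

def phi(x):
    # Explicit feature map for the polynomial kernel K(x_i, x_j) = (1 + x_i^T x_j)^2 in 2-D.
    x1, x2 = x
    return np.array([1, x1**2, np.sqrt(2) * x1 * x2, x2**2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2])

xi = np.array([1.0, 2.0])    # made-up 2-D vectors
xj = np.array([3.0, -1.0])

k_implicit = (1 + xi @ xj) ** 2   # kernel: no feature map computed
k_explicit = phi(xi) @ phi(xj)    # inner product in the 6-D feature space
print(k_implicit, k_explicit)     # both print 4.0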
SVM Applications
- SVMs were originally proposed by Boser, Guyon and Vapnik in 1992 and gained increasing popularity in the late 1990s.
- SVMs are currently among the best performers for a number of classification tasks ranging from text to genomic data.
- SVMs can be applied to complex data types beyond feature vectors (e.g. graphs, sequences, relational data) by designing kernel functions for such data.
- SVM techniques have been extended to a number of tasks such as regression [Vapnik et al. '97], principal component analysis [Schölkopf et al. '99], etc.
- Most popular optimization algorithms for SVMs use decomposition to hill-climb over a subset of α_i's at a time, e.g. SMO [Platt '99] and [Joachims '99].
- Tuning SVMs remains a black art: selecting a specific kernel and its parameters is usually done in a try-and-see manner.
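One common way to make the try-and-see tuning systematic is a cross-validated grid search; a minimal scikit-learn sketch (the dataset and parameter grid are illustrative choices, not from the slides):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Try several kernels and parameter values, scored by 5-fold cross-validation.
param_grid = {
    "kernel": ["linear", "rbf", "poly"],
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.1, 1],
}
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)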
TYPES OF UNSUPERVISED LEARNING
Clustering
*Technique that involves grouping
similar objects or data points
together into clusters, based on
certain features or
characteristics.
Dimensionality Reduction
*Technique that reduces the number of input features (dimensions) in a dataset while preserving as much of its meaningful structure as possible, e.g. via Principal Component Analysis (PCA).
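A minimal PCA sketch (scikit-learn assumed installed; the data is made up) reducing 4 features to 2:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))         # made-up data: 100 samples, 4 features

pca = PCA(n_components=2)             # keep the 2 directions with the most variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                # (100, 2)
print(pca.explained_variance_ratio_)  # share of variance kept by each component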
CLUSTERING ALGORITHM
UNSUPERVISED LEARNING ALGORITHM
Clustering is a machine learning technique that
involves grouping similar objects or data points
together into clusters, based on certain features or
characteristics.
* Important in various industries, such as marketing,
healthcare, finance, and more.
* The main goals of clustering are to identify natural
groupings in data, minimize the distance or
dissimilarity between data points within the same
cluster, and maximize the distance or dissimilarity
between data points in different clusters.
SIMILARITY AND DISSIMILARITY
MEASURES FOR CLUSTERING (Euclidean Distance, Manhattan Distance, Cosine Similarity)
- Euclidean Distance: A common measure of distance between two points in a multidimensional space, calculated as the square root of the sum of the squared differences between corresponding coordinates of the two points.
- Manhattan Distance: Another measure of distance between two points, based on the sum of the absolute differences between corresponding coordinates. It is often used in image recognition or pattern matching.
- Cosine Similarity: A measure of similarity between two vectors, based on the cosine of the angle between them. It is often used in natural language processing or recommendation systems.
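A short NumPy sketch computing all three measures for two made-up vectors:

import numpy as np

a = np.array([1.0, 2.0, 3.0])   # made-up vectors
b = np.array([2.0, 0.0, 4.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # sqrt of summed squared differences
manhattan = np.sum(np.abs(a - b))           # sum of absolute differences
cosine_sim = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))  # cosine of the angle

print(euclidean, manhattan, cosine_sim)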
CLUSTERING ALGO: K-MEANS
- K-Means is a widely used clustering algorithm that aims to partition a dataset into K distinct clusters, where K is a pre-specified number.
- Effective for well-separated clusters and can handle different types of data, such as numerical or categorical variables.
- Requires pre-specifying the number of clusters, which may not be known in advance or may be difficult to determine.
Steps of K-Means Algorithm:
1. Initialization: Select K random data points from the dataset as the initial centroids of the K clusters.
2. Assignment: Assign each data point to the cluster whose centroid is closest to it, based on a distance measure such as Euclidean distance or Manhattan distance.
3. Update: Recalculate the centroid of each cluster as the mean of all the data points assigned to it in the previous step.
4. Convergence: Repeat steps 2 and 3 until the centroids converge, that is, until the change in the centroids is below a certain threshold or the maximum number of iterations is reached.
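A minimal from-scratch NumPy sketch of these four steps (the toy data and K are made up for illustration; in practice one would typically use sklearn.cluster.KMeans):

import numpy as np

def k_means(X, k, max_iters=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick K random data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # 2. Assignment: each point joins the cluster with the closest centroid (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: recompute each centroid as the mean of its assigned points
        #    (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        # 4. Convergence: stop when the centroids move less than the threshold.
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return labels, centroids

# Three made-up 2-D blobs around -2, 0, and 2.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, (30, 2)) for m in (-2, 0, 2)])
labels, centroids = k_means(X, k=3)
print(centroids)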