week 6 Flashcards

chp 6

1
Q

How do humans and machines learn?
(similarities and differences)

A

Similarities

  • Learning - Both learn: for humans it is a natural cognitive process, while for machines it is a function of their programming.
  • Experience - Humans learn from life experiences, while machines learn from the data they are exposed to.
  • Feedback - Humans improve based on feedback from teachers, peers, or results, and machines improve based on feedback loops programmed into their algorithms (as in reinforcement learning).

Differences

  • General vs. specific learning - Humans are
    capable of general learning and can apply
    knowledge in varied contexts, whereas machines
    typically learn specific tasks and are not as
    flexible.
  • Small vs. big data - Humans can learn effectively
    from small amounts of data, while machines
    require large datasets to learn effectively.
  • Learning speed - Machines can process vast amounts of information quickly once they are properly trained, while human learning is a slower, more gradual process.
2
Q

Designing
a Learning
System

A

Choosing the training experience → choosing the target function → choosing a representation for the target function → choosing a function approximation algorithm → the final design

3
Q

ML models:
1) supervised learning
2) unsupervised learning
3) reinforcement learning

A

1) Supervised learning
- learns from labeled data
- e.g. decision trees, SVM

2) Unsupervised learning
- learns from unlabelled data
- e.g. K-Means clustering

3) Reinforcement learning
- learns to make decisions through rewards, via continuous trial and error
- e.g. Markov decision process
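
A minimal sketch of the contrast between the first two paradigms (illustrative scikit-learn code; the toy data points and labels are invented):

```python
# Supervised vs. unsupervised learning: the difference is whether labels are given.
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # supervised: learns from labeled data
from sklearn.cluster import KMeans               # unsupervised: labels are never used

X = np.array([[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [8.2, 9.1]])
y = np.array([0, 0, 1, 1])                       # labels available -> supervised learning

clf = DecisionTreeClassifier().fit(X, y)         # learns the mapping from features to labels
print(clf.predict([[1.1, 2.1]]))                 # -> [0]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # only X, no y
print(km.labels_)                                # groupings discovered from the data alone
```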

4
Q

Types of supervised learning: classification, regression

A

TYPES OF SUPERVISED LEARNING

Classification
* A type of problem where the output variable is a category (categorical), such as "positive or negative" or "living things or non-living things".

Problems that involve predicting a categorical output:
● Image classification
○ Medical image classification
● Text classification
○ Sentiment analysis
○ Spam detection

Regression
* A type of problem where the output variable is a real value (quantitative), such as weight, budget, etc.

Problems that involve predicting a numerical output:
● Predicting stock prices
● Predicting a house's sale price based on its square footage
● Revenue forecasting
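
A minimal sketch of the two problem types (illustrative scikit-learn code; the tiny datasets, including the square-footage and price numbers, are invented):

```python
# Classification predicts a category; regression predicts a real value.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: categorical output ("positive" / "negative").
X_reviews = np.array([[0.9], [0.8], [0.1], [0.2]])      # e.g. a single sentiment-score feature
y_labels = np.array(["positive", "positive", "negative", "negative"])
clf = LogisticRegression().fit(X_reviews, y_labels)
print(clf.predict([[0.7]]))                             # -> ['positive']

# Regression: numerical output (a house's sale price from its square footage).
X_sqft = np.array([[800], [1200], [1500], [2000]])
y_price = np.array([150_000, 210_000, 260_000, 330_000])
reg = LinearRegression().fit(X_sqft, y_price)
print(reg.predict([[1700]]))                            # -> an estimated price (a real number)
```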

5
Q

Support Vector Machine (SVM)

A

A classifier derived from statistical learning theory by Vapnik et al. in 1992.

SVM became famous when, using images as input, it gave accuracy comparable to neural networks with hand-designed features in a handwriting recognition task.

Currently, SVM is widely used in object detection & recognition, content-based image retrieval, text recognition, biometrics, speech recognition, etc.

Also used for regression (will not cover today)

6
Q

Outline
Linear Discriminant Function
Large Margin Linear Classifier
Nonlinear SVM: The Kernel Trick
Demo of SVM

A
7
Q

Linear SVMs:
Overview

A
  • The classifier is a separating hyperplane.
  • The most "important" training points are the support vectors; they define the hyperplane.
  • Quadratic optimization algorithms can identify which training points x_i are support vectors, i.e. those with non-zero Lagrangian multipliers α_i.
  • Both in the dual formulation of the problem and in the solution, training points appear only inside inner products:

    maximize Q(α) = Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j (x_i^T x_j)   subject to   Σ_i α_i y_i = 0 and 0 ≤ α_i ≤ C

    f(x) = Σ_i α_i y_i (x_i^T x) + b

    (a small scikit-learn sketch of these quantities follows below)
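
A minimal sketch of these quantities with scikit-learn's SVC (the toy points are invented; the attribute names are scikit-learn's own, not from the slides):

```python
# Fit a linear SVM and inspect the support vectors and the dual coefficients alpha_i * y_i.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [2, 0], [4, 4], [5, 5], [5, 3]])
y = np.array([-1, -1, -1, 1, 1, 1])

svm = SVC(kernel="linear", C=1.0).fit(X, y)
print(svm.support_vectors_)       # the "important" training points that define the hyperplane
print(svm.dual_coef_)             # y_i * alpha_i for the support vectors (non-zero alphas only)
print(svm.coef_, svm.intercept_)  # w and b of the separating hyperplane
```
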
8
Q

Non-linear SVMs

A
  • Datasets that are linearly separable with some noise work out great.
  • But what are we going to do if the dataset is just too hard?
  • How about mapping the data to a higher-dimensional space?

Non-linear SVMs: Feature Spaces

  • General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable.
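
A minimal sketch of the idea (an invented 1-D example): points that no single threshold on x can separate become linearly separable after the mapping φ(x) = (x, x²):

```python
# Mapping 1-D data to a higher-dimensional space to make it linearly separable.
import numpy as np
from sklearn.svm import SVC

x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])          # outer points vs. inner points

# In the mapped 2-D space (x, x^2), a horizontal line x^2 = const separates the classes.
phi = np.column_stack([x, x ** 2])
clf = SVC(kernel="linear").fit(phi, y)

x_new = np.array([-2.5, 0.5])
print(clf.predict(np.column_stack([x_new, x_new ** 2])))   # -> [ 1 -1]
```
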
9
Q

The “Kernel Trick”

A
  • The linear classifier relies on an inner product between vectors:

    K(x_i, x_j) = x_i^T x_j

  • If every datapoint is mapped into a high-dimensional space via some transformation φ: x ⟶ φ(x), the inner product becomes:

    K(x_i, x_j) = φ(x_i)^T φ(x_j)

  • A kernel function is a function that is equivalent to an inner product in some feature space. Example:
    for 2-dimensional vectors x = [x_1, x_2], let K(x_i, x_j) = (1 + x_i^T x_j)².

    Need to show that K(x_i, x_j) = φ(x_i)^T φ(x_j):

    K(x_i, x_j) = (1 + x_i^T x_j)²
                = 1 + x_{i1}² x_{j1}² + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}² x_{j2}² + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2}
                = [1, x_{i1}², √2 x_{i1} x_{i2}, x_{i2}², √2 x_{i1}, √2 x_{i2}]^T [1, x_{j1}², √2 x_{j1} x_{j2}, x_{j2}², √2 x_{j1}, √2 x_{j2}]
                = φ(x_i)^T φ(x_j), where φ(x) = [1, x_1², √2 x_1 x_2, x_2², √2 x_1, √2 x_2]

  • Thus, a kernel function implicitly maps data to a high-dimensional
    space (without the need to compute each φ(x) explicitly).
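
A minimal sketch that checks the identity above numerically (the two example points are arbitrary):

```python
# Verify that (1 + x_i.x_j)^2 equals the inner product of the explicit feature maps phi(x).
import numpy as np

def phi(x):
    # phi(x) = [1, x1^2, sqrt(2) x1 x2, x2^2, sqrt(2) x1, sqrt(2) x2]
    x1, x2 = x
    s = np.sqrt(2)
    return np.array([1.0, x1 ** 2, s * x1 * x2, x2 ** 2, s * x1, s * x2])

xi = np.array([1.0, 2.0])
xj = np.array([3.0, -1.0])

kernel_value = (1.0 + xi @ xj) ** 2   # computed without ever building phi(x)
explicit_value = phi(xi) @ phi(xj)    # inner product in the 6-dimensional feature space
print(kernel_value, explicit_value)   # both equal 4.0 here
```
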
10
Q

SVM Applications

A
  • SVMs were originally proposed by Boser, Guyon and Vapnik in 1992 and gained increasing popularity in the late 1990s.
  • SVMs are currently among the best performers for a number of classification
    tasks ranging from text to genomic data.
  • SVMs can be applied to complex data types beyond feature vectors (e.g. graphs,
    sequences, relational data) by designing kernel functions for such data.
  • SVM techniques have been extended to a number of tasks such as regression
    [Vapnik et al. ’97], principal component analysis [Schölkopf et al. ’99], etc.
  • The most popular optimization algorithms for SVMs use decomposition to hill-climb over a subset of the αi's at a time, e.g. SMO [Platt '99] and [Joachims '99].
  • Tuning SVMs remains a black art: selecting a specific kernel and its parameters is usually done in a try-and-see manner.
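
One way to organise that try-and-see tuning is a cross-validated grid search, sketched below (the parameter grid and the choice of the iris dataset are illustrative assumptions):

```python
# Search over candidate kernels and parameters with cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {
    "kernel": ["linear", "rbf", "poly"],   # candidate kernels
    "C": [0.1, 1, 10],                     # soft-margin penalty
    "gamma": ["scale", 0.1, 1.0],          # kernel width (ignored by the linear kernel)
}
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```
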
11
Q

TYPES OF
UNSUPERVISED
LEARNING

A

Clustering
* A technique that involves grouping similar objects or data points together into clusters, based on certain features or characteristics.

Dimensionality Reduction
* A technique that reduces the number of features (variables) used to represent the data while preserving as much of its meaningful structure as possible.

12
Q

CLUSTERING
ALGORITHM

UNSUPERVISED LEARNING ALGORITHM

A

Clustering is a machine learning technique that
involves grouping similar objects or data points
together into clusters, based on certain features or
characteristics.
* Important in various industries, such as marketing,
healthcare, finance, and more.
* The main goals of clustering are to identify natural
groupings in data, minimize the distance or
dissimilarity between data points within the same
cluster, and maximize the distance or dissimilarity
between data points in different clusters.

13
Q

SIMILARITY AND DISSIMILARITY
MEASURES FOR CLUSTERING (Euclidean Distance, Manhattan Distance, Cosine Similarity)

A
  • Euclidean Distance: This is a common measure of distance
    between two points in a multidimensional space. It is
    calculated as the square root of the sum of the squared
    differences between corresponding coordinates of the two
    points.
  • Manhattan Distance: This is another measure of distance
    between two points, based on the absolute differences
    between corresponding coordinates of the two points. It is
    often used in image recognition or pattern matching.
  • Cosine Similarity: This is a measure of similarity between
    two vectors, based on the cosine of the angle between
    them. It is often used in natural language processing or
    recommendation systems.
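
A minimal sketch computing the three measures for two invented example vectors:

```python
# Euclidean distance, Manhattan distance and cosine similarity between two vectors.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 0.0, 4.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # square root of the sum of squared differences
manhattan = np.sum(np.abs(a - b))           # sum of absolute differences
cosine_sim = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))   # cosine of the angle between a and b

print(euclidean, manhattan, cosine_sim)
```
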
14
Q

CLUSTERING ALGO: K-MEANS

A
  • K-Means is a widely used algorithm for clustering that aims to
    partition a dataset into K distinct clusters, where K is a pre-specified
    number.
  • Effective for well-separated clusters and can handle different types of
    data, such as numerical or categorical variables.
  • Requires the pre-specification of the number of clusters, which may
    not be known in advance or may be difficult to determine.
15
Q

Steps of K-Means Algorithm:

A
  1. Initialization: Select K random data points from the dataset as the initial centroids for the K clusters.
  2. Assignment: Assign each data point to the cluster whose centroid is closest to it, based on a distance measure such as Euclidean distance or Manhattan distance.
  3. Update: Recalculate the centroid of each cluster as the mean of all the data points assigned to it in the previous step.
  4. Convergence: Repeat steps 2 and 3 until the centroids converge, that is, until the change in the centroids is below a certain threshold or the maximum number of iterations is reached.
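
A minimal end-to-end sketch of these four steps in NumPy (an illustrative, unoptimised implementation; the toy data is invented):

```python
import numpy as np

def k_means(X, k, max_iters=100, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick K random data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # 2. Assignment: each point joins the cluster with the closest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: recompute each centroid as the mean of its assigned points
        #    (keep the old centroid if a cluster happens to be empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # 4. Convergence: stop when the centroids barely move (or max_iters is reached).
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1, 1], [1.5, 2], [8, 8], [8, 9], [0.5, 1.5], [9, 8.5]])
labels, centroids = k_means(X, k=2)
print(labels, centroids)
```
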
16
Q

K-MEANS ALGORITHM - INITIALIZATION

A
  • The initialization step is the first step in the K-Means clustering algorithm, and it involves
    selecting the initial centroids for the K clusters.
  • One common method for initialization is random selection, where K data points are randomly
    selected from the dataset to serve as the initial centroids.
  • Example:
  • As a starting point, you tell your model how many clusters it should make. First, the model picks K (let K = 3) data points from the dataset. These data points are called cluster centroids.
  • Now there are different ways to initialize the centroids: you can either choose them at random, or sort the dataset, split it into K portions, and pick one data point from each portion as a centroid.
17
Q

K-MEANS ALGORITHM –
ASSIGNMENT STEP

A
  • Involves assigning each data point to the nearest centroid
    based on its distance.
  • The distance metric used to measure the distance between
    data points and centroids can affect the resulting clusters
  • The most common distance metric used in K-Means clustering is the Euclidean distance:

    D(x_i, x_{i'}) = ‖x_i − x_{i'}‖ = √( Σ_{j=1}^{p} (x_{ij} − x_{i'j})² )

  • Clusters defined by Euclidean distance are invariant to translations and rotations in feature space, but not invariant to scaling of the features.
18
Q

K-MEANS ALGORITHM
– UPDATE STEP

A

The update step is the third step in the K-Means
clustering algorithm, and it involves updating the
centroids of each cluster based on the mean of the data
points assigned to that cluster.
* The most common method used for updating centroids
in K-Means clustering is the simple mean, which
calculates the mean of all the data points assigned to
each cluster.
* Because the initial centroids were chosen arbitrarily, your model then updates them with the new cluster values.
* If some other algorithm, such as K-Modes or K-Medians, were used, the mode or the median would be taken instead of the mean, respectively.

19
Q

K-MEANS ALGORITHM – CONVERGENCE
CRITERIA

A
  • The convergence criterion is the final step in the K-Means clustering algorithm, and it determines when the algorithm should stop iterating and the clusters are considered stable.
  • Common convergence criteria:
  • Maximum number of iterations: the algorithm stops after a predetermined number of iterations, regardless of whether the clusters have stabilized or not.
  • Minimum change in centroids: the algorithm stops iterating when the centroids of all clusters change by less than a certain threshold value between consecutive iterations.
  • Minimum change in the sum of squared errors (SSE) or in the silhouette score.
20
Q

CLUSTERING ALGO – HIERARCHICAL
CLUSTERING

A
  • A type of clustering algorithm that creates a tree-like structure of clusters by recursively dividing or merging clusters based on the similarity between data points.
  • Two main types:
  • Agglomerative
  • Divisive
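
A minimal agglomerative sketch using SciPy (the points, the average-linkage choice and the distance threshold are illustrative assumptions):

```python
# Agglomerative (bottom-up) hierarchical clustering: repeatedly merge the closest clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 1], [1.2, 1.1], [5, 5], [5.1, 4.9], [9, 9]])

Z = linkage(X, method="average")                 # the tree of merges (dendrogram data)
labels = fcluster(Z, t=2, criterion="distance")  # cut the tree at distance threshold 2
print(labels)                                    # flat cluster label for each point
```
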
21
Q

CLUSTERING ALGO:
SPECTRAL CLUSTERING

A
  • Represent datapoints as the
    vertices V of a graph G.
  • All pairs of vertices are connected by an edge E.
  • Edges have weights W.
  • Large weights mean that the adjacent vertices are very similar; small weights imply dissimilarity.
22
Q

GRAPH PARTITIONING

A

Clustering on a graph is equivalent to partitioning the vertices of the graph.
* A loss function for a partition of V into sets A and B:

    cut(A, B) = Σ_{u ∈ A, v ∈ B} W_{u,v}

  • In a good partition, vertices in different partitions will be dissimilar.
  • Mincut criterion: find the partition (A, B) that minimizes cut(A, B).
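
A minimal sketch that evaluates cut(A, B) for a small, invented symmetric weight matrix W:

```python
# cut(A, B): sum of the edge weights crossing the partition.
import numpy as np

W = np.array([
    [0.0, 0.9, 0.8, 0.1, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.1, 0.0],
    [0.8, 0.7, 0.0, 0.2, 0.0, 0.1],
    [0.1, 0.0, 0.2, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.0, 0.9, 0.0, 0.7],
    [0.0, 0.0, 0.1, 0.8, 0.7, 0.0],
])

def cut(W, A, B):
    # Sum W[u, v] over all u in A and v in B (edges crossing the partition).
    return sum(W[u, v] for u in A for v in B)

A, B = [0, 1, 2], [3, 4, 5]
print(cut(W, A, B))   # a small value: the two groups are mostly dissimilar
```
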
23
Q

SPECTRAL CLUSTERING

A
  • In spectral clustering, the data points are treated as nodes of a graph.
  • Spectral clustering involves 3 steps:
    1. Compute a similarity graph
    2. Project the data onto a low-dimensional space
    3. Create clusters
24
Q

SPECTRAL CLUSTERING – STEP 1

A
  • We first create an undirected graph G = (V, E) with vertex set V = {v1, v2, …, vn}, one vertex per observation in the data.
  • This can be represented by an adjacency matrix which has the similarity between each pair of vertices as its elements.
  • To build this graph, we can opt for any of these options:
  • The ε-neighborhood graph
  • KNN graph
  • Fully connected graph
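
A minimal sketch of this step using the fully connected option with a Gaussian similarity (the data and σ are assumptions; the ε-neighborhood and KNN graphs are the alternatives listed above):

```python
# Build a similarity (adjacency) matrix W: W[i, j] = exp(-||xi - xj||^2 / (2 sigma^2)).
import numpy as np

X = np.array([[1.0, 1.0], [1.2, 0.9], [1.1, 1.1], [5.0, 5.0], [5.2, 5.1], [4.9, 5.2]])
sigma = 1.0

sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # pairwise squared distances
W = np.exp(-sq_dists / (2 * sigma ** 2))
np.fill_diagonal(W, 0.0)   # no self-loops
print(np.round(W, 2))      # large entries for similar points, near-zero otherwise
```
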
25
Q

SPECTRAL CLUSTERING – STEP 2

A
  • Step 2: Project the data onto a low-dimensional space.
  • Our goal is to transform the space so that when two points are close, they are always in the same cluster, and when they are far apart, they are in different clusters.
  • Compute the Graph Laplacian, which is just another matrix representation of a graph and can be useful in finding interesting properties of a graph.
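
A minimal sketch of this step: the unnormalised graph Laplacian L = D − W and its eigendecomposition (re-using a small invented similarity matrix W):

```python
# Graph Laplacian L = D - W, where D is the diagonal degree matrix.
import numpy as np

W = np.array([
    [0.0, 0.9, 0.8, 0.1, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.1, 0.0],
    [0.8, 0.7, 0.0, 0.2, 0.0, 0.1],
    [0.1, 0.0, 0.2, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.0, 0.9, 0.0, 0.7],
    [0.0, 0.0, 0.1, 0.8, 0.7, 0.0],
])

D = np.diag(W.sum(axis=1))            # degree matrix
L = D - W                             # unnormalised graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order (L is symmetric)
print(np.round(eigvals, 3))
print(np.round(eigvecs[:, 1], 3))     # eigenvector of the 2nd smallest eigenvalue (used in Step 3)
```
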
26
Q

SPECTRAL CLUSTERING
– STEP 3

A
  • Step 3 — Create clusters
  • For this step, we use the eigenvector corresponding to the 2nd smallest eigenvalue (the Fiedler vector) to assign values to each node.
  • On calculating, the 2nd smallest eigenvalue is 0.189 and the corresponding eigenvector is v2 = [0.41, 0.44, 0.37, -0.4, -0.45, -0.37].
  • To get bipartite clustering (2 distinct clusters), we first assign
    each element of v2 to the nodes such that {node1:0.41 ,
    node2:0.44 , … node6: -0.37}.
  • Then split the nodes such that all nodes with value > 0 are in
    one cluster, and all other nodes are in the other cluster.
  • Thus, in this case, we get nodes 1, 2 & 3 in one cluster, and 4,
    5 & 6 in the 2nd cluster.
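
A minimal sketch of the sign-based split, plugging in the v2 values quoted above:

```python
# Bipartition the nodes by the sign of their entries in v2 (the Fiedler vector).
import numpy as np

v2 = np.array([0.41, 0.44, 0.37, -0.40, -0.45, -0.37])   # one value per node 1..6

cluster_a = [i + 1 for i, val in enumerate(v2) if val > 0]
cluster_b = [i + 1 for i, val in enumerate(v2) if val <= 0]
print(cluster_a, cluster_b)   # -> [1, 2, 3] [4, 5, 6], matching the result above
```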