Week 3 - unsupervised learning Flashcards
Unsupervised learning
- 2 characteristics
- 2 common tasks
- example
No Labeled Guidance
Exploration and Patterns
Common Tasks:
Common tasks in unsupervised learning include clustering and dimensionality reduction.
Clustering
Dimensionality Reduction
Examples:
Imagine you have a collection of articles, and you want the algorithm to group them into topics without telling it what the topics are. Unsupervised learning can be used to discover natural themes or clusters within the articles.
In image analysis, unsupervised learning might be applied to identify common patterns or features without specifying what to look for.
Unsupervised learning:
K-means clustering
4 steps
Initialization
Assignment
Update Centroids
Repeat
Example: Let’s say you have data representing the heights and weights of people. You want to group them into clusters based on their physical characteristics.
Initialization:
- example
Example: Let’s say you have data representing the heights and weights of people. You want to group them into clusters based on their physical characteristics.
Assignment:
- example
Example: Let’s say you have data representing the heights and weights of people. You want to group them into clusters based on their physical characteristics.
Update centroids:
- example
Example: Let’s say you have data representing the heights and weights of people. You want to group them into clusters based on their physical characteristics.
Repeat:
- example
Example: Let’s say you have data representing the heights and weights of people. You want to group them into clusters based on their physical characteristics.
- result
After the algorithm converges, you might find clusters where people with similar heights and weights are grouped together.
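The four steps (initialization, assignment, update centroids, repeat) can be sketched end to end on the height and weight example. This is a minimal illustration, not a reference implementation: the function name `k_means`, its parameters, and the six measurements are hypothetical and made up for this card, and it assumes the initial centroids are drawn from the data points themselves.

```python
import math
import random


def k_means(points, k, max_iters=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as the starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Repeat: stop once no point changes cluster (convergence).
        if new_labels == labels:
            break
        labels = new_labels
        # Update: move each centroid to the mean of its member points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical (height in cm, weight in kg) measurements, invented for illustration.
people = [(150, 50), (152, 53), (155, 52), (180, 85), (183, 88), (185, 90)]
labels, centroids = k_means(people, k=2)
```

With this made-up data, the three shorter, lighter people end up in one cluster and the three taller, heavier people in the other. Which cluster gets index 0 depends on the random initialization.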
Exploration and Patterns
The algorithm explores the data to identify inherent patterns, similarities, or structures on its own.
It tries to understand the natural organization or grouping within the data.
Clustering
Grouping similar data points together
Dimensionality Reduction
Simplifying the data while retaining its essential features.
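One way to make "simplifying while retaining essential features" concrete is to project 2-D points onto the single direction along which they vary most, which is the idea behind principal component analysis (PCA). The cards do not name a specific technique, so PCA is an assumption here, and the function name `project_to_main_axis` and the sample points are hypothetical.

```python
import math


def project_to_main_axis(points):
    """Reduce 2-D points to 1-D coordinates along their direction of greatest variance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(dx * dx for dx, _ in centered) / n
    syy = sum(dy * dy for _, dy in centered) / n
    sxy = sum(dx * dy for dx, dy in centered) / n
    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Each point is replaced by one number: its position along that axis.
    return [dx * ux + dy * uy for dx, dy in centered]


# Made-up points lying exactly on the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
reduced = project_to_main_axis(points)
```

Because these sample points lie on a straight line, one coordinate per point is enough to preserve their spacing. Real data would lose some information in the projection.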
K-means clustering:
Clusters
A cluster is a group of data points that are similar to each other.
The idea is to find natural groupings in the data without knowing beforehand what those groups are.
K-means clustering:
K
“k” is the number of clusters you want to identify in the data.
You need to decide or specify the value of k before running the algorithm.
K-means clustering:
Initialization
Randomly select k points as the initial cluster centroids (centers).
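A minimal sketch of this step, assuming the k points are drawn from the data set itself (a common choice, but the card does not say so). The function name `init_centroids`, the `seed` parameter, and the sample points are hypothetical.

```python
import random


def init_centroids(points, k, seed=None):
    """Pick k distinct data points at random to serve as the starting centroids."""
    rng = random.Random(seed)
    return rng.sample(points, k)


# Made-up 2-D points for illustration.
points = [(1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (8.0, 8.0), (1.0, 0.6), (9.0, 11.0)]
centroids = init_centroids(points, k=2, seed=42)
```

Fixing the seed makes a run repeatable. Different seeds can lead to different final clusters, because the result of k-means depends on where the centroids start.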
K-means clustering:
Assignment
Assign each data point to the cluster whose centroid is the closest (based on distance metrics like Euclidean distance).
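The nearest-centroid rule can be sketched with Euclidean distance as follows. The function name `assign_clusters` and the sample values are hypothetical, chosen only to show the rule in action.

```python
import math


def assign_clusters(points, centroids):
    """Return, for each point, the index of the closest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


# Made-up points and centroids for illustration.
points = [(1.0, 1.0), (2.0, 1.0), (9.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
labels = assign_clusters(points, centroids)  # [0, 0, 1]
```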
K-means clustering:
Update Centroids
Recalculate the centroids of each cluster based on the mean of the data points in that cluster.
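A minimal sketch of the mean recalculation, assuming each point already carries a cluster index. The function name `update_centroids` and the sample values are hypothetical, and keeping the old centroid for an empty cluster is one possible convention rather than part of the card.

```python
def update_centroids(points, labels, old_centroids):
    """Move each centroid to the coordinate-wise mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(old_centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            # Assumption: a cluster with no points keeps its previous centroid.
            new_centroids.append(old)
    return new_centroids


# Made-up points, labels, and previous centroids for illustration.
points = [(1.0, 1.0), (3.0, 1.0), (9.0, 9.0)]
labels = [0, 0, 1]
centroids = update_centroids(points, labels, [(0.0, 0.0), (10.0, 10.0)])
# centroids == [(2.0, 1.0), (9.0, 9.0)]
```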