Week 4 - Segmentation and Clustering p1,2 Flashcards

1
Q

What is the goal of grouping

A

Gather features that belong together for efficiency of further processing
Obtain an intermediate representation that compactly describes key parts

2
Q

What is the Gestalt principle

A

That grouping is key to visual perception
Elements in a collection have properties that result from their relationships
(e.g., the negative space between features)

3
Q

What are the Gestalt factors

A

Not grouped (e.g., a line of identical, evenly spaced dots)
Proximity
Similarity
Common fate (Elements that move in the same direction belong together)
Common region
Symmetry
Closure
Continuity
Parallelism

4
Q

How do Gestalt principles work with computer vision

A

It is difficult to know how best to apply these ideas via algorithms

5
Q

What is Top-down segmentation

A

Pixels belong together because they come from the same object

6
Q

What is bottom-up segmentation

A

Pixels belong together because they look ‘similar’
We define what ‘similar’ means: colour, texture, gradient, etc.

7
Q

How do we measure segmentation success

A

It is very hard to measure the success of segmentation; it depends on the application

Compare to human segmentation or to “ground truth”
- But often there is no objective definition of segmentation
- Not even agreement between humans

8
Q

What are superpixels

A

Compact and perceptually meaningful groups of pixels in an image
The idea is to not even try to compute a “correct” segmentation
Instead, be content with over-segmentation (many small superpixels rather than a few big regions)
- the superpixel boundaries will still include the lines of the correct segmentation
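A minimal sketch of generating superpixels with scikit-image's SLIC implementation (one standard algorithm for this; the filename below is a hypothetical placeholder):

```python
# Over-segment an image into superpixels with SLIC (scikit-image).
# "photo.jpg" is a hypothetical placeholder for a real image file.
from skimage import io
from skimage.segmentation import slic

img = io.imread("photo.jpg")                        # H x W x 3 RGB image
labels = slic(img, n_segments=200, compactness=10)  # ~200 compact superpixels
# labels[y, x] is the superpixel id of each pixel;
# raising n_segments gives more, smaller regions (stronger over-segmentation)
```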

9
Q

How do we segment via clustering

A

Using a pixel intensity histogram
Often we get peaks at the grey levels of the respective objects
We use clustering to determine the main intensities
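As an illustration (not from the lecture itself), the main intensities can be read off as histogram peaks; the synthetic two-object “image” and the peak-detection thresholds below are invented for the example:

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic grayscale data: two "objects" at intensities ~60 and ~180
gray = np.concatenate([np.random.normal(60, 5, 5000),
                       np.random.normal(180, 5, 5000)])
counts, _ = np.histogram(gray, bins=256, range=(0, 256))
peaks, _ = find_peaks(counts, distance=20, prominence=50)  # illustrative thresholds
print("main grey levels:", peaks)  # approximately [60, 180]
```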

10
Q

How do we use sum of squared differences in clustering

A

SSD = Σ_{clusters i} Σ_{points p in cluster i} ||p − c_i||²

We want to minimise the SSD between all points p and their nearest centre c_i
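A tiny worked example of this objective, with toy points, centres and memberships chosen for illustration:

```python
import numpy as np

points  = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 9.0]])
centres = np.array([[0.5, 0.0], [10.5, 9.5]])
assign  = np.array([0, 0, 1, 1])          # nearest centre of each point

# SSD = sum over clusters i, over points p in cluster i, of ||p - c_i||^2
ssd = sum(np.sum((points[assign == i] - c) ** 2)
          for i, c in enumerate(centres))
print(ssd)  # 0.25 + 0.25 + 0.5 + 0.5 = 1.5
```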

11
Q

What is the chicken and egg problem in clustering

A

Know the cluster centres → can allocate the correct points to each group
Know the group memberships → can compute the centre of each cluster

12
Q

What is k-means clustering

A

1) Pick k random cluster centres
2) Determine group memberships by assigning each point to its nearest centre
3) Calculate the average of each group and update c_i
4) Repeat steps 2-3 until the assignments stop changing

Will always converge to some local minimum, but not necessarily the global minimum of the SSD objective Σ ||p − c_i||²
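A minimal NumPy sketch of these steps (no empty-cluster handling or convergence test; purely illustrative):

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # 1) pick k random points as the initial centres
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # 2) group membership = index of the nearest centre
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3) move each centre to the average of its group
        centres = np.array([points[labels == i].mean(axis=0) for i in range(k)])
    return labels, centres

pts = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
labels, centres = kmeans(pts, k=2)
```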

13
Q

What is Feature Space

A

A very important factor
Depending on what we choose, we will group pixels in different ways
For example:
Intensity similarity in 1D (grayscale)
Or 3D: (r, g, b)
Or 5D: (r, g, b, x, y)
The 5D space encodes both similarity and proximity
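A small sketch of constructing these feature spaces for an RGB image (a random array stands in for real image data):

```python
import numpy as np

img = np.random.rand(4, 6, 3)           # stand-in for an H x W x 3 RGB image
H, W, _ = img.shape
ys, xs = np.mgrid[0:H, 0:W]

rgb   = img.reshape(-1, 3)                              # 3D: similarity only
rgbxy = np.column_stack([rgb, xs.ravel(), ys.ravel()])  # 5D: similarity + proximity
# Clustering rgbxy groups pixels that look alike AND lie near each other;
# rescaling the x,y columns trades proximity off against colour similarity.
```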

14
Q

What are the pros of K means

A

Simple, fast to compute
Converges to local minimum of within-cluster squared error

15
Q

What are the cons of k means

A

Have to choose k
Sensitive to initial centres
Sensitive to outliers
Detects spherical clusters only

16
Q

What is a ‘hard assignment’

A

Assigning each data point to exactly one cluster/group
It is something we want to avoid

17
Q

What is a soft assignment

A

Data points can have multiple memberships (“fuzzy clustering”)
E.g. a Gaussian mixture model, which uses probabilities

18
Q

What is the Gaussian Mixture model

A

Clusters are modelled as Gaussians,
not just by their mean (as in k-means)
Gives a generative probability model of x
The Expectation-Maximisation (EM) algorithm assigns data to clusters with soft probabilities

19
Q

What is the Univariate Gaussian (normal) distribution

A

Describes the likelihood of observing a continuous random variable along a single dimension
Takes two parameters: the mean μ and the variance σ² > 0

20
Q

How do we calculate μ (MLE)

A

μ^ = (1/N) Σ_i x^(i)

21
Q

How do we calculate σ² (MLE)

A

σ^² = (1/N) Σ_i (x^(i) - μ^)²
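A quick numeric check of both MLE formulas on a toy sample:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0])          # toy 1-D sample, N = 3
mu_hat  = x.mean()                     # (1/N) Σ x^(i)          -> 4.0
var_hat = ((x - mu_hat) ** 2).mean()   # (1/N) Σ (x^(i) - μ^)²  -> 8/3
# The MLE divides by N, not N-1 (np.var uses N by default too)
```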

22
Q

What is the Multivariate normal distribution

A

Describes the joint distribution of multiple random variables along multiple dimensions

23
Q

How do we calculate μ^ (multivariate)

A

μ^ = (1/m) Σ_j x^(j)

24
Q

How do we calculate Σ^ (multivariate)

A

Σ^ = (1/m) Σ_j (x^(j) − μ^)(x^(j) − μ^)^T

T = transpose
m = number of data points
x^(j) = the j-th data point (represented as a column vector)
Each term is the outer product of a difference vector with itself, which yields a square d×d covariance matrix
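A sketch checking the estimator with NumPy; the data points are stored as rows here, so the sum of outer products becomes `diffs.T @ diffs`:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 6.0],
              [5.0, 4.0]])            # m = 3 points, d = 2, one point per row
mu_hat = X.mean(axis=0)
diffs = X - mu_hat                    # row j holds (x^(j) - μ^)
Sigma_hat = diffs.T @ diffs / len(X)  # d x d covariance matrix
# Agrees with np.cov(X.T, bias=True), which also divides by m
```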

25
Q

What are the three forms of covariance

A

Spherical, diagonal, full

26
Q

What is Spherical covariance

A

[σ² 0
0 σ²]  (= σ²I)
All variables have the same variance σ²
All have equal importance
Visually: a circle (a sphere in higher dimensions)

27
Q

What is diagonal covariance

A

[σ₁² 0
0 σ₂²]
Each variable has its own variance
Unique variances, but no dependencies between variables
Visually: a vertical or horizontal (axis-aligned) ellipse

28
Q

What is full covariance

A

[σ₁² σ₁₂
σ₂₁ σ₂²]  (with σ₁₂ = σ₂₁)
Each variable has its own variance, and there can be covariances between different variables (dependencies)
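The three forms side by side as a NumPy sketch, with variances invented for the example:

```python
import numpy as np

spherical = 1.0 * np.eye(2)          # same variance everywhere: a circle
diagonal  = np.diag([1.0, 4.0])      # per-variable variance: axis-aligned ellipse
full      = np.array([[1.0, 0.8],
                      [0.8, 4.0]])   # off-diagonal covariance: tilted ellipse

# A valid full covariance must be symmetric and positive definite
assert np.all(np.linalg.eigvals(full) > 0)
```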

29
Q

What is covariance

A

A measure of how two random variables change together

30
Q

What is a generative model

A

Generates new data points from a probability distribution
defined by a vector of parameters θ

31
Q

What is Mixture of Gaussians (MoG)

A

A generative model that represents the data as a weighted sum of multiple Gaussian distributions

32
Q

How does MoG work

A

1) Start with parameters describing each cluster:
- mean μ, variance σ² and size (mixing weight) π
- each cluster is a Gaussian probability distribution
2) Each Gaussian component is selected with probability π_c
For instance, if π₁ = 0.6, component 1 is selected 60% of the time
3) Then we generate a data point x using the mean and covariance of the selected Gaussian component
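A 1-D toy version of this sampling procedure, with weights, means and standard deviations invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
pi    = [0.6, 0.4]                  # component sizes π_c
mu    = [0.0, 5.0]                  # component means
sigma = [1.0, 0.5]                  # component standard deviations

c = rng.choice(2, p=pi)             # 2) pick component c with probability π_c
x = rng.normal(mu[c], sigma[c])     # 3) draw x from that component's Gaussian
```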

33
Q

What is a blob

A

A component in an image that exhibits some common characteristic
Each blob b has a mean μ_b, a covariance matrix V_b and dimension d
Each blob also has a probability (weight) α_b

34
Q

What is the likelihood of observing x in a MoG

A

A weighted mixture of Gaussian blobs:
P(x | θ) = Σ_{b=1}^{K} α_b P(x | θ_b)
The posterior blob memberships of x give its soft assignment
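A direct 1-D transcription of this mixture likelihood (weights and blob parameters invented for the example):

```python
import numpy as np
from scipy.stats import norm

alphas = [0.6, 0.4]                 # blob weights α_b
mus    = [0.0, 5.0]
sigmas = [1.0, 0.5]

def mog_likelihood(x):
    # P(x | θ) = Σ_b α_b P(x | θ_b)
    return sum(a * norm.pdf(x, m, s)
               for a, m, s in zip(alphas, mus, sigmas))

print(mog_likelihood(0.0))          # dominated by the first blob
```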

35
Q

EM for blob

A

Find the blob parameters θ that maximise the likelihood function
P(data | θ) = Π_x P(x | θ)
θ = the means and covariances (and weights) of the blobs

The probability of observing the whole dataset given the parameters is the product, over all points x, of the mixture probability of seeing x given the parameters

36
Q

What are the steps of EM

A

1) Expectation step
Use the current parameter guess θ
Compute the probability that point x belongs to each blob b

2) Maximisation step
Use the computed expected memberships
Update the parameters of each blob (mean and covariance) based on the E-step,
so as to maximise the likelihood function

3) Repeat until convergence - until the parameters no longer change significantly
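A compact 1-D EM sketch of these steps (crude initialisation and a fixed iteration count; a real implementation would initialise with k-means and test convergence of the log-likelihood):

```python
import numpy as np
from scipy.stats import norm

def em_1d(x, k=2, iters=50):
    mu = np.linspace(x.min(), x.max(), k)       # crude initial guesses
    sigma = np.full(k, x.std())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each blob b for each point x
        r = np.array([p * norm.pdf(x, m, s) for p, m, s in zip(pi, mu, sigma)])
        r /= r.sum(axis=0)
        # M-step: re-estimate each blob from its softly assigned points
        nk = r.sum(axis=1)
        mu = (r * x).sum(axis=1) / nk
        sigma = np.sqrt((r * (x - mu[:, None]) ** 2).sum(axis=1) / nk)
        pi = nk / len(x)
    return mu, sigma, pi

x = np.concatenate([np.random.normal(0, 1, 200), np.random.normal(5, 0.5, 200)])
print(em_1d(x))
```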

37
Q

How does EM compare to k-means

A

Very similar, but EM uses soft assignments

38
Q

What are the applications of EM

A

Clustering
Segmentation
Model estimation
Missing data problems
Finding outliers

39
Q

What are the pros of MoG

A

Probabilistic interpretation
Soft assignments between data points and clusters
Generative model, can predict novel data points
Relatively compact storage (mean and shape)

40
Q

What are the cons of MoG

A

Local minima
Initialisation affects the result - it is a good idea to initialise with k-means
Need to know the number of components - methods to estimate this exist in the ML field
Need to choose a generative model
Numerical problems are often a nuisance

41
Q

Gestalt: whole or group

A

The whole is greater than the sum of its parts
Relationships among parts can yield new properties/features

42
Q

What 2 parameters does the multivariate Gaussian distribution take

A
  • A vector containing the mean position, μ
  • A symmetric, positive definite covariance matrix Σ