Lecture 5 - Segmentation & Recognition Flashcards

1
Q

What is image segmentation?

A

Image segmentation is the process of partitioning an image into multiple segments or groups of pixels that represent objects or regions within the image.

2
Q

Describe the K-means clustering algorithm used in segmentation.

A

K-means clustering partitions an image into K clusters by randomly initializing K cluster centers, assigning each pixel to the nearest center, recomputing each center as the mean of the pixels assigned to it, and repeating these two steps until the assignments stop changing.
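
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: it clusters 1-D grayscale intensities, and the function name `kmeans`, the `seed` parameter, and the sample `pixels` are assumptions made for the example.

```python
import random

def kmeans(values, k, iters=100, seed=0):
    """Cluster 1-D pixel intensities into k groups (illustrative sketch)."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # random initialization from the data
    labels = None
    for _ in range(iters):
        # Assignment step: each pixel goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: abs(v - centers[j]))
                      for v in values]
        # Update step: each center becomes the mean of its assigned pixels.
        for j in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = sum(members) / len(members)
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
    return centers, labels

# Hypothetical intensities: three dark pixels and three bright pixels.
pixels = [10, 12, 11, 200, 205, 198]
centers, labels = kmeans(pixels, k=2)
```

For real images the same loop is usually run on color or feature vectors rather than single intensities.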

3
Q

Explain the concept of Mixture of Gaussians (MoG) in image segmentation.

A

MoG models the distribution of pixel intensities as a mixture of several Gaussian distributions, using the Expectation-Maximization (EM) algorithm to estimate the parameters of each Gaussian component and assign pixels to clusters probabilistically.

4
Q

What is the role of the Expectation-Maximization (EM) algorithm in probabilistic clustering?

A

The EM algorithm iteratively estimates the parameters of a Gaussian mixture by alternating between the expectation step (calculating the probability of each pixel belonging to each Gaussian) and the maximization step (updating each Gaussian's mixing weight, mean, and variance from those probabilities).
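
The two alternating steps can be illustrated with a small 1-D sketch. It is written under stated assumptions: scalar intensities, a fixed number of iterations instead of a convergence test, and no log-domain arithmetic, so it is only suitable for tiny examples. The names `gaussian_pdf` and `em_mog` and the starting parameters are made up for the illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_mog(xs, means, variances, weights, iters=50):
    """Fit a 1-D mixture of Gaussians with EM (illustrative sketch)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    resp = []
    for _ in range(iters):
        # E-step: probability that each component generated each pixel.
        resp = []
        for x in xs:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                 for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component
    return means, variances, weights, resp

# Hypothetical intensities and deliberately rough starting parameters.
pixels = [10, 12, 11, 200, 205, 198]
means, variances, weights, resp = em_mog(
    pixels, means=[0.0, 255.0], variances=[1000.0, 1000.0], weights=[0.5, 0.5])
```

Each row of `resp` is a soft assignment: it sums to 1 across components instead of naming a single cluster.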

5
Q

Describe the GraphCuts method for interactive image segmentation.

A

GraphCuts involves modeling the image as a graph with pixels as nodes and edges representing the cost of assigning pixels to different segments. The minimum cut on the graph, computed using max-flow algorithms, determines the optimal segmentation.

6
Q

What is the Markov Random Field (MRF) and its application in image segmentation?

A

MRF is a probabilistic model that represents the spatial dependencies between pixels in an image. It is used in segmentation to enforce spatial coherence by modeling the interactions between neighboring pixels.

7
Q

Explain the K-nearest neighbor (KNN) algorithm in the context of image recognition.

A

KNN classifies a pixel or region by finding the K nearest training examples in the feature space and assigning the most common label among them. It is simple but can be computationally intensive for large datasets.
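
A minimal sketch of the voting rule, assuming Euclidean distance and small 2-D feature vectors. The function name `knn_classify`, the labels, and the training points are hypothetical.

```python
import math
from collections import Counter

def knn_classify(query, examples, k=3):
    """Label a feature vector by majority vote among its k nearest examples.

    examples is a list of (feature_vector, label) pairs.
    """
    # Brute-force search: sort every training example by distance to the query.
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D features for two classes.
training = [
    ((0, 0), "sky"), ((0, 1), "sky"), ((1, 0), "sky"),
    ((5, 5), "grass"), ((5, 6), "grass"), ((6, 5), "grass"),
]
label_a = knn_classify((0.5, 0.5), training)
label_b = knn_classify((5.2, 5.1), training)
```

The sort over all examples is what makes plain KNN expensive: every query touches the whole training set.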

8
Q

What are the challenges in specific object recognition?

A

Challenges include variations in viewpoint, illumination, occlusion, clutter, and intra-class variation. Effective recognition algorithms must handle these variations to accurately identify objects.

9
Q

Describe the concept of visual words in object recognition.

A

Visual words involve quantizing local image descriptors into a discrete vocabulary, analogous to words in a text. This representation allows efficient matching and recognition of objects by comparing histograms of visual word occurrences.

10
Q

What is the Bag of Words (BoW) model in object category recognition?

A

The BoW model represents an image by the distribution of visual words it contains. It ignores spatial information but allows efficient and scalable classification by comparing histograms of word occurrences.
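
As a rough illustration, the sketch below quantizes descriptors to their nearest visual word, builds a normalized histogram per image, and compares two images by histogram intersection. The tiny 2-D "descriptors", the two-word vocabulary, and the choice of histogram intersection as the similarity measure are assumptions for the example.

```python
import math

def quantize(descriptor, vocabulary):
    """Return the index of the visual word closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(descriptor, vocabulary[i]))

def bow_histogram(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences for one image."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[quantize(d, vocabulary)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical vocabulary of two visual words and two images' descriptors.
vocabulary = [(0, 0), (10, 10)]
hist_a = bow_histogram([(1, 0), (0, 1), (9, 10), (11, 9)], vocabulary)
hist_b = bow_histogram([(0, 0), (1, 1), (0, 2), (10, 9)], vocabulary)
similarity = histogram_intersection(hist_a, hist_b)
```

Note that the descriptor positions in the image never enter the histogram, which is exactly the spatial information the model discards.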

11
Q

Explain the sliding window approach in object detection.

A

The sliding window approach involves scanning the image with a window at different scales and positions, applying a classifier to each window to detect objects. It is computationally expensive but widely used in practice.
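
The scan can be sketched as two nested loops per scale. In this illustration the window grows with the scale while the image stays fixed, and the "classifier" is a stand-in that scores mean brightness. The names `sliding_windows`, `detect`, and `mean_brightness` and the toy image are assumptions, not part of the lecture.

```python
def sliding_windows(width, height, window, stride, scales=(1.0, 2.0)):
    """Yield (x, y, w, h) boxes covering a width x height image at each scale."""
    for s in scales:
        w, h = int(window[0] * s), int(window[1] * s)
        step = max(1, int(stride * s))
        for y in range(0, height - h + 1, step):
            for x in range(0, width - w + 1, step):
                yield (x, y, w, h)

def detect(image, score_fn, window, stride, threshold, scales=(1.0, 2.0)):
    """Return every box whose patch scores at or above the threshold."""
    height, width = len(image), len(image[0])
    hits = []
    for x, y, w, h in sliding_windows(width, height, window, stride, scales):
        patch = [row[x:x + w] for row in image[y:y + h]]
        if score_fn(patch) >= threshold:
            hits.append((x, y, w, h))
    return hits

def mean_brightness(patch):
    """Stand-in classifier: average pixel value of the patch."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)

# Toy 4x4 image with a bright 2x2 "object" in the lower-right corner.
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
boxes = list(sliding_windows(4, 4, window=(2, 2), stride=1))
hits = detect(image, mean_brightness, window=(2, 2), stride=1, threshold=0.9)
```

Even on this toy image the classifier runs once per box, which is why the cost grows quickly with image size, stride, and number of scales.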

12
Q

Describe the Random Sample Consensus (RANSAC) algorithm and its role in robust matching.

A

RANSAC repeatedly selects a random subset of data points, fits a model to it, and counts the inliers, i.e. the points that agree with the model within a tolerance. The model with the most inliers is kept, so parameters are estimated robustly while outliers are discarded.
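
A small sketch of the idea, using line fitting rather than feature matching to keep it short. The function name `ransac_line`, the tolerance, the iteration count, and the sample points are assumptions for the example.

```python
import random

def ransac_line(points, iters=200, tol=0.5, seed=0):
    """Fit y = m*x + b to points containing outliers (illustrative sketch)."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        # Two points are the minimal sample that defines a line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair cannot be written as y = m*x + b
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        # Inliers are the points whose vertical residual is within tolerance.
        inliers = [(x, y) for x, y in points if abs(y - (m * x + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (m, b), inliers
    return best_model, best_inliers

# Ten points on y = 2x + 1 plus two gross outliers.
points = [(x, 2 * x + 1) for x in range(10)] + [(3, 40), (7, -20)]
model, inliers = ransac_line(points)
```

In a matching pipeline the line would be replaced by a geometric transformation between two images, and the points by putative feature correspondences.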

13
Q

Write the formula for updating cluster centers in K-means clustering.

A
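
The answer on this card did not survive in the text. The standard update rule, written here with assumed notation ($C_k$ is the set of pixels currently assigned to cluster $k$, and $\mu_k$ is its center), is:

```latex
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

Each center moves to the mean of the pixels assigned to it.
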
14
Q

Provide the formula for the probability of a pixel belonging to a Gaussian component in MoG.

A
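
The answer on this card did not survive in the text. The standard posterior probability (responsibility) computed in the E-step, written with assumed notation ($\pi_k$, $\mu_k$, $\Sigma_k$ are the mixing weight, mean, and covariance of component $k$, and $z_i$ is the component that generated pixel $x_i$), is:

```latex
P(z_i = k \mid x_i) =
  \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
       {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
```
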
15
Q

Write the formula for the cost function in GraphCuts segmentation.

A
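
The answer on this card did not survive in the text, so the lecture's exact notation is unknown. A commonly used form of the energy, with assumed symbols ($l_i$ is the label of pixel $i$, $D_i$ is the unary data cost, $V_{ij}$ is the pairwise smoothness cost over neighboring pixels $\mathcal{N}$, and $\lambda$ balances the two), is:

```latex
E(L) = \sum_{i} D_i(l_i) + \lambda \sum_{(i,j) \in \mathcal{N}} V_{ij}(l_i, l_j)
```

The minimum cut of the graph corresponds to the labeling $L$ that minimizes this energy.
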
16
Q

How does the initialization of cluster centers affect the outcome of K-means clustering?

A

The initialization can lead to different local minima, affecting the final clustering outcome. Techniques like k-means++ can improve initialization by spreading out the initial centers.
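
A minimal sketch of k-means++ seeding on 1-D values, to show how the centers get spread out: each new center is sampled with probability proportional to its squared distance from the nearest center already chosen. The function name `kmeans_pp_init` and the sample data are assumptions, and the sketch assumes the data contains at least k distinct values.

```python
import random

def kmeans_pp_init(values, k, seed=0):
    """Pick k initial centers with k-means++ style seeding (sketch)."""
    rng = random.Random(seed)
    centers = [rng.choice(values)]  # first center: uniform at random
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min((v - c) ** 2 for c in centers) for v in values]
        # Far-away points are proportionally more likely to be picked next.
        centers.append(rng.choices(values, weights=d2, k=1)[0])
    return centers

# Hypothetical intensities: a dark group and a bright group.
pixels = [10, 12, 11, 200, 205, 198]
init_centers = kmeans_pp_init(pixels, k=2)
```

A point that is already a center has weight zero, so it cannot be drawn twice.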

17
Q

What is the main advantage of using probabilistic models like MoG over K-means for clustering?

A

Probabilistic models provide soft assignments and a generative model, allowing for better handling of data variability and outliers.

18
Q

How does the GraphCuts method ensure spatial coherence in segmentation?

A

GraphCuts incorporates pairwise potentials that penalize label discontinuities between neighboring pixels, promoting smooth and coherent segments.

19
Q

What are the limitations of the KNN algorithm in image recognition?

A

KNN is sensitive to the choice of K and the distance metric, requires storing all training data, and can be computationally expensive for large datasets.

20
Q

Explain the concept of spatial pyramid matching in visual word-based recognition.

A

Spatial pyramid matching involves dividing the image into increasingly finer grids and computing histograms of visual words at each level, capturing both local and global spatial information for better recognition.
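
The grid construction can be sketched as follows. This is a simplified illustration: it only concatenates the per-cell histograms and leaves out the per-level weighting used when histograms are actually matched. The function name `spatial_pyramid` and the feature layout `(x, y, word_id)` are assumptions.

```python
def spatial_pyramid(features, width, height, vocab_size, levels=1):
    """Concatenate visual-word histograms over grids of 1x1, 2x2, 4x4, ...

    features is a list of (x, y, word_id) tuples for one image.
    """
    descriptor = []
    for level in range(levels + 1):
        cells = 2 ** level  # the grid is cells x cells at this level
        hists = [[0] * vocab_size for _ in range(cells * cells)]
        for x, y, word in features:
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            hists[cy * cells + cx][word] += 1
        for h in hists:
            descriptor.extend(h)
    return descriptor

# Four hypothetical features in an 8x8 image, vocabulary of two words.
features = [(1, 1, 0), (7, 1, 1), (1, 7, 1), (7, 7, 0)]
pyramid = spatial_pyramid(features, width=8, height=8, vocab_size=2, levels=1)
```

The first two entries are the global bag-of-words histogram; the remaining eight record which word fell in which quadrant.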

21
Q

What are the benefits of using the Bag of Words model for object recognition?

A

The BoW model is simple, scalable, and robust to variations in object appearance, providing a compact and efficient representation for classification tasks.

22
Q

How does RANSAC improve the robustness of feature matching?

A

RANSAC iteratively fits models to random subsets of data and selects the model with the most inliers, effectively discarding outliers and ensuring robust parameter estimation.

23
Q

Describe the process of building a visual vocabulary using K-means clustering.

A

Local descriptors are extracted from training images, clustered using K-means to form visual words, and each descriptor is assigned to the nearest cluster center, creating a vocabulary for recognition.

24
Q

What is the role of the inverted file index in large-scale image retrieval?

A

The inverted file index maps visual words to the images they appear in, enabling efficient retrieval by quickly identifying images that share similar visual words with the query image.
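
A small sketch of the data structure, assuming each image has already been reduced to a list of visual-word ids. The function names and the toy database are hypothetical, and ranking by the number of shared words is a simplification of the weighted scoring used in real retrieval systems.

```python
from collections import Counter, defaultdict

def build_inverted_index(image_words):
    """Map each visual word id to the set of images that contain it."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return index

def query(index, query_words):
    """Rank database images by how many query words they share."""
    votes = Counter()
    for w in set(query_words):
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return votes.most_common()

# Hypothetical database: image id -> visual words found in that image.
database = {"a": [1, 2, 3], "b": [3, 4], "c": [7, 8]}
index = build_inverted_index(database)
ranking = query(index, [2, 3, 9])
```

Image "c" is never touched during the query, which is the source of the speed-up: only images sharing at least one word with the query are visited.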

25
Q

How does the sliding window approach handle multi-scale object detection?

A

The sliding window approach scans the image at different scales by resizing the image or the window, ensuring objects of varying sizes can be detected.

26
Q

Explain the concept of unary and pairwise potentials in MRF-based segmentation.

A

Unary potentials encode the likelihood of a pixel belonging to a particular segment based on local evidence, while pairwise potentials enforce spatial smoothness by penalizing differences in neighboring pixel labels.
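
To make the two terms concrete, the sketch below evaluates the energy of a labeling on a small grid. It assumes a Potts-style pairwise term (a fixed penalty whenever two 4-connected neighbors disagree), which is one common choice rather than necessarily the one used in the lecture. The function name `mrf_energy` and the cost values are made up for the example.

```python
def mrf_energy(labels, unary, smoothness=1.0):
    """Total energy of a labeling: unary costs plus Potts pairwise penalties.

    labels[y][x] is the label of a pixel; unary[y][x][label] is the cost of
    giving that pixel that label (for example, a negative log-likelihood).
    """
    h, w = len(labels), len(labels[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            energy += unary[y][x][labels[y][x]]
            # Count each neighboring pair once: right and down neighbors.
            if x + 1 < w and labels[y][x] != labels[y][x + 1]:
                energy += smoothness
            if y + 1 < h and labels[y][x] != labels[y + 1][x]:
                energy += smoothness
    return energy

# 2x2 grid: three pixels clearly prefer label 0, one weakly prefers label 1.
unary = [
    [[0.0, 1.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.75, 0.25]],
]
smooth_energy = mrf_energy([[0, 0], [0, 0]], unary)
noisy_energy = mrf_energy([[0, 0], [0, 1]], unary)
```

The all-zero labeling wins even though one pixel's local evidence favors label 1, because the two pairwise penalties outweigh the small unary gain.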

27
Q

What are the advantages of using hierarchical vocabulary trees for image retrieval?

A

Hierarchical vocabulary trees allow efficient and scalable retrieval by organizing visual words into a tree structure, reducing the search space and speeding up the matching process.