Lecture 11 - Segmentation Flashcards
What is image segmentation?
Image segmentation is the process of partitioning an image into multiple segments or regions, turning the raw pixel grid into a representation that is more meaningful and easier to analyze.
Describe semantic segmentation.
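Semantic segmentation assigns a class label to every pixel in an image, so that all pixels belonging to the same category (e.g. road, sky, person) share one label. It identifies which classes are present and where, but does not separate individual instances of the same class.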
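A minimal sketch of the per-pixel labeling step, assuming a model has already produced one score per class for each pixel. The function name `label_map`, the `scores[c][y][x]` layout, and the three class names are illustrative assumptions, not part of the lecture.

```python
def label_map(scores):
    """Turn per-pixel class scores into a label map.

    scores[c][y][x] is the score of class c at pixel (y, x).
    Returns labels[y][x], the index of the highest-scoring class.
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Toy 2x2 image with three hypothetical classes (0=background, 1=road, 2=sky).
scores = [
    [[0.90, 0.10], [0.20, 0.30]],  # class 0 scores
    [[0.05, 0.70], [0.10, 0.60]],  # class 1 scores
    [[0.05, 0.20], [0.70, 0.10]],  # class 2 scores
]
labels = label_map(scores)
print(labels)  # [[0, 1], [2, 1]]
```

Every pixel receives exactly one label, which is what distinguishes this output from image-level classification.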
What is a hyper-column in the context of image segmentation?
A hyper-column is the vector formed by concatenating the activations of multiple layers of a convolutional network at a given pixel location (after resizing coarser layers to the image resolution), providing a rich, multi-scale feature representation for that pixel.
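A rough sketch of building one hyper-column, assuming feature maps stored as `[channel][row][col]` nested lists. Nearest-neighbour resizing is used here only to keep the example short; it is a simplification chosen for this sketch, and the helper names are hypothetical.

```python
def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a single channel fmap[y][x] to out_h x out_w."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def hypercolumn(layers, y, x, out_h, out_w):
    """Concatenate the activations of every channel of every layer at pixel (y, x)."""
    column = []
    for layer in layers:
        for channel in layer:
            column.append(upsample_nearest(channel, out_h, out_w)[y][x])
    return column


# Two hypothetical layers for a 4x4 image:
# a fine layer (1 channel, 4x4) and a coarse layer (2 channels, 2x2).
fine = [[[r * 4 + c for c in range(4)] for r in range(4)]]
coarse = [[[10, 20], [30, 40]], [[-1, -2], [-3, -4]]]

col = hypercolumn([fine, coarse], y=1, x=3, out_h=4, out_w=4)
print(col)  # [7, 20, -2]
```

The resulting vector has one entry per channel across all layers, so early (fine, local) and late (coarse, semantic) features describe the same pixel together.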
Explain fully convolutional networks (FCNs) in semantic segmentation.
FCNs replace fully connected layers with convolutional layers, so they accept input images of arbitrary size and produce a spatial map of class scores rather than a single label. That coarse map is upsampled back to the input resolution, enabling pixel-wise classification.
What is the role of a conditional random field (CRF) in segmentation?
CRFs are used to refine the segmentation by modeling the spatial dependencies and relationships between neighboring pixels, often leading to smoother and more accurate segmentation boundaries.
Describe the encoder-decoder structure in segmentation networks.
Encoder-decoder structures consist of an encoder that progressively reduces the spatial dimensions of the input to capture context and a decoder that upsamples the reduced representation to produce a dense segmentation map.
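The change in spatial size through an encoder and decoder can be sketched without any learned weights. This toy version assumes 2x2 max pooling as the encoder step and nearest-neighbour 2x upsampling as the decoder step; real networks interleave these with learned convolutions, and the function names are hypothetical.

```python
def max_pool_2x2(fmap):
    """Halve height and width by taking the max over each 2x2 block."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


def upsample_2x(fmap):
    """Double height and width by repeating every value in a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


image = [
    [1, 2, 0, 0],
    [3, 4, 0, 1],
    [5, 0, 2, 2],
    [0, 0, 2, 9],
]
encoded = max_pool_2x2(max_pool_2x2(image))  # 4x4 -> 2x2 -> 1x1
decoded = upsample_2x(upsample_2x(encoded))  # 1x1 -> 2x2 -> 4x4
print(encoded)  # [[9]]
print(decoded)  # four rows of [9, 9, 9, 9]
```

The decoded map has the input's resolution again, but all spatial detail was lost in the bottleneck. That loss is why practical decoders are learned and often receive extra high-resolution information from the encoder.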
What are dense prediction models?
Dense prediction models produce a prediction for every pixel, mapping the input to an output of the same resolution. They are used in tasks such as segmentation and depth estimation.
Explain the concept of dilated convolutions.
Dilated (atrous) convolutions space the filter taps apart by a dilation rate, which is equivalent to inserting zeros between the filter weights. This expands the receptive field without increasing the number of parameters, and combining several dilation rates captures multi-scale context.
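A small 1-D sketch makes the receptive-field effect concrete. It assumes no padding and the cross-correlation form commonly used in deep learning; `dilated_conv1d` is an illustrative name, not a library function.

```python
def dilated_conv1d(signal, weights, dilation=1):
    """Valid (unpadded) 1-D convolution whose taps are `dilation` samples apart."""
    # Number of input samples spanned by one output value (the receptive field).
    span = (len(weights) - 1) * dilation + 1
    return [
        sum(w * signal[start + k * dilation] for k, w in enumerate(weights))
        for start in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
weights = [1, 0, -1]  # the same three parameters in both calls

dense = dilated_conv1d(signal, weights, dilation=1)    # each output spans 3 inputs
dilated = dilated_conv1d(signal, weights, dilation=2)  # each output spans 5 inputs
print(dense)    # [-2, -2, -2, -2, -2]
print(dilated)  # [-4, -4, -4]
```

With dilation 2 the same three weights compare samples four positions apart instead of two, so the filter sees a wider context at no extra parameter cost.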
What is transfer learning and how is it used in segmentation?
Transfer learning involves pre-training a model on a large dataset and then fine-tuning it on a smaller, task-specific dataset. In segmentation, it helps leverage learned features from image classification tasks.
What are the advantages of using the U-Net architecture?
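The U-Net architecture, originally designed for biomedical image segmentation, features a symmetric encoder-decoder structure with skip connections that pass high-resolution encoder features directly to the matching decoder stage. This lets the decoder recover fine spatial detail while upsampling, giving precise localization.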
What is the purpose of mean intersection over union (IoU) in segmentation evaluation?
Mean IoU evaluates segmentation performance by measuring, for each class, the overlap between the predicted and ground-truth pixels (the size of their intersection divided by the size of their union), and then averaging this ratio across all classes.
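A minimal sketch of the computation, assuming predictions and ground truth are given as flat sequences of class indices (one per pixel). Skipping classes that appear in neither sequence is one common convention and is an assumption of this sketch; `mean_iou` is a hypothetical helper name.

```python
def mean_iou(pred, truth, num_classes):
    """Average per-class intersection over union of two flat label sequences."""
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:  # assumed convention: ignore classes absent from both
            ious.append(intersection / union)
    if not ious:
        raise ValueError("no class is present in either sequence")
    return sum(ious) / len(ious)


truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, truth, num_classes=3)
print(score)  # per-class IoUs are 1/3, 2/3 and 1/2, so the mean is about 0.5
```

Because every class contributes equally to the average, a rare class that is segmented badly lowers the score as much as a common one, unlike plain pixel accuracy.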
Describe the concept of attention mechanisms in segmentation.
Attention mechanisms focus on relevant parts of the input image, enhancing the model’s ability to capture fine details and long-range dependencies, often improving segmentation accuracy.
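One common form is scaled dot-product self-attention over pixel features. The sketch below assumes that form, which this card does not name specifically, and omits the learned query, key, and value projections a real layer would have. The function names are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax of a list of numbers."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """features: one feature vector per pixel (image flattened to a list).

    Each output vector is a weighted average of all pixel features, weighted
    by similarity to the query pixel, so distant pixels can contribute.
    """
    dim = len(features[0])
    out = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in features]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, features)) for d in range(dim)])
    return out


# Three hypothetical pixels: the first two look alike, the third differs.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
attended = self_attention(features)
print(attended[0])
```

The first pixel's output stays dominated by the features it shares with the second pixel, regardless of how far apart they are in the image, which is the long-range dependency the card refers to.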
Write the formula for mean intersection over union (IoU).
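A standard way to write it, assuming $C$ classes and per-class pixel counts of true positives $TP_c$, false positives $FP_c$, and false negatives $FN_c$ (these symbols are notational choices made here):

```latex
\[
\mathrm{IoU}_c = \frac{TP_c}{TP_c + FP_c + FN_c},
\qquad
\mathrm{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{IoU}_c
\]
```

Here $TP_c$ counts the pixels in the intersection of prediction and ground truth for class $c$, and the denominator counts the pixels in their union.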
Provide the formula for the pixel-wise cross-entropy loss used in segmentation.
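A common formulation, assuming $N$ pixels, $C$ classes, a one-hot ground-truth indicator $y_{i,c}$, and a predicted (softmax) probability $p_{i,c}$ for class $c$ at pixel $i$ (the notation is an assumption of this card):

```latex
\[
\mathcal{L}_{\mathrm{CE}} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \, \log p_{i,c}
\]
```

Since $y_{i,c}$ is 1 only for the true class of pixel $i$, the inner sum reduces to the negative log-probability that the model assigns to the correct label at that pixel.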
What are the key challenges in image segmentation?
Key challenges include handling diverse object scales, occlusions, varying lighting conditions, and ensuring accurate and smooth boundaries in the segmented output.