Lecture 11 - Segmentation Flashcards
What is image segmentation?
Image segmentation is the process of partitioning an image into multiple segments or regions to simplify and/or change the representation of an image into something more meaningful and easier to analyze.
Describe semantic segmentation.
Semantic segmentation assigns a class label to every pixel in an image, so that the labeled pixels identify the objects and regions present.
What is a hyper-column in the context of image segmentation?
A hyper-column is a representation that combines the layer activations from each level of a convolutional network at a given pixel location, providing a rich, multi-scale feature representation for that pixel.
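A minimal NumPy sketch of building hyper-columns may make this concrete. It assumes the feature maps from each layer are already available as `(C, h, w)` arrays and uses nearest-neighbor upsampling for simplicity; the function names and the random stand-in activations are hypothetical and not tied to any particular network.

```python
import numpy as np

def upsample_nearest(fmap, out_h, out_w):
    """Resize a (C, h, w) feature map to (C, out_h, out_w) by nearest-neighbor lookup."""
    _, h, w = fmap.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return fmap[:, rows[:, None], cols[None, :]]

def hypercolumns(feature_maps, out_h, out_w):
    """Stack the upsampled maps along the channel axis: one long feature vector per pixel."""
    resized = [upsample_nearest(f, out_h, out_w) for f in feature_maps]
    return np.concatenate(resized, axis=0)

# Stand-in activations from three layers of decreasing resolution (illustrative only).
rng = np.random.default_rng(0)
layers = [
    rng.normal(size=(4, 8, 8)),
    rng.normal(size=(8, 4, 4)),
    rng.normal(size=(16, 2, 2)),
]
hc = hypercolumns(layers, 8, 8)
print(hc.shape)  # (28, 8, 8): 4 + 8 + 16 channels at every pixel
```

The vector `hc[:, y, x]` is the hyper-column for pixel `(y, x)`; a per-pixel classifier can then be trained on these vectors.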
Explain fully convolutional networks (FCNs) in semantic segmentation.
FCNs replace the fully connected layers of a classification network with convolutional layers, so they accept input images of any size and produce a spatial map of class scores. This map is upsampled to the input resolution, enabling pixel-wise classification.
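One way to see why replacing fully connected layers removes the fixed-size constraint: a linear classifier applied independently at every spatial location is the same operation as a 1x1 convolution. The sketch below illustrates this with NumPy; the weights `W`, `b` and the helper names are hypothetical stand-ins, not part of any specific FCN implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, channels = 3, 5
W = rng.normal(size=(num_classes, channels))  # stand-in classifier weights
b = rng.normal(size=num_classes)              # stand-in classifier biases

def classify_vector(x):
    """A fully connected layer applied to a single feature vector of length C."""
    return W @ x + b

def conv1x1(fmap):
    """The same weights applied as a 1x1 convolution over a (C, H, W) feature map."""
    return np.einsum("kc,chw->khw", W, fmap) + b[:, None, None]

# The convolutional form works for any spatial size.
small = rng.normal(size=(channels, 4, 6))
large = rng.normal(size=(channels, 7, 9))
scores_small = conv1x1(small)
scores_large = conv1x1(large)
print(scores_small.shape, scores_large.shape)  # (3, 4, 6) (3, 7, 9)
```

At any location, `scores_small[:, y, x]` equals `classify_vector(small[:, y, x])`, so the output is a map of class scores rather than a single prediction.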
What is the role of a conditional random field (CRF) in segmentation?
CRFs are used to refine the segmentation by modeling the spatial dependencies and relationships between neighboring pixels, often leading to smoother and more accurate segmentation boundaries.
Describe the encoder-decoder structure in segmentation networks.
Encoder-decoder structures consist of an encoder that progressively reduces the spatial dimensions of the input to capture context and a decoder that upsamples the reduced representation to produce a dense segmentation map.
What are dense prediction models?
Dense prediction models generate output predictions at each pixel location, directly mapping the input to the output at the same resolution, used in tasks like segmentation and depth estimation.
Explain the concept of dilated convolutions.
Dilated (atrous) convolutions space the filter taps apart, which is equivalent to inserting zeros between the filter weights. This expands the receptive field without increasing the number of parameters and helps capture multi-scale context.
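The following 1-D sketch illustrates the idea under simple assumptions (no padding, stride 1, cross-correlation as in most deep learning libraries). The function names are hypothetical. It also checks that a dilation of 2 matches an ordinary correlation with a zero-inserted kernel.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Valid 1-D cross-correlation whose filter taps are spaced `dilation` samples apart."""
    k = len(w)
    span = (k - 1) * dilation + 1          # input samples covered by one output
    out_len = len(x) - span + 1
    out = np.zeros(out_len)
    for i in range(out_len):
        out[i] = sum(w[j] * x[i + j * dilation] for j in range(k))
    return out

def receptive_field(kernel_size, dilation):
    """Receptive field of a single dilated layer."""
    return (kernel_size - 1) * dilation + 1

x = np.arange(10, dtype=float)
w = np.array([1.0, 1.0, 1.0])              # still only three parameters

y_dense = dilated_conv1d(x, w, dilation=1)
y_dilated = dilated_conv1d(x, w, dilation=2)

# Same result as an ordinary correlation with zeros inserted between the weights.
w_zero_inserted = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
y_reference = np.correlate(x, w_zero_inserted, mode="valid")

print(receptive_field(3, 1), receptive_field(3, 2))  # 3 5
print(np.allclose(y_dilated, y_reference))           # True
```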
What is transfer learning and how is it used in segmentation?
Transfer learning involves pre-training a model on a large dataset and then fine-tuning it on a smaller, task-specific dataset. In segmentation, it helps leverage learned features from image classification tasks.
What are the advantages of using the U-Net architecture?
The U-Net architecture, originally designed for biomedical image segmentation, features a symmetric encoder-decoder structure with skip connections, providing precise localization and efficient upsampling.
What is the purpose of mean intersection over union (IoU) in segmentation evaluation?
Mean IoU is a metric used to evaluate segmentation performance by measuring the overlap between the predicted segmentation and the ground truth, averaged across all classes.
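A small NumPy sketch of the metric, assuming integer label maps of equal shape. The helper name `mean_iou` is hypothetical, and skipping classes that appear in neither map is one common convention rather than the only one.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Average the per-class intersection over union of two integer label maps."""
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # assumption: ignore classes absent from both maps
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

target = np.array([[0, 0, 1],
                   [1, 2, 2]])
pred = np.array([[0, 1, 1],
                 [1, 2, 0]])

# Per-class IoU here: class 0 -> 1/3, class 1 -> 2/3, class 2 -> 1/2.
score = mean_iou(pred, target, num_classes=3)
print(score)
```

For this toy example the three per-class values average to 0.5.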
Describe the concept of attention mechanisms in segmentation.
Attention mechanisms focus on relevant parts of the input image, enhancing the model’s ability to capture fine details and long-range dependencies, often improving segmentation accuracy.
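Write the formula for mean intersection over union (IoU).
For $C$ classes, $\text{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FP_c + FN_c}$, where $TP_c$, $FP_c$, and $FN_c$ are the numbers of true-positive, false-positive, and false-negative pixels for class $c$.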
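Provide the formula for the pixel-wise cross-entropy loss used in segmentation.
For $N$ pixels and $C$ classes, $L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log p_{i,c}$, where $y_{i,c}$ is 1 if pixel $i$ has ground-truth class $c$ and 0 otherwise, and $p_{i,c}$ is the predicted probability of class $c$ at pixel $i$.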
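Pixel-wise cross-entropy averages, over all pixels, the negative log-probability that the model assigns to each pixel's true class. A minimal NumPy sketch follows, assuming logits of shape `(K, H, W)` and an integer label map of shape `(H, W)`; the function name is hypothetical.

```python
import numpy as np

def pixelwise_cross_entropy(logits, labels):
    """Mean negative log-likelihood of the true class over all pixels."""
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    # Pick the log-probability of the ground-truth class at every pixel.
    rows, cols = np.indices(labels.shape)
    picked = log_probs[labels, rows, cols]
    return float(-picked.mean())

labels = np.array([[0, 1, 2],
                   [3, 0, 1]])

# Uniform predictions over 4 classes: every pixel contributes log(4).
uniform_logits = np.zeros((4, 2, 3))
loss_uniform = pixelwise_cross_entropy(uniform_logits, labels)

# Confident, correct predictions: the loss approaches zero.
confident_logits = np.zeros((4, 2, 3))
rows, cols = np.indices(labels.shape)
confident_logits[labels, rows, cols] = 20.0
loss_confident = pixelwise_cross_entropy(confident_logits, labels)

print(loss_uniform, loss_confident)
```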
What are the key challenges in image segmentation?
Key challenges include handling diverse object scales, occlusions, varying lighting conditions, and ensuring accurate and smooth boundaries in the segmented output.
How do fully convolutional networks (FCNs) differ from traditional convolutional networks?
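Traditional convolutional networks end in fully connected layers, which require a fixed input size and output a single prediction per image. FCNs replace those layers with convolutional ones, so they accept inputs of any size and output a segmentation map that, after upsampling, matches the input image size, making them suitable for pixel-wise classification.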
Explain how transfer learning benefits segmentation tasks.
Transfer learning leverages features learned from large datasets in related tasks, reducing the need for extensive labeled data and training time, and often improving segmentation performance.
What is the role of bilinear interpolation in upsampling within segmentation networks?
Bilinear interpolation is used to increase the spatial resolution of feature maps by linearly interpolating between pixel values, providing smoother upsampling compared to nearest-neighbor methods.
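A minimal sketch of bilinear upsampling for a single-channel map, assuming the "align corners" convention (input corner pixels map exactly onto output corners). Libraries differ on this convention, so treat the sampling grid as an assumption; the function name is hypothetical.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D array by blending the four nearest input pixels."""
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)    # sample positions in input coordinates
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]                 # vertical blend weights
    wx = (xs - x0)[None, :]                 # horizontal blend weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

img = np.array([[0.0, 2.0],
                [4.0, 6.0]])
up = bilinear_resize(img, 3, 3)
print(up)
# [[0. 1. 2.]
#  [2. 3. 4.]
#  [4. 5. 6.]]
```

The new middle row and column are averages of their neighbors, which is why the result looks smoother than nearest-neighbor upsampling, where values are simply repeated.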
Describe the encoder-decoder structure of the U-Net architecture.
The U-Net architecture consists of an encoder that downsamples the input image to capture context, followed by a decoder that upsamples the features to produce a high-resolution segmentation map, with skip connections between corresponding layers.
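The data flow through the skip connections can be sketched by tracking shapes alone. The example below is only an illustration: it omits the learned convolutions entirely, uses average pooling and nearest-neighbor upsampling as stand-ins, and all function names are hypothetical.

```python
import numpy as np

def downsample(x):
    """2x2 average pooling on a (C, H, W) array with even H and W (encoder step)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample(x):
    """Nearest-neighbor 2x upsampling (decoder step)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_merge(decoder_feat, encoder_feat):
    """Upsample decoder features and concatenate the matching encoder features."""
    return np.concatenate([upsample(decoder_feat), encoder_feat], axis=0)

rng = np.random.default_rng(0)
e1 = rng.normal(size=(3, 16, 16))   # encoder level 1 (full resolution)
e2 = downsample(e1)                 # encoder level 2: (3, 8, 8)
bottleneck = downsample(e2)         # coarsest level: (3, 4, 4)

d2 = skip_merge(bottleneck, e2)     # (6, 8, 8)
d1 = skip_merge(d2, e1)             # (9, 16, 16)
print(d2.shape, d1.shape)
```

The concatenation is what lets the decoder combine coarse context with the fine spatial detail preserved in the encoder features.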
What is the significance of using dilated convolutions in segmentation networks?
Dilated convolutions increase the receptive field without adding extra parameters, allowing the network to capture multi-scale context and improve segmentation accuracy, especially for large objects.
How does a conditional random field (CRF) refine segmentation results?
CRFs model the dependencies between neighboring pixels, refining segmentation by enforcing spatial coherence and producing smoother and more accurate boundaries.
What are hyper-columns and how are they used in segmentation?
Hyper-columns are multi-scale representations that combine activations from different layers of a network at each pixel, providing rich features for precise segmentation.
Explain the concept of mean field approximation in the context of CRFs.
Mean field approximation is a technique for approximate inference in CRFs. It replaces the exact posterior with a product of independent per-pixel marginal distributions and iteratively updates each marginal given its neighbors, which makes the computation tractable.
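As a rough illustration, the sketch below runs mean field updates on a small grid CRF. It assumes a Potts pairwise term over 4-connected neighbors and parallel updates, which is a simplification chosen for brevity (dense CRFs used in practice typically rely on Gaussian pairwise kernels). All names and the toy unary costs are hypothetical.

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def neighbor_sum(q):
    """Sum the marginals of the 4-connected neighbors of every pixel."""
    s = np.zeros_like(q)
    s[:, 1:, :] += q[:, :-1, :]
    s[:, :-1, :] += q[:, 1:, :]
    s[:, :, 1:] += q[:, :, :-1]
    s[:, :, :-1] += q[:, :, 1:]
    return s

def mean_field(unary, pairwise_weight=1.0, iters=10):
    """unary: (K, H, W) label costs. Returns approximate marginals of shape (K, H, W)."""
    q = softmax(-unary)                      # start from the unary-only beliefs
    for _ in range(iters):
        # Potts model: a label is rewarded in proportion to neighbor agreement.
        q = softmax(-unary + pairwise_weight * neighbor_sum(q))
    return q

# Toy problem: label 0 is cheaper everywhere except one noisy center pixel.
unary = np.zeros((2, 5, 5))
unary[1] = 2.0
unary[:, 2, 2] = [1.0, 0.0]

before = softmax(-unary).argmax(axis=0)
q = mean_field(unary)
after = q.argmax(axis=0)
print(before[2, 2], after[2, 2])  # 1 0
```

The isolated center pixel is relabeled to agree with its neighbors, which is the smoothing effect CRF refinement is meant to provide.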
What are the advantages of using pyramid scene parsing networks (PSPNet) in segmentation?
PSPNet captures global context information by pooling features at multiple scales, improving the segmentation of objects at different scales and enhancing overall accuracy.
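The multi-scale pooling step can be sketched as follows. This simplified version assumes bin sizes of 1, 2, 3, and 6, and leaves out the per-branch 1x1 convolutions and bilinear upsampling a full network would use, substituting nearest-neighbor upsampling. The function names are hypothetical.

```python
import numpy as np

def adaptive_avg_pool(fmap, bins):
    """Average a (C, H, W) map over a bins x bins grid of regions."""
    c, h, w = fmap.shape
    out = np.zeros((c, bins, bins))
    for i in range(bins):
        r0, r1 = (i * h) // bins, ((i + 1) * h + bins - 1) // bins
        for j in range(bins):
            c0, c1 = (j * w) // bins, ((j + 1) * w + bins - 1) // bins
            out[:, i, j] = fmap[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out

def upsample_nearest(fmap, out_h, out_w):
    """Resize a (C, h, w) map to (C, out_h, out_w) by nearest-neighbor lookup."""
    _, h, w = fmap.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return fmap[:, rows[:, None], cols[None, :]]

def pyramid_pooling(fmap, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with pooled-and-upsampled copies at several scales."""
    _, h, w = fmap.shape
    branches = [fmap]
    for bins in bin_sizes:
        pooled = adaptive_avg_pool(fmap, bins)
        branches.append(upsample_nearest(pooled, h, w))
    return np.concatenate(branches, axis=0)

rng = np.random.default_rng(0)
features = rng.normal(size=(2, 6, 6))
pyramid = pyramid_pooling(features)
print(pyramid.shape)  # (10, 6, 6): original 2 channels plus 2 per pyramid level
```

The 1-bin branch gives every pixel the global average of the feature map, which is how global context reaches each local prediction.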