Post-Midterm Flashcards

1
Q

Define discontinuity-based segmentation

A

A segmentation technique that identifies abrupt changes in intensity (where values jump significantly compared to their neighbours)

2
Q

What is the goal of discontinuity-based segmentation

A

To separate objects from background by finding boundary pixels (edges/lines/points) that mark transitions between regions

3
Q

In order to detect edges, an important mathematical foundation needed is _________________

A

The derivative

4
Q

A positive derivative means the intensity is __________ as x increases. Vice versa for a negative derivative.

A

Increasing

(dark → bright when moving to the right)

5
Q

How does a first-order derivative (gradient) behave in an edge detection algorithm

A

Gradients highlight where large changes occur, but also indicate which side is darker or lighter (via sign)

6
Q

How does a second-order derivative (Laplacian) behave in an edge detection algorithm

A

Laplacians reinforce fine details and help locate edges precisely. The sign of the second derivative often reveals whether an edge is going from dark-to-light or light-to-dark

7
Q

What are the three different types of discontinuity

A

Points, lines, & edges

8
Q

Describe a point discontinuity

A

Single pixel that differs sharply from neighbours (detected by the Laplacian)

9
Q

Describe a line discontinuity

A

1-2 pixel wide structures differing in intensity from surroundings (detected by directional masks or second derivatives)

10
Q

Describe an edge discontinuity

A

Transition zones (ideal: step edges; real: blurred or ramp edges)

11
Q

What are the advantages of discontinuity segmentation

A

Can directly locate boundaries. It’s good for images where objects exhibit strong contrast against the background

12
Q

What are the challenges of discontinuity segmentation

A

It can be sensitive to noise (derivatives amplify noise). Images often require smoothing (pre-filtering) and careful threshold selection to avoid false edges or fragmented edges

13
Q

Describe what a ‘point’ is in the context of point detection

A

An isolated pixel whose intensity differs significantly from its immediate neighbours. It typically appears as a bright or dark ‘spot’ in a relatively uniform background

14
Q

What are the steps involved to implement a point detection algorithm

A

Step 1: Apply second-order derivative (Laplacian) filter

Step 2: Take the absolute value of the response

Step 3: Threshold the absolute response

Step 4: Label isolated points

15
Q

Describe how to accomplish the first step, ‘Apply Laplacian Filter’, when implementing a point detection algorithm

A

Convolve the kernel below with an image to obtain a filter response

Second-Order 3x3 Laplacian kernel:

0 1 0
1 -4 1
0 1 0
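The four steps can be sketched in Python with NumPy (an illustrative toy, not lecture code; the 90% threshold fraction is an assumed choice):

```python
import numpy as np

# 3x3 Laplacian kernel from the card above
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]])

def detect_points(img, t_frac=0.9):
    """Point detection: Laplacian -> |response| -> threshold -> binary map.
    t_frac (fraction of the maximum magnitude used as the threshold)
    is an illustrative choice, not from the card."""
    h, w = img.shape
    resp = np.zeros((h, w))
    for y in range(1, h - 1):          # step 1: apply the Laplacian filter
        for x in range(1, w - 1):      # (borders skipped for simplicity)
            resp[y, x] = np.sum(img[y-1:y+2, x-1:x+2] * LAPLACIAN)
    mag = np.abs(resp)                 # step 2: absolute value of the response
    T = t_frac * mag.max()             # step 3: threshold
    return (mag > T).astype(np.uint8)  # step 4: label isolated points (binary image)

# flat background with one isolated bright pixel
img = np.full((5, 5), 10.0)
img[2, 2] = 100.0
out = detect_points(img)
```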

16
Q

Describe how to accomplish the second step, ‘Take the absolute value of the response’, when implementing a point detection algorithm following the application of the Laplacian filter

A

After the first step, the Laplacian response can be positive or negative. By taking the absolute value of the response, we get a magnitude that indicates how large the change is, regardless of sign

17
Q

Describe how to accomplish the third step, ‘Threshold the absolute response’, when implementing a point detection algorithm after taking the absolute value of the Laplacian response

A

Choose a threshold, typically as a percentage of the maximum magnitude, so that only prominent ‘spikes’ exceed it and get labelled

18
Q

Describe how to accomplish the fourth and final step, ‘Label isolated points’, when implementing a point detection algorithm after thresholding the absolute response

A

If the magnitude of Z(x,y) is greater than the threshold, declare (x,y) an isolated point. Store this point as a 1 (or ‘white’) in an output binary image, while others are stored as 0 (or ‘black’)

19
Q

Describe a ‘line’ in the context of line detection

A

A set of connected pixels with similar intensity, often just 1-2 pixels in thickness, differing in intensity from its background

20
Q

Briefly describe the process to apply line detection to an image

A

Convolve the image with a second-order derivative filter or with specialized directional filters, as shown below. After convolution, threshold the filter response to isolate line pixels.

Vertical kernel:
-1 2 -1
-1 2 -1
-1 2 -1

Horizontal kernel:
-1 -1 -1
2 2 2
-1 -1 -1

45-degree kernel:
2 -1 -1
-1 2 -1
-1 -1 2
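A minimal NumPy sketch of this process, using the vertical mask on a toy image (illustrative only; the 50% threshold fraction is an assumed choice):

```python
import numpy as np

# vertical directional mask from the card
VERTICAL = np.array([[-1, 2, -1],
                     [-1, 2, -1],
                     [-1, 2, -1]])

def line_response(img, kernel):
    """Correlate the image with a 3x3 directional mask (borders skipped)."""
    h, w = img.shape
    resp = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            resp[y, x] = np.sum(img[y-1:y+2, x-1:x+2] * kernel)
    return resp

# a one-pixel-wide bright vertical line in column 2
img = np.zeros((5, 5))
img[:, 2] = 10.0
resp = line_response(img, VERTICAL)
# threshold the response to isolate line pixels
lines = (np.abs(resp) > 0.5 * np.abs(resp).max()).astype(np.uint8)
```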

21
Q

Define ‘edge’ in the context of edge detection

A

A boundary between two distinct regions of intensity or texture

22
Q

What are the different types of edges?

A

Step edge - Sudden transition in intensity (ideal)

Ramp edge - Gradual transition (common)

Roof edge - Transition to one intensity from another, then quickly back to the original (typical in thin lines or object ridges)

23
Q

True or False: Image noise/blur do not cause step edges to turn into ramp edges

A

False

24
Q

What is ‘clustering’ in clustering segmentation

A

The clustering approach in segmentation involves grouping pixels based on intensity, colour, or feature similarity without requiring labelled data

25
Q

What is ‘K-Means’ clustering?

A

An approach to clustering segmentation that involves the following steps:

  1. Choose k (# of clusters)
  2. Initialize cluster centres (random or heuristics)
  3. Assign each pixel to nearest cluster centre (Euclidean distance in intensity space)
  4. Update cluster centres as mean of assigned pixels
  5. Iterate until convergence

The output of this process will be an image with each pixel labelled with a cluster index (segmented image with k segments)
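A toy one-dimensional NumPy sketch of these steps on grayscale intensities (illustrative only; the function name and its parameters are made up for this example):

```python
import numpy as np

def kmeans_1d(pixels, k=2, iters=20, seed=0):
    """Toy K-Means on grayscale intensities (steps 1-5 from the card)."""
    rng = np.random.default_rng(seed)
    # step 2: initialize cluster centres (random picks from the data)
    centres = rng.choice(pixels.astype(float), size=k, replace=False)
    for _ in range(iters):                        # step 5: iterate to convergence
        # step 3: assign each pixel to its nearest centre (distance in intensity)
        labels = np.argmin(np.abs(pixels[:, None] - centres[None, :]), axis=1)
        # step 4: update centres as the mean of the assigned pixels
        new = np.array([pixels[labels == j].mean() if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres

# two obvious intensity groups: dark (~10) and bright (~200)
pixels = np.array([10, 12, 11, 200, 198, 202], dtype=float)
labels, centres = kmeans_1d(pixels, k=2)
```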

26
Q

What is ‘Fuzzy C-Means’ clustering?

A

An approach to clustering segmentation where each pixel has partial membership in every cluster (helps with ambiguous boundaries/smooth transitions); it is most useful when cluster boundaries are not sharp. Like K-Means it iteratively updates cluster centres, but it weights each pixel’s contribution by its membership degree, making it more computationally intensive than K-Means clustering.

27
Q

What is watershed segmentation

A

A gradient-based segmentation method that interprets the gradient magnitude as a topographic surface (low intensities = valleys; high intensities = ridges)

28
Q

What are active contours (or snakes)

A

A gradient-based segmentation method that ‘evolves’ a curve so that it locks onto region boundaries. The curve evolves under internal smoothness constraints and external ‘image forces’ derived from the gradient: the contour’s energy, a combination of internal energy (smoothness) and external energy (which pulls the curve toward edges), is minimized

29
Q

What are some common metrics used to evaluate the effectiveness of a segmentation algorithm?

A

Pixel Accuracy:
qty. of correct pixels / qty. of total pixels

IoU (Intersection over Union):
intersection(A,B) / union(A,B)

Dice Coefficient:
2 x [ intersection(A,B) / ( |A| + |B| ) ]
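These three metrics are straightforward to sketch in NumPy for binary masks (illustrative helper names, assuming `pred` and `gt` are 0/1 arrays of the same shape):

```python
import numpy as np

def pixel_accuracy(pred, gt):
    """Fraction of pixels classified correctly."""
    return (pred == gt).mean()

def iou(pred, gt):
    """Intersection over Union of the two foreground masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

def dice(pred, gt):
    """Dice coefficient: 2*|A ∩ B| / (|A| + |B|)."""
    inter = np.logical_and(pred, gt).sum()
    return 2 * inter / (pred.sum() + gt.sum())

# toy 1x4 masks: 3 of 4 pixels agree
pred = np.array([[1, 1, 0, 0]])
gt   = np.array([[1, 1, 1, 0]])
acc, j, d = pixel_accuracy(pred, gt), iou(pred, gt), dice(pred, gt)
```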

30
Q

What are common challenges involved with image segmentation?

A
  • Noise sensitivity (derivatives amplify noise)
  • Parameter tuning (thresholds, cluster sizes, kernel sizes)
  • Complex scenes (overlapping objects, illumination changes, low contrast)
31
Q

Define structuring element (SE)

A

A small, predefined shape (or set of pixels) used in morphological image processing. It acts like a ‘probe’ that scans over an image to analyze or modify its shapes

32
Q

When processing an image using mathematical morphology, a _____________ is moved across the image to modify object shapes based on specific rules

A

Structuring element (SE)

33
Q

What are the most common morphological operations

A
  • Erosion (remove pixels from objects)
  • Dilation (expands objects by adding pixels)
  • Opening & Closing (combinations of erosion and dilation)
34
Q

Define reflection and translation in the context of image morphology

A

Reflection: Flipping an SE by 180 degrees

Translation: Moving an SE across an image

35
Q

Define the erosion morphological operation

A

This operation shrinks foreground objects in an image. The SE is translated over all possible positions in the image, and the pixel at the SE’s origin is marked as foreground (1) only if the SE fits entirely inside the foreground at that position. All other pixels are marked as background (0)

36
Q

What are the effects of the erosion morphological operation?

A
  • Thins/Shrinks objects
  • Removes small noise
  • Separates connected components in an image
  • Shapes objects based on structuring element (e.g. elongated SE can reduce objects to a line)
37
Q

Define the dilation morphological operation

A

This operation expands/thickens foreground objects in an image. The (reflected) SE is translated over all possible positions in the image, and the pixel at the SE’s origin is marked as foreground (1) if the SE overlaps at least one foreground pixel at that position. All other pixels are marked as background (0)
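Both operations can be sketched in NumPy for a binary image with a 3x3 square SE (a toy implementation, not lecture code; borders are skipped for brevity):

```python
import numpy as np

SE = np.ones((3, 3), dtype=bool)   # 3x3 square structuring element

def erode(img, se=SE):
    """Origin kept as foreground only if the SE fits entirely inside it."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = img[y-1:y+2, x-1:x+2][se].all()
    return out

def dilate(img, se=SE):
    """Origin set to foreground if the SE overlaps any foreground pixel."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = img[y-1:y+2, x-1:x+2][se].any()
    return out

# a 3x3 foreground square centred in a 7x7 binary image
img = np.zeros((7, 7), dtype=bool)
img[2:5, 2:5] = True
eroded = erode(img)    # shrinks the square to its single centre pixel
dilated = dilate(img)  # grows it into a 5x5 square
```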

38
Q

What are the effects of the dilation morphological operation?

A
  • Grows objects
  • Small gaps/holes filled
  • Shapes expand based on SE size and shape
39
Q

Describe the concept of ‘morphological duality’ in the context of erosion and dilation

A

Duality describes the relationship between dilation and erosion through complementation (one operation can be derived from the other by working with the image’s background instead of the foreground)

40
Q

What is the erosion-dilation duality

A

The complement of the erosion of A by B is equal to the dilation of the complement of A using the reflected structuring element, Br

41
Q

What is the dilation-erosion duality

A

The complement of the dilation of A by B is equal to the erosion of the complement of A using the reflected structuring element, Br

42
Q

Define the opening morphological operation

A

This operation removes small objects or noise while preserving the general shape of larger objects. This is done by first eroding A by B (shrinking it), then dilating it back using B, partially restoring the main structure.

43
Q

Define the closing morphological operation

A

This operation fills small gaps or holes, smooths object contours, and fuses narrow breaks. This is done by first dilating A by B (expanding it), then eroding it back using B, smoothing object boundaries and filling in small gaps.

44
Q

Define ‘Hit-Or-Miss’ (HMT) Transform

A

A morphological tool used for shape detection in binary images. It relies on two SEs rather than one (one for foreground, other for background)

45
Q

Why would one use a hit-or-miss transform

A

HMT allows detection of small features (e.g. corners, endpoints) precisely by combining erosion with a specially designed pair of SEs

46
Q

Define boundary extraction

A

A basic morphological algorithm that isolates the edges of a foreground object using erosion and set difference

47
Q

What is the process involved in a boundary extraction algorithm?

A

Erode the object within the image using a structuring element. Then, subtract the eroded image from the original image to leave only the boundary pixels
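A minimal NumPy sketch of this process, assuming a 3x3 square SE (illustrative only; borders treated as background):

```python
import numpy as np

def erode(img):
    """Binary erosion with a 3x3 square SE (borders skipped)."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = img[y-1:y+2, x-1:x+2].all()
    return out

def boundary(img):
    """Boundary = A minus (A eroded by B): the set difference
    leaves only the edge pixels."""
    return img & ~erode(img)

# solid 4x4 square; its boundary is the outer ring of 12 pixels
img = np.zeros((8, 8), dtype=bool)
img[2:6, 2:6] = True
b = boundary(img)
```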

48
Q

Define hole filling algorithm

A

A basic morphological algorithm that fills background regions enclosed by a connected foreground boundary using dilation, complementation, and intersection

49
Q

What is the process involved in a hole filling algorithm?

A

Create an array of 0’s the same size as the image and set 1’s at known hole locations. Then, apply dilation to the newly created array using a symmetric SE. Intersect the result with the complement of the image to limit expansion inside the hole, and repeat this process until no further changes occur. The union of this array and the original image fills all holes while preserving object boundaries
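The iteration can be sketched in NumPy, assuming a 3x3 symmetric SE and a hand-placed seed pixel inside the hole (illustrative only):

```python
import numpy as np

def dilate(img):
    """Binary dilation with a 3x3 symmetric SE (borders skipped)."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = img[y-1:y+2, x-1:x+2].any()
    return out

def fill_holes(img, seed):
    """Dilate the seed, intersect with the image complement to stay
    inside the hole, repeat until stable, then union with the image."""
    X = seed.copy()
    while True:
        X_next = dilate(X) & ~img       # limit expansion to inside the hole
        if (X_next == X).all():
            break
        X = X_next
    return X | img

# a square ring with a 2x2 hole; seed one pixel inside the hole
img = np.zeros((8, 8), dtype=bool)
img[2:6, 2:6] = True
img[3:5, 3:5] = False                   # the hole
seed = np.zeros_like(img)
seed[3, 3] = True
filled = fill_holes(img, seed)
```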

50
Q

Define connected component extraction

A

A basic morphological algorithm that identifies and isolates groups of connected foreground pixels in a binary image

51
Q

What is the process involved in a connected component extraction algorithm?

A

Create an array of 0’s the same size as the image and set 1’s at known points within each connected component. Apply dilation to this array using a symmetric SE. Intersect this array with the original image to restrict growth, and repeat this process until no further changes occur. The newly created array contains all connected components of the original image

52
Q

Define convex hull

A

The convex hull of a set A is the smallest convex set that fully contains A.

In practice, the same iterative morphological operation is run on the image several times, each with a different SE (one per direction); the union of all these outputs is the convex hull

53
Q

Define thinning

A

A morphological operation that reduces a binary object to a skeleton-like shape while preserving its connectivity. It is defined using HMT and an iterative process involving SEs (see lecture slides for SE structures)

54
Q

Define thickening

A

A morphological operation that expands the foreground structure in a controlled way. It is the ‘morphological dual’ of thinning

55
Q

Define skeleton

A

A thin, central representation of a set, preserving its topology and shape while reducing redundancy

56
Q

Define morphological reconstruction

A

A powerful transformation that uses two images (marker and mask) and an SE to extract or restore objects

57
Q

What is a marker image and what is a mask image?

A

A marker image defines starting points for morphological reconstruction. A mask image restricts growth (conditions the reconstruction)

58
Q

Define geodesic dilation

A

A morphological operation that expands the marker image while limiting growth by using a mask

59
Q

Define geodesic erosion

A

A morphological operation that shrinks the marker image while staying greater than or equal to a mask

60
Q

What is ‘Reconstruction by Dilation’

A

A type of morphological reconstruction that utilizes geodesic dilation iterated until stability

61
Q

What is ‘Reconstruction by Erosion’

A

A type of morphological reconstruction that utilizes geodesic erosion iterated until stability

62
Q

Describe how dilation would work in a grayscale image rather than a binary image

A

The origin pixel (centre of SE) is replaced with the maximum value in the SE neighbourhood. This expands bright regions and enhances peaks

63
Q

Describe how erosion would work in a grayscale image rather than a binary image

A

The origin pixel (centre of SE) is replaced with the minimum value in the SE neighbourhood. This shrinks bright regions and enhances valleys
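A toy NumPy sketch of both grayscale operations with a 3x3 square SE (illustrative; border pixels are left unchanged for brevity):

```python
import numpy as np

def gray_dilate(img):
    """Grayscale dilation: origin takes the max over the 3x3 neighbourhood."""
    h, w = img.shape
    out = img.copy()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = img[y-1:y+2, x-1:x+2].max()
    return out

def gray_erode(img):
    """Grayscale erosion: origin takes the min over the 3x3 neighbourhood."""
    h, w = img.shape
    out = img.copy()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = img[y-1:y+2, x-1:x+2].min()
    return out

# flat image with a single bright peak
img = np.full((5, 5), 10)
img[2, 2] = 50
peak_spread = gray_dilate(img)   # the bright peak expands to a 3x3 patch
peak_removed = gray_erode(img)   # the peak is flattened back to 10
```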

64
Q

Define Large Language Models (LLMs)

A

Neural networks trained on vast amounts of text data that can predict/generate human-like text, often using the transformer architecture

65
Q

What are the three different variations of transformer architecture

A
  • Encoder-Decoder Structure
  • Encoder-Only Structure
  • Decoder-Only Structure
66
Q

Describe an encoder-decoder transformer architecture

A

The encoder processes the input (e.g. text tokens) and produces hidden representations; the decoder generates the output (e.g. translated text or the next token) by attending to the encoder outputs and previously generated tokens. It’s best applied to text summarization or question-answering applications

67
Q

Describe an encoder-only transformer architecture

A

This architecture focuses only on understanding or embedding text and is not typically used for text generation. It’s best applied to sentiment analysis and text classification applications

68
Q

Describe a decoder-only transformer architecture

A

This architecture generates text by using past context in a single transformer block without an encoder (ChatGPT does this). It operates in an autoregressive manner (generates one token at a time while using previous outputs as context). It’s best applied to story/article generation, chatbots, and code generation applications

69
Q

Define scaled dot-product attention

A

A core component of Transformer models that calculates ‘attention’ weights by taking the dot product of query and key vectors, scaling the result, and applying a softmax function to obtain normalized weights, which are then used to weight the value vectors.
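The formula, softmax(QKᵀ/√d_k)·V, can be sketched directly in NumPy (a toy single-head version, not a real framework implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # scaled dot products of queries and keys
    # numerically stable softmax over each query's scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights           # weighted sum of values, plus the weights

# toy example: 2 queries, 3 keys/values, d_k = 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
```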

70
Q

What are the three inputs that scaled dot-product attention takes

A

Queries (Q), Keys (K), and Values (V)

71
Q

Describe the Key (K) input to scaled dot-product attention

A

An input that represents the “memory” or reference that other tokens compare themselves to. Basically, keys define what each token has to offer

72
Q

Describe the Query (Q) input to scaled dot-product attention

A

An input that indicates what the current token is looking for in the other tokens. We compare this query against all the keys to figure out how relevant each key is to our query

73
Q

Describe the Value (V) input to scaled dot-product attention

A

Holds the actual content that can be retrieved if the token is deemed relevant (based on the query-key comparison). Once we determine how much attention (weight) to assign to each token (via the query-key matching), we use the corresponding values to form the output representation

74
Q

Define multi-head attention

A

A key mechanism of transformers that allows the model to look at different parts of the sequence. It splits the hidden representations into multiple “heads” to attend to different positions or features in parallel. This allows the model to capture more nuanced relationships within text

75
Q

Define vision transformers (ViT)

A

The same as an LLM transformer, but image patches are treated as tokens instead of words. Images are divided into fixed-size patches, each acting as a ‘token’

76
Q

Define Contrastive Language-Image Pre-training (CLIP)

A

A model developed by OpenAI that learns to connect images and text by training on a massive dataset of image-text pairs. It creates a shared latent space where visual and textual concepts are aligned

77
Q

What are the architecture components of CLIP

A

Image encoder (converts input image into feature vector) and text encoder (converts text description into a corresponding feature vector)

78
Q

What are the two components that allow the CLIP training mechanism to function

A

Contrastive loss and joint representation space

79
Q

Describe contrastive loss

A

A component in the CLIP training mechanism where the training objective is to bring the representations of matching image-text pairs closer while pushing apart non-matching pairs. This is achieved using a loss function that considers all pairs in a batch
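A simplified NumPy sketch of this symmetric objective (not OpenAI's actual code; the temperature value and helper names are illustrative assumptions):

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of matching image-text pairs:
    pair (i, i) is the positive, every (i, j != i) is a negative."""
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # batch x batch similarity matrix
    n = logits.shape[0]

    def xent(l):  # cross-entropy with the matching (diagonal) pair as target
        l = l - l.max(axis=1, keepdims=True)
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[np.arange(n), np.arange(n)]).mean()

    # average the image->text and text->image directions
    return (xent(logits) + xent(logits.T)) / 2

# aligned embeddings give a much lower loss than mismatched ones
a = np.array([[1.0, 0.0], [0.0, 1.0]])
loss_aligned = clip_contrastive_loss(a, a)
loss_shuffled = clip_contrastive_loss(a, a[::-1])
```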

80
Q

Describe joint representation space

A

A component in the CLIP training mechanism where both encoders are optimized so that semantically related images and texts have similar representations

81
Q

Define diffusion models

A

A class of generative models that create images by iteratively denoising random noise

82
Q

Fill in the Blank: The diffusion process has two components, _________ and ________

A

Forward, reverse

83
Q

Describe the forward process in diffusion

A

The forward process is the actual ‘diffusion’ portion. Gaussian noise is added gradually over many time steps such that the image slowly degrades until it becomes nearly pure noise. The purpose of this process is to define a known probabilistic path from data to noise, which the model learns to reverse
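The forward process has a well-known closed form, x_t = √(ᾱ_t)·x_0 + √(1−ᾱ_t)·ε with ᾱ_t the cumulative product of (1−β). A NumPy sketch (standard DDPM formulation; the linear beta schedule below is an illustrative assumption):

```python
import numpy as np

def forward_diffusion(x0, t, betas, rng):
    """Sample x_t directly from x_0 using the closed-form forward process."""
    alpha_bar = np.cumprod(1.0 - betas)[t]          # cumulative signal retention
    eps = rng.standard_normal(x0.shape)             # Gaussian noise
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # linear noise schedule (illustrative)
x0 = np.ones((8, 8))                    # toy 'image'
x_early = forward_diffusion(x0, 10, betas, rng)    # still close to x0
x_late = forward_diffusion(x0, 999, betas, rng)    # nearly pure noise
```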

84
Q

Describe the backward process in diffusion

A

The backward process is the ‘denoising’ portion. Starting from pure noise, the model performs a series of denoising steps by predicting and subtracting the noise component. This progressively refines the image and, once complete, the model reconstructs a high-quality image

85
Q

What is the data space of traditional diffusion models

A

Traditional diffusion models operate directly on pixel space

86
Q

Describe the architecture of traditional diffusion models

A

Consists of two components: Forward (diffusion) and Reverse (denoising) Process

87
Q

True or False: Traditional diffusion models have low computational cost due to their data space

A

False, computational cost is high due to the high dimensionality of pixel space

88
Q

True or False: Traditional diffusion models require many denoising steps for high-quality image reconstruction

A

True

89
Q

What is the data space of latent diffusion models

A

The data space of latent diffusion models is a compressed, lower-dimensional latent space rather than directly on pixels

90
Q

Describe the architecture of latent diffusion models

A

Consists of an encoder, decoder, and a diffusion process. The encoder is a pre-trained autoencoder or VAE that converts images into a latent representation. The decoder transforms the latent representation back into a high-resolution image. In between, a U-Net or similar network applies the diffusion process within the latent space (prior to decoding)

91
Q

True or False: Latent diffusion is more efficient in terms of computation and memory than traditional diffusion

A

True

92
Q

What is the data space of stable diffusion models

A

Like latent diffusion, stable diffusion operates in a compressed latent space; it additionally incorporates a text encoder (e.g., from CLIP) to guide the image generation process based on textual prompts

93
Q

Describe the architecture of stable diffusion models

A

Consists of an autoencoder, U-Net diffusion model, and cross-attention. The autoencoder encodes images into latent space and decodes them back, as in latent diffusion. The U-Net diffusion model applies the diffusion process in latent space, but enhanced with cross-attention layers to integrate text embeddings. Cross-attention merges the text conditioning with the image latent features to steer the generation towards the desired content

94
Q

True or False: Latent diffusion models can produce high-quality, detailed images guided by natural language inputs

A

False, a stable diffusion model does that

95
Q

When integrating vision and language models, what are the three fusion strategies for accomplishing this?

A

Early Fusion: Combining modalities early in the network (less common due to different data structures)

Late Fusion: Independent processing followed by alignment in a joint embedding space

Cross-Attention: Integrates features from both modalities during processing for deeper interaction