Midterm Flashcards

1
Q

Define image processing

A

Image Processing is manipulating an image to improve its quality, extract information, or enable further analysis

2
Q

Define feature

A

A distinctive attribute or description used to label or differentiate objects in images

3
Q

Feature extraction involves two things. What are they?

A

Detection (finding features) and Description (quantifying features)

4
Q

What are invariant and covariant features?

A

Invariant features: Values remain unchanged under specific transformations (e.g., rotation, scaling)

Covariant features: Values change predictably under transformations (e.g., scaling affects area proportionally)

5
Q

What are local and global features?

A

Local features: Apply to individual image regions (e.g., corners, edges)

Global features: Describe entire images (e.g., colour histogram)

6
Q

The purpose of preprocessing techniques is to…

A

Prepare images for further analysis by reducing noise, enhancing features, and normalizing data

7
Q

Define boundary analysis

A

An analysis of the edges or outlines of objects to aid in object shape identification

8
Q

Define region analysis

A

An analysis of the areas or segments within an image to support texture and pattern recognition

9
Q

What is boundary following/tracing

A

A technique to identify the boundary of an object in a binary image

10
Q

What are the requirements for boundary following/tracing?

A
  • Must be a binary image
  • Image padded with a border of 0’s
  • Single connected region
11
Q

What are chain codes?

A

Chain codes represent the boundary of an object as a sequence of connected line segments. These segments are described using directional numbers based on connectivity

12
Q

What are the different connectivity types?

A

4-Connectivity: Segments connect pixels in horizontal and vertical directions

8-Connectivity: Segments connect pixels in horizontal, vertical, and diagonal directions (finer boundary representation than 4-C)

13
Q

What are the two types of chain codes?

A

Freeman chain codes and slope chain codes

14
Q

Define freeman chain codes

A

A boundary chain code that assigns a directional number (e.g., 0 for right, 1 for top-right, etc.) to each segment between consecutive boundary pixels (e.g., 0766666453321212)

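Not from the course notes, but as a rough illustration: a Freeman 8-connectivity code can be read off from an ordered list of boundary pixels by mapping each step between consecutive pixels to a direction number (the 0 = right, counter-clockwise convention from this card is assumed):

```python
# Map (d_col, d_row) steps to Freeman 8-direction codes.
# Assumed convention: 0 = right, 1 = up-right, 2 = up, ... counter-clockwise.
STEP_TO_CODE = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}

def freeman_chain_code(boundary):
    """boundary: ordered list of (col, row) pixels tracing the object edge."""
    code = []
    for (c0, r0), (c1, r1) in zip(boundary, boundary[1:]):
        code.append(STEP_TO_CODE[(c1 - c0, r1 - r0)])
    return code

# Tiny example: a 2x2 square traced clockwise from its top-left corner.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(freeman_chain_code(square))   # -> [0, 6, 4, 2]
```
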
15
Q

What is a strategy that could reduce the length of a boundary chain?

A

Resample the boundary onto a coarser grid spacing. This also reduces sensitivity to noise and segmentation errors

16
Q

What are some normalization techniques for chain codes?

A

Rotation normalization and starting point normalization

17
Q

What is rotation normalization

A

Uses the differences between consecutive directions (the first difference of the chain code) instead of absolute directions

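A minimal sketch of this idea (pure Python, assuming an 8-direction Freeman code treated as circular):

```python
def first_difference(code, directions=8):
    """Rotation-normalize a chain code: replace absolute directions with the
    (mod `directions`) change between consecutive elements.  The code is
    treated as circular, so element 0 is compared with the last element."""
    return [(code[i] - code[i - 1]) % directions for i in range(len(code))]

print(first_difference([0, 6, 4, 2]))   # -> [6, 6, 6, 6]
print(first_difference([6, 4, 2, 0]))   # same square rotated -> same result
```
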
18
Q

What is starting point normalization

A

A normalization technique for chain codes that treats the chain code as circular and shifts it to start with the smallest sequence

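A minimal sketch (pure Python, illustrative only):

```python
def normalize_start(code):
    """Starting-point normalization: treat the chain code as circular and
    pick the rotation that forms the smallest sequence."""
    rotations = [code[i:] + code[:i] for i in range(len(code))]
    return min(rotations)

print(normalize_start([3, 0, 1, 2]))   # -> [0, 1, 2, 3]
```
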
19
Q

Define slope chain codes (SCCs)

A

A chain code for boundary analysis that uses slope changes between contiguous line segments to represent a boundary

20
Q

How do you normalize a slope chain code?

A

Positive and zero slope changes are normalized to [0, 1), negative slope changes are normalized to (-1, 0)

21
Q

What are the advantages of SCCs over Freeman codes?

A
  • Provide finer granularity by utilizing a continuous slope range (-1, 1)
  • Better representation under rotation
  • Simpler process as SCCs do not require defining a grid
22
Q

Define boundary approximation using minimum-perimeter polygons (MPP)

A

Approximating a boundary with a polygon that minimizes the total perimeter while maintaining the shape’s integrity; this provides a compact, simplified representation of object boundaries

23
Q

What are the advantages of boundary approximation using MPP?

A
  • Reduces computational complexity
  • Simplifies boundary representation for storage and analysis
  • Useful in applications like shape matching and object recognition
24
Q

Define scale-invariant feature transform (SIFT)

A

SIFT extracts features that are invariant to scale, rotation, and certain changes in illumination

25
Q

SIFT is designed to detect and describe _______ features in images

A

Local

26
Q

SIFT features are ___________

A

Invariant

27
Q

Describe the first step of the SIFT algorithm: Scale Space Pyramid Construction

A

The scale-space pyramid construction step represents the image at multiple scales so that features can be detected across varying object sizes

28
Q

How would you construct a Scale Space Pyramid

A

Repeatedly blur the image (with a Gaussian filter) and downsample it

29
Q

Each group of blurred images in a scale space pyramid is called ________

A

An octave

30
Q

Describe the second step of the SIFT algorithm: Obtain Initial Keypoints

A

Compute the difference of Gaussians (DoG) and find local extrema

31
Q

How would you find the local extrema when obtaining initial keypoints in a SIFT algorithm

A

Compare each pixel’s intensity value in the 2D DoG image to the intensity values of its 8 neighbours. The pixel is marked as an extremum if its value is greater/smaller than all its neighbours

32
Q

Describe the third step of the SIFT algorithm: Improve Keypoint Localization Accuracy

A

SIFT uses mathematical interpolation (Linear + Quadratic Terms of Taylor Series Expansion) to help locate the true extremum position

33
Q

What are the six key steps in the SIFT algorithm?

A
  1. Construct a scale-space pyramid
  2. Obtain initial keypoints
  3. Improve keypoint localization accuracy
  4. Delete unsuitable keypoints
  5. Compute keypoint orientations
  6. Compute keypoint descriptor
34
Q

How do unstable keypoints occur in a SIFT algorithm and why delete them?

A

Can occur due to:

Low Contrast + Noise: Keypoints with insignificant intensity changes are sensitive to noise

Edge Responses: Keypoints along edges are not well-localized and are less robust

Removing these keypoints ensures that SIFT retains only distinctive and stable features

35
Q

What is a keypoint descriptor?

A

A “unique fingerprint” for each keypoint, used to match features across images, even under changes in scale/rotation/illumination

36
Q

How do you compute keypoint descriptors?

A
  1. Select neighbourhood
  2. Divide into subregions
  3. Compute gradients
  4. Create histograms
  5. Combine histograms
  6. Normalize the descriptor
37
Q

What is a prototype?

A

Predefined patterns or templates representing specific classes, often stored in raw or processed forms for comparison

38
Q

What is prototype matching?

A

Comparing unknown patterns to stored prototypes to determine their class; the similarity between the unknown and known data determines the classification

39
Q

What are some methods for prototype-based matching?

A

Minimum Distance Classifier and Template Matching

40
Q

Define Minimum Distance Classifier

A

Compares an unknown pattern to the mean of each class and assigns it to the class with the smallest distance

41
Q

Define template matching

A

Uses correlation to find the best match between an unknown pattern and stored templates

42
Q

What are the steps for minimum distance classification?

A
  1. Mean calculation: Compute mean vector for each class using training data
  2. Distance measurement: Measure distance between unknown pattern and each class mean
  3. Class assignment: Assign unknown pattern to class with smallest distance
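
A small NumPy sketch of those three steps; the feature vectors and class names below are made up for illustration:

```python
import numpy as np

# 1. Mean calculation: one mean vector per class from training data.
train = {
    "circle": np.array([[0.9, 0.10], [1.0, 0.20], [0.8, 0.15]]),
    "square": np.array([[0.2, 0.90], [0.1, 1.00], [0.25, 0.85]]),
}
means = {cls: samples.mean(axis=0) for cls, samples in train.items()}

def classify(x):
    # 2. Distance measurement: Euclidean distance to each class mean.
    dists = {cls: np.linalg.norm(x - m) for cls, m in means.items()}
    # 3. Class assignment: pick the class with the smallest distance.
    return min(dists, key=dists.get)

print(classify(np.array([0.85, 0.2])))   # -> "circle"
```
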
43
Q

What are the steps to template matching?

A
  1. Start with a template
  2. Slide template across bigger image
  3. Compare at each position
  4. Find best match
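
A rough NumPy sketch of the slide/compare/pick-best loop; it scores positions with a simple sum of squared differences (smaller = better) rather than the correlation coefficient covered on card 45, purely to keep the example short:

```python
import numpy as np

def match_template(image, template):
    """Slide `template` over `image`, score each position, and return
    the top-left corner of the best match."""
    H, W = image.shape
    h, w = template.shape
    best_pos, best_score = None, np.inf
    for r in range(H - h + 1):                        # 2. slide across the bigger image
        for c in range(W - w + 1):
            patch = image[r:r + h, c:c + w]
            score = np.sum((patch - template) ** 2)   # 3. compare at each position
            if score < best_score:                    # 4. keep the best match
                best_pos, best_score = (r, c), score
    return best_pos

image = np.zeros((5, 5)); image[2:4, 1:3] = 1.0       # a bright 2x2 blob
template = np.ones((2, 2))                            # 1. start with a template
print(match_template(image, template))                # -> (2, 1)
```
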
44
Q

What is a similarity score?

A

It is used in prototype matching and it determines how close a region of an image is to a predefined prototype

45
Q

How is similarity score calculated?

A

It is calculated using the correlation coefficient, which works as follows (a minimal code sketch follows the list):

  1. Pixel-by-pixel comparison to template
  2. Normalization (compensates for brightness differences between the template and the image)
  3. Output score is between -1 and 1 (1: Perfect, 0: No Match, -1: Perfect inverse match (opposite))
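
A minimal NumPy version of this score for a single patch/template pair of the same size (one common form of the zero-mean normalized correlation coefficient; assumed here, not taken verbatim from the course):

```python
import numpy as np

def correlation_coefficient(patch, template):
    """Normalized correlation between an image patch and a template.
    Both are shifted to zero mean (brightness normalization) and the
    result is scaled into [-1, 1]."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt(np.sum(p ** 2) * np.sum(t ** 2))
    return np.sum(p * t) / denom

t = np.array([[0., 1.], [1., 0.]])
print(correlation_coefficient(t, t))        # 1.0  (perfect match)
print(correlation_coefficient(1 - t, t))    # -1.0 (perfect inverse match)
```
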
46
Q

What is the limitation of the basic correlation formula? Is there a way to address this limitation?

A

Sensitive to intensity changes (i.e., if the image becomes brighter or darker, the correlation score will be affected). To address this limitation, use a normalized correlation formula, which normalizes the correlation result to account for intensity variations in the template or the image

47
Q

How does SIFT matching work?

A

Matching involves comparing SIFT descriptors from a known image (prototype) with descriptors from an unknown image

48
Q

SIFT descriptors are high-dimensional vectors, which means matching directly can be computationally expensive. What strategies can be implemented to improve performance?

A

Best-Bin-First Search: Quickly identifies potential matches by approximating the nearest neighbours using limited computations.

Clusters of Matches: To improve reliability, clusters of potential matches are identified using the generalized Hough transform, which groups matches that align well geometrically

49
Q

What are the steps for SIFT feature matching?

A
  1. Keypoint detection: Identify distinctive points in both images
  2. Descriptor generation: Compute a 128-dimensional vector for each keypoint
  3. Feature matching: Compare descriptors from both images and find the best match for each keypoint
  4. Filter matches: Use techniques like Lowe’s Ratio Test and Clustering to improve accuracy
50
Q

Describe Best-Bin-First Search (BBF Search)

A

Since comparing all features (brute force) is too slow, BBF Search focuses on the most likely matches first. This is done by:

  1. Organizing descriptors into bins (data structures like KD-trees)
  2. Searching in the best bin (most promising candidates)
  3. Stopping early if a good match is found

A good analogy is searching for a book by starting in the correct section instead of scanning the entire library

51
Q

Describe Clusters of Matches (Generalized Hough Transform)

A

Since individual matches can be noisy or incorrect, the Generalized Hough Transform identifies clusters of consistent matches. This is done by:

  1. Grouping matches that agree on a geometric transformation (e.g., scaling, rotation)
  2. Discarding outliers that don’t align with the cluster

A good analogy is solving a jigsaw puzzle by fitting groups of pieces together

52
Q

What is a Neural Network (NN)?

A

A Neural Network (NN) is a computational system inspired by the human brain, designed to recognize patterns and solve problems

53
Q

What is the basic structure of a neural network?

A

NN is composed of interconnected units called neurons organized in layers. Key components include an input layer, hidden layers, and an output layer

54
Q

What is the difference between a biological and artificial neuron?

A

Biological Neurons:
- Process and transmit information in the brain
- Receive signals, integrate inputs, and send outputs

Artificial Neurons:
- Perform mathematical operations
- Use activation functions to decide outputs

55
Q

What is the structure of an artificial neuron?

A

Inputs: Data features or signals

Weights: Influence the strength of each input

Bias: Adds flexibility to the decision boundary

Activation Function: Determines whether a neuron should “fire” (output)

Output: Result of processing inputs

Formula:

Output = Activation(Σ(Inputᵢ × Weightᵢ) + Bias)

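A tiny NumPy illustration of that formula for a single neuron (the sigmoid activation and the weight/bias values are arbitrary choices, not from the course):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    """Output = Activation(sum(inputs * weights) + bias)."""
    return sigmoid(np.dot(inputs, weights) + bias)

x = np.array([0.5, -1.0, 2.0])     # input features
w = np.array([0.4, 0.3, -0.2])     # weights (learned during training)
b = 0.1                            # bias (learned during training)
print(neuron(x, w, b))             # a value between 0 and 1
```
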
56
Q

Describe weights in neural networks

A

Weights determine the importance of each input feature to the neuron’s output (larger = stronger influence). Weights are adjusted during training to minimize loss. Higher weights amplify corresponding inputs; lower weights diminish them. Fine-tuning weights enables the network to adapt to patterns in the data

57
Q

Describe bias in neural networks

A

Bias is a trainable parameter that allows the model to shift the activation function. Bias enables the neuron to make decisions independent of weighted inputs (helps network fit data more flexibly)

58
Q

Describe activation functions in neural networks

A

Activation functions introduce non-linearity to the network. They decide whether or not to ‘fire’ the neuron’s output

59
Q

What are the most commonly used activation functions?

A

Sigmoid: Smooth gradient, used for binary classification

ReLU: Efficient and widely used for hidden layers

Tanh: Zero-centred, scales outputs between -1 and 1

Softmax: Converts outputs to probabilities

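Minimal NumPy versions of these four functions (illustrative sketches, not library implementations):

```python
import numpy as np

def sigmoid(z):                      # smooth gradient; binary classification
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):                         # cheap and widely used in hidden layers
    return np.maximum(0.0, z)

def tanh(z):                         # zero-centred, outputs in (-1, 1)
    return np.tanh(z)

def softmax(z):                      # converts a score vector to probabilities
    e = np.exp(z - np.max(z))        # subtract max for numerical stability
    return e / e.sum()

z = np.array([2.0, -1.0, 0.5])
print(sigmoid(z), relu(z), tanh(z), softmax(z), sep="\n")
```
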
60
Q

What is a Multi-Layer Perceptron (MLP)

A

A Multi-Layer Perceptron is a class of feed-forward neural networks consisting of multiple layers of neurons. MLPs can learn complex patterns by stacking layers. The architecture is structured in the following manner:

Input Layer: Receives the input features

Hidden Layers: Perform feature extraction through non-linear transformations

Output Layer: Provides predictions

61
Q

What is the forward propagation process?

A
  1. Input features are passed through the network
  2. Each layer applies weights, biases, and activation functions
  3. Outputs are propagated to the next layer until the final output is produced
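
A toy NumPy sketch of those three steps for a two-layer MLP; the layer sizes and random weights are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(4)                              # 1. input features

# one hidden layer and one output layer, each with weights and biases
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)
relu = lambda z: np.maximum(0.0, z)

h = relu(W1 @ x + b1)                          # 2. weights + bias + activation
y = W2 @ h + b2                                # 3. propagate to the final output
print(y)
```
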
62
Q

What is the difference between an objective function and a loss function?

A

Loss Function: Measures the error for a single data point or batch of data

Objective Function: The function to be minimized (or maximized) during training (often represents the aggregate loss over the entire dataset)

63
Q

What is backpropagation?

A
  • Process of using optimization algorithms that adjust weights and biases to minimize the loss
  • Calculates gradients of the loss function with respect to weights
64
Q

What is a gradient?

A

A gradient is a vector representing the direction and rate of a function’s steepest increase (or decrease). In neural networks, it typically refers to the partial derivatives of the loss function with respect to the model’s parameters (weights and biases). Think of it as a ‘guide’ or a ‘pointer’: the gradient points in the direction that best reduces the error of the network

65
Q

What are Convolutional Neural Networks (CNNs)?

A

Specialized neural networks used primarily for image recognition and computer vision tasks. CNNs achieve state-of-the-art performance in many tasks (e.g., image classification, object detection)

66
Q

What makes CNNs stand out from traditional machine learning?

A

Traditional machine learning methods require manual feature extraction, whereas CNNs learn hierarchical feature representations directly from raw data (e.g., images). CNNs also use fewer parameters than fully connected networks (MLPs) by exploiting local connectivity and parameter sharing

67
Q

What is the architecture of a CNN?

A
  1. Convolution Layer
  2. Pooling Layer
  3. Fully Connected Layer (FC)
  4. Activation Functions
68
Q

What is the convolution layer in a CNN?

A

The convolution layer performs filtering by sliding filters (kernels) over the input. It learns filters that activate when they see specific features

69
Q

What is the pooling layer in a CNN?

A

The pooling layer reduces spatial dimensions (e.g., max pooling). Helps reduce computation and control overfitting.

70
Q

What is the fully connected layer (FC) in a CNN?

A

The fully connected layer (FC) is the final layer for classification or regression.

71
Q

Describe what a filter (kernel), stride, and padding is in a convolution operation

A

Filter (kernel): A small matrix applied over the input (e.g., 3×3 or 5×5).

Stride: The step size with which the filter moves across the input.

Padding: Zero-padding preserves spatial dimensions.

72
Q

What is the formula for determining the output size (OS) in a convolution neural network?

A

OS = 1 + (W - K + 2P)/S

W: Input dimension
K: Kernel size
P: Padding
S: Stride

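A one-line helper for this formula, with a worked example (the 224/7/3/2 values are just an illustration):

```python
def conv_output_size(W, K, P, S):
    """OS = 1 + (W - K + 2P) / S  (floored when not evenly divisible)."""
    return 1 + (W - K + 2 * P) // S

# e.g. a 224x224 input, 7x7 kernel, padding 3, stride 2 -> 112x112 output
print(conv_output_size(224, 7, 3, 2))   # -> 112
```
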
73
Q

The main building block of a CNN is the __________ layer

A

Convolutional

74
Q

Explain the process that occurs during a convolution operation

A
  1. Filter Sliding: The kernel moves across the input with a given stride until it covers the full width, then moves down one row and starts at the left again. This repeats until the entire image is traversed (see the sketch after this list)
  2. Element-wise Multiplication & Summation: At each position, we multiply the overlapping input patch by the filter and sum the results
  3. Feature Map: The sum is stored in the feature map at the corresponding location
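
The three steps above sketched in NumPy for a single-channel input with stride 1 and no padding (illustrative only):

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    H, W = image.shape
    k = kernel.shape[0]
    out = np.zeros(((H - k) // stride + 1, (W - k) // stride + 1))
    for i in range(out.shape[0]):                 # 1. slide the filter
        for j in range(out.shape[1]):
            patch = image[i*stride:i*stride + k, j*stride:j*stride + k]
            out[i, j] = np.sum(patch * kernel)    # 2. multiply and sum
    return out                                    # 3. the feature map

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1., 0.], [0., -1.]])          # a simple diagonal-difference filter
print(convolve2d(image, kernel))                  # 3x3 feature map
```
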
75
Q

Describe the difference between grayscale image convolution and RGB (colour) image convolution

A

Grayscale Image Convolution:
- A grayscale image has only one channel (intensity values from 0 to 255).
- Image shape is denoted as (H×W×1)
- Convolution filter shape: (f×f×1)
- The convolution operation applies a single 2D filter over the image.
- Produces a single feature map as output.

RGB (Colour) Image Convolution:
- Each pixel has three separate intensity values (3 channels: Red, Green, Blue).
- Image shape is denoted as (H×W×3)
- Convolution filter shape: (f×f×3) – one filter “slice” per channel, then summed into a single feature map.
- Element-wise multiplication is performed independently for each channel, and the results are summed across channels.
- Produces a single feature map per filter.

76
Q

Why do we need multiple filters in a convolutional layer?

A
  • A single filter captures only one type of feature (e.g., horizontal edges).
  • A Convolutional Layer applies multiple filters to extract different features at the same time.
  • More filters = richer feature representation.

Example:
- Filter 1: Detects vertical edges.
- Filter 2: Detects horizontal edges.
- Filter 3: Detects diagonal lines

77
Q

What is depth in a convolutional layer?

A
  • The number of filters in a convolutional layer determines its depth.
  • If a layer has 64 filters, it produces 64 feature maps.
  • The output of a convolutional layer has the shape:

H × W × D

where D = number of filters (depth)

78
Q

How do CNNs learn filters?

A
  • Filters are not manually set; they are learned during training.
  • The CNN adjusts filter values using backpropagation.
  • Each filter activates strongly when it detects a matching pattern.

Over multiple layers:
- Early layers: Detect edges & textures.
- Middle layers: Detect shapes & parts.
- Deeper layers: Detect high-level objects (faces, animals, etc.).

79
Q

Why do we need pooling layers in CNNs?

A
  • Feature maps generated by Convolutional Layers are large.
  • Pooling reduces spatial size, keeping only the most important information.
  • Helps prevent overfitting by forcing CNNs to generalize.
  • Makes CNNs translation invariant (small shifts in the image don’t affect detection).
80
Q

What are the different types of pooling?

A

Max Pooling (Most Common): Takes the maximum value from each sub-region.

Average Pooling (Less Common): Takes the average value from each region, retains the overall smoothness of feature maps.

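A small NumPy sketch of both pooling types with non-overlapping 2×2 windows (an assumption; window size and stride can differ):

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling: the stride equals the window size."""
    H, W = x.shape
    out = np.zeros((H // size, W // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = x[i*size:(i+1)*size, j*size:(j+1)*size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

x = np.array([[1., 3., 2., 0.],
              [4., 6., 1., 1.],
              [0., 2., 5., 7.],
              [1., 1., 3., 2.]])
print(pool2d(x, mode="max"))   # [[6., 2.], [2., 7.]]
print(pool2d(x, mode="avg"))   # [[3.5, 1.], [1., 4.25]]
```
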
81
Q

Describe the difference between batch processing and single image processing?

A

Instead of processing one image at a time, CNNs process multiple images in parallel (batch). A batch size is the number of images processed together before updating weights. The characteristics of both options are listed below:

Batch:
- Updates weights after computing the gradient over a batch of images
- More stable gradients, efficient GPU use
- Requires more memory

Single-Image:
- Updates weights after every image
- Faster weight updates
- Unstable training, noisy updates

82
Q

True or False: Batch processing adds another dimension to an image tensor

A

True, with batch processing, an additional Batch Size (B) dimension is added: (H × W × D × B)

83
Q

What is Batch Normalization (BN)?

A

Neural networks suffer from internal covariate shift, where layer activations change drastically, slowing training. Batch Normalization (BN) normalizes activations, reducing variance between batches and improving stability.

84
Q

How does Batch Normalization (BN) work?

A
  • Computes the mean and variance for each batch.
  • Normalizes activations by subtracting the mean and dividing by standard deviation.
  • Applies learnable scale and shift parameters to maintain network flexibility.
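
Those steps in NumPy for one batch of activations; `gamma` and `beta` stand in for the learnable scale and shift parameters (an illustrative sketch, not a framework implementation):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=0)                       # per-feature mean over the batch
    var = x.var(axis=0)                         # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)     # normalize activations
    return gamma * x_hat + beta                 # learnable scale and shift

x = np.array([[1., 10.], [2., 20.], [3., 30.]])   # batch of 3 samples, 2 features
print(batch_norm(x, gamma=np.ones(2), beta=np.zeros(2)))
```
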
85
Q

What are the benefits of Batch Normalization (BN)?

A
  • Faster convergence (reduces training time).
  • More stable training (reduces sensitivity to learning rate).
  • Reduces dependence on careful weight initialization.
  • Acts as a mild regularizer (reduces overfitting).
86
Q

What is regularization?

A

CNNs can overfit, memorizing training data instead of generalizing. Regularization techniques help improve model generalization.

87
Q

What is dropout?

A
  • During training, random neurons are deactivated with probability p
  • This forces the network to learn multiple representations, improving generalization.
88
Q

How does dropout work?

A
  • In each training step, some neurons are ignored.
  • During testing, all neurons are active, but their activations are scaled by the keep probability (1 − p) so expected outputs match training (see the sketch below)
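
A NumPy sketch consistent with the cards above, where p is the probability of deactivating a neuron (note that the popular “inverted dropout” variant instead rescales during training):

```python
import numpy as np

def dropout(x, p, training):
    """p = probability of deactivating a neuron."""
    if training:
        mask = (np.random.rand(*x.shape) >= p)   # randomly switch off neurons
        return x * mask
    return x * (1.0 - p)                         # test time: scale by keep prob.

x = np.ones(8)
print(dropout(x, p=0.5, training=True))    # roughly half the entries zeroed
print(dropout(x, p=0.5, training=False))   # all entries scaled to 0.5
```
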
89
Q

What are the benefits of dropout?

A
  • Prevents overfitting.
  • Helps CNNs learn redundant features.
  • Improves model robustness.
90
Q

What are Autoencoders?

A
  • Neural networks designed for unsupervised learning.
  • Learn compact representations (encoding) of input data.
  • Used to pre-train deep models when labelled data is scarce

Consists of two main parts:
- Encoder: Compresses input into a lower-dimensional representation.
- Decoder: Reconstructs the input from this compressed representation.

91
Q

Why use deep autoencoders?

A
  • Reduce dimensionality (feature compression).
  • Learn meaningful latent representations of data.
  • Useful for denoising, anomaly detection, and pretraining deep models.
92
Q

What are the use cases for autoencoders?

A
  • Image reconstruction
  • Anomaly detection
  • Data generation (using variational autoencoders (VAEs))
  • Image Segmentation (U-Net)
93
Q

What are Variational Autoencoders (VAEs)?

A

Learn probabilistic representations to generate pixel-wise segmentations. It works by the encoder network converting the input into two vectors:
- Mean (μ): Center of the latent space distribution.
- Variance (σ²): Spread of the distribution.

Then, instead of sampling directly from μ and σ, we generate z:
z = μ + σ⋅ϵ,
ϵ ∼ N(0,1)

Then, the decoder takes the sampled latent vector, z, and reconstructs the original input

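The sampling step (the reparameterization trick) in NumPy, assuming the encoder has already produced μ and σ for one input; the values below are made up:

```python
import numpy as np

def sample_latent(mu, sigma):
    """z = mu + sigma * eps,  with eps ~ N(0, 1)."""
    eps = np.random.standard_normal(mu.shape)
    return mu + sigma * eps

mu = np.array([0.2, -1.0])       # produced by the encoder (made-up values)
sigma = np.array([0.5, 0.1])
z = sample_latent(mu, sigma)     # passed on to the decoder
print(z)
```
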
94
Q

Explain how an autoencoder works

A
  • Train an autoencoder to learn unsupervised feature representations.
  • Use the encoder’s output as input features for a classifier.
95
Q

What are Generative Adversarial Networks (GANs)?

A

Generative Adversarial Networks (GANs) are a type of deep learning model used for generating new data that mimics a given dataset.

Consists of two competing neural networks:
- Generator (“Artist”): Creates fake data.
- Discriminator (“Critic”): Evaluates if data is real or fake

96
Q

What is deconvolution?

A

Deconvolution (also called transposed convolution) is used to increase the spatial resolution of feature maps in CNNs. It helps reconstruct finer details lost during convolution. Often used in image segmentation and super-resolution tasks

97
Q

How do deconvolution layers work?

A
  • Works by spreading pixel values over a larger area.
  • Deconvolution uses a learnable kernel like standard convolution but performs an inverse process.
  • Unlike upsampling, deconvolution learns weights dynamically.
98
Q

What are the components in a transposed convolution layer?

A

Stride: Spacing between output values (upsampling factor).

Kernel: Similar concept to the convolution kernel, but effectively “spread out.”

Padding & Output Shape: Calculations ensure the desired output height/width.

Learnable Parameters: Weights are learned just like in forward convolution.

99
Q

Why would you use deconvolution in segmentation?

A
  • Segmentation demands pixel-wise classification.
  • Deep networks (like CNNs) typically reduce resolution to capture context.
  • Need to “decode” feature maps back to full resolution (Transposed Conv.)
  • To classify each pixel in the original image, we need to restore or approximate its original spatial resolution.
100
Q

What is image segmentation?

A

The process of dividing an image into meaningful regions. Each pixel is assigned a label corresponding to an object/class

101
Q

What are the different types of image segmentation?

A

Semantic: Labels every pixel with a class

Instance: Identifies and separates individual objects within an image

Panoptic: Combination of semantic + instance segmentation (Recognizes both object boundaries and individual instances)

102
Q

What is U-Net?

A

Widely used model for biomedical image segmentation that helps precisely segment small objects. U-Net concatenates feature maps from the encoder to the decoder, preserving features from earlier layers (skip connections). Consists of two parts:

  • Contracting path (Downsampling via convolutional layers)
  • Expanding path (Upsampling via deconvolution layers)
103
Q

What is transfer learning?

A

Transfer Learning is a deep learning technique where a pre-trained model is adapted for a new task. Instead of training from scratch, we reuse knowledge from existing models trained on large datasets (e.g., ImageNet). This saves computational resources and improves performance on smaller datasets.

104
Q

How does transfer learning work?

A
  1. Select a Pre-trained Model: Choose a model trained on a large dataset (e.g., VGG16, ResNet, EfficientNet)
  2. Feature Extraction or Fine-tuning:
    - Feature Extraction: Freeze convolutional layers and use them to extract useful representations.
    - Fine-tuning: Unfreeze some deeper layers and retrain them on the new dataset.
  3. Train a New Classifier: Replace the final classification layer with a new one tailored to the target task.
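
One common way to set this up with Keras, assuming TensorFlow is available; the choice of VGG16, the 224×224 input size, and the 10-class head are placeholders, not course requirements:

```python
import tensorflow as tf

# 1. Select a pre-trained model (VGG16 trained on ImageNet), without its classifier head.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))

# 2. Feature extraction: freeze the convolutional layers.
base.trainable = False   # for fine-tuning, unfreeze some deeper layers instead

# 3. Train a new classifier on top, tailored to the target task (10 classes here).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(train_images, train_labels, epochs=5)   # with your own dataset
```
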
105
Q

What are some common image reconstruction techniques?

A

Denoising: Removes noise while preserving details

Inpainting: Fills in missing parts or damaged regions of an image

Super-Resolution: Enhances low-resolution images to high-resolution

106
Q

What are some deep learning models that can be used for image reconstruction?

A
  • CNNs
  • Autoencoders
  • GANs
107
Q

What is image augmentation?

A

It increases dataset diversity by creating artificial/modified copies of existing data. Prevents overfitting and improves model robustness to variations

108
Q

What are some image augmentation techniques?

A

Geometric Transformations: Rotation, flipping, cropping, scaling

Colour-Based Transformations: Brightness adjustment, contrast enhancement, colour jittering

Noise Addition: Gaussian noise, salt-and-pepper noise

Synthetic Data Generation: GANs and diffusion models for generating new samples

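A NumPy sketch of a few of these techniques applied to a made-up grayscale image (real pipelines typically use a library such as torchvision or Albumentations):

```python
import numpy as np

def augment(img, rng):
    """Return a few modified copies of `img` (H x W grayscale, values 0-255)."""
    flipped = np.fliplr(img)                               # geometric: horizontal flip
    rotated = np.rot90(img)                                # geometric: 90-degree rotation
    brighter = np.clip(img * 1.2, 0, 255)                  # colour-based: brightness
    noisy = np.clip(img + rng.normal(0, 10, img.shape), 0, 255)   # Gaussian noise
    return [flipped, rotated, brighter, noisy]

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (32, 32)).astype(float)         # a made-up sample image
print(len(augment(img, rng)), "augmented copies")
```
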
109
Q

How do image augmentation and image reconstruction complement each other?

A
  • Augmentation enhances training datasets to improve reconstruction models
  • Reconstruction techniques can be used to clean augmented images
  • Example: Super-resolution can be combined with augmentation for better data quality
  • High-quality data + Diverse training = Robust models.