chatGPT questions exam2 Flashcards
What does Stochastic Gradient Descent (SGD) optimize in machine learning models?
SGD is used to minimize the loss function of a model.
How does SGD differ from traditional gradient descent?
Unlike traditional gradient descent that uses the entire dataset for updates, SGD updates model parameters using only a single sample or a small batch of samples.
What are the key benefits and drawbacks of using SGD?
SGD is more computationally efficient for large datasets and can help escape local minima, but it may lead to slower and less stable convergence.
What does the FROC curve evaluate in medical image analysis?
the FROC curve evaluates detection and localization performance, plotting sensitivity against the average number of false positives per image.
Why can accuracy be a misleading performance metric?
Accuracy can be misleading in cases of class imbalance, where it doesn’t reflect the model’s performance on the minority class
What was the major limitation of the original R-CNN in object detection?
The original R-CNN was slow due to its reliance on selective search for generating region proposals.
How did Fast R-CNN improve over R-CNN?
Fast R-CNN improved efficiency by sharing computations across region proposals.
What innovation did Faster R-CNN introduce to object detection?
Faster R-CNN introduced a Region Proposal Network (RPN), allowing for end-to-end training and faster proposal generation.
How does setting a high threshold for predictions affect sensitivity and specificity?
A high threshold generally increases specificity but may decrease sensitivity by rejecting true positives with lower confidence.
What approach do convolutional networks use for segmentation tasks?
They utilize architectures with downsampling and upsampling layers, and sometimes strided or dilated convolutions, to produce a pixel-wise segmentation map.
How is the number of trainable parameters in a convolutional layer calculated?
The formula is (filter height × filter width × input channels + 1) × number of filters, where “+1” accounts for the bias term.
Why is the learning rate considered a critical hyperparameter in neural network training?
t influences the training dynamics, where too high a rate may cause divergence and too low a rate results in slow convergence.
What does dropout do in neural networks?
Dropout prevents overfitting by randomly dropping units and their connections during training, simulating training multiple networks in parallel.
What are the advantages of ReLU over sigmoid functions?
ReLU mitigates the vanishing gradient problem, accelerates convergence, and maintains gradient flow for positive inputs.
Why is feature normalization important before training a machine learning model?
it scales features to have zero mean and unit variance, ensuring consistent scaling and preventing information leakage from the test set.
How do residual networks (ResNets) facilitate the training of deep models?
resNets use skip connections to learn residual mappings, alleviating the vanishing gradient problem and enabling deeper architectures.
What does the receptive field refer to in convolutional networks?
It refers to the size of the input area that influences the network’s output, determined by the cumulative effect of convolution and pooling operations.
How does the softmax function work in neural networks?
It converts raw output scores into probabilities by taking the exponential of each output and normalizing these values by the sum of all exponentials.
What does a false negative (FN) indicate in detection tasks?
A FN occurs when an object is present but not detected by the classifier, which is critical in applications where missing detections can have serious consequences.
What is a dilated (atrous) convolution and its purpose?
It increases the receptive field without increasing the number of weights by inserting spaces between kernel elements, allowing for broader spatial aggregation.
How does U-Net architecture specialize for medical image segmentation?
U-Net uses a symmetric architecture with downsampling and upsampling paths, enabling precise localization and context integration without needing a pre-defined weight map.
How is specificity calculated in a model’s performance evaluation?
Specificity is calculated as the number of true negatives divided by the sum of true negatives and false positives.
What determines the output size of a valid convolution operation?
The output size is determined by (W − F + 1) × (H − F + 1), where W and H are the width and height of the image, and F is the filter size.
What makes the YOLO object detection framework unique?
YOLO processes the entire image in a single evaluation and makes predictions for each grid cell, combining bounding box predictions and class probabilities.
How does backpropagation work in training neural networks?
It computes gradients of the loss with respect to network parameters by applying the chain rule, allowing for efficient parameter updates.
What are histogram-based features in image processing?
These features, like mean, variance, skewness, and kurtosis, describe the distribution of pixel intensities and are invariant to/do not consider spatial relationship and correlation between pixels (identical histograms can belong to different textures)
how does the shift-and-stitch algorithm benefit fully convolutional networks?
It enables the generation of fine-grained outputs by reconstructing the original resolution through multiple shifted input passes.
What improvement does Fast R-CNN offer over the original R-CNN?
Fast R-CNN processes the entire image at once and uses RoI pooling to extract features, improving efficiency and speed
How does histogram matching adjust an image’s brightness and contrast?
It modifies an image so that its histogram matches that of a reference image, using cumulative distribution functions for mapping pixel values.
What are the implications of not using pooling layers in CNNs?
Omitting pooling layers requires convolutional layers with strides for downsampling, potentially leading to higher memory and computational costs.
How does backpropagation facilitate learning in CNNs?
It calculates gradients (using chain rule) for all network parameters by propagating errors backward through the network, enabling parameter updates for learning.
What are the steps in mini-batch gradient descent training?
The steps are selecting a mini-batch, performing a forward pass, computing loss, backpropagating to compute gradients, and updating weights.
How does the choice of pooling size and stride affect max-pooling operations?
Incompatible pooling sizes and strides may lead to suboptimal coverage and downsampling uniformity in the feature map.
What strategy is used in transfer learning with small datasets?
Freeze most of the pre-trained network’s parameters and fine-tune the last few layers to adapt the features to the new task.
How does dilated convolution affect the receptive field in CNNs?
dilated convolution expands the receptive field without increasing the number of weights, allowing for broader spatial information aggregation.
What is the purpose of the softmax function in neural networks?
it converts logits into probabilities, facilitating multi-class classification by ensuring output values sum to 1.
What does the term “dying ReLU” problem refer to?
it refers to the issue where neurons stop learning due to always outputting zero, which can stem from improper initialization or gradient descent issues. - this can be solved with leakyrelu
How is the number of trainable parameters in a dense layer calculated?
It is calculated as (number of inputs + 1) × number of outputs, accounting for each input connection and bias term.
What does zero padding do in convolutional layers?
Zero padding allows control over output dimensions, maintaining spatial sizes or adjusting them for specific convolutional effects.
How is the “best model” identified during neural network training?
The best model is often the one with the lowest validation loss, indicating effective learning and generalization capabilities without overfitting.