ChatGPT questions exam2 Flashcards
What does Stochastic Gradient Descent (SGD) optimize in machine learning models?
SGD is used to minimize the loss function of a model.
How does SGD differ from traditional gradient descent?
Unlike traditional gradient descent that uses the entire dataset for updates, SGD updates model parameters using only a single sample or a small batch of samples.
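A minimal NumPy sketch of the difference on a toy least-squares problem (the data and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                     # toy features
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)  # toy targets

w = np.zeros(5)
lr = 0.01

# Full-batch gradient descent: one update touches ALL 1000 samples.
grad = 2 * X.T @ (X @ w - y) / len(y)
w -= lr * grad

# SGD: one update uses a single randomly chosen sample (or a mini-batch).
i = rng.integers(len(y))
xi, yi = X[i], y[i]
w -= lr * 2 * xi * (xi @ w - yi)
```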
What are the key benefits and drawbacks of using SGD?
SGD is more computationally efficient for large datasets and can help escape local minima, but it may lead to slower and less stable convergence.
What does the FROC curve evaluate in medical image analysis?
The FROC (free-response receiver operating characteristic) curve evaluates detection and localization performance, plotting sensitivity against the average number of false positives per image.
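A toy sketch of how FROC points are computed; the detection scores, lesion count, and image count below are made up for illustration:

```python
# Toy detections: (confidence score, is_true_positive) over 4 images
# containing 5 ground-truth lesions in total.
dets = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
        (0.60, False), (0.40, False), (0.30, True)]
n_lesions, n_images = 5, 4

for thr in (0.9, 0.7, 0.5, 0.2):
    kept = [tp for score, tp in dets if score >= thr]
    sens = sum(kept) / n_lesions                      # fraction of lesions found
    fp_per_img = (len(kept) - sum(kept)) / n_images   # avg false positives/image
    print(f"thr={thr:.1f}  sensitivity={sens:.2f}  FP/image={fp_per_img:.2f}")
```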
Why can accuracy be a misleading performance metric?
Accuracy can be misleading under class imbalance: a model can score high accuracy by favoring the majority class while performing poorly on the minority class.
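A worked toy example: with 95 negatives and 5 positives, a model that always predicts "negative" still reaches 95% accuracy:

```python
y_true = [0] * 95 + [1] * 5   # 95 negatives, 5 positives
y_pred = [0] * 100            # always predict the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall_pos = sum(t == p == 1 for t, p in zip(y_true, y_pred)) / 5

print(accuracy)    # 0.95 -- looks strong
print(recall_pos)  # 0.0  -- the minority class is missed entirely
```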
What was the major limitation of the original R-CNN in object detection?
The original R-CNN was slow because it relied on selective search to generate roughly 2,000 region proposals per image and ran a full CNN forward pass on each proposal independently.
How did Fast R-CNN improve over R-CNN?
Fast R-CNN improved efficiency by computing a single convolutional feature map for the whole image and pooling per-proposal features from it (RoI pooling), sharing computation across region proposals.
What innovation did Faster R-CNN introduce to object detection?
Faster R-CNN introduced a Region Proposal Network (RPN), allowing for end-to-end training and faster proposal generation.
How does setting a high threshold for predictions affect sensitivity and specificity?
A high threshold generally increases specificity but may decrease sensitivity by rejecting true positives with lower confidence.
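A small sketch sweeping the decision threshold over illustrative predicted probabilities:

```python
import numpy as np

# Toy predicted probabilities and true labels (illustrative values).
probs  = np.array([0.95, 0.85, 0.70, 0.55, 0.40, 0.30, 0.20, 0.10])
labels = np.array([1,    1,    1,    0,    1,    0,    0,    0])

for thr in (0.3, 0.6, 0.9):
    pred = probs >= thr
    sens = (pred & (labels == 1)).sum() / (labels == 1).sum()   # true positive rate
    spec = (~pred & (labels == 0)).sum() / (labels == 0).sum()  # true negative rate
    print(f"thr={thr:.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Raising the threshold pushes specificity toward 1.0 while sensitivity drops, because true positives with lower scores fall below the cutoff.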
What approach do convolutional networks use for segmentation tasks?
They use encoder-decoder style architectures with downsampling and upsampling layers, and sometimes strided or dilated convolutions, to produce a pixel-wise segmentation map.
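A minimal PyTorch sketch of the downsample/upsample idea; channel counts, class count, and image size are illustrative, not any specific published architecture:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1),   # downsample: H/2 x W/2
    nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1),  # downsample: H/4 x W/4
    nn.ReLU(),
    nn.ConvTranspose2d(32, 16, 2, stride=2),    # upsample: H/2 x W/2
    nn.ReLU(),
    nn.ConvTranspose2d(16, 4, 2, stride=2),     # upsample: per-pixel class logits
)

x = torch.randn(1, 3, 64, 64)    # one RGB image
logits = model(x)                # (1, 4 classes, 64, 64)
seg_map = logits.argmax(dim=1)   # pixel-wise class labels
print(seg_map.shape)             # torch.Size([1, 64, 64])
```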
How is the number of trainable parameters in a convolutional layer calculated?
The formula is (filter height × filter width × input channels + 1) × number of filters, where “+1” accounts for the bias term.
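A worked example for a hypothetical layer with 3x3 filters, 64 input channels, and 128 filters:

```python
# (filter height * filter width * input channels + 1) * number of filters
fh, fw, c_in, n_filters = 3, 3, 64, 128
params = (fh * fw * c_in + 1) * n_filters  # "+1" is the per-filter bias
print(params)  # (3*3*64 + 1) * 128 = 73,856
```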
Why is the learning rate considered a critical hyperparameter in neural network training?
It influences the training dynamics: too high a rate may cause divergence, while too low a rate results in slow convergence.
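A toy illustration on f(w) = w**2, whose gradient is 2w; the learning rates are chosen to show slow convergence, fast convergence, and divergence:

```python
for lr in (0.01, 0.4, 1.1):
    w = 5.0
    for _ in range(20):
        w -= lr * 2 * w          # gradient descent step on f(w) = w**2
    print(f"lr={lr}: w after 20 steps = {w:.4g}")
    # lr=0.01 -> w ~ 3.3   (slow convergence)
    # lr=0.4  -> w ~ 0     (converges quickly)
    # lr=1.1  -> w ~ 192   (oscillates and diverges)
```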
What does dropout do in neural networks?
Dropout prevents overfitting by randomly dropping units and their connections during training, simulating training multiple networks in parallel.
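A minimal NumPy sketch of inverted dropout, one common formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, p_drop=0.5, training=True):
    """Inverted dropout sketch: zero each unit with probability p_drop
    and rescale survivors so expected activations match at test time."""
    if not training:
        return a                                  # identity at inference
    mask = rng.random(a.shape) >= p_drop
    return a * mask / (1.0 - p_drop)

h = np.ones(8)
print(dropout(h))                  # ~half the units zeroed, survivors scaled to 2.0
print(dropout(h, training=False))  # unchanged at test time
```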
What are the advantages of ReLU over sigmoid functions?
ReLU mitigates the vanishing gradient problem, accelerates convergence, and maintains gradient flow for positive inputs.
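A quick numeric comparison of the two gradients:

```python
import numpy as np

x = np.array([-6.0, -2.0, 0.5, 2.0, 6.0])

sig = 1 / (1 + np.exp(-x))
sig_grad = sig * (1 - sig)          # peaks at 0.25, shrinks toward 0 as |x| grows
relu_grad = (x > 0).astype(float)   # exactly 1 for every positive input

print(sig_grad)   # ~0.0025 at x = -6 or 6: the gradient has nearly vanished
print(relu_grad)  # [0. 0. 1. 1. 1.]: full gradient flow wherever x > 0
```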
Why is feature normalization important before training a machine learning model?
It scales features to zero mean and unit variance so they are on a comparable scale; computing the statistics on the training set only prevents information leakage from the test set.
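A minimal sketch; note the mean and standard deviation come from the training split only:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(loc=10.0, scale=3.0, size=(100, 2))  # toy split
X_test  = rng.normal(loc=10.0, scale=3.0, size=(20, 2))

# Fit statistics on the training set ONLY, then reuse them on the test
# set -- computing them on the combined data would leak test information.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train_norm = (X_train - mu) / sigma
X_test_norm  = (X_test - mu) / sigma

print(X_train_norm.mean(axis=0).round(2))  # ~[0, 0]
print(X_train_norm.std(axis=0).round(2))   # ~[1, 1]
```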
How do residual networks (ResNets) facilitate the training of deep models?
ResNets use skip connections to learn residual mappings, alleviating the vanishing gradient problem and enabling much deeper architectures.
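A minimal PyTorch sketch of a residual block (channel counts are illustrative):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block sketch: output = F(x) + x, so the conv layers only
    need to learn the residual F(x) = H(x) - x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)  # skip connection keeps gradients flowing

x = torch.randn(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)  # torch.Size([1, 16, 32, 32])
```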