Segment/Depth Anything Flashcards
How does focal loss help in imbalanced classification?
Focal loss takes the cross-entropy loss and multiplies it by a modulating factor that drives the loss toward zero for well-classified examples. That makes training focus more on hard examples.
Why is focal loss good for segmentation
In segmentation we have a heavy class imbalance because most of the pixels are negative and only a few are positive for the specific class we want.
What is the task proposal for segmenting anything
Return a valid segmentation mask given any prompt
What can be ambiguous in a point prompt for a segmentation
A point can refer to the whole object, a part of the object, or a sub-part of that part.
In the training of the model, how did the ViT evolve?
They started with ViT-B and moved to ViT-H in the second stage.
Write the equation for focal loss & cross-entropy loss
FL = (1 - p_t)^gamma * CE
CE = -log(p_t)
p_t = p if y == 1 else 1 - p
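The equations above can be sketched directly in NumPy. This is a minimal illustration (the function name and `gamma` default are assumptions, not the SAM implementation):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss: FL = (1 - p_t)^gamma * CE, with CE = -log(p_t)."""
    p_t = np.where(y == 1, p, 1.0 - p)  # p_t = p if y == 1 else 1 - p
    ce = -np.log(p_t)                   # standard cross-entropy
    return (1.0 - p_t) ** gamma * ce    # modulating factor down-weights easy examples

# A well-classified example (p_t = 0.9) contributes far less loss than a hard one (p_t = 0.1):
easy = focal_loss(np.array([0.9]), np.array([1]))
hard = focal_loss(np.array([0.1]), np.array([1]))
```

With gamma = 0 this reduces to plain cross-entropy; larger gamma suppresses easy examples more aggressively.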
List Depth Anything's main contributions
Train on both labeled and unlabeled data to create a strong small student network.
Unlabeled training with strong perturbations: Gaussian blur, color jitter & CutMix.
A semantic-prior loss that aligns the student encoder's features with those of a frozen pretrained encoder.
Explain the idea behind semantic prior method
Add a loss between the student model's encoder features and those of a big frozen encoder, enforcing that the student produces similar feature representations.
Explain in words the equation of the semantic prior method
Take the cosine similarity between the student's and the frozen encoder's feature vector at each pixel, average over all pixels, and use one minus that average as the loss (Depth Anything).
How do they combat the problem that semantic encoders produce similar features for pixels in different parts of the same object, or in different objects, even when their depths differ?
Introduce an alpha threshold: when the cosine similarity already exceeds alpha, the pixel is not counted in the loss. That lets the depth encoder learn depth-specific features where needed; elsewhere it can keep the DINOv2 features.
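The semantic-prior loss with the alpha cutoff can be sketched as follows (a NumPy illustration with assumed names and an assumed `alpha` default, not the Depth Anything code):

```python
import numpy as np

def semantic_prior_loss(student_feat, frozen_feat, alpha=0.85):
    """student_feat, frozen_feat: (N, C) per-pixel feature vectors.
    Loss = 1 - mean cosine similarity, skipping pixels already above alpha."""
    s = student_feat / np.linalg.norm(student_feat, axis=1, keepdims=True)
    f = frozen_feat / np.linalg.norm(frozen_feat, axis=1, keepdims=True)
    cos = np.sum(s * f, axis=1)   # per-pixel cosine similarity
    keep = cos < alpha            # pixels above alpha are excluded from the loss
    if not keep.any():
        return 0.0                # every pixel already matches the frozen encoder
    return float(np.mean(1.0 - cos[keep]))
```

Pixels whose features already agree with the frozen encoder (cosine >= alpha) contribute nothing, so the gradient only pushes on pixels where depth-specific features may be needed.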
List the strong perturbations they used
ColorJitter
GaussianBlur
CutMix
Explain the steps to create the CutMix loss for teacher student (like in depth anything).
We take 2 unlabeled images
We take the teacher model and predict on both images.
We initialise a binary CutMix mask.
We use the student model to predict on the combined image (first image inside the mask, second image outside).
We take the distance between the student's prediction multiplied by the mask and the first teacher prediction multiplied by the mask.
The distance is the same one used for the depth loss.
Do the same for the region outside the mask, against the second teacher prediction.
Combine (sum) the two terms.
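The steps above can be sketched in NumPy. This is a simplified single-channel illustration; the function names and the mean-absolute-difference distance are assumptions, not the Depth Anything implementation:

```python
import numpy as np

def cutmix_loss(img_a, img_b, teacher, student, mask):
    """img_a, img_b: (H, W) unlabeled images; mask: binary (H, W), 1 keeps img_a."""
    t_a, t_b = teacher(img_a), teacher(img_b)     # teacher predicts on both images
    mixed = mask * img_a + (1 - mask) * img_b     # combine the images with the mask
    s = student(mixed)                            # student predicts on the combined image

    def dist(x, y, m):                            # masked mean absolute depth distance
        return np.abs((x - y) * m).sum() / max(m.sum(), 1)

    # inside-mask term against teacher(img_a), outside-mask term against teacher(img_b)
    return dist(s, t_a, mask) + dist(s, t_b, 1 - mask)
```

Each term only compares the student and teacher where the corresponding source image is visible, so the supervision stays consistent despite the mixing.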
How do you use unlabeled data in order to train a better small network?
Pseudo-label the unlabeled data with a strong teacher network, apply strong perturbations such as CutMix, and continue training the student network on the result.