Segment/Depth Anything Flashcards
How does focal loss help in imbalanced classification?
Focal loss takes the cross-entropy loss and multiplies it by a modulating factor that drives the loss toward zero for well-classified examples. That makes training focus more on hard examples.
Why is focal loss good for segmentation
In segmentation we have a heavy class imbalance because most of the pixels are negative and only a few are positive for the specific class we want.
What is the task proposal for segmenting anything
Return a valid segmentation mask given any prompt
What can be ambiguous in a point prompt for a segmentation
A point can refer to the whole object, a part of the object, or a sub-part of that part.
In the training of the model, how did the ViT evolve?
They started with ViT-B and moved to ViT-H in the second stage.
Write the equation for focal loss & cross-entropy loss
FL = (1 - p_t)^gamma * CE
CE = -log(p_t)
p_t = p if y == 1 else 1 - p
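The equations above can be sketched directly in NumPy. This is a minimal illustration (the function name and `gamma` default are assumptions, not the SAM implementation):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss: FL = (1 - p_t)^gamma * CE, with CE = -log(p_t)."""
    p_t = np.where(y == 1, p, 1.0 - p)  # p_t = p if y == 1 else 1 - p
    ce = -np.log(p_t)                   # standard cross-entropy
    return (1.0 - p_t) ** gamma * ce    # modulating factor down-weights easy examples

# A well-classified example (p_t = 0.9) contributes far less loss than a hard one (p_t = 0.1):
easy = focal_loss(np.array([0.9]), np.array([1]))
hard = focal_loss(np.array([0.1]), np.array([1]))
```

With gamma = 0 this reduces to plain cross-entropy; larger gamma suppresses easy examples more aggressively.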
List Depth Anything's main contributions
Train on both labeled and unlabeled data to create a strong small student network.
Unlabeled training with strong perturbations: Gaussian blur, color jitter & CutMix.
A semantic-prior loss that aligns the student encoder's features with those of a frozen pretrained encoder.
Explain the idea behind semantic prior method
Add a loss between the student model's encoder features and those of a big frozen encoder, enforcing that the student produces similar feature representations.
Explain in words the equation of the semantic prior method
Take the cosine similarity between the student's and the frozen encoder's feature vector at each pixel, average over all pixels, and use one minus that average as the loss (Depth Anything).
How do they combat the problem that semantic encoders produce similar features for pixels in different parts of the same object, or in different objects, even when their depths differ?
Introduce an alpha threshold: when the cosine similarity already exceeds alpha, the pixel is not counted in the loss. That lets the depth encoder learn depth-specific features where needed; elsewhere it can keep the DINOv2 features.
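The semantic-prior loss with the alpha cutoff can be sketched as follows (a NumPy illustration with assumed names and an assumed `alpha` default, not the Depth Anything code):

```python
import numpy as np

def semantic_prior_loss(student_feat, frozen_feat, alpha=0.85):
    """student_feat, frozen_feat: (N, C) per-pixel feature vectors.
    Loss = 1 - mean cosine similarity, skipping pixels already above alpha."""
    s = student_feat / np.linalg.norm(student_feat, axis=1, keepdims=True)
    f = frozen_feat / np.linalg.norm(frozen_feat, axis=1, keepdims=True)
    cos = np.sum(s * f, axis=1)   # per-pixel cosine similarity
    keep = cos < alpha            # pixels above alpha are excluded from the loss
    if not keep.any():
        return 0.0                # every pixel already matches the frozen encoder
    return float(np.mean(1.0 - cos[keep]))
```

Pixels whose features already agree with the frozen encoder (cosine >= alpha) contribute nothing, so the gradient only pushes on pixels where depth-specific features may be needed.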
List the strong perturbations they used
ColorJitter
GaussianBlur
CutMix
Explain the steps to create the CutMix loss for teacher student (like in depth anything).
We take 2 unlabeled images
We take the teacher model and predict on both images.
We initialise a binary CutMix mask.
We use the student model to predict on the combined image (first image inside the mask, second image outside).
We take the distance between the student's prediction multiplied by the mask and the first teacher prediction multiplied by the mask.
The distance is the same one used for the depth loss.
Do the same for the region outside the mask, against the second teacher prediction.
Combine (sum) the two terms.
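The steps above can be sketched in NumPy. This is a simplified single-channel illustration; the function names and the mean-absolute-difference distance are assumptions, not the Depth Anything implementation:

```python
import numpy as np

def cutmix_loss(img_a, img_b, teacher, student, mask):
    """img_a, img_b: (H, W) unlabeled images; mask: binary (H, W), 1 keeps img_a."""
    t_a, t_b = teacher(img_a), teacher(img_b)     # teacher predicts on both images
    mixed = mask * img_a + (1 - mask) * img_b     # combine the images with the mask
    s = student(mixed)                            # student predicts on the combined image

    def dist(x, y, m):                            # masked mean absolute depth distance
        return np.abs((x - y) * m).sum() / max(m.sum(), 1)

    # inside-mask term against teacher(img_a), outside-mask term against teacher(img_b)
    return dist(s, t_a, mask) + dist(s, t_b, 1 - mask)
```

Each term only compares the student and teacher where the corresponding source image is visible, so the supervision stays consistent despite the mixing.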
How do you use unlabeled data in order to train a better small network?
Pseudo-label the unlabeled data with a strong teacher network, apply strong perturbations such as CutMix, and continue training the student network on the result.