Vision architectures Flashcards

1
Q

What is the purpose of SSD (Single Shot MultiBox Detector) in vision tasks?

A

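SSD is designed for real-time object detection. It predicts class scores and bounding-box offsets for a fixed set of default boxes on several feature maps, so localization and classification happen in a single forward pass.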

2
Q

What is R-CNN, and how does it work?

A

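R-CNN (Region-based Convolutional Neural Network) is a two-stage object detector: an external method such as selective search generates region proposals, a CNN extracts features from each proposal, and a classifier assigns each region a class.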

3
Q

What does YOLO (You Only Look Once) specialize in?

A

YOLO is a real-time object detection model that predicts bounding boxes and class probabilities directly from images in a single pass.
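A minimal sketch of how a single-pass prediction can be turned into a bounding box, assuming a YOLOv1-style parameterization: the image is divided into a grid, and each cell predicts a box center relative to the cell plus a width and height relative to the whole image. The function name and argument layout are illustrative, and later YOLO versions use anchor boxes and different transforms.

```python
def decode_cell_prediction(row, col, pred, grid_size, image_w, image_h):
    """Convert one grid-cell prediction into pixel corner coordinates.

    pred is (x_off, y_off, w_rel, h_rel, conf), where x_off and y_off are
    offsets in [0, 1] inside the cell and w_rel, h_rel are fractions of
    the full image size (an assumed YOLOv1-style layout).
    """
    x_off, y_off, w_rel, h_rel, conf = pred

    # Box center in pixels: cell index plus offset, scaled to the image.
    cx = (col + x_off) / grid_size * image_w
    cy = (row + y_off) / grid_size * image_h

    # Box size in pixels.
    w = w_rel * image_w
    h = h_rel * image_h

    # Return (x_min, y_min, x_max, y_max, confidence).
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, conf)
```

Because every cell's output is decoded with the same arithmetic, all boxes come from one forward pass of the network rather than from a separate proposal stage.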

4
Q

What are Siamese Networks used for in vision tasks?

A

Siamese Networks are used for tasks like image similarity and verification by comparing embeddings of two input images.
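The comparison step can be sketched as follows. This is a toy illustration, not a trained model: `embed` is a hypothetical stand-in for the shared network (here a single linear layer over flattened pixels), and the `threshold` value is an arbitrary assumption. The point it shows is that both inputs pass through the same weights before their embeddings are compared.

```python
import math


def embed(image, weights):
    """Toy shared encoder: one linear layer over a flattened image."""
    return [sum(w * p for w, p in zip(row, image)) for row in weights]


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat an all-zero embedding as dissimilar.
    return dot / (norm_a * norm_b)


def is_match(image_a, image_b, weights, threshold=0.8):
    """Verification: same weights for both inputs, then compare embeddings."""
    emb_a = embed(image_a, weights)
    emb_b = embed(image_b, weights)
    return cosine_similarity(emb_a, emb_b) >= threshold
```

In practice the encoder is a CNN or transformer trained with a contrastive or triplet loss, and the threshold is tuned on validation pairs.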

5
Q

What is EfficientNet, and why is it popular?

A

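EfficientNet is a family of models that scales network depth, width, and input resolution together using a single compound coefficient. It is popular because it reaches high accuracy with fewer parameters and less computation than comparably accurate CNNs.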
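The joint scaling of depth, width, and resolution can be sketched as below. The base constants 1.2, 1.1, and 1.15 are the values reported for the EfficientNet-B0 baseline; the function names are illustrative, and released model variants do not follow this formula exactly, so treat the output as an approximation.

```python
# Base multipliers for depth, width, and resolution (EfficientNet-B0 values).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_multiplier(phi):
    """Approximate growth in FLOPs relative to the baseline.

    FLOPs scale roughly linearly with depth and quadratically with
    width and resolution.
    """
    depth, width, resolution = compound_scale(phi)
    return depth * width ** 2 * resolution ** 2
```

With these constants, each increment of `phi` roughly doubles the compute budget, since 1.2 × 1.1² × 1.15² is about 1.92.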

6
Q

What is Xception, and how does it improve upon traditional CNNs?

A

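Xception replaces standard convolutions with depthwise separable convolutions, which split spatial filtering from channel mixing. This lowers the parameter and computation cost of each layer while maintaining high performance.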
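The saving from depthwise separable convolutions can be checked with a simple parameter count. This sketch ignores biases and normalization layers, and the function names are illustrative.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = kernel * kernel * c_in  # One spatial filter per input channel.
    pointwise = c_in * c_out            # 1x1 convolution mixes channels.
    return depthwise + pointwise
```

For a 3×3 layer mapping 128 channels to 256, the standard convolution needs 294,912 weights and the separable version needs 33,920, roughly 8.7 times fewer.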

7
Q

What is MobileNet designed for?

A

MobileNet is optimized for mobile and embedded vision applications, using depthwise separable convolutions for efficiency.

8
Q

What is ViT (Vision Transformer), and what makes it unique?

A

ViT splits an image into fixed-size patches, embeds each patch as a token, and processes the token sequence with a standard transformer encoder, achieving high performance without convolutional layers.
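The patch step can be sketched in plain Python. This assumes the image is a nested list indexed as `image[row][col]`, with each pixel a list of channel values; the function name is illustrative, and the learned linear projection and position embeddings that follow in a real ViT are omitted.

```python
def patchify(image, patch):
    """Split an image into non-overlapping patch x patch tiles.

    Returns a list of flattened patches in row-major order, each of
    length patch * patch * channels. These are the raw "tokens" before
    any learned projection is applied.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")

    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = []
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    flat.extend(image[r][c])  # Append all channel values.
            patches.append(flat)
    return patches
```

A 224×224 RGB image with 16×16 patches yields 14 × 14 = 196 tokens, each with 16 × 16 × 3 = 768 raw values.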

9
Q

What is SegFormer used for in vision tasks?

A

SegFormer is a transformer-based model for semantic segmentation that pairs a hierarchical transformer encoder with a lightweight MLP decoder.

10
Q

What is DeepLabv3, and what tasks is it used for?

A

DeepLabv3 is designed for semantic segmentation. It uses atrous (dilated) convolutions at several rates to capture multi-scale contextual information without reducing feature-map resolution.
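Atrous convolution can be illustrated in one dimension. This is a minimal sketch with no padding, and it computes cross-correlation (no kernel flip), as deep learning frameworks typically do; the function names are illustrative.

```python
def atrous_conv1d(signal, kernel, rate):
    """1D atrous convolution: kernel taps are spaced `rate` samples apart."""
    span = (len(kernel) - 1) * rate + 1  # Input samples covered by one output.
    output = []
    for start in range(len(signal) - span + 1):
        total = sum(
            kernel[i] * signal[start + i * rate] for i in range(len(kernel))
        )
        output.append(total)
    return output


def effective_kernel_size(kernel_size, rate):
    """Receptive field of a single atrous kernel along one dimension."""
    return kernel_size + (kernel_size - 1) * (rate - 1)
```

A 3-tap kernel with rate 6 covers 13 input samples while still using only 3 weights, which is how a larger context is seen without extra parameters or downsampling.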

11
Q

What is the Swin Transformer, and what is its innovation?

A

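The Swin Transformer builds hierarchical feature maps and computes self-attention within local windows that shift between consecutive layers. This keeps attention cost linear in image size, which lets the model scale to high-resolution inputs and dense prediction tasks.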
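A rough sketch of why windowed attention is cheaper and what shifting the windows achieves. It counts query-key pairs rather than running attention, models the shift as a cyclic roll of the token grid, and omits the attention mask that a real implementation applies to tokens that wrap around the border. All function names are illustrative.

```python
def global_attention_pairs(height, width):
    """Query-key pairs when every token attends to every other token."""
    tokens = height * width
    return tokens * tokens


def window_attention_pairs(height, width, window):
    """Query-key pairs when attention is limited to window x window tiles."""
    if height % window or width % window:
        raise ValueError("grid size must be divisible by the window size")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window * tokens_per_window


def window_id(row, col, height, width, window, shift=0):
    """Index of the window a token falls in after a cyclic shift."""
    r = (row - shift) % height
    c = (col - shift) % width
    return (r // window) * (width // window) + (c // window)
```

On a 56×56 token grid with 7×7 windows, global attention involves 9,834,496 pairs and windowed attention 153,664, a 64-fold reduction. With `shift` set to half the window size, tokens that sat in different windows in one layer can share a window in the next, which is how information crosses window boundaries.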
