Lesson 8 Flashcards

1
Q

What is the purpose of scaling deep learning from experiment to production?

A

To transition models from research prototypes to efficient production systems capable of handling real-world workloads.

2
Q

What is data parallelism?

A

A method where the same model is replicated on multiple devices and each batch of data is split across them, with the slices processed in parallel and the gradients aggregated afterwards.
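
A minimal sketch of the idea on a single machine, using PyTorch's nn.DataParallel (the model and batch are made up for illustration):

```python
import torch
import torch.nn as nn

# Toy model; any nn.Module works the same way.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

if torch.cuda.device_count() > 1:
    # Replicate the model on every visible GPU; each replica processes
    # a slice of the batch, and gradients are summed on the default GPU.
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

x = torch.randn(64, 512, device=device)
out = model(x)  # the 64-sample batch is split across the replicas
```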

3
Q

What is model parallelism?

A

A method where different parts of the model are distributed across multiple devices, used when the model is too large to fit on one device.
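
A toy sketch, assuming two GPUs and made-up layer sizes: each half of the network lives on its own device, and activations are moved between them in forward():

```python
import torch
import torch.nn as nn

class TwoDeviceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Linear(1024, 4096).to("cuda:0")  # first half on GPU 0
        self.part2 = nn.Linear(4096, 10).to("cuda:1")    # second half on GPU 1

    def forward(self, x):
        x = torch.relu(self.part1(x.to("cuda:0")))
        return self.part2(x.to("cuda:1"))  # activations hop to GPU 1

model = TwoDeviceNet()             # parameters never all sit on one device
out = model(torch.randn(8, 1024))
```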

4
Q

What are distributed data parallel techniques?

A

Techniques like PyTorch’s Distributed Data Parallel (DDP) that allow scaling training across multiple GPUs and machines with near-linear efficiency.
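
A minimal DDP training step, assuming a launch via `torchrun --nproc_per_node=N train.py` (which sets the RANK/LOCAL_RANK/WORLD_SIZE environment variables); the model and data are placeholders:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(nn.Linear(512, 10).to(local_rank), device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    x = torch.randn(32, 512, device=local_rank)    # each rank loads its own shard
    loss = model(x).sum()
    loss.backward()                                # gradients all-reduced here
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```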

5
Q

What is transfer learning for quality estimation?

A

Applying pretrained models to predict the quality of machine-translated text, producing a score for how likely each translation is to be correct.
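
One common way to set this up (the checkpoint name and regression head below are illustrative, not necessarily the lesson's exact recipe): encode the source/translation pair with a pretrained multilingual encoder and train a small head on quality labels:

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

name = "bert-base-multilingual-cased"              # hypothetical checkpoint choice
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)          # transferred pretrained encoder
head = nn.Linear(encoder.config.hidden_size, 1)    # trained on QE labels

src = "Der Hund schläft."
hyp = "The dog is sleeping."                       # machine translation to score
batch = tokenizer(src, hyp, return_tensors="pt", truncation=True)
cls = encoder(**batch).last_hidden_state[:, 0]     # [CLS] embedding of the pair
quality = torch.sigmoid(head(cls))                 # estimated P(translation is correct)
```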

6
Q

What are hateful memes, and why are they challenging to classify?

A

Memes that combine text and images to convey hate speech; they are hard to classify because the hateful meaning often emerges only from the text-image combination, so models must reason jointly across both modalities.
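
A naive late-fusion baseline makes the challenge concrete: concatenating independently computed image and text features, as in the sketch below, misses text-image interplay, which is why jointly pretrained multimodal transformers do better. The architecture and dimensions are illustrative:

```python
import torch
import torch.nn as nn
import torchvision.models as tvm

class LateFusionMemeClassifier(nn.Module):
    def __init__(self, text_dim=768):
        super().__init__()
        self.image_encoder = tvm.resnet18(weights=None)  # image branch
        self.image_encoder.fc = nn.Identity()            # keep the 512-d features
        self.classifier = nn.Linear(512 + text_dim, 2)   # hateful / not hateful

    def forward(self, image, text_features):
        img = self.image_encoder(image)                  # (B, 512)
        fused = torch.cat([img, text_features], dim=-1)  # naive fusion
        return self.classifier(fused)

model = LateFusionMemeClassifier()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 768))
```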

7
Q

What is self-supervised learning in MRI acceleration?

A

A way to train reconstruction models directly on subsampled scans, without fully sampled ground truth, accelerating MRI acquisition while maintaining diagnostic quality.
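
A toy sketch of the self-supervision trick (in the spirit of methods like SSDU, not necessarily the lesson's exact method): split the acquired k-space into two disjoint subsets, reconstruct from one, and compute the loss only on the held-out measurements, so no fully sampled scan is needed:

```python
import torch

kspace = torch.randn(1, 320, 320, dtype=torch.complex64)  # acquired (subsampled) data
acquired = torch.rand(320, 320) < 0.25                    # acquisition mask
split = torch.rand(320, 320) < 0.6
train_mask = acquired & split         # fed to the network
loss_mask = acquired & ~split         # held out for the loss

def reconstruct(masked_kspace):
    # Stand-in for a reconstruction network (e.g., an unrolled model).
    return torch.fft.ifft2(masked_kspace)

image = reconstruct(kspace * train_mask)
pred_kspace = torch.fft.fft2(image)
loss = (pred_kspace[:, loss_mask] - kspace[:, loss_mask]).abs().mean()
```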

8
Q

What are transformer adapters?

A

Lightweight modules inserted into transformer layers, allowing task-specific fine-tuning without modifying the entire pretrained model.
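
A minimal bottleneck adapter in the style of Houlsby et al. (2019); the dimensions are typical but illustrative:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_dim=768, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up
        nn.init.zeros_(self.up.weight)  # start as a near-identity module
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        # Residual connection preserves the pretrained layer's behavior.
        return x + self.up(torch.relu(self.down(x)))

# One adapter is inserted after each (frozen) transformer sub-layer.
out = Adapter()(torch.randn(2, 16, 768))
```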

9
Q

What is the advantage of using transformer adapters?

A

They reduce computational costs and mitigate catastrophic forgetting by training only a small subset of parameters.
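
A sketch of what that looks like in practice; the ModuleDict below is a tiny stand-in for an adapter-augmented pretrained transformer:

```python
import torch.nn as nn

# Stand-in for a pretrained model with adapters inserted.
model = nn.ModuleDict({
    "encoder": nn.Linear(768, 768),   # pretrained weights: frozen
    "adapter": nn.Linear(768, 64),    # task-specific: trained
})
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("adapter")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"training {trainable / total:.1%} of parameters")  # only the adapter
```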

10
Q

What is Grad-CAM used for?

A

Gradient-weighted Class Activation Mapping: a visualization technique that uses the gradients of a target class to highlight the regions of the input that contribute most to a model's decision.
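
A compact Grad-CAM sketch on a torchvision ResNet (the layer choice and input are illustrative): pool the target class's gradients over the last conv block's feature maps, use them as channel weights, and keep the positive part:

```python
import torch
import torch.nn.functional as F
import torchvision.models as tvm

model = tvm.resnet18(weights=None).eval()
feats, grads = {}, {}
layer = model.layer4  # last convolutional block
layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224)
logits = model(x)
logits[0, logits.argmax()].backward()  # gradient of the predicted class

weights = grads["a"].mean(dim=(2, 3), keepdim=True)  # pooled gradients per channel
cam = F.relu((weights * feats["a"]).sum(dim=1))      # weighted sum of feature maps
cam = F.interpolate(cam[None], size=x.shape[-2:], mode="bilinear")[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heatmap in [0, 1]
```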

11
Q

What is the purpose of distributed model parallelism?

A

To enable efficient training of large models by distributing different parts of the model across GPUs on multiple machines.

12
Q

What is the main limitation of traditional BERT-based fine-tuning?

A

Fine-tuning retrains all of the model's parameters (roughly 340M for BERT-large) for each new task, making it computationally expensive and prone to catastrophic forgetting.

13
Q

What is the fastMRI dataset used for?

A

A dataset for research on accelerating MRI scans using machine learning, providing subsampled and fully sampled data for training and evaluation.
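
A hedged sketch of reading one volume (the file name below is made up): fastMRI distributes its data as HDF5 files whose main array is the raw k-space:

```python
import h5py
import torch

with h5py.File("file_brain_0001.h5", "r") as f:  # hypothetical file name
    kspace = torch.from_numpy(f["kspace"][()])   # (slices, coils, H, W), complex
    # Fully sampled files also carry a reference reconstruction for evaluation.
    print(list(f.keys()), kspace.shape)
```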

14
Q

What is the hateful memes dataset?

A

A multimodal dataset requiring reasoning across text and images to classify content, aimed at advancing research in hate speech detection.

15
Q

How does PyTorch’s JIT mode improve performance?

A

By compiling models to TorchScript and optimizing the result with loop unrolling, kernel fusion, and hardware-specific compilation, enabling faster execution than eager mode.
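
A small example with torch.jit.script; the function itself is just an illustration of the kind of element-wise chain the fuser can merge into fewer kernels:

```python
import torch

def gelu_like(x):
    # Chain of element-wise ops that the JIT fuser can combine.
    return 0.5 * x * (1.0 + torch.tanh(0.7978845608 * (x + 0.044715 * x ** 3)))

scripted = torch.jit.script(gelu_like)  # compile to TorchScript
x = torch.randn(1024, 1024)
assert torch.allclose(scripted(x), gelu_like(x))
print(scripted.graph)                   # inspect the optimized IR
```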
