Lesson 8 Flashcards

1
Q

What is the purpose of scaling deep learning from experiment to production?

A

To transition models from research prototypes to efficient production systems capable of handling real-world workloads.

2
Q

What is data parallelism?

A

A method where the same model is replicated on multiple devices and each batch of data is split across them, with the slices processed in parallel and the gradients aggregated afterwards.
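
A minimal sketch of the idea on a single machine, using PyTorch's nn.DataParallel (the model and batch are made up for illustration):

```python
import torch
import torch.nn as nn

# Toy model; any nn.Module works the same way.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

if torch.cuda.device_count() > 1:
    # Replicate the model on every visible GPU; each replica processes
    # a slice of the batch, and gradients are summed on the default GPU.
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

x = torch.randn(64, 512, device=device)
out = model(x)  # the 64-sample batch is split across the replicas
```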

3
Q

What is model parallelism?

A

A method where different parts of the model are distributed across multiple devices, used when the model is too large to fit on one device.
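
A toy sketch, assuming two GPUs and made-up layer sizes: each half of the network lives on its own device, and activations are moved between them in forward():

```python
import torch
import torch.nn as nn

class TwoDeviceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Linear(1024, 4096).to("cuda:0")  # first half on GPU 0
        self.part2 = nn.Linear(4096, 10).to("cuda:1")    # second half on GPU 1

    def forward(self, x):
        x = torch.relu(self.part1(x.to("cuda:0")))
        return self.part2(x.to("cuda:1"))  # activations hop to GPU 1

model = TwoDeviceNet()             # parameters never all sit on one device
out = model(torch.randn(8, 1024))
```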

4
Q

What are distributed data parallel techniques?

A

Techniques like PyTorch’s Distributed Data Parallel (DDP) that allow scaling training across multiple GPUs and machines with near-linear efficiency.
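
A minimal DDP training step, assuming a launch via `torchrun --nproc_per_node=N train.py` (which sets the RANK/LOCAL_RANK/WORLD_SIZE environment variables); the model and data are placeholders:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(nn.Linear(512, 10).to(local_rank), device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    x = torch.randn(32, 512, device=local_rank)    # each rank loads its own shard
    loss = model(x).sum()
    loss.backward()                                # gradients all-reduced here
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```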

5
Q

What is transfer learning for quality estimation?

A

Applying pretrained models to predict the quality of machine-translated text, producing a score for how likely each translation is to be correct.
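
One common way to set this up (the checkpoint name and regression head below are illustrative, not necessarily the lesson's exact recipe): encode the source/translation pair with a pretrained multilingual encoder and train a small head on quality labels:

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

name = "bert-base-multilingual-cased"              # hypothetical checkpoint choice
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)          # transferred pretrained encoder
head = nn.Linear(encoder.config.hidden_size, 1)    # trained on QE labels

src = "Der Hund schläft."
hyp = "The dog is sleeping."                       # machine translation to score
batch = tokenizer(src, hyp, return_tensors="pt", truncation=True)
cls = encoder(**batch).last_hidden_state[:, 0]     # [CLS] embedding of the pair
quality = torch.sigmoid(head(cls))                 # estimated P(translation is correct)
```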

6
Q

What are hateful memes, and why are they challenging to classify?

A

Memes that combine text and images to convey hate speech; they are hard to classify because the hateful meaning often emerges only from the text-image combination, so models must reason jointly across both modalities.
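
A naive late-fusion baseline makes the challenge concrete: concatenating independently computed image and text features, as in the sketch below, misses text-image interplay, which is why jointly pretrained multimodal transformers do better. The architecture and dimensions are illustrative:

```python
import torch
import torch.nn as nn
import torchvision.models as tvm

class LateFusionMemeClassifier(nn.Module):
    def __init__(self, text_dim=768):
        super().__init__()
        self.image_encoder = tvm.resnet18(weights=None)  # image branch
        self.image_encoder.fc = nn.Identity()            # keep the 512-d features
        self.classifier = nn.Linear(512 + text_dim, 2)   # hateful / not hateful

    def forward(self, image, text_features):
        img = self.image_encoder(image)                  # (B, 512)
        fused = torch.cat([img, text_features], dim=-1)  # naive fusion
        return self.classifier(fused)

model = LateFusionMemeClassifier()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 768))
```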

7
Q

What is self-supervised learning in MRI acceleration?

A

A way to train reconstruction models directly on subsampled scans, without fully sampled ground truth, accelerating MRI acquisition while maintaining diagnostic quality.
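
A toy sketch of the self-supervision trick (in the spirit of methods like SSDU, not necessarily the lesson's exact method): split the acquired k-space into two disjoint subsets, reconstruct from one, and compute the loss only on the held-out measurements, so no fully sampled scan is needed:

```python
import torch

kspace = torch.randn(1, 320, 320, dtype=torch.complex64)  # acquired (subsampled) data
acquired = torch.rand(320, 320) < 0.25                    # acquisition mask
split = torch.rand(320, 320) < 0.6
train_mask = acquired & split         # fed to the network
loss_mask = acquired & ~split         # held out for the loss

def reconstruct(masked_kspace):
    # Stand-in for a reconstruction network (e.g., an unrolled model).
    return torch.fft.ifft2(masked_kspace)

image = reconstruct(kspace * train_mask)
pred_kspace = torch.fft.fft2(image)
loss = (pred_kspace[:, loss_mask] - kspace[:, loss_mask]).abs().mean()
```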

8
Q

What are transformer adapters?

A

Lightweight modules inserted into transformer layers, allowing task-specific fine-tuning without modifying the entire pretrained model.
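
A minimal bottleneck adapter in the style of Houlsby et al. (2019); the dimensions are typical but illustrative:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_dim=768, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up
        nn.init.zeros_(self.up.weight)  # start as a near-identity module
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        # Residual connection preserves the pretrained layer's behavior.
        return x + self.up(torch.relu(self.down(x)))

# One adapter is inserted after each (frozen) transformer sub-layer.
out = Adapter()(torch.randn(2, 16, 768))
```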

9
Q

What is the advantage of using transformer adapters?

A

They reduce computational costs and mitigate catastrophic forgetting by training only a small subset of parameters.
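
A sketch of what that looks like in practice; the ModuleDict below is a tiny stand-in for an adapter-augmented pretrained transformer:

```python
import torch.nn as nn

# Stand-in for a pretrained model with adapters inserted.
model = nn.ModuleDict({
    "encoder": nn.Linear(768, 768),   # pretrained weights: frozen
    "adapter": nn.Linear(768, 64),    # task-specific: trained
})
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("adapter")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"training {trainable / total:.1%} of parameters")  # only the adapter
```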

10
Q

What is Grad-CAM used for?

A

Gradient-weighted Class Activation Mapping: a visualization technique that uses the gradients of a target class to highlight the regions of the input that contribute most to a model's decision.
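
A compact Grad-CAM sketch on a torchvision ResNet (the layer choice and input are illustrative): pool the target class's gradients over the last conv block's feature maps, use them as channel weights, and keep the positive part:

```python
import torch
import torch.nn.functional as F
import torchvision.models as tvm

model = tvm.resnet18(weights=None).eval()
feats, grads = {}, {}
layer = model.layer4  # last convolutional block
layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224)
logits = model(x)
logits[0, logits.argmax()].backward()  # gradient of the predicted class

weights = grads["a"].mean(dim=(2, 3), keepdim=True)  # pooled gradients per channel
cam = F.relu((weights * feats["a"]).sum(dim=1))      # weighted sum of feature maps
cam = F.interpolate(cam[None], size=x.shape[-2:], mode="bilinear")[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heatmap in [0, 1]
```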

11
Q

What is the purpose of distributed model parallelism?

A

To enable efficient training of large models by distributing different parts of the model across GPUs on multiple machines.

12
Q

What is the main limitation of traditional BERT-based fine-tuning?

A

Fine-tuning retrains all of the model's parameters (roughly 340M for BERT-large) for each new task, making it computationally expensive and prone to catastrophic forgetting.

13
Q

What is the fastMRI dataset used for?

A

A dataset for research on accelerating MRI scans using machine learning, providing subsampled and fully sampled data for training and evaluation.
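
A hedged sketch of reading one volume (the file name below is made up): fastMRI distributes its data as HDF5 files whose main array is the raw k-space:

```python
import h5py
import torch

with h5py.File("file_brain_0001.h5", "r") as f:  # hypothetical file name
    kspace = torch.from_numpy(f["kspace"][()])   # (slices, coils, H, W), complex
    # Fully sampled files also carry a reference reconstruction for evaluation.
    print(list(f.keys()), kspace.shape)
```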

14
Q

What is the hateful memes dataset?

A

A multimodal dataset requiring reasoning across text and images to classify content, aimed at advancing research in hate speech detection.

15
Q

How does PyTorch’s JIT mode improve performance?

A

By compiling models to TorchScript and optimizing the result with loop unrolling, kernel fusion, and hardware-specific compilation, enabling faster execution than eager mode.
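
A small example with torch.jit.script; the function itself is just an illustration of the kind of element-wise chain the fuser can merge into fewer kernels:

```python
import torch

def gelu_like(x):
    # Chain of element-wise ops that the JIT fuser can combine.
    return 0.5 * x * (1.0 + torch.tanh(0.7978845608 * (x + 0.044715 * x ** 3)))

scripted = torch.jit.script(gelu_like)  # compile to TorchScript
x = torch.randn(1024, 1024)
assert torch.allclose(scripted(x), gelu_like(x))
print(scripted.graph)                   # inspect the optimized IR
```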
