lesson_8_flashcards
What is the purpose of scaling deep learning from experiment to production?
To transition models from research prototypes to efficient production systems capable of handling real-world workloads.
What is data parallelism?
A method where the same model is replicated on multiple devices, and data is split into batches that are processed in parallel.
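The idea above can be sketched in a few lines of framework-free Python (a conceptual illustration, not PyTorch's actual API): each simulated "device" holds a copy of the same weight, computes a gradient on its shard of the batch, and the gradients are averaged so every replica applies the identical update.

```python
# Conceptual sketch of data parallelism: replicate the model, shard the
# batch, compute per-shard gradients, average them ("all-reduce"), and
# apply the same update on every replica.

def grad_mse(w, xs, ys):
    """Gradient of mean squared error for y ~ w * x on one shard."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_step(w, batch_x, batch_y, num_devices, lr=0.01):
    shard = len(batch_x) // num_devices
    grads = []
    for d in range(num_devices):  # each replica works on its own shard
        xs = batch_x[d * shard:(d + 1) * shard]
        ys = batch_y[d * shard:(d + 1) * shard]
        grads.append(grad_mse(w, xs, ys))
    g = sum(grads) / num_devices   # gradient averaging across replicas
    return w - lr * g              # identical update everywhere

# Toy data from y = 3x; one step moves w = 1.0 toward 3.
xs, ys = [1.0, 2.0, 3.0, 4.0], [3.0, 6.0, 9.0, 12.0]
w_new = data_parallel_step(1.0, xs, ys, num_devices=2)  # -> 1.3
```

Note that the averaged gradient equals the full-batch gradient, which is why data parallelism preserves the single-device training dynamics (for equal shard sizes).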
What is model parallelism?
A method where different parts of the model are distributed across multiple devices, used when the model is too large to fit on one device.
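A minimal framework-free sketch of the same idea (hypothetical layers, not a real multi-GPU setup): the model's layers are partitioned across simulated devices, and the activation flows through the partitions in sequence.

```python
# Conceptual sketch of model parallelism: layers are split across
# "devices"; in a real system each partition lives on a different GPU
# and activations are copied across device boundaries.

def make_layer(scale, shift):
    return lambda x: scale * x + shift

# A 4-layer "model" partitioned across two simulated devices.
device0 = [make_layer(2.0, 1.0), make_layer(1.0, -0.5)]  # first half
device1 = [make_layer(0.5, 0.0), make_layer(3.0, 2.0)]   # second half

def forward(x, partitions):
    for part in partitions:      # one partition per device
        for layer in part:       # layers local to that device
            x = layer(x)
        # a real implementation would transfer x to the next device here
    return x

out = forward(1.0, [device0, device1])  # -> 5.75
```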
What are distributed data parallel techniques?
Techniques like PyTorch’s Distributed Data Parallel (DDP) that allow scaling training across multiple GPUs and machines with near-linear efficiency.
What is transfer learning for quality estimation?
Applying pretrained models to predict the quality of machine-translated text without reference translations, estimating how likely each translation is to be correct.
What are hateful memes, and why are they challenging to classify?
Multimodal hate speech examples combining text and images, requiring advanced models to understand context across modalities.
What is self-supervised learning in MRI acceleration?
A method to train reconstruction models on undersampled scan data without fully sampled reference scans, accelerating MRI acquisition while maintaining diagnostic quality.
What are transformer adapters?
Lightweight modules inserted into transformer layers, allowing task-specific fine-tuning without modifying the entire pretrained model.
What is the advantage of using transformer adapters?
They reduce computational costs and mitigate catastrophic forgetting by training only a small subset of parameters.
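A toy sketch of the adapter idea in plain Python (scalar "activations" and made-up parameters, purely illustrative): the pretrained layer is frozen, and a small residual bottleneck module with its own few parameters is the only thing trained.

```python
# Minimal adapter sketch: frozen pretrained layer + small trainable
# residual bottleneck (down-project, nonlinearity, up-project).

FROZEN_W, FROZEN_B = 2.0, 1.0        # pretrained parameters, never updated

def pretrained_layer(x):
    return FROZEN_W * x + FROZEN_B

def adapter(x, params):
    down, up = params                # the only trainable parameters
    h = max(0.0, down * x)           # ReLU after down-projection
    return x + up * h                # residual connection

def forward(x, adapter_params):
    return adapter(pretrained_layer(x), adapter_params)

# With up = 0 the adapter is the identity, so the model initially
# behaves exactly like the frozen pretrained network.
same = forward(1.0, (0.5, 0.0)) == pretrained_layer(1.0)  # True
```

Because gradients only flow into the adapter's parameters, per-task storage and compute are tiny, and the frozen base weights cannot be overwritten, which is what mitigates catastrophic forgetting.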
What is Grad-CAM used for?
A visualization technique to highlight important regions in input data that contribute to a model’s decision.
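The core Grad-CAM computation can be sketched without any framework (nested lists stand in for tensors): each channel's weight is the spatially averaged gradient of the target score with respect to that feature map, and the heatmap is the ReLU of the weighted sum of feature maps.

```python
# Grad-CAM sketch: alpha_c = global-average-pooled gradient of channel c;
# heatmap = ReLU(sum_c alpha_c * feature_map_c).

def grad_cam(feature_maps, gradients):
    """feature_maps, gradients: [channel][height][width] nested lists."""
    heatmap = None
    for fmap, grad in zip(feature_maps, gradients):
        cells = [g for row in grad for g in row]
        alpha = sum(cells) / len(cells)      # channel importance weight
        if heatmap is None:
            heatmap = [[0.0] * len(fmap[0]) for _ in fmap]
        for i, row in enumerate(fmap):
            for j, v in enumerate(row):
                heatmap[i][j] += alpha * v
    # ReLU keeps only regions with a positive influence on the class score.
    return [[max(0.0, v) for v in row] for row in heatmap]

# Two 2x2 feature maps: channel 0 helps the class, channel 1 hurts it.
fmaps = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
cam = grad_cam(fmaps, grads)  # -> [[1.0, 0.0], [0.0, 1.0]]
```

In practice the heatmap is then upsampled to the input resolution and overlaid on the image to show which regions drove the prediction.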
What is the purpose of distributed model parallelism?
To enable efficient training of large models by distributing different parts of the model across GPUs on multiple machines.
What is the main limitation of traditional BERT-based fine-tuning?
Fine-tuning updates all of the model's parameters (roughly 340M for BERT-large) separately for each task, making it computationally expensive and prone to catastrophic forgetting.
What is the fastMRI dataset used for?
A dataset for research on accelerating MRI scans using machine learning, providing subsampled and fully sampled data for training and evaluation.
What is the hateful memes dataset?
A multimodal dataset requiring reasoning across text and images to classify content, aimed at advancing research in hate speech detection.
How does PyTorch’s JIT mode improve performance?
By optimizing code with features like loop unrolling, kernel fusion, and hardware-specific compilation, enabling faster execution.