Mixed Precision Training AI Prompt
when training must fit in less GPU memory or run faster
This prompt adds mixed precision training to a PyTorch workflow using AMP, including bf16 or fp16 selection, GradScaler usage, and simple benchmark comparisons. It focuses on achieving speed and memory gains without destabilizing training.
Implement mixed precision training to reduce memory usage and increase training speed.

1. Automatic Mixed Precision (AMP) with torch.cuda.amp:
- torch.autocast context manager for the forward pass: dtype=torch.float16 (Volta/Turing) or torch.bfloat16 (Ampere+)
- GradScaler for loss scaling to prevent fp16 underflow
- Correct placement: autocast wraps only the forward pass, not the optimizer step

2. bf16 vs fp16 choice:
- fp16: faster on Volta/Turing (V100, T4), but requires loss scaling and is more numerically unstable
- bf16: preferred on Ampere+ (A100, H100, 4090); no loss scaling needed, same dynamic range as fp32
- Recommendation: use bf16 if the GPU supports it, fp16 otherwise

3. Operations to keep in fp32:
- Batch normalization running statistics
- Loss computation (especially with log operations)
- Softmax outputs used as probabilities
- Note: under autocast, torch.nn.functional.cross_entropy runs in fp32 by default

4. GradScaler best practices:
- Initial scale: 2^16 (the default)
- scaler.step() replaces optimizer.step() and skips the update if gradients contain Inf/NaN
- scaler.update() adjusts the scale dynamically
- Monitor scaler.get_scale(): if it drops continuously, the model has instability issues

5. Expected gains:
- Memory reduction: roughly 40-50% for fp16
- Speed improvement: 1.5-3x on Tensor Core GPUs
- Verification: run one epoch with and without AMP and compare loss curves

Return: a complete AMP training loop with GradScaler, bf16/fp16 selection logic, and before/after benchmark code.
When to use this prompt
when you want a correct AMP implementation with bf16 or fp16 logic
when GradScaler behavior and safe fp32 exceptions matter
when you need before-and-after speed and memory comparisons
What the AI should return
A complete AMP-enabled training loop with precision selection logic, GradScaler integration, and benchmark code comparing AMP against full precision.
How to use this prompt
Open your data context
Load your dataset, notebook, or working environment so the AI can operate on the actual project context.
Copy the prompt text
Use the copy button above and paste the prompt into the AI assistant or prompt input area.
Review the output critically
Check whether the result matches your data, assumptions, and desired format before moving on.
Chain into the next prompt
Once you have the first result, continue deeper with related prompts in Optimization.
Frequently asked questions
What does the Mixed Precision Training prompt do?
It gives you a structured optimization starting point for ML engineer work and helps you move faster without starting from a blank page.
Who is this prompt for?
It is designed for ML engineer workflows and marked as beginner, so it works well as a guided starting point for that level of experience.
What type of prompt is this?
Mixed Precision Training is a single prompt. You can copy it as-is, adapt it, or use it as one step inside a larger workflow.
Can I use this outside MLJAR Studio?
Yes. The prompt text works in other AI tools too, but MLJAR Studio is the best fit when you want local execution, visible Python code, and reusable notebooks.
What should I open next?
Natural next steps from here are DataLoader Optimization, Flash Attention Integration, and Full Optimization Chain.