ONNX Export and Validation AI Prompt
This prompt exports a PyTorch model to ONNX, validates graph correctness, compares outputs against PyTorch, and benchmarks ONNX Runtime performance. It is useful when preparing a model for portable or faster inference beyond eager PyTorch.
Export this PyTorch model to ONNX format and validate correctness and performance.

1. Export to ONNX:
   - Use torch.onnx.export with opset_version=17 (latest stable)
   - Define input_names, output_names, and dynamic_axes for variable batch size and sequence length
   - Set do_constant_folding=True for graph optimization
   - Use dynamo=True (torch.onnx.dynamo_export) for newer models with control flow
2. ONNX graph validation:
   - onnx.checker.check_model(model) for structural validity
   - onnxsim (onnx-simplifier): simplify the graph and remove redundant nodes
   - Visualize with Netron to inspect the computation graph
3. Numerical correctness check:
   - Run inference with identical inputs through PyTorch and ONNX Runtime
   - Assert all outputs match to within rtol=1e-3, atol=1e-5
   - Test with multiple batch sizes and sequence lengths if dynamic axes are used
4. ONNX Runtime inference:
   - Create InferenceSession with providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
   - Optimize with ort.SessionOptions: graph_optimization_level=ORT_ENABLE_ALL
   - Enable io_binding for zero-copy GPU inference
5. Performance benchmark:
   - Compare p50/p95/p99 latency: PyTorch vs ONNX Runtime
   - Compare throughput at batch sizes 1, 8, 32
   - Typical improvement: 1.5–4× speedup on CPU, 1.2–2× on GPU
6. Common export issues and fixes:
   - Control flow (if/else in forward): use torch.jit.script first
   - Custom ops: register a custom ONNX op or rewrite using supported ops
   - Dynamic shapes: test with min, typical, and max shapes

Return: export script, validation code, numerical correctness tests, and benchmark results.
When to use this prompt
when exporting a PyTorch model for deployment with ONNX Runtime
when correctness between PyTorch and ONNX must be verified numerically
when dynamic batch or sequence dimensions are needed
when you want benchmark evidence before switching runtimes
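When benchmark evidence is the goal, a sketch like the following can compare p50/p95/p99 latency for any callable (PyTorch forward pass or ONNX Runtime session). The function name, warmup count, and iteration count are illustrative choices, not fixed requirements:

```python
# Sketch: measure latency percentiles (in ms) for a callable.
# Pass the same input through both runtimes and compare the results.
import time
import numpy as np

def latency_percentiles(fn, x, warmup=10, iters=100):
    for _ in range(warmup):      # warm up caches / lazy initialization
        fn(x)
    times_ms = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(x)
        times_ms.append((time.perf_counter() - t0) * 1e3)
    return np.percentile(times_ms, [50, 95, 99])  # p50, p95, p99
```

Running it once with the PyTorch model and once with the ONNX Runtime session, at batch sizes 1, 8, and 32, produces the comparison table the prompt asks for.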
What the AI should return
ONNX export code, validation and numerical correctness tests, and benchmark comparisons between PyTorch and ONNX Runtime.
How to use this prompt
Open your data context
Load your dataset, notebook, or working environment so the AI can operate on the actual project context.
Copy the prompt text
Use the copy button above and paste the prompt into the AI assistant or prompt input area.
Review the output critically
Check whether the result matches your data, assumptions, and desired format before moving on.
Chain into the next prompt
Once you have the first result, continue deeper with related prompts in Model Compression.
Frequently asked questions
What does the ONNX Export and Validation prompt do?
It gives you a structured model compression starting point for ML engineer work and helps you move faster without starting from a blank page.
Who is this prompt for?
It is designed for ML engineer workflows and is marked as intermediate, so it works well as a guided starting point for that level of experience.
What type of prompt is this?
ONNX Export and Validation is a single prompt. You can copy it as-is, adapt it, or use it as one step inside a larger workflow.
Can I use this outside MLJAR Studio?
Yes. The prompt text works in other AI tools too, but MLJAR Studio is the best fit when you want local execution, visible Python code, and reusable notebooks.
What should I open next?
Natural next steps from here are Compression Pipeline Chain, Knowledge Distillation, and Post-Training Quantization.