LLM Training Run Budget Calculator

Quickly scope the cash required to spin up an LLM training or fine-tuning run. Combine GPU runtime, hourly pricing, human-in-the-loop costs, checkpoint storage, and expected egress so you can secure budget approvals before booking capacity.

Inputs

  • GPU hours: the sum of all accelerator hours across the full training or fine-tuning job (GPU count × runtime).
  • GPU hourly rate: on-demand, reserved, or amortised cost per GPU hour in USD.
  • Data preparation cost: labeling, cleaning, evaluation, and prompt-engineering labour allocated to this run.
  • Storage footprint: leave blank to assume 420 GB of checkpoints and logs; override with your pipeline’s storage footprint.
  • Storage retention: leave blank to assume 14 days of retention; extend for compliance or multi-run comparisons.
  • Storage rate: leave blank to assume $0.26 per GB-month on premium SSD tiers; adjust for object storage or cold tiers.
  • Egress volume: leave blank if checkpoints stay in-region; otherwise enter total outbound GB across collaborators or regions.
  • Egress rate: leave blank to assume $0.09 per GB for public cloud egress; enter your provider’s negotiated rate if lower.
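
The total is a sum of four terms. The Python sketch below is a minimal illustration of that arithmetic, not the calculator’s actual code; the function name and parameter names are assumptions, though the defaults mirror the blank-field values above and storage is billed pro rata per GB-month on a 30-day month, which reproduces the worked examples below.

    def run_budget(gpu_hours, gpu_rate, data_prep=0.0,
                   storage_gb=420.0, retention_days=14.0, storage_rate=0.26,
                   egress_gb=0.0, egress_rate=0.09):
        # Accelerator time: total GPU hours times the hourly rate in USD.
        compute = gpu_hours * gpu_rate
        # Checkpoints and logs, billed pro rata per GB-month (30-day month).
        storage = storage_gb * storage_rate * (retention_days / 30)
        # Outbound transfer across collaborators or regions.
        egress = egress_gb * egress_rate
        return compute + data_prep + storage + egress

For instance, run_budget(290, 12, data_prep=650, egress_gb=80) returns ≈ $4,188.16, matching the first example below.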

Planning aid only; confirm pricing with your cloud provider or hardware vendor before committing spend.

Examples

  • 13B fine-tune using 290 GPU hours at $12/hr, $650 data prep, 420 GB stored for 14 days, 80 GB egress ⇒ $4,188.16 total run budget
  • 7B base training with 512 H100 hours at $21.50/hr, $500 data prep, 600 GB stored for 21 days, no egress ⇒ $11,617.20 required
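
Under the same pro-rata storage model, the first example breaks down as:

    compute     290 GPU hours × $12.00/hr         = $3,480.00
    data prep                                     =   $650.00
    storage     420 GB × $0.26/GB-mo × (14/30)    =    $50.96
    egress      80 GB × $0.09/GB                  =     $7.20
    total                                         = $4,188.16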

FAQ

How do I account for spot interruptions?

Increase GPU hours by your expected interruption overhead or add a contingency percentage to the final result.
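For example, with an assumed 20% preemption overhead, a job planned at 290 GPU hours should be entered as 290 × 1.20 = 348 hours.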

Can I include inference evaluation costs?

Yes. Add the extra GPU hours and data prep spent on eval suites into the same inputs so the budget covers the whole experiment.

What if I pay in another currency?

Convert your provider’s hourly rate to USD (or your reporting currency) before entering it, then reconvert the result as needed.
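For example, at an illustrative exchange rate of 1.08 USD per EUR, a €20.00 hourly rate enters as 20.00 × 1.08 = $21.60.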

Additional Information

  • GPU hours already account for parallelism: doubling the accelerator count doubles the hours consumed.
  • The storage footprint should cover checkpoints, logs, and TensorBoard data; adjust retention after each sprint.
  • Add contingency (e.g., +15%) outside the calculator if you want headroom for reruns or spot preemptions.
  • Blend multiple GPU price tiers by averaging on-demand and reserved rates weighted by expected usage; see the worked example below.
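
With illustrative numbers: 70% of hours reserved at $18.00/hr plus 30% on-demand at $25.00/hr blends to 0.70 × $18.00 + 0.30 × $25.00 = $20.10/hr, and a 15% contingency on the first example above gives $4,188.16 × 1.15 ≈ $4,816.38.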