How to Calculate LLM Inference Carbon Intensity

Large language models (LLMs) now underpin search co-pilots, productivity assistants, and customer-support workflows. Each inference may look inexpensive, yet the GPU clusters behind those experiences draw substantial electricity, and the carbon intensity of that electricity depends on geography, procurement strategy, and operational discipline. Quantifying grams of CO2 equivalent per 1,000 generated tokens lets sustainability teams benchmark deployments, compare hosting regions, and justify investments in efficiency or renewable supply contracts.

This walkthrough formalises the calculation so engineering, finance, and ESG stakeholders can rely on a single methodology. We translate metered power data and workload telemetry into emissions intensity, highlight instrumentation requirements, validate the results against physics and compliance expectations, and close with an embedded calculator that matches the standalone LLM Inference Carbon Intensity Calculator. Pair it with the AI Inference Cost Calculator when you need to present both financial and sustainability metrics in the same executive review.

Definition and reporting scope

LLM inference carbon intensity expresses the greenhouse-gas emissions attributable to serving generated tokens. We model it as kilograms of CO2 equivalent (kg CO2e) emitted per 1,000 tokens produced over a defined observation window. The numerator captures operational emissions from electricity used by GPUs, CPUs, networking, and supporting infrastructure; the denominator aggregates tokens observed at the service edge or logging layer. The metric sits between pure efficiency (tokens per joule) and total footprint (kg CO2e per month), making it ideal for cross-regional comparisons and sustainability disclosures aligned with the Greenhouse Gas Protocol's Scope 2 guidance.

Decide whether you are reporting market-based emissions (reflecting renewable energy certificates and power purchase agreements) or location-based emissions (reflecting the grid mix without procurement adjustments). The same workflow applies in both cases; only the carbon intensity factor changes. Document the reporting boundary explicitly—most teams align it with the inference microservice plus any shared vector database or routing fabric required to deliver tokens to users.

Variables, symbols, and units

The calculation relies on metered or observed quantities that must share consistent units. Use SI units internally even if you publish imperial summaries.

  • P – Average electrical power draw of the inference cluster during the observation window, measured in kilowatts (kW). Include GPUs, CPUs, networking, and storage nodes that serve the workload.
  • PUE – Power usage effectiveness (dimensionless). Multiply IT load by PUE to account for facility overhead such as cooling and power delivery. If you cannot meter PUE directly, use the data center's monthly average.
  • Igrid – Grid carbon intensity in kilograms of CO2e per kilowatt-hour (kg/kWh). Choose location-based or market-based factors depending on your disclosure framework.
  • R – Average requests served per hour (requests/h) over the observation window.
  • T – Average total tokens per request (tokens/request), including prompt, completion, and any streamed tool responses.
  • S – Optional renewable supply share expressed as a fraction of total electricity matched with renewable energy certificates or power purchase agreements (0 ≤ S < 1).

Some teams maintain a richer telemetry pipeline that distinguishes prompt and completion tokens. As long as you sum them consistently, the formula remains valid. When traffic is bursty, compute R and T as volume-weighted averages over the same interval used to measure P.

Primary formula

First convert average power into energy and emissions per hour. Adjust the grid factor for any renewable supply share, then normalise by tokens generated:

Effective carbon factor: Ieff = Igrid × (1 − S)

Energy per hour: Ehour = P × PUE × 1 h (kWh)

Emissions per hour: Chour = Ehour × Ieff (kg CO2e/h)

Tokens per hour: Qhour = R × T (tokens/h)

LLM inference carbon intensity (kg CO2e per 1,000 tokens):

CI1k = (Chour ÷ Qhour) × 1000

If your instrumentation produces energy in watt-hours instead of kilowatt-hours, divide by 1,000 before multiplying by the carbon factor. To present the result in grams CO2e per 1,000 tokens, multiply CI1k by 1,000. The embedded calculator reports both units for clarity.
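The chain of formulas above can be sketched as a single helper function. This is an illustrative implementation of the methodology described here, not the embedded calculator's actual code; the function and parameter names are our own.

```python
def carbon_intensity_per_1k_tokens(
    power_kw: float,            # P: average IT power draw of the cluster (kW)
    pue: float,                 # power usage effectiveness (dimensionless)
    grid_kg_per_kwh: float,     # Igrid: carbon intensity (kg CO2e/kWh)
    requests_per_hour: float,   # R: requests served per hour
    tokens_per_request: float,  # T: average total tokens per request
    renewable_share: float = 0.0,  # S: fraction matched by RECs/PPAs (0 <= S < 1)
) -> float:
    """Return CI1k: kg CO2e per 1,000 generated tokens."""
    effective_factor = grid_kg_per_kwh * (1.0 - renewable_share)    # Ieff
    energy_kwh_per_hour = power_kw * pue                            # Ehour
    emissions_kg_per_hour = energy_kwh_per_hour * effective_factor  # Chour
    tokens_per_hour = requests_per_hour * tokens_per_request        # Qhour
    return emissions_kg_per_hour / tokens_per_hour * 1000.0         # CI1k
```

Multiply the return value by 1,000 if you prefer to report grams CO2e per 1,000 tokens.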

Step-by-step workflow

Step 1: Capture power and PUE data

Pull IT power from rack-level PDUs, GPU chassis telemetry, or cloud-provider usage exports. Align timestamps with the inference workload; fifteen-minute intervals work well. Obtain PUE from your facility operator. If you operate in the cloud, request published PUE figures for the relevant availability zone or substitute 1.2 for modern hyperscale facilities when finer data are unavailable.

Step 2: Extract workload throughput

Compute requests per hour and tokens per request from application logs or observability dashboards. Standardise tokenisation—mismatched encodings can skew averages by 5–10%. When caching is active, count only tokens actually generated by the model; cached responses should be excluded because they consume negligible energy.
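For bursty traffic, the volume-weighted averages mentioned earlier can be derived directly from per-interval counts. A minimal sketch, assuming telemetry arrives as (duration in hours, request count, generated-token count) tuples aligned with the power-metering window:

```python
def volume_weighted_averages(intervals):
    """intervals: iterable of (duration_hours, requests, generated_tokens)
    tuples covering the same window used to measure P.
    Returns (R, T): requests/h and tokens/request, volume-weighted."""
    total_hours = sum(d for d, _, _ in intervals)
    total_requests = sum(r for _, r, _ in intervals)
    total_tokens = sum(t for _, _, t in intervals)
    r_avg = total_requests / total_hours    # R: requests per hour
    t_avg = total_tokens / total_requests   # T: tokens per request
    return r_avg, t_avg
```

Weighting by totals rather than averaging per-interval rates keeps quiet intervals from skewing T downward.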

Step 3: Determine carbon intensity factors

Market-based emissions require sourcing supplier-specific factors or renewable contract attestations. Location-based reporting uses grid averages such as EPA eGRID, European ENTSO-E data, or national inventories. Convert any gCO2e/kWh values to kg CO2e/kWh by dividing by 1,000. Document the data vintage and whether factors represent marginal or average emissions.
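Since grid datasets usually publish factors in gCO2e/kWh, a trivial conversion helper (name is our own) avoids a common thousand-fold error:

```python
def to_kg_per_kwh(grams_per_kwh: float) -> float:
    """Convert a published gCO2e/kWh factor to the kg CO2e/kWh the formula expects."""
    return grams_per_kwh / 1000.0

# e.g. a 400 gCO2e/kWh grid average becomes 0.4 kg CO2e/kWh
```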

Step 4: Run the calculation

Multiply P by PUE to obtain total facility power attributable to the workload. Multiply by the adjusted carbon factor to get kg CO2e per hour. Divide by tokens per hour and scale by 1,000 to express the result per 1,000 tokens. Maintain a computation log—spreadsheet, notebook, or emissions platform—so auditors can trace every assumption.
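A hypothetical worked example makes the arithmetic concrete. All figures below are illustrative, not measurements from any real deployment:

```python
# Illustrative inputs -- not measurements from any real deployment.
P = 50.0        # kW average IT power draw
PUE = 1.2       # facility overhead multiplier
I_GRID = 0.5    # kg CO2e/kWh, location-based
S = 0.0         # no renewable matching
R = 3000.0      # requests per hour
T = 1000.0      # tokens per request

i_eff = I_GRID * (1 - S)         # 0.5 kg CO2e/kWh
e_hour = P * PUE                 # 60 kWh per hour
c_hour = e_hour * i_eff          # 30 kg CO2e per hour
q_hour = R * T                   # 3,000,000 tokens per hour
ci_1k = c_hour / q_hour * 1000   # 0.01 kg CO2e per 1,000 tokens
print(f"{ci_1k:.4f} kg CO2e per 1k tokens ({ci_1k * 1000:.1f} g)")
```

The result, 10 g CO2e per 1,000 tokens, falls inside the benchmark range discussed in the validation section below.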

Step 5: Report and contextualise

Pair CI1k with supporting metrics: monthly carbon totals, renewable coverage, and intensity deltas against previous quarters. If you already track training emissions via the GPU training time and cost methodology, include both training and inference intensities to present a full lifecycle story.

Validation and quality control

Benchmark the output against engineering expectations. Contemporary 7B–13B parameter models hosted on A100-class hardware typically yield 10–40 grams CO2e per 1,000 tokens depending on region and utilisation. If your result falls outside this range, revisit each input: unusually low values may indicate that cached responses were counted as generated tokens, while high values might stem from including offline batch jobs in the power total.

Perform sensitivity checks by varying each driver ±10% to understand which levers matter most. Power draw and grid intensity usually dominate. Validate renewable shares against procurement ledgers to ensure offsets are not double-counted. Finally, reconcile the implied hourly energy consumption with the facility's energy management system or cloud billing exports; discrepancies greater than 5% warrant investigation.
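The ±10% sensitivity check is easy to automate. A sketch using the same illustrative baseline numbers as above (nothing here reflects a real deployment):

```python
def ci_1k(P, PUE, Igrid, R, T, S=0.0):
    """kg CO2e per 1,000 tokens, per the primary formula."""
    return P * PUE * Igrid * (1 - S) / (R * T) * 1000

base = dict(P=50.0, PUE=1.2, Igrid=0.5, R=3000.0, T=1000.0)
baseline = ci_1k(**base)

# Perturb each driver by +/-10% and report the resulting swing in CI1k.
for name in base:
    low = ci_1k(**{**base, name: base[name] * 0.9})
    high = ci_1k(**{**base, name: base[name] * 1.1})
    print(f"{name} -10%: {(low - baseline) / baseline:+.1%}, "
          f"+10%: {(high - baseline) / baseline:+.1%}")
```

Note that throughput drivers (R, T) move CI1k inversely: a 10% drop in tokens per hour raises intensity by about 11%.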

Limits and interpretation

CI1k assumes a steady workload during the measurement window. Inference fleets that scale dynamically may see carbon intensity drift as idle capacity increases. Capture multiple intervals—peak, shoulder, and overnight—to characterise the full envelope. The metric also excludes embodied emissions from hardware manufacturing; keep those in a separate capital inventory if stakeholders request cradle-to-grave accounting.

When comparing providers or regions, normalise for model quality. Higher-quality models might need fewer tokens to satisfy a task, indirectly lowering total emissions. Combine CI1k with business KPIs (e.g., resolved tickets, search sessions) to express carbon per unit of customer value, and refresh the analysis whenever you roll out prompt optimisations or hardware upgrades.

Embed: LLM inference carbon intensity calculator

Use the embedded tool to execute the workflow above with consistent rounding, optional PUE defaults, and renewable adjustments. It mirrors the standalone calculator and exports results in kilograms and grams per 1,000 tokens.

LLM Inference Carbon Intensity Calculator

Estimate carbon emissions intensity for LLM inference by combining workload throughput with facility efficiency and grid factors.

  • Tokens per request (T) – Sum of prompt and completion tokens served per request.
  • Requests per hour (R) – Average sustained request volume across the hour.
  • Average power draw (P, kW) – Combined IT load for GPUs, CPUs, and networking delivering the workload.
  • Grid carbon intensity (Igrid, kg CO2e/kWh) – Blended marginal emissions factor for the facility location.
  • PUE – Defaults to 1.2 when left blank to reflect modern data center efficiency.
  • Renewable supply share (S) – Defaults to 0%. Values above 1 are treated as percentages of delivered energy.

Informational model; validate with audited energy and emissions inventories before public reporting.