CO₂ Methodology
EcoAI tracks the carbon savings produced by each cache hit. This page explains exactly how that number is calculated, what sources it draws from, and where the estimates fall short.
The formula
tokens_saved × energy_per_token × grid_carbon_intensity = kg CO₂ saved
In code:
co2SavedKg = (tokensSaved / 1000) * CO2_PER_1K_TOKENS[model]
Where CO2_PER_1K_TOKENS[model] already bakes in the grid intensity factor, so a single lookup gives you the full result.
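As a minimal sketch of the lookup form, the TypeScript below applies the formula to a known model. The two constants are copied from the per-model table on this page; the function signature and the error thrown for unlisted models are assumptions made for illustration and may not match EcoAI's actual implementation.

```typescript
// Illustrative subset of the per-model constants (kg CO2 per 1k output tokens).
const CO2_PER_1K_TOKENS: Record<string, number> = {
  "gpt-4o": 0.00003,
  "gpt-4o-mini": 0.000006,
};

// Applies the formula: (tokensSaved / 1000) * CO2_PER_1K_TOKENS[model].
// Throwing for unlisted models is an assumption of this sketch.
function co2SavedKg(tokensSaved: number, model: string): number {
  const known = Object.prototype.hasOwnProperty.call(CO2_PER_1K_TOKENS, model);
  if (!known) {
    throw new Error(`no CO2 constant for model: ${model}`);
  }
  return (tokensSaved / 1000) * CO2_PER_1K_TOKENS[model];
}

// 50,000 output tokens served from cache for gpt-4o: roughly 0.0015 kg CO2.
const exampleSavedKg = co2SavedKg(50000, "gpt-4o");
```

Because the grid intensity is already folded into the constant, the function needs no separate energy or intensity argument.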
1. Tokens saved
EcoAI counts output tokens on every cache hit — the tokens the provider would have generated if the request had reached the API.
Input tokens are excluded because:
- The prompt is read to compute the cache key regardless of hit or miss.
- Output generation (autoregressive decoding) accounts for the majority of inference FLOPs and therefore energy.
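The counting rule above could be sketched as follows. The `CachedResponse` shape, its field names, and the `TokenSavingsCounter` class are hypothetical and exist only for this example; the point is that a cache hit adds the cached response's output tokens to the running total and ignores input tokens.

```typescript
// Hypothetical shape of a cached response; field names are assumptions.
interface CachedResponse {
  model: string;
  inputTokens: number; // recorded, but never counted towards savings
  outputTokens: number; // tokens the provider would have had to generate again
}

class TokenSavingsCounter {
  // Prototype-free object so any model ID string is a safe key.
  private saved: Record<string, number> = Object.create(null);

  // Call once per cache hit. Only output tokens are added.
  recordHit(hit: CachedResponse): void {
    this.saved[hit.model] = (this.saved[hit.model] ?? 0) + hit.outputTokens;
  }

  // Running total of output tokens saved for one model (0 if never seen).
  tokensSaved(model: string): number {
    return this.saved[model] ?? 0;
  }
}

const counter = new TokenSavingsCounter();
counter.recordHit({ model: "gpt-4o", inputTokens: 1200, outputTokens: 300 });
counter.recordHit({ model: "gpt-4o", inputTokens: 80, outputTokens: 450 });
// counter.tokensSaved("gpt-4o") is now 750; the 1,280 input tokens are ignored.
```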
2. Energy per token
Published energy data for LLM inference is sparse. EcoAI's per-model constants are derived from the following sources:
| Source | What it contributes |
|---|---|
| Patterson et al. (2021) Carbon Emissions and Large Neural Networks | Training/inference energy breakdown for transformer models |
| Luccioni et al. (2023) Power Hungry Processing | Per-query energy measurements across LLM sizes |
| IEA Electricity 2024 | Global average grid intensity: 0.45 kgCO₂/kWh |
| Google Environmental Report (2023) | Per-query anchor (~0.3 Wh for a Google Search) |
General ranges used:
| Model tier | Wh per 1k output tokens |
|---|---|
| Large (GPT-4o, Claude Opus, Gemini 2.5 Pro) | 0.002–0.003 Wh |
| Mid-range (Claude Sonnet, Gemini Flash) | 0.001–0.002 Wh |
| Small (GPT-4o-mini, Claude Haiku) | 0.0003–0.0007 Wh |
Per-model constants (kg CO₂ per 1k output tokens)
| Model | kgCO₂ / 1k tokens | Basis |
|---|---|---|
| gpt-4o | 0.000030 | Large GPT-4 class, 0.003 Wh/1k × 0.45 kgCO₂/kWh × PUE 1.2 |
| gpt-4o-mini | 0.000006 | Small model, ~5× lower than GPT-4o |
| gpt-4-turbo | 0.000030 | Same class as GPT-4o |
| gpt-3.5-turbo | 0.000006 | Small |
| claude-opus-4-8 | 0.000035 | Large MoE-class |
| claude-sonnet-4-6 | 0.000020 | Mid-range |
| claude-haiku-4-5 | 0.000008 | Small |
| claude-3-5-sonnet-20241022 | 0.000020 | Mid-range |
| claude-3-opus-20240229 | 0.000035 | Large |
| gemini-2.5-pro | 0.000025 | Large |
| gemini-2.5-flash | 0.000010 | Efficient mid-range |
| gemini-1.5-pro | 0.000025 | Large |
| gemini-1.5-flash | 0.000010 | Efficient |
| default (unknown model) | 0.000020 | Conservative mid-range fallback |
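To illustrate how the `default` row might be used, the sketch below loads the table into a lookup, falls back to the default constant for unlisted model IDs, and sums savings across models. The helper names `co2FactorFor` and `totalCo2SavedKg` are hypothetical; only the constants come from the table.

```typescript
// kg CO2 per 1k output tokens, copied from the table above.
const CO2_PER_1K_TOKENS: Record<string, number> = {
  "gpt-4o": 0.00003,
  "gpt-4o-mini": 0.000006,
  "gpt-4-turbo": 0.00003,
  "gpt-3.5-turbo": 0.000006,
  "claude-opus-4-8": 0.000035,
  "claude-sonnet-4-6": 0.00002,
  "claude-haiku-4-5": 0.000008,
  "claude-3-5-sonnet-20241022": 0.00002,
  "claude-3-opus-20240229": 0.000035,
  "gemini-2.5-pro": 0.000025,
  "gemini-2.5-flash": 0.00001,
  "gemini-1.5-pro": 0.000025,
  "gemini-1.5-flash": 0.00001,
  default: 0.00002,
};

// Returns the constant for a model, or the default row for unlisted models.
function co2FactorFor(model: string): number {
  const known = Object.prototype.hasOwnProperty.call(CO2_PER_1K_TOKENS, model);
  return known ? CO2_PER_1K_TOKENS[model] : CO2_PER_1K_TOKENS["default"];
}

// Sums (tokens / 1000) * factor over a map of model ID to output tokens saved.
function totalCo2SavedKg(tokensSavedByModel: Record<string, number>): number {
  return Object.keys(tokensSavedByModel).reduce(
    (total, model) => total + (tokensSavedByModel[model] / 1000) * co2FactorFor(model),
    0,
  );
}

// 2,000 tokens on gpt-4o plus 1,000 on an unlisted model: roughly 0.00008 kg.
const totalKg = totalCo2SavedKg({ "gpt-4o": 2000, "some-new-model": 1000 });
```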
3. Grid carbon intensity
EcoAI uses 0.45 kgCO₂/kWh — the IEA 2023 global average for electricity generation.
This is intentionally not provider-specific because:
- Providers do not publish real-time per-inference carbon data.
- Providers purchase renewable energy certificates (RECs) that reduce reported emissions but not necessarily physical grid draw at the moment of inference.
- A global average makes the methodology reproducible and independently auditable.
Actual emissions may be lower for providers running on green-heavy grids (e.g. Google's stated 90%+ carbon-free energy hours), or higher in coal-heavy regions.
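Because the per-model constants scale linearly with grid intensity, a reader who knows the intensity of a specific grid could rescale an EcoAI figure themselves. The sketch below shows that arithmetic only; `rescaleForGrid` is a hypothetical helper, not something EcoAI provides, and the local intensity must come from the reader's own source.

```typescript
// Global average grid intensity used by EcoAI, in kgCO2/kWh.
const GLOBAL_GRID_INTENSITY = 0.45;

// Rescales a global-average estimate to a grid with a different intensity.
// localIntensity is in kgCO2/kWh and is supplied by the caller.
function rescaleForGrid(co2SavedKgGlobal: number, localIntensity: number): number {
  if (!(localIntensity >= 0)) {
    throw new RangeError("localIntensity must be a non-negative number");
  }
  return co2SavedKgGlobal * (localIntensity / GLOBAL_GRID_INTENSITY);
}

// A grid at twice the global average doubles the estimate;
// a zero-carbon grid reduces it to zero.
const doubled = rescaleForGrid(0.0015, 0.9);
const zeroCarbon = rescaleForGrid(0.0015, 0);
```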
4. Comparison equivalencies
The dashboard converts CO₂ savings into human-scale comparisons. Reference values:
| Comparison | kg CO₂ per unit | Source |
|---|---|---|
| Google search | 0.0002 kg | Google Environmental Report 2023 |
| Email sent | 0.004 kg | Carbon Trust "Digital Carbon Footprint" |
| Smartphone charge | 0.0022 kg | 5.5 Wh × 0.4 kgCO₂/kWh |
| LED bulb-hour | 0.002 kg | 5 W × 1 h × 0.4 kgCO₂/kWh |
| 1 km driven (avg car) | 0.120 kg | IPCC AR6 passenger vehicle lifecycle average |
Equivalencies are only shown when the computed value is ≥ 0.01 units, to avoid uninformative fractions.
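The conversion and the 0.01-unit threshold can be sketched as below. The reference values are taken from the table above; the `Equivalency` type, the labels, and the `equivalencies` function are illustrative assumptions and may differ from what the dashboard actually does.

```typescript
interface Reference {
  label: string;
  kgPerUnit: number; // kg CO2 per unit, from the reference table
}

interface Equivalency {
  label: string;
  units: number;
}

const REFERENCES: Reference[] = [
  { label: "Google searches", kgPerUnit: 0.0002 },
  { label: "emails sent", kgPerUnit: 0.004 },
  { label: "smartphone charges", kgPerUnit: 0.0022 },
  { label: "LED bulb-hours", kgPerUnit: 0.002 },
  { label: "km driven (avg car)", kgPerUnit: 0.12 },
];

// Comparisons below this many units are hidden.
const MIN_UNITS = 0.01;

// Converts a CO2 saving into every comparison that clears the threshold.
function equivalencies(co2SavedKg: number): Equivalency[] {
  return REFERENCES
    .map((ref) => ({ label: ref.label, units: co2SavedKg / ref.kgPerUnit }))
    .filter((eq) => eq.units >= MIN_UNITS);
}

// 0.001 kg saved is about 5 Google searches but under 0.01 km of driving,
// so the driving comparison is dropped and four entries remain.
const shown = equivalencies(0.001);
```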
Known limitations
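Inference-only. Training costs and hardware manufacturing are excluded: they are amortised across billions of requests and cannot be attributed to a single cached call. Datacenter cooling is not measured either; it appears only as the flat PUE multiplier noted in the per-model basis (1.2 for gpt-4o).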
Output tokens only. Input token energy (KV-cache prefill) is not counted, making all estimates conservative.
No provider-specific metering. Without per-request energy data from OpenAI, Anthropic, or Google, every number here is an estimate. EcoAI will update constants when providers publish better data.
Model versions change. Providers release more efficient model revisions over time. A model ID like gpt-4o may refer to different underlying hardware across different time periods.
Location is not factored in. A cache hit in Norway (near-zero-carbon grid) has different real-world impact than one in Poland (coal-heavy grid). EcoAI uses a global average and cannot know where inference runs.
Goal
The goal is not precision — it is to make the environmental benefit of caching visible and meaningful. Even under conservative assumptions, repeated identical API calls without caching waste real energy.
Suggesting improvements
If you have better data for a specific model, a region-specific grid intensity, or a source we haven't cited, open an issue or pull request:
Last updated: June 2026. Sources reviewed annually or when providers publish updated energy reports.