ByteThirst™ QueryWeight™ Calculation Methodology
How we estimate the water, energy, and CO₂ cost of every AI interaction
What Is a QueryWeight?
A QueryWeight is ByteThirst’s term for the estimated environmental cost of a single AI interaction. It combines three metrics: estimated water consumption (mL), estimated energy use (Wh), and estimated carbon emissions (g CO₂). All figures are estimates based on publicly available research — not direct measurements of actual resource consumption.
Summary
ByteThirst is a Chrome browser extension that estimates the water consumption, energy use, and carbon emissions of your AI interactions across 14 platforms: ChatGPT, Claude, Gemini, Copilot, Perplexity, Poe, You.com, Mistral, HuggingChat, Figma AI, Lovable.dev, Bolt.new, NotebookLM, and Google AI Studio. All values are estimates, not precise measurements. Every estimate is presented as a range (low / mid / high) to communicate the significant uncertainty inherent in these calculations. We anchor our model to the best available public measurements and apply scaling factors for query complexity and model size.
Calculation Pipeline
Step 1: Token Estimation
ByteThirst does not have direct access to the internal tokenizers used by each AI platform. Instead, we estimate token counts by dividing the character count of your input and the model's output by a platform-specific characters-per-token ratio, calibrated against each platform's publicly available tokenizer tools and documentation. Ratios range from approximately 3.8 to 4.2 depending on the tokenizer architecture (BPE vs. SentencePiece). Platforms that route to multiple underlying models (Perplexity, Poe, HuggingChat) use a default ratio that is adjusted when the specific model can be detected.
These ratios are calibrated for English text. Other languages—particularly CJK languages, Arabic, and Hindi—may have significantly different characters-per-token ratios. We plan to add language-specific adjustments in a future update.
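As a concrete illustration, the character-to-token heuristic can be sketched as follows. The ratios shown are illustrative placeholders rather than ByteThirst's production constants, and `estimateTokens` is a hypothetical helper name.

```typescript
// Sketch of character-based token estimation (ratios are illustrative, not
// ByteThirst's actual calibrated constants).
const CHARS_PER_TOKEN: Record<string, number> = {
  chatgpt: 4.0, // BPE-style tokenizer (assumed value)
  claude: 3.8,  // assumed value
  gemini: 4.2,  // SentencePiece (assumed value)
};
const DEFAULT_RATIO = 4.0; // fallback for multi-model aggregators

function estimateTokens(text: string, platform: string): number {
  const ratio = CHARS_PER_TOKEN[platform] ?? DEFAULT_RATIO;
  return Math.ceil(text.length / ratio);
}
```

For example, a 400-character English prompt on a 4.0-ratio platform estimates to 100 tokens. Real tokenizers will diverge from this heuristic, especially on non-English text and code.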
Multi-Model and Aggregator Platforms
Several supported platforms route queries to different underlying models depending on user settings or query type:
- Copilot (formerly Bing Chat) uses Microsoft-hosted variants of OpenAI's GPT-4 family. The extension activates on both copilot.microsoft.com and bing.com/chat. We apply the same token ratio and energy baseline as ChatGPT.
- Perplexity uses multiple LLMs for different query types. When the specific model is detectable from the page, we apply the corresponding model tier multiplier. Otherwise, we use the standard tier (1.0×) as a conservative default.
- Poe allows users to choose from multiple model providers (GPT-4, Claude, Gemini, and others). The extension attempts to detect the active model from page elements and apply the appropriate tier multiplier and token ratio. If detection fails, the standard tier and default ratio are used.
- NotebookLM is Google’s AI-powered research and note-taking tool, also powered by Gemini models. We apply the same SentencePiece tokenizer ratio and Gemini model tier multipliers.
Model detection on these platforms is heuristic and depends on DOM elements that may change without notice. See Limitation #2 below for details on detection uncertainty.
AI Code Builder Platforms
AI code builders are a newer modality of AI interaction that ByteThirst estimates. Unlike text-based chat, code generation sessions use a split-panel architecture: a chat prompt drives large volumes of structured code output. A single coding session can produce 20,000–200,000 output characters, driving dramatically higher resource consumption than text-based interactions.
According to Couch (2026), a typical AI coding agent session consumes approximately 41 Wh of energy, roughly 130× more than a standard ChatGPT text query. ByteThirst calibrates its code generation estimates against this benchmark. Because these sessions produce far more output tokens than typical conversational queries, applying the standard text-chatbot output weighting would overestimate their energy; we therefore apply a reduced output token weight chosen so that a typical session lands near the 41 Wh figure.
Currently estimated code builder platforms:
- Lovable.dev — Full-stack web application generation from natural language prompts. Default tier: large.
- Bolt.new — Uses Claude 3.5/3.7 Sonnet via StackBlitz WebContainers. Client-side compute (WebContainers runtime) is not included in ByteThirst estimates as it runs locally in the user’s browser, not on remote inference servers.
- Google AI Studio — Developer platform for building with Gemini models. Classified as code-gen because interactions tend to involve long system prompts, tool definitions, and multi-turn agentic sessions, making its output profile closer to code-gen agents than standard chat.
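The calibration described above can be sketched as backing out a per-output-token energy rate from the session benchmark. Only the 41 Wh figure comes from the cited source; the session-size midpoint and chars-per-token ratio below are assumptions for illustration, and the helper names are hypothetical.

```typescript
// Derive a reduced code-gen energy weight from the Couch (2026) ~41 Wh
// session benchmark. Session size and chars/token are illustrative assumptions.
const SESSION_BENCHMARK_WH = 41;              // typical agentic coding session
const TYPICAL_SESSION_OUTPUT_CHARS = 110_000; // assumed midpoint of 20K–200K
const CHARS_PER_TOKEN = 4.0;                  // assumed average ratio

const typicalOutputTokens = TYPICAL_SESSION_OUTPUT_CHARS / CHARS_PER_TOKEN;
const whPerCodeOutputToken = SESSION_BENCHMARK_WH / typicalOutputTokens;

function estimateCodeSessionWh(outputChars: number): number {
  // Energy scales linearly with output size at the calibrated per-token rate.
  return (outputChars / CHARS_PER_TOKEN) * whPerCodeOutputToken;
}
```

By construction, a session emitting the assumed typical 110,000 characters estimates to ~41 Wh, and smaller sessions scale down linearly.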
Cache Token Handling
Some AI platforms employ prompt caching, where previously seen input tokens are served from cache rather than reprocessed by the full model. Cache hits consume far less compute than full inference passes, so ByteThirst applies a reduced multiplier to cache-read tokens when they are detectable.
Extended Thinking Tokens
Models with extended thinking capabilities (such as Claude’s extended thinking mode and OpenAI’s o-series reasoning models) generate internal reasoning tokens that consume inference compute but are not always visible to the user. ByteThirst weights detected thinking tokens consistently with standard output tokens, as they consume equivalent inference compute. When a platform uses both extended thinking and a reasoning model tier multiplier, only the higher factor is applied to avoid double-counting.
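A minimal sketch of the token-weight selection described in the two sections above. The numeric weights are assumptions for illustration, not ByteThirst's production values.

```typescript
// Illustrative per-token weights (assumed values, not production constants).
const TOKEN_WEIGHT = {
  input: 1.0,
  cacheRead: 0.1, // assumed reduced multiplier for cache hits
  output: 4.0,    // thinking tokens are weighted like output tokens
};

// When a query involves both extended thinking and a reasoning-tier
// multiplier, only the higher factor is applied, avoiding double-counting.
function reasoningFactor(thinkingFactor: number, tierMultiplier: number): number {
  return Math.max(thinkingFactor, tierMultiplier);
}
```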
Step 2: Energy Estimation
Energy consumption per query is the most studied and most variable component of our pipeline. We surveyed every major public source available as of early 2026 to anchor our baseline estimate.
| Source | Model | Energy per Query | Date | Notes |
|---|---|---|---|---|
| Google (official) | Gemini median text prompt | 0.24 Wh | Aug 2025 | Most transparent industry disclosure. Includes idle capacity and cooling overhead. |
| OpenAI (Altman) | ChatGPT average query | 0.34 Wh | Aug 2025 | Self-reported, less methodological detail. |
| Epoch AI (independent) | GPT-4o, 500 output tokens | 0.30 Wh | Feb 2025 | Based on H100 GPU compute analysis. Short query baseline. |
| Epoch AI (independent) | GPT-4o, ~7,500 input words | 2.5 Wh | Feb 2025 | Long context query. Demonstrates 8x range based on input length. |
| Jegham et al. (arXiv) | GPT-4o short query | 0.42 Wh ± 0.13 | May 2025 | Academic benchmark with uncertainty bounds. |
Our baseline
We use 0.30 Wh as the baseline energy cost for a standard query of approximately 100 input tokens + 500 output tokens = 600 total tokens. This is anchored to the Epoch AI independent estimate for GPT-4o, which falls in the middle of the industry self-reports (Google's 0.24 Wh and OpenAI's 0.34 Wh).
Scaling by query size
Output tokens require significantly more compute than input tokens because each output token requires a full forward pass through the model, while input tokens are processed in parallel during the prefill stage. We apply a research-based weighting factor to normalize compute cost across varying query lengths, derived from published inference cost analyses comparing prefill and decode compute costs.
Energy is then scaled linearly relative to a standard query's effective token count. A query with twice the effective tokens uses approximately twice the energy.
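The scaling above can be sketched as follows. The output-token weight of 4 is an assumed stand-in for the research-derived prefill/decode weighting; the baseline matches the 0.30 Wh figure from the table above.

```typescript
// Linear energy scaling over weighted ("effective") tokens.
// OUTPUT_WEIGHT is an assumed stand-in for the published weighting factor.
const BASELINE_WH = 0.30;  // standard query: ~100 input + 500 output tokens
const OUTPUT_WEIGHT = 4;   // assumed decode-vs-prefill weight
const STANDARD_EFFECTIVE_TOKENS = 100 + 500 * OUTPUT_WEIGHT; // 2,100

function estimateEnergyWh(inputTokens: number, outputTokens: number): number {
  const effective = inputTokens + outputTokens * OUTPUT_WEIGHT;
  return BASELINE_WH * (effective / STANDARD_EFFECTIVE_TOKENS);
}
```

Doubling both input and output doubles the estimate: `estimateEnergyWh(200, 1000)` yields 0.60 Wh under these assumptions.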
Model tier multipliers
Different models within each platform vary dramatically in compute requirements. We classify models into six tiers:
| Tier | Example Models | Relative Cost | Rationale |
|---|---|---|---|
| Small | GPT-4o-mini, Claude Haiku, Gemini Flash | Significantly below baseline | Smaller parameter count, lower compute. |
| Standard | GPT-4o, Claude Sonnet, Gemini Pro | Baseline | Most common consumer-facing models. |
| Large | GPT-4.1, Claude Opus, Gemini Ultra | Several times baseline | Largest models with highest compute requirements. |
| Reasoning | o1, o3, o4-mini, Claude extended thinking | Substantially higher | These models generate extensive internal chain-of-thought tokens before producing a response. |
| Image generation | DALL-E, Gemini image gen | Highest tier | Per Luccioni et al. (2023), image generation consumes substantially more energy per request than text inference. |
| Code generation | Lovable.dev sessions, Bolt.new | Calibrated separately | Sessions produce 20K–200K output characters. See “AI Code Builder Platforms” in Step 1. |
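As an illustration, tier multipliers can be applied as a simple lookup over the base estimate. The table above is deliberately qualitative about relative costs, so the numbers below are invented placeholders, not our actual constants.

```typescript
// Illustrative tier multipliers (placeholder values only; the table above
// describes the real relative costs qualitatively).
const TIER_MULTIPLIER: Record<string, number> = {
  small: 0.3,
  standard: 1.0,
  large: 3.0,
  reasoning: 6.0,
  image: 15.0,
};

function tierAdjustedWh(baseWh: number, tier: string): number {
  // Unknown or undetected tiers fall back to the standard (1.0×) multiplier.
  return baseWh * (TIER_MULTIPLIER[tier] ?? 1.0);
}
```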
Energy range
To communicate uncertainty, we present three estimates for every query:
- Low: base × 0.6 (optimistic—assumes best-case hardware utilization, latest-generation chips, and efficient batching)
- Mid: base × 1.0 (baseline—our best single-point estimate)
- High: base × 1.8 (conservative—accounts for older hardware, low utilization, and additional overhead)
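The range construction is a pair of fixed multipliers on the base estimate; `energyRange` is a hypothetical helper name.

```typescript
// Low/mid/high range around a base energy estimate (factors from the text).
interface EstimateRange {
  low: number;
  mid: number;
  high: number;
}

function energyRange(baseWh: number): EstimateRange {
  return { low: baseWh * 0.6, mid: baseWh, high: baseWh * 1.8 };
}
```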
Step 3: Water Estimation
Data centers consume water primarily for cooling. We convert energy estimates to water estimates using a water-intensity ratio (milliliters of water per watt-hour of energy consumed).
| Source | Ratio (mL/Wh) | Notes |
|---|---|---|
| Google (official) | 1.08 mL/Wh | Derived: 0.26 mL water per 0.24 Wh query. Comprehensive overhead included. |
| OpenAI (Altman, implied) | 0.94 mL/Wh | Derived: 0.32 mL water per 0.34 Wh query. |
Our range:
- Low: 0.50 mL/Wh (dry climate with air cooling)
- Mid: 0.94 mL/Wh (industry average derived from OpenAI disclosure)
- High: 1.20 mL/Wh (evaporative cooling in warm climates with older infrastructure)
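Converting an energy range to a water range pairs each scenario with its matching water intensity. A sketch, with `waterRangeMl` as a hypothetical helper name:

```typescript
// Water = energy × water intensity, pairing low/mid/high scenarios
// (intensity ratios taken from the text above).
const WATER_ML_PER_WH = { low: 0.5, mid: 0.94, high: 1.2 };

function waterRangeMl(energyWh: { low: number; mid: number; high: number }) {
  return {
    low: energyWh.low * WATER_ML_PER_WH.low,
    mid: energyWh.mid * WATER_ML_PER_WH.mid,
    high: energyWh.high * WATER_ML_PER_WH.high,
  };
}
```

Under these ratios, the 0.30 Wh mid baseline maps to 0.30 × 0.94 ≈ 0.28 mL, consistent with the mid estimate quoted elsewhere in this document.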
A note on viral claims
The widely cited UC Riverside study (Li et al., 2023) estimated that ChatGPT consumes approximately 519 mL of water per 100 words of output—roughly 52 mL per short query. This figure is roughly 150–200× higher than the industry self-reports from Google and OpenAI. The discrepancy arises because the UC Riverside methodology includes the full lifecycle water footprint of electricity generation (so-called "off-site" or "upstream" water), including water consumed at power plants, in fuel extraction, and in the broader energy supply chain. By contrast, Google's and OpenAI's figures report only the direct ("on-site") water consumed at the data center for cooling. Both approaches are valid for different purposes, but they measure fundamentally different things. ByteThirst uses the direct water consumption methodology because it represents the water physically used at data centers and is the figure most comparable across providers.
Step 4: CO₂ Estimation
We estimate carbon emissions by multiplying energy consumption by a grid carbon intensity factor (grams of CO₂ emitted per watt-hour of electricity consumed).
| Source | Intensity | Notes |
|---|---|---|
| EPA eGRID (2023) | 0.39 kg CO₂/kWh | US national average, location-based. |
| Google (location-based) | 0.09 gCO₂e per Gemini query | Based on actual grid mix at data center locations. |
| Google (market-based) | 0.03 gCO₂e per Gemini query | Includes renewable energy certificate purchases. |
Our range:
- Low: 0.20 g/Wh (grids with significant renewable penetration)
- Mid: 0.39 g/Wh (US national average from EPA eGRID)
- High: 0.60 g/Wh (coal-heavy grids or regions with older infrastructure)
We use location-based emissions rather than market-based emissions. While companies like Google and Microsoft purchase renewable energy certificates (RECs) to offset their electricity usage, location-based accounting reflects the actual carbon intensity of the grid where the data center operates. This is more representative of the real-world emissions impact, since RECs do not necessarily reduce the physical carbon intensity of the electricity consumed at the point of use.
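The CO₂ step mirrors the water step: energy multiplied by a grid intensity factor. A sketch using the range values listed above:

```typescript
// CO₂ = energy × grid carbon intensity (g CO₂ per Wh), per scenario.
const CO2_G_PER_WH = { low: 0.2, mid: 0.39, high: 0.6 };

function co2Grams(energyWh: number, scenario: "low" | "mid" | "high"): number {
  return energyWh * CO2_G_PER_WH[scenario];
}
```

For a 0.30 Wh query, the mid scenario gives 0.30 × 0.39 ≈ 0.12 g CO₂.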
Known Limitations & Uncertainty
- Token estimation is approximate. Our character-to-token ratios are averages for English text. Actual tokenization varies by language, content type (code vs. prose), and specific model version. Errors of 10–20% in token estimation are possible.
- Model tier detection is heuristic. ByteThirst infers the active model from DOM elements on each AI platform's interface. If a platform changes its UI, model detection may temporarily misclassify the model tier until we update the extension.
- Energy-per-token varies widely. The energy cost of inference depends on GPU type (H100 vs. A100 vs. TPUv5), batch size, quantization level, and server utilization. Our baseline assumes mid-range conditions, but actual energy consumption for any single query could be 2–3× higher or lower.
- Water consumption depends on local climate and cooling technology. A data center in Iowa using evaporative cooling will consume significantly more water per watt-hour than a data center in Finland using free air cooling. We cannot determine which data center serves any individual query, so we use an industry-average ratio.
- Cached and short-circuited responses are not detected. Some queries may be served from cache or routed to smaller models, consuming far less energy than our estimates suggest. We have no way to detect this from the client side.
- Reasoning model uncertainty is high. Models like o1, o3, and o4-mini generate internal chain-of-thought tokens that are not visible to the user. The number of internal tokens can vary from 2× to 50× the visible output length. Our multiplier is a conservative midpoint, but individual queries may vary significantly.
- All constants are point-in-time. The energy efficiency of AI inference is improving rapidly. Our constants are based on data available as of early 2026 and will be updated as new measurements are published.
We believe the most honest approach is to communicate this uncertainty directly to our users through range-based estimates rather than false-precision single numbers. If you see a ByteThirst estimate of "0.28 mL (low: 0.10 / high: 0.60)," that range is the message: this is our best guess, but the true value could reasonably fall anywhere within it.
What Our Estimates Include and Do Not Include
Included in our estimates:
- Estimated energy consumed by the AI model's inference computation (GPU/TPU processing)
- Estimated direct water consumed for data center cooling during inference
- Estimated CO₂ emitted from electricity generation powering inference hardware
Not included in our estimates:
- Energy consumed by your device (computer, phone, monitor)
- Energy consumed by network transmission (routers, ISPs, CDNs)
- Water used in manufacturing AI chips or server hardware (embodied water)
- Carbon emissions from manufacturing, shipping, or disposing of hardware (embodied carbon)
- Energy or water consumed during model training (only inference is estimated)
- Upstream water used in the energy supply chain (power plant cooling, fuel extraction)
- Energy consumed by non-inference server operations (load balancing, logging, storage)
Our estimates represent the direct operational footprint of AI inference only. The full lifecycle impact of AI usage — including training, hardware manufacturing, and upstream energy production — is substantially higher but falls outside what can be reasonably estimated on a per-query basis from a browser extension.
Individual vs. Cumulative Impact
A single AI query has a very small environmental footprint — typically a fraction of a milliliter of water and a fraction of a watt-hour of energy. At the individual query level, these amounts are negligible.
ByteThirst aggregates these small amounts over time to show daily and weekly totals. The purpose is awareness of cumulative patterns, not to suggest that any individual query causes meaningful environmental harm. With hundreds of millions of AI queries processed globally each day, the aggregate resource consumption is significant — but that aggregate is made up of individually tiny contributions.
We believe informed users make better choices, and understanding scale is the first step.
Unit Conversions
ByteThirst displays standard volume conversions alongside milliliter values: teaspoons (1 tsp = 4.93 mL), tablespoons (1 tbsp = 14.79 mL), fluid ounces (1 fl oz = 29.57 mL), and cups (1 cup = 236.59 mL). Energy is displayed in Wh and kWh. Carbon is displayed in g and kg. No real-world comparisons or analogies are used — only standard unit conversions.
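The display conversions are straightforward divisions by the constants listed above; `formatVolume` below is a hypothetical helper showing one way to pick the largest sensible unit.

```typescript
// Standard volume conversions (constants exactly as listed in the text).
const ML_PER_UNIT = { tsp: 4.93, tbsp: 14.79, flOz: 29.57, cup: 236.59 } as const;

function formatVolume(ml: number): string {
  // Display in the largest unit the value reaches, else raw milliliters.
  if (ml >= ML_PER_UNIT.cup) return `${(ml / ML_PER_UNIT.cup).toFixed(2)} cups`;
  if (ml >= ML_PER_UNIT.flOz) return `${(ml / ML_PER_UNIT.flOz).toFixed(2)} fl oz`;
  if (ml >= ML_PER_UNIT.tsp) return `${(ml / ML_PER_UNIT.tsp).toFixed(2)} tsp`;
  return `${ml.toFixed(2)} mL`;
}
```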
Comparison with Other Estimates
To validate our model, we compare our mid-range estimate for a standard text query against published per-query figures from other sources:
| Source | Per-query water estimate | Our mid estimate | Ratio |
|---|---|---|---|
| Google (official, Gemini) | 0.26 mL | 0.28 mL | ~1.1× |
| OpenAI (Altman, ChatGPT) | 0.32 mL | 0.28 mL | ~0.9× |
| UC Riverside (Li et al.) | ~52 mL | 0.28 mL | ~0.005× |
Our mid estimate aligns closely with the industry self-reports from Google and OpenAI, falling within roughly 15% of both figures. The UC Riverside figure is not directly comparable due to the inclusion of upstream lifecycle water, as discussed in Step 3 above.
Model Efficiency Landscape
Independent research published in 2025–2026 provides the most detailed cross-model environmental comparisons available. These benchmarks help explain why model choice is the single biggest factor in your AI environmental footprint.
| Model | Est. Energy per Query | Est. Water per Query | Source |
|---|---|---|---|
| Gemini Flash | 0.24 Wh | 0.26 mL | Google (2025) |
| GPT-4o (short query) | 0.42 Wh | ~0.40 mL | Jegham et al. (2025) |
| GPT-4o (long query) | 1.79 Wh | ~1.68 mL | Jegham et al. (2025) |
| GPT-5 (medium response) | ~18–19 Wh | ~17–18 mL | Jegham/URI (2025) |
| DeepSeek-R1 (reasoning) | ~29–34 Wh | ~27–32 mL | Jegham et al. (2025) |
The most energy-intensive models (reasoning models like o3 and DeepSeek-R1) consume over 65 times more energy than the most efficient models. ByteThirst captures this range through its model tier multiplier system and effective token scaling.
Choosing a smaller or more efficient model is the single biggest action a user can take to reduce their AI environmental footprint. Switching from a reasoning model to a lightweight model like Gemini Flash can reduce the environmental cost of a query by an order of magnitude or more.
Note: Independent estimates (Jegham et al.) may differ from vendor self-reports (Google, OpenAI) due to methodology differences. Vendor measurements capture full-stack production overhead including cooling and idle capacity, while independent benchmarks typically estimate GPU-level compute only. Both approaches are valid; we present them side by side for transparency. Gemini Flash's efficiency advantage partly reflects Google's custom Ironwood TPU hardware.
Efficiency Is Improving Rapidly
AI inference efficiency is improving at an extraordinary pace. Google documented a 33× reduction in energy per Gemini prompt and a 44× reduction in carbon emissions over a single 12-month period (May 2024 to May 2025), achieved through software optimization, model right-sizing, and custom hardware (Ironwood TPU, which is 30× more power-efficient than Google's first Cloud TPU from 2018).
This means ByteThirst's estimates are point-in-time snapshots. As AI providers continue to optimize their inference infrastructure, the environmental cost per query will continue to decrease. We will recalibrate our constants as new measurements are published.
The direction of AI efficiency is strongly positive. As hardware generations advance and software optimizations compound, ByteThirst's per-query estimates will trend downward over time. We view this as encouraging: the industry is actively reducing the environmental cost of AI inference, and tracking that progress is part of what ByteThirst is designed to do.
Source Citations
- Google, "Environmental Report: AI and Energy Use" (August 2025)
- Altman, S., "AI and Energy" blog post, OpenAI (August 2025)
- Epoch AI, "Estimating the energy consumption of LLM inference" (February 2025)
- Jegham, N. et al., "Energy Consumption of Large Language Models: A Systematic Benchmark" arXiv (May 2025)
- Luccioni, A. et al., "Power Hungry Processing: Watts Driving the Cost of AI Deployment?" FAccT (2023)
- US EPA, "eGRID Summary Tables" (2023 data)
- Li, P. et al., "Making AI Less Thirsty" UC Riverside (2023)
- SemiAnalysis, "Inference Cost Analysis" (2024)
- Couch, S.P. (2026), "Electricity use of AI coding agents" — Per-token energy rates for agentic AI coding sessions
- Lovable.dev product documentation — AI code builder architecture and consumption models
- StackBlitz/Bolt.new documentation — WebContainers architecture; Claude 3.5 Sonnet integration
- Google (2025), "Measuring the Environmental Impact of Delivering AI at Google Scale" (arXiv:2508.15734) — Full-stack production methodology; 0.24 Wh per Gemini prompt; 33× energy / 44× carbon reduction
- Jegham, N. et al. (2025), "How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference" (arXiv:2505.09598) — Cross-model benchmarks across 30 LLMs; infrastructure-aware methodology
Invitation for Peer Review
We welcome corrections, updated data, and methodological improvements from researchers, engineers, and anyone with domain expertise. If you spot an error, have access to better measurements, or can suggest a more rigorous approach to any step in our pipeline, please reach out at hello@bytethirst.com. We will credit all contributors who help improve the accuracy of ByteThirst's estimates.
Changelog
| Date | Change |
|---|---|
| March 3, 2026 | v2.0 — Introduced QueryWeight™ terminology. Expanded from 13 to 14 platforms: added Google AI Studio and NotebookLM. Added “What Is a QueryWeight?” section. Added Cache Token Handling section (reduced multiplier for cache reads). Added Extended Thinking Tokens section (output weight for thinking tokens, no double-counting with reasoning multiplier). Updated all compliance language to estimation framing. |
| February 28, 2026 | v1.3 — Expanded from 7 to 13 platforms. Added “AI Code Builder Impact Methodology” section with code generation calibration (Couch 2026). Added Mistral, HuggingChat, Figma AI, Lovable.dev, Bolt.new, and NotebookLM platform ratios. Added “Model Efficiency Landscape” section with cross-model benchmarks (Jegham et al. 2025, Google 2025). Added “Efficiency Is Improving Rapidly” section citing 33×/44× reduction data and Ironwood TPU 30× efficiency. Expanded source citations from 8 to 13. Replaced real-world equivalents with standard unit conversions only. |
| February 17, 2026 | v1.1 — Added token estimation ratios for Copilot, Perplexity, Poe, and You.com. Added multi-model platform handling section. Added “What Our Estimates Include and Do Not Include” section. Added “Individual vs. Cumulative Impact” section. Added “Real-World Equivalents” section. Consolidated Bing Chat under Copilot. Updated FTC Green Guides compliance language. |
| February 15, 2026 | v1.0 — Initial methodology published |