Improve calculations and display for energy usage. #156
alexispurslane wants to merge 13 commits into mcowger:main from
Conversation
alexispurslane
commented
Apr 11, 2026
- Revised the inference energy estimator to accept the model architecture and GPU as arguments.
- Added an accordion form to the model alias modal that lets the user:
  - fetch model architecture, size, and default dtype information from HuggingFace (safetensors + config.json) and estimate active and total parameter counts from it
  - edit those values (dtype gets a dropdown)
  - save those values
  - fetch those values when previously stored and edit them again
- Added a per-provider dropdown to choose the default type of GPU that provider runs on; this choice is also saved to the provider config.
- Recalculates the energy usage of all past requests when a model's architecture is updated.
- Shows, on the Usage page, the number of seconds you could have streamed Netflix for the same amount of energy your prompts used in the given hour/day/week/month
- Also compares energy usage rate between your model usage and Netflix
- Also added an energy usage over time time series graph to the usage page.
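The Netflix comparison described above can be sketched roughly as follows. The energy-per-streaming-hour constant and the function name are illustrative assumptions, not the PR's actual implementation:

```typescript
// Illustrative assumption: streaming one hour of video is often estimated
// at very roughly 0.08 kWh; the constant used by the PR may differ.
const NETFLIX_KWH_PER_HOUR = 0.08;

// Convert an amount of inference energy (kWh) into the number of seconds
// of streaming that would consume the same energy.
export function netflixEquivalentSeconds(energyKwh: number): number {
  const hours = energyKwh / NETFLIX_KWH_PER_HOUR;
  return hours * 3600;
}
```

The same ratio, taken per unit time, gives the "energy usage rate" comparison mentioned above.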
feat: Support the new Synthetic quota system
Adds a new `vision_fallthrough_model` column to `request_usage` (both SQLite and PostgreSQL) to capture which descriptor model was used when `use_image_fallthrough` is triggered. The model name is now stored through all success/failure attempt paths so users can filter and analyze fallthrough performance per model in dashboards and logs. Closes mcowger#143
- Add visionFallthroughModel to the UsageRecord type in api.ts
- Show fallthrough model name as a third line in the Model column (ScanSearch icon + model name, with copy button on hover)
- Update the ScanSearch icon tooltip to include the model name

Closes mcowger#143
…uests Previously the child request to the image description model was dispatched internally and never logged. Now VisionDescriptorService saves a full usage record (tokens, cost, duration, provider, model) after each descriptor dispatch, marked with isDescriptorRequest=true so it appears in the Logs UI alongside the parent request. Closes mcowger#143
Two test suites that would have caught the bugs fixed in prior commits:
1. Dispatcher: verifies visionFallthroughModel is included in the recordAttemptMetric metadata when image fallthrough triggers, and absent when it does not.
2. VisionDescriptorService: verifies that usageStorage.saveRequest is called for the child descriptor request on success (with correct provider, model, and token counts), on dispatch failure (with responseStatus='error'), and is skipped when no usageStorage is given.
- Record descriptor model name in request_usage (visionFallthroughModel column)
- Save child descriptor request as its own usage record so it appears in Logs UI
- Display fallthrough model in Logs.tsx Model column with copy button
- Add tests that would have caught both bugs

Closes mcowger#143
mcowger
left a comment
Code Review: Energy Usage Improvements
Reviewed the full diff and found several issues that should be addressed before merge. Inline comments below with details.
Blockers
- #1: `GH100` is in the config enum and UI dropdown but not in `GPU_PRESETS`; silent fallback to H100
- #2: `recalculateEnergyForAlias` loads all requests into memory with no pagination
- #4: Duplicate recalculation code in PUT and PATCH handlers

Significant
- #9: `DTYPE_SIZES` duplicated across `huggingface-model-fetcher.ts` and `usage-storage.ts`
- #10: `DEFAULT_GPU` uses cluster-level power (14.3 kW), conflicting with per-GPU presets

Minor
- #12: `getGpuPresetOptions()` exported but never used; already out of sync with the UI dropdown
- #14: Energy components use local formatters instead of centralized `format.ts`
- #15 (nit): `onKeyPress` is deprecated; should use `onKeyDown`
- #16: No test coverage for any of the new code
- #17: PG migration bundles the unrelated `vision_fallthrough_model` column
```ts
useClaudeMasking: z.boolean().optional().default(false),
quota_checker: ProviderQuotaCheckerSchema.optional(),
// GPU Profile settings for inference energy calculation
gpu_profile: z.enum(['H100', 'H200', 'GH100', 'GH200', 'B200', 'B300', 'custom']).optional(),
```
🔴 Blocker #1: GH100 is in this enum and in the Providers UI dropdown, but there is no GH100 entry in GPU_PRESETS (in inference-energy.ts). If a user selects GH100, getGpuParams() silently falls back to the H100 default — no error, no warning.
NVIDIA doesn't have a "GH100" product — they have GH200 Grace Hopper. The UI label says "NVIDIA GH100 (144GB)" which matches GH200's specs, so this looks like a naming error.
Fix: Either add a GH100 key to GPU_PRESETS or remove it from this enum (and the Providers dropdown). Given the naming confusion, removing GH100 and keeping GH200 seems cleaner.
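One way to make the mismatch fail loudly instead of silently is to validate the profile name at lookup time. This is a sketch; the `GpuParams` fields come from the review comments, and the preset values here are placeholders, not the PR's actual numbers:

```typescript
interface GpuParams {
  ram_gb: number;
  bandwidth_tb_s: number;
  flops_tflop: number;
  power_draw_watts: number;
}

// Placeholder per-GPU presets for illustration only.
const GPU_PRESETS: Record<string, GpuParams> = {
  H100: { ram_gb: 80, bandwidth_tb_s: 3.35, flops_tflop: 990, power_draw_watts: 700 },
  GH200: { ram_gb: 144, bandwidth_tb_s: 4.9, flops_tflop: 990, power_draw_watts: 1000 },
};

// Throw on an unknown profile instead of quietly returning the H100 default,
// so a config-enum/preset mismatch (like the GH100 case) surfaces immediately.
export function getGpuParams(profile: string): GpuParams {
  const preset = GPU_PRESETS[profile];
  if (!preset) {
    throw new Error(
      `Unknown gpu_profile '${profile}'; known profiles: ${Object.keys(GPU_PRESETS).join(', ')}`
    );
  }
  return preset;
}
```

A cheaper alternative is a unit test asserting that every value in the zod enum (except `custom`) has a matching `GPU_PRESETS` key.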
```ts
provider: this.schema.requestUsage.finalAttemptProvider,
})
.from(this.schema.requestUsage)
.where(eq(this.schema.requestUsage.incomingModelAlias, aliasSlug));
```
🔴 Blocker #2: This query loads all historical requests for an alias into memory with no LIMIT. For busy deployments this could be millions of rows → OOM. The batchSize = 100 below only controls update concurrency, not the initial SELECT.
Fix: Add cursor-based pagination or at minimum iterate with .limit() + .offset(), e.g.:
```ts
const BATCH_SIZE = 500;
let offset = 0;
while (true) {
  const batch = await db.select({...}).from(...).where(eq(...)).limit(BATCH_SIZE).offset(offset);
  if (batch.length === 0) break;
  // process batch...
  offset += BATCH_SIZE;
}
```

```ts
// Recalculate energy usage if model_architecture was provided
if (result.data.model_architecture && usageStorage) {
  try {
    const updated = await usageStorage.recalculateEnergyForAlias(
```
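If the usage table has a monotonically increasing id, keyset pagination avoids the growing scan cost that `OFFSET` incurs on large tables. A generic sketch of the loop shape, with the page-fetching function abstracted out (names here are illustrative, not from the PR):

```typescript
// Generic keyset-pagination loop: fetchPage(afterId, limit) must return rows
// ordered by id ascending with id > afterId. Unlike OFFSET pagination, each
// page fetch starts from an index seek rather than rescanning skipped rows.
async function forEachBatch<T extends { id: number }>(
  fetchPage: (afterId: number, limit: number) => Promise<T[]>,
  process: (batch: T[]) => Promise<void>,
  batchSize = 500,
): Promise<void> {
  let lastId = 0;
  while (true) {
    const batch = await fetchPage(lastId, batchSize);
    if (batch.length === 0) break;
    await process(batch);
    lastId = batch[batch.length - 1].id;
  }
}
```

In the drizzle query this corresponds to adding something like `gt(requestUsage.id, lastId)` to the `where` clause plus `.orderBy(requestUsage.id).limit(batchSize)`.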
🔴 Blocker #4: This recalculation block (lines 161-173) is duplicated verbatim in the PATCH handler below (lines 203-215). Please extract it into a shared helper, e.g.:
```ts
async function recalculateEnergyIfChanged(
  slug: string,
  model_architecture: any,
  usageStorage?: UsageStorageService
) {
  if (model_architecture && usageStorage) {
    try {
      const updated = await usageStorage.recalculateEnergyForAlias(slug, model_architecture);
      logger.info(`Recalculated energy for ${updated} requests for alias '${slug}'`);
    } catch (recalcError) {
      logger.error(`Failed to recalculate energy for alias '${slug}'`, recalcError);
    }
  }
}
```

Then call it from both PUT and PATCH handlers.
```ts
import type { ModelParams } from './inference-energy';

// Common data type sizes in bytes
export const DTYPE_SIZES: Record<string, number> = {
```
🟡 Significant #9: DTYPE_SIZES is defined here, but usage-storage.ts also has its own getDtypeSize() switch statement (line 944) with the same dtype→bytes mapping. If a new dtype is added to one, the other will be out of sync.
Fix: Make inference-energy.ts the single source of truth — export DTYPE_SIZES from there, and import it in both huggingface-model-fetcher.ts and usage-storage.ts. Remove the getDtypeSize() method from UsageStorageService.
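The shared map could look like the sketch below. The dtype names and byte widths are the ones commonly used for safetensors weights, but treat the exact set (and the fallback choice) as assumptions rather than the PR's code:

```typescript
// inference-energy.ts: single source of truth for dtype byte widths,
// replacing both the huggingface-model-fetcher copy and
// UsageStorageService.getDtypeSize().
export const DTYPE_SIZES: Record<string, number> = {
  float32: 4, fp32: 4,
  float16: 2, fp16: 2,
  bfloat16: 2, bf16: 2,
  int8: 1, fp8: 1,
  int4: 0.5, fp4: 0.5,
};

// Case-insensitive lookup with a 2-byte (fp16/bf16) fallback for
// unrecognized dtypes.
export function dtypeSize(dtype: string): number {
  return DTYPE_SIZES[dtype.toLowerCase()] ?? 2;
}
```

Both call sites then import `dtypeSize` from `inference-energy.ts`, so adding a new dtype only touches one file.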
```ts
ram_gb: 192,
bandwidth_tb_s: 8.0,
flops_tflop: 9000,
power_draw_watts: 14300,
```
🟡 Significant #10: DEFAULT_GPU.power_draw_watts: 14300 looks like an 8-GPU H100 cluster (8×700W + overhead), but all the presets define per-GPU power (700W, 1000W, 1400W). The power scaling formula on line 201 does power_draw_watts * (tp / 8), which assumes per-GPU power scaled by TP.
With DEFAULT_GPU, this becomes 14300 * (8/8) = 14300W — accidentally correct for an 8-way cluster, but semantically wrong since the field is called power_draw_watts and documented as a single GPU spec.
Meanwhile getGpuParams() defaults to a single H100 (700W). These two defaults disagree by ~20×.
Fix: Align DEFAULT_GPU with the presets — make it a single-GPU spec (e.g., matching B200 at 1000W) or clearly document it as a cluster-level default with matching comment + field name.
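The ~20x disagreement is easy to see numerically, using the values quoted in this comment:

```typescript
// Per-GPU preset power scaled by tensor parallelism (the formula on line 201
// is power_draw_watts * (tp / 8), applied to an assumed per-GPU value):
const perGpuWatts = 700;            // single H100 preset
const clusterWatts = 14300;         // DEFAULT_GPU value
const tp = 8;

const presetPower = perGpuWatts * tp;          // 5600 W for an 8-way H100 setup
const defaultPower = clusterWatts * (tp / 8);  // 14300 W
// 14300 / 700 is about 20.4, so the two "defaults" disagree by ~20x per GPU.
```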
Yeah, I completely forgot whether I was using cluster-level or single-GPU power in the interim since last touching this math, hence the total inconsistency.
```ts
/**
 * Returns available GPU preset options for UI dropdowns
 */
export function getGpuPresetOptions(): Array<{ value: string; label: string }> {
```
🟢 Minor #12: getGpuPresetOptions() is exported but never called anywhere. The Providers page (Providers.tsx) hardcodes its own dropdown options independently. These two lists are already out of sync (this function lacks GH100, and the UI has it).
Fix: Either wire up this function in the Providers dropdown to keep a single source of truth, or remove it if the UI prefers hardcoded options.
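Wiring the dropdown to the presets could look like this sketch; the preset keys and display labels here are assumptions for illustration:

```typescript
// Assumed preset metadata; in the real code this would live alongside
// GPU_PRESETS in inference-energy.ts.
const GPU_PRESETS: Record<string, { label: string }> = {
  H100: { label: "NVIDIA H100 (80GB)" },
  GH200: { label: "NVIDIA GH200 (144GB)" },
  custom: { label: "Custom" },
};

// Single source of truth for the Providers dropdown: any key added to (or
// removed from) GPU_PRESETS automatically updates the UI options.
export function getGpuPresetOptions(): Array<{ value: string; label: string }> {
  return Object.entries(GPU_PRESETS).map(([value, { label }]) => ({ value, label }));
}
```

Providers.tsx would then render `getGpuPresetOptions()` instead of its own hardcoded list, so the zod enum, presets, and dropdown cannot drift independently.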
```ts
/**
 * Formats kWh values for tooltip display.
 */
function formatKwh(value: number): string {
```
🟢 Minor #14: formatKwh() and formatTimeLabel() are local formatters. Per project guidelines (AGENTS.md §8.3), all formatting should be centralized in packages/frontend/src/lib/format.ts. The same applies to formatStreamingTime() in EnergyTimeComparison.tsx.
Fix: Add formatKwh() and formatStreamingTime() (or reuse formatDuration) to format.ts, then import and use them from both components.
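A centralized `formatKwh` in `format.ts` might look like the sketch below; the unit thresholds and precision are assumptions, not the component's current behavior:

```typescript
// format.ts: shared energy formatter. Scales down to Wh/mWh for small
// values so tooltips don't render as "0.00 kWh".
export function formatKwh(value: number): string {
  if (value < 0.001) return `${(value * 1_000_000).toFixed(1)} mWh`;
  if (value < 1) return `${(value * 1000).toFixed(1)} Wh`;
  return `${value.toFixed(2)} kWh`;
}
```

`EnergyOverTime` and `EnergyTimeComparison` would both import this, keeping tooltip and comparison displays consistent.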
```tsx
value={hfModelId}
onChange={(e) => setHfModelId(e.target.value)}
placeholder="e.g. meta-llama/Llama-3.1-70B-Instruct"
onKeyPress={(e) => e.key === 'Enter' && fetchHfModelArchitecture()}
```
🟢 Nit #15: onKeyPress is deprecated in React. Please use onKeyDown instead:

```tsx
onKeyDown={(e) => e.key === 'Enter' && fetchHfModelArchitecture()}
```

Same applies to line 1438 in this file.
```ts
export class HuggingFaceModelFetcher {
  private static instance: HuggingFaceModelFetcher;
  private cache: Map<string, ModelParams> = new Map();
  private dtypeCache: Map<string, string> = new Map();
```
🟢 Minor #16 (no tests): There is zero test coverage for the new code in this PR:
- `HuggingFaceModelFetcher` (complex parsing/matching logic; would benefit from unit tests for `parseConfig`, `getHeuristicParams`, `inferDtype`)
- `recalculateEnergyForAlias` in `usage-storage.ts`
- The new `/v0/management/models/huggingface/:modelId` endpoint
- `EnergyOverTime` and `EnergyTimeComparison` frontend components

Please add at least basic unit tests for the HuggingFace fetcher (especially `parseConfig` and `getHeuristicParams`) and the energy recalculation logic.
```sql
ALTER TABLE "request_usage" ADD COLUMN "vision_fallthrough_model" text;--> statement-breakpoint
```
🟢 Minor #17: This PG migration includes ALTER TABLE "request_usage" ADD COLUMN "vision_fallthrough_model" text; which is from a different feature (the vision fallthrough work already merged to main). The SQLite equivalent (migration 0025) was already applied on main.
Fix: Please regenerate the PG migration so it only contains the new GPU/model_architecture columns, without the vision_fallthrough_model addition. You can do this by:
- Resetting the PG migration files
- Ensuring your branch is based on latest main (which already has the `vision_fallthrough_model` column in the schema)
- Running `bunx drizzle-kit generate --config drizzle.config.pg.ts` to produce a clean migration with only the new columns
Goddamn! This is what I get for vibe coding more than usual. "It'll be quick," I thought. "How bad can it mess up," I thought. 😂