
Improve calculations and display for energy usage. #156

Open
alexispurslane wants to merge 13 commits into mcowger:main from alexispurslane:main

Conversation

@alexispurslane

  • Revised the inference energy estimator to accept the model architecture and GPU as arguments.
  • Added an accordion form to the model alias modal that lets the user:
    • fetch model architecture, size, and default dtype information from HuggingFace (safetensors + config.json) and estimate active and total parameter counts from it
    • edit those values (dtype gets a dropdown)
    • save those values
    • fetch those values when previously stored and edit them again
  • Added a dropdown to each provider for choosing the default type of GPU it runs on; the selection is also persisted.
  • Recalculates the energy usage of all past requests when a model's architecture is updated.
  • Shows, on the Usage page, how many seconds of Netflix streaming would use the same amount of energy as your prompts for the given hour/day/week/month.
  • Also compares the energy usage rate between your model usage and Netflix.
  • Also added an energy-usage-over-time time series graph to the Usage page.
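The Netflix comparison described above boils down to a simple conversion. A minimal sketch, where the 0.077 kWh-per-hour streaming figure is an assumption (published estimates vary widely), not a number taken from this PR:

```typescript
// Assumed per-hour streaming energy; not a value from the PR's source.
const NETFLIX_KWH_PER_HOUR = 0.077;

// Convert an energy total (kWh) into the seconds of streaming it could power.
function netflixSecondsForEnergy(energyKwh: number): number {
  return (energyKwh / NETFLIX_KWH_PER_HOUR) * 3600;
}
```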

mcowger and others added 13 commits April 10, 2026 20:18
feat: Support the new Synthetic quota system
Adds a new `vision_fallthrough_model` column to `request_usage` (both
SQLite and PostgreSQL) to capture which descriptor model was used when
`use_image_fallthrough` is triggered. The model name is now stored
through all success/failure attempt paths so users can filter and
analyze fallthrough performance per model in dashboards and logs.

Closes mcowger#143
- Add visionFallthroughModel to the UsageRecord type in api.ts
- Show fallthrough model name as a third line in the Model column
  (ScanSearch icon + model name, with copy button on hover)
- Update the ScanSearch icon tooltip to include the model name

Closes mcowger#143
…uests

Previously the child request to the image description model was dispatched
internally and never logged. Now VisionDescriptorService saves a full
usage record (tokens, cost, duration, provider, model) after each
descriptor dispatch, marked with isDescriptorRequest=true so it appears
in the Logs UI alongside the parent request.

Closes mcowger#143
Two test suites that would have caught the bugs fixed in prior commits:

1. Dispatcher: verifies visionFallthroughModel is included in the
   recordAttemptMetric metadata when image fallthrough triggers, and
   absent when it does not.

2. VisionDescriptorService: verifies that usageStorage.saveRequest is
   called for the child descriptor request on success (with correct
   provider, model, and token counts), on dispatch failure (with
   responseStatus='error'), and is skipped when no usageStorage is given.
- Record descriptor model name in request_usage (visionFallthroughModel column)
- Save child descriptor request as its own usage record so it appears in Logs UI
- Display fallthrough model in Logs.tsx Model column with copy button
- Add tests that would have caught both bugs

Closes mcowger#143
Owner

@mcowger left a comment


Code Review: Energy Usage Improvements

Reviewed the full diff and found several issues that should be addressed before merge. Inline comments below with details.

Blockers

  • #1: GH100 in config enum and UI dropdown but not in GPU_PRESETS — silent fallback to H100
  • #2: recalculateEnergyForAlias loads all requests into memory with no pagination
  • #4: Duplicate recalculation code in PUT and PATCH handlers

Significant

  • #9: DTYPE_SIZES duplicated across huggingface-model-fetcher.ts and usage-storage.ts
  • #10: DEFAULT_GPU uses cluster-level power (14.3kW) conflicting with per-GPU presets

Minor

  • #12: getGpuPresetOptions() exported but never used; already out of sync with UI dropdown
  • #14: Energy components use local formatters instead of centralized format.ts
  • #15 (nit): onKeyPress is deprecated, should use onKeyDown
  • #16: No test coverage for any of the new code
  • #17: PG migration bundles unrelated vision_fallthrough_model column

useClaudeMasking: z.boolean().optional().default(false),
quota_checker: ProviderQuotaCheckerSchema.optional(),
// GPU Profile settings for inference energy calculation
gpu_profile: z.enum(['H100', 'H200', 'GH100', 'GH200', 'B200', 'B300', 'custom']).optional(),
Owner


🔴 Blocker #1: GH100 is in this enum and in the Providers UI dropdown, but there is no GH100 entry in GPU_PRESETS (in inference-energy.ts). If a user selects GH100, getGpuParams() silently falls back to the H100 default — no error, no warning.

NVIDIA doesn't have a "GH100" product — they have GH200 Grace Hopper. The UI label says "NVIDIA GH100 (144GB)" which matches GH200's specs, so this looks like a naming error.

Fix: Either add a GH100 key to GPU_PRESETS or remove it from this enum (and the Providers dropdown). Given the naming confusion, removing GH100 and keeping GH200 seems cleaner.
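Whichever option is chosen, the fallback itself could be made loud rather than silent. A hypothetical sketch, where the GpuSpec shape and preset values are assumptions, not the PR's actual inference-energy.ts code:

```typescript
// Sketch only: GpuSpec's fields and these preset numbers are assumed.
type GpuSpec = { ram_gb: number; power_draw_watts: number };

const GPU_PRESETS: Record<string, GpuSpec> = {
  H100: { ram_gb: 80, power_draw_watts: 700 },
  GH200: { ram_gb: 144, power_draw_watts: 1000 },
};

function getGpuParams(profile: string): GpuSpec {
  const preset = GPU_PRESETS[profile];
  if (!preset) {
    // Surface config/preset mismatches (like the GH100 case) in the logs
    // instead of silently substituting the H100 default.
    console.warn(`Unknown gpu_profile '${profile}', falling back to H100`);
    return GPU_PRESETS.H100;
  }
  return preset;
}
```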

provider: this.schema.requestUsage.finalAttemptProvider,
})
.from(this.schema.requestUsage)
.where(eq(this.schema.requestUsage.incomingModelAlias, aliasSlug));
Owner


🔴 Blocker #2: This query loads all historical requests for an alias into memory with no LIMIT. For busy deployments this could be millions of rows → OOM. The batchSize = 100 below only controls update concurrency, not the initial SELECT.

Fix: Add cursor-based pagination or at minimum iterate with .limit() + .offset(), e.g.:

const BATCH_SIZE = 500;
let offset = 0;
while (true) {
  const batch = await db.select({...}).from(...).where(eq(...)).limit(BATCH_SIZE).offset(offset);
  if (batch.length === 0) break;
  // process batch...
  offset += BATCH_SIZE;
}
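For very large tables, a keyset (cursor) variant avoids the growing rescan cost of OFFSET. A minimal sketch of the loop shape, with fetchPage standing in for the Drizzle query (ordered by id, filtered by id > cursor, limited to batchSize); the Row shape is assumed:

```typescript
// Assumed row shape; the real columns live in the PR's schema.
type Row = { id: number; energy: number };

// Keyset pagination: advance a cursor past the last id seen, so each page is
// an index seek rather than an OFFSET rescan.
async function forEachBatch(
  fetchPage: (afterId: number, limit: number) => Promise<Row[]>,
  process: (batch: Row[]) => Promise<void>,
  batchSize = 500,
): Promise<void> {
  let cursor = 0; // assumes positive, monotonically increasing ids
  while (true) {
    const batch = await fetchPage(cursor, batchSize);
    if (batch.length === 0) break;
    await process(batch);
    cursor = batch[batch.length - 1].id;
  }
}
```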

// Recalculate energy usage if model_architecture was provided
if (result.data.model_architecture && usageStorage) {
try {
const updated = await usageStorage.recalculateEnergyForAlias(
Owner


🔴 Blocker #4: This recalculation block (lines 161-173) is duplicated verbatim in the PATCH handler below (lines 203-215). Please extract it into a shared helper, e.g.:

async function recalculateEnergyIfChanged(
  slug: string,
  model_architecture: any,
  usageStorage?: UsageStorageService
) {
  if (model_architecture && usageStorage) {
    try {
      const updated = await usageStorage.recalculateEnergyForAlias(slug, model_architecture);
      logger.info(`Recalculated energy for ${updated} requests for alias '${slug}'`);
    } catch (recalcError) {
      logger.error(`Failed to recalculate energy for alias '${slug}'`, recalcError);
    }
  }
}

Then call it from both PUT and PATCH handlers.

import type { ModelParams } from './inference-energy';

// Common data type sizes in bytes
export const DTYPE_SIZES: Record<string, number> = {
Owner


🟡 Significant #9: DTYPE_SIZES is defined here, but usage-storage.ts also has its own getDtypeSize() switch statement (line 944) with the same dtype→bytes mapping. If a new dtype is added to one, the other will be out of sync.

Fix: Make inference-energy.ts the single source of truth — export DTYPE_SIZES from there, and import it in both huggingface-model-fetcher.ts and usage-storage.ts. Remove the getDtypeSize() method from UsageStorageService.
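A minimal sketch of the single-source-of-truth shape; the entries below are assumptions, and the authoritative map would be the DTYPE_SIZES exported from inference-energy.ts:

```typescript
// Assumed entries for illustration; the real map lives in the PR.
const DTYPE_SIZES: Record<string, number> = {
  float32: 4,
  float16: 2,
  bfloat16: 2,
  int8: 1,
};

// Replacement for UsageStorageService.getDtypeSize(): a single lookup path,
// with a 2-byte (fp16/bf16-sized) fallback for unrecognized dtypes.
function dtypeSizeBytes(dtype: string): number {
  return DTYPE_SIZES[dtype.toLowerCase()] ?? 2;
}
```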

ram_gb: 192,
bandwidth_tb_s: 8.0,
flops_tflop: 9000,
power_draw_watts: 14300,
Owner


🟡 Significant #10: DEFAULT_GPU.power_draw_watts: 14300 looks like an 8-GPU H100 cluster (8×700W + overhead), but all the presets define per-GPU power (700W, 1000W, 1400W). The power scaling formula on line 201 does power_draw_watts * (tp / 8), which assumes per-GPU power scaled by TP.

With DEFAULT_GPU, this becomes 14300 * (8/8) = 14300W — accidentally correct for an 8-way cluster, but semantically wrong since the field is called power_draw_watts and documented as a single GPU spec.

Meanwhile getGpuParams() defaults to a single H100 (700W). These two defaults disagree by ~20×.

Fix: Align DEFAULT_GPU with the presets — make it a single-GPU spec (e.g., matching B200 at 1000W) or clearly document it as a cluster-level default with matching comment + field name.
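The ~20x disagreement, using the values quoted in this comment (not re-derived from the source):

```typescript
// Values as quoted in the review comment above.
const DEFAULT_GPU_POWER_W = 14300; // cluster-level: ~8 x 700 W H100s + overhead
const H100_POWER_W = 700;          // per-GPU default used by getGpuParams()

const mismatchRatio = DEFAULT_GPU_POWER_W / H100_POWER_W; // ~20.4x
```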

Author


Yeah, I completely forgot whether I was using cluster or single-GPU power in the interim since last touching this math, hence the total inconsistency.

/**
* Returns available GPU preset options for UI dropdowns
*/
export function getGpuPresetOptions(): Array<{ value: string; label: string }> {
Owner


🟢 Minor #12: getGpuPresetOptions() is exported but never called anywhere. The Providers page (Providers.tsx) hardcodes its own dropdown options independently. These two lists are already out of sync (this function lacks GH100, and the UI has it).

Fix: Either wire up this function in the Providers dropdown to keep a single source of truth, or remove it if the UI prefers hardcoded options.

/**
* Formats kWh values for tooltip display.
*/
function formatKwh(value: number): string {
Owner


🟢 Minor #14: formatKwh() and formatTimeLabel() are local formatters. Per project guidelines (AGENTS.md §8.3), all formatting should be centralized in packages/frontend/src/lib/format.ts. The same applies to formatStreamingTime() in EnergyTimeComparison.tsx.

Fix: Add formatKwh() and formatStreamingTime() (or reuse formatDuration) to format.ts, then import and use them from both components.
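One possible shape for the centralized helper in format.ts; the unit thresholds and precision below are illustrative assumptions, not taken from the PR:

```typescript
// Hypothetical formatKwh for packages/frontend/src/lib/format.ts. Breakpoints
// and decimal places are illustrative choices only.
function formatKwh(value: number): string {
  if (value >= 1) return `${value.toFixed(2)} kWh`;
  if (value >= 0.001) return `${(value * 1000).toFixed(1)} Wh`;
  return `${(value * 1_000_000).toFixed(0)} mWh`;
}
```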

value={hfModelId}
onChange={(e) => setHfModelId(e.target.value)}
placeholder="e.g. meta-llama/Llama-3.1-70B-Instruct"
onKeyPress={(e) => e.key === 'Enter' && fetchHfModelArchitecture()}
Owner


🟢 Nit #15: onKeyPress is deprecated in React. Please use onKeyDown instead:

onKeyDown={(e) => e.key === 'Enter' && fetchHfModelArchitecture()}

Same applies to line 1438 in this file.

export class HuggingFaceModelFetcher {
private static instance: HuggingFaceModelFetcher;
private cache: Map<string, ModelParams> = new Map();
private dtypeCache: Map<string, string> = new Map();
Owner


🟢 Minor #16 (no tests): There is zero test coverage for the new code in this PR:

  • HuggingFaceModelFetcher (complex parsing/matching logic — would benefit from unit tests for parseConfig, getHeuristicParams, inferDtype)
  • recalculateEnergyForAlias in usage-storage.ts
  • The new /v0/management/models/huggingface/:modelId endpoint
  • EnergyOverTime and EnergyTimeComparison frontend components

Please add at least basic unit tests for the HuggingFace fetcher (especially parseConfig and getHeuristicParams) and the energy recalculation logic.

@@ -0,0 +1,7 @@
ALTER TABLE "request_usage" ADD COLUMN "vision_fallthrough_model" text;--> statement-breakpoint
Owner


🟢 Minor #17: This PG migration includes ALTER TABLE "request_usage" ADD COLUMN "vision_fallthrough_model" text; which is from a different feature (the vision fallthrough work already merged to main). The SQLite equivalent (migration 0025) was already applied on main.

Fix: Please regenerate the PG migration so it only contains the new GPU/model_architecture columns, without the vision_fallthrough_model addition. You can do this by:

  1. Resetting the PG migration files
  2. Ensuring your branch is based on latest main (which already has the vision_fallthrough_model column in the schema)
  3. Running bunx drizzle-kit generate --config drizzle.config.pg.ts to produce a clean migration with only the new columns

@alexispurslane
Author

alexispurslane commented Apr 12, 2026

Goddamn! This is what I get for vibe coding more than usual.

"It'll be quick" I thought. "How bad can it mess up" I thought. 😂

