Improve calculations and display for energy usage. #156
alexispurslane wants to merge 13 commits into mcowger:main from
Conversation
alexispurslane
commented
Apr 11, 2026
- Revised the inference energy estimator to accept the model architecture and GPU as arguments.
- Added an accordion form to the model alias modal that lets the user:
  - fetch model architecture, size, and default dtype information from HuggingFace (safetensors + config.json) and estimate active and total parameter counts from it
  - edit those values (dtype gets a dropdown)
  - save those values
  - fetch those values when previously stored and edit them again
- Added a per-provider dropdown to choose the default type of GPU that provider runs on; this choice is also saved to the provider config.
- Recalculates the energy usage of all past requests when a model's architecture is updated.
- Shows, on the Usage page, the number of seconds you could have streamed Netflix for the same amount of energy your prompts used in the given hour/day/week/month
- Also compares energy usage rate between your model usage and Netflix
- Also added an energy usage over time time series graph to the usage page.
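The Netflix comparison described above can be sketched roughly as follows. The energy-per-streaming-hour constant and the function name are illustrative assumptions, not the PR's actual implementation:

```typescript
// Illustrative assumption: streaming one hour of video is often estimated
// at very roughly 0.08 kWh; the constant used by the PR may differ.
const NETFLIX_KWH_PER_HOUR = 0.08;

// Convert an amount of inference energy (kWh) into the number of seconds
// of streaming that would consume the same energy.
export function netflixEquivalentSeconds(energyKwh: number): number {
  const hours = energyKwh / NETFLIX_KWH_PER_HOUR;
  return hours * 3600;
}
```

The same ratio, taken per unit time, gives the "energy usage rate" comparison mentioned above.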
feat: Support the new Synthetic quota system
Adds a new `vision_fallthrough_model` column to `request_usage` (both SQLite and PostgreSQL) to capture which descriptor model was used when `use_image_fallthrough` is triggered. The model name is now stored through all success/failure attempt paths so users can filter and analyze fallthrough performance per model in dashboards and logs. Closes mcowger#143
- Add visionFallthroughModel to the UsageRecord type in api.ts
- Show fallthrough model name as a third line in the Model column (ScanSearch icon + model name, with copy button on hover)
- Update the ScanSearch icon tooltip to include the model name

Closes mcowger#143
…uests Previously the child request to the image description model was dispatched internally and never logged. Now VisionDescriptorService saves a full usage record (tokens, cost, duration, provider, model) after each descriptor dispatch, marked with isDescriptorRequest=true so it appears in the Logs UI alongside the parent request. Closes mcowger#143
Two test suites that would have caught the bugs fixed in prior commits:
1. Dispatcher: verifies visionFallthroughModel is included in the recordAttemptMetric metadata when image fallthrough triggers, and absent when it does not.
2. VisionDescriptorService: verifies that usageStorage.saveRequest is called for the child descriptor request on success (with correct provider, model, and token counts), on dispatch failure (with responseStatus='error'), and is skipped when no usageStorage is given.
- Record descriptor model name in request_usage (visionFallthroughModel column)
- Save child descriptor request as its own usage record so it appears in Logs UI
- Display fallthrough model in Logs.tsx Model column with copy button
- Add tests that would have caught both bugs

Closes mcowger#143
mcowger
left a comment
Code Review: Energy Usage Improvements
Reviewed the full diff and found several issues that should be addressed before merge. Inline comments below with details.
Blockers
- #1: `GH100` is in the config enum and UI dropdown but not in `GPU_PRESETS`; silent fallback to H100
- #2: `recalculateEnergyForAlias` loads all requests into memory with no pagination
- #4: Duplicate recalculation code in PUT and PATCH handlers

Significant
- #9: `DTYPE_SIZES` duplicated across `huggingface-model-fetcher.ts` and `usage-storage.ts`
- #10: `DEFAULT_GPU` uses cluster-level power (14.3 kW), conflicting with per-GPU presets

Minor
- #12: `getGpuPresetOptions()` exported but never used; already out of sync with the UI dropdown
- #14: Energy components use local formatters instead of centralized `format.ts`
- #15 (nit): `onKeyPress` is deprecated; should use `onKeyDown`
- #16: No test coverage for any of the new code
- #17: PG migration bundles the unrelated `vision_fallthrough_model` column
```ts
useClaudeMasking: z.boolean().optional().default(false),
quota_checker: ProviderQuotaCheckerSchema.optional(),
// GPU Profile settings for inference energy calculation
gpu_profile: z.enum(['H100', 'H200', 'GH100', 'GH200', 'B200', 'B300', 'custom']).optional(),
```
🔴 Blocker #1: GH100 is in this enum and in the Providers UI dropdown, but there is no GH100 entry in GPU_PRESETS (in inference-energy.ts). If a user selects GH100, getGpuParams() silently falls back to the H100 default — no error, no warning.
NVIDIA doesn't have a "GH100" product — they have GH200 Grace Hopper. The UI label says "NVIDIA GH100 (144GB)" which matches GH200's specs, so this looks like a naming error.
Fix: Either add a GH100 key to GPU_PRESETS or remove it from this enum (and the Providers dropdown). Given the naming confusion, removing GH100 and keeping GH200 seems cleaner.
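One way to make the mismatch fail loudly instead of silently is to validate the profile name at lookup time. This is a sketch; the `GpuParams` fields come from the review comments, and the preset values here are placeholders, not the PR's actual numbers:

```typescript
interface GpuParams {
  ram_gb: number;
  bandwidth_tb_s: number;
  flops_tflop: number;
  power_draw_watts: number;
}

// Placeholder per-GPU presets for illustration only.
const GPU_PRESETS: Record<string, GpuParams> = {
  H100: { ram_gb: 80, bandwidth_tb_s: 3.35, flops_tflop: 990, power_draw_watts: 700 },
  GH200: { ram_gb: 144, bandwidth_tb_s: 4.9, flops_tflop: 990, power_draw_watts: 1000 },
};

// Throw on an unknown profile instead of quietly returning the H100 default,
// so a config-enum/preset mismatch (like the GH100 case) surfaces immediately.
export function getGpuParams(profile: string): GpuParams {
  const preset = GPU_PRESETS[profile];
  if (!preset) {
    throw new Error(
      `Unknown gpu_profile '${profile}'; known profiles: ${Object.keys(GPU_PRESETS).join(', ')}`
    );
  }
  return preset;
}
```

A cheaper alternative is a unit test asserting that every value in the zod enum (except `custom`) has a matching `GPU_PRESETS` key.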
```ts
provider: this.schema.requestUsage.finalAttemptProvider,
})
.from(this.schema.requestUsage)
.where(eq(this.schema.requestUsage.incomingModelAlias, aliasSlug));
```
🔴 Blocker #2: This query loads all historical requests for an alias into memory with no LIMIT. For busy deployments this could be millions of rows → OOM. The batchSize = 100 below only controls update concurrency, not the initial SELECT.
Fix: Add cursor-based pagination or at minimum iterate with .limit() + .offset(), e.g.:
```ts
const BATCH_SIZE = 500;
let offset = 0;
while (true) {
  const batch = await db.select({...}).from(...).where(eq(...)).limit(BATCH_SIZE).offset(offset);
  if (batch.length === 0) break;
  // process batch...
  offset += BATCH_SIZE;
}
```

```ts
// Recalculate energy usage if model_architecture was provided
if (result.data.model_architecture && usageStorage) {
  try {
    const updated = await usageStorage.recalculateEnergyForAlias(
```
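If the usage table has a monotonically increasing id, keyset pagination avoids the growing scan cost that `OFFSET` incurs on large tables. A generic sketch of the loop shape, with the page-fetching function abstracted out (names here are illustrative, not from the PR):

```typescript
// Generic keyset-pagination loop: fetchPage(afterId, limit) must return rows
// ordered by id ascending with id > afterId. Unlike OFFSET pagination, each
// page fetch starts from an index seek rather than rescanning skipped rows.
async function forEachBatch<T extends { id: number }>(
  fetchPage: (afterId: number, limit: number) => Promise<T[]>,
  process: (batch: T[]) => Promise<void>,
  batchSize = 500,
): Promise<void> {
  let lastId = 0;
  while (true) {
    const batch = await fetchPage(lastId, batchSize);
    if (batch.length === 0) break;
    await process(batch);
    lastId = batch[batch.length - 1].id;
  }
}
```

In the drizzle query this corresponds to adding something like `gt(requestUsage.id, lastId)` to the `where` clause plus `.orderBy(requestUsage.id).limit(batchSize)`.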
🔴 Blocker #4: This recalculation block (lines 161-173) is duplicated verbatim in the PATCH handler below (lines 203-215). Please extract it into a shared helper, e.g.:
```ts
async function recalculateEnergyIfChanged(
  slug: string,
  model_architecture: any,
  usageStorage?: UsageStorageService
) {
  if (model_architecture && usageStorage) {
    try {
      const updated = await usageStorage.recalculateEnergyForAlias(slug, model_architecture);
      logger.info(`Recalculated energy for ${updated} requests for alias '${slug}'`);
    } catch (recalcError) {
      logger.error(`Failed to recalculate energy for alias '${slug}'`, recalcError);
    }
  }
}
```

Then call it from both PUT and PATCH handlers.
```ts
import type { ModelParams } from './inference-energy';

// Common data type sizes in bytes
export const DTYPE_SIZES: Record<string, number> = {
```
🟡 Significant #9: DTYPE_SIZES is defined here, but usage-storage.ts also has its own getDtypeSize() switch statement (line 944) with the same dtype→bytes mapping. If a new dtype is added to one, the other will be out of sync.
Fix: Make inference-energy.ts the single source of truth — export DTYPE_SIZES from there, and import it in both huggingface-model-fetcher.ts and usage-storage.ts. Remove the getDtypeSize() method from UsageStorageService.
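The shared map could look like the sketch below. The dtype names and byte widths are the ones commonly used for safetensors weights, but treat the exact set (and the fallback choice) as assumptions rather than the PR's code:

```typescript
// inference-energy.ts: single source of truth for dtype byte widths,
// replacing both the huggingface-model-fetcher copy and
// UsageStorageService.getDtypeSize().
export const DTYPE_SIZES: Record<string, number> = {
  float32: 4, fp32: 4,
  float16: 2, fp16: 2,
  bfloat16: 2, bf16: 2,
  int8: 1, fp8: 1,
  int4: 0.5, fp4: 0.5,
};

// Case-insensitive lookup with a 2-byte (fp16/bf16) fallback for
// unrecognized dtypes.
export function dtypeSize(dtype: string): number {
  return DTYPE_SIZES[dtype.toLowerCase()] ?? 2;
}
```

Both call sites then import `dtypeSize` from `inference-energy.ts`, so adding a new dtype only touches one file.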
```ts
ram_gb: 192,
bandwidth_tb_s: 8.0,
flops_tflop: 9000,
power_draw_watts: 14300,
```
🟡 Significant #10: DEFAULT_GPU.power_draw_watts: 14300 looks like an 8-GPU H100 cluster (8×700W + overhead), but all the presets define per-GPU power (700W, 1000W, 1400W). The power scaling formula on line 201 does power_draw_watts * (tp / 8), which assumes per-GPU power scaled by TP.
With DEFAULT_GPU, this becomes 14300 * (8/8) = 14300W — accidentally correct for an 8-way cluster, but semantically wrong since the field is called power_draw_watts and documented as a single GPU spec.
Meanwhile getGpuParams() defaults to a single H100 (700W). These two defaults disagree by ~20×.
Fix: Align DEFAULT_GPU with the presets — make it a single-GPU spec (e.g., matching B200 at 1000W) or clearly document it as a cluster-level default with matching comment + field name.
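The ~20x disagreement is easy to see numerically, using the values quoted in this comment:

```typescript
// Per-GPU preset power scaled by tensor parallelism (the formula on line 201
// is power_draw_watts * (tp / 8), applied to an assumed per-GPU value):
const perGpuWatts = 700;            // single H100 preset
const clusterWatts = 14300;         // DEFAULT_GPU value
const tp = 8;

const presetPower = perGpuWatts * tp;          // 5600 W for an 8-way H100 setup
const defaultPower = clusterWatts * (tp / 8);  // 14300 W
// 14300 / 700 is about 20.4, so the two "defaults" disagree by ~20x per GPU.
```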
Yeah, I completely forgot whether I was using cluster-level or single-GPU power in the interim since last touching this math, hence the total inconsistency.
```ts
/**
 * Returns available GPU preset options for UI dropdowns
 */
export function getGpuPresetOptions(): Array<{ value: string; label: string }> {
```
🟢 Minor #12: getGpuPresetOptions() is exported but never called anywhere. The Providers page (Providers.tsx) hardcodes its own dropdown options independently. These two lists are already out of sync (this function lacks GH100, and the UI has it).
Fix: Either wire up this function in the Providers dropdown to keep a single source of truth, or remove it if the UI prefers hardcoded options.
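Wiring the dropdown to the presets could look like this sketch; the preset keys and display labels here are assumptions for illustration:

```typescript
// Assumed preset metadata; in the real code this would live alongside
// GPU_PRESETS in inference-energy.ts.
const GPU_PRESETS: Record<string, { label: string }> = {
  H100: { label: "NVIDIA H100 (80GB)" },
  GH200: { label: "NVIDIA GH200 (144GB)" },
  custom: { label: "Custom" },
};

// Single source of truth for the Providers dropdown: any key added to (or
// removed from) GPU_PRESETS automatically updates the UI options.
export function getGpuPresetOptions(): Array<{ value: string; label: string }> {
  return Object.entries(GPU_PRESETS).map(([value, { label }]) => ({ value, label }));
}
```

Providers.tsx would then render `getGpuPresetOptions()` instead of its own hardcoded list, so the zod enum, presets, and dropdown cannot drift independently.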
```ts
/**
 * Formats kWh values for tooltip display.
 */
function formatKwh(value: number): string {
```
🟢 Minor #14: formatKwh() and formatTimeLabel() are local formatters. Per project guidelines (AGENTS.md §8.3), all formatting should be centralized in packages/frontend/src/lib/format.ts. The same applies to formatStreamingTime() in EnergyTimeComparison.tsx.
Fix: Add formatKwh() and formatStreamingTime() (or reuse formatDuration) to format.ts, then import and use them from both components.
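A centralized `formatKwh` in `format.ts` might look like the sketch below; the unit thresholds and precision are assumptions, not the component's current behavior:

```typescript
// format.ts: shared energy formatter. Scales down to Wh/mWh for small
// values so tooltips don't render as "0.00 kWh".
export function formatKwh(value: number): string {
  if (value < 0.001) return `${(value * 1_000_000).toFixed(1)} mWh`;
  if (value < 1) return `${(value * 1000).toFixed(1)} Wh`;
  return `${value.toFixed(2)} kWh`;
}
```

`EnergyOverTime` and `EnergyTimeComparison` would both import this, keeping tooltip and comparison displays consistent.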
```tsx
value={hfModelId}
onChange={(e) => setHfModelId(e.target.value)}
placeholder="e.g. meta-llama/Llama-3.1-70B-Instruct"
onKeyPress={(e) => e.key === 'Enter' && fetchHfModelArchitecture()}
```
🟢 Nit #15: onKeyPress is deprecated in React. Please use onKeyDown instead:

```tsx
onKeyDown={(e) => e.key === 'Enter' && fetchHfModelArchitecture()}
```

Same applies to line 1438 in this file.
```ts
export class HuggingFaceModelFetcher {
  private static instance: HuggingFaceModelFetcher;
  private cache: Map<string, ModelParams> = new Map();
  private dtypeCache: Map<string, string> = new Map();
```
🟢 Minor #16 (no tests): There is zero test coverage for the new code in this PR:
- `HuggingFaceModelFetcher` (complex parsing/matching logic; would benefit from unit tests for `parseConfig`, `getHeuristicParams`, `inferDtype`)
- `recalculateEnergyForAlias` in `usage-storage.ts`
- The new `/v0/management/models/huggingface/:modelId` endpoint
- `EnergyOverTime` and `EnergyTimeComparison` frontend components

Please add at least basic unit tests for the HuggingFace fetcher (especially `parseConfig` and `getHeuristicParams`) and the energy recalculation logic.
```sql
ALTER TABLE "request_usage" ADD COLUMN "vision_fallthrough_model" text;--> statement-breakpoint
```
🟢 Minor #17: This PG migration includes ALTER TABLE "request_usage" ADD COLUMN "vision_fallthrough_model" text; which is from a different feature (the vision fallthrough work already merged to main). The SQLite equivalent (migration 0025) was already applied on main.
Fix: Please regenerate the PG migration so it only contains the new GPU/model_architecture columns, without the vision_fallthrough_model addition. You can do this by:
- Resetting the PG migration files
- Ensuring your branch is based on latest main (which already has the `vision_fallthrough_model` column in the schema)
- Running `bunx drizzle-kit generate --config drizzle.config.pg.ts` to produce a clean migration with only the new columns
Goddamn! This is what I get for vibe coding more than usual. "It'll be quick," I thought. "How bad can it mess up," I thought. 😂