feat: Ollama Cloud model picker dropdown + minimax-m3 backend validation #166
ankitvgupta wants to merge 4 commits into
Validate the new minimax-m3:cloud model end-to-end through both agent harnesses and characterize its quality/speed on Ollama Cloud:

- test-minimax-claude-sdk.mjs: Claude Agent SDK + Ollama routing smoke (tool call)
- test-minimax-opencode.mjs: OpenCode + Ollama routing smoke (tool call)
- bench-minimax.mjs: single-shot quality + TTFT/throughput, both transports
- agentic-minimax.mjs: multi-step tool-chaining + cross-model speed comparison

Each mirrors the exact routing the providers build (env remap for the Claude SDK path; ollama-cloud provider registration for OpenCode). Not wired into CI: they require OLLAMA_API_KEY and hit real Ollama Cloud.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Switching open-source models required hand-typing model ids in two places. Replace both free-text inputs (Extensions → Ollama Cloud "Default Model", and the per-feature override in General → feature routing) with a curated dropdown of common Ollama Cloud models, defaulting to Kimi K2.6. A "Custom…" option reveals a text field for any other model id, so new models (e.g. minimax-m3) stay reachable without maintaining an allowlist. When a saved value isn't in the curated list, the field opens in custom mode showing what's stored, so existing hand-typed configs don't break.

- COMMON_OLLAMA_MODELS curated list in shared/types.ts (verified-live :cloud ids)
- reusable OllamaModelSelect component (dropdown + custom escape hatch)
- wired into ExtensionsTab (default model) and SettingsPanel (per-feature)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Greptile Summary
This PR adds an
Confidence Score: 5/5
Safe to merge: the UI changes are well-scoped and all three previously flagged issues have been addressed.
The OllamaModelSelect component correctly handles all three initial states (curated value, unknown non-empty value, empty string), the useEffect properly resets custom mode when the parent supplies a curated value, and the three previously flagged issues are all resolved. The only finding is a port leak in dev-only benchmark scripts on error paths, which has no impact on production code or the test suite.
The agentic benchmark and opencode smoke-test scripts have cleanup-on-error gaps, but these are developer-run scripts with no CI or production impact.
| Filename | Overview |
|---|---|
| src/shared/types.ts | Adds COMMON_OLLAMA_MODELS readonly array with 7 curated :cloud model ids; well-typed and clearly documented as a convenience list, not an allowlist. |
| src/renderer/components/OllamaModelSelect.tsx | New reusable dropdown with Custom… escape hatch; correctly handles pre-existing hand-typed values on mount and uses a useEffect to sync back to dropdown when value is externally reset to a curated model. |
| src/renderer/components/ExtensionsTab.tsx | Replaces the free-text Default Model input with OllamaModelSelect; onChange signature correctly adapted from event-based to value-based. |
| src/renderer/components/SettingsPanel.tsx | Replaces per-feature Ollama model text inputs with OllamaModelSelect; value fallback to DEFAULT_OLLAMA_MODEL preserved via ?? operator on the call site. |
| scripts/lib/load-env.mjs | Shared minimal .env loader extracted from the four script files; handles double-quoted values and skips keys already in process.env. |
| scripts/agentic-minimax.mjs | Multi-step agentic harness test; search_emails now uses proper term-based filtering. runOpencode starts httpServer before createOpencodeServer, so a failure in server setup leaves the HTTP bridge port open. |
| scripts/bench-minimax.mjs | Quality + TTFT/throughput benchmark across Anthropic and OpenAI-compat transports; median aggregation and per-trial error capture are correct. |
| scripts/test-minimax-claude-sdk.mjs | Claude Agent SDK smoke test for minimax-m3; correctly mirrors ClaudeAgentProvider's childEnv construction and MCP tool setup. |
| scripts/test-minimax-opencode.mjs | OpenCode smoke test for minimax-m3; cleanup (server.close, httpServer.close) only happens on the happy path — an unexpected throw before those lines would leave both servers open. |
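
The loader behavior described for scripts/lib/load-env.mjs in the table above (double-quoted values are unwrapped, keys already present in the environment are left alone) can be sketched as follows. This is a minimal illustration, not the PR's code: the name `applyEnv`, its string-in/object-out signature, and the skipping of `#` comment lines are all assumptions, chosen so the logic can be exercised without reading a file.

```javascript
// Hypothetical sketch; NOT the actual scripts/lib/load-env.mjs.
// Parses KEY=VALUE lines from `text` and copies them into `env`,
// never overwriting a key that is already set.
function applyEnv(text, env) {
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Assumption: blank lines and '#' comments are ignored.
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before '=', skip the line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of surrounding double quotes, if present.
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1);
    }
    // Existing keys win, mirroring "skips keys already in process.env".
    if (env[key] === undefined) env[key] = value;
  }
  return env;
}
```

A script would presumably call something like this with the contents of `.env` and `process.env`, so that a key exported in the shell takes precedence over the file.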
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[OllamaModelSelect mounts] --> B{value in COMMON_OLLAMA_MODELS?}
B -- Yes --> C[custom = false, inCustomMode = false, Show dropdown curated model]
B -- No, value length > 0 --> D[custom = true, inCustomMode = true, Show dropdown CUSTOM + text input]
B -- No, value empty --> E[custom = false, inCustomMode = false, browser renders first option visually]
C --> F{User changes dropdown}
F -- Picks curated model --> G[onChange called, custom = false]
F -- Picks Custom --> H[custom = true, inCustomMode = true, Text input appears]
H --> I{User types in text input}
I -- Types known id --> J[onChange called, useEffect: isKnown=true, custom=false, Dropdown mode restored]
I -- Types novel id --> K[onChange called, custom stays true, inCustomMode stays true]
D --> F
E --> F
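
The mode decisions in the flowchart can be condensed into two pure functions. This is a sketch under stated assumptions rather than the component's implementation: `MODELS` holds placeholder ids (the real list lives in shared/types.ts and is not reproduced here), and `isCustomOnMount` and `nextCustomFlag` are hypothetical names.

```javascript
// Placeholder ids for illustration only; not the real COMMON_OLLAMA_MODELS entries.
const MODELS = ["model-a:cloud", "model-b:cloud"];

// Initial state on mount: custom mode only for a non-empty value
// that is not in the curated list (branches B -> C, D, E).
function isCustomOnMount(value, models) {
  return value.length > 0 && !models.includes(value);
}

// After the value changes: a known id collapses back to dropdown mode
// (node J); any other value leaves the current mode unchanged (node K).
function nextCustomFlag(currentCustom, value, models) {
  if (models.includes(value)) return false;
  return currentCustom;
}
```

In the component itself, the second function would correspond to the effect keyed on the value, while picking "Custom…" from the dropdown sets the flag to true directly.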
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
scripts/agentic-minimax.mjs:396-418
**`httpServer` not closed on `createOpencodeServer` failure**
`httpServer` is started and listening (line 397) before `createOpencodeServer` is awaited (line 400). If `createOpencodeServer` throws — e.g. a port conflict or missing binary on first install — the HTTP bridge server is never closed. Because `runOne` catches the error and continues to the next scenario, multiple leaked ports can accumulate across a full multi-model run. The same pattern exists in `test-minimax-opencode.mjs` before its cleanup block.
Wrapping the post-listen body in a `try/finally` that calls `httpServer.close()` would prevent the leak regardless of the failure point.
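
One possible shape for that `try/finally`, as a minimal sketch: `withBridgeCleanup` is a hypothetical helper, and `body` stands in for `createOpencodeServer` plus the rest of the scenario. The only assumption about `httpServer` is that it exposes Node's `close(callback)` method.

```javascript
// Hypothetical helper; the name and signature are not from the PR.
// Runs `body` against an already-listening server and always closes it.
async function withBridgeCleanup(httpServer, body) {
  try {
    // `body` may throw (e.g. port conflict, missing binary).
    return await body(httpServer);
  } finally {
    // Runs on both success and failure, so the port is released either way.
    await new Promise((resolve) => httpServer.close(() => resolve()));
  }
}
```

Because the error still propagates after cleanup, a caller such as `runOne` could keep catching it and moving on to the next scenario without accumulating open ports.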
Reviews (3): Last reviewed commit: "address Greptile review feedback (review..."
✅ Pre-PR verification — PASS
Agentic verification: summary
Agentic verification: verify-diff
Summary
The primary user-visible change is the OllamaModelSelect component that replaces a plain text field with a curated model dropdown when 'Ollama Cloud' is selected as a provider. Verified end-to-end: switching Email Analysis to Ollama Cloud shows the 7-model dropdown (Kimi K2.6 selected by default); choosing 'Custom…' correctly reveals a text input pre-populated with the current model id; switching back to another provider is unaffected. The Extensions tab renders correctly with built-in and toggleable extensions. No JS errors observed.
Agentic verification: literal trace
[2026-06-01T22:32:04.134Z] text: No errors. All checks passed. The primary regression checks are complete:
{"verdict":"pass","summary":"The primary user-visible change is the OllamaModelSelect component that replaces a plain text field with a curated model dropdown when 'Ollama Cloud' is selected as a provider. Verified end-to-end: switching Email Analysis to Ollama Cloud shows the 7-model dropdown (Kimi K2.6 selected by default); choosing 'Custom…' correctly reveals a text input pre-populated with the current model id; switching back to another provider is unaffected. The Extensions tab renders correctly with built-in and toggleable extensions. No JS errors observed.","anomalies":[],"actions_taken":10} |
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- OllamaModelSelect: clear custom mode when value is externally reset to a curated model id, via an effect on [value]. Previously the sticky custom flag could leave the select showing "Custom…" with a curated id in the text box. Verified live: typing/selecting a known id now collapses to the dropdown.
- agentic-minimax search_emails: make the fixture genuinely query-driven (match query terms against subject/sender) instead of a hardcoded "acme" branch that ignored the query.
- extract the duplicated loadEnv helper into scripts/lib/load-env.mjs and import it from all four minimax scripts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
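
A query-driven `search_emails` fixture of the kind described above might look like the sketch below. The fixture rows, the `searchEmails` name, and the choice to match when any term appears (rather than all terms) are assumptions for illustration; only the idea of matching query terms against subject and sender comes from the commit message.

```javascript
// Hypothetical fixture rows; not the data used in agentic-minimax.mjs.
const EMAILS = [
  { id: "1", subject: "Acme renewal quote", sender: "sales@acme.example" },
  { id: "2", subject: "Team lunch Friday", sender: "alex@corp.example" },
];

// Returns emails whose subject or sender contains at least one query term.
// Assumption: case-insensitive substring match, any-term semantics.
function searchEmails(query, emails) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return []; // empty query matches nothing
  return emails.filter((email) => {
    const haystack = `${email.subject} ${email.sender}`.toLowerCase();
    return terms.some((term) => haystack.includes(term));
  });
}
```

Unlike a hardcoded "acme" branch, this returns different results for different queries, which lets a multi-step test tell whether the model actually passed a sensible query to the tool.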
Summary
Two related pieces of work that came out of evaluating the new minimax-m3 model on Ollama Cloud:
Changes
Feature (UX):
- shared/types.ts: COMMON_OLLAMA_MODELS curated list (7 verified-live :cloud ids; Kimi first/default)
- OllamaModelSelect.tsx: reusable dropdown + custom-id escape hatch
- ExtensionsTab.tsx: Ollama Cloud "Default Model" → dropdown
- SettingsPanel.tsx: per-feature Ollama override → dropdown

Scripts (not in CI; require OLLAMA_API_KEY):
- test-minimax-claude-sdk.mjs / test-minimax-opencode.mjs: routing smoke tests (tool-calling) for each harness
- bench-minimax.mjs: single-shot quality + TTFT/throughput across both transports
- agentic-minimax.mjs: multi-step tool-chaining + cross-model speed comparison

Why
The curated list is a convenience, not an allowlist — new models ship on Ollama Cloud constantly, so the "Custom…" option keeps any id reachable. When a saved value isn't in the list (e.g. a previously hand-typed model), the field opens in custom mode showing what's stored, so existing configs don't silently break.
Screenshots
UI states (captured live via CDP in demo mode; local copies in .context/):
- Default Model showing "MiniMax M3 — newest, agentic", default is Kimi

(Will attach inline; these were verified in the running app.)
Test plan
npm run typecheck ✅ · eslint ✅ · npm run test:unit → 1425 passed ✅
Pre-PR verdict: PASS
fulldab2f1e