chat/send: surface 502 REPLY_FAILED when VA is missing api_key/provider/model #190
Merged
…er/model
handleDirectChat returned null when the target VA had no api_key,
provider, or model. wait=true callers (the Salem engine) saw
HTTP 200 with reply: null and could not distinguish a misconfigured
VA from one that simply chose not to respond. Visitors with the
salem-visitor template (created without an api_key) froze on
arrival at the tavern and sat there until visitor_expires_at; the
engine logged 'chat response missing reply' on every tick but
emitted no error code, so the failure was invisible to operators
without grepping the engine journal.
Throw instead. The chat/send route's existing wait-mode catch
(routes/chat.js:159) maps thrown errors to 502 REPLY_FAILED with
the message text — the same shape the encrypted-key failure path
already produces. Non-wait chatSend callers in services/chat.js
already swallow rejections via .catch(() => {}).
Symptom
A `salem-visitor` VA was created in `agent_configuration` with `provider=anthropic` and `model=claude-haiku-4-5` but no `api_key`. The Salem engine spawned visitors normally (they walked from a village-edge tile to the tavern and entered an outdoor huddle), but every subsequent agent tick failed silently:
```
agent-tick Caleb Wendell the wool-buyer iter=0: chat response missing reply
(body: {"from_agent":"salem-engine","to_agents":[{...,"agent":"salem-visitor",...}],"reply":null})
```
Result: visitors froze at (1488, 656) outside the tavern for hours until `visitor_expires_at` cleared them. No error code surfaced to the engine, no admin signal, no retry — just `reply: null` on HTTP 200.
Cause
`handleDirectChat` in `node/api/src/services/virtual-agent.js` returned `null` when the target VA was missing `api_key`, `provider`, or `model`. The chat/send wait-mode path (`routes/chat.js:148-158`) awaits `pendingReplyPromise`, gets `null`, and returns `{ reply: null }` with HTTP 200. The engine sees a successful response with no payload and gives up, because this is the same shape it would see if the VA had legitimately chosen not to reply.
Compare with the existing `decryptApiKey()` failure path: a malformed encrypted key throws, the wait-mode `catch (replyErr)` block at `routes/chat.js:159` maps it to `502 REPLY_FAILED { code, message }`, and the engine logs the failure visibly.
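The two outcomes above can be sketched in isolation. This is an illustration only, not the code in `routes/chat.js`: `respondInWaitMode` is a hypothetical stand-in for the wait-mode branch, and the `{ status, body }` return value is an assumed simplification of the real HTTP response.

```javascript
// Hypothetical stand-in for the wait-mode branch of the chat/send route.
// The function name and the { status, body } return shape are assumptions.
async function respondInWaitMode(pendingReplyPromise) {
  try {
    const reply = await pendingReplyPromise;
    // A resolved promise is reported as success, even when reply is null.
    return { status: 200, body: { reply } };
  } catch (replyErr) {
    // A rejected promise is mapped to the REPLY_FAILED error shape.
    return {
      status: 502,
      body: { code: 'REPLY_FAILED', message: replyErr.message },
    };
  }
}
```

Under these assumptions, a promise that resolves to `null` yields `200 { reply: null }`, while a rejected promise yields `502 REPLY_FAILED` carrying the error message.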
Fix
Throw instead of returning null on missing config. The existing wait-mode catch already produces the right error shape — no route changes needed.
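A minimal sketch of the kind of guard this implies, assuming the VA config is a plain object with `api_key`, `provider`, and `model` fields. The helper name `assertChatConfig`, the `agent` field, and the error wording are hypothetical; the actual check sits inside `handleDirectChat` and may look different.

```javascript
// Hypothetical helper: throws when the VA lacks any field needed to reply.
// Field names follow the PR text; the message format is an assumption.
function assertChatConfig(va) {
  const missing = ['api_key', 'provider', 'model'].filter((field) => !va[field]);
  if (missing.length > 0) {
    throw new Error(`VA ${va.agent} cannot reply: missing ${missing.join(', ')}`);
  }
}

// Usage: a visitor configured like the one in the Symptom section.
try {
  assertChatConfig({
    agent: 'salem-visitor',
    provider: 'anthropic',
    model: 'claude-haiku-4-5',
  });
} catch (err) {
  console.log(err.message); // VA salem-visitor cannot reply: missing api_key
}
```

Naming the missing fields in the message should let an operator diagnose the misconfiguration from the 502 body alone, since the wait-mode catch forwards the message text.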
Blast radius
Test plan