fix(memory): health-gate Ollama embeddings, fall back to cloud (OPENHUMAN-TAURI-B7)#1555
CodeGhost21 wants to merge 3 commits into
Conversation
When a user has opted into local AI for embeddings (`local_ai.runtime_enabled = true` + `local_ai.usage.embeddings = true`) but the Ollama daemon isn't actually running at `localhost:11434`, the previous code instantiated `OllamaEmbedding` anyway and every embed call fired a Sentry event: 226 in a single day with zero impacted users — pure noise drowning real signals.

Root-cause fix: probe `<base_url>/api/tags` once at memory-factory time (2s timeout). If Ollama responds, behaviour is unchanged. If it doesn't, fall back to the cloud embedder for this session and report **one** Sentry event with low cardinality (`memory.ollama_health_gate`) so a genuine misconfiguration still surfaces while embed calls never even reach the dead daemon.

The probe is driven through `tokio::task::block_in_place + Handle::current().block_on(...)` from the sync factory, which is safe under the core's multi-thread runtime and harmless in sync test contexts (where no runtime is present → probe is skipped).

New public surface:
- `memory::probe_ollama_reachable(base_url) -> bool` (async)
- `memory::effective_embedding_settings_probed(...)` (async)

8 new unit tests cover the probe (healthy/unreachable/non-2xx) and the fallback decision (cloud-only paths untouched, opted-in healthy keeps Ollama, opted-in unreachable falls back to cloud).

Fixes OPENHUMAN-TAURI-B7
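The probe-then-fallback gate described above can be sketched without the crate's async plumbing. This is a hypothetical stand-in, not the PR's code: the real probe issues an async `GET <base_url>/api/tags`; here a plain blocking TCP connect with the same 2-second budget takes its place, and `probe_ollama_reachable_sketch` is an illustrative name.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

// Hypothetical stand-in for the PR's async probe: a plain TCP connect with
// the same 2-second budget replaces the real `GET <base_url>/api/tags`.
fn probe_ollama_reachable_sketch(host: &str, port: u16) -> bool {
    let addrs = match (host, port).to_socket_addrs() {
        Ok(addrs) => addrs,
        Err(_) => return false, // unresolvable host counts as unreachable
    };
    for addr in addrs {
        if TcpStream::connect_timeout(&addr, Duration::from_secs(2)).is_ok() {
            return true;
        }
    }
    false
}

fn main() {
    // Port 9 (discard) is almost never open locally, so this should fail fast.
    println!("reachable: {}", probe_ollama_reachable_sketch("127.0.0.1", 9));
}
```

The key property the PR relies on is that every failure mode (refused connection, timeout, bad host) collapses to `false`, so the caller has a single fallback branch.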
📝 Walkthrough
Adds an async Ollama /api/tags reachability probe and a probed embedding-settings resolver that falls back to cloud embeddings when Ollama is unreachable; integrates a blocking probe shim into create_memory_full, re-exports the new APIs, and adds Axum-backed tests and unit tests.

Changes: Ollama Health Check and Embedding Settings
Sequence Diagram(s)
sequenceDiagram
participant Caller
participant effective_embedding_settings_probed
participant probe_ollama_reachable
participant OllamaAPI
participant report_ollama_health_gate_once
Caller->>effective_embedding_settings_probed: request (MemoryConfig, Option<LocalAiConfig>)
effective_embedding_settings_probed->>effective_embedding_settings_probed: resolve intended (provider, model, dims)
alt intended provider == "ollama"
effective_embedding_settings_probed->>probe_ollama_reachable: GET <base_url>/api/tags (short timeout)
probe_ollama_reachable->>OllamaAPI: HTTP GET /api/tags
alt 2xx
OllamaAPI-->>probe_ollama_reachable: 2xx
probe_ollama_reachable-->>effective_embedding_settings_probed: reachable
effective_embedding_settings_probed-->>Caller: return local embeddings
else non-2xx / timeout
OllamaAPI-->>probe_ollama_reachable: error/timeout
probe_ollama_reachable-->>effective_embedding_settings_probed: unreachable
effective_embedding_settings_probed->>report_ollama_health_gate_once: record once-per-process event
effective_embedding_settings_probed-->>Caller: return cloud fallback embeddings
end
else
effective_embedding_settings_probed-->>Caller: return intended settings
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/memory/store/factories.rs`:
- Around line 69-78: The debug logs in probe_ollama_reachable (and similar
probes) print the full `url`/`base_url` which may include secrets; replace those
direct uses with a redacted form (e.g., origin or host:port only) by adding a
small helper (e.g., redact_url or canonical_host_port) and use that helper in
the `log::debug!` calls inside `probe_ollama_reachable` match arms and the other
referenced locations (around lines 148-166 and 294-308) so that logs show a
stable, grep-friendly prefix (e.g., "[memory::factory] probe_ollama_reachable")
plus the redacted host instead of full URL or query/userinfo.
- Around line 159-167: The code currently calls
crate::core::observability::report_error unconditionally in the Ollama fallback
branches (seen in effective_embedding_settings_probed and the other branch
around lines 301-309), causing repeated "memory.ollama_health_gate" reports;
wrap those report_error calls with a single-session guard (e.g., a module-level
static AtomicBool or OnceCell like OLLAMA_HEALTH_REPORTED) and only call
report_error when compare-and-swap flips it from false→true, setting the flag
afterwards so subsequent constructions/calls skip reporting; apply the same
guarded logic to both fallback locations so the error is emitted once per
session.
- Around line 432-470: These tests mutate the process-global
OPENHUMAN_OLLAMA_BASE_URL and currently race; fix by serializing them and
restoring the prior env var: add a serial test guard (e.g., annotate both test
functions probed_settings_fall_back_to_cloud_when_ollama_unreachable and
probed_settings_keep_ollama_when_daemon_responds with a serialization attribute
such as #[serial] or use a global Mutex) and at the start of each test capture
the previous value with let prev =
std::env::var_os("OPENHUMAN_OLLAMA_BASE_URL"); then set the test value, run
effective_embedding_settings_probed / start_mock_ollama as before, and in a
finally-like cleanup restore the original with if let Some(v) = prev {
std::env::set_var("OPENHUMAN_OLLAMA_BASE_URL", v) } else {
std::env::remove_var("OPENHUMAN_OLLAMA_BASE_URL") } so the env is not clobbered
for other tests.
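The capture-set-restore pattern the comment describes fits naturally into an RAII guard. `EnvGuard` and `TEST_ENV_LOCK` match names the PR later adopts, but this block is a guess at their shape under those assumptions, not the repository's code.

```rust
use std::env;
use std::ffi::OsString;
use std::sync::Mutex;

// Serializes env-mutating tests; a stand-in for the PR's TEST_ENV_LOCK.
static TEST_ENV_LOCK: Mutex<()> = Mutex::new(());

/// RAII guard: sets an env var and restores the previous value on drop,
/// so even a panicking test leaves the environment as it found it.
struct EnvGuard {
    key: &'static str,
    prev: Option<OsString>,
}

impl EnvGuard {
    fn set(key: &'static str, value: &str) -> Self {
        let prev = env::var_os(key);
        env::set_var(key, value);
        EnvGuard { key, prev }
    }
}

impl Drop for EnvGuard {
    fn drop(&mut self) {
        match self.prev.take() {
            Some(v) => env::set_var(self.key, v),
            None => env::remove_var(self.key),
        }
    }
}

fn main() {
    let _lock = TEST_ENV_LOCK.lock().unwrap();
    let before = env::var_os("OPENHUMAN_OLLAMA_BASE_URL");
    {
        let _guard = EnvGuard::set("OPENHUMAN_OLLAMA_BASE_URL", "http://127.0.0.1:1");
        assert_eq!(
            env::var("OPENHUMAN_OLLAMA_BASE_URL").unwrap(),
            "http://127.0.0.1:1"
        );
    }
    // Guard dropped: the variable is back to whatever it was before.
    assert_eq!(env::var_os("OPENHUMAN_OLLAMA_BASE_URL"), before);
    println!("env restored");
}
```

Holding the lock for the guard's whole lifetime is what prevents two tests from interleaving their set/restore pairs.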
- Around line 254-265: probe_ollama_reachable_blocking currently calls
tokio::task::block_in_place unguarded which will panic on a current-thread
runtime; update probe_ollama_reachable_blocking to check handle.runtime_flavor()
and if it is not RuntimeFlavor::MultiThread return true (skip probe), then only
call tokio::task::block_in_place when the runtime flavor is MultiThread,
mirroring the pattern used in builder.rs and keeping the existing behavior when
skipping the probe.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 033a8d15-617f-4697-aea4-7a446856f2c0
📒 Files selected for processing (3)
src/openhuman/memory/mod.rs
src/openhuman/memory/store/factories.rs
src/openhuman/memory/store/mod.rs
- Sentry report is now fired at most once per process via `OLLAMA_HEALTH_REPORTED` atomic flag. Both fallback sites (the async `effective_embedding_settings_probed` and the sync `create_memory_full`) funnel through `report_ollama_health_gate_once` so re-instantiating memory across agents/sessions can't recreate the per-embed flood the gate exists to suppress.
- `probe_ollama_reachable_blocking` now checks `runtime_flavor()` and skips the probe on a current-thread runtime — `block_in_place` panics there. Multi-thread path is unchanged.
- Env-mutating tests now hold `TEST_ENV_LOCK` and restore the previous value of `OPENHUMAN_OLLAMA_BASE_URL` via an `EnvGuard` RAII helper, so they can no longer race each other or pollute sibling tests.
- New test `ollama_health_gate_reports_at_most_once_per_process` asserts the one-shot contract directly.

Url-redaction finding skipped: Ollama base URLs are local user-configured infrastructure (Ollama has no auth model). Stripping host/path would harm diagnosis on the `ollama_health_gate` event, which is the whole point of keeping that signal in Sentry.

Refs OPENHUMAN-TAURI-B7
graycyrus left a comment
Review — fix(memory): health-gate Ollama embeddings (OPENHUMAN-TAURI-B7)
Approach is sound — probing once at factory time and falling back to cloud is the right fix for the 226-event Sentry flood. The AtomicBool latch, RuntimeFlavor guard, and EnvGuard RAII are all well-executed. A few items worth addressing below.
Verified / looks good
- `AtomicBool` ordering (AcqRel/Acquire) is correct for a once-per-process latch
- `RuntimeFlavor::MultiThread` check before `block_in_place` — matches Tokio docs
- 2s timeout for a localhost probe — reasonable
- `EnvGuard` + `TEST_ENV_LOCK` — proper test isolation
- 8 new tests cover probe (healthy/unreachable/non-2xx), fallback paths, once-per-process
- Module layout correct — no standalone files at `src/openhuman` root
- No hardcoded PII in test fixtures
        }
    }
    "http://localhost:11434".to_string()
}
[major] ollama_base_url_for_probe duplicates local_ai::ollama_base_url verbatim
This is a line-for-line copy of crate::openhuman::local_ai::ollama_base_url() which already lives in local_ai/ollama_api.rs and is pub(crate). Two implementations of the same env-var lookup will diverge — if someone adds a third env var override or changes precedence in ollama_api.rs, the memory probe will silently use a different URL than every other Ollama call in the process.
The comment on line 62–65 acknowledges this ("keeping the lookup logic in sync") but that sync guarantee is manual and fragile.
}

fn ollama_base_url_for_probe() -> String {
    // Delegate to the canonical implementation so the probe always
    // agrees with the rest of the Ollama machinery on the daemon address.
    crate::openhuman::local_ai::ollama_base_url()
}
    message.as_str(),
    "memory",
    "ollama_health_gate",
    &[("base_url", base_url), ("fallback", "cloud")],
[major] base_url sent as a Sentry tag — high-cardinality, may contain credentials
The full base_url is passed as a Sentry indexed tag via ("base_url", base_url). If the URL includes userinfo (http://user:secret@host:11434), it gets indexed in Sentry. Even without credentials, full IP/hostname makes the tag high-cardinality, preventing Sentry from grouping across users.
CodeRabbit flagged the log-line variant of this — the Sentry tag is a second, distinct instance. The full URL is fine in the error message body for diagnostics; the tag should be a low-cardinality redacted form (e.g. host:port only):
let host_tag = base_url
.trim_start_matches("https://")
.trim_start_matches("http://")
.split(['/', '@', '?', '#'])
.next()
.unwrap_or("unknown");
crate::core::observability::report_error(
message.as_str(),
"memory",
"ollama_health_gate",
&[("ollama_host", host_tag), ("fallback", "cloud")],
);

    }
} else {
    intended
};
[major] Probe+fallback logic duplicated between effective_embedding_settings_probed and create_memory_full
These two paths are supposed to behave identically but they diverge:
- `effective_embedding_settings_probed` (async, line 172) always probes.
- `create_memory_full` (sync, here) uses `probe_ollama_reachable_blocking`, which silently skips the probe and returns `true` on current-thread runtimes — so the health-gate is bypassed entirely for those callers and the Sentry flood can resurface.
Also, the fallback tuple ("cloud", DEFAULT_CLOUD_EMBEDDING_MODEL, DEFAULT_CLOUD_EMBEDDING_DIMENSIONS) is written out in both places — if the cloud defaults change, one site could be missed.
Consider either:
- Making `create_memory_full` async and calling `effective_embedding_settings_probed` directly, or
- Extracting the fallback tuple into a `cloud_embedding_fallback()` helper to keep both sites in sync
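One possible shape for the suggested helper. The constant names and their values ("cloud", embedding-v1, 1024) are taken from this PR's own description of the cloud defaults, but the exact identifiers here are illustrative, not the crate's real ones.

```rust
// Illustrative stand-ins for the crate's cloud embedding defaults.
const DEFAULT_CLOUD_EMBEDDING_MODEL: &str = "embedding-v1";
const DEFAULT_CLOUD_EMBEDDING_DIMENSIONS: usize = 1024;

/// Single source of truth for the fallback tuple, so the sync and async
/// gate sites can't drift apart if the cloud defaults ever change.
fn cloud_embedding_fallback() -> (&'static str, &'static str, usize) {
    (
        "cloud",
        DEFAULT_CLOUD_EMBEDDING_MODEL,
        DEFAULT_CLOUD_EMBEDDING_DIMENSIONS,
    )
}

fn main() {
    let (provider, model, dims) = cloud_embedding_fallback();
    println!("{provider} {model} {dims}");
}
```

Both `create_memory_full` and `effective_embedding_settings_probed` would then call this one function instead of spelling out the tuple twice.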
/// Kept deliberately small and side-effect-free so it can be called from
/// the memory factory's startup path without pulling in the full
/// `local_ai::service::ollama_admin` machinery.
pub async fn probe_ollama_reachable(base_url: &str) -> bool {
[minor] probe_ollama_reachable is pub but has no external callers
Re-exported all the way to memory/mod.rs as a public API, but the only callers in this diff are internal (effective_embedding_settings_probed and probe_ollama_reachable_blocking). Making it public implies a stable contract, but the doc doesn't cover the indirect OLLAMA_HEALTH_REPORTED side-effect callers might trigger.
Consider pub(crate) to match how ollama_base_url is scoped in local_ai, unless there's a planned external caller.
/// re-create the per-embed flood the gate exists to suppress — just with a
/// different message. The first failed probe trips this flag; subsequent
/// probes log at debug level and skip the Sentry report.
static OLLAMA_HEALTH_REPORTED: AtomicBool = AtomicBool::new(false);
[minor] OLLAMA_HEALTH_REPORTED — no #[cfg(test)] reset helper
The latch is never reset between tests. ollama_health_gate_reports_at_most_once_per_process (line 542) manually resets it, which works — but if a different test triggers the fallback path first (e.g. probed_settings_fall_back_to_cloud_when_ollama_unreachable), the latch is already tripped and the once-per-process invariant can't be verified.
A small #[cfg(test)] fn reset_health_gate_for_test() helper would make this explicit and prevent accidental ordering dependencies across the test suite.
- Delegate `ollama_base_url_for_probe` to `local_ai::ollama_base_url` so the probe always agrees with the rest of the Ollama machinery on the daemon address (was a verbatim duplicate that would silently diverge).
- Redact the Sentry `base_url` tag to a low-cardinality `host[:port]` value via `redact_ollama_host`; full URL stays in the message body. Avoids indexing credentials/userinfo and prevents per-instance tag explosion.
- Extract `cloud_embedding_fallback()` so the sync (`create_memory_full`) and async (`effective_embedding_settings_probed`) gate sites can't drift if the cloud defaults change.
- Demote `probe_ollama_reachable` to `pub(crate)` (matches `local_ai::ollama_base_url`) and drop its re-exports from `memory::store` and `memory`. External callers should use `effective_embedding_settings_probed`.
- Add `#[cfg(test)] reset_health_gate_for_test()` so any test exercising a fallback path can be ordering-independent. Both the once-per-process test and the unreachable-fallback test now call it; the once-per-process test also takes `TEST_ENV_LOCK` to serialize with the latch-tripping `probed_*` tests (caught a real parallel-execution race in this run).
- Add unit test for `redact_ollama_host` covering scheme, userinfo, path, query, fragment, scheme-less, and empty inputs.
Pushed
The two Rust Core jobs that hung 4h+ on the previous run were force-cancelled by @senamakel and weren't assertion failures — the fresh CI on this push should resolve them.
Summary
When a user has opted into local embeddings (`local_ai.runtime_enabled` + `local_ai.usage.embeddings`) but the daemon isn't running at `localhost:11434`, the memory factory now probes the daemon once at construction time and falls back to the cloud embedder for that session. Replaces the per-embed Sentry flood (OPENHUMAN-TAURI-B7) with one low-cardinality Sentry event under `memory.ollama_health_gate` per session — genuine misconfigurations still surface, but the noise stops.

Problem
Sentry issue OPENHUMAN-TAURI-B7 fires when `OllamaEmbedding::embed` can't connect to `http://localhost:11434`. That's an expected state when a user toggled local embeddings in Settings but doesn't actually have Ollama running (forgot to start it, crashed, uninstalled, port conflict, …). Every embed retry generated another Sentry event — 226 in 24 hours with zero real product impact, while the user silently got a broken-feeling app with no signal that local AI failed.

The naïve fix (silence the error) was wrong: it hides the diagnostic from us and from the user. The real fix is to never even attempt the Ollama call when the daemon isn't reachable, and to surface the misconfiguration once — not 226 times.
Solution
In `src/openhuman/memory/store/factories.rs`:
- `probe_ollama_reachable(base_url) -> bool` (async): issues a 2-second `GET <base_url>/api/tags` and returns `true` iff the daemon responds 2xx. Transport failures, timeouts, and non-2xx all return `false`.
- `effective_embedding_settings_probed(...)` (async): wraps the existing `effective_embedding_settings` with a probe-and-fallback step. Falls back to (cloud, embedding-v1, 1024) when the intended provider is `ollama` but the probe fails. Calls `report_error` once with `domain=memory operation=ollama_health_gate` so Sentry still gets the signal — just at session granularity, not per-embed.
- `create_memory_full` now runs the probe inline via `tokio::task::block_in_place + Handle::current().block_on(...)`. The core always runs under a multi-thread tokio runtime so this is safe; when called with no runtime present (sync unit-test contexts), the probe is skipped and the intended provider is used (preserves pre-gate behaviour for those callers).
- Both new functions are re-exported from `memory::store::mod` and `memory::mod`.

The probe URL honours the `OPENHUMAN_OLLAMA_BASE_URL` / `OLLAMA_HOST` env vars the same way the lifecycle's existing `ollama_base_url()` helper does, so both points always agree on the daemon address.
- `memory::store::factories::tests`: probe healthy/unreachable/non-2xx; probed-settings cloud passthrough; opted-in healthy keeps Ollama; opted-in unreachable falls back to cloud.
- New code paths (`probe_ollama_reachable`, `effective_embedding_settings_probed`, fallback branch in `create_memory_full`) are exercised by the new tests; cloud-only path covered by the existing `probed_settings_keep_cloud_when_provider_is_cloud`.
- `## Related` — N/A.
- `Closes #NNN` — N/A: tracked in Sentry (Fixes OPENHUMAN-TAURI-B7), not GitHub Issues.

Impact
`OPENHUMAN-TAURI-B7` stops firing per-embed; a single `memory.ollama_health_gate` event per session takes its place. Net reduction: ~226× per affected user/day.

Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
- Branch: `fix/ollama-embed-health-gate`
- Commit: `9aecc99549e19fd1971d9c43c026e0d8a71aa2ff`

Validation Run
- `pnpm --filter openhuman-app format:check` — passed in pre-push hook
- `pnpm typecheck` — passed in pre-push hook
- `cargo test --lib memory::store::factories` → 8 passed, 0 failed; `cargo test --lib openhuman::embeddings` → 105 passed, 0 failed
- `cargo fmt --check` clean; `cargo check --tests` clean (only pre-existing unrelated warnings)

Validation Blocked
Behavior Changes
Parity Contract
- `effective_embedding_settings` is unchanged; the probe lives in the factory call site and a new async sibling.
- `ollama_base_url_for_probe` mirrors `local_ai::ollama_api::ollama_base_url` (priority: `OPENHUMAN_OLLAMA_BASE_URL` > `OLLAMA_HOST` > default).

Duplicate / Superseded PR Handling
Summary by CodeRabbit
New Features
Improvements
Tests