fix(config): recover from corrupted config.toml via persistent .bak#1563
Conversation
- Keep on successful save instead of deleting it - On parse failure, attempt recovery from before falling back to defaults - Log each recovery step at appropriate severity level Closes tinyhumansai#1523
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughStruct-level ChangesConfig backup and recovery
Serde struct-level defaults
Sequence Diagram(s)sequenceDiagram
participant ConfigLoader
participant Filesystem
participant BackupFile
participant Config
ConfigLoader->>Filesystem: read config.toml
Filesystem-->>ConfigLoader: contents
ConfigLoader->>Config: toml::from_str(contents)
alt parse succeeds
Config->>ConfigLoader: Config
else parse fails
ConfigLoader->>Filesystem: read config.toml.bak
Filesystem-->>ConfigLoader: bak_contents / error
alt bak parse succeeds
ConfigLoader->>Config: toml::from_str(bak_contents)
Config->>ConfigLoader: recovered Config
ConfigLoader->>Filesystem: config.save() (persist recovered)
else bak missing/parse fails
ConfigLoader->>Config: Config::default()
Config->>ConfigLoader: default Config
ConfigLoader->>Filesystem: config.save() (persist default)
end
ConfigLoader->>Filesystem: rename corrupted config.toml -> config.toml.toml.corrupted
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/config/schema/load.rs`:
- Around line 573-600: When falling back to Config::default() (or the recovered
config) in load.rs after parse/read failures, persist that config back to disk
so the corrupted config.toml is replaced; locate the branches that return
Config::default() (and the branch that recovers from backup) and, before
returning, serialize the Config (e.g. via toml::to_string) and write it to
config_path (handle and log any write errors with tracing::error/warn), ensuring
you still return the Config instance even if the write fails; reference the
Config::default(), config_path, backup_path, parse_err, read_err, and backup_err
symbols when making this change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8e9b9a04-3ee7-45d8-a957-4099a2e00c89
📒 Files selected for processing (1)
src/openhuman/config/schema/load.rs
…ucts - Persist recovered/default config back to disk after parse-failure fallback so config.toml is replaced and the recovery path doesn't repeat every startup - Add #[serde(default)] at the struct level to all 43 Config sub-structs that already had a Default impl, preventing schema migration failures when new fields are added to existing config files
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/openhuman/config/schema/heartbeat_cron.rs (1)
9-14:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winAdd field-level defaults for
enabledandinterval_minutesto match theDefaultimpl.The struct-level
#[serde(default)]on line 9 only applies when the entire[heartbeat]section is missing from TOML. When the section is present but individual fields are missing, Serde uses the field's type default (bool::default()= false,u32::default()= 0) rather than the values in yourDefaultimpl (true, 5).This creates inconsistent behavior and violates the "minimum 5" constraint documented on line 13. In the corruption-recovery scenario this PR addresses, a partially-present config will produce
enabled: falseandinterval_minutes: 0instead of the intended defaults.🛠️ Proposed fix: add field-level defaults
+fn default_enabled() -> bool { + true +} + +fn default_interval_minutes() -> u32 { + 5 +} + /// Heartbeat configuration — periodic background loop that evaluates /// HEARTBEAT.md tasks against workspace state using local model inference. #[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)] #[serde(default)] pub struct HeartbeatConfig { /// Enable the heartbeat loop. + #[serde(default = "default_enabled")] pub enabled: bool, /// Tick interval in minutes (minimum 5). + #[serde(default = "default_interval_minutes")] pub interval_minutes: u32,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/config/schema/heartbeat_cron.rs` around lines 9 - 14, HeartbeatConfig's #[serde(default)] only applies when the whole section is missing; add field-level defaults so missing individual fields get the same values as the Default impl. Define two helper functions (e.g. fn default_enabled() -> bool { true } and fn default_interval_minutes() -> u32 { 5 }) and annotate the struct fields with #[serde(default = "default_enabled")] on enabled and #[serde(default = "default_interval_minutes")] on interval_minutes to ensure they deserialize to true and 5 when absent.src/openhuman/config/schema/identity_cost.rs (1)
30-31:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winInconsistent defaults for the
pricesfield.The field-level
#[serde(default)]usesHashMap::default()(empty), butCostConfig::default()populates it with 3 default prices viaget_default_pricing(). This creates inconsistent behavior:
- Missing entire
[cost]section → 3 default prices- Present
[cost]section with missingpricesfield → empty HashMapTests expect non-empty defaults (lines 111, 117). To ensure consistent behavior, use a custom default function for the field.
🔧 Proposed fix
/// Per-model pricing (USD per 1M tokens) - #[serde(default)] + #[serde(default = "get_default_pricing")] pub prices: HashMap<String, ModelPricing>,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/config/schema/identity_cost.rs` around lines 30 - 31, The prices field currently uses #[serde(default)] which yields an empty HashMap when the cost section exists but prices is omitted, conflicting with CostConfig::default() which populates get_default_pricing(); fix by adding a custom default function for the field (e.g., fn default_prices() -> HashMap<String, ModelPricing> { get_default_pricing() }) and changing the attribute to #[serde(default = "default_prices")] on the prices field so missing prices deserializes to the same non-empty defaults as CostConfig::default().
🧹 Nitpick comments (1)
src/openhuman/config/schema/channels.rs (1)
291-299: ⚡ Quick winConsider adding struct-level
#[serde(default)]for consistency.According to the PR objectives,
#[serde(default)]should be added to all config structs that haveDefaultimplementations.SecurityConfighasDefault(via derive) but didn't receive struct-level#[serde(default)], while other structs withDefaultin this file did (ChannelsConfig,SandboxConfig,ResourceLimitsConfig,AuditConfig).While the field-level defaults (lines 293, 295, 297) provide similar protection, adding struct-level
#[serde(default)]would ensure consistency with the PR's defensive deserialization strategy.Proposed fix for consistency
#[derive(Debug, Clone, Serialize, Deserialize, Default, JsonSchema)] +#[serde(default)] pub struct SecurityConfig {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/config/schema/channels.rs` around lines 291 - 299, The SecurityConfig struct is missing a struct-level serde default attribute; add #[serde(default)] above the SecurityConfig declaration (the same way it's used on ChannelsConfig, SandboxConfig, ResourceLimitsConfig, and AuditConfig) so deserialization will fallback to Default when the whole object is absent; update the SecurityConfig struct declaration to include this attribute while keeping the existing field-level #[serde(default)] attributes intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/openhuman/config/schema/heartbeat_cron.rs`:
- Around line 9-14: HeartbeatConfig's #[serde(default)] only applies when the
whole section is missing; add field-level defaults so missing individual fields
get the same values as the Default impl. Define two helper functions (e.g. fn
default_enabled() -> bool { true } and fn default_interval_minutes() -> u32 { 5
}) and annotate the struct fields with #[serde(default = "default_enabled")] on
enabled and #[serde(default = "default_interval_minutes")] on interval_minutes
to ensure they deserialize to true and 5 when absent.
In `@src/openhuman/config/schema/identity_cost.rs`:
- Around line 30-31: The prices field currently uses #[serde(default)] which
yields an empty HashMap when the cost section exists but prices is omitted,
conflicting with CostConfig::default() which populates get_default_pricing();
fix by adding a custom default function for the field (e.g., fn default_prices()
-> HashMap<String, ModelPricing> { get_default_pricing() }) and changing the
attribute to #[serde(default = "default_prices")] on the prices field so missing
prices deserializes to the same non-empty defaults as CostConfig::default().
---
Nitpick comments:
In `@src/openhuman/config/schema/channels.rs`:
- Around line 291-299: The SecurityConfig struct is missing a struct-level serde
default attribute; add #[serde(default)] above the SecurityConfig declaration
(the same way it's used on ChannelsConfig, SandboxConfig, ResourceLimitsConfig,
and AuditConfig) so deserialization will fallback to Default when the whole
object is absent; update the SecurityConfig struct declaration to include this
attribute while keeping the existing field-level #[serde(default)] attributes
intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 15f70238-7359-43cd-bd91-04b0d0f6b114
📒 Files selected for processing (22)
src/openhuman/config/schema/accessibility.rssrc/openhuman/config/schema/agent.rssrc/openhuman/config/schema/autocomplete.rssrc/openhuman/config/schema/autonomy.rssrc/openhuman/config/schema/channels.rssrc/openhuman/config/schema/context.rssrc/openhuman/config/schema/dictation.rssrc/openhuman/config/schema/heartbeat_cron.rssrc/openhuman/config/schema/identity_cost.rssrc/openhuman/config/schema/learning.rssrc/openhuman/config/schema/load.rssrc/openhuman/config/schema/local_ai.rssrc/openhuman/config/schema/meet.rssrc/openhuman/config/schema/node.rssrc/openhuman/config/schema/observability.rssrc/openhuman/config/schema/proxy.rssrc/openhuman/config/schema/runtime.rssrc/openhuman/config/schema/scheduler_gate.rssrc/openhuman/config/schema/storage_memory.rssrc/openhuman/config/schema/tools.rssrc/openhuman/config/schema/update.rssrc/openhuman/config/schema/voice_server.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/openhuman/config/schema/load.rs
graycyrus
left a comment
There was a problem hiding this comment.
Independent review — PR #1563
Overall the idea is sound and directly addresses 320+ Sentry events. The three-way fallback (primary → backup → defaults), persistent .bak, and the broad #[serde(default)] sweep are all the right direction. However there's a critical correctness bug in the save-on-recovery path, a family of semantic regressions from the #[serde(default)] sweep (some with security implications), and no tests.
See inline comments for details. Items marked [critical] or [major] should be addressed before merge.
| "[config] Failed to persist recovered config to disk" | ||
| ); | ||
| } | ||
| } |
There was a problem hiding this comment.
[critical] save() overwrites the good .bak with the corrupted config.toml
When recovered_from_backup = true, config.save() is called while the original corrupted config.toml is still on disk. Inside save(), the first thing it does is fs::copy(&self.config_path, &backup_path) — which copies the corrupted file over the good .bak that was just used for recovery. After save() returns:
config.toml= recovered content (good ✓)config.toml.bak= corrupted content (bad ✗ — the good backup was just destroyed)
If config.toml gets corrupted again before the next normal save, the .bak now also contains corrupted content and the recovery path fails.
Fix: move/rename the corrupted primary out of the way before calling save(), or add a flag to save() to skip the backup step during recovery:
if recovered_from_backup {
// Move the corrupted primary so save() doesn't overwrite the good .bak
let corrupted = config.config_path.with_extension("toml.corrupted");
let _ = fs::rename(&config.config_path, &corrupted).await;
if let Err(e) = config.save().await {
tracing::warn!(
path = %config.config_path.display(),
error = %e,
"[config] Failed to persist recovered config to disk"
);
}
}| use serde::{Deserialize, Serialize}; | ||
|
|
||
| #[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)] | ||
| #[serde(default)] |
There was a problem hiding this comment.
[major] Struct-level #[serde(default)] silently disables security defaults
Adding #[serde(default)] at the struct level means individual missing fields fall back to the type default (e.g. bool::default() = false, Vec::default() = empty), not the values from the Default impl.
Fields affected:
| Field | Struct default | Type default | Impact |
|---|---|---|---|
workspace_only |
true |
false |
Agent operates outside workspace — security regression |
forbidden_paths |
18 entries incl ~/.ssh, ~/.aws |
empty | Sensitive dirs unprotected |
allowed_commands |
9 entries | empty | All commands blocked |
max_actions_per_hour |
20 |
0 |
Rate limiting disabled |
max_cost_per_day_cents |
500 |
0 |
Spending cap disabled |
Fix: add field-level #[serde(default = "fn")] for every field whose struct Default value differs from the type's zero value:
#[serde(default = "default_workspace_only")]
pub workspace_only: bool,
#[serde(default = "default_forbidden_paths")]
pub forbidden_paths: Vec<String>,
// ... etc for each mismatched field|
|
||
| #[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)] | ||
| #[allow(clippy::struct_excessive_bools)] | ||
| #[serde(default)] |
There was a problem hiding this comment.
[major] Missing field-level defaults — backend becomes empty string, auto_save silently disabled
`MemoryConfig` gains struct-level `#[serde(default)]` but `backend` and `auto_save` have no field-level overrides:
- `backend`: struct default = `"sqlite"`, type default = `""` → memory subsystem initializes with empty backend string
- `auto_save`: struct default = `true`, type default = `false` → auto-save silently disabled
// Fix: add field-level defaults
#[serde(default = "default_memory_backend")]
pub backend: String,
#[serde(default = "default_true")]
pub auto_save: bool,
fn default_memory_backend() -> String { "sqlite".into() }| /// Heartbeat configuration — periodic background loop that evaluates | ||
| /// HEARTBEAT.md tasks against workspace state using local model inference. | ||
| #[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)] | ||
| #[serde(default)] |
There was a problem hiding this comment.
[major] Missing field-level defaults — enabled becomes false, interval_minutes becomes 0
(Expanding on CodeRabbit's finding with the concrete impact.)
With struct-level #[serde(default)] but no field-level overrides on enabled and interval_minutes:
enabled: structDefault=true, type default =false→ heartbeat loop silently disabled for any user whose config section exists but omits this fieldinterval_minutes: structDefault=5, type default =0→ tick interval set to zero minutes, which depending on the scheduler could cause a busy-loop or division-by-zero
Same pattern affects inference_enabled (struct default true, type default false).
// Fix
#[serde(default = "default_heartbeat_enabled")]
pub enabled: bool,
#[serde(default = "default_interval_minutes")]
pub interval_minutes: u32,
#[serde(default = "default_inference_enabled")]
pub inference_enabled: bool,
fn default_heartbeat_enabled() -> bool { true }
fn default_interval_minutes() -> u32 { 5 }
fn default_inference_enabled() -> bool { true }|
[critical] No tests for recovery behavior The PR changes the most sensitive boot-time path in the codebase ( Needed tests (all
Additionally: is |
The recovery-to-disk path in load_or_init was calling config.save() while the corrupted config.toml still existed on disk. save() would fs::copy the corrupted primary over the (good) .bak before renaming the temp file into place, destroying the recovery source. Fix: rename the corrupted config to config.toml.corrupted before save() so had_existing_config is false and the backup is preserved. Adds 4 recovery tests: backup restores, backup also corrupted → defaults, no backup → defaults, valid config does not trigger recovery.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/config/schema/load.rs`:
- Around line 617-631: The fs::rename call that moves the corrupted primary into
config.toml.corrupted is currently ignored (let _ = fs::rename(...)), so if the
rename fails you must not call config.save() because that will overwrite the
good .bak; change the code to check the result of fs::rename(&config_path,
&corrupted_path).await, and on Err(e) log the failure (use tracing::warn with
path = %config.config_path.display() and error = %e) and return/skip the
subsequent config.save() call (or otherwise short-circuit recovery). In short:
replace the silent ignore around fs::rename with an explicit match/if let Err
handling that logs the error and prevents calling config.save() when the rename
failed (symbols to update: recovered_from_backup block, fs::rename,
corrupted_path, and config.save()).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1aa190c4-bdeb-498b-b937-a8a1dfde8ef5
📒 Files selected for processing (2)
src/openhuman/config/schema/load.rssrc/openhuman/config/schema/load_tests.rs
Addresses all actionable reviewer (graycyrus + coderabbitai) findings:
1. load.rs — handle fs::rename failure explicitly instead of silently
swallowing it with `let _ = ...`. When rename fails, skip calling
save() so the good .bak is never overwritten by the corrupted primary;
log a warn and let the next startup retry recovery instead.
2. autonomy.rs — add field-level #[serde(default = "fn")] for every
field whose struct Default value differs from the type default
(workspace_only, allowed_commands, forbidden_paths,
max_actions_per_hour, max_cost_per_day_cents). Struct-level
#[serde(default)] alone only fires when the entire struct is absent;
individual missing fields fall back to the type default (false, 0,
vec![]) — not the struct Default — silently disabling security guards.
3. heartbeat_cron.rs — same issue: add field-level defaults for enabled
(true), interval_minutes (5), and inference_enabled (true), all of
which would resolve to the wrong type default when a partial
[heartbeat] section is present in config.toml.
4. storage_memory.rs — add field-level defaults for MemoryConfig::backend
("sqlite") and auto_save (true) to prevent empty-backend string and
silently-disabled auto-save when a partial [memory] section exists.
5. load_tests.rs — rename env_lock() helper to use poison recovery (was
already present in the working-tree; keep it). Expanded tests now
cover: recovery from backup (backup intact after save), both-corrupted
fallback (corrupted primary renamed to .toml.corrupted), no-backup
fallback, and happy-path (no recovery triggered on valid config).
All 158 config schema tests pass.
|
Review fixes applied (via senamakel/openhuman The following reviewer comments from @graycyrus and @coderabbitai have been addressed in commit Fixed:
Note on pre-push hook: The |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 49447 🏁 Script executed: Length of output: 8154
✅ Confirmed fixed:
The fix pattern that's already working (e.g. // autonomy.rs
fn default_workspace_only() -> bool { true }
fn default_allowed_commands() -> Vec<String> { /* the existing Default list */ }
fn default_forbidden_paths() -> Vec<String> { /* the existing Default list */ }
fn default_max_actions_per_hour() -> u32 { 20 }
fn default_max_cost_per_day_cents() -> u32 { 500 }
pub struct AutonomyConfig {
#[serde(default = "default_workspace_only")]
pub workspace_only: bool,
#[serde(default = "default_allowed_commands")]
pub allowed_commands: Vec<String>,
// ...
}// heartbeat_cron.rs — `default_true` already exists in this file
#[serde(default = "default_true")]
pub enabled: bool,
#[serde(default = "default_interval_minutes")]
pub interval_minutes: u32,
#[serde(default = "default_true")] // replaces bare #[serde(default)]
pub inference_enabled: bool,
fn default_interval_minutes() -> u32 { 5 }// storage_memory.rs
fn default_backend() -> String { "sqlite".into() }
fn default_auto_save() -> bool { true }
pub struct MemoryConfig {
#[serde(default = "default_backend")]
pub backend: String,
#[serde(default = "default_auto_save")]
pub auto_save: bool,
// ...
}Still open from
|
Unify config recovery with upstream's parse_config_with_recovery helper: return (Config, bool) so load_or_init can still detect corruption and rename the corrupt primary to .corrupted before persisting the recovered config (protecting the good .bak from being overwritten by save()).
Summary
When config.toml becomes corrupted, the app currently resets to defaults and loses the previous configuration entirely. This fix ensures a last-known-good backup is always available and used for automatic recovery.
Changes
src/openhuman/config/schema/load.rs:Persistent .bak —
save()now keeps.toml.bakon disk after a successful write, preserving the last-known-good config.Backup recovery —
load_or_init()attempts to load from.toml.bakwhen the primary config file fails to parse, with appropriate logging at each step:Graceful fallback — If both the config file and backup are corrupted (or no backup exists), resets to
Config::default().Impact
Closes #1523
Summary by CodeRabbit