@@ -0,0 +1,202 @@
---
id: AILOG-2026-05-31-001
title: Fase 2 — FUSE functional hardening + T101 performance validation
status: draft
created: 2026-05-31
agent: claude-opus-4-8-v1.0
confidence: high
review_required: true
risk_level: medium
tags: [fuse, readdir, inode, performance, charter-01, t101, files-on-demand, robustness]
related:
- CHARTER-01-road-to-v0-1-0-alpha-1
- AILOG-2026-05-28-001
eu_ai_act_risk: not_applicable
nist_genai_risks: [information_security]
iso_42001_clause: [8]
---

# AILOG: Fase 2 — FUSE functional hardening + T101

## Summary

Closes T101 (the one remaining task of `002-files-on-demand`, the sole declared
work item of Charter-01 Fase 2): validate FUSE performance — `getattr` < 1ms,
`readdir` < 10ms for 1000 entries, idle memory < 50MB with 10k tracked files.

Implementing T101 as a real-mount integration test
(`crates/lnxdrive-fuse/tests/integration_perf_t101.rs`) made it the **first
end-to-end exercise of a real FUSE mount in the codebase** — every prior FUSE
test exercises internal primitives, and the doc-tests that mount are ignored in
CI for lack of `/dev/fuse`. That first real mount surfaced a chain of four
functional bugs that made directory listing non-functional, plus a fifth defect
in inode persistence. All five are fixed here, with CI-runnable regression tests
for #2, #3, and #5; the four listing bugs had to be fixed before any performance
numbers could be taken.

**T101 result (real mount, this PR):**

| Metric | Target | Measured |
|---|---|---|
| `getattr` (lookup+getattr upper bound) | < 1 ms | 43.7 µs |
| `readdir`, 1000 entries | < 10 ms | 1.40 ms |
| idle RSS, 10k tracked files | < 50 MB | 37.9 MB |

## Context

The Charter scoped Fase 2 as: close T101, remove `~4 todo!()/unimplemented!()`
sites and `~10 debug println!`, and enable `cargo test --workspace` in CI. An
audit of `main` at the start of this work found the latter three **already
done during Fase 1** (zero `todo!`/`unimplemented!` in crates, zero debug
`println!` outside the CLI, `cargo test --workspace` live at
`.github/workflows/engine-ci.yml:66`). T101 was the only real remaining work —
recorded in the Charter telemetry below so the declared-vs-actual gap is explicit
([[feedback-strict-governance]]).

Per the operator's choice, T101 is validated with an `#[ignore]` integration
test that mounts a real FUSE filesystem (needs `/dev/fuse` + `fusermount3`; CI
has neither, so it is a local gate) backed by a file-based DB populated with N
entries, then issues real syscalls and times them. A `multi_thread` runtime is
mandatory: FUSE `init()` runs on fuser's own OS thread and `block_on`s DB work
served by the `WriteSerializer`. On a `current_thread` runtime that work can only
make progress on the test thread, which is itself blocked in `read_dir`/`stat`
syscalls waiting for FUSE to reply, so the mount deadlocks.
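
The measurement approach described above (real syscalls against a directory, timed from the caller's side) can be sketched as follows. This is a minimal std-only illustration, not the code in `integration_perf_t101.rs`: `time_listing` is a hypothetical helper, it does not mount anything, and it works on whatever directory it is given.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, Instant};

/// Hypothetical helper: time one full `read_dir` pass over `dir`, then the
/// per-entry `lstat` calls. Returns (entries seen, readdir wall time, mean
/// stat latency).
fn time_listing(dir: &Path) -> io::Result<(usize, Duration, Duration)> {
    // Collecting drains the iterator, so every page of the listing has been
    // fetched by the time the clock stops.
    let start = Instant::now();
    let entries = fs::read_dir(dir)?.collect::<io::Result<Vec<_>>>()?;
    let readdir_time = start.elapsed();

    // Each call resolves the path and then fetches attributes, so the figure
    // is an upper bound on the attribute fetch alone.
    let start = Instant::now();
    for entry in &entries {
        fs::symlink_metadata(entry.path())?;
    }
    let stat_total = start.elapsed();

    let mean_stat = if entries.is_empty() {
        Duration::ZERO
    } else {
        stat_total / entries.len() as u32
    };
    Ok((entries.len(), readdir_time, mean_stat))
}
```

Pointed at a mounted FUSE directory, both timings include the kernel round-trip to the filesystem process, which is the latency a user actually observes.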

## The bug chain (discovered by the T101 mount)

1. **`init()` panics on every mount — `tokio::spawn` with no runtime.**
`init()` (`filesystem.rs:554`) calls `DehydrationManager::start_periodic()`,
which does `tokio::spawn` (`dehydration.rs:430`). `init()` runs on fuser's
thread, which has no Tokio runtime entered, so the spawn panics with *"there
is no reactor running"*. The daemon mounts identically
(`main.rs:276` → `spawn_mount2`), so **auto-mount was broken** — undetected
because no test mounted for real. **Fix:** enter the runtime already held in
`self.rt_handle` for the duration of the spawn.

2. **`children()` lists the root as its own child → empty `ls`.**
The root entry is built with `parent_ino == ino == 1`, so `children(1)`
matched it. With an empty name (`String::new()`), `reply.add` stalls `readdir`
right after `.`/`..`, so only `.`/`..` are returned — which `read_dir` filters
out, yielding **zero entries**. **Fix:** `children()` excludes the entry whose
`ino == parent_ino`. This was the root cause of the empty/partial listings.

3. **`readdir` paginates over an unstable order → large `ls` loses files.**
`children()` collected from `DashMap::iter()`, whose order is not stable
between calls. `readdir` pages by positional offset across multiple kernel
calls, so the reshuffle between pages skipped/duplicated entries
non-deterministically (1000-entry dirs lost files at random). **Fix:** sort
`children()` by inode for a deterministic cross-page order.

4. **`opendir` returns `FOPEN_KEEP_CACHE` on a dynamic directory.**
That flag tells the kernel its cached listing is still valid, so after the
first open the kernel serves the cached page and stops issuing `readdir`. The
listing is lazily populated and dynamic (the sync engine mutates it), so the
kernel must re-issue `readdir` each open. **Fix:** `reply.opened(fh, 0)`.

5. **Inode not persisted across save — unstable inodes between mounts.**
`save_item`'s `INSERT OR REPLACE` omitted the `inode` column (reset to NULL on
every re-save) and `sync_item_from_row` never read it back, so the filesystem
re-allocated a fresh inode for every item on every mount. **Fix:** `save_item`
preserves the inode (its own, else the stored one, mirroring the `account_id`
handling) and `sync_item_from_row` reads the column back via `set_inode`.

**Not a bug (discarded):** an early read suggested `get_next_inode` did a
non-atomic `SELECT`+`UPDATE`. It already wraps both in a transaction
(`repository.rs:1023` `begin()`…`commit()`); no change made. The duplicate-inode
symptom was bug #5 (inodes never read back) compounded by #3.
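
To make bug #3 concrete, the sketch below models a reader that pages through a listing by positional offset while the listing is re-fetched on every call, as `readdir` does across kernel calls. It is a simplified std-only model, not the project's `readdir`; `page` and `read_all_paged` are hypothetical names.

```rust
/// Return one page of `listing`, starting at a positional offset.
fn page<T: Clone>(listing: &[T], offset: usize, page_size: usize) -> Vec<T> {
    listing.iter().skip(offset).take(page_size).cloned().collect()
}

/// Model of a paged directory read: each call re-fetches the listing and
/// resumes at the number of entries already returned.
fn read_all_paged<F>(mut fetch: F, page_size: usize) -> Vec<u64>
where
    F: FnMut() -> Vec<u64>,
{
    let mut seen = Vec::new();
    loop {
        let listing = fetch();
        let chunk = page(&listing, seen.len(), page_size);
        if chunk.is_empty() {
            break; // offset is past the end: the read is complete
        }
        seen.extend(chunk);
    }
    seen
}
```

With a stable order every entry appears exactly once. If the order flips between calls (for example `[1, 2, 3, 4]` on one call and `[4, 3, 2, 1]` on the next, with a page size of 2), the reader ends up with `[1, 2, 2, 1]`: two entries duplicated and two lost.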

## Change

### Code

- **`crates/lnxdrive-fuse/src/filesystem.rs`** — `init()` enters `rt_handle`
before `start_periodic()` (#1); `opendir` replies with no cache flags (#4).
- **`crates/lnxdrive-fuse/src/inode.rs`** — `children()` excludes the
self-referential root and sorts by inode (#2, #3).
- **`crates/lnxdrive-cache/src/repository.rs`** — `save_item` persists/preserves
the `inode` column; `sync_item_from_row` reads it back (#5).
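
The `children()` contract after fixes #2 and #3 can be sketched roughly as follows. This assumes a simplified table keyed by inode: `Entry`, `InodeTable`, and the use of `HashMap` (standing in for `DashMap`, which likewise has no stable iteration order) are illustrative assumptions, not the real types in `inode.rs`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an inode entry.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    ino: u64,
    parent_ino: u64,
    name: String,
}

/// Hypothetical inode table; iteration order of the map is unspecified.
struct InodeTable {
    entries: HashMap<u64, Entry>,
}

impl InodeTable {
    /// Children of `parent_ino`, excluding any entry that is its own parent
    /// (the root), sorted by inode so repeated calls agree on the order.
    fn children(&self, parent_ino: u64) -> Vec<Entry> {
        let mut out: Vec<Entry> = self
            .entries
            .values()
            .filter(|e| e.parent_ino == parent_ino && e.ino != parent_ino)
            .cloned()
            .collect();
        out.sort_by_key(|e| e.ino);
        out
    }
}
```

Sorting by inode is one possible stable key; any total order that does not change between calls would give the same cross-page guarantee.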

### Tests

- **`crates/lnxdrive-fuse/tests/integration_perf_t101.rs`** (new, `#[ignore]`) —
the two T101 tests (latency + idle memory) over a real mount. Configurable via
`LNXDRIVE_PERF_N`.
- **`crates/lnxdrive-fuse/src/inode.rs`** — two CI-runnable regression tests:
`test_children_excludes_self_referential_root` (#2) and
`test_children_stable_sorted_order` (#3). These run without `/dev/fuse`, so the
functional contract is guarded in CI even though the mount test is not.
- **`crates/lnxdrive-cache/tests/repository_tests.rs`** —
`test_save_item_preserves_and_reads_back_inode` (#5).
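
The behaviour that `test_save_item_preserves_and_reads_back_inode` guards can be modelled without a database. The sketch below assumes a toy in-memory table in which `replace` rewrites the whole row, as `INSERT OR REPLACE` does; `Row`, `Table`, and `save_preserving_inode` are hypothetical names, not the repository API.

```rust
use std::collections::HashMap;

/// Toy row: one ordinary column plus the nullable inode column.
#[derive(Clone, Debug, PartialEq)]
struct Row {
    state: String,
    inode: Option<i64>,
}

/// Toy table keyed by item id (hypothetical stand-in for `sync_items`).
#[derive(Default)]
struct Table {
    rows: HashMap<String, Row>,
}

impl Table {
    /// Models `INSERT OR REPLACE`: the previous row is discarded entirely.
    fn replace(&mut self, id: &str, row: Row) {
        self.rows.insert(id.to_string(), row);
    }

    /// Upsert that keeps the inode: the item's own value wins, otherwise
    /// whatever is already stored for that id is carried over.
    fn save_preserving_inode(&mut self, id: &str, state: &str, own_inode: Option<u64>) {
        let stored = self.rows.get(id).and_then(|row| row.inode);
        let inode = own_inode.map(|i| i as i64).or(stored);
        self.replace(id, Row { state: state.to_string(), inode });
    }

    /// Read the inode column back for an id, if the row exists and is set.
    fn inode_of(&self, id: &str) -> Option<i64> {
        self.rows.get(id).and_then(|row| row.inode)
    }
}
```

A plain `replace` with `inode: None` wipes the stored value, which is the pre-fix behaviour; `save_preserving_inode` keeps it across re-saves.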

## Verification

```bash
cd lnxdrive-engine

# CI-runnable regression tests (no /dev/fuse needed)
cargo test -p lnxdrive-fuse --lib inode::tests # incl. the two children regressions
cargo test -p lnxdrive-cache --test repository_tests inode

# Full workspace — no regressions
cargo test --workspace # all green
cargo clippy -p lnxdrive-fuse -p lnxdrive-cache --all-targets -- -D warnings

# T101 performance gate (LOCAL ONLY — needs /dev/fuse + fusermount3)
cargo test -p lnxdrive-fuse --test integration_perf_t101 -- --ignored --nocapture
# Expected: readdir ~1.4ms/1000, getattr ~44µs, idle RSS ~38MB/10k — all under target.
```

## Drift

- **Fase 2 scope was 4 items; 3 were already done in Fase 1.** Only T101
remained. The Charter `## Scope` Fase 2 row is updated to reflect this
(declared-vs-actual, per [[feedback-strict-governance]]).
- **Scope grew from "measure performance" to "fix the FUSE listing path".** T101
could not run until bugs #1–#4 were fixed; #5 was fixed in the same pass as it
is the same subsystem and a real correctness defect. Approved by the operator
as Fase 2 = FUSE functional hardening.
- **Idle-memory metric implemented as a Rust test, not a `lnxdrive-testing`
shell script.** Reading `/proc/self/statm` in the existing integration test is
more robust and CI-consistent than standing up the full daemon (which needs
GOA auth) from bash, and reuses the mount setup. The measured process RSS is a
conservative upper bound on the daemon's tracked-file footprint.
- **T101 latency is measured as observed syscall latency**, which includes the
unavoidable FUSE round-trip and (for `getattr`) the preceding `lookup`. The
reported `getattr` figure is therefore an upper bound on `getattr` alone; it is
well under target regardless.
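
Reading RSS from `/proc/self/statm` needs very little code: the second whitespace-separated field of that file is the resident set size in pages. The sketch below is an assumption about how such a reading can be taken, not the test's actual code. The function names are hypothetical, and the page size is passed in by the caller (4096 bytes is common on x86-64 Linux but is not guaranteed).

```rust
use std::fs;

/// Parse the resident-set size, in pages, from the contents of a
/// `/proc/<pid>/statm` file (fields: size resident shared text lib data dt).
fn resident_pages(statm: &str) -> Option<u64> {
    statm.split_whitespace().nth(1)?.parse().ok()
}

/// Convert a `statm` reading to bytes for a given page size.
fn rss_bytes(statm: &str, page_size: u64) -> Option<u64> {
    resident_pages(statm).map(|pages| pages * page_size)
}

/// RSS of the current process in bytes; `None` if `/proc` is unavailable
/// or the file does not have the expected shape.
fn current_rss_bytes(page_size: u64) -> Option<u64> {
    let statm = fs::read_to_string("/proc/self/statm").ok()?;
    rss_bytes(&statm, page_size)
}
```

Keeping the parsing separate from the file read makes the conversion testable on any platform, while `current_rss_bytes` only returns a value on systems that expose `/proc`.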

## Risk

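Per-bug regression surface. All five sit on the FUSE read/mount path and all are
now covered by tests (#2, #3, and #5 by CI-runnable regressions; #1 and #4 only
by the local real-mount T101 test):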

- **#1 (init runtime enter).** Low. `Handle::enter()` returns a guard scoped to
the spawn; the spawned task outlives the guard correctly. Without it, no mount
worked at all, so this strictly restores function.
- **#2 (root self-exclusion).** Low. The filter only ever removes the single
self-referential entry (the root); regular nested dirs are unaffected
(`ino != parent_ino` for them). Covered by a unit test.
- **#3 (sorted children).** Low. Sorting is O(n log n) per `readdir`; for a
10k-entry directory this is negligible against the syscall cost, and T101's
1.40ms/1000 confirms headroom. Determinism is required for correctness, not
just tidiness.
- **#4 (no dir cache).** Low–medium. Dropping `FOPEN_KEEP_CACHE` means the kernel
re-issues `readdir` per open instead of serving a cached page; correct for a
dynamic directory and measured fast. A future `notify_inval_entry`-based cache
invalidation could re-enable caching safely — deferred polish, not debt.
- **#5 (inode persistence).** Low. `save_item` now writes one more column and
preserves the existing value; `sync_item_from_row` sets it post-deserialization
without touching the serde representation. Covered by a round-trip test.

No emergent risks. `cargo test --workspace` is green and clippy is clean.

## Telemetry

| Metric | Estimated | Actual |
|---|---|---|
| Effort | ~0.5 day (T101 only) | ~1 day (T101 + 5 bug fixes) |
| Lines added | ~120 (perf test) | ~360 (perf test + fixes + regressions + AILOG) |
| Lines removed | ~0 | ~10 |
| New files | 1 (perf test) | 2 (perf test, AILOG) |
| Bugs found | 0 (validation only) | 5 functional (4 on the listing path + 1 inode persistence), all fixed |
| Existing tests broken | 0 | 0 |
| Tests added | T101 | 2 T101 (`#[ignore]`) + 3 CI regressions |
| Pre-commit hook failures | n/a | none |
2 changes: 1 addition & 1 deletion .straymark/charters/01-road-to-v0-1-0-alpha-1.md
@@ -1,3 +1,3 @@
---
charter_id: CHARTER-01-road-to-v0-1-0-alpha-1
status: in-progress
@@ -28,7 +28,7 @@
 - `RISK-001`: D-Bus health monitor + reconnect in `lnxdrive-daemon`. Full Unix-socket fallback explicitly deferred to v0.2.
 - `ISSUE-002`: harden the YAML config parser against billion-laughs (size + alias caps); regression fixture in `lnxdrive-engine/tests/security/`.
 - `cargo audit` + `cargo deny` jobs in CI.
-3. **Engine polish** — close the one remaining task in `lnxdrive-engine/specs/002-files-on-demand/tasks.md`; remove the ~4 `todo!()/unimplemented!()` sites and ~10 debug `println!` calls; enable `cargo test --workspace` in CI.
+3. **Engine polish** — close the one remaining task (T101 performance validation) in `lnxdrive-engine/specs/002-files-on-demand/tasks.md`. **Done** (Fase 2): T101 validated via a real-mount integration test — `getattr` 43.7µs, `readdir` 1.40ms/1000 entries, idle RSS 37.9MB/10k files (all under target). The test was the first real FUSE mount exercised in the codebase and surfaced four functional listing bugs (init runtime-context panic, root self-listing, unstable `readdir` order, `opendir` dir-cache) plus an inode-persistence defect, all fixed with regression tests — see AILOG-2026-05-31-001. The other three items this row originally listed (remove `todo!()/unimplemented!()`, remove debug `println!`, enable `cargo test --workspace` in CI) were **already completed during Fase 1** (verified against `main`: zero such sites in crates; `cargo test --workspace` live at `.github/workflows/engine-ci.yml:66`).
 4. **GTK4 preferences panel** — implement four basic settings groups (Account, Folders, Network, System) in `lnxdrive-gnome/src/main.rs` (currently a `println!("not yet implemented")` stub) wired to the existing D-Bus daemon API.
 5. **Flatpak packaging** — complete `lnxdrive-packaging/flatpak/com.strangedaystech.LNXDrive.yaml` with install stages (icons, `*.desktop`, metainfo XML), correct permissions (`--filesystem=home:rw`, `--talk-name=org.freedesktop.secrets`), and target `org.gnome.Platform 47`. Fix `lnxdrive.spdx` (currently describes StrayMark by mistake). Complete the metainfo XML with description, releases section, and screenshot URLs.
 6. **Release infrastructure & public assets** — `.github/workflows/release.yml` (tag → bundle → GitHub Release with SHA256SUMS); `SECURITY.md`; `CHANGELOG.md`; 6 UI screenshots in `docs/screenshots/`; version `0.1.0-alpha.1` consistent across every `Cargo.toml`, Flatpak manifest, and metainfo XML; README install section + competitive comparison vs `jstaf/onedriver` and `abraunegg/onedrive`.
31 changes: 28 additions & 3 deletions lnxdrive-engine/crates/lnxdrive-cache/src/repository.rs
@@ -264,10 +264,18 @@ fn sync_item_from_row(row: &SqliteRow) -> Result<SyncItem, CacheError> {
         "error_info": error_info_val,
     });

-    let item: SyncItem = serde_json::from_value(item_json).map_err(|e| {
+    let mut item: SyncItem = serde_json::from_value(item_json).map_err(|e| {
         CacheError::SerializationError(format!("Failed to reconstruct SyncItem from row: {}", e))
     })?;

+    // The FUSE inode is stored as its own (nullable) column rather than inside
+    // the serialized item, so it must be applied after deserialization.
+    // Without this the item always loads with inode = None and the filesystem
+    // re-allocates a fresh inode on every mount instead of reusing the stable
+    // one persisted by `update_inode`.
+    let inode: Option<i64> = row.get("inode");
+    item.set_inode(inode.map(|i| i as u64));
+
     Ok(item)
 }

@@ -596,12 +604,28 @@ impl IStateRepository for SqliteStateRepository {
             }
         };

+        // Preserve the FUSE inode across the UPSERT. `INSERT OR REPLACE` rewrites
+        // the whole row, so omitting `inode` would reset it to NULL on every
+        // re-save (e.g. after a sync state change), detaching the item from the
+        // inode the kernel already handed out. Use the item's own inode if it
+        // carries one, otherwise keep whatever is already stored.
+        let inode_to_store: Option<i64> = match item.inode() {
+            Some(i) => Some(i as i64),
+            None => sqlx::query_scalar::<_, Option<i64>>(
+                "SELECT inode FROM sync_items WHERE id = ?",
+            )
+            .bind(&id)
+            .fetch_optional(&self.pool)
+            .await?
+            .flatten(),
+        };
+
         sqlx::query(
             "INSERT OR REPLACE INTO sync_items \
              (id, account_id, local_path, remote_id, remote_path, state, \
              content_hash, local_hash, size_bytes, last_sync, \
-             last_modified_local, last_modified_remote, metadata, error_info) \
-             VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
+             last_modified_local, last_modified_remote, metadata, error_info, inode) \
+             VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
         )
         .bind(&id)
         .bind(&account_id)
@@ -617,6 +641,7 @@ impl IStateRepository for SqliteStateRepository {
         .bind(&last_modified_remote)
         .bind(&metadata)
         .bind(&error_info)
+        .bind(inode_to_store)
         .execute(&self.pool)
         .await?;

27 changes: 27 additions & 0 deletions lnxdrive-engine/crates/lnxdrive-cache/tests/repository_tests.rs
@@ -1026,3 +1026,30 @@ async fn test_get_items_for_dehydration_excludes_pinned_and_modified() {
     assert_eq!(candidates[0].id(), item_hydrated.id());
     assert!(matches!(candidates[0].state(), ItemState::Hydrated));
 }
+
+/// Regression: the FUSE inode must round-trip through the DB and survive a
+/// re-save. Previously `save_item`'s INSERT OR REPLACE omitted the `inode`
+/// column (resetting it to NULL on every re-save) and `sync_item_from_row`
+/// never read it back, so the filesystem re-allocated inodes on every mount.
+#[tokio::test]
+async fn test_save_item_preserves_and_reads_back_inode() {
+    let repo = setup().await;
+    let _account = create_test_account(&repo).await;
+    let item = create_test_sync_item();
+    let id = *item.id();
+
+    repo.save_item(&item).await.unwrap();
+    // A freshly saved item has no inode yet.
+    let loaded = repo.get_item(&id).await.unwrap().unwrap();
+    assert_eq!(loaded.inode(), None);
+
+    // The FUSE layer assigns one out-of-band.
+    repo.update_inode(&id, 42).await.unwrap();
+    let with_inode = repo.get_item(&id).await.unwrap().unwrap();
+    assert_eq!(with_inode.inode(), Some(42), "inode must be read back from the row");
+
+    // Re-saving the item must NOT wipe the inode.
+    repo.save_item(&with_inode).await.unwrap();
+    let after = repo.get_item(&id).await.unwrap().unwrap();
+    assert_eq!(after.inode(), Some(42), "re-saving must preserve the inode");
+}
22 changes: 18 additions & 4 deletions lnxdrive-engine/crates/lnxdrive-fuse/src/filesystem.rs
@@ -544,13 +544,20 @@ impl Filesystem for LnxDriveFs {
             "LnxDrive FUSE filesystem initialized"
         );

-        // T086: Start the periodic dehydration sweep task
+        // T086: Start the periodic dehydration sweep task.
+        //
+        // `init()` runs on fuser's own OS thread, which has no Tokio runtime in
+        // its thread-local context, so `start_periodic()`'s `tokio::spawn` would
+        // panic with "there is no reactor running". Enter the runtime we already
+        // hold in `self.rt_handle` (the same one used for `block_on` above) for
+        // the duration of the spawn so the sweep task is registered correctly.
         if let Some(manager) = &self.dehydration_manager {
             let interval = manager.policy().interval_minutes;
             tracing::info!(
                 interval_minutes = interval,
                 "Starting periodic dehydration sweep"
             );
+            let _guard = self.rt_handle.enter();
             let task = manager.clone().start_periodic();
             self.dehydration_task = Some(task);
         }
@@ -1106,9 +1113,16 @@ impl Filesystem for LnxDriveFs {

         debug!("opendir: opened directory ino={} with fh={}", ino, fh);

-        // Reply with the file handle and FOPEN_KEEP_CACHE flag
-        // FOPEN_KEEP_CACHE tells the kernel to keep cached directory data
-        reply.opened(fh, FOPEN_KEEP_CACHE);
+        // Reply with the file handle and NO cache flags.
+        //
+        // `FOPEN_KEEP_CACHE` on a directory tells the kernel its cached listing
+        // is still valid, so after the first open the kernel serves the cached
+        // page and stops issuing `readdir` calls. Because the listing is
+        // populated lazily by `readdir`, the very first open caches an *empty*
+        // directory and the kernel then never calls `readdir` at all — `ls`
+        // shows nothing. The contents are also dynamic (the sync engine adds and
+        // removes entries), so the kernel must re-issue `readdir` on each open.
+        reply.opened(fh, 0);
     }

     /// Releases (closes) an open directory.