Skip to content

Mission-critical audit fixes: model cache safety, task race, and stale tests#40

Closed
christopherkarani wants to merge 1 commit intomainfrom
automation/check-frameworks-issues-20260318
Closed

Mission-critical audit fixes: model cache safety, task race, and stale tests#40
christopherkarani wants to merge 1 commit intomainfrom
automation/check-frameworks-issues-20260318

Conversation

@christopherkarani
Copy link
Copy Markdown
Owner

Summary

This PR delivers the mission-critical audit fixes identified in the 2026-03-18 automation run.

Correctness and Safety Fixes

  1. ModelCache partial-failure safety (P0)
  • Fixed ModelCache.clearAll() so it no longer wipes in-memory metadata when disk deletion partially fails.
  • New behavior: successfully deleted entries are removed, failed entries are retained and persisted, then an error is surfaced.
  • Prevents metadata/data divergence and orphaned-cache corruption.
  1. ModelManager download task race (P1)
  • Fixed ModelManager.downloadTask(for:) race where callers could observe .taskNotStarted before backing work was wired.
  • The backing async task is now attached before returning the DownloadTask.
  1. OpenAI streaming hardening (P1)
  • Hardened tool-call delta parsing to skip malformed deltas with empty id/name.
  • Prevents upstream malformed payloads from propagating into PartialToolCall precondition crashes.
  1. ChatSession state snapshot hardening
  • stream(_:) now snapshots config under lock to keep captured state consistent.

API/Test Drift + Build Hygiene

  • Aligned DiffusionModelDownloaderTests compile guard with production feature flags.
  • Updated stale provider/model count tests to assert stable invariants after provider-catalog growth.
  • Updated ProviderType case coverage test to include new providers.
  • Removed deprecated String(cString:) usage in DeviceCapabilities.
  • Made GenerationSchema encoding error payload Sendable-safe.
  • Updated stale registry catalog doc comment.

Regression Tests Added

  • ModelCacheTests.testClearAllRetainsEntriesWhenDeletionFails
  • ModelManagerRegressionTests.testDownloadTaskImmediatelyWiresUnderlyingTask

Verification

  • swift test --filter ModelCacheTests --filter ModelManagerRegressionTests
  • swift test --filter ModelIdentifierTests
  • swift test --filter ProtocolCompilationTests
  • swift test
  • swift build

All passing in this worktree.

- align DiffusionModelDownloaderTests compile guard with production feature flags
- replace deprecated String(cString:) CPU brand parsing path
- make schema encoding error payload Sendable-safe
- repair brittle model registry/provider count tests for expanded provider catalog
- fix ModelCache.clearAll() to preserve metadata for entries whose deletion fails
- wire ModelManager.downloadTask(for:) backing task before returning to prevent taskNotStarted races
- harden OpenAI streaming parser by skipping malformed tool call deltas with empty id/name
- capture ChatSession stream config under lock for consistent snapshot semantics
- add regression tests for clearAll partial-failure retention and immediate task wiring
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@christopherkarani
Copy link
Copy Markdown
Owner Author

Closing — these fixes were superseded by a164c7c ("Stabilize MLX, Foundation Models, and warning cleanup") which addressed the MLX gating, stale registry tests, and Sendable warnings comprehensively.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant