Open
Labels: good first issue (Good for newcomers)
Description
Summary
Extend the serialization system to support references instead of full serialization for large external data sources. This enables agents to work with millions of database rows or thousands of cloud-stored images without serializing the actual data.
Problem
When an agent's state includes large external datasets (e.g., database query results, files in cloud storage), serializing the full data is impractical because of:
- Memory constraints
- Serialization time
- Storage costs
Expected Behavior
Users should be able to mark fields as "reference-only":
```python
from opensymbolicai import ExternalRef

class MyAgent(PlanExecute):
    # Instead of serializing 1M rows, store a reference
    dataset: ExternalRef[DatabaseQuery] = ExternalRef(
        ref="postgres://db/table?query=SELECT * FROM users",
        loader=lambda ref: db.execute(ref)
    )
```

On resume, the reference is used to reload the data rather than deserializing it.
Implementation Hints
- Create an `ExternalRef` wrapper type
- Store only the reference string/URI during serialization
- Provide a `loader` callback for deserialization
- Consider lazy-loading patterns
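To make the hints above concrete, here is a rough sketch of what such a wrapper could look like in plain Python. It is not the existing opensymbolicai API: the `get`, `to_dict`, and `from_dict` method names and the `__external_ref__` key are assumptions made for illustration, and a real implementation would probably hook into the Pydantic models instead.

```python
from __future__ import annotations

import json
from typing import Any, Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ExternalRef(Generic[T]):
    """Hypothetical wrapper: keeps a reference string and loads the data lazily."""

    def __init__(self, ref: str, loader: Optional[Callable[[str], T]] = None) -> None:
        self.ref = ref
        self._loader = loader
        self._value: Optional[T] = None
        self._loaded = False

    def get(self) -> T:
        """Resolve the reference on first access and cache the result."""
        if not self._loaded:
            if self._loader is None:
                raise RuntimeError(f"no loader attached for {self.ref!r}")
            self._value = self._loader(self.ref)
            self._loaded = True
        return self._value  # type: ignore[return-value]

    def to_dict(self) -> dict[str, Any]:
        """Serialize only the reference, never the loaded data."""
        return {"__external_ref__": self.ref}

    @classmethod
    def from_dict(cls, data: dict[str, Any], loader: Callable[[str], T]) -> "ExternalRef[T]":
        """Rebuild the wrapper on resume; the caller supplies the loader again."""
        return cls(ref=data["__external_ref__"], loader=loader)


# Usage with a fake loader standing in for a real database call.
calls: list[str] = []


def fake_loader(ref: str) -> list[dict[str, int]]:
    calls.append(ref)  # record each load so laziness can be observed
    return [{"id": 1}, {"id": 2}]


original = ExternalRef("postgres://db/table?query=SELECT * FROM users", loader=fake_loader)
payload = json.dumps(original.to_dict())  # checkpoint contains the reference only
restored = ExternalRef.from_dict(json.loads(payload), loader=fake_loader)
rows = restored.get()        # first access triggers the loader
rows_again = restored.get()  # second access is served from the cache
```

Note that the loader is passed in again on resume rather than serialized, since a callable such as a lambda cannot be stored in a JSON checkpoint.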
Use Cases
- Database query results — store query, not rows
- Cloud storage files — store S3/GCS paths, not bytes
- API responses — store endpoint + params, not response data
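Since the three use cases differ mainly in the URI scheme of the reference, one possible design is a resolver registry keyed by scheme, so the right loader can be found again on resume. The sketch below is only an assumption about how that could work: `register_resolver`, `resolve`, and the stub resolvers are hypothetical names, and the stubs only parse the reference instead of contacting a real database, bucket, or API.

```python
from typing import Any, Callable, Dict
from urllib.parse import parse_qs, urlsplit

Resolver = Callable[[str], Any]
_RESOLVERS: Dict[str, Resolver] = {}


def register_resolver(scheme: str, resolver: Resolver) -> None:
    """Associate a URI scheme (e.g. "s3") with a function that loads the data."""
    _RESOLVERS[scheme] = resolver


def resolve(ref: str) -> Any:
    """Pick the resolver from the reference's scheme and call it."""
    scheme = urlsplit(ref).scheme
    try:
        resolver = _RESOLVERS[scheme]
    except KeyError:
        raise LookupError(f"no resolver registered for scheme {scheme!r}") from None
    return resolver(ref)


# Stub resolvers: they describe what would be fetched, without any network I/O.
def describe_query(ref: str) -> Dict[str, Any]:
    parts = urlsplit(ref)
    return {
        "kind": "database",
        "table": parts.path.lstrip("/"),
        "query": parse_qs(parts.query)["query"][0],
    }


def describe_object(ref: str) -> Dict[str, Any]:
    parts = urlsplit(ref)
    return {"kind": "object", "bucket": parts.netloc, "key": parts.path.lstrip("/")}


def describe_api(ref: str) -> Dict[str, Any]:
    parts = urlsplit(ref)
    return {
        "kind": "api",
        "endpoint": f"{parts.scheme}://{parts.netloc}{parts.path}",
        "params": parse_qs(parts.query),
    }


register_resolver("postgres", describe_query)
register_resolver("s3", describe_object)
register_resolver("gs", describe_object)
register_resolver("https", describe_api)

image = resolve("s3://bucket/images/cat.png")
users = resolve("postgres://db/table?query=SELECT * FROM users")
items = resolve("https://api.example.com/items?page=2")
```

A registry like this keeps checkpoints free of callables: only the reference string is stored, and the scheme is enough to find the loader when the agent resumes.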
Acceptance Criteria
- Implement `ExternalRef` type with reference storage
- Add loader/resolver mechanism for resume
- Unit tests for reference serialization round-trip
- Integration test with mock external data source
- Document common patterns (database, cloud storage)
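For the round-trip and mock-source tests, something along these lines might be a starting point. `RefStub` is a hypothetical stand-in for the type that does not exist yet, with assumed `dumps`, `loads`, and `load` methods; swap in the real `ExternalRef` once it is implemented. The mock data source uses `unittest.mock.Mock` from the standard library.

```python
import json
import unittest
from unittest.mock import Mock


class RefStub:
    """Hypothetical stand-in for the proposed ExternalRef type."""

    def __init__(self, ref, loader):
        self.ref = ref
        self.loader = loader

    def dumps(self):
        # Only the reference string goes into the checkpoint payload.
        return json.dumps({"ref": self.ref})

    @classmethod
    def loads(cls, payload, loader):
        # The loader is supplied by the caller on resume, not deserialized.
        return cls(json.loads(payload)["ref"], loader)

    def load(self):
        return self.loader(self.ref)


class RoundTripTest(unittest.TestCase):
    REF = "s3://bucket/images/cat.png"

    def test_serialization_stores_only_reference(self):
        source = Mock(return_value=b"image-bytes")
        payload = RefStub(self.REF, source).dumps()
        self.assertEqual(json.loads(payload), {"ref": self.REF})
        source.assert_not_called()  # serializing must not touch the data source

    def test_resume_reloads_through_loader(self):
        source = Mock(return_value=b"image-bytes")
        payload = RefStub(self.REF, source).dumps()
        restored = RefStub.loads(payload, source)
        self.assertEqual(restored.load(), b"image-bytes")
        source.assert_called_once_with(self.REF)
```

Saved to a file, this can be run with `python -m unittest <file>`.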
Files to Look At
- src/opensymbolicai/checkpoint.py — serialization logic
- src/opensymbolicai/models.py — Pydantic models