Add reference-based serialization for large external data #4

@rajkumar42

Description

Summary

Extend the serialization system to support storing references in place of full data for large external sources. This lets agents work with millions of database rows or thousands of cloud-stored images without serializing the data itself.

Problem

When an agent's state includes references to large datasets (e.g., database query results, cloud storage paths), serializing the full data is impractical:

  • Memory constraints
  • Serialization time
  • Storage costs

Expected Behavior

Users should be able to mark fields as "reference-only":

from opensymbolicai import ExternalRef, PlanExecute

class MyAgent(PlanExecute):
    # Instead of serializing 1M rows, store a reference
    dataset: ExternalRef[DatabaseQuery] = ExternalRef(
        ref="postgres://db/table?query=SELECT * FROM users",
        loader=lambda ref: db.execute(ref)
    )

On resume, the reference is used to reload the data rather than deserializing it.
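To make the intent concrete, here is a hypothetical illustration of what a serialized checkpoint could look like under this proposal. The `__external_ref__` key and the `dataset` field name are assumptions for illustration, not the project's actual wire format:

```python
import json

# Hypothetical serialized state: the dataset field collapses to its URI
# instead of carrying the query results themselves.
state = {
    "dataset": {"__external_ref__": "postgres://db/table?query=SELECT * FROM users"}
}
payload = json.dumps(state)
print(len(payload))  # tens of bytes, regardless of how many rows the query returns
```

The payload size stays constant no matter how large the referenced dataset grows, which is the core of the feature.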

Implementation Hints

  • Create an ExternalRef wrapper type
  • Store only the reference string/URI during serialization
  • Provide a loader callback for deserialization
  • Consider lazy-loading patterns
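The hints above could be sketched as a minimal wrapper along these lines. This is a standalone sketch, not the library's actual API; the class shape, method names, and `__external_ref__` key are all assumptions:

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ExternalRef(Generic[T]):
    """Wrap a reference to external data; serialize the URI, never the data."""

    def __init__(self, ref: str, loader: Callable[[str], T]):
        self.ref = ref
        self._loader = loader
        self._value: Optional[T] = None  # cache for the lazily loaded data

    @property
    def value(self) -> T:
        # Lazy-loading: resolve the reference only on first access, then cache.
        # (A real implementation would use a sentinel so that a legitimate
        # None value is not re-loaded on every access.)
        if self._value is None:
            self._value = self._loader(self.ref)
        return self._value

    def to_dict(self) -> dict:
        # Serialization stores only the reference string, not the data.
        return {"__external_ref__": self.ref}

    @classmethod
    def from_dict(cls, data: dict, loader: Callable[[str], T]) -> "ExternalRef[T]":
        # On resume, the loader is re-attached and the data re-fetched on demand.
        return cls(ref=data["__external_ref__"], loader=loader)
```

Keeping the loader out of the serialized form sidesteps the problem of pickling callables; the resume path just has to know which loader to re-attach for each reference scheme.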

Use Cases

  • Database query results — store query, not rows
  • Cloud storage files — store S3/GCS paths, not bytes
  • API responses — store endpoint + params, not response data

Acceptance Criteria

  • Implement ExternalRef type with reference storage
  • Add loader/resolver mechanism for resume
  • Unit tests for reference serialization round-trip
  • Integration test with mock external data source
  • Document common patterns (database, cloud storage)
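For the round-trip and mock-source criteria, a unit test might look like the sketch below. The inline `ExternalRef` stand-in mirrors the wrapper proposed in this issue (its method names are assumptions), and `unittest.mock.Mock` plays the external data source:

```python
from unittest.mock import Mock


class ExternalRef:
    """Stand-in for the proposed wrapper, just enough to test the round trip."""

    def __init__(self, ref, loader):
        self.ref, self.loader = ref, loader

    def serialize(self):
        return {"ref": self.ref}  # the data itself never touches the payload

    @classmethod
    def deserialize(cls, payload, loader):
        return cls(payload["ref"], loader)


# Mock external source: stands in for a database connection.
loader = Mock(return_value=[{"id": 1}])

original = ExternalRef("postgres://db/table?query=SELECT id FROM users", loader)
payload = original.serialize()
restored = ExternalRef.deserialize(payload, loader)

assert restored.ref == original.ref          # reference survives the round trip
assert loader.call_count == 0                # nothing was loaded during serialization
assert restored.loader(restored.ref) == [{"id": 1}]  # data reloads on demand
```

The key assertions are that serialization never touches the loader and that the restored reference can still resolve the data.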

Files to Look At
