Smart Context Pruning with Token-Aware Filtering #3829

@sarojrout

Is your feature request related to a problem? Please describe.
Yes. The existing ContextFilterPlugin keeps only the last N invocations and doesn't consider:

  1. Actual Token Count: It counts invocations, not tokens. A single invocation with a large tool response can exceed the context limit on its own
  2. Relevance: It filters by recency alone, so it can remove important context while keeping less relevant content
  3. Context Window Pressure: There is no proactive management before limits are hit; it only filters after the fact
  4. Token Efficiency: It can't optimize for token efficiency when the context is getting full

# ContextFilterPlugin keeps the last 5 invocations
# But one invocation might have a 10k-token tool response
# While another has just 100 tokens
# Result: context exceeds limits even with only 5 invocations
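
The gap can be illustrated with a small, self-contained sketch. The `Invocation` record, the token counts, and the 8,000-token budget are hypothetical values chosen for illustration; they are not ADK types or defaults.

```python
from dataclasses import dataclass


@dataclass
class Invocation:
    """Hypothetical stand-in for one invocation and its token cost."""
    label: str
    tokens: int


def keep_last_n(history: list[Invocation], n: int) -> list[Invocation]:
    """Count-based filtering: keep the n most recent invocations."""
    return history[-n:]


def keep_within_budget(history: list[Invocation], max_tokens: int) -> list[Invocation]:
    """Token-aware filtering: walk backwards from the newest invocation
    and stop before the running total would exceed max_tokens."""
    kept: list[Invocation] = []
    total = 0
    for inv in reversed(history):
        if total + inv.tokens > max_tokens:
            break
        kept.append(inv)
        total += inv.tokens
    kept.reverse()  # restore chronological order
    return kept


history = [
    Invocation("greeting", 100),
    Invocation("search tool call", 10_000),
    Invocation("follow-up", 100),
    Invocation("clarification", 150),
    Invocation("summary request", 200),
]

by_count = keep_last_n(history, 5)
by_budget = keep_within_budget(history, max_tokens=8_000)

print(sum(i.tokens for i in by_count))   # total tokens kept by count
print(sum(i.tokens for i in by_budget))  # total tokens kept by budget
```

With these numbers the count-based filter keeps all five invocations, including the 10,000-token tool response, while the budget-based filter stops at the three most recent ones.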

Describe the solution you'd like
Enhance ContextFilterPlugin with token-aware and relevance-based filtering:

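from google.adk.plugins.base_plugin import BasePlugin


class SmartContextFilterPlugin(BasePlugin):
    """Intelligent context filtering based on tokens and relevance."""

    # Proposed configuration fields and their defaults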
    max_context_tokens: int = 32000
    relevance_threshold: float = 0.7
    preserve_tool_results: bool = True
    preserve_user_corrections: bool = True
    use_embeddings: bool = True  # For relevance scoring
    warn_at_percent: float = 0.8  # Warn when 80% full

  1. Token Counting: Measure the actual token count of the context before each LLM call
  2. Proactive Pruning: When the context approaches max_context_tokens, remove the least relevant events first
  3. Relevance Scoring: Use embeddings to score the relevance of each event to the current query
  4. Priority Preservation: Always preserve:
    • Tool results (critical for agent reasoning)
    • User corrections (important feedback)
    • Recent events (within the last N invocations)
  5. Early Warnings: Log a warning when the context reaches warn_at_percent of max_context_tokens (80% by default)
  6. Semantic Deduplication: Remove redundant information (e.g., repeated instructions)
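
One way the pruning, priority preservation, and early-warning steps above could fit together is sketched below. This is not ADK code: `ContextEvent`, `prune_events`, and `keep_recent` are hypothetical names, the per-event `tokens` and `relevance` values are assumed to be computed elsewhere, and "recent" is approximated by the last few events rather than the last N invocations.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("smart_context_filter")


@dataclass
class ContextEvent:
    """Hypothetical event record; not an ADK type."""
    text: str
    tokens: int
    relevance: float  # 0.0-1.0, assumed to be computed elsewhere
    is_tool_result: bool = False
    is_user_correction: bool = False


def prune_events(
    events: list[ContextEvent],
    max_context_tokens: int = 32000,
    warn_at_percent: float = 0.8,
    keep_recent: int = 2,
) -> list[ContextEvent]:
    """Drop the least relevant unprotected events until the budget is met."""
    total = sum(e.tokens for e in events)
    if total >= warn_at_percent * max_context_tokens:
        logger.warning("Context at %d of %d tokens", total, max_context_tokens)

    # Indices of events that are never dropped.
    recent_start = max(len(events) - keep_recent, 0)
    protected = {
        i for i, e in enumerate(events)
        if e.is_tool_result or e.is_user_correction or i >= recent_start
    }

    # Candidates for removal, least relevant first.
    candidates = sorted(
        (i for i in range(len(events)) if i not in protected),
        key=lambda i: events[i].relevance,
    )

    dropped: set[int] = set()
    for i in candidates:
        if total <= max_context_tokens:
            break
        dropped.add(i)
        total -= events[i].tokens

    # Preserve the original ordering of the surviving events.
    return [e for i, e in enumerate(events) if i not in dropped]
```

Note that in this sketch protected events are never removed, so the result can still exceed the budget when tool results and recent events alone are too large. A real implementation would need a fallback for that case, such as truncation or event compaction.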

Usage:

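from google.adk.agents import Agent
from google.adk.apps import App
from google.adk.plugins import SmartContextFilterPlugin  # proposed; does not exist yet

# `agent` is assumed to be an Agent instance defined elsewhere
app = App(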
    name="my_app",
    root_agent=agent,
    plugins=[
        SmartContextFilterPlugin(
            max_context_tokens=32000,
            relevance_threshold=0.7,
            preserve_tool_results=True
        )
    ]
)

Describe alternatives you've considered

  1. Manual Context Management: Users manually manage context size:

    • Requires constant monitoring
    • Error-prone
    • Doesn't scale
  2. Fixed Invocation Count: Current approach of keeping N invocations:

    • Doesn't account for token variance
    • Can still exceed limits
    • No relevance consideration
  3. Post-Processing Filtering: Filter after context is built:

    • Less efficient
    • May remove context already sent to model
    • Doesn't prevent hitting limits

Additional context

  • Long-running conversations: Sessions that accumulate many turns
  • Multi-agent systems: Shared context that needs optimization
  • Cost-sensitive deployments: Need to maximize context efficiency
  • Large tool responses: When tools return substantial data

Implementation Notes:

  • Can extend existing ContextFilterPlugin or create new plugin
  • Requires token counting utility (can reuse from context cache manager)
  • Embedding-based relevance requires embedding model (optional)
  • Should integrate with event compaction for maximum efficiency
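
To make the "optional embedding model" note concrete, here is a minimal sketch of relevance scoring with a pluggable embedding function and a dependency-free fallback. All names (`score_events`, `bag_of_words_score`, the `embed` callable) are hypothetical, and the bag-of-words fallback is only a crude stand-in for real embeddings.

```python
import math
from collections import Counter
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def bag_of_words_score(query: str, text: str) -> float:
    """Fallback when no embedding model is configured:
    cosine similarity of word-count vectors."""
    q = Counter(query.lower().split())
    t = Counter(text.lower().split())
    vocab = sorted(set(q) | set(t))
    return cosine_similarity([q[w] for w in vocab], [t[w] for w in vocab])


def score_events(
    query: str,
    event_texts: Sequence[str],
    embed: Optional[Callable[[str], Vector]] = None,
) -> list[float]:
    """Score each event against the query, using `embed` if one is provided."""
    if embed is None:
        return [bag_of_words_score(query, t) for t in event_texts]
    query_vec = embed(query)
    return [cosine_similarity(query_vec, embed(t)) for t in event_texts]


scores = score_events(
    "reset my password",
    ["how do I reset my password", "the weather is sunny today"],
)
print(scores)
```

Events scoring below `relevance_threshold` would then become the first candidates for pruning. Passing an `embed` callable backed by an actual embedding model swaps in semantic similarity without changing the rest of the flow.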

Related Code:

  • Current implementation: src/google/adk/plugins/context_filter_plugin.py
  • Token estimation: src/google/adk/models/gemini_context_cache_manager.py:314 (_estimate_request_tokens)
  • Event compaction: src/google/adk/apps/compaction.py

Priority:

High - Significant potential cost savings and better context management.

Labels: core [Component] (This issue is related to the core interface and implementation)