Description
Is your feature request related to a problem? Please describe.
Yes. The existing `ContextFilterPlugin` only keeps the last N invocations and doesn't consider:
- Actual Token Count: it counts invocations, not actual tokens, so a single invocation with a large tool response can exceed context limits
- Relevance: it removes important context while keeping less relevant content, based only on recency
- Context Window Pressure: no proactive management before hitting limits; it only filters after the fact
- Token Awareness: it can't optimize for token efficiency when the context is getting full
```python
# ContextFilterPlugin keeps the last 5 invocations,
# but one invocation might have a 10k-token tool response
# while another has just 100 tokens.
# Result: the context exceeds limits even with only 5 invocations.
```
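To make the variance concrete, here is a rough back-of-the-envelope estimate. The 4-characters-per-token heuristic and the invocation dicts are illustrative assumptions, not ADK's event model or token counter:

```python
# Illustration only: a chars/4 heuristic stands in for a real tokenizer,
# and these dicts are placeholders, not ADK's actual Event objects.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

invocations = [
    {"role": "tool", "content": "x" * 40_000},                    # ~10k tokens
    {"role": "user", "content": "Summarize the results above."},  # ~7 tokens
]

total = sum(estimate_tokens(inv["content"]) for inv in invocations)
print(total)  # Only two invocations, but already ~10k tokens: the count alone says nothing
```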
Describe the solution you'd like
Enhance `ContextFilterPlugin` with token-aware and relevance-based filtering:

```python
class SmartContextFilterPlugin(BasePlugin):
    """Intelligent context filtering based on tokens and relevance."""

    max_context_tokens: int = 32000
    relevance_threshold: float = 0.7
    preserve_tool_results: bool = True
    preserve_user_corrections: bool = True
    use_embeddings: bool = True    # For relevance scoring
    warn_at_percent: float = 0.8   # Warn when 80% full
```

- Token Counting: Monitor the actual token count of the context before each LLM call
- Proactive Pruning: When approaching `max_context_tokens`, remove the least relevant events first (see the sketch after this list)
- Relevance Scoring: Use embeddings to score the relevance of each event to the current query
- Priority Preservation: Always preserve:
  - Tool results (critical for agent reasoning)
  - User corrections (important feedback)
  - Recent events (within the last N invocations)
- Early Warnings: Log warnings when the context is 80% full
- Semantic Deduplication: Remove redundant information (e.g., repeated instructions)
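A minimal sketch of the pruning policy described above, assuming each event has already been sized and scored; the `ScoredEvent` wrapper and `prune_to_budget` helper are hypothetical names, not the existing `ContextFilterPlugin` API:

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class ScoredEvent:
    """Hypothetical wrapper pairing an event with its token count and relevance."""
    event: object
    tokens: int
    relevance: float           # 0.0-1.0, e.g. from embedding similarity
    is_tool_result: bool = False
    is_user_correction: bool = False
    is_recent: bool = False    # within the last N invocations


def prune_to_budget(events: list[ScoredEvent],
                    max_tokens: int,
                    warn_at_percent: float = 0.8) -> list[ScoredEvent]:
    """Drop the least relevant, non-protected events until the budget fits."""
    total = sum(e.tokens for e in events)
    if total >= warn_at_percent * max_tokens:
        logger.warning("context at %d/%d tokens", total, max_tokens)

    # Never drop tool results, user corrections, or recent events; everything
    # else is a pruning candidate, least relevant first.
    protected = [e for e in events
                 if e.is_tool_result or e.is_user_correction or e.is_recent]
    candidates = iter(sorted((e for e in events if e not in protected),
                             key=lambda e: e.relevance))

    dropped: set[int] = set()
    while total > max_tokens:
        e = next(candidates, None)
        if e is None:
            break  # only protected events remain; compaction would be the next step
        dropped.add(id(e))
        total -= e.tokens

    return [e for e in events if id(e) not in dropped]
```

The plugin itself would populate `ScoredEvent` from the session events (token counts from the estimation utility, relevance from embeddings) and run this before each model request, so recency behavior is kept as a floor while low-relevance bulk is trimmed first.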
Usage:

```python
from google.adk.plugins import SmartContextFilterPlugin
from google.adk import App, Agent

app = App(
    name="my_app",
    root_agent=agent,
    plugins=[
        SmartContextFilterPlugin(
            max_context_tokens=32000,
            relevance_threshold=0.7,
            preserve_tool_results=True,
        )
    ],
)
```

Describe alternatives you've considered
- Manual Context Management: users manually manage context size
  - Requires constant monitoring
  - Error-prone
  - Doesn't scale
- Fixed Invocation Count: the current approach of keeping N invocations
  - Doesn't account for token variance
  - Can still exceed limits
  - No relevance consideration
- Post-Processing Filtering: filter after the context is built
  - Less efficient
  - May remove context already sent to the model
  - Doesn't prevent hitting limits
Additional context
- Long-running conversations: Sessions that accumulate many turns
- Multi-agent systems: Shared context that needs optimization
- Cost-sensitive deployments: Need to maximize context efficiency
- Large tool responses: When tools return substantial data
Implementation Notes:
- Can extend the existing `ContextFilterPlugin` or create a new plugin
- Requires a token counting utility (can reuse the one from the context cache manager)
- Embedding-based relevance requires an embedding model (optional); see the sketch after this list
- Should integrate with event compaction for maximum efficiency
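For the optional embedding-based relevance, the scoring itself can be as simple as cosine similarity against the current query. In this sketch `embed_fn` is a placeholder for whatever embedding model is configured, not an existing ADK utility:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_relevance(event_texts: list[str], query: str, embed_fn) -> list[float]:
    """Score each event's relevance to the current query in [0, 1].

    embed_fn is a stand-in for an embedding call (e.g. a configured embedding
    model); it is not part of the existing plugin.
    """
    query_vec = embed_fn(query)
    scores = [cosine_similarity(embed_fn(text), query_vec) for text in event_texts]
    # Map the cosine range [-1, 1] to [0, 1] so scores compare to relevance_threshold.
    return [(s + 1) / 2 for s in scores]
```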
Related Code:
- Current implementation: `src/google/adk/plugins/context_filter_plugin.py`
- Token estimation: `src/google/adk/models/gemini_context_cache_manager.py:314` (`_estimate_request_tokens`)
- Event compaction: `src/google/adk/apps/compaction.py`
Priority:
High - significant potential for cost savings and improved context management.