Smart Context Pruning with Token-Aware Filtering

**Is your feature request related to a problem? Please describe.**
Yes. The existing `ContextFilterPlugin` keeps only the last N invocations and does not consider:

1. **Actual Token Count**: It counts invocations, not tokens. A single invocation with a large tool response can exceed the context limit on its own.
2. **Relevance**: Filtering is based only on recency, so important older context can be removed while less relevant recent content is kept.
3. **Context Window Pressure**: There is no proactive management before hitting limits; it only filters after the fact.
4. **No Token Awareness**: It cannot optimize for token efficiency as the context fills up.
```
# ContextFilterPlugin keeps last 5 invocations
# But one invocation might have a 10k token tool response
# While another has just 100 tokens
# Result: Context exceeds limits even with only 5 invocations
```
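
To make the mismatch concrete, here is a minimal sketch comparing an invocation count with a rough token estimate. The `estimate_tokens` helper and its four-characters-per-token ratio are assumptions for illustration only; they are not ADK's tokenizer, and real counts will differ by model.

```python
# Hypothetical illustration: invocation count vs. estimated token usage.
# The 4-chars-per-token ratio is a rough heuristic, not ADK's real tokenizer.

def estimate_tokens(text: str) -> int:
    """Rough token estimate: about one token per four characters."""
    return max(1, len(text) // 4)

# Five invocations, one of which carries a large tool response.
invocations = [
    "What is the weather in Paris?",
    "tool_response: " + "x" * 40_000,  # large payload
    "Thanks, and in Berlin?",
    "tool_response: sunny, 21C",
    "Great, book the trip.",
]

per_invocation = [estimate_tokens(text) for text in invocations]
total_tokens = sum(per_invocation)
largest_share = max(per_invocation) / total_tokens

print(f"invocations kept: {len(invocations)}")
print(f"estimated tokens: {total_tokens}")
print(f"share from largest invocation: {largest_share:.0%}")
```

A count-based filter treats all five invocations equally, even though nearly all of the estimated tokens come from one of them.
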
**Describe the solution you'd like**
Enhance `ContextFilterPlugin` with token-aware and relevance-based filtering:
```python
from google.adk.plugins.base_plugin import BasePlugin


class SmartContextFilterPlugin(BasePlugin):
    """Intelligent context filtering based on tokens and relevance."""

    max_context_tokens: int = 32000
    relevance_threshold: float = 0.7
    preserve_tool_results: bool = True
    preserve_user_corrections: bool = True
    use_embeddings: bool = True  # For relevance scoring
    warn_at_percent: float = 0.8  # Warn when 80% full
```
1. **Token Counting**: Measure the actual token count of the context before each LLM call.
2. **Proactive Pruning**: When the context approaches `max_context_tokens`, remove the least relevant events first.
3. **Relevance Scoring**: Use embeddings to score the relevance of each event to the current query.
4. **Priority Preservation**: Always preserve:
   - Tool results (critical for agent reasoning)
   - User corrections (important feedback)
   - Recent events (within the last N invocations)
5. **Early Warnings**: Log a warning when the context reaches `warn_at_percent` of `max_context_tokens` (80% by default).
6. **Semantic Deduplication**: Remove redundant information (e.g., repeated instructions).
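
The pruning, preservation, and warning steps above could be sketched roughly as follows. This is not ADK code: the `Event` dataclass, `estimate_tokens`, `prune_events`, and the `keep_recent` parameter are hypothetical names, relevance scores are assumed to be computed elsewhere, and the token estimate is a crude character-based heuristic.

```python
# Hypothetical sketch of token-budgeted pruning; not ADK's implementation.
import logging
from dataclasses import dataclass

logger = logging.getLogger("smart_context_filter")


@dataclass
class Event:
    """Simplified stand-in for a session event (assumed shape)."""
    text: str
    relevance: float  # 0.0 to 1.0, assumed to be scored elsewhere
    is_tool_result: bool = False
    is_user_correction: bool = False


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about one token per four characters."""
    return max(1, len(text) // 4)


def prune_events(events, max_context_tokens=32000, warn_at_percent=0.8,
                 keep_recent=2):
    """Drop the least relevant unprotected events until the budget fits."""
    tokens = [estimate_tokens(e.text) for e in events]
    total = sum(tokens)
    if total >= warn_at_percent * max_context_tokens:
        logger.warning("Context at %d of %d tokens", total, max_context_tokens)

    first_recent = max(0, len(events) - keep_recent)
    # Removal candidates: older events that are not protected by type.
    removable = [
        i for i, e in enumerate(events)
        if i < first_recent
        and not (e.is_tool_result or e.is_user_correction)
    ]
    removable.sort(key=lambda i: events[i].relevance)  # least relevant first

    dropped = set()
    for i in removable:
        if total <= max_context_tokens:
            break
        dropped.add(i)
        total -= tokens[i]
    # Surviving events keep their original order.
    return [e for i, e in enumerate(events) if i not in dropped]


events = [
    Event("old chit-chat " * 50, relevance=0.1),
    Event("tool output " * 50, relevance=0.2, is_tool_result=True),
    Event("background detail " * 50, relevance=0.5),
    Event("latest question", relevance=0.9),
]
kept = prune_events(events, max_context_tokens=400, keep_recent=1)
```

Note that if the protected events alone exceed the budget, this sketch still returns an over-budget context; a real implementation would need a fallback such as truncating or summarizing large tool results.
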

### Usage:
```python
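from google.adk.agents import Agent
from google.adk.apps import App
from google.adk.plugins import SmartContextFilterPlugin  # proposed plugin

# `agent` is assumed to be an Agent instance defined elsewhere.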
app = App(
    name="my_app",
    root_agent=agent,
    plugins=[
        SmartContextFilterPlugin(
            max_context_tokens=32000,
            relevance_threshold=0.7,
            preserve_tool_results=True
        )
    ]
)
```
**Describe alternatives you've considered**
1. **Manual Context Management**: Users manually manage context size:
   - Requires constant monitoring
   - Error-prone
   - Doesn't scale

2. **Fixed Invocation Count**: Current approach of keeping N invocations:
   - Doesn't account for token variance
   - Can still exceed limits
   - No relevance consideration

3. **Post-Processing Filtering**: Filter after context is built:
   - Less efficient
   - May remove context already sent to model
   - Doesn't prevent hitting limits

**Additional context**
This feature would be most useful for:

- **Long-running conversations**: Sessions that accumulate many turns
- **Multi-agent systems**: Shared context that needs optimization
- **Cost-sensitive deployments**: Need to maximize context efficiency
- **Large tool responses**: When tools return substantial data

### Implementation Notes:
- Can extend the existing `ContextFilterPlugin` or be implemented as a new plugin
- Requires a token counting utility (the one in the context cache manager can be reused)
- Embedding-based relevance requires an embedding model (optional)
- Should integrate with event compaction for maximum efficiency
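
As a rough sketch of how relevance scoring could stay optional, the scorer below accepts any embedding function and falls back to a toy bag-of-words vector. The names `bag_of_words`, `cosine_similarity`, and `score_relevance` are hypothetical, and the fallback is only a stand-in for a real embedding model.

```python
# Hypothetical sketch: relevance scoring with a pluggable embedding function.
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def bag_of_words(text: str) -> Vector:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return dict(Counter(text.lower().split()))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_relevance(
    query: str,
    events: List[str],
    embed: Callable[[str], Vector] = bag_of_words,
) -> List[float]:
    """Score each event's text against the current query."""
    query_vec = embed(query)
    return [cosine_similarity(query_vec, embed(e)) for e in events]


scores = score_relevance(
    "flight to berlin",
    ["book a flight to berlin", "the weather is sunny"],
)
print(scores)
```

Events scoring below `relevance_threshold` would then be the first candidates for removal; swapping `embed` for a real model changes the scores but not the interface.
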

### Related Code:
- Current implementation: `src/google/adk/plugins/context_filter_plugin.py`
- Token estimation: `src/google/adk/models/gemini_context_cache_manager.py:314` (`_estimate_request_tokens`)
- Event compaction: `src/google/adk/apps/compaction.py`

### Priority:
**High** - Significant cost savings potential and improved context management.

Smart Context Pruning with Token-Aware Filtering #3829