# Feature Request: Add Structured Tagging & Labeling Layer to AMB for Distillery / Model Training

## Summary

Add a structured tagging and labeling layer to AMB so it can serve not only as a runtime memory bridge, but also as a high-quality source of distillery, evaluation, and specialty-model training data.

The goal is to make AMB-generated memories and interaction traces more useful, auditable, filterable, and safe for downstream training workflows.

## Background

AMB is being defined as the **Agent Memory Bridge**: a runtime memory architecture that selects, retrieves, and bridges relevant context during agent interactions.

As we explore building specialized model layers on top of open-source/open-weight models, AMB can become one of the most important data sources for training. However, raw memory records or chat logs are not sufficient. We need structured metadata that describes what the data is, where it came from, how reliable it is, how fresh it is, and whether it can safely be used for training.

## Problem

Current AMB memory/context records may support basic memory bridging, but they likely do not yet support the richer labels needed for distillery-quality training data.

Without structured tags and labels, we risk:

- Mixing stable principles with temporary state
- Training on stale or superseded information
- Training on private or sensitive data by accident
- Losing feedback signals from user confirmations or corrections
- Making it hard to build eval datasets from real AMB traces
- Making model training noisy, unsafe, or difficult to audit

## Proposed Feature

Add a structured tagging and labeling system to AMB records and AMB interaction traces.

Example AMB record:

```json
{
  "memory_id": "amb_001",
  "content": "AMB should be treated as a runtime memory bridge, not an authority/rules layer.",
  "type": "architecture_principle",
  "scope": "WandersCop",
  "source": "user_confirmed",
  "confidence": 0.95,
  "stability": "high",
  "freshness": "current",
  "privacy": "internal",
  "training_eligible": true,
  "training_use": ["architecture_reasoning", "retrieval_eval", "style_alignment"],
  "tags": ["AMB", "memory", "runtime", "distillery"],
  "created_at": "...",
  "last_confirmed_at": "...",
  "supersedes": []
}
```
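As a rough illustration of how a record like the one above might be loaded in code, here is a minimal Python sketch. The class name `AmbRecord`, the helper `record_from_json`, the choice of which fields are optional, and the conservative `training_eligible = False` default are all assumptions made for illustration, not part of any existing AMB API.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AmbRecord:
    """Hypothetical in-memory form of the example AMB record."""
    memory_id: str
    content: str
    type: str
    scope: str
    source: str
    confidence: float
    stability: str
    freshness: str
    privacy: str
    # Assumption: a record is not trainable unless it says so explicitly.
    training_eligible: bool = False
    training_use: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    created_at: Optional[str] = None
    last_confirmed_at: Optional[str] = None
    supersedes: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: confidence is a probability-like score in [0, 1].
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def record_from_json(text: str) -> AmbRecord:
    """Parse one JSON object into an AmbRecord.

    Unknown keys raise TypeError, which surfaces schema drift early.
    """
    return AmbRecord(**json.loads(text))
```

Rejecting unknown keys is a deliberate choice in this sketch: it keeps records auditable, at the cost of requiring a schema update whenever a new field is introduced.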

## Suggested Label Categories

### 1. Type Labels

Examples:

- `architecture_principle`
- `project_state`
- `user_preference`
- `workflow_pattern`
- `tool_usage_trace`
- `decision_record`
- `correction`
- `temporary_context`

### 2. Source Labels

Examples:

- `user_confirmed`
- `internal_doc`
- `tool_result`
- `model_generated`
- `inferred`
- `external_source`

### 3. Stability / Freshness Labels

Examples:

- `static_principle`
- `slowly_changing`
- `dynamic_state`
- `temporary`
- `stale`
- `superseded`

### 4. Quality Labels

Examples:

- `accepted_by_user`
- `corrected_by_user`
- `rejected`
- `needs_review`
- `gold_sample`
- `uncertain`

### 5. Privacy / Training Eligibility Labels

Examples:

- `trainable`
- `eval_only`
- `retrieval_only`
- `do_not_train`
- `private_internal`
- `sensitive`
- `needs_anonymization`

### 6. Task Labels

Examples:

- `architecture_reasoning`
- `memory_update`
- `tool_routing`
- `internal_docs_qa`
- `code_planning`
- `customer_support`
- `product_strategy`
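
One possible way to keep these categories consistent is a shared vocabulary that rejects unknown labels at write time. The sketch below is only an illustration: the category keys (`type`, `source`, `stability`, `quality`, `eligibility`, `task`) and the function name `unknown_labels` are hypothetical, and it treats the example lists above as a closed vocabulary, which the issue does not require.

```python
from typing import Dict, FrozenSet, List

# Label values copied from the example lists in this issue.
# The category keys are assumptions made for this sketch.
LABEL_VOCABULARY: Dict[str, FrozenSet[str]] = {
    "type": frozenset({
        "architecture_principle", "project_state", "user_preference",
        "workflow_pattern", "tool_usage_trace", "decision_record",
        "correction", "temporary_context",
    }),
    "source": frozenset({
        "user_confirmed", "internal_doc", "tool_result",
        "model_generated", "inferred", "external_source",
    }),
    "stability": frozenset({
        "static_principle", "slowly_changing", "dynamic_state",
        "temporary", "stale", "superseded",
    }),
    "quality": frozenset({
        "accepted_by_user", "corrected_by_user", "rejected",
        "needs_review", "gold_sample", "uncertain",
    }),
    "eligibility": frozenset({
        "trainable", "eval_only", "retrieval_only", "do_not_train",
        "private_internal", "sensitive", "needs_anonymization",
    }),
    "task": frozenset({
        "architecture_reasoning", "memory_update", "tool_routing",
        "internal_docs_qa", "code_planning", "customer_support",
        "product_strategy",
    }),
}


def unknown_labels(labels: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the labels that are not in the vocabulary, per category.

    An unknown category is reported with all of its values.
    An empty result means every label was recognized.
    """
    problems: Dict[str, List[str]] = {}
    for category, values in labels.items():
        allowed = LABEL_VOCABULARY.get(category)
        if allowed is None:
            problems[category] = list(values)
            continue
        bad = [value for value in values if value not in allowed]
        if bad:
            problems[category] = bad
    return problems
```

Returning the offending labels instead of raising lets a caller decide whether to reject the record or route it to `needs_review`.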

## Training Usage Rules

AMB should help determine how each record can be used downstream.

### Trainable

Stable and reusable patterns, such as:

- Architecture principles
- Repeated workflow patterns
- Preferred response structures
- Tool-routing patterns
- Generalized reasoning patterns

### Eval-only

Useful for testing behavior but not suitable for model-weight training.

Examples:

- Sensitive internal examples
- Real project scenarios
- Private customer-like workflows

### Retrieval-only

Should stay in AMB/state/database and not be absorbed into model weights.

Examples:

- Live project state
- Recent decisions
- Customer-specific information
- User-specific private context

### Do-not-train

Should not be used for model training.

Examples:

- Sensitive personal data
- Confidential data
- Temporary state
- Credentials or access information
- Legally restricted information
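
When a record carries more than one usage label, some rule has to pick a single outcome. The sketch below shows one option: choose the most restrictive label present and fall back to `do_not_train` when none is set. Both the precedence order and the fail-closed default are assumptions for illustration; the issue does not specify them, and `resolve_training_usage` is a hypothetical name.

```python
from typing import Iterable

# Assumed precedence, most restrictive first.
USAGE_PRECEDENCE = ("do_not_train", "retrieval_only", "eval_only", "trainable")

# Assumed default for records with no usage label at all (fail closed).
DEFAULT_USAGE = "do_not_train"


def resolve_training_usage(labels: Iterable[str]) -> str:
    """Return the single training-usage bucket for a record.

    Labels outside the four usage buckets are ignored here.
    """
    present = set(labels)
    for usage in USAGE_PRECEDENCE:
        if usage in present:
            return usage
    return DEFAULT_USAGE
```

For example, a record tagged both `trainable` and `eval_only` resolves to `eval_only` under this ordering, so a single cautious label is enough to keep a record out of model weights.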

## Proposed AMB Direction

```text
AMB v1:
Runtime memory bridge

AMB v1.4 / v1.5:
Runtime memory bridge + dedicated state/memory separation

AMB v2:
Runtime memory bridge + labeled trace store + training-data refinery
```

## Why This Matters

This feature is foundational for a future distillery pipeline.

With structured AMB tags, we can later build:

- Better retrieval
- Better memory selection
- Training datasets
- Evaluation datasets
- Router models
- Verifier models
- Fine-tuning pipelines
- Safer internal model training
- Better distinction between stable knowledge and dynamic state

Long-term architecture:

```text
User interaction
  ↓
AMB runtime context selection
  ↓
Model / agent response
  ↓
User feedback or correction
  ↓
Labeled AMB trace
  ↓
Distillery pipeline
  ↓
Training / eval dataset
  ↓
Specialty model improvement
```
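
The "User feedback or correction" to "Labeled AMB trace" step could look roughly like the following Python sketch. Everything here is hypothetical: the `Trace` shape, the feedback names (`accept`, `correct`, `reject`), the starting confidence, and the confidence adjustments are placeholder assumptions, and only the quality label names come from this issue.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Trace:
    """Hypothetical interaction trace awaiting a quality label."""
    trace_id: str
    response: str
    quality: str = "needs_review"
    confidence: float = 0.5  # assumed neutral starting point


# Quality labels are taken from the issue; the feedback keys are assumptions.
FEEDBACK_TO_QUALITY = {
    "accept": "accepted_by_user",
    "correct": "corrected_by_user",
    "reject": "rejected",
}

# Placeholder adjustments; real values would need tuning.
CONFIDENCE_DELTA = {"accept": 0.2, "correct": -0.2, "reject": -0.4}


def apply_feedback(trace: Trace, feedback: str) -> Trace:
    """Return a new trace with quality and confidence updated from feedback."""
    if feedback not in FEEDBACK_TO_QUALITY:
        raise ValueError(f"unknown feedback: {feedback}")
    adjusted = trace.confidence + CONFIDENCE_DELTA[feedback]
    clamped = min(1.0, max(0.0, adjusted))  # keep confidence within [0, 1]
    return replace(trace, quality=FEEDBACK_TO_QUALITY[feedback], confidence=clamped)
```

Returning a new `Trace` rather than mutating the old one keeps the pre-feedback state available for auditing, which seems in keeping with the auditability goal stated in the Summary.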

## Acceptance Criteria

- [ ] AMB memory records support structured metadata fields.
- [ ] AMB traces can be labeled by type, source, stability, freshness, quality, privacy, and training eligibility.
- [ ] Records can be marked as `trainable`, `eval_only`, `retrieval_only`, or `do_not_train`.
- [ ] AMB supports supersession or stale-state handling.
- [ ] User-confirmed corrections can update confidence and quality labels.
- [ ] Training/export pipelines can filter records based on eligibility labels.
- [ ] Sensitive/private records are excluded from training by default.
- [ ] Minimal schema is implemented first, with room to expand later.
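
To make the supersession and export-filter criteria more concrete, here is a small Python sketch of a training export that skips superseded, stale, ineligible, and privacy-restricted records. The function names and the `EXCLUDED_PRIVACY` and `EXCLUDED_FRESHNESS` sets are assumptions; records are plain dicts shaped like the example AMB record earlier in this issue.

```python
from typing import Any, Dict, Iterable, List, Set

Record = Dict[str, Any]

# Assumed privacy values that block training export by default.
EXCLUDED_PRIVACY = frozenset({"private_internal", "sensitive", "needs_anonymization"})

# Assumed freshness values that block training export.
EXCLUDED_FRESHNESS = frozenset({"stale", "superseded"})


def superseded_ids(records: Iterable[Record]) -> Set[str]:
    """Collect every memory_id named in some record's `supersedes` list."""
    retired: Set[str] = set()
    for record in records:
        retired.update(record.get("supersedes", []))
    return retired


def export_for_training(records: List[Record]) -> List[Record]:
    """Return only records that pass every eligibility check."""
    retired = superseded_ids(records)
    exported: List[Record] = []
    for record in records:
        if record["memory_id"] in retired:
            continue  # replaced by a newer record
        if record.get("training_eligible") is not True:
            continue  # a missing or non-boolean flag counts as ineligible
        if record.get("privacy") in EXCLUDED_PRIVACY:
            continue
        if record.get("freshness") in EXCLUDED_FRESHNESS:
            continue
        exported.append(record)
    return exported
```

The filter fails closed: a record with no `training_eligible` field is dropped, which matches the criterion that sensitive or private records are excluded by default.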

## Minimal First Version

Start with a small schema:

```json
{
  "type": "architecture_principle",
  "source": "user_confirmed",
  "scope": "WandersCop",
  "stability": "high",
  "privacy": "internal",
  "training_eligible": true,
  "tags": ["AMB", "distillery"]
}
```

Then expand later into:

- Quality labels
- Feedback labels
- Supersession chains
- Freshness checks
- Anonymization rules
- Eval bucket assignment
- Distillery export filters
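
A first implementation of the small starting schema could be guarded by a simple field check like the sketch below. The required-field list mirrors the minimal JSON example in this section; treating every one of those fields as required, and the name `validate_minimal`, are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Field names come from the minimal schema example; treating all of them
# as required is an assumption of this sketch.
REQUIRED_FIELDS = {
    "type": str,
    "source": str,
    "scope": str,
    "stability": str,
    "privacy": str,
    "training_eligible": bool,
    "tags": list,
}


def validate_minimal(meta: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the metadata passes."""
    errors: List[str] = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in meta:
            errors.append(f"missing field: {name}")
        elif not isinstance(meta[name], expected):
            actual = type(meta[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    return errors
```

Extra fields are allowed on purpose, so records written against this minimal check stay valid as the schema expands.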

## Notes

This should be treated as a core AMB capability, not a side feature.

If AMB is going to become the foundation for future internal model training and distillery loops, then structured tagging and labeling need to be designed early. Labels retrofitted later will leave old traces messy, unsafe, and harder to trust.
