[ab-advisor] Improve experiment infrastructure: schema, reporting & audit #38826

### 🔬 Experiment Infrastructure Improvements

Sub-issue of #38825 | Triggered by `ab-testing-advisor` on 2026-06-12

---

### Area 1: Frontmatter Schema — `notify` Alert-Posting Gap

The `field-presence-checker` confirms that `analysis_type` and `tags` are **fully implemented** end-to-end (parsed in Go → marshalled into `GH_AW_EXPERIMENT_SPEC` → rendered in the JS step summary). `notify`, however, is only **partially implemented**. The field is parsed in Go (`compiler_experiments.go` lines 201–214) and read by the JS picker to build `notifyTargets` (`pick_experiment.cjs` lines 242–246), but those targets are **only displayed as text in the step summary**: no code posts an alert to the referenced discussion or issue when an experiment concludes.

**Proposed fix in `actions/setup/js/pick_experiment.cjs`:**

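```javascript
// After the step summary is written, check whether the experiment has concluded.
// Assumes `octokit` and `context` (from @actions/github) are in scope.
const { owner, repo } = context.repo;
const minSamples = cfg.min_samples ?? 0;
const counts = Object.values(variantCounts);
// Require a positive min_samples so an unset value never triggers a post.
const minSamplesReached = minSamples > 0 && counts.length > 0 &&
  counts.every(count => count >= minSamples);

// Note: without a "notified" flag in state.json this would post on every later run.
if (minSamplesReached && notifyTargets.length > 0) {
  const summary = buildResultsSummary(variantCounts, cfg); // winner, effect size, p-value
  for (const target of notifyTargets) {
    if (target.type === 'discussion') {
      // Discussion comments go through GraphQL, which needs the discussion's node ID.
      const { repository } = await octokit.graphql(
        `query($owner: String!, $repo: String!, $number: Int!) {
           repository(owner: $owner, name: $repo) { discussion(number: $number) { id } }
         }`, { owner, repo, number: target.id });
      await octokit.graphql(
        `mutation($discussionId: ID!, $body: String!) {
           addDiscussionComment(input: { discussionId: $discussionId, body: $body }) { comment { url } }
         }`, { discussionId: repository.discussion.id, body: summary });
    } else if (target.type === 'issue') {
      await octokit.rest.issues.createComment({ owner, repo, issue_number: target.id, body: summary });
    }
  }
}
```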

This requires `pick_experiment.cjs` to:
1. Track `variantCounts` across runs (already available via `state.json`)
2. Build a results summary when `min_samples` is reached for all variants
3. Post the comment via the GitHub API (REST `issues.createComment` for issues, GraphQL `addDiscussionComment` for discussions), using the existing Octokit instance if available, or via a safe-output tool
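
The first two requirements can be sketched as pure helpers, which keeps the posting decision testable without network access. This is a minimal sketch, not the picker's actual code: `hasReachedMinSamples` and `buildCountsSummary` are hypothetical names, and the shape of `variantCounts` (variant name mapped to run count) is assumed from the snippet above. The summary here only lists sample counts; winner, effect size, and p-value would come from the report pipeline.

```javascript
// Hypothetical helpers; names and the variantCounts shape are assumptions.

// True only when min_samples is a positive number and every variant has reached it.
function hasReachedMinSamples(variantCounts, minSamples) {
  const counts = Object.values(variantCounts);
  return Number.isFinite(minSamples) && minSamples > 0 &&
    counts.length > 0 && counts.every((count) => count >= minSamples);
}

// Renders per-variant sample counts as a Markdown table for the comment body.
function buildCountsSummary(experimentName, variantCounts, minSamples) {
  const rows = Object.entries(variantCounts)
    .map(([variant, count]) => `| ${variant} | ${count} | ${minSamples} |`);
  return [
    `### Experiment \`${experimentName}\` reached min_samples`,
    '',
    '| Variant | Samples | min_samples |',
    '|---|---|---|',
    ...rows,
  ].join('\n');
}

// Usage: only build the comment body once every variant has enough samples.
const exampleCounts = { detailed: 21, concise: 20 };
if (hasReachedMinSamples(exampleCounts, 20)) {
  console.log(buildCountsSummary('prompt_style', exampleCounts, 20));
}
```

Treating an unset or zero `min_samples` as "not reached" avoids posting a conclusion for experiments that never declared a sample target.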

### Area 2: Reporting & Dashboards

<details><summary>Proposed daily experiment report pipeline</summary>

The existing `daily-experiment-report` workflow can be enhanced to provide a full analytics pipeline:

**Step 1 — Aggregate run artifacts**
```bash
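# `--workflow` takes a single workflow name or file, not a glob, so list recent
# completed runs across all workflows and skip those without the artifact.
gh run list --status completed --json databaseId --limit 200 | \
  jq -r '.[].databaseId' | xargs -I{} \
  gh run download {} --name experiments-state --dir /tmp/experiments/{} 2>/dev/null || true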
```

**Step 2 — Compute running statistics per variant**
For each experiment found across all `state.json` files:
- Group samples by `variant`
- Compute: `mean`, `variance`, `sample_count` for `metric` and each `secondary_metric`
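
The grouping and per-variant statistics can be sketched as a single pass over the samples. This is an illustrative sketch only: `summarizeByVariant` is a hypothetical name, and the flat sample shape `{ variant, value }` is an assumption, since the actual layout of `state.json` is not shown here. It uses Welford's online update so the variance stays numerically stable as samples accumulate.

```javascript
// Hypothetical sample shape: { variant: string, value: number }.
function summarizeByVariant(samples) {
  const stats = new Map();
  for (const { variant, value } of samples) {
    // Welford's online update: running mean and sum of squared deviations (m2).
    const s = stats.get(variant) ?? { n: 0, mean: 0, m2: 0 };
    s.n += 1;
    const delta = value - s.mean;
    s.mean += delta / s.n;
    s.m2 += delta * (value - s.mean);
    stats.set(variant, s);
  }
  const result = {};
  for (const [variant, { n, mean, m2 }] of stats) {
    // Unbiased sample variance; not defined for a single sample, so report NaN.
    result[variant] = { sample_count: n, mean, variance: n > 1 ? m2 / (n - 1) : NaN };
  }
  return result;
}

// Usage: one call per metric, with samples gathered from all downloaded state files.
console.log(summarizeByVariant([
  { variant: 'detailed', value: 0.94 },
  { variant: 'concise', value: 0.92 },
  { variant: 'detailed', value: 0.96 },
]));
```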

**Step 3 — Apply significance test based on `analysis_type`**
| `analysis_type` | Test applied |
|---|---|
| `t_test` | Welch's t-test on metric means |
| `mann_whitney` | Mann-Whitney U on metric distributions |
| `proportion_test` | Two-proportion z-test (binary outcomes) |
| `bayesian_ab` | Beta-binomial posterior P(B > A) |

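Significance threshold: α = 0.05. Post the conclusion comment once the test's p-value falls below α; `bayesian_ab` yields a posterior probability rather than a p-value, so it needs its own decision threshold.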
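
As an illustration of the `t_test` row, Welch's statistic and its Welch-Satterthwaite degrees of freedom can be computed directly from the per-variant summaries of Step 2. This is a sketch under assumptions: `welchT` is a hypothetical name, and the input shape `{ mean, variance, sample_count }` mirrors the Step 2 fields. Converting `t` and `df` into a p-value requires a Student's t CDF, which is left to a statistics library and not shown.

```javascript
// Welch's t statistic and Welch-Satterthwaite degrees of freedom.
// Inputs are per-variant summaries: { mean, variance, sample_count }.
// Each variant needs at least two samples and a non-zero pooled variance.
function welchT(a, b) {
  const va = a.variance / a.sample_count; // squared standard error of mean a
  const vb = b.variance / b.sample_count; // squared standard error of mean b
  const t = (a.mean - b.mean) / Math.sqrt(va + vb);
  const df = (va + vb) ** 2 /
    (va ** 2 / (a.sample_count - 1) + vb ** 2 / (b.sample_count - 1));
  return { t, df };
}

// Usage with illustrative numbers (not real experiment data).
const { t, df } = welchT(
  { mean: 10, variance: 4, sample_count: 10 },
  { mean: 8, variance: 9, sample_count: 10 },
);
console.log(t.toFixed(3), df.toFixed(2));
```

Unlike Student's pooled test, this form does not assume the two variants share a variance, which suits variants with different sample counts.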

**Step 4 — ASCII table in step summary**
```
experiment: prompt_style (daily-cli-performance)
variant     n    regression_accuracy    ai_credits    winner?
detailed    14   94.3% ± 3.1%          8 420 ± 620   (baseline)
concise     11   91.8% ± 4.2%          6 210 ± 510   ✓ p=0.031
```
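
A layout like the one above can be produced with a small column-padding helper. This is a sketch, assuming each row arrives as an array of already-formatted cells; `renderAsciiTable` is a hypothetical name.

```javascript
// Pads every cell to the widest entry in its column and joins columns with two spaces.
function renderAsciiTable(headers, rows) {
  const table = [headers, ...rows].map((row) => row.map(String));
  const widths = headers.map((_, col) => Math.max(...table.map((row) => row[col].length)));
  return table
    .map((row) => row.map((cell, col) => cell.padEnd(widths[col])).join('  ').trimEnd())
    .join('\n');
}

// Usage: the result can be wrapped in a code fence before it is written to the step summary.
console.log(renderAsciiTable(
  ['variant', 'n'],
  [['detailed', 14], ['concise', 11]],
));
```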

**Step 5 — Discussion post when significant**
Post to the discussion referenced in `experiments.<name>.notify.discussion` once `min_samples` is reached and p-value < α.

</details>

### Area 3: Audit & OTEL Integration

<details><summary>Proposed experiment observability changes</summary>

**OTEL resource attributes** (add to `pick_experiment.cjs` immediately after variant assignment):
```javascript
const existingAttrs = process.env.OTEL_RESOURCE_ATTRIBUTES ?? '';
// Percent-encode values: ',' and '=' are delimiters in OTEL_RESOURCE_ATTRIBUTES.
const experimentAttrs = [
  `experiment.name=${encodeURIComponent(experimentName)}`,
  `experiment.variant=${encodeURIComponent(chosenVariant)}`,
  `experiment.run_index=${runIndex}`
].join(',');
core.exportVariable('OTEL_RESOURCE_ATTRIBUTES',
  existingAttrs ? `${existingAttrs},${experimentAttrs}` : experimentAttrs);
```
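Because `core.exportVariable` writes to the job environment, spans emitted by later steps in the same job carry `experiment.name`, `experiment.variant`, and `experiment.run_index` (provided their OTEL SDK reads `OTEL_RESOURCE_ATTRIBUTES`), enabling Honeycomb/Jaeger slice-and-dice by experiment assignment without any workflow changes.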
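
One case the snippet above leaves open is a key that already exists in `OTEL_RESOURCE_ATTRIBUTES`, for example if the picker runs twice in one job: plain concatenation would then emit the key twice. A minimal sketch of a merge helper that replaces existing keys instead; `mergeResourceAttributes` is a hypothetical name, not part of `@actions/core` or any OTEL SDK.

```javascript
// Hypothetical helper: merges new attributes into an existing
// OTEL_RESOURCE_ATTRIBUTES string, replacing keys that are already present.
function mergeResourceAttributes(existing, additions) {
  const merged = new Map();
  for (const pair of existing.split(',')) {
    const eq = pair.indexOf('=');
    // Skip empty or malformed entries; existing values are kept as already encoded.
    if (eq > 0) merged.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
  }
  for (const [key, value] of Object.entries(additions)) {
    // Percent-encode so that ',' and '=' inside a value cannot break the list.
    merged.set(key, encodeURIComponent(String(value)));
  }
  return [...merged].map(([key, value]) => `${key}=${value}`).join(',');
}

// Usage: the result is what would be passed to core.exportVariable.
console.log(mergeResourceAttributes('service.name=gh-aw', {
  'experiment.name': 'prompt_style',
  'experiment.variant': 'concise',
}));
```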

**`gh aw audit` output enrichment**: Surface experiment assignment as a structured block:
```json
{
  "experiment": {
    "name": "prompt_style",
    "variant": "concise",
    "run_index": 11,
    "min_samples": 20,
    "progress": "55%",
    "assigned_at": "2026-06-12T11:38:00Z"
  }
}
```

**Step summary progress bar** (already partially done in `pick_experiment.cjs`): Add the `run_index / min_samples` ratio and an estimated time to conclusion:
```
| Progress | 11 / 20 runs (55%) — est. 3–4 weeks to conclusion |
```
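
The ratio and the time estimate can be derived with a small helper. This is a sketch under assumptions: `estimateProgress` is a hypothetical name, and `runsPerDay` is an assumed input that would have to be measured from how often the variant is actually assigned, so the result is only as good as that rate.

```javascript
// Hypothetical helper; runsPerDay is an assumed input (e.g. 1 for one sample per day).
function estimateProgress(runIndex, minSamples, runsPerDay) {
  // Cap at 100% so experiments that run past min_samples do not overshoot.
  const percent = Math.min(100, Math.round((runIndex / minSamples) * 100));
  const remainingRuns = Math.max(0, minSamples - runIndex);
  const daysRemaining = Math.ceil(remainingRuns / runsPerDay);
  return { percent, remainingRuns, daysRemaining };
}

// Usage: build the summary row from the computed values.
const p = estimateProgress(11, 20, 1);
console.log(`| Progress | 11 / 20 runs (${p.percent}%), est. ${p.daysRemaining} days to conclusion |`);
```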

**Audit log filtering**: Enable `gh aw audit --experiment prompt_style --variant concise` to return only runs matching that assignment, making it easy to compare failure modes between variants.

</details>

### Implementation Steps

- [ ] Implement `notify` alert-posting in `pick_experiment.cjs` when `minSamplesReached && notifyTargets.length > 0`
- [ ] Add `experiment.name` + `experiment.variant` + `experiment.run_index` as OTEL resource attributes in `pick_experiment.cjs`
- [ ] Add experiment assignment JSON block to `gh aw audit` output
- [ ] Add `run_index / min_samples` progress bar and estimated-days-to-conclusion to step summary
- [ ] Enhance `daily-experiment-report` to aggregate artifacts, compute statistics, apply `analysis_type` test, and post significance results
- [ ] Add `--experiment` / `--variant` filter flags to `gh aw audit`

### References

- Compiler: `pkg/workflow/compiler_experiments.go`
- Picker: `actions/setup/js/pick_experiment.cjs`
- Report workflow: `.github/workflows/daily-experiment-report.md`
- Campaign issue: #38825

Related to #38825

> Generated by [🧪 Daily A/B Testing Advisor](https://github.com/github/gh-aw/actions/runs/27413173835) · 395.6 AIC · ⌖ 21.6 AIC · ⊞ 22.4K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fab-testing-advisor%22&type=issues)
> - [x] expires on Jun 26, 2026, 3:46 AM UTC-08:00
