AgentXchain v2.102.0
v2.102.0 ships a named benchmark workload catalog with topology-aware discovery and regression-triggering proof.
Released: 2026-04-15
Named Benchmark Workload Catalog
agentxchain benchmark now uses a named workload catalog instead of boolean mode flags. Each workload declares its own phase topology, expected governance signals, and proof expectations.
Built-in Workloads
| Workload | Phases | Purpose |
|---|---|---|
| baseline | planning → implementation → qa | Standard governed run (happy-path proof) |
| stress | planning → implementation → qa | Admission control with validator rejection and retry |
| completion-recovery | planning → implementation → qa | Gate failure on missing artifact, reassignment, and recovery |
| phase-drift | planning → design → implementation → qa | Extra phase topology; triggers REG-PHASE-ORDER-001 when diffed against baseline |
Workload Selection
# Run a specific workload
agentxchain benchmark --workload completion-recovery
# Legacy alias still works
agentxchain benchmark --stress
# equivalent to: agentxchain benchmark --workload stress
# Conflicting flags fail closed
agentxchain benchmark --stress --workload baseline # → error
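The selection rules above can be sketched as a small resolver. This is an illustrative Python sketch, not the CLI's actual implementation: `resolve_workload`, `WorkloadFlagError`, and `KNOWN_WORKLOADS` are hypothetical names, and two behaviors are assumptions not stated in these notes (that `baseline` is the default when no flag is given, and that `--stress --workload stress` is accepted as redundant rather than rejected).

```python
from typing import Optional

# Hypothetical names: the CLI's internals are not shown in these notes.
KNOWN_WORKLOADS = ("baseline", "stress", "completion-recovery", "phase-drift")


class WorkloadFlagError(ValueError):
    """Raised when workload flags are unknown or contradict each other."""


def resolve_workload(workload: Optional[str] = None, stress: bool = False) -> str:
    """Map --workload / --stress style inputs to exactly one workload name."""
    if workload is not None and workload not in KNOWN_WORKLOADS:
        raise WorkloadFlagError(f"unknown workload: {workload}")
    if stress:
        # Fail closed: the legacy alias may only agree with --workload.
        if workload is not None and workload != "stress":
            raise WorkloadFlagError(f"--stress conflicts with --workload {workload}")
        return "stress"
    # Assumption: with no flags at all, fall back to the baseline workload.
    return workload if workload is not None else "baseline"
```

Raising on a conflict instead of letting one flag silently win mirrors the fail-closed behavior described for `--stress --workload baseline`.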
Workload Discovery
The new benchmark workloads subcommand exposes the full catalog with topology metadata:
# Human-readable listing
agentxchain benchmark workloads
# Structured catalog with phase order, expected signals, descriptions
agentxchain benchmark workloads --json
JSON output includes phase_order, phase_count, and expected governance signals for each workload — operators no longer need to read source code or scrape --help output to understand what each workload proves.
Phase-Drift Regression Proof
The phase-drift workload proves that the benchmark can trigger its own regression engine, not just demonstrate regression-free comparisons:
# Run both workloads with saved artifacts
agentxchain benchmark --workload baseline --output /tmp/bench-baseline
agentxchain benchmark --workload phase-drift --output /tmp/bench-drift
# Diff triggers real regression detection
agentxchain verify diff /tmp/bench-baseline/run-export.json /tmp/bench-drift/run-export.json
# → exit 1, has_regressions: true, REG-PHASE-ORDER-001
The two exports differ in phase topology itself: baseline has ["planning", "implementation", "qa"], while phase-drift has ["planning", "design", "implementation", "qa"].
Those saved benchmark artifacts are still repo-local run exports today, not coordinator exports. This release proves workload and phase-regression behavior through verify diff; it does not exercise coordinator repo-status logic.
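The phase-order comparison behind this proof can be illustrated in a few lines. The sketch below is an assumption about the shape of such a check, not the logic inside `verify diff`: the function name `phase_order_regression` and the returned fields are hypothetical, and it simply treats any difference between the two phase lists as a phase-order regression.

```python
from typing import Dict, List, Optional


def phase_order_regression(
    baseline: List[str], candidate: List[str]
) -> Optional[Dict[str, object]]:
    """Return a regression record if the phase orders differ, else None."""
    if baseline == candidate:
        return None
    return {
        # Regression ID taken from the release notes; the other fields
        # are illustrative only.
        "id": "REG-PHASE-ORDER-001",
        "baseline": list(baseline),
        "candidate": list(candidate),
        "added": [p for p in candidate if p not in baseline],
        "removed": [p for p in baseline if p not in candidate],
    }


baseline_phases = ["planning", "implementation", "qa"]
drift_phases = ["planning", "design", "implementation", "qa"]
finding = phase_order_regression(baseline_phases, drift_phases)
```

With the two phase lists from the workloads above, `finding["added"]` is `["design"]` and `finding["removed"]` is empty.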
Completion-Recovery Workload
Proves gate-failure recovery — a different failure class than validator rejection:
- QA attempts completion
- Gate check fails (missing .planning/ship-verdict.md)
- QA is reassigned
- Missing artifact is created
- Completion succeeds through the real approval path
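The steps above amount to a gate that blocks completion until a required file exists. A minimal Python sketch of that idea follows. It is illustrative only: `missing_artifacts`, `attempt_completion`, and the returned dictionary are hypothetical, and the sketch does not model the approval path that the real workload goes through.

```python
from pathlib import Path
from typing import Iterable, List

# Artifact path taken from the release notes.
REQUIRED_ARTIFACTS = (".planning/ship-verdict.md",)


def missing_artifacts(
    repo_root: Path, required: Iterable[str] = REQUIRED_ARTIFACTS
) -> List[str]:
    """List required artifact paths that do not exist under repo_root."""
    return [rel for rel in required if not (repo_root / rel).is_file()]


def attempt_completion(repo_root: Path) -> dict:
    """Simulate a completion attempt guarded by an artifact gate."""
    missing = missing_artifacts(repo_root)
    if missing:
        # Gate failure: report what is missing and hand the turn back to QA.
        return {"status": "gate_failed", "missing": missing, "reassign_to": "qa"}
    return {"status": "completed", "missing": []}
```

Calling `attempt_completion` before and after creating `.planning/ship-verdict.md` reproduces the fail-then-recover sequence.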
Durable Benchmark Artifacts
--output <dir> persists proof artifacts that round-trip through the public verification surface:
- metrics.json: timing and governance signal counts
- run-export.json: full run export (verifiable via agentxchain verify export)
- verify-export.json: verification report
- workload.json: workload metadata including topology
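A script consuming an output directory may want to confirm that all four files are present before reading any of them. The following Python sketch assumes only the file names listed above and that each file is valid JSON; `load_benchmark_artifacts` is a hypothetical helper, and nothing is assumed about the fields inside each file.

```python
import json
from pathlib import Path
from typing import Dict

# File names taken from the release notes.
EXPECTED_ARTIFACTS = (
    "metrics.json",
    "run-export.json",
    "verify-export.json",
    "workload.json",
)


def load_benchmark_artifacts(output_dir: str) -> Dict[str, object]:
    """Load every expected artifact, failing if any file is absent."""
    root = Path(output_dir)
    missing = [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing benchmark artifacts: {', '.join(missing)}")
    return {
        name: json.loads((root / name).read_text(encoding="utf-8"))
        for name in EXPECTED_ARTIFACTS
    }
```

For example, `load_benchmark_artifacts("/tmp/bench-baseline")` would return a dictionary keyed by file name once a benchmark run has written its artifacts there.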
If operators later compare coordinator exports outside the benchmark flow, the same verify diff truth boundary still applies: summary.repo_run_statuses remains raw coordinator snapshot metadata, while coordinator repo-status changes and regressions come from authority-first child repo status when nested child exports are readable.
Topology-Aware Config Generation
Benchmark config generation now resolves from workload-declared phase specs instead of mutating a hardcoded base config. The makeConfig() builder derives roles, runtimes, routing, and gates from the workload's phase_order and custom_phases declarations. This is extensible to arbitrary topologies without per-workload branches in the command body.
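To illustrate how a config can be derived from a phase list rather than patched per workload, here is a simplified Python sketch. It is not the project's `makeConfig()`: the function `make_config`, the one-role-per-phase convention, and the `role` and `required_artifacts` keys inside `custom_phases` are all assumptions made for this example, and runtimes are omitted.

```python
from typing import Dict, List, Optional


def make_config(
    phase_order: List[str], custom_phases: Optional[Dict[str, dict]] = None
) -> dict:
    """Derive roles, routing, and gates from a declared phase order."""
    if not phase_order:
        raise ValueError("phase_order must list at least one phase")
    if len(set(phase_order)) != len(phase_order):
        raise ValueError("phase_order must not repeat a phase")
    custom_phases = custom_phases or {}
    roles: Dict[str, dict] = {}
    routing: Dict[str, dict] = {}
    gates: Dict[str, dict] = {}
    for index, phase in enumerate(phase_order):
        spec = custom_phases.get(phase, {})
        # Assumption: one role per phase, named after the phase by default.
        role = spec.get("role", phase)
        roles[role] = {"phase": phase}
        is_last = index == len(phase_order) - 1
        next_phase = None if is_last else phase_order[index + 1]
        routing[phase] = {"entry_role": role, "next_phase": next_phase}
        gates[phase] = {"requires": list(spec.get("required_artifacts", []))}
    return {
        "phase_order": list(phase_order),
        "roles": roles,
        "routing": routing,
        "gates": gates,
    }


drift_config = make_config(["planning", "design", "implementation", "qa"])
```

Because every section is computed in one loop over `phase_order`, adding the `design` phase needs no workload-specific branch; `drift_config["routing"]["planning"]["next_phase"]` is `"design"`.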
Admission Control Cleanup
Pre-run admission control now rejects dead-end governed configs (configurations where no agent can make progress) before a run starts, so operators see the failure up front instead of partway through a run. The legacy collectRemoteReviewOnlyGateWarnings helper was removed; admission control now covers its validation.
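One way to picture a dead-end check is a pre-run pass that looks for phases whose gate can never be satisfied. The Python sketch below is one possible interpretation, not the actual admission rules: it assumes a phase is a dead end when its gate requires artifacts but none of its assigned roles can write files. The names `find_dead_end_phases` and `admit`, and the `can_write`, `roles`, and `required_artifacts` keys, are hypothetical.

```python
from typing import Dict, List


def find_dead_end_phases(phases: Dict[str, dict], roles: Dict[str, dict]) -> List[str]:
    """Return phases whose gate needs artifacts that no assigned role can write."""
    dead_ends = []
    for name, phase in phases.items():
        needs_artifacts = bool(phase.get("required_artifacts"))
        has_writer = any(
            roles.get(role, {}).get("can_write", False)
            for role in phase.get("roles", [])
        )
        if needs_artifacts and not has_writer:
            dead_ends.append(name)
    return dead_ends


def admit(phases: Dict[str, dict], roles: Dict[str, dict]) -> None:
    """Reject the config before the run starts if any phase is a dead end."""
    dead_ends = find_dead_end_phases(phases, roles)
    if dead_ends:
        raise ValueError(f"dead-end phases: {', '.join(dead_ends)}")
```

Running the check before any turn is assigned is what lets a bad config fail immediately rather than stall mid-run.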
Evidence
- 4675 tests / 1000 suites / 0 failures
- 20 benchmark tests (AT-BENCH-001 through AT-BENCH-020)
- 7 docs guard tests
- Docusaurus build: success