AgentXchain v2.102.0

v2.102.0 ships a named benchmark workload catalog with topology-aware discovery and regression-triggering proof.

Released: 2026-04-15

Named Benchmark Workload Catalog

agentxchain benchmark now uses a named workload catalog instead of boolean mode flags. Each workload declares its own phase topology, expected governance signals, and proof expectations.

Built-in Workloads

| Workload | Phases | Purpose |
| --- | --- | --- |
| `baseline` | planning → implementation → qa | Standard governed run: happy-path proof |
| `stress` | planning → implementation → qa | Admission control with validator rejection and retry |
| `completion-recovery` | planning → implementation → qa | Gate failure on missing artifact, reassignment, and recovery |
| `phase-drift` | planning → design → implementation → qa | Extra phase topology: triggers `REG-PHASE-ORDER-001` when diffed against baseline |

Workload Selection

# Run a specific workload
agentxchain benchmark --workload completion-recovery

# Legacy alias still works
agentxchain benchmark --stress
# equivalent to: agentxchain benchmark --workload stress

# Conflicting flags fail closed
agentxchain benchmark --stress --workload baseline  # → error
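
The selection rules above can be sketched in a few lines. This is an illustrative model, not the CLI's actual argument handling: the function name `resolve_workload` is hypothetical, and two behaviors are assumptions the notes do not confirm (that `baseline` is the default when no flag is given, and that `--stress --workload stress` is accepted as redundant rather than rejected).

```python
from typing import Optional

# The four names come from the built-in workload table.
KNOWN_WORKLOADS = ("baseline", "stress", "completion-recovery", "phase-drift")


def resolve_workload(workload: Optional[str] = None, stress: bool = False) -> str:
    """Resolve CLI-style inputs to one workload name, failing closed on conflicts."""
    if stress:
        # --stress is treated as a legacy alias for --workload stress.
        if workload is not None and workload != "stress":
            raise ValueError(f"--stress conflicts with --workload {workload}")
        return "stress"
    if workload is None:
        # Assumption: with no flags, the default workload is "baseline".
        return "baseline"
    if workload not in KNOWN_WORKLOADS:
        raise ValueError(f"unknown workload: {workload}")
    return workload
```

Raising instead of silently preferring one flag is what "fail closed" means here: an ambiguous invocation never runs a workload the operator did not clearly ask for.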

Workload Discovery

The new agentxchain benchmark workloads subcommand exposes the full catalog with topology metadata:

# Human-readable listing
agentxchain benchmark workloads

# Structured catalog with phase order, expected signals, descriptions
agentxchain benchmark workloads --json

The JSON output includes phase_order, phase_count, and the expected governance signals for each workload, so operators no longer need to read source code or scrape --help output to understand what each workload proves.
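
As a rough illustration of consuming the structured catalog, the sketch below parses a hand-written sample. Only `phase_order` and `phase_count` are field names taken from these notes; the top-level list, the `name` key, and the helper `summarize_catalog` are assumptions made for the example, so check the real `--json` output before relying on this shape.

```python
import json

# Hypothetical payload: the list wrapper and "name" key are assumptions.
SAMPLE = """
[
  {"name": "baseline",
   "phase_order": ["planning", "implementation", "qa"],
   "phase_count": 3},
  {"name": "phase-drift",
   "phase_order": ["planning", "design", "implementation", "qa"],
   "phase_count": 4}
]
"""


def summarize_catalog(raw: str) -> dict:
    """Map each workload name to its phase order, checking phase_count agrees."""
    summary = {}
    for entry in json.loads(raw):
        phases = entry["phase_order"]
        if entry["phase_count"] != len(phases):
            raise ValueError(f"phase_count mismatch for {entry['name']}")
        summary[entry["name"]] = phases
    return summary


for name, phases in summarize_catalog(SAMPLE).items():
    print(f"{name}: {' -> '.join(phases)}")
```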

Phase-Drift Regression Proof

The phase-drift workload proves that the benchmark can trigger its own regression engine, not just demonstrate regression-free comparisons:

# Run both workloads with saved artifacts
agentxchain benchmark --workload baseline --output /tmp/bench-baseline
agentxchain benchmark --workload phase-drift --output /tmp/bench-drift

# Diff triggers real regression detection
agentxchain verify diff /tmp/bench-baseline/run-export.json /tmp/bench-drift/run-export.json
# → exit 1, has_regressions: true, REG-PHASE-ORDER-001

The two exports differ in the phase sequence itself: baseline has ["planning", "implementation", "qa"] while phase-drift has ["planning", "design", "implementation", "qa"], with design inserted after planning.
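
To make the comparison concrete, here is a minimal sketch of a phase-order check. It is not the verify diff implementation, whose rules are not described in these notes; the function name `phase_order_finding` and the output keys are hypothetical, and treating any sequence difference as a regression is an assumption.

```python
from typing import List, Optional

BASELINE = ["planning", "implementation", "qa"]
PHASE_DRIFT = ["planning", "design", "implementation", "qa"]


def phase_order_finding(baseline: List[str], candidate: List[str]) -> Optional[dict]:
    """Return a finding when the two phase sequences differ, else None."""
    if baseline == candidate:
        return None
    return {
        "id": "REG-PHASE-ORDER-001",  # regression ID named in these notes
        "added": [p for p in candidate if p not in baseline],
        "removed": [p for p in baseline if p not in candidate],
    }


print(phase_order_finding(BASELINE, PHASE_DRIFT))
```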

Those saved benchmark artifacts are still repo-local run exports today, not coordinator exports. This release is proving workload and phase-regression behavior through verify diff; it is not quietly exercising coordinator repo-status logic.

Completion-Recovery Workload

The completion-recovery workload proves gate-failure recovery, a different failure class from the validator rejection that the stress workload exercises:

1. QA attempts completion
2. Gate check fails (missing .planning/ship-verdict.md)
3. QA is reassigned
4. Missing artifact is created
5. Completion succeeds through the real approval path
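
The sequence above can be replayed as a toy simulation. This is a sketch only: `gate_passes` and `run_recovery` are hypothetical names, the gate is reduced to a file-existence check, and the approval path mentioned in the final step is not modeled.

```python
import tempfile
from pathlib import Path

# Artifact path taken from the gate-failure step in these notes.
REQUIRED_ARTIFACT = Path(".planning") / "ship-verdict.md"


def gate_passes(repo_root: Path) -> bool:
    """Hypothetical completion gate: passes only if the artifact exists."""
    return (repo_root / REQUIRED_ARTIFACT).is_file()


def run_recovery(repo_root: Path) -> list:
    """Replay the five steps and return an ordered event log."""
    events = ["qa_attempts_completion"]
    if not gate_passes(repo_root):
        events.append("gate_failed")
        events.append("qa_reassigned")
        artifact = repo_root / REQUIRED_ARTIFACT
        artifact.parent.mkdir(parents=True, exist_ok=True)
        artifact.write_text("ship\n")
        events.append("artifact_created")
    if gate_passes(repo_root):
        events.append("completion_succeeded")
    return events


with tempfile.TemporaryDirectory() as tmp:
    print(run_recovery(Path(tmp)))
```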

Durable Benchmark Artifacts

--output <dir> persists proof artifacts that round-trip through the public verification surface:

metrics.json — timing and governance signal counts
run-export.json — full run export (verifiable via agentxchain verify export)
verify-export.json — verification report
workload.json — workload metadata including topology
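
A quick way to confirm that an output directory is complete is to check for the four files listed above. The helper name `missing_artifacts` is hypothetical; the file names come from these notes, and the check looks only at presence, not at content.

```python
from pathlib import Path
from typing import List

EXPECTED_FILES = (
    "metrics.json",
    "run-export.json",
    "verify-export.json",
    "workload.json",
)


def missing_artifacts(output_dir: str) -> List[str]:
    """Return the expected artifact names that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_artifacts("/tmp/bench-baseline")` returns an empty list when all four artifacts were written.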

If operators later compare coordinator exports outside the benchmark flow, the same verify diff truth boundary still applies. The summary.repo_run_statuses field remains raw coordinator snapshot metadata, while coordinator repo-status changes and regressions are derived from authority-first child repo status when nested child exports are readable.

Topology-Aware Config Generation

Benchmark config generation now resolves from workload-declared phase specs instead of mutating a hardcoded base config. The makeConfig() builder derives roles, runtimes, routing, and gates from the workload's phase_order and custom_phases declarations. This makes the benchmark extensible to arbitrary topologies without per-workload branches in the command body.
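
The idea of deriving config from a declared phase list, rather than patching a fixed base, can be sketched as follows. This is not makeConfig() itself: the function `make_config_sketch`, the output keys, and the one-role-and-one-gate-per-phase layout are all assumptions chosen for illustration, and custom_phases is not modeled.

```python
from typing import Dict, List


def make_config_sketch(phase_order: List[str]) -> Dict[str, dict]:
    """Derive per-phase routing and gates from an ordered phase list.

    Hypothetical shape: these keys are illustrative, not the real schema.
    """
    if len(set(phase_order)) != len(phase_order):
        raise ValueError("phase names must be unique")
    config = {}
    for index, phase in enumerate(phase_order):
        is_last = index == len(phase_order) - 1
        config[phase] = {
            "role": f"{phase}-agent",  # assumed: one role per phase
            "next": None if is_last else phase_order[index + 1],
            "gate": f"{phase}-exit",   # assumed: one exit gate per phase
        }
    return config
```

Because the loop reads only the phase list, adding a design phase changes the output without any branch that names a specific workload.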

Admission Control Cleanup

Pre-run admission control now rejects dead-end governed configs (configurations where no agent can make progress) before a run starts, so operators learn about the problem up front instead of partway through a run. The legacy collectRemoteReviewOnlyGateWarnings helper was removed; its validation responsibility now belongs to the admission control system.
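
One way to picture a dead-end check is shown below. The real admission rules are not spelled out in these notes, so this is a sketch under stated assumptions: each phase maps to a list of agents, each agent carries a hypothetical boolean `review_only` key, and a phase counts as a dead end when every assigned agent is review-only. The names `dead_end_phases` and `admit` are invented for the example.

```python
from typing import Dict, List


def dead_end_phases(phase_agents: Dict[str, List[dict]]) -> List[str]:
    """Return phases where no assigned agent can produce work.

    Hypothetical model: an agent is a dict with a boolean "review_only" key.
    """
    return [
        phase
        for phase, agents in phase_agents.items()
        if not any(not agent.get("review_only", False) for agent in agents)
    ]


def admit(phase_agents: Dict[str, List[dict]]) -> None:
    """Reject the config before the run starts if any phase is a dead end."""
    stuck = dead_end_phases(phase_agents)
    if stuck:
        raise ValueError(f"dead-end config, no progress possible in: {', '.join(stuck)}")
```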

Evidence

4675 tests / 1000 suites / 0 failures
20 benchmark tests (AT-BENCH-001 through AT-BENCH-020)
7 docs guard tests
Docusaurus build: success
