Lights-Out Operation
This is the repo-local operator runbook for unattended governed execution.
If you want the scheduler reference, read Lights-Out Scheduling. If you want the end-to-end path from zero to a running continuous session, start here.
This guide is repo-local only. It assumes one governed repository with agentxchain.json and one project-owned .planning/VISION.md. If you are coordinating multiple repositories, use agentxchain multi and the Multi-Repo Coordination docs instead of trying to stretch this flow across child repos.
Choose the right loop owner
AgentXchain ships two related lights-out surfaces:
- agentxchain run --continuous --vision <path>: one process owns the loop directly
- agentxchain schedule daemon: the daemon owns cadence and advances schedule-owned continuous sessions
Use run --continuous when you want a foreground proof run or a single process in a tmux/session manager. Use schedule daemon when you want repo-local polling, daemon health, and schedule-owned session continuity.
1. Preflight the repo before turning anything on
Do not daemonize a repo you have not validated. The minimum preflight is:
agentxchain doctor
agentxchain connector check
agentxchain status
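A minimal sketch of running the three preflight commands in order and stopping at the first failure. The helper names (`run_preflight`, `_exit_code`) are hypothetical, and the sketch assumes each command signals failure with a non-zero exit code, which this page does not state.

```python
import subprocess
from typing import Callable, Sequence

# The three preflight commands named above, in the order the runbook lists them.
PREFLIGHT_COMMANDS = [
    ["agentxchain", "doctor"],
    ["agentxchain", "connector", "check"],
    ["agentxchain", "status"],
]


def _exit_code(argv: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def run_preflight(run: Callable[[Sequence[str]], int] = _exit_code) -> list[str]:
    """Run each preflight command and stop at the first non-zero exit.

    Returns the commands that succeeded, as display strings.
    ASSUMPTION: a non-zero exit code means the check failed.
    """
    passed = []
    for argv in PREFLIGHT_COMMANDS:
        if run(argv) != 0:
            break
        passed.append(" ".join(argv))
    return passed
```

If the returned list is shorter than three entries, the first missing command is the one to investigate before going further.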
Confirm these inputs before you go further:
- .planning/VISION.md exists and reflects the product direction for this repo
- your runtime binding is real for the roles that will execute unattended work
- approval policy is intentional: use auto-approval only where your governance rules allow it
- routine gates that may close unattended are marked credentialed: false; gates protecting publish, deploy, payment, credential, or other external irreversible actions are marked credentialed: true
- budget stance is explicit for both per-run and continuous session spend
If you need human review before vision-derived work enters execution, set triage_approval: "human" in the continuous config instead of pretending the repo is safe for full auto-approval.
Full-auto is a policy posture, not a magic mode flag. A generated governed config can close routine gates by policy after evidence passes, but credentialed gates still require human approval. For the exact config shape, read Approval Policy.
2. Configure one schedule-owned continuous session
Add a schedule entry in agentxchain.json:
{
  "schedules": {
    "vision_autopilot": {
      "enabled": true,
      "every_minutes": 60,
      "auto_approve": false,
      "max_turns": 10,
      "initial_role": "pm",
      "trigger_reason": "Repo-local lights-out execution",
      "continuous": {
        "enabled": true,
        "vision_path": ".planning/VISION.md",
        "max_runs": 50,
        "max_idle_cycles": 5,
        "on_idle": "perpetual",
        "triage_approval": "auto",
        "per_session_max_usd": 25.0,
        "idle_expansion": {
          "sources": [".planning/VISION.md", ".planning/ROADMAP.md", ".planning/SYSTEM_SPEC.md"],
          "max_expansions": 5,
          "role": "pm",
          "malformed_retry_limit": 1
        },
        "auto_retry_on_ghost": {
          "enabled": true,
          "max_retries_per_run": 3,
          "cooldown_seconds": 5
        }
      }
    }
  }
}
What matters here:
- every_minutes controls when the daemon may start or restart the session
- continuous.vision_path is required when continuous mode is enabled
- continuous.on_idle chooses what happens after max_idle_cycles with no derivable work
- per_session_max_usd is the cumulative session cap; when hit, the session stops cleanly instead of blocking
- triage_approval: "human" pauses vision-derived work at the intake approval boundary instead of auto-approving it
- auto_approve: false keeps the scheduler from using the blanket CLI override; project-owned approval_policy should decide which gates are routine enough to close unattended
- auto_retry_on_ghost lets continuous mode reissue transient startup ghost turns before pausing the session
For the field-by-field reference, see Lights-Out Scheduling and CLI Reference.
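The rules in the list above can be checked before the daemon ever reads the file. The following is a sketch only: `check_schedule` and `check_config` are hypothetical helpers, not part of AgentXchain, and they only look at fields this page names. Treating a missing per_session_max_usd as a problem reflects this runbook's budget discipline, not a documented validation rule.

```python
VALID_ON_IDLE = {"exit", "perpetual", "human_review"}


def check_schedule(name: str, entry: dict) -> list[str]:
    """Return human-readable problems for one schedule entry (empty if none)."""
    problems = []
    continuous = entry.get("continuous") or {}
    if not continuous.get("enabled"):
        return problems  # the rules below only apply to continuous schedules

    # vision_path is required when continuous mode is enabled.
    if not continuous.get("vision_path"):
        problems.append(f"{name}: continuous.vision_path is required")

    # on_idle, when set, must be one of the three documented modes.
    on_idle = continuous.get("on_idle")
    if on_idle is not None and on_idle not in VALID_ON_IDLE:
        problems.append(f"{name}: unknown on_idle value {on_idle!r}")

    # ASSUMPTION: this sketch treats a missing session cap as a problem.
    if continuous.get("per_session_max_usd") is None:
        problems.append(f"{name}: no per_session_max_usd session cap set")

    return problems


def check_config(config: dict) -> list[str]:
    """Collect problems across every entry under the top-level "schedules" key."""
    problems = []
    for name, entry in config.get("schedules", {}).items():
        problems.extend(check_schedule(name, entry))
    return problems
```

Load agentxchain.json with `json.load` and pass the resulting dict to `check_config`; an empty list means none of these particular rules were violated.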
2a. Choose the idle policy
Continuous mode separates "nothing derivable from the current queue scan" from "the product vision is exhausted." Use on_idle to make that policy explicit:
| Mode | Behavior | Use it when |
|---|---|---|
| exit | Stop with idle_exit after max_idle_cycles empty scans | You want a bounded proof run or a finite delivery batch |
| perpetual | Dispatch a PM idle-expansion turn after the idle threshold | You want full-auto product development to derive the next increment from broader repo context |
| human_review | Pause the session with idle_human_review_required | You want an operator to decide whether to stop, inject work, or switch to perpetual mode |
perpetual mode uses the normal intake pipeline. The loop records a vision_idle_expansion signal, assigns the configured role (default pm), and requires a structured idle_expansion_result. A successful PM result either creates a new intake intent (kind: "new_intake_intent") or stops cleanly as vision_exhausted.
The PM idle-expansion prompt scaffold is .agentxchain/prompts/pm-idle-expansion.md. It is committed into new governed projects by agentxchain init --governed so teams can review the contract. The runtime's first implementation carries the same requirements through the synthesized charter: VISION.md is read-only, proposed work must cite a human-owned vision heading, and ROADMAP.md / SYSTEM_SPEC.md may be updated only as supporting evidence.
idle_expansion.max_expansions is a hard loop guard. If the PM expansion path cannot produce productive work for the configured number of attempts, the session stops as vision_expansion_exhausted instead of spending forever. per_session_max_usd is checked before idle expansion, so budget exhaustion wins over all idle policies.
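The ordering described above (budget first, then the idle threshold, then the on_idle policy, with max_expansions as the loop guard) can be sketched as a single decision function. This is an illustration under stated assumptions, not the runtime's code: `IdleState` and `idle_decision` are hypothetical names, and the return values `budget_stop`, `keep_scanning`, and `dispatch_idle_expansion` are placeholder labels rather than documented statuses.

```python
from dataclasses import dataclass


@dataclass
class IdleState:
    idle_cycles: int      # consecutive scans that found no derivable work
    expansions_used: int  # PM idle-expansion attempts that produced no work
    spent_usd: float      # cumulative session spend


def idle_decision(state: IdleState, *, on_idle: str, max_idle_cycles: int,
                  max_expansions: int, per_session_max_usd: float) -> str:
    """Decide what an idle continuous session does next."""
    # Budget is checked before idle expansion, so it wins over every policy.
    if state.spent_usd >= per_session_max_usd:
        return "budget_stop"  # placeholder label
    # Below the idle threshold the loop simply scans again.
    if state.idle_cycles < max_idle_cycles:
        return "keep_scanning"  # placeholder label
    if on_idle == "exit":
        return "idle_exit"
    if on_idle == "human_review":
        return "idle_human_review_required"
    if on_idle == "perpetual":
        # max_expansions is the hard guard against spending forever.
        if state.expansions_used >= max_expansions:
            return "vision_expansion_exhausted"
        return "dispatch_idle_expansion"  # placeholder label
    raise ValueError(f"unknown on_idle mode: {on_idle!r}")
```

Putting the budget check first in the function mirrors the statement that budget exhaustion wins over all idle policies.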
3. Prove the loop once before starting the daemon
Run one bounded continuous session in the foreground first:
agentxchain run --continuous --vision .planning/VISION.md --max-runs 1 --session-budget 5.00
That command is the safety floor. It proves:
- the vision file resolves correctly
- the intake lifecycle can seed or consume work
- the configured adapter/runtime can complete a governed run
- status surfaces continuous-session truth for this repo
- routine human-approval gates close through approval_policy only after required files and verification pass evidence are present
Skipping this proof and going straight to schedule daemon is lazy. If the repo cannot complete one bounded run in the foreground, the daemon will not save you.
3a. Use a truthful mixed-runtime proof shape
Do not fake a “fully lights-out” proof by binding every phase to remote review_only roles and then acting surprised when requires_files gates never pass.
The production-valid mixed-runtime shape is:
- local authoring roles satisfy repo-local gate files
- a review_only api_proxy QA role validates and requests completion
- the repo already contains the final QA gate files before the QA review turn begins
That is the real contract today. A review_only api_proxy QA role can validate and request completion, but it cannot create gate files.
If you want the repo-owned live proof harness for this exact shape, run:
node examples/live-governed-proof/run-continuous-mixed-proof.mjs --json --output examples/live-governed-proof/evidence/continuous-mixed-proof.latest.json
That harness drives the real run --continuous CLI surface, uses a real Anthropic-backed api_proxy QA turn, and validates the continuous session, intake provenance, review artifact, and recorded spend. The checked-in evidence artifact at continuous-mixed-proof.latest.json backs this claim.
For perpetual mode with --on-idle perpetual, see the tusq.dev Dogfood Proof — three full governed runs completed autonomously on a real production codebase with PM idle-expansion chaining runs and zero reliability failures.
4. Start the daemon
Once the bounded proof run is clean, start the scheduler:
agentxchain schedule daemon --poll-seconds 60
Or for a persistent shell session:
tmux new-session -d -s agentxchain-daemon 'agentxchain schedule daemon --poll-seconds 60'
What happens next:
- When the schedule becomes due, the daemon starts a schedule-owned continuous session.
- Later polls advance the same session even if the schedule is not due again yet.
- The daemon checks queued work first.
- If nothing is queued, it seeds work from .planning/VISION.md.
- If the queue stays empty through the idle threshold, on_idle decides whether the loop exits, pauses for human review, or dispatches PM idle-expansion.
- Each run still goes through the real intake lifecycle: plan → start → governed run → resolve.
This is why schedule daemon is the correct unattended owner for schedule-driven operation: it owns heartbeat, status, and later-poll continuation without inventing a second scheduler.
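The poll behavior described above comes down to two questions: is a session already active, and is there queued work? A rough sketch follows, with a hypothetical function name and placeholder return labels. It assumes a schedule that has never started counts as due, which this page does not confirm.

```python
from typing import Optional


def poll_action(*, session_active: bool, minutes_since_last_start: Optional[float],
                every_minutes: int, queued_items: int) -> str:
    """Pick what one daemon poll does for a single schedule."""
    if not session_active:
        # ASSUMPTION: a schedule with no previous start is due immediately.
        due = (minutes_since_last_start is None
               or minutes_since_last_start >= every_minutes)
        return "start_session" if due else "wait"
    # An active session advances on every poll, whether or not the
    # schedule is due again.
    if queued_items > 0:
        return "consume_queued_work"  # queued work is checked first
    return "seed_from_vision"         # fallback when nothing is queued
```

Note that every_minutes only gates starting a session; once a session is active, the poll interval alone drives progress.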
5. Observe the system while it runs
Use these commands while the daemon is active:
agentxchain status
agentxchain status --json
agentxchain schedule status
agentxchain schedule list
agentxchain events --follow
Watch for:
- active continuous_session state
- owner_type: "schedule" and the owning schedule id
- runs_completed versus max_runs
- current_vision_objective
- continuous_blocked, continuous_running, continuous_completed, continuous_vision_exhausted, continuous_vision_expansion_exhausted, or continuous_failed
agentxchain events --follow is also the auto-chain audit trail. On every clean hand-off from one completed run to the next, you should see session_continuation <previous_run_id> -> <next_run_id> (<objective>). If a run completed cleanly and there is no continuation event, treat that as a defect, not as normal operator ambiguity.
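To audit hand-offs from a captured event log, one option is to scan for the continuation pattern quoted above. This sketch assumes each event is rendered on one line containing `session_continuation <previous_run_id> -> <next_run_id> (<objective>)`; any surrounding timestamp or prefix layout is unknown here, so the regex searches rather than matching the whole line. Both function names are hypothetical.

```python
import re

# Matches the continuation text quoted in this runbook, anywhere in a line.
CONTINUATION = re.compile(r"session_continuation (\S+) -> (\S+) \((.*)\)")


def parse_continuations(lines):
    """Return (previous_run_id, next_run_id, objective) for each hand-off found."""
    links = []
    for line in lines:
        match = CONTINUATION.search(line)
        if match:
            links.append((match.group(1), match.group(2), match.group(3)))
    return links


def runs_without_continuation(completed_run_ids, lines):
    """List completed runs that never appear as the source of a hand-off."""
    handed_off = {prev for prev, _next, _objective in parse_continuations(lines)}
    return [run_id for run_id in completed_run_ids if run_id not in handed_off]
```

The last run of a session legitimately has no continuation, so expect one entry in the result for a session that ended; more than one deserves a closer look.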
paused is reserved for real blockers like unresolved escalations, blocked runs, or explicit on_idle: "human_review". A healthy post-completion path should stay running while it seeds the next objective, then end as completed, idle_exit, vision_exhausted, or vision_expansion_exhausted when it hits max_runs, bounded idle policy, PM exhaustion, PM expansion cap, or budget.
agentxchain status is the run/session truth surface. agentxchain schedule status is the daemon heartbeat surface. Do not confuse them.
5a. Bound startup ghost recovery
Continuous mode can automatically recover from transient startup ghosts: turns that reached failed_start because the runtime never produced startup proof. Enable it explicitly:
{
  "run_loop": {
    "continuous": {
      "auto_retry_on_ghost": {
        "enabled": true,
        "max_retries_per_run": 3,
        "cooldown_seconds": 5
      }
    }
  }
}
When enabled, the loop uses the same reissueTurn() path as the manual agentxchain reissue-turn --turn <id> --reason ghost recovery, records attempts in continuous-session.json, emits auto_retried_ghost, and continues the same run. If the run hits max_retries_per_run, it emits ghost_retry_exhausted, keeps the session paused, and preserves the manual reissue-turn command in status.
Auto-retry is enabled by default only for the strict full-auto approval posture: approval_policy.phase_transitions.default === "auto_approve" and approval_policy.run_completion.action === "auto_approve" with continuous mode enabled. Generated BUG-59 safe-rule configs use phase_transitions.default: "require_human" with explicit auto-approve rules. That conservative posture does not auto-enable ghost retry. For those projects, opt in with the config above (run_loop.continuous.auto_retry_on_ghost.enabled: true) or use agentxchain run --continuous --auto-retry-on-ghost.
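The default rule above and the retry bound can be sketched as follows. Both helpers are hypothetical, and two points are assumptions rather than documented behavior: that an explicit `enabled` value in config overrides the posture-derived default in either direction, and that whether continuous mode is on is passed in by the caller.

```python
def ghost_retry_enabled(config: dict, *, continuous_enabled: bool,
                        cli_flag: bool = False) -> bool:
    """Resolve whether startup-ghost auto-retry is on for this run."""
    if cli_flag:  # --auto-retry-on-ghost was passed on the command line
        return True
    continuous = config.get("run_loop", {}).get("continuous", {})
    explicit = continuous.get("auto_retry_on_ghost", {}).get("enabled")
    if explicit is not None:
        # ASSUMPTION: explicit config wins over the derived default.
        return bool(explicit)
    policy = config.get("approval_policy", {})
    full_auto = (
        policy.get("phase_transitions", {}).get("default") == "auto_approve"
        and policy.get("run_completion", {}).get("action") == "auto_approve"
    )
    return bool(full_auto and continuous_enabled)


def next_ghost_event(attempts_so_far: int, max_retries_per_run: int) -> str:
    """Name the event the loop emits for the next ghost in the same run."""
    if attempts_so_far < max_retries_per_run:
        return "auto_retried_ghost"
    return "ghost_retry_exhausted"
```

A config with phase_transitions.default set to require_human resolves to disabled here unless the explicit opt-in or the CLI flag is present, which matches the conservative posture described above.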
5b. Reconcile intentional operator commits
Manual recovery sometimes requires a human commit on top of the latest checkpoint. That commit should not force state surgery. If status shows Git HEAD has moved since checkpoint, run:
agentxchain reconcile-state --accept-operator-head
The command accepts only fast-forward operator commits that leave .agentxchain/ alone. It updates the governed baseline and emits state_reconciled_operator_commits. If it refuses, treat the refusal as real safety signal: history rewrites and governed-state edits are not safe lights-out inputs.
Under full-auto approval policy, the continuous loop already runs this reconcile before every dispatch — you do not need to run the command manually for safe fast-forward operator commits. The effective default for run_loop.continuous.reconcile_operator_commits is auto_safe_only when approval policy is full-auto; explicit config and the CLI flag always win.
{
  "run_loop": {
    "continuous": {
      "reconcile_operator_commits": "auto_safe_only"
    }
  }
}
Valid modes:
- manual: continuous drift stays blocking; operator runs agentxchain reconcile-state --accept-operator-head by hand. Default under governed-mode approval policies.
- auto_safe_only: continuous loop auto-accepts fast-forward operator commits that leave .agentxchain/ alone before every dispatch. Default under full-auto approval policy. Unsafe commits still pause the session with operator_commit_reconcile_refused and the same refusal class the shared reconcile primitive returns (governance_state_modified, critical_artifact_deleted, history_rewrite, missing_baseline, git_unavailable, not_git_repo, or commit_walk_failed).
- disabled: continuous loop skips the reconcile entirely. Use only if an external system owns baseline advancement.
Per-run override: agentxchain run --continuous --reconcile-operator-commits auto_safe_only. When the auto reconcile refuses a commit, the session pauses, status surfaces the refusal class and the same agentxchain reconcile-state --accept-operator-head recovery hint, and the loop emits operator_commit_reconcile_refused for audit. Manual and automatic reconcile both route through the same audited safety primitive; the auto path adds a blocked-state/event wrapper instead of reimplementing commit-range checks.
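How the effective reconcile mode falls out of those three inputs can be sketched briefly. The function name is hypothetical, and the sketch assumes the per-run CLI flag takes precedence over the config file when both are set; this page says only that both beat the posture-derived default.

```python
from typing import Optional

VALID_MODES = ("manual", "auto_safe_only", "disabled")


def effective_reconcile_mode(*, cli_flag: Optional[str] = None,
                             configured: Optional[str] = None,
                             full_auto: bool = False) -> str:
    """Resolve the reconcile_operator_commits mode for one continuous run."""
    # ASSUMPTION: the CLI flag is consulted before the config value.
    for candidate in (cli_flag, configured):
        if candidate is not None:
            if candidate not in VALID_MODES:
                raise ValueError(f"invalid reconcile mode: {candidate!r}")
            return candidate
    # No explicit setting: the default follows the approval posture.
    return "auto_safe_only" if full_auto else "manual"
```

For example, a full-auto repo with `"reconcile_operator_commits": "manual"` in config resolves to manual, because explicit config wins over the default.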
6. Handle blocked and failed states correctly
When the session pauses for a real blocker:
agentxchain status
Then run the exact recovery command surfaced by the governed state. Examples:
agentxchain unblock <id> # needs_human escalation
agentxchain reissue-turn --turn <id> --reason ghost
agentxchain reissue-turn --turn <id> --reason stale
Rules that matter:
- blocked continuous sessions stay paused, not fake-completed
- after the surfaced recovery action clears the blocker, the next daemon poll resumes the same session
- session-budget exhaustion is a terminal stop, not a blocker; start a new session if you want to continue
- non-blocked executor failure leaves the session failed for inspection instead of pretending the intake intent completed
If you need the full recovery matrix, use Recovery. This page keeps the operational path short; the recovery page is the canonical blocked-state map.
7. Inject explicit human priority when the vision queue is wrong
Vision seeding is fallback discovery, not authority. Human priority wins.
Inject urgent work like this:
agentxchain inject "Fix the broken release-note sidebar ordering" --priority p0
Or stage lower-priority work without approval:
agentxchain inject "Investigate docs stack migration options" --priority p2 --no-approve
Important behavior:
- injected p0 work preempts new vision seeding
- the continuous loop yields with priority_preempted and the daemon consumes the injected item first
- non-p0 items join the intake queue in normal priority order
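The precedence rules above can be pictured with a small ordering sketch. It is illustrative only: the function names are hypothetical, the queue is modeled as plain (priority, title) tuples, and it assumes "normal priority order" means a lower number is more urgent (p0 before p1 before p2) with ties kept in arrival order.

```python
def should_preempt(queue) -> bool:
    """True when an injected p0 item is waiting, so the loop should yield."""
    return any(priority == "p0" for priority, _title in queue)


def pick_next(queue, vision_objective):
    """Choose the next unit of work: queued items first, vision as fallback."""
    if queue:
        # sorted() is stable, so items of equal priority keep arrival order.
        ordered = sorted(queue, key=lambda item: int(item[0][1:]))
        return ordered[0]
    # Vision seeding is fallback discovery, used only when nothing is queued.
    return ("vision", vision_objective)
```

With a queue holding a p2 investigation and a later p0 fix, `pick_next` returns the p0 fix and `should_preempt` is true, which corresponds to the loop yielding with priority_preempted.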
For the intake boundary and queue semantics, see Continuous Delivery Intake.
8. Know how the loop stops
Continuous mode stops for specific reasons:
- max_runs reached
- max_idle_cycles reached because no queued or vision-derived work was found
- per_session_max_usd or --session-budget reached
- blocked human escalation that you have not resolved yet
- operator SIGINT
SIGINT behavior is intentional:
- first Ctrl+C: finish the current in-flight work, then stop
- second Ctrl+C: hard-abort
For daemon-owned operation, stop the daemon the same way you stop any foreground process or session-manager job.
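The two-stage Ctrl+C behavior is a common pattern and can be illustrated generically. The sketch below shows the pattern in Python and is not AgentXchain's implementation: `GracefulStop`, `install`, and `run_loop` are hypothetical names, and the hard abort is modeled by raising KeyboardInterrupt.

```python
import signal


class GracefulStop:
    """First SIGINT requests a stop; a second one aborts immediately."""

    def __init__(self):
        self.stop_requested = False

    def handle(self, signum, frame):
        if self.stop_requested:
            # Second Ctrl+C: hard-abort whatever is running.
            raise KeyboardInterrupt
        # First Ctrl+C: let in-flight work finish, then stop.
        self.stop_requested = True


def install(stopper: GracefulStop) -> None:
    """Route SIGINT to the stopper (call from the main thread)."""
    signal.signal(signal.SIGINT, stopper.handle)


def run_loop(work_items, stopper: GracefulStop):
    """Run callables in order, checking for a stop request between items."""
    done = []
    for item in work_items:
        if stopper.stop_requested:
            break
        done.append(item())  # an item that has started always completes
    return done
```

The stop flag is only checked between items, which is what lets the first interrupt finish the current in-flight work before the loop exits.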
9. Minimum operating discipline
If you want unattended execution without lying to yourself, keep this floor:
- Validate the repo with doctor and connector check.
- Run one bounded run --continuous --max-runs 1 proof first.
- Set a real session budget.
- Watch status, schedule status, and events --follow.
- Use inject --priority p0 when human judgment needs to override vision seeding.
- Use unblock to resolve true blockers; do not hack state files.
That is the stable repo-local lights-out story today.