
Lights-Out Operation

This is the repo-local operator runbook for unattended governed execution.

If you want the scheduler reference, read Lights-Out Scheduling. If you want the end-to-end path from zero to a running continuous session, start here.

This guide is repo-local only. It assumes one governed repository with agentxchain.json and one project-owned .planning/VISION.md. If you are coordinating multiple repositories, use agentxchain multi and the Multi-Repo Coordination docs instead of trying to stretch this flow across child repos.

Choose the right loop owner

AgentXchain ships two related lights-out surfaces:

  • agentxchain run --continuous --vision <path>: one process owns the loop directly
  • agentxchain schedule daemon: the daemon owns cadence and advances schedule-owned continuous sessions

Use run --continuous when you want a foreground proof run or a single process in a tmux/session manager. Use schedule daemon when you want repo-local polling, daemon health, and schedule-owned session continuity.

1. Preflight the repo before turning anything on

Do not daemonize a repo you have not validated. The minimum preflight is:

agentxchain doctor
agentxchain connector check
agentxchain status
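If you want to script the preflight rather than type it, the following is a minimal sketch that runs the three commands above in order and stops at the first failure. It assumes each command exits non-zero when it finds a problem, which this page does not state. The names run_preflight and preflight_passed are illustrative and not part of AgentXchain.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# The three preflight commands, in the order listed above.
PREFLIGHT_COMMANDS = [
    ["agentxchain", "doctor"],
    ["agentxchain", "connector", "check"],
    ["agentxchain", "status"],
]


def default_runner(command: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(command)).returncode


def run_preflight(
    commands: Sequence[Sequence[str]] = PREFLIGHT_COMMANDS,
    runner: Callable[[Sequence[str]], int] = default_runner,
) -> List[Tuple[str, int]]:
    """Run commands in order and stop at the first non-zero exit code.

    Assumption: a non-zero exit code means the check failed.
    Returns (command line, exit code) pairs for every command that ran.
    """
    results = []
    for command in commands:
        code = runner(command)
        results.append((" ".join(command), code))
        if code != 0:
            break  # Do not continue past a failed check.
    return results


def preflight_passed(results: List[Tuple[str, int]],
                     expected_count: int = len(PREFLIGHT_COMMANDS)) -> bool:
    """True only if every expected command ran and exited 0."""
    return len(results) == expected_count and all(code == 0 for _, code in results)
```

The runner argument exists so the sequence can be exercised without the CLI installed. A passing script does not replace reading the output of each command.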

Confirm these inputs before you go further:

  • .planning/VISION.md exists and reflects the product direction for this repo
  • your runtime binding is real for the roles that will execute unattended work
  • approval policy is intentional: use auto-approval only where your governance rules allow it
  • routine gates that may close unattended are marked credentialed: false; gates protecting publish, deploy, payment, credential, or other external irreversible actions are marked credentialed: true
  • budget stance is explicit for both per-run and continuous session spend

If you need human review before vision-derived work enters execution, set triage_approval: "human" in the continuous config instead of pretending the repo is safe for full auto-approval.

Full-auto is a policy posture, not a magic mode flag. A generated governed config can close routine gates by policy after evidence passes, but credentialed gates still require human approval. For the exact config shape, read Approval Policy.

2. Configure one schedule-owned continuous session

Add a schedule entry in agentxchain.json:

{
  "schedules": {
    "vision_autopilot": {
      "enabled": true,
      "every_minutes": 60,
      "auto_approve": false,
      "max_turns": 10,
      "initial_role": "pm",
      "trigger_reason": "Repo-local lights-out execution",
      "continuous": {
        "enabled": true,
        "vision_path": ".planning/VISION.md",
        "max_runs": 50,
        "max_idle_cycles": 5,
        "on_idle": "perpetual",
        "triage_approval": "auto",
        "per_session_max_usd": 25.0,
        "idle_expansion": {
          "sources": [".planning/VISION.md", ".planning/ROADMAP.md", ".planning/SYSTEM_SPEC.md"],
          "max_expansions": 5,
          "role": "pm",
          "malformed_retry_limit": 1
        },
        "auto_retry_on_ghost": {
          "enabled": true,
          "max_retries_per_run": 3,
          "cooldown_seconds": 5
        }
      }
    }
  }
}

What matters here:

  • every_minutes controls when the daemon may start or restart the session
  • continuous.vision_path is required when continuous mode is enabled
  • continuous.on_idle chooses what happens after max_idle_cycles with no derivable work
  • per_session_max_usd is the cumulative session cap; when hit, the session stops cleanly instead of blocking
  • triage_approval: "human" pauses vision-derived work at the intake approval boundary instead of auto-approving it
  • auto_approve: false keeps the scheduler from using the blanket CLI override; project-owned approval_policy should decide which gates are routine enough to close unattended
  • auto_retry_on_ghost lets continuous mode reissue transient startup ghost turns before pausing the session

For the field-by-field reference, see Lights-Out Scheduling and CLI Reference.
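The rules in the list above can be checked before the daemon ever sees the file. The sketch below is a hypothetical pre-commit style check, not the validation AgentXchain itself performs. It uses only the values this page shows for on_idle and triage_approval, and it treats a missing session budget as a problem because this runbook asks for an explicit budget stance, not because the CLI is known to require one.

```python
from typing import Any, Dict, List

# Values shown on this page; the CLI may accept others (assumption).
ON_IDLE_MODES = {"exit", "perpetual", "human_review"}
TRIAGE_MODES = {"auto", "human"}


def validate_schedule(name: str, schedule: Dict[str, Any]) -> List[str]:
    """Return human-readable problems for one schedule entry."""
    problems: List[str] = []
    continuous = schedule.get("continuous") or {}
    if not continuous.get("enabled"):
        return problems  # Nothing continuous-specific to check.

    if not continuous.get("vision_path"):
        problems.append(
            f"{name}: continuous.vision_path is required when continuous mode is enabled"
        )

    on_idle = continuous.get("on_idle")
    if on_idle is not None and on_idle not in ON_IDLE_MODES:
        problems.append(f"{name}: unrecognized on_idle value {on_idle!r}")

    triage = continuous.get("triage_approval")
    if triage is not None and triage not in TRIAGE_MODES:
        problems.append(f"{name}: unrecognized triage_approval value {triage!r}")

    # Runbook discipline rather than a known CLI rule: set a real budget.
    budget = continuous.get("per_session_max_usd")
    if not isinstance(budget, (int, float)) or isinstance(budget, bool) or budget <= 0:
        problems.append(f"{name}: per_session_max_usd should be a positive number")

    return problems
```

Load agentxchain.json with json.load and call validate_schedule once per entry under "schedules"; an empty list means none of these particular checks failed.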

2a. Choose the idle policy

Continuous mode separates "nothing derivable from the current queue scan" from "the product vision is exhausted." Use on_idle to make that policy explicit:

Mode | Behavior | Use it when
exit | Stop with idle_exit after max_idle_cycles empty scans | You want a bounded proof run or a finite delivery batch
perpetual | Dispatch a PM idle-expansion turn after the idle threshold | You want full-auto product development to derive the next increment from broader repo context
human_review | Pause the session with idle_human_review_required | You want an operator to decide whether to stop, inject work, or switch to perpetual mode

perpetual mode uses the normal intake pipeline. The loop records a vision_idle_expansion signal, assigns the configured role (default pm), and requires a structured idle_expansion_result. A successful PM result either creates a new intake intent (kind: "new_intake_intent") or stops cleanly as vision_exhausted.

The PM idle-expansion prompt scaffold is .agentxchain/prompts/pm-idle-expansion.md. It is committed into new governed projects by agentxchain init --governed so teams can review the contract. The runtime's first implementation carries the same requirements through the synthesized charter: VISION.md is read-only, proposed work must cite a human-owned vision heading, and ROADMAP.md / SYSTEM_SPEC.md may be updated only as supporting evidence.

idle_expansion.max_expansions is a hard loop guard. If the PM expansion path cannot produce productive work for the configured number of attempts, the session stops as vision_expansion_exhausted instead of spending forever. per_session_max_usd is checked before idle expansion, so budget exhaustion wins over all idle policies.
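The precedence described above (budget first, then the idle threshold, then the on_idle policy, with max_expansions as the loop guard) can be sketched as one decision function. This is a simplified model for reasoning about the policy, not the runtime's code. The labels session_budget_exhausted, keep_scanning, and dispatch_pm_idle_expansion are hypothetical names invented here; idle_exit, idle_human_review_required, and vision_expansion_exhausted come from this page.

```python
from dataclasses import dataclass


@dataclass
class IdleState:
    idle_cycles: int            # consecutive scans with no derivable work
    max_idle_cycles: int
    on_idle: str                # "exit", "perpetual", or "human_review"
    expansions_used: int        # unproductive PM expansion attempts so far
    max_expansions: int
    spent_usd: float
    per_session_max_usd: float


def decide_idle_action(state: IdleState) -> str:
    """Model of what happens after an empty queue scan."""
    # Budget is checked before idle expansion, so it wins over every policy.
    if state.spent_usd >= state.per_session_max_usd:
        return "session_budget_exhausted"  # hypothetical label
    if state.idle_cycles < state.max_idle_cycles:
        return "keep_scanning"  # hypothetical label
    if state.on_idle == "exit":
        return "idle_exit"
    if state.on_idle == "human_review":
        return "idle_human_review_required"
    if state.on_idle == "perpetual":
        # Hard loop guard on PM expansion attempts.
        if state.expansions_used >= state.max_expansions:
            return "vision_expansion_exhausted"
        return "dispatch_pm_idle_expansion"  # hypothetical label
    raise ValueError(f"unknown on_idle mode: {state.on_idle!r}")
```

Note that the model counts only unproductive expansion attempts toward max_expansions, which is one reading of "cannot produce productive work for the configured number of attempts".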

3. Prove the loop once before starting the daemon

Run one bounded continuous session in the foreground first:

agentxchain run --continuous --vision .planning/VISION.md --max-runs 1 --session-budget 5.00

That command is the safety floor. It proves:

  • the vision file resolves correctly
  • the intake lifecycle can seed or consume work
  • the configured adapter/runtime can complete a governed run
  • status surfaces continuous-session truth for this repo
  • routine human-approval gates close through approval_policy only after the required files and passing verification evidence are present

Skipping this proof and going straight to schedule daemon is lazy. If the repo cannot complete one bounded run in the foreground, the daemon will not save you.

3a. Use a truthful mixed-runtime proof shape

Do not fake a “fully lights-out” proof by binding every phase to remote review_only roles and then acting surprised when requires_files gates never pass.

The production-valid mixed-runtime shape is:

  • local authoring roles satisfy repo-local gate files
  • a review_only api_proxy QA role validates and requests completion
  • the repo already contains the final QA gate files before the QA review turn begins

That is the real contract today. A review_only api_proxy QA role can validate and request completion, but it cannot create gate files.

If you want the repo-owned live proof harness for this exact shape, run:

node examples/live-governed-proof/run-continuous-mixed-proof.mjs --json --output examples/live-governed-proof/evidence/continuous-mixed-proof.latest.json

That harness drives the real run --continuous CLI surface, uses a real Anthropic-backed api_proxy QA turn, and validates the continuous session, intake provenance, review artifact, and recorded spend. The checked-in evidence artifact at continuous-mixed-proof.latest.json backs this claim.

For perpetual mode with --on-idle perpetual, see the tusq.dev Dogfood Proof — three full governed runs completed autonomously on a real production codebase with PM idle-expansion chaining runs and zero reliability failures.

4. Start the daemon

Once the bounded proof run is clean, start the scheduler:

agentxchain schedule daemon --poll-seconds 60

Or for a persistent shell session:

tmux new-session -d -s agentxchain-daemon 'agentxchain schedule daemon --poll-seconds 60'

What happens next:

  1. When the schedule becomes due, the daemon starts a schedule-owned continuous session.
  2. Later polls advance the same session even if the schedule is not due again yet.
  3. The daemon checks queued work first.
  4. If nothing is queued, it seeds work from .planning/VISION.md.
  5. If the queue stays empty through the idle threshold, on_idle decides whether the loop exits, pauses for human review, or dispatches PM idle-expansion.
  6. Each run still goes through the real intake lifecycle: plan → start → governed run → resolve.

This is why schedule daemon is the correct unattended owner for schedule-driven operation: it owns heartbeat, status, and later-poll continuation without inventing a second scheduler.
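The ordering in the numbered steps above can be pictured as a single decision per poll. The function below is an illustrative model only: the action names are invented for this sketch, and the real daemon tracks far more state than five arguments.

```python
from typing import Sequence


def poll_action(
    session_active: bool,
    schedule_due: bool,
    queued: Sequence[str],
    vision_candidates: Sequence[str],
) -> str:
    """Model of one daemon poll, following the order described above."""
    if not session_active:
        # A session only starts when the schedule is due.
        return "start_session" if schedule_due else "wait"
    # Later polls advance the same session even if the schedule is not due.
    if queued:
        return "run_queued_work"       # queued work is checked first
    if vision_candidates:
        return "seed_from_vision"      # fall back to .planning/VISION.md
    return "count_idle_cycle"          # on_idle applies at the idle threshold
```

Whatever action is chosen, each run still passes through plan, start, governed run, and resolve; the sketch only models what gets picked.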

5. Observe the system while it runs

Use these commands while the daemon is active:

agentxchain status
agentxchain status --json
agentxchain schedule status
agentxchain schedule list
agentxchain events --follow

Watch for:

  • active continuous_session state
  • owner_type: "schedule" and the owning schedule id
  • runs_completed versus max_runs
  • current_vision_objective
  • continuous_blocked, continuous_running, continuous_completed, continuous_vision_exhausted, continuous_vision_expansion_exhausted, or continuous_failed

agentxchain events --follow is also the auto-chain audit trail. On every clean hand-off from one completed run to the next, you should see session_continuation <previous_run_id> -> <next_run_id> (<objective>). If a run completed cleanly and there is no continuation event, treat that as a defect, not as normal operator ambiguity.
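A missing continuation is easy to overlook in a long event stream, so a small audit helper can be useful. The sketch below assumes each continuation event appears on a single text line in the shape quoted above, possibly with a prefix such as a timestamp; the actual output format of events --follow is not specified on this page. The function names are illustrative.

```python
import re
from typing import Iterable, List, Optional, Sequence, Tuple

# Assumed line shape: session_continuation <prev> -> <next> (<objective>)
CONTINUATION = re.compile(r"session_continuation\s+(\S+)\s+->\s+(\S+)\s+\((.*)\)")


def parse_continuation(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (previous_run_id, next_run_id, objective), or None."""
    match = CONTINUATION.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


def missing_continuations(
    completed_run_ids: Sequence[str], event_lines: Iterable[str]
) -> List[str]:
    """List completed runs that were followed by another run but have no
    continuation event. The last run is skipped: nothing followed it."""
    handed_off = set()
    for line in event_lines:
        parsed = parse_continuation(line)
        if parsed is not None:
            handed_off.add(parsed[0])
    return [run for run in completed_run_ids[:-1] if run not in handed_off]
```

Any run id this returns is a candidate defect report under the rule above.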

paused is reserved for real blockers like unresolved escalations, blocked runs, or explicit on_idle: "human_review". A healthy post-completion path should stay running while it seeds the next objective, then end as completed, idle_exit, vision_exhausted, or vision_expansion_exhausted when it hits max_runs, bounded idle policy, PM exhaustion, PM expansion cap, or budget.

agentxchain status is the run/session truth surface. agentxchain schedule status is the daemon heartbeat surface. Do not confuse them.

5a. Bound startup ghost recovery

Continuous mode can automatically recover from transient startup ghosts: turns that reached failed_start because the runtime never produced startup proof. Enable it explicitly:

{
  "run_loop": {
    "continuous": {
      "auto_retry_on_ghost": {
        "enabled": true,
        "max_retries_per_run": 3,
        "cooldown_seconds": 5
      }
    }
  }
}

When enabled, the loop uses the same reissueTurn() path as the manual agentxchain reissue-turn --turn <id> --reason ghost recovery, records attempts in continuous-session.json, emits auto_retried_ghost, and continues the same run. If the run hits max_retries_per_run, it emits ghost_retry_exhausted, keeps the session paused, and preserves the manual reissue-turn command in status.

Auto-retry is enabled by default only for the strict full-auto approval posture: approval_policy.phase_transitions.default === "auto_approve" and approval_policy.run_completion.action === "auto_approve" with continuous mode enabled. Generated BUG-59 safe-rule configs use phase_transitions.default: "require_human" with explicit auto-approve rules. That conservative posture does not auto-enable ghost retry. For those projects, opt in with the config above (run_loop.continuous.auto_retry_on_ghost.enabled: true) or use agentxchain run --continuous --auto-retry-on-ghost.
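The default rule above is easy to misread, so here it is as a small predicate. This is a sketch of the rule as described, not the runtime's implementation. It assumes an explicit enabled value in config overrides the policy-derived default in both directions and that the CLI flag can only turn retry on; neither detail is confirmed on this page.

```python
from typing import Any, Dict


def ghost_retry_enabled(
    config: Dict[str, Any],
    continuous_enabled: bool,
    cli_flag: bool = False,
) -> bool:
    """Effective auto_retry_on_ghost setting under the rule described above."""
    if cli_flag:  # --auto-retry-on-ghost
        return True

    explicit = (
        config.get("run_loop", {})
        .get("continuous", {})
        .get("auto_retry_on_ghost", {})
        .get("enabled")
    )
    if explicit is not None:  # assumption: explicit config wins either way
        return bool(explicit)

    policy = config.get("approval_policy", {})
    full_auto = (
        policy.get("phase_transitions", {}).get("default") == "auto_approve"
        and policy.get("run_completion", {}).get("action") == "auto_approve"
    )
    # Default-on only for the strict full-auto posture in continuous mode.
    return full_auto and continuous_enabled
```

A config with phase_transitions.default set to "require_human" returns False here unless it opts in, which matches the conservative posture described above.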

5b. Reconcile intentional operator commits

Manual recovery sometimes requires a human commit on top of the latest checkpoint. That commit should not force state surgery. If status shows Git HEAD has moved since checkpoint, run:

agentxchain reconcile-state --accept-operator-head

The command accepts only fast-forward operator commits that leave .agentxchain/ alone. It updates the governed baseline and emits state_reconciled_operator_commits. If it refuses, treat the refusal as real safety signal: history rewrites and governed-state edits are not safe lights-out inputs.

Under full-auto approval policy, the continuous loop already runs this reconcile before every dispatch — you do not need to run the command manually for safe fast-forward operator commits. The effective default for run_loop.continuous.reconcile_operator_commits is auto_safe_only when approval policy is full-auto; explicit config and the CLI flag always win.

{
  "run_loop": {
    "continuous": {
      "reconcile_operator_commits": "auto_safe_only"
    }
  }
}

Valid modes:

  • manual: continuous drift stays blocking; operator runs agentxchain reconcile-state --accept-operator-head by hand. Default under governed-mode approval policies.
  • auto_safe_only: continuous loop auto-accepts fast-forward operator commits that leave .agentxchain/ alone before every dispatch. Default under full-auto approval policy. Unsafe commits still pause the session with operator_commit_reconcile_refused and the same refusal class the shared reconcile primitive returns (governance_state_modified, critical_artifact_deleted, history_rewrite, missing_baseline, git_unavailable, not_git_repo, or commit_walk_failed).
  • disabled: continuous loop skips the reconcile entirely. Use only if an external system owns baseline advancement.

Per-run override: agentxchain run --continuous --reconcile-operator-commits auto_safe_only. When the auto reconcile refuses a commit, the session pauses, status surfaces the refusal class and the same agentxchain reconcile-state --accept-operator-head recovery hint, and the loop emits operator_commit_reconcile_refused for audit. Manual and automatic reconcile both route through the same audited safety primitive; the auto path adds a blocked-state/event wrapper instead of reimplementing commit-range checks.
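How the effective mode is chosen can be summarized in a few lines. The sketch below assumes the CLI flag takes precedence over explicit config when both are present; this page says only that both beat the policy-derived default. The function name is illustrative.

```python
from typing import Any, Dict, Optional

VALID_MODES = ("manual", "auto_safe_only", "disabled")

# Refusal classes listed above for unsafe commits.
REFUSAL_CLASSES = (
    "governance_state_modified",
    "critical_artifact_deleted",
    "history_rewrite",
    "missing_baseline",
    "git_unavailable",
    "not_git_repo",
    "commit_walk_failed",
)


def effective_reconcile_mode(
    config: Dict[str, Any],
    full_auto_policy: bool,
    cli_override: Optional[str] = None,
) -> str:
    """Resolve reconcile_operator_commits: CLI, then config, then default."""
    configured = (
        config.get("run_loop", {})
        .get("continuous", {})
        .get("reconcile_operator_commits")
    )
    # Assumption: the per-run CLI flag beats explicit config.
    for candidate in (cli_override, configured):
        if candidate is not None:
            if candidate not in VALID_MODES:
                raise ValueError(f"invalid reconcile mode: {candidate!r}")
            return candidate
    # Policy-derived default.
    return "auto_safe_only" if full_auto_policy else "manual"
```

For example, an empty config under a non-full-auto policy resolves to manual, so drift stays blocking until the operator runs reconcile-state by hand.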

6. Handle blocked and failed states correctly

When the session pauses for a real blocker:

agentxchain status

Then run the exact recovery command surfaced by the governed state. Examples:

agentxchain unblock <id> # needs_human escalation
agentxchain reissue-turn --turn <id> --reason ghost
agentxchain reissue-turn --turn <id> --reason stale

Rules that matter:

  • blocked continuous sessions stay paused, not fake-completed
  • after the surfaced recovery action clears the blocker, the next daemon poll resumes the same session
  • session-budget exhaustion is a terminal stop, not a blocker; start a new session if you want to continue
  • non-blocked executor failure leaves the session failed for inspection instead of pretending the intake intent completed

If you need the full recovery matrix, use Recovery. This page keeps the operational path short; the recovery page is the canonical blocked-state map.

7. Inject explicit human priority when the vision queue is wrong

Vision seeding is fallback discovery, not authority. Human priority wins.

Inject urgent work like this:

agentxchain inject "Fix the broken release-note sidebar ordering" --priority p0

Or stage lower-priority work without approval:

agentxchain inject "Investigate docs stack migration options" --priority p2 --no-approve

Important behavior:

  • injected p0 work preempts new vision seeding
  • the continuous loop yields with priority_preempted and the daemon consumes the injected item first
  • non-p0 items join the intake queue in normal priority order

For the intake boundary and queue semantics, see Continuous Delivery Intake.
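The queue behavior above can be modeled in a few lines. This is an illustration of the ordering rules only: the Intent type is hypothetical, and the p1 and p3 levels are assumed to exist between and after the p0 and p2 levels this page mentions.

```python
from dataclasses import dataclass
from typing import List, Optional

# p0 and p2 appear on this page; p1 and p3 are assumed for illustration.
PRIORITY_RANK = {"p0": 0, "p1": 1, "p2": 2, "p3": 3}


@dataclass
class Intent:
    description: str
    priority: str


def vision_seeding_preempted(queue: List[Intent]) -> bool:
    """Injected p0 work preempts new vision seeding."""
    return any(intent.priority == "p0" for intent in queue)


def next_intent(queue: List[Intent]) -> Optional[Intent]:
    """Pick the highest-priority queued intent; ties keep arrival order."""
    if not queue:
        return None  # Only now would the loop fall back to vision seeding.
    # min() returns the first minimal element, preserving arrival order.
    return min(queue, key=lambda intent: PRIORITY_RANK[intent.priority])
```

With the two inject examples above in the queue, next_intent returns the p0 sidebar fix first and vision_seeding_preempted is True.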

8. Know how the loop stops

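Continuous mode stops, or pauses, for specific reasons:

  • max_runs reached
  • max_idle_cycles reached because no queued or vision-derived work was found
  • PM idle-expansion reports vision_exhausted, or idle_expansion.max_expansions is reached (vision_expansion_exhausted)
  • per_session_max_usd or --session-budget reached
  • blocked human escalation that you have not resolved yet (this pauses the session rather than ending it)
  • operator SIGINT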

SIGINT behavior is intentional:

  • first Ctrl+C: finish the current in-flight work, then stop
  • second Ctrl+C: hard-abort

For daemon-owned operation, stop the daemon the same way you stop any foreground process or session-manager job.
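If you wrap the CLI in your own supervisor and want the same two-stage interrupt behavior, the pattern looks roughly like this. It is a generic sketch of the technique, not AgentXchain's handler; the class name is invented, and exit code 130 is only the conventional shell value for termination by SIGINT.

```python
import signal
from typing import Callable, List


class TwoStageInterrupt:
    """First SIGINT requests a graceful stop; the second aborts."""

    def __init__(self) -> None:
        self.stop_requested = False

    def handle(self, signum, frame) -> None:
        if self.stop_requested:
            # Second Ctrl+C: hard-abort immediately.
            raise SystemExit(130)
        # First Ctrl+C: let in-flight work finish, then stop.
        self.stop_requested = True

    def install(self) -> None:
        """Register the handler; call from the main thread."""
        signal.signal(signal.SIGINT, self.handle)


def run_loop(work_items: List[Callable[[], str]],
             interrupt: TwoStageInterrupt) -> List[str]:
    """Run items in order, checking the stop flag between items only."""
    done = []
    for item in work_items:
        if interrupt.stop_requested:
            break
        done.append(item())  # In-flight work is never cut short here.
    return done
```

Call install() once at startup, then pass the same object to the loop. Because the flag is only checked between items, the item that was running when the first Ctrl+C arrived still completes.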

9. Minimum operating discipline

If you want unattended execution without lying to yourself, keep this floor:

  1. Validate the repo with doctor and connector check.
  2. Run one bounded run --continuous --max-runs 1 proof first.
  3. Set a real session budget.
  4. Watch status, schedule status, and events --follow.
  5. Use inject --priority p0 when human judgment needs to override vision seeding.
  6. Use unblock to resolve true blockers; do not hack state files.

That is the stable repo-local lights-out story today.