Diagnostics ladder

When a Loom run fails, work through this sequence in order, starting at the smallest pointer and narrowing to the exact evidence. The goal is not "open the biggest log file" but "open the exact phase or step that failed."

Use this when

Situation	Start with
You have a failed run and want the fastest path to evidence	The receipt path Loom printed
A teammate shared a run directory	`.loom/.runtime/logs/<run_id>/`
You suspect provider or contract behavior	The receipt, then `phase_report_path` if present

Bring one of these

The receipt path Loom printed, for example `receipt: /absolute/path/to/repo/.loom/.runtime/receipts/loom-run-local-....json`
The runtime logs directory, `.loom/.runtime/logs/<run_id>/`

If you have the receipt, you also have `phase_report_path`, the optional pointer used for runtime-contract validation. It may be absent.

Decision path

Quick symptom map

Symptom	Open this file first
The whole run failed and you need the first failing job	`pipeline/manifest.json`
You already know the failing job	`jobs/<job_id>/manifest.json`
The job failed before user script execution	`jobs/<job_id>/system/<section>/events.jsonl` via the job manifest
The job failed in a user step	`jobs/<job_id>/user/execution/script/<NN>/events.jsonl` via the job manifest

Step 1 — Receipt: confirm the run and the pointers

Open: the receipt JSON Loom printed.

Inspect these fields:

Field	Why it matters
`status`, `exit_code`	Tells you whether the run failed
`logs_dir`	Root pointer for the ladder below
`phase_report_path`	Optional pointer for phase validation and coverage

Go next: open pipeline/summary.json or pipeline/manifest.json under logs_dir.

If the run succeeded but the outcome is still wrong, the receipt still points you to the correct run root and, when present, the phase report.
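
As a minimal sketch of this step, the Python below reads a receipt and pulls out the fields listed in the table above. The function name `read_receipt_pointers` is hypothetical and not part of Loom, and the sketch assumes the receipt is a JSON object with these four keys at the top level.

```python
import json
from pathlib import Path
from typing import Any, Dict


def read_receipt_pointers(receipt_path: str) -> Dict[str, Any]:
    """Return the receipt fields the ladder needs (hypothetical helper).

    Assumption: the keys sit at the top level of the receipt JSON.
    Missing keys come back as None, which matters for the optional
    phase_report_path pointer.
    """
    receipt = json.loads(Path(receipt_path).read_text(encoding="utf-8"))
    return {
        "status": receipt.get("status"),
        "exit_code": receipt.get("exit_code"),
        "logs_dir": receipt.get("logs_dir"),
        "phase_report_path": receipt.get("phase_report_path"),
    }
```

If `logs_dir` comes back as `None`, fall back to the runtime logs directory for the run rather than guessing a path.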

Step 2 — Pipeline summary: did the pipeline fail?

Open: pipeline/summary.json

{
  "schema_version": "loom.runtime.logs.v2",
  "run_id": "loom-run-local-1772865600000000000",
  "pipeline_id": "loom-local-1772865600000000000",
  "status": "failure",
  "exit_code": 1,
  "duration_ms": 155557,
  "error": "job \"check-pnpm\" failed"
}

Field	What it tells you
`status`	`success` or `failure`
`exit_code`	Pipeline exit code
`error`	Top-level pipeline error when Loom has one

Go next: if the pipeline failed, open pipeline/manifest.json.

If status is success, stop using the failure ladder and switch to output validation or artifact inspection instead.

Step 3 — Pipeline manifest: which job should you inspect first?

Open: pipeline/manifest.json

{
  "schema_version": "loom.runtime.logs.v2",
  "status": "failure",
  "exit_code": 1,
  "failing_job_id": "check-pnpm",
  "failing_job_manifest_path": "jobs/check-pnpm/manifest.json",
  "jobs": [
    {
      "job_id": "check-pnpm",
      "status": "failed",
      "job_manifest_path": "jobs/check-pnpm/manifest.json",
      "job_summary_path": "jobs/check-pnpm/summary.json",
      "system_events_path": "jobs/check-pnpm/system/provider/events.jsonl",
      "artifacts_path": "jobs/check-pnpm/artifacts"
    }
  ]
}

Field	What it tells you
`failing_job_id`	The first failing job
`failing_job_manifest_path`	The next file to open
`jobs[]`	The full job roster, including artifact pointers

Go next: open failing_job_manifest_path.

Start with the first failing job unless you already know the issue is global.
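
The lookup in this step can be sketched in Python as follows. The helper name `failing_job_manifest` is hypothetical. Two assumptions are baked in: manifest paths are relative to `logs_dir`, and if `failing_job_manifest_path` is missing, the first `jobs[]` entry with status `failed` is a reasonable fallback.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def failing_job_manifest(logs_dir: str, manifest: Dict[str, Any]) -> Optional[Path]:
    """Resolve the job manifest to open next (hypothetical helper)."""
    rel = manifest.get("failing_job_manifest_path")
    if not rel:
        # Assumed fallback: scan the roster for the first failed job.
        for job in manifest.get("jobs", []):
            if job.get("status") == "failed":
                rel = job.get("job_manifest_path")
                break
    # Assumption: pointers are relative to the run's logs directory.
    return Path(logs_dir) / rel if rel else None
```

A `None` result means the manifest names no failed job, which is a hint to check whether the problem is global rather than tied to one job.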

Step 4 — Job manifest: user step or system section?

Open: jobs/<job_id>/manifest.json

This is the main branching point.

User-step failure

If failing_step_events_path is present, your failure is in user execution:

{
  "job_id": "check-pnpm",
  "status": "failed",
  "failing_section": "script",
  "failing_step_index": 2,
  "failing_step_events_path": "jobs/check-pnpm/user/execution/script/02/events.jsonl",
  "user_steps": [
    {
      "step_id": "script-02",
      "command_preview": "pnpm install --frozen-lockfile",
      "step_events_path": "jobs/check-pnpm/user/execution/script/02/events.jsonl"
    }
  ]
}

Go next: open the pointed step events file.

System-section failure

If failing_step_events_path is absent, the job failed in provider, cache, artifact, or other system work:

{
  "job_id": "build-image",
  "status": "failed",
  "failing_section": "provider",
  "system_sections": [
    {
      "system_section": "provider",
      "phase_code": "job.provider_prepare",
      "events_path": "jobs/build-image/system/provider/events.jsonl"
    },
    {
      "system_section": "cache_restore",
      "phase_code": "job.cache_restore",
      "events_path": "jobs/build-image/system/cache_restore/events.jsonl"
    }
  ]
}

Use failing_section to choose the matching events_path.
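
The branching rule for this step can be expressed as a short Python sketch. `pick_events_path` is a hypothetical name, and the code relies only on the fields shown in the two sample manifests above.

```python
from typing import Any, Dict, Optional, Tuple


def pick_events_path(job_manifest: Dict[str, Any]) -> Optional[Tuple[str, str]]:
    """Return ("user" | "system", events_path), or None if nothing matches."""
    # A step pointer means the failure happened in user execution.
    step_path = job_manifest.get("failing_step_events_path")
    if step_path:
        return ("user", step_path)
    # Otherwise match failing_section against the system section roster.
    failing = job_manifest.get("failing_section")
    for section in job_manifest.get("system_sections", []):
        if section.get("system_section") == failing and section.get("events_path"):
            return ("system", section["events_path"])
    return None
```

For the `build-image` sample, this selects `jobs/build-image/system/provider/events.jsonl`, because `failing_section` is `provider`.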

Step 5 — Read the pointed `events.jsonl`

Open: the exact events file from Step 4.

The current runtime contract uses phase_start, output, and phase_finish records:

{"schema_version":"loom.runtime.logs.v2","ts":"2026-03-07T12:34:56Z","seq":14,"level":"info","event":"output","scope":"step","phase_code":"execution.script","phase_family":"user","stream":"stderr","message":"ERR! Missing lockfile entry for @docusaurus/core"}
{"schema_version":"loom.runtime.logs.v2","ts":"2026-03-07T12:34:56Z","seq":15,"level":"error","event":"phase_finish","scope":"step","phase_code":"execution.script","phase_family":"user","status":"failed","exit_code":1,"duration_ms":942}

Field	What to look for
`phase_code`	Which phase failed
`message`	The actual stderr or stdout line on `output` events
`status`, `exit_code`	The closing outcome on `phase_finish`
`metrics`	Skip and telemetry detail for cache or artifact sections

Go next: only widen to summaries or the phase report if this pointed file still does not explain the problem.

Look for the final phase_finish first, then read the preceding output events for the error text.
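
That reading order can be sketched in Python: find the last `phase_finish` record, then collect the `output` messages that precede it. `summarize_events` and its `tail` parameter are hypothetical, and the sketch assumes one JSON object per line with the `event` and `message` fields shown in the sample records.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple


def summarize_events(
    lines: Iterable[str], tail: int = 5
) -> Tuple[Optional[Dict[str, Any]], List[str]]:
    """Return (last phase_finish record, up to `tail` preceding output messages)."""
    events = [json.loads(line) for line in lines if line.strip()]
    # Walk backwards to locate the final phase_finish record.
    finish_index = None
    for i in range(len(events) - 1, -1, -1):
        if events[i].get("event") == "phase_finish":
            finish_index = i
            break
    if finish_index is None:
        return None, []
    # Keep only output events that came before the closing record.
    messages = [
        e.get("message", "")
        for e in events[:finish_index]
        if e.get("event") == "output"
    ]
    return events[finish_index], messages[-tail:]
```

To use it on a real file, pass the open file object, for example `summarize_events(open(events_path, encoding="utf-8"))`, where `events_path` is the file chosen in Step 4.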

Optional Step 6 — Phase report: validate the timeline

Open: phase-report.json or follow phase_report_path from the receipt.

Use it when the main failure ladder is not enough:

Question	Why the phase report helps
"Did Loom emit all required phase boundaries?"	`validation` and `plan` answer that directly
"Did cache or artifact phases run?"	`phase_metrics` and `plan.requirements` show them
"Was work ordered correctly?"	Validation issues include ordering failures
"How much runtime was attributed?"	`coverage` reports attributed vs unattributed runtime

When to widen scope

Only widen scope after the pointed events file fails to explain the problem.

Situation	Next move
The failing unit still does not explain the error	Read `jobs/<job_id>/summary.json` and the matching system section summary
Multiple jobs failed	Revisit `pipeline/manifest.json` and inspect the next failed job
You suspect a contract bug rather than a task failure	Open `phase-report.json`
You need the exact workspace or revision context	Go back to the receipt
