Operations

Start / Run / Deploy

Canonical runtime:

  • infra/compose/docker-compose.yml
  • .env from .env.example
  • repo-owned bootstrap and alignment scripts under infra/scripts

Primary deploy/alignment helper:

  • infra/scripts/align_host_runtime.py

Health

Use:

bash infra/scripts/healthcheck.sh

This checks:

  • Compose service state
  • ERPNext doctor output
  • Helpifyr Spindle integration health endpoint

For four-layer runtime materialization drift checks, use:

python infra/scripts/verify_runtime_materialization.py --output artifacts/evidence/runtime-materialization.json

This verifies:

  • repo-owned runtime truth from .env.example
  • active host env / compose input truth
  • running container env and compose labels
  • app readback via integration_status and MCP /healthz

The verifier redacts secret-like values and fails when canonical runtime truth is missing, stale, or contradicted across layers.
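The redaction behavior can be sketched as follows; the key-name heuristic below is an assumption for illustration, not the verifier's actual matcher:

```python
import re

# Key names that suggest secret material. The exact pattern used by
# verify_runtime_materialization.py is an assumption here.
SECRET_KEY_PATTERN = re.compile(r"(secret|token|password|api[_-]?key)", re.IGNORECASE)

def redact_env(env: dict) -> dict:
    """Return a copy of an env mapping with secret-like values masked."""
    return {
        key: "***REDACTED***" if SECRET_KEY_PATTERN.search(key) else value
        for key, value in env.items()
    }

print(redact_env({"JHF_SPINDLE_MCP_API_KEY": "abc123", "SITE_NAME": "erp.local"}))
```

Only key names are inspected, so redaction stays stable even when secret values rotate.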

For deterministic stack upgrade truth generation (ERPNext, Zammad, MariaDB, Redis profiles), use:

python maintenance/pull_stack_oss_inventory.py --output test-results/stack-oss-inventory.workspace.json
python maintenance/generate_stack_upgrade_plan.py --inventory test-results/stack-oss-inventory.workspace.json --output test-results/stack-upgrade-plan.workspace.json

The second command fails if deterministic refs are missing or if Zammad resolves to latest.
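The failure conditions can be sketched as a pure check over the inventory; the mapping shape (component name to image/version ref) is an assumption:

```python
def check_refs(inventory: dict) -> list[str]:
    """Return violations; an empty list means the upgrade plan can be generated.

    Mirrors the documented fail conditions: a component without a pinned
    deterministic ref, or any component resolving to a floating 'latest' tag.
    """
    errors = []
    for name, ref in inventory.items():
        if not ref:
            errors.append(f"{name}: missing deterministic ref")
        elif ref.split(":")[-1] == "latest":
            errors.append(f"{name}: floating 'latest' ref")
    return errors

print(check_refs({"erpnext": "v15.8.1", "zammad": "zammad/zammad:latest"}))
```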

For repo-owned OSS inventory/version/policy drift checks across compose runtime, CI actions, and Python dependency surfaces, use:

python maintenance/verify_oss_inventory_version_truth.py --output test-results/oss-version-truth.verify.json

For live runtime materialization comparison on the shared host:

python maintenance/verify_oss_inventory_version_truth.py --check-live --ssh-target <internal-runtime-redacted> --output artifacts/evidence/oss-version-truth.live.json

This check fails when inventory components are missing, version truth drifts, latest/floating runtime refs appear without classification, or live runtime images diverge from repo-owned truth.
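The live-divergence condition reduces to a per-component comparison of repo-owned truth against running images; this is an illustrative sketch, not the script's actual data model:

```python
def image_drift(repo_truth: dict, live_images: dict) -> dict:
    """Map component -> (expected, actual) for every divergence.

    A component missing from the live runtime shows up as actual=None,
    so missing components fail the same way as version drift.
    """
    return {
        name: (expected, live_images.get(name))
        for name, expected in repo_truth.items()
        if live_images.get(name) != expected
    }

print(image_drift({"redis": "redis:7.2.4"}, {"redis": "redis:7.0.0"}))
```

A non-empty result corresponds to a failing check.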

For authenticated MCP read-only demo action verification on the live host, use:

python infra/scripts/verify_authenticated_mcp_demo_action.py --ssh-target <internal-runtime-redacted> --host-repo-path /home/administrator/jhf-spindle --output artifacts/evidence/mcp-authenticated-demo-action-live.json

The verifier is fail-closed: it requires MCP key materialization both on the host and inside the running mcp-gateway container, and it rejects unauthenticated access.
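The fail-closed contract can be expressed as a pair of probes: an unauthenticated request must be rejected, and the provisioned key must be accepted. The status codes below are assumptions about the gateway's responses:

```python
def verify_fail_closed(status_no_key: int, status_with_key: int) -> bool:
    """True only when the gateway rejects anonymous traffic (401/403)
    AND accepts the provisioned key (200). Anything else fails the check,
    including a gateway that answers 200 without credentials."""
    return status_no_key in (401, 403) and status_with_key == 200

print(verify_fail_closed(401, 200))
```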

Readiness

Current readiness-like surfaces:

  • integration_status
  • MCP /healthz
  • targeted smoke scripts

Known gap:

  • no dedicated /ready or /readiness endpoint exists today
  • the current readiness view is documentary and operational, not a standalone runtime readiness contract
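Since no dedicated endpoint exists, an operator-side readiness verdict has to be folded from the surfaces listed above. A minimal sketch, assuming each surface is polled elsewhere and reduced to a boolean (the surface keys are illustrative):

```python
def operational_readiness(surfaces: dict) -> bool:
    """Fold the documented readiness-like surfaces into one verdict.

    Fail-closed: a missing or non-True surface counts as not ready.
    """
    required = ("integration_status", "mcp_healthz", "smoke")
    return all(surfaces.get(name) is True for name in required)

print(operational_readiness({"integration_status": True, "mcp_healthz": True, "smoke": True}))
```

This is documentary glue, not a runtime readiness contract; it goes away if a real /ready endpoint ships.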

Version View

Current version-like sources:

  • README version marker on main
  • Git revision on main
  • host-alignment and smoke evidence when a target runtime is being verified

Known gap:

  • no dedicated runtime /version endpoint exists today
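Absent a /version endpoint, the version-like sources above can be folded into one view. A sketch, assuming a "Version: X" marker in the README (the marker regex is an assumption):

```python
import re
import subprocess

def version_view(readme_text: str) -> dict:
    """Combine the README version marker and the git revision on main.

    Returns None for either source that cannot be resolved instead of failing,
    since this is an operator view, not a contract.
    """
    marker = re.search(r"[Vv]ersion[:\s]+(\S+)", readme_text)
    try:
        rev = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        rev = None
    return {"readme_version": marker.group(1) if marker else None, "git_rev": rev}

print(version_view("Version: 1.2.3"))
```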

Connector Smoke

Use:

bash infra/scripts/contract-smoke.sh

This sends the checked-in sample supplier, Stripe, and Paddle contracts to the configured Helpifyr Spindle base URL.
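The shape of one such submission can be sketched as follows; the /api/contracts/smoke path is hypothetical and contract-smoke.sh owns the real endpoint and payloads:

```python
import json
import urllib.request

def build_contract_request(base_url: str, contract: dict) -> urllib.request.Request:
    """Prepare the POST for one checked-in sample contract.

    Separated from sending so the request shape can be inspected offline;
    pass the result to urllib.request.urlopen to actually send it.
    """
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/contracts/smoke",  # hypothetical path
        data=json.dumps(contract).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_contract_request("https://spindle.local/", {"kind": "supplier"})
print(req.full_url)
```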

Live Smoke Pack

Use:

python infra/scripts/live_smoke_pack.py --insecure --api-key "$JHF_SPINDLE_MCP_SMOKE_API_KEY" --output artifacts/live-smoke.json

This runs a non-destructive host and MCP smoke pack against the closest live Helpifyr Spindle runtime. It checks host-side Helpifyr Spindle containers, the integration health endpoint, and a wide read-only MCP tool surface across approvals, bank, SEPA, dunning, period close, procurement, contracts, HR, payroll, assets, VAT, reporting, and intercompany domains.
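The read-only tool sweep reduces to a loop that never mutates state and never aborts on a single failure. A sketch with an injected transport, so the loop stays independent of the MCP wire format (tool names below are illustrative):

```python
def run_smoke(tools, call):
    """Invoke each read-only tool once via the injected `call` callable
    (tool_name -> response dict) and collect a per-tool pass/fail map.

    Exceptions are recorded as failures so one broken tool cannot hide
    the results of the rest of the sweep."""
    results = {}
    for tool in tools:
        try:
            results[tool] = bool(call(tool).get("ok", False))
        except Exception:
            results[tool] = False
    return results

fake = lambda tool: {"ok": True}
print(run_smoke(["approvals.list", "bank.list"], fake))
```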

For final finance-close verification, the preferred non-destructive operator path is:

  • create or list Period Close Checklist items
  • create or list Annual Close Checklist items
  • render structured VAT exports for XRECHNUNG, PEPPOL, or ZUGFERD
  • create SEPA collection batches in preview-safe test tenants and inspect persisted pain.008 XML payloads

Logs

Primary sources:

  • Compose container logs
  • Frappe/ERP runtime output
  • persisted Integration Event, Dispatch Job, Approval Packet, Dead Letter, and Agent Notification evidence

Artifact Governance

  • artifacts/ is governed by docs/ARTIFACT_POLICY.md
  • only canonical evidence files are intended for version control
  • high-churn run/slice replay snapshots are local-only and should be cleaned after debugging windows

Monitoring

Important operator states:

  • gateway liveness
  • integration health
  • dispatch backlog and callback completion
  • approval backlog and stale work
  • resilience/dead-letter backlog
  • repo/host revision drift

Runtime Load Telemetry (Issue #15)

Use the repo-owned script to measure Helpifyr Spindle-scoped runtime load instead of host-global noise:

python infra/scripts/runtime_load_telemetry.py --project-prefix jhf-spindle --sample-seconds 60 --output artifacts/runtime-load-telemetry.json

Recommended before/after workflow:

  1. capture baseline before a runtime change:
python infra/scripts/runtime_load_telemetry.py --project-prefix jhf-spindle --sample-seconds 60 --output artifacts/runtime-load-before.json
  2. apply the change
  3. capture an after snapshot:
python infra/scripts/runtime_load_telemetry.py --project-prefix jhf-spindle --sample-seconds 60 --output artifacts/runtime-load-after.json
  4. compare:
    • host_cpu_percent
    • docker_exec_create.project_events
    • docker_exec_create.project_per_container
    • project_health

The script is Linux-host oriented (/proc/stat + Docker CLI). If /proc/stat is unavailable, CPU is returned as null while container/event telemetry remains usable.
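The /proc/stat sampling and the null fallback can be sketched as follows; the script's actual field handling may differ:

```python
import time

def read_cpu_ticks(stat_text: str):
    """Parse the aggregate 'cpu' line of /proc/stat into (idle, total) ticks.

    Idle includes iowait (field 5) when present, matching the usual
    host-CPU-percent convention."""
    ticks = [int(x) for x in stat_text.splitlines()[0].split()[1:]]
    idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
    return idle, sum(ticks)

def cpu_percent(sample_seconds: float = 1.0):
    """Return host CPU percent over the sample window, or None when
    /proc/stat is unavailable (the documented non-Linux fallback)."""
    try:
        with open("/proc/stat") as f:
            idle1, total1 = read_cpu_ticks(f.read())
        time.sleep(sample_seconds)
        with open("/proc/stat") as f:
            idle2, total2 = read_cpu_ticks(f.read())
    except OSError:
        return None
    dt = total2 - total1
    return None if dt == 0 else 100.0 * (1 - (idle2 - idle1) / dt)
```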

Callback Transport Diagnostics

Use the repo-owned transport probe when callback delivery from n8n/jhf-wire is unclear:

python infra/scripts/check_callback_transport.py --output artifacts/callback-transport.json

For certificate-trust triage from runtime peers:

python infra/scripts/check_callback_transport.py --insecure-tls --output artifacts/callback-transport-insecure.json

Interpretation:

  • reachable=true and HTTP status returned: transport is available
  • error_class=tls_verify_failed: route exists but peer trust store rejects the certificate
  • error_class=connection_refused: ingress route/port is not accepting connections
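The error_class mapping can be sketched as exception classification; the "timeout" and "unknown" classes are assumptions beyond the two documented above:

```python
import socket
import ssl

def classify_error(exc: BaseException) -> str:
    """Map a transport exception to the probe's error_class vocabulary.

    Order matters: SSLCertVerificationError is checked before the broader
    connection classes so trust failures are never misreported."""
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "tls_verify_failed"
    if isinstance(exc, ConnectionRefusedError):
        return "connection_refused"
    if isinstance(exc, socket.timeout):
        return "timeout"  # assumed additional class
    return "unknown"      # assumed catch-all

print(classify_error(ConnectionRefusedError()))
```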

For same-host internal loops, prefer an explicit callback route via N8N_DISPATCH_CALLBACK_URL when 443 trust from peer containers is not guaranteed.

For MCP smoke/release checks, provision an active key first:

  • JHF_SPINDLE_MCP_SMOKE_API_KEY in the operator shell
  • matching active dedicated smoke MCP API Key record (hash + read-only smoke permissions) in ERP
  • keep JHF_SPINDLE_MCP_API_KEY for runtime/app traffic only (least-privileged path)

Weak-Host Profiles

Standalone (default)
  • CPU/RAM posture: baseline host for ERP+workers
  • Healthcheck posture: core DB 30s, MCP gateway 120s
  • Verify posture: fast smoke by default; full suites only on explicit runs

Standalone (low-CPU override)
  • CPU/RAM posture: constrained host with reduced headroom
  • Healthcheck posture: DB 60s, MCP gateway 180s via infra/compose/docker-compose.low-cpu.yml
  • Verify posture: keep heavy checks/manual stacks off unless needed

Integrated (planned read-first)
  • CPU/RAM posture: baseline plus external consumer polling
  • Healthcheck posture: same as standalone; no aggressive poll loops
  • Verify posture: read-first compatibility checks before any control-side expansion

Rules:

  • prefer low-frequency lightweight probes instead of heavy script healthchecks
  • MCP gateway interval is configurable via MCP_GATEWAY_HEALTHCHECK_INTERVAL (default 120s)
  • start test/verify companion stacks only for explicit verification windows, then stop them
  • if host pressure rises, switch to low-CPU compose override before broadening workload

Backup

Use:

bash infra/scripts/backup.sh

Artifacts:

  • MariaDB dump
  • sites volume archive
  • bench-generated site backup files

Restore

Use:

bash infra/scripts/restore.sh /path/to/backup-dir

Run restore only against the isolated Helpifyr Spindle stack.

Restart / Recovery

  • use normal Compose restart/recreate for isolated services
  • if repo/host drift is suspected, prefer the repo-owned alignment path over ad hoc manual edits
  • if gateway/front-end routing breaks after backend recreation, ensure frontend/gateway are recreated after backend health is restored

Operational Rules

  • do not connect Helpifyr Spindle to the existing OpenClaw or n8n internal Docker networks unless there is a deliberate future design change
  • keep shared secrets outside git
  • treat Integration Event as append-only evidence
  • do not let external systems write directly to ERPNext accounting tables
  • perform new connector rollouts against staging first when available

Runtime Dependency Notes

  • OpenClaw, n8n, and jhf-wire are integration counterparts, not local runtime replacements
  • MariaDB, Redis, and ERPNext/Frappe availability are hard runtime dependencies
  • no repo-owned metrics endpoint exists today; monitoring is evidence- and smoke-script-driven

License notice: AGPLv3 (GNU Affero General Public License v3.0)
Website: https://helpifyr.com