jhf-spool Operations

Documentation Map: Operations
Channel: stable
Source repo: solarisara/jhf-spool
Version: 2026-04-01
Start / Run / Deploy
- entrypoint: compose.dev.yaml
- low-cpu override: compose.lowcpu.yaml
- scripts: scripts/dev-up.sh, scripts/dev-down.sh, scripts/ops/deploy_news_memory_main_stack.sh
- docs/STACK_CONTRACT.md (canonical runtime contract)
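For a routine bring-up, the helper scripts wrap the compose entrypoint. A minimal sketch, assuming the repository root as the working directory and that dev-up.sh/dev-down.sh delegate to the documented compose files:

# Bring the maintained stack up via the helper script:
bash scripts/dev-up.sh

# Equivalent direct invocation against the documented entrypoint:
docker compose -f compose.dev.yaml up -d --build

# Tear the stack back down when finished:
bash scripts/dev-down.sh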
Shared-host n8n port defaults:
- default host port: NEWS_MEMORY_N8N_PORT=25678
- reserved/forbidden by default: 15678 (shared-host global n8n runtime)
- startup preflight in both start scripts fails early when the target n8n port is busy
- optional override for reserved ports: NEWS_MEMORY_ALLOW_RESERVED_N8N_PORTS=1
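A sketch of overriding the port at start time, assuming the start scripts read these variables from the environment; the alternate port 25680 is purely illustrative:

# Start with a non-default n8n host port when 25678 is already taken:
NEWS_MEMORY_N8N_PORT=25680 bash scripts/dev-up.sh

# Explicitly opt in to the reserved shared-host port (normally refused by preflight):
NEWS_MEMORY_ALLOW_RESERVED_N8N_PORTS=1 NEWS_MEMORY_N8N_PORT=15678 bash scripts/dev-up.sh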
Health and Readiness
- /v1/health/live
- /v1/health/ready
- /v1/health/info
- /v1/fabric/metadata
- /metrics
Additional operator surfaces:
- /v1/research/operational-slo-gates
- /v1/research/security-compliance-gates
- /v1/research/incident-readiness-gates
- /v1/research/tls-proxy-readiness
- /v1/research/secret-readiness
- /v1/research/paddle-readiness
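A quick manual probe of the core surfaces, using the same redacted public base URL as the command examples below; -k is only needed until the internal CA is trusted locally:

# Liveness and readiness should both return HTTP 200 on a healthy stack:
curl -sk -o /dev/null -w 'live:  %{http_code}\n' https://<internal-runtime-redacted>/v1/health/live
curl -sk -o /dev/null -w 'ready: %{http_code}\n' https://<internal-runtime-redacted>/v1/health/ready

# Runtime build/config readback, also used by the drift verifier below:
curl -sk https://<internal-runtime-redacted>/v1/health/info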
Current healthcheck cadence policy (maintained stack):
- postgres, minio, redis: 120s interval, 5s timeout, 3 retries, 60s start period
- api: 60s interval, 3s timeout, 3 retries, 30s start period
- n8n: 60s interval, 5s timeout, 3 retries, 45s start period
- reverse-proxy and observability: host-managed in integrated mode, outside this compose stack
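To read back the cadence actually applied to a running container, docker inspect can be used; the container name jhf-spool-api is an assumption based on the stack prefix used elsewhere in this document:

# Reported Interval/Timeout/StartPeriod values are in nanoseconds:
docker inspect --format '{{json .Config.Healthcheck}}' jhf-spool-api | python3 -m json.tool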
Monitoring
Current stack includes:
- OpenTelemetry collector
- host-managed Prometheus/Grafana in integrated deployments
- runtime and readiness helper scripts
Logs and Diagnostics
Primary diagnostic paths:
- API container logs from the maintained Compose stack
- host reverse-proxy logs when TLS or upstream routing fails
- n8n container logs when scheduled orchestration degrades
- host observability surfaces for health, readiness, and freshness signals
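A hedged sketch of the first-pass log pulls, assuming container names follow the jhf-spool- stack prefix and that the host proxy is a systemd-managed Caddy:

# API container logs from the maintained stack (container name assumed):
docker logs --since 1h jhf-spool-api

# n8n orchestration logs when scheduled automation looks stale:
docker logs --since 1h jhf-spool-n8n

# Host reverse-proxy logs are host-managed; with systemd-managed Caddy, typically:
journalctl -u caddy --since "1 hour ago"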
Useful scripts:
- scripts/ops/run_host_release_check.py
- scripts/ops/run_operational_release_checks.py
- scripts/ops/run_live_platform_journey.py
- scripts/ops/evaluate_n8n_live_readiness.py
- scripts/ops/verify_runtime_materialization_drift.py
- scripts/ops/evaluate_secret_readiness.py
- scripts/ops/evaluate_fabric_combination_consumer.py
- scripts/ops/run_lowcpu_soak_probe.py
- scripts/ops/query_gitea_actions_runs.py
Gitea Actions run API compatibility:
- in this environment /api/v1/repos/{owner}/{repo}/actions/runs can return 404
- maintained run-status collection must use scripts/ops/query_gitea_actions_runs.py
- the helper uses /actions/runs first and automatically falls back to /actions/tasks on 404
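The fallback can be reproduced manually with curl. The sketch below assumes a Gitea API token in GITEA_TOKEN and a placeholder <gitea-host>, and mirrors the helper's runs-then-tasks order:

BASE="https://<gitea-host>/api/v1/repos/solarisara/jhf-spool"
AUTH="Authorization: token $GITEA_TOKEN"
# Prefer /actions/runs; fall back to /actions/tasks when the runs endpoint returns 404:
if [ "$(curl -s -o /dev/null -w '%{http_code}' -H "$AUTH" "$BASE/actions/runs")" = "404" ]; then
  curl -s -H "$AUTH" "$BASE/actions/tasks"
else
  curl -s -H "$AUTH" "$BASE/actions/runs"
fi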
Release-check telemetry snapshot:
run_operational_release_checks.py now emits healthcheck_load in the report JSON. The block includes:
- host CPU sample (usage_percent_1s, best effort)
- exec_create sample count over a short time window
Fallback behavior:
- if CPU sampling fails, cpu_sample.available=false and the error text is preserved
- if timeout is unavailable on the host, exec_create_sample.available=false with a fallback error marker
Example:
python scripts/ops/run_host_release_check.py --output-dir reports/operational-release-checks
Runtime materialization drift verify:
python scripts/ops/verify_runtime_materialization_drift.py \
--host <internal-runtime-redacted><internal-runtime-redacted> \
--repo-path-on-host /home/administrator/jhf-spool-main \
--base-url https://<internal-runtime-redacted> \
--insecure \
--output reports/runtime-materialization/latest.json
The verifier compares:
- repo-owned runtime contract and compose truth
- active host compose materialization
- running container env/labels/mounts/networks
- app readback from /v1/health/info
It fails on missing keys, undocumented non-interpolated overrides, container/app readback mismatch, and externally visible ingress drift.
Standard Restart Order
When the maintained stack needs a bounded restart:
- verify database and storage dependencies first
- start or recover PostgreSQL, MinIO, Qdrant, and Redis
- start the API service
- verify host-managed proxy/TLS edge
- verify n8n only after the API is healthy
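A bounded restart following that order might look like the sketch below; the compose service names (postgres, minio, qdrant, redis, api, n8n) are assumptions inferred from the healthcheck cadence table above:

# 1. Bring up data dependencies first and let their healthchecks settle:
docker compose -f compose.dev.yaml up -d postgres minio qdrant redis

# 2. Start the API and confirm readiness through the public edge:
docker compose -f compose.dev.yaml up -d api
curl -sk -o /dev/null -w 'ready: %{http_code}\n' https://<internal-runtime-redacted>/v1/health/ready

# 3. Host proxy/TLS edge is host-managed; verify it separately, then start n8n last:
docker compose -f compose.dev.yaml up -d n8n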
Compose Core Stack Recovery
If host TLS is up but the product is unavailable:
- inspect whether the core jhf-spool services still exist
- recover missing core containers before debugging host proxy or TLS
- verify /v1/health/live and /v1/health/ready
- verify docs and OpenAPI surfaces
- re-run the operational release check if the outage affected orchestration or gates
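A minimal recovery check, again assuming the jhf-spool- container name prefix and the redacted public base URL:

# Confirm the core containers still exist (including exited ones):
docker ps -a --filter name=jhf-spool-

# Recover anything missing, then re-verify liveness and readiness through the edge:
docker compose -f compose.dev.yaml up -d
curl -sk -o /dev/null -w 'live:  %{http_code}\n' https://<internal-runtime-redacted>/v1/health/live
curl -sk -o /dev/null -w 'ready: %{http_code}\n' https://<internal-runtime-redacted>/v1/health/ready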
Known Failure Modes
- core stack partly absent while host proxy remains up
- TLS trust issues mistaken for service outages
- external source/provider drift
- inactive n8n workflows causing stale automation
Typical 502 Cases
- host proxy upstream points to stale backend target
- upstream API container stopped or absent
- DNS lookup mismatch between host proxy config and active backend target
Typical TLS Cases
- local trust failure against the internal Caddy CA
- HTTP/HTTPS mismatch during manual checks
- wrong public base URL or reverse-proxy configuration
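To separate a local trust failure from a real outage, compare a verifying and a non-verifying request; the CA bundle path below is a placeholder, not the actual location of the internal Caddy root:

# If the verifying request fails with a certificate error but the -k variant returns 200,
# the problem is local trust, not the service:
curl -sv -o /dev/null https://<internal-runtime-redacted>/v1/health/live
curl -skv -o /dev/null -w '%{http_code}\n' https://<internal-runtime-redacted>/v1/health/live

# Verifying explicitly against the exported internal Caddy root CA (path illustrative):
curl -s --cacert /path/to/internal-caddy-root.crt https://<internal-runtime-redacted>/v1/health/live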
Typical n8n Cases
- workflows deployed but inactive
- stale NEWS_MEMORY_API_BASE_URL
- invalid or missing shared API key
- workflow host path still pointing at an old domain/base URL
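Two quick checks for those causes, assuming the n8n container name follows the jhf-spool- prefix and the default host port documented above:

# Confirm the base URL the n8n container actually sees:
docker exec jhf-spool-n8n printenv NEWS_MEMORY_API_BASE_URL

# Confirm n8n itself answers on its published host port (default 25678):
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:25678/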
Runtime Dependencies
Hard:
- PostgreSQL
- MinIO
- Qdrant
- Redis
Optional:
- n8n
- Paddle
- NewsAPI
- external source providers
Weak Host Mode
For low-resource hosts, run the maintained stack with:
docker compose -f compose.dev.yaml -f compose.lowcpu.yaml up -d --build
This keeps the same healthcheck surfaces but stretches intervals to reduce healthcheck exec load.
For lightweight release telemetry sampling on weak hosts:
- keep telemetry windows short (30s default)
- avoid full monitoring suites for routine verification
- use release report snapshots as regression evidence between rollouts
For 24h low-cpu soak evidence collection:
python scripts/ops/run_lowcpu_soak_probe.py \
--host <internal-runtime-redacted><internal-runtime-redacted> \
--stack-prefix jhf-spool- \
--samples 24 \
--sample-interval-seconds 3600 \
--telemetry-window-seconds 30
Host-target note:
- when running the collector on the same host as the stack, use --host <internal-runtime-redacted>
- user-prefixed self-targets like <internal-runtime-redacted><internal-runtime-redacted> are normalized to local execution by the collector to avoid SSH self-auth failures
Artifacts are written to reports/healthcheck-soak/ as:
- per-sample JSONL telemetry stream
- summarized metrics JSON
Credential Drift Verify (Spool)
For jhf-spool auth-resilience checks (valid key, invalid key, rotated key) against /v1/search/semantic:
python scripts/ops/verify_spool_auth_rotation.py \
--base-url https://<internal-runtime-redacted> \
--valid-key "$VALID_SPOOL_KEY" \
--invalid-key "$INTENTIONALLY_INVALID_KEY" \
--rotated-key "$ROTATED_SPOOL_KEY" \
--insecure \
--strict \
--output reports/auth-rotation/latest.json
The output is machine-readable and separates:
- auth drift (401 on invalid key while the valid key path is healthy)
- rotation recovery (rotated key succeeds again)
- potential platform/network outage (valid path not healthy)
Canonical consumer contract:
docs/CREDENTIAL_ROTATION_CONTRACT.md (spool-auth-rotation-v1)
License: AGPLv3
Learn more: https://helpifyr.com