Architecture overview
This page describes how HeliosLogs is put together from an operator's point of view — enough to deploy, size, and reason about it. It does not cover internal code structure.
The big picture
Everything is one process. A single node needs only a local data directory. Adding a shared store turns one or more nodes into a converging cluster.
Local-primary storage
HeliosLogs always reads and writes its local --data-dir first. That path is the fast, durable, on-node copy and is never blocked by a slow shared store.
Each (env, index, day) partition is stored as a set of immutable blocks behind a tiny manifest:
data/<env>/<index>/<yyyy-mm-dd>/
  manifest/<gen>.json     # the live set of blocks; highest generation wins
  blocks/<block_id>.hb    # immutable, self-describing, written once
What an operator needs to know:
- Blocks are immutable and append-only. No day is ever "sealed" — late and backfilled data is just another block append.
- Compaction is automatic. The engine merges many small blocks into fewer large ones in the background (size-based, no manual step). The legacy /api/admin/merge and /commit endpoints are informational no-ops.
- Sizing is tunable via the block-engine knobs in Performance tuning.
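The "highest generation wins" rule for manifests can be illustrated with a short sketch. It assumes that `<gen>` is a decimal integer and that the manifest JSON lists block ids under a `blocks` field; both are assumptions made for illustration, not the documented on-disk format.

```python
import json
from pathlib import Path


def live_blocks(partition_dir: Path) -> list[str]:
    """Return the block ids named by the highest-generation manifest.

    Assumptions (hypothetical, for illustration only):
    - manifest files are named <gen>.json with an integer <gen>
    - the manifest body looks like {"blocks": ["<block_id>", ...]}
    """
    manifests = [
        p for p in (partition_dir / "manifest").glob("*.json")
        if p.stem.isdigit()  # skip anything that is not <integer>.json
    ]
    if not manifests:
        return []
    # Compare generations numerically so that 10 beats 2.
    newest = max(manifests, key=lambda p: int(p.stem))
    return json.loads(newest.read_text())["blocks"]
```

Because blocks are written once and never modified, a reader in this model only needs the newest manifest to know which block files make up the partition; older manifests can be ignored.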
Control plane
The control plane (identity, settings, saved searches, dashboards, monitors, alerts, conversations) is a set of encrypted JSON files on a compare-and-swap object store — co-located with the data, in <data-dir>/_control (single node) or a _control prefix in the shared store (cluster). There is no external database. See Secrets & encryption.
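To make "compare-and-swap object store" concrete, here is a minimal in-memory sketch of a read-modify-write loop over a JSON document. `MemoryCasStore`, `put_if_version`, and `update_json` are hypothetical names, the version counter stands in for whatever conditional-write mechanism the real store uses, and encryption is omitted entirely.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


class VersionConflict(Exception):
    """Raised when the stored version differs from the one the writer read."""


@dataclass
class MemoryCasStore:
    # key -> (version, body); version 0 means "does not exist yet"
    _objects: dict[str, tuple[int, bytes]] = field(default_factory=dict)

    def get(self, key: str) -> tuple[int, bytes]:
        return self._objects.get(key, (0, b""))

    def put_if_version(self, key: str, expected: int, body: bytes) -> int:
        current, _ = self.get(key)
        if current != expected:
            raise VersionConflict(key)
        self._objects[key] = (current + 1, body)
        return current + 1


def update_json(store: MemoryCasStore, key: str,
                mutate: Callable[[dict], None], retries: int = 5) -> dict:
    """Read, mutate, and conditionally write back; retry on conflict."""
    for _ in range(retries):
        version, raw = store.get(key)
        doc = json.loads(raw) if raw else {}
        mutate(doc)
        try:
            store.put_if_version(key, version, json.dumps(doc).encode())
            return doc
        except VersionConflict:
            continue  # another writer got in first; re-read and try again
    raise VersionConflict(key)
```

The point of the pattern is that two nodes editing the same settings file cannot silently overwrite each other: the loser of the race sees a conflict, re-reads, and reapplies its change.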
Background workers
helios serve runs several background tasks. On a single node they all just run. In a cluster, the ones that must run exactly once self-elect a leader via a best-effort lease in the control plane — there is no separate coordinator and no per-node configuration:
| Worker | What it does | Cluster behavior |
|---|---|---|
| Compactor | Merges small blocks into larger ones | Single leader elected per cluster |
| Retention sweeper | Drops day-partitions past their retention | Single leader elected per cluster |
| Shared-store sync | Uploads local blocks, pulls peers' blocks | Runs on every node |
| Source supervisor | Polls configured pull sources | Per-source lease |
| Monitor scheduler | Evaluates monitors on schedule | Per-monitor lease |
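The lease-based election in the table can be sketched as follows. This is a simplified in-memory model, not the HeliosLogs implementation: `LeaseTable`, the lease names, and the 30-second TTL are all hypothetical, and in a real cluster the lease record would be written to the control plane with a conditional (compare-and-swap) write.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    holder: str        # node identity that currently owns the lease
    expires_at: float  # time after which any node may take over


class LeaseTable:
    """Best-effort leases keyed by name, e.g. "compactor" or "monitor/<id>"."""

    def __init__(self) -> None:
        self._leases: dict[str, Lease] = {}

    def try_acquire(self, name: str, node: str, now: float,
                    ttl: float = 30.0) -> bool:
        current = self._leases.get(name)
        held_by_other = (
            current is not None
            and current.holder != node
            and current.expires_at > now
        )
        if held_by_other:
            return False
        # Free, expired, or already ours: take it or renew it.
        self._leases[name] = Lease(holder=node, expires_at=now + ttl)
        return True
```

Each node would call `try_acquire` before doing a unit of leader-only work and skip the work when it returns `False`. If the leader stops renewing, the lease expires and another node picks it up on its next attempt, which is why no separate coordinator is needed in this model.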
Multi-node and disaster recovery
With --shared-store pointed at an NFS path or s3://bucket/prefix, each node keeps writing locally and replicates asynchronously: an uploader pushes this node's blocks to the shared store, and a puller brings peers' blocks down. Cross-node visibility is eventual — bounded by the sync interval (≈10s by default) — while same-node reads are immediate.
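The upload/pull cycle described above amounts to two set differences per pass. The sketch below models block ids as strings in sets; `plan_sync` and `sync_once` are hypothetical names, and actual file transfer is replaced by set updates.

```python
def plan_sync(local: set[str], shared: set[str]) -> tuple[set[str], set[str]]:
    """Return (blocks to upload, blocks to pull) for one node."""
    return local - shared, shared - local


def sync_once(local: set[str], shared: set[str]) -> tuple[int, int]:
    """Run one sync pass in place and report (uploaded, pulled) counts."""
    to_upload, to_pull = plan_sync(local, shared)
    shared |= to_upload  # uploader: push this node's new blocks
    local |= to_pull     # puller: bring peers' blocks down
    return len(to_upload), len(to_pull)


if __name__ == "__main__":
    node_a, node_b, shared = {"a1"}, {"b1"}, set()
    sync_once(node_a, shared)  # A uploads a1; nothing to pull yet
    sync_once(node_b, shared)  # B uploads b1 and pulls a1
    print(sorted(node_a))      # A has not seen b1 yet
    sync_once(node_a, shared)  # A's next pass pulls b1
    print(sorted(node_a))
```

Because blocks are immutable, merging is a plain union with no conflict resolution. The example also shows where the eventual-visibility window comes from: node A only sees node B's block on the pass after B has uploaded it.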
Because both data and the control plane live in the shared store, disaster recovery is "replicate the bucket." Point a fresh node at the backup and it rebuilds its local cache. Full details in Multi-node & shared store.
Two things to back up beyond the bucket
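The bucket holds your data and control plane, but the control-plane encryption key (the control key) and the JWT secret live in separate files outside the data directory, so replicating the bucket does not capture them. Losing the control key means losing the control plane, because its encrypted files can no longer be decrypted. See Secrets & encryption.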
Self-observability
HeliosLogs indexes its own activity back into the reserved _system environment, so you can investigate HeliosLogs with HeliosLogs using the normal search UI:
| Index | Contents |
|---|---|
| _helioslogs | Internal tracing events (info/warn/error). |
| _helioshttp | HTTP access logs (method, path, status, latency). |
| _heliosmcp | MCP tool calls (tool, status, duration, arguments). |
Each document is stamped with the node's identity, which is useful for pinpointing behavior in a cluster. See Self-observability.
Cryptography
All cryptography routes through a single backend (aws-lc-rs): AES-256-GCM for the control plane at rest, PBKDF2-HMAC-SHA256 for passwords, HMAC-SHA256 for JWTs, and rustls for outbound TLS. A FIPS 140-3 build runs the same code against the AWS-LC validated module — see FIPS 140-3.
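For readers unfamiliar with the primitives named above, the following sketch shows PBKDF2-HMAC-SHA256 password hashing and HMAC-SHA256 (HS256) token signing using only the Python standard library. It is not the HeliosLogs code path, which goes through aws-lc-rs; the iteration count, function names, and claim contents are assumptions chosen for illustration.

```python
import base64
import hashlib
import hmac
import json


def hash_password(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte PBKDF2-HMAC-SHA256 hash (iteration count is an assumption)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)


def _b64url(data: bytes) -> str:
    # JWTs use unpadded URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _mac(secret: bytes, signing_input: str) -> str:
    digest = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return _b64url(digest)


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 token: header.payload.signature."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"},
                                separators=(",", ":")).encode())
    payload = _b64url(json.dumps(claims, separators=(",", ":")).encode())
    signing_input = f"{header}.{payload}"
    return f"{signing_input}.{_mac(secret, signing_input)}"


def verify_jwt(token: str, secret: bytes) -> bool:
    """Check the signature only; claim validation (e.g. expiry) is out of scope."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    header, payload, signature = parts
    expected = _mac(secret, f"{header}.{payload}")
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The sketch makes one operational point visible: anyone holding the JWT secret can mint valid tokens, and a token signed with one secret fails verification under any other, which is why the secret is stored apart from the data and has to be backed up separately.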
Ready to run it? Head to the Quickstart.