Architecture overview

This page describes how HeliosLogs is put together from an operator's point of view — enough to deploy, size, and reason about it. It does not cover internal code structure.

The big picture

Everything is one process. A single node needs only a local data directory. Adding a shared store turns one or more nodes into a converging cluster.

Local-primary storage

HeliosLogs always reads and writes its local --data-dir first. That path is the fast, durable, on-node copy and is never blocked by a slow shared store.

Each (env, index, day) partition is stored as a set of immutable blocks behind a tiny manifest:

data/<env>/<index>/<yyyy-mm-dd>/
  manifest/<gen>.json      # the live set of blocks; highest generation wins
  blocks/<block_id>.hb     # immutable, self-describing, written once
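"Highest generation wins" means a reader finds the live block set by picking the manifest file with the largest generation number. The sketch below assumes `<gen>` is a plain integer and that a manifest is JSON with a `blocks` list. Both the JSON shape and the `live_manifest` helper are hypothetical and not the engine's real format.

```python
import json
from pathlib import Path


def live_manifest(partition_dir: Path) -> dict:
    """Return the manifest with the highest generation for one (env, index, day) partition.

    Hypothetical: assumes files are named manifest/<gen>.json with an integer <gen>
    and that each manifest is JSON shaped like {"blocks": [...]}.
    """
    manifest_dir = partition_dir / "manifest"
    # Compare generations numerically: lexically "10" would sort before "9".
    generations = [int(p.stem) for p in manifest_dir.glob("*.json") if p.stem.isdigit()]
    if not generations:
        return {"blocks": []}  # no manifest yet: treat the partition as empty
    latest = manifest_dir / f"{max(generations)}.json"
    return json.loads(latest.read_text())
```

Blocks are written once, so a new generation only has to add a small manifest file. Older manifests can linger harmlessly because readers ignore them.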

What an operator needs to know:

- Blocks are immutable and append-only. No day is ever "sealed": late and backfilled data is just another block append.
- Compaction is automatic. The engine merges many small blocks into fewer large ones in the background (size-based, no manual step). The legacy /api/admin/merge and /commit endpoints are informational no-ops.
- Sizing is tunable via the block-engine knobs in Performance tuning.
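Because blocks never change, both an append and a compaction reduce to publishing a new manifest generation. The sketch below models that with an in-memory `Manifest`. The merge rule ("merge every block under `small_limit` into one") is a hypothetical simplification of the real size-based policy, and all names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    generation: int
    blocks: tuple[tuple[str, int], ...]  # (block_id, size_bytes); hypothetical shape


def append_block(m: Manifest, block_id: str, size: int) -> Manifest:
    # Late or backfilled data: one more immutable block in the next generation.
    return Manifest(m.generation + 1, m.blocks + ((block_id, size),))


def compact(m: Manifest, small_limit: int, new_block_id: str) -> Manifest:
    # Hypothetical policy: merge all blocks smaller than small_limit into one new block.
    small = [b for b in m.blocks if b[1] < small_limit]
    if len(small) < 2:
        return m  # nothing worth merging; keep the current generation
    large = [b for b in m.blocks if b[1] >= small_limit]
    merged = (new_block_id, sum(size for _, size in small))
    # The old blocks are not modified; the new generation simply stops listing them.
    return Manifest(m.generation + 1, tuple(large) + (merged,))
```

Readers holding the previous generation keep a consistent view, since the blocks it lists still exist until they are cleaned up.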

Control plane

The control plane (identity, settings, saved searches, dashboards, monitors, alerts, conversations) is a set of encrypted JSON files on a compare-and-swap object store — co-located with the data, in <data-dir>/_control (single node) or a _control prefix in the shared store (cluster). There is no external database. See Secrets & encryption.
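A compare-and-swap store lets several nodes edit the same control-plane file without a database: each writer states the version it read, and the write fails if someone else got there first. The minimal sketch below uses an in-memory store with a hypothetical `put_if_version` call. It leaves out the AES-256-GCM encryption layer, and the key `_control/settings.json` is illustrative only.

```python
import json
import threading


class CasConflict(Exception):
    """Raised when the stored version no longer matches what the writer read."""


class InMemoryCasStore:
    """Toy stand-in for a compare-and-swap object store (hypothetical API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._objects: dict[str, tuple[int, bytes]] = {}

    def get(self, key):
        with self._lock:
            return self._objects.get(key, (0, b"{}"))  # missing key reads as version 0

    def put_if_version(self, key, expected_version, data):
        with self._lock:
            current, _ = self._objects.get(key, (0, b"{}"))
            if current != expected_version:
                raise CasConflict(key)
            self._objects[key] = (current + 1, data)
            return current + 1


def update_json(store, key, mutate, max_attempts=5):
    """Read-modify-write loop that retries when another writer won the race."""
    for _ in range(max_attempts):
        version, raw = store.get(key)
        doc = json.loads(raw)
        mutate(doc)
        try:
            return store.put_if_version(key, version, json.dumps(doc).encode())
        except CasConflict:
            continue  # re-read the newer version and apply the change again
    raise CasConflict(f"gave up on {key} after {max_attempts} attempts")
```

On S3 or NFS the same idea would map onto whatever conditional-write primitive the backend offers. That mapping is not described on this page.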

Background workers

helioslogs serve runs several background tasks. On a single node they all simply run. In a cluster, work that should run on only one node at a time is coordinated through best-effort leases in the control plane: cluster-wide workers self-elect a single leader, and per-source or per-monitor work is leased item by item. There is no separate coordinator and no per-node configuration:

Worker	What it does	Cluster behavior
Compactor	Merges small blocks into larger ones	Single leader elected per cluster
Retention sweeper	Drops day-partitions past their retention	Single leader elected per cluster
Shared-store sync	Uploads local blocks, pulls peers' blocks	Runs on every node
Source supervisor	Polls configured pull sources	Per-source lease
Monitor scheduler	Evaluates monitors on schedule	Per-monitor lease
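Each of the leased rows above can be pictured as a named lease with a holder and an expiry: a node may take a lease that is free, expired, or already its own. The sketch below keeps leases in memory and injects a clock so it can be tested. In HeliosLogs the lease lives in the control plane, so the check-and-set would go through the compare-and-swap store; the `LeaseTable` class, the TTL, and the lease names are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class Lease:
    holder: str
    expires_at: float


class LeaseTable:
    """Hypothetical best-effort leases keyed by name, e.g. "compactor" or "monitor:<id>"."""

    def __init__(self, ttl_s=30.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._leases: dict[str, Lease] = {}

    def try_acquire(self, name, node_id):
        """Take or renew the lease; return True if node_id now holds it."""
        now = self.clock()
        lease = self._leases.get(name)
        if lease is None or lease.expires_at <= now or lease.holder == node_id:
            # Free, expired, or already ours: (re)claim it for another TTL.
            self._leases[name] = Lease(node_id, now + self.ttl_s)
            return True
        return False
```

Time-based leases are best-effort by nature: a holder that stalls past its TTL can briefly overlap with its successor, so leased work is easiest to reason about when repeating it is harmless.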

Multi-node and disaster recovery

With --shared-store pointed at an NFS path or s3://bucket/prefix, each node keeps writing locally and replicates asynchronously: an uploader pushes this node's blocks to the shared store, and a puller brings peers' blocks down. Cross-node visibility is eventual — bounded by the sync interval (≈10s by default) — while same-node reads are immediate.

Because both data and the control plane live in the shared store, disaster recovery is "replicate the bucket." Point a fresh node at the backup and it rebuilds its local cache. Full details in Multi-node & shared store.
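One sync pass, as described above, is two set differences: upload local blocks the shared store lacks, then pull shared blocks this node lacks. The sketch below models block sets as Python sets and the shared store as a dict. It ignores manifests, the control plane, and the real transfer mechanics, and `sync_round` is a hypothetical name.

```python
def sync_round(node_id, local, shared):
    """One hypothetical sync pass between a node's local blocks and the shared store.

    local:  set of block ids present in this node's --data-dir
    shared: dict mapping block id -> id of the node that uploaded it
    Returns (uploaded, pulled) as sorted lists of block ids.
    """
    uploaded = sorted(b for b in local if b not in shared)
    for b in uploaded:
        shared[b] = node_id  # uploader: push this node's new blocks
    pulled = sorted(b for b in shared if b not in local)
    local.update(pulled)  # puller: bring peers' blocks down
    return uploaded, pulled
```

A block written on one node becomes visible elsewhere only after that node uploads it and the peer runs its next pull, which is why cross-node visibility is bounded by the sync interval. A fresh node starts with an empty local set, so its first pass pulls everything.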

Two things to back up beyond the bucket

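The bucket holds your data and control plane, but the control-plane encryption key (the control key) and the JWT secret live in separate files outside the data directory, so back them up separately. Losing the control key means losing the control plane, because its encrypted files can no longer be read. See Secrets & encryption.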

Self-observability

HeliosLogs indexes its own activity back into the reserved _system environment, so you can investigate HeliosLogs with HeliosLogs using the normal search UI:

Index	Contents
`_helioslogs`	Internal tracing events (info/warn/error).
`_helioshttp`	HTTP access logs (method, path, status, latency).
`_heliosmcp`	MCP tool calls (tool, status, duration, arguments).

Each document is stamped with the node's identity, which is useful for pinpointing behavior in a cluster. See Self-observability.
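The routing described above amounts to choosing a reserved index by event kind and adding the node identity to every document. In the sketch below the index names come from the table; the event kinds and the `timestamp` and `node` field names are hypothetical.

```python
import time

SYSTEM_ENV = "_system"
# Index names from the table above; the event-kind keys are hypothetical.
INDEX_BY_KIND = {
    "trace": "_helioslogs",
    "http": "_helioshttp",
    "mcp": "_heliosmcp",
}


def self_event(kind, node_id, fields, now=time.time):
    """Route one internal event to its reserved index and stamp the node identity."""
    index = INDEX_BY_KIND[kind]
    doc = {"timestamp": now(), "node": node_id, **fields}
    return SYSTEM_ENV, index, doc
```

Filtering any of these indexes on the node field then isolates one node's behavior in a cluster.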

Cryptography

All cryptography routes through a single backend (aws-lc-rs): AES-256-GCM for the control plane at rest, PBKDF2-HMAC-SHA256 for passwords, HMAC-SHA256 for JWTs, and rustls for outbound TLS. A FIPS 140-3 build runs the same code against the AWS-LC validated module — see FIPS 140-3.
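To make two of these primitives concrete, the sketch below uses Python's standard library (not aws-lc-rs) to hash a password with PBKDF2-HMAC-SHA256 and to sign and verify an HS256 JWT. The iteration count is an assumed work factor, not a HeliosLogs setting. The JWT check covers only the signature (no `exp` or other claim validation), and AES-256-GCM is omitted because the standard library does not provide it.

```python
import base64
import hashlib
import hmac
import json
import os

PBKDF2_ITERATIONS = 600_000  # assumed work factor; HeliosLogs' value is not stated here


def hash_password(password, salt=None):
    """PBKDF2-HMAC-SHA256 with a random 16-byte salt unless one is supplied."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)  # constant-time comparison


def _b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def sign_jwt_hs256(claims, secret):
    """Build header.payload.signature with an HMAC-SHA256 signature (HS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, claims)
    )
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_jwt_hs256(token, secret):
    """Check the signature only; claim validation (exp, aud, ...) is left out."""
    signing_input, _, sig = token.rpartition(".")
    expected = _b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)
```

The HS256 check also shows why the JWT secret needs its own backup: without the original secret, previously issued tokens can no longer be verified.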

Ready to run it? Head to the Quickstart.
