Multi-node & shared store
A single HeliosLogs node needs nothing but a local data directory. To run several nodes — for high availability, or to scale ingest and query across machines — point them all at one shared store. They converge automatically; there is no separate coordinator to deploy.
How it works
HeliosLogs is local-primary: every node always reads and writes its own local --data-dir first (fast, durable, never blocked by a slow store). The shared store is a replication target, kept in sync by background tasks:
- Uploader — pushes this node's own immutable blocks to the shared store, then appends them to the shared manifest with a compare-and-swap.
- Puller — brings peers' blocks (and shared-side compaction results) into this node's local manifest.
- Seeder — bootstraps a fresh shared store from a node that already holds data.
Coordination is a single compare-and-swap on each partition's manifest, so any number of nodes can write the same logical partition without locks. The control plane replicates the same way.
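The lock-free append described above can be illustrated with a minimal in-memory sketch. The `ManifestStore` class, its `read` and `compare_and_swap` methods, and the integer version counter are hypothetical stand-ins for whatever conditional-write primitive the shared store provides; none of them are HeliosLogs APIs.

```python
import threading


class ManifestStore:
    """Hypothetical stand-in for one partition's manifest in the shared store."""

    def __init__(self):
        self._guard = threading.Lock()
        self._version = 0
        self._blocks = ()

    def read(self):
        """Return the current (version, blocks) snapshot."""
        with self._guard:
            return self._version, self._blocks

    def compare_and_swap(self, expected_version, new_blocks):
        """Install new_blocks only if nobody has written since expected_version."""
        # The lock only makes this one step atomic, emulating the store's
        # conditional write; writers never hold it across read-modify-write.
        with self._guard:
            if self._version != expected_version:
                return False
            self._version += 1
            self._blocks = tuple(new_blocks)
            return True


def append_block(store, block_id):
    """Append block_id to the manifest, retrying whenever a peer wins the race."""
    while True:
        version, blocks = store.read()
        if block_id in blocks:
            return version  # already listed, so the append is idempotent
        if store.compare_and_swap(version, blocks + (block_id,)):
            return version + 1
```

A writer that loses the race re-reads the manifest and retries against the newer version, so concurrent appends from several nodes all land without any node waiting on a lock held by another.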
Eventual cross-node visibility
A write on node A becomes visible on node B after the next sync — bounded by HELIOS_BLOCK_SYNC_SECS (default 10 s). Reads on the writing node are immediate. Plan dashboards and monitors with this small lag in mind.
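One way to account for this lag in a monitor that queries a non-writing node is to evaluate only windows that ended at least one sync interval ago. The sketch below assumes HELIOS_BLOCK_SYNC_SECS holds a plain number of seconds; the helper names and the extra `margin_secs` are hypothetical, and the actual worst-case lag may depend on upload and pull timing.

```python
import os

DEFAULT_SYNC_SECS = 10.0  # default stated in this section


def sync_interval_secs(env=os.environ):
    """Read the sync interval, falling back to the documented default."""
    raw = env.get("HELIOS_BLOCK_SYNC_SECS")
    return float(raw) if raw else DEFAULT_SYNC_SECS


def settled_before(now, sync_secs, margin_secs=2.0):
    """Latest timestamp whose writes are expected to be visible on peers.

    margin_secs is a hypothetical safety buffer, not a HeliosLogs setting.
    """
    return now - sync_secs - margin_secs
```

A monitor running at time `now` would then query up to `settled_before(now, sync_interval_secs())` rather than up to `now`.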
Enabling it
Pass --shared-store to every node. The value is either a filesystem/NFS path or an S3 URL:
# Filesystem / NFS
helios serve --shared-store /mnt/helios-shared --data-dir ./data ...
# Amazon S3
AWS_REGION=us-east-1 helios serve --shared-store s3://my-bucket/helios --data-dir ./data ...
Both data partitions and the control plane live under the shared store (a _control prefix), so all nodes share users, settings, dashboards, and monitors.
S3 configuration
When the shared store is s3://bucket/prefix:
- Region is required. Set AWS_REGION (e.g. us-east-1), or add a region line to your AWS profile. HeliosLogs does not use IMDS for the region (the lookup hangs when running outside EC2), so an SSO profile that only has sso_region is not enough; set AWS_REGION explicitly. Startup fails with a clear error if no region is found.
- Credentials use the standard AWS provider chain (environment variables, shared credentials file, SSO, assumed roles, EC2/ECS instance roles, …).
- The prefix is normalized to end with /; s3://bucket (no prefix) is valid.
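The region and prefix rules above can be expressed as a small preflight check to run before starting a node. This is a sketch, not HeliosLogs code: the function names are hypothetical, and two details are assumptions: that an empty prefix stays empty rather than becoming "/", and that AWS_REGION takes precedence over the profile's region line.

```python
from urllib.parse import urlparse


def normalize_s3_store(url):
    """Split an s3:// URL into (bucket, prefix), with the prefix ending in '/'."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 shared-store URL: {url!r}")
    prefix = parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    # Assumption: s3://bucket with no prefix maps to an empty prefix.
    return parsed.netloc, prefix


def require_region(env, profile_region=None):
    """Return the region from AWS_REGION or the profile, never from IMDS."""
    region = env.get("AWS_REGION") or profile_region
    if not region:
        raise RuntimeError(
            "no AWS region found: set AWS_REGION or add a region line to the AWS profile"
        )
    return region
```

For example, `normalize_s3_store("s3://my-bucket/helios")` yields the bucket `my-bucket` and the prefix `helios/`.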
Leader-elected background work
Some jobs must run exactly once across the cluster. They self-elect a leader via a best-effort lease in the control plane — no configuration, and a lone node always wins:
| Job | Election |
|---|---|
| Compaction | One leader per cluster (lease renewed each pass; a stale lease is reclaimed). |
| Retention sweep | One leader per cluster. Non-leaders still run a local pass to clear their own pending uploads for dropped partitions. |
The sync tasks (uploader/puller/seeder) run on every node.
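The lease behaviour in the table (a lone node always wins, the leader renews each pass, a stale lease is reclaimed) can be sketched as a pure decision function. The `Lease` record, `try_acquire`, and the `ttl_secs` parameter are hypothetical; in a real cluster the resulting lease would also have to be written back to the control plane with a conditional write so that two nodes cannot both claim it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Lease:
    holder: str        # node id of the current leader
    expires_at: float  # timestamp after which the lease counts as stale


def try_acquire(
    current: Optional[Lease], node_id: str, now: float, ttl_secs: float
) -> Tuple[Lease, bool]:
    """Return (lease to keep, whether node_id is the leader for this pass)."""
    free = current is None                          # no lease yet: a lone node wins
    stale = current is not None and current.expires_at <= now
    mine = current is not None and current.holder == node_id
    if free or stale or mine:
        # Take, reclaim, or renew the lease for another ttl_secs.
        return Lease(node_id, now + ttl_secs), True
    return current, False                           # someone else holds a live lease
```

A node would call `try_acquire` at the start of each compaction or retention pass and run the cluster-wide work only when the second return value is true.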
Shared secrets are mandatory
Every node must share the same keys
The control-plane encryption key and the JWT signing secret are not stored in the shared store. Each node loads them from local files. For a cluster they must resolve to the same key material on every node, or:
- a node can't decrypt the shared control plane (encryption key mismatch), and
- tokens minted on one node are rejected on another (JWT secret mismatch).
Point HELIOS_CONTROL_KEY_PATH and HELIOS_JWT_SECRET_PATH at a shared mount or distribute identical files. Also keep HELIOS_CONTROL_ENCRYPTION consistent across nodes. See Secrets & encryption.
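A quick way to confirm that distributed secret files really are identical is to compare digests instead of the keys themselves. The helpers below are a hypothetical sketch using only the Python standard library; how the digests are collected from each node is left open.

```python
import hashlib
from pathlib import Path


def fingerprint(data):
    """SHA-256 hex digest of raw key bytes, comparable across nodes."""
    return hashlib.sha256(data).hexdigest()


def file_fingerprint(path):
    """Fingerprint a secret file exactly as stored, trailing newline included."""
    return fingerprint(Path(path).read_bytes())


def secrets_match(fingerprints_by_node):
    """True when every node reported the same digest for a given secret."""
    return len(set(fingerprints_by_node.values())) <= 1
```

Run `file_fingerprint` on each node against the files that HELIOS_CONTROL_KEY_PATH and HELIOS_JWT_SECRET_PATH point to, then compare the results per secret. Because the digest covers the raw bytes, a stray trailing newline added by an editor on one node shows up as a mismatch.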
Disaster recovery
Because all durable state other than the two secret files lives in the shared store, DR is essentially "replicate the bucket."
- Replicate the shared store (S3 cross-region replication, bucket backup, NFS snapshot/rsync).
- Keep a backup of the two secret files.
- To recover, point a fresh node at the replicated store (and the same secrets). Its local cache rebuilds from the shared manifests on first read.
A local two-node example
The repo ships scripts that demonstrate a primary + replica against a local ./shared directory — a quick way to see replication on one machine:
./start-primary.sh # node on :7300, --shared-store ./shared, data in ./data
./start-replica.sh # node on :7400, --shared-store ./shared, data in ./replica
Ingest into one and watch it appear in the other after a sync interval. (These are demo scripts; for real clusters use S3 or NFS and shared secrets as above.)