Core concepts

A handful of concepts explain almost everything about how HeliosLogs stores, scopes, and serves your logs. Read this once and the rest of the documentation will make sense quickly.

Environments

An environment (env) is the top-level tenancy boundary. Think of it as a workspace: every index and every saved search belongs to exactly one environment. Switching environments in the top-nav picker changes which data you see across the whole UI.

Two environments ship with every install:

| Environment | Purpose |
| --- | --- |
| default | Your initial user data. Every install starts here, and it cannot be deleted. |
| _system | HeliosLogs's own self-observability logs. Admin-only. |

Admins create additional environments — for example dev, test, and prod, or one per team — from Admin → Environments. See Environments.

A few things are deliberately not env-scoped, because they span workspaces or follow a person rather than a workspace:

  • Monitors keep an env as their run target but are listed across all environments.
  • Agent conversations follow the user — switching environments keeps your chat history.

Setting the active environment

The active environment is a per-browser UI preference. The UI appends it to every API request as ?env=<name>. To override it for a single request, set env explicitly in that request's URL.
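
As a rough illustration of the per-request override, the sketch below sets the env query parameter on a URL, replacing any value that is already there. The host, port, and /api/search path are placeholders invented for this example, not documented HeliosLogs endpoints; only the ?env=<name> parameter comes from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_env(url: str, env: str) -> str:
    """Return url with its env query parameter set to env.

    Any existing env parameter is dropped first, so the result carries
    exactly one env value.
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "env"
    ]
    query.append(("env", env))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical base URL: the host and path are assumptions for illustration.
BASE = "http://localhost:8080/api/search"

print(with_env(BASE + "?q=error", "prod"))
# http://localhost:8080/api/search?q=error&env=prod
```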

Indexes

An index is the storage partition key within an environment. Where the environment is the tenancy boundary, the index groups related events for routing and pruning — for example app-logs, nginx, or stripe-webhooks.

  • You choose the index at ingest time with ?index=<name> (it defaults to default).
  • You filter on it at query time with index:<pattern>, which supports wildcards: index:stripe-webhooks, index:*webhooks, index:stripe-* OR index:github-*.
  • Indexes are created on first write — there is nothing to declare in advance.
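
To make the wildcard examples concrete, here is a small sketch of which index names each pattern would select. It assumes * matches any run of characters and approximates that with Python's fnmatchcase, which also gives special meaning to ? and [...]; HeliosLogs's actual matching rules may differ, and the index names are just the examples used on this page.

```python
from fnmatch import fnmatchcase


def matching_indexes(patterns, indexes):
    """Return the index names that match at least one pattern.

    Passing several patterns mimics joining index: filters with OR.
    """
    return [
        name
        for name in indexes
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


indexes = ["app-logs", "nginx", "stripe-webhooks", "github-webhooks", "stripe-events"]

print(matching_indexes(["stripe-webhooks"], indexes))
# ['stripe-webhooks']
print(matching_indexes(["*webhooks"], indexes))
# ['stripe-webhooks', 'github-webhooks']
print(matching_indexes(["stripe-*", "github-*"], indexes))
# ['stripe-webhooks', 'github-webhooks', 'stripe-events']
```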

Partitions

Internally, each (env, index, day) combination is an independent partition stored at data/<env>/<index>/<yyyy-mm-dd>/. Events are routed to the matching partition by their parsed event timestamp, not by arrival time — so backfilled and late-arriving data lands in the correct day automatically.

You rarely interact with partitions directly, but the model explains two things operators care about:

  • Time-range queries are cheap — HeliosLogs prunes whole partitions that fall outside the query window before reading any data.
  • Retention is per day-partition — expired days are dropped whole. See Indexes & retention.
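
The routing and pruning rules above can be sketched in a few lines. This is a minimal model, not HeliosLogs's implementation: it assumes day boundaries are computed in UTC (the page does not say which time zone is used) and that callers pass timezone-aware datetimes. The function names are made up for this example.

```python
from datetime import datetime, timedelta, timezone


def partition_path(env: str, index: str, event_time: datetime) -> str:
    """Map an event to its day partition using the event's own timestamp."""
    day = event_time.astimezone(timezone.utc).date()  # assumption: UTC days
    return f"data/{env}/{index}/{day.isoformat()}/"


def partitions_in_window(env: str, index: str, start: datetime, end: datetime) -> list:
    """List the day partitions that overlap [start, end].

    Any partition outside this list can be skipped without reading it.
    """
    first = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    return [
        f"data/{env}/{index}/{(first + timedelta(days=n)).isoformat()}/"
        for n in range((last - first).days + 1)
    ]


# A late-arriving event still lands in the day it happened, not the day it arrived.
event_time = datetime(2024, 3, 1, 23, 59, tzinfo=timezone.utc)
print(partition_path("prod", "nginx", event_time))
# data/prod/nginx/2024-03-01/

# A three-hour query window that crosses midnight touches two partitions.
window = partitions_in_window(
    "prod",
    "nginx",
    datetime(2024, 3, 1, 22, 0, tzinfo=timezone.utc),
    datetime(2024, 3, 2, 1, 0, tzinfo=timezone.utc),
)
print(window)
# ['data/prod/nginx/2024-03-01/', 'data/prod/nginx/2024-03-02/']
```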

Schema-on-read

HeliosLogs does not require you to declare a schema. It is schema-on-read: only a small set of universal-core fields get structural treatment, and every other JSON key is discovered at query time.

| Universal-core field | What it's for |
| --- | --- |
| timestamp | Sort key, histogram bucketing, and time-range filtering. |
| message | The human-readable log line (supports phrase queries). |
| raw | The full original event, full-text indexed so bare-term queries find anything. |
| source | An optional per-event tag set during ingestion, queried as source:value. |

Everything else — level, service, user_id, http.status_code, and any other key in your events — lands in a dynamic column and is queried as <key>:value. This means:

  • There is no schema to declare or migrate. Ship a new field and it is immediately queryable.
  • Nested objects are flattened to dotted paths at ingest: {"a":{"b":1}} becomes the column a.b, queried as a.b:1.
  • A field that carries mixed types across events (a number in one event, a string in the next) is preserved losslessly, not coerced or rejected.
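
The dotted-path flattening described above can be pictured with a short recursive function. This is only a sketch of the idea: it leaves lists and scalar values untouched and drops empty objects, and how HeliosLogs treats those cases is not specified on this page.

```python
def flatten(event: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single dict keyed by dotted paths."""
    columns = {}
    for key, value in event.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted path.
            columns.update(flatten(value, path))
        else:
            # Scalars (and, in this sketch, lists) are stored as-is.
            columns[path] = value
    return columns


print(flatten({"a": {"b": 1}}))
# {'a.b': 1}
print(flatten({"level": "error", "http": {"status_code": 500}}))
# {'level': 'error', 'http.status_code': 500}
```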

To learn which fields exist in your data, use the field panel in the UI or field discovery.

Control plane

Search data is one half of HeliosLogs; the control plane is the other. It holds everything that isn't log events:

  • Users and their RBAC allowlists
  • Environments and settings (including LLM API keys)
  • Saved searches, dashboards, monitors, and alerts
  • Agent conversations

The control plane is not a database. It is a set of small, encrypted JSON files on a compare-and-swap object store — so there is no Postgres or SQLite to run, and disaster recovery is "replicate the bucket." It lives in <data-dir>/_control for a single node, or in a _control prefix inside the shared store for a cluster. See Secrets & encryption for how it is protected.
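
To show why a compare-and-swap store is enough for this kind of state, the sketch below performs a read-modify-write on a JSON document and retries if another writer got there first. Everything in it is hypothetical: the in-memory store, the version counter, the method names, and the _control/settings.json key are stand-ins, and encryption is left out entirely. It illustrates the general technique, not HeliosLogs's storage code.

```python
import json


class InMemoryStore:
    """Toy object store with compare-and-swap writes (hypothetical)."""

    def __init__(self):
        self._objects = {}  # key -> (version, bytes)

    def get(self, key):
        # Missing keys read as version 0 with an empty JSON object.
        return self._objects.get(key, (0, b"{}"))

    def put_if_version(self, key, data, expected_version):
        """Write data only if the stored version still equals expected_version."""
        current_version, _ = self.get(key)
        if current_version != expected_version:
            return False  # someone else wrote first; caller should retry
        self._objects[key] = (current_version + 1, data)
        return True


def update_json(store, key, mutate, max_attempts=5):
    """Apply mutate(doc) to the JSON document at key using CAS with retries."""
    for _ in range(max_attempts):
        version, raw = store.get(key)
        doc = json.loads(raw)
        mutate(doc)
        if store.put_if_version(key, json.dumps(doc).encode(), version):
            return doc
    raise RuntimeError("too many concurrent writers")


store = InMemoryStore()
update_json(store, "_control/settings.json", lambda doc: doc.update(theme="dark"))
update_json(store, "_control/settings.json", lambda doc: doc.update(retention_days=30))
print(store.get("_control/settings.json"))
# (2, b'{"theme": "dark", "retention_days": 30}')
```

Because each write succeeds only against the version it read, two nodes updating the same file cannot silently overwrite each other; the loser re-reads and tries again.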

One binary

Everything above is served by a single self-contained binary. The same helios serve process exposes the HTTP API, the web UI, the MCP server, the AI agent, all ingestion endpoints, and the background workers (compaction, retention, shared-store sync, source polling, and monitor scheduling). Point several instances at one shared store and they converge into a cluster — there is no separate coordinator to deploy.

Putting it together

| You want to… | The relevant concept |
| --- | --- |
| Separate prod from staging data | Environment |
| Group an app's logs for routing/filtering | Index |
| Query an arbitrary JSON key | Schema-on-read dynamic fields |
| Control who sees what | RBAC in the control plane |

Next: see how these pieces are served at runtime in the Architecture overview, or jump straight to the Quickstart.