Collector configuration

aris-collector reads its configuration from a YAML file. Pass the file with --config on every command:

```sh
aris-collector validate-config --config /etc/aris/collector/collector.yaml
aris-collector run --config /etc/aris/collector/collector.yaml
aris-collector status --config /etc/aris/collector/collector.yaml
aris-collector diag --config /etc/aris/collector/collector.yaml
```

Package builders and fleet tooling can inspect the stable installation contract with:

```sh
aris-collector runtime-contract
```

The command prints JSON containing the collector service name, default paths, service arguments, install inputs, config precedence, and local state schema policy.

To validate on Linux without building a package, install a built collector binary with:

```sh
scripts/install-collector-local --binary core/bin/aris-collector
```

Use --destdir <root> to assemble the same layout under an image or test root without touching the live host. In --destdir mode, the script creates only the paths and their modes; package tooling must then apply the owner and group metadata reported by aris-collector runtime-contract.

Unknown YAML keys are rejected. An empty file parses, but it fails runtime validation because forwarder.core_endpoint is required.

This page focuses on operator outcomes: what each key changes for security, data capture, and performance.

Minimal secure config

```yaml
paths:
  state_dir: /var/lib/aris/collector

forwarder:
  core_endpoint: core.aris.example.com:8443
  mtls:
    cert_path: /etc/aris/collector/collector.crt
    key_path: /etc/aris/collector/collector.key
    server_ca_path: /etc/aris/collector/core-ingest-ca.crt
    server_name: core.aris.example.com

management:
  core_url: https://core.aris.example.com
  server_name: core.aris.example.com
  desired_config_hash: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
  last_known_good_config_hash: bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

buffer:
  max_disk_usage_mb: 500
  max_record_age_seconds: 604800
  drop_newest_on_full: false
  queue_at_risk_bytes_percent: 80
  queue_at_risk_oldest_seconds: 86400
```

The management and buffer blocks are optional; the two config-hash values are placeholders.

Top-level keys

| Key | Required | Default | What it does |
|---|---|---|---|
| host_id | no | Secure mode: the single DNS or URI SAN from the client certificate. Insecure mode: the OS hostname. | Stable host identity stamped onto emitted envelopes. In secure mode, an explicit value must match the client certificate's single DNS or URI SAN identity. |
| deployment | no | see below | Privilege and host-mode guardrails. |
| paths | no | derived from ~/.aris/collector | State, queue, audit, and SRE socket locations. |
| otel | no | loopback OTLP/gRPC and OTLP/HTTP listeners | Local OTLP receiver listeners and optional LAN mTLS. |
| forwarder | yes | none | Core ingest endpoint and client-side mTLS. |
| management | no | disabled; heartbeats every 300 seconds once core_url is set | HTTPS mTLS management calls such as fleet-health heartbeats and automatic renewal scheduling. |
| buffer | no | see below | Durable queue caps, age limits, and local queue-risk thresholds. |
| sources | no | process and otel on; all other sources off | Source enablement and source-specific knobs. |
| sre | no | raw introspection off, pprof off | Local status and debugging surface. |
| audit | no | 1 MiB rotation threshold | Append-only audit log settings. |
| selfmetrics | no | 5-second sampling interval | Collector process resource sampling. |
| debug | no | off | Debug toggles. |

Required keys and conditional requirements

Use this section as the operator checklist for what must be set.

| Condition | Required keys or constraint |
|---|---|
| All deployments | forwarder.core_endpoint |
| forwarder.insecure: false (default) | forwarder.mtls.cert_path, forwarder.mtls.key_path, forwarder.mtls.server_ca_path, forwarder.mtls.server_name |
| forwarder.insecure: true | forwarder.core_endpoint must be loopback (127.0.0.1, ::1, localhost, ip6-localhost) |
| deployment.allow_root: true | deployment.allow_root_acknowledged: I-UNDERSTAND-ROOT-MODE |
| otel.lan_bind: true | otel.lan_bind_mtls.enabled: true, plus otel.lan_bind_mtls.cert_path, otel.lan_bind_mtls.key_path, otel.lan_bind_mtls.client_ca_path |
| management.core_url is set | Must be an absolute https://... URL; forwarder.insecure must be false |
| management.desired_config_hash or management.last_known_good_config_hash is set | Value must be a 64-character lowercase SHA-256 hex string |
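The checklist above reads naturally as a validation function. The Go sketch below is illustrative only: the flattened Config struct and the validate and isLoopback functions are hypothetical stand-ins for the collector's real config types, and they check only the conditions listed in the table.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
	"regexp"
)

// Config is a hypothetical, flattened mirror of the keys in the table above.
type Config struct {
	CoreEndpoint           string // forwarder.core_endpoint
	Insecure               bool   // forwarder.insecure
	CertPath, KeyPath      string // forwarder.mtls.cert_path / key_path
	ServerCAPath           string // forwarder.mtls.server_ca_path
	ServerName             string // forwarder.mtls.server_name
	AllowRoot              bool   // deployment.allow_root
	AllowRootAck           string // deployment.allow_root_acknowledged
	LANBind                bool   // otel.lan_bind
	LANMTLSEnabled         bool   // otel.lan_bind_mtls.enabled
	LANCert, LANKey, LANCA string // otel.lan_bind_mtls cert/key/client CA paths
	ManagementURL          string // management.core_url
	DesiredHash, LKGHash   string // management.*_config_hash
}

var sha256Hex = regexp.MustCompile(`^[0-9a-f]{64}$`)

// isLoopback accepts only the loopback hosts named in the table.
func isLoopback(endpoint string) bool {
	host, _, err := net.SplitHostPort(endpoint)
	if err != nil {
		return false
	}
	if host == "localhost" || host == "ip6-localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && (ip.Equal(net.IPv4(127, 0, 0, 1)) || ip.Equal(net.IPv6loopback))
}

func validate(c Config) []error {
	var errs []error
	if c.CoreEndpoint == "" {
		errs = append(errs, errors.New("forwarder.core_endpoint is required"))
	}
	if c.Insecure {
		if !isLoopback(c.CoreEndpoint) {
			errs = append(errs, errors.New("forwarder.insecure requires a loopback core_endpoint"))
		}
	} else if c.CertPath == "" || c.KeyPath == "" || c.ServerCAPath == "" || c.ServerName == "" {
		errs = append(errs, errors.New("secure mode requires all forwarder.mtls keys"))
	}
	if c.AllowRoot && c.AllowRootAck != "I-UNDERSTAND-ROOT-MODE" {
		errs = append(errs, errors.New("allow_root requires allow_root_acknowledged"))
	}
	if c.LANBind && (!c.LANMTLSEnabled || c.LANCert == "" || c.LANKey == "" || c.LANCA == "") {
		errs = append(errs, errors.New("otel.lan_bind requires complete lan_bind_mtls settings"))
	}
	if c.ManagementURL != "" {
		u, err := url.Parse(c.ManagementURL)
		if err != nil || u.Scheme != "https" || u.Host == "" {
			errs = append(errs, errors.New("management.core_url must be an absolute https:// URL"))
		}
		if c.Insecure {
			errs = append(errs, errors.New("management.core_url requires forwarder.insecure: false"))
		}
	}
	for _, h := range []string{c.DesiredHash, c.LKGHash} {
		if h != "" && !sha256Hex.MatchString(h) {
			errs = append(errs, fmt.Errorf("config hash %q is not 64 lowercase hex characters", h))
		}
	}
	return errs
}

func main() {
	fmt.Println(len(validate(Config{}))) // 2
	dev := Config{CoreEndpoint: "127.0.0.1:8443", Insecure: true}
	fmt.Println(len(validate(dev))) // 0
}
```

In this sketch an empty Config fails twice: once for the missing core_endpoint and once for the missing mTLS keys, because insecure defaults to false.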

Security and exposure warnings

Treat the following as advanced settings and place them under change control in production.

| Key | Risk if enabled or relaxed | Recommendation |
|---|---|---|
| forwarder.insecure | Disables mTLS to core (transport security downgrade). | Keep false in production. Use only for local loopback development. |
| otel.lan_bind | Exposes collector ingest beyond localhost (larger network attack surface). | Keep false unless you need network OTLP clients. If enabled, require strict mTLS and network ACLs. |
| deployment.allow_root | Collector can observe all-user host activity (larger privacy/compliance blast radius). | Keep false for workstation/user-scope deployments. |
| sre.allow_raw_introspection | Raw payloads can be read from local debug endpoints. | Keep false by default. Enable only for short-lived debugging windows. |
| debug.allow_raw_introspection | Same raw-payload exposure risk as sre.allow_raw_introspection. | Keep false by default. |
| sre.pprof | Exposes additional profiling/debug endpoints. | Keep false unless actively troubleshooting. |
| deployment.windows_admin_access | Broadens who can access SRE controls on Windows. | Keep false unless explicitly required by your ops model. |

Data-collection scope warnings

These settings primarily affect what data is collected (not transport security):

| Key group | Impact |
|---|---|
| sources.transcript.* | Captures local transcript file content; can include user prompts and tool context. |
| sources.transcript_json.* | Captures rewrite-style transcript/session files. |
| sources.vscdb.* | Captures supported IDE chat/store content from local SQLite data sources. |
| sources.agent_config.* | Captures runtime config files for supported agents. |

Follow least privilege: enable only the sources that your policy and jurisdiction permit.

deployment

| Key | Default | What it does |
|---|---|---|
| allow_root | false | Allows Linux/macOS execution as UID 0. Leave off for per-user collectors. |
| allow_root_acknowledged | "" | Must equal I-UNDERSTAND-ROOT-MODE when allow_root: true. |
| windows_admin_access | false | Reserved for Windows service deployments. Ignored on Linux/macOS. |

Root mode is a deliberate opt-in because it can observe all users on the host.

paths

| Key | Default | What it does |
|---|---|---|
| state_dir | ~/.aris/collector | Collector home directory. |
| queue_dir | $state_dir/queue | Encrypted SQLite queue location. |
| audit_path | $state_dir/audit.log | Raw-introspection audit log. |
| sre_socket_path | $state_dir/run/sre.sock | Unix socket used by aris-collector status. |

The collector creates these directories if needed and rejects group/world-readable directory modes on Unix. Use 0700 or stricter.

forwarder

| Key | Required | Default | What it does |
|---|---|---|---|
| core_endpoint | yes | | host:port for core's ingest gRPC listener. |
| insecure | no | false | Disables mTLS. Only valid for loopback development endpoints. |
| mtls.cert_path | in secure mode | | Collector client certificate used to authenticate to core. |
| mtls.key_path | in secure mode | | Collector private key used to authenticate to core. |
| mtls.server_ca_path | in secure mode | | CA bundle used to verify core's ingest server certificate. |
| mtls.server_name | in secure mode | | Exact SAN expected on core's server certificate. |

Production collectors should use mTLS so data in transit is mutually authenticated and encrypted.
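The four forwarder.mtls keys map directly onto a Go crypto/tls client configuration. This is a minimal sketch under that assumption, using only standard-library calls; clientTLSConfig is a hypothetical helper, and the collector may build its transport differently.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// clientTLSConfig shows how cert_path, key_path, server_ca_path, and
// server_name could populate a tls.Config for the ingest connection.
func clientTLSConfig(certPath, keyPath, serverCAPath, serverName string) (*tls.Config, error) {
	// cert_path + key_path: the identity the collector presents to core.
	cert, err := tls.LoadX509KeyPair(certPath, keyPath)
	if err != nil {
		return nil, fmt.Errorf("load client key pair: %w", err)
	}
	// server_ca_path: the only roots trusted for core's server certificate.
	caPEM, err := os.ReadFile(serverCAPath)
	if err != nil {
		return nil, fmt.Errorf("read server CA: %w", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("server CA bundle contains no PEM certificates")
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		RootCAs:      pool,       // replaces the system roots for this connection
		ServerName:   serverName, // name core's certificate must carry
	}, nil
}

func main() {
	_, err := clientTLSConfig(
		"/etc/aris/collector/collector.crt",
		"/etc/aris/collector/collector.key",
		"/etc/aris/collector/core-ingest-ca.crt",
		"core.aris.example.com",
	)
	fmt.Println(err) // non-nil unless these files exist on this host
}
```

Setting RootCAs pins verification to the core ingest CA instead of the system trust store, and ServerName makes the handshake check core's certificate against the configured name.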

Insecure mode is for local development only:

```yaml
host_id: dev-laptop
forwarder:
  core_endpoint: 127.0.0.1:8443
  insecure: true
```

management

| Key | Required | Default | What it does |
|---|---|---|---|
| core_url | no | disabled | HTTPS management base URL used for fleet-health heartbeats and automatic renewal. |
| server_name | no | URL host | TLS SAN override for the management endpoint. |
| heartbeat_interval_seconds | no | 300 | Heartbeat cadence. |
| desired_config_hash | no | unset | Optional reporting marker for fleet tooling (64-char lowercase SHA-256). |
| last_known_good_config_hash | no | unset | Optional reporting marker for fleet tooling (64-char lowercase SHA-256). |

Management calls (heartbeats, renewal) require secure mode and reuse forwarder.mtls credentials.

buffer

| Key | Default | What it does |
|---|---|---|
| max_disk_usage_mb | 100 | Logical pending/in-flight queue byte cap in MiB; accepted range is 1 through 1,048,576. |
| max_record_age_seconds | 604800 | Drops active queued records older than seven days; accepted range is 1 through 7,776,000. |
| drop_newest_on_full | false | When false, lossless sources backpressure at the cap; when true, newest records are dropped and counted. |
| queue_at_risk_bytes_percent | 80 | Local and fleet-health risk threshold based on pending queue bytes. |
| queue_at_risk_oldest_seconds | 86400 | Local and fleet-health risk threshold based on oldest queued record age; accepted range is 1 through 2,592,000. |

By default, a full queue blocks lossless sources (backpressure) rather than silently losing records. A drop_oldest policy is not supported, because losing early session records can break downstream session reconstruction.
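The buffer policy can be modeled with a small in-memory Go queue. This sketch is hypothetical: a byte cap stands in for max_disk_usage_mb, producers block when the queue is full (backpressure), and when dropNewest is set the incoming record is discarded and counted instead. Nothing evicts older records, matching the absence of a drop_oldest policy. It assumes every record is smaller than the cap.

```go
package main

import (
	"fmt"
	"sync"
)

// boundedQueue is a hypothetical model of the documented policy; the real
// queue is an encrypted SQLite store.
type boundedQueue struct {
	mu         sync.Mutex
	notFull    *sync.Cond
	records    [][]byte
	bytes      int
	maxBytes   int  // stands in for buffer.max_disk_usage_mb
	dropNewest bool // buffer.drop_newest_on_full
	dropped    int  // records discarded in drop-newest mode
}

func newBoundedQueue(maxBytes int, dropNewest bool) *boundedQueue {
	q := &boundedQueue{maxBytes: maxBytes, dropNewest: dropNewest}
	q.notFull = sync.NewCond(&q.mu)
	return q
}

// Enqueue blocks while full, unless drop-newest mode discards the record.
func (q *boundedQueue) Enqueue(rec []byte) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	for q.bytes+len(rec) > q.maxBytes {
		if q.dropNewest {
			q.dropped++
			return false
		}
		q.notFull.Wait() // backpressure: wait for a consumer to free space
	}
	q.records = append(q.records, rec)
	q.bytes += len(rec)
	return true
}

// Dequeue removes the oldest record (FIFO) and wakes blocked producers.
func (q *boundedQueue) Dequeue() ([]byte, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.records) == 0 {
		return nil, false
	}
	rec := q.records[0]
	q.records = q.records[1:]
	q.bytes -= len(rec)
	q.notFull.Broadcast()
	return rec, true
}

func main() {
	q := newBoundedQueue(8, true)
	fmt.Println(q.Enqueue([]byte("abcd")), q.Enqueue([]byte("efgh")), q.Enqueue([]byte("ij"))) // true true false
	fmt.Println(q.dropped)                                                                    // 1
}
```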

otel

| Key | Default | What it does |
|---|---|---|
| grpc_addr | 127.0.0.1:4317 | Local OTLP/gRPC listener. |
| http_addr | 127.0.0.1:4318 | Local OTLP/HTTP listener. |
| lan_bind | false | Allows binding the OTLP receiver to non-loopback interfaces. Requires lan_bind_mtls.enabled: true. |
| lan_bind_mtls.enabled | false | Enables mTLS for LAN-bound OTLP clients. |
| lan_bind_mtls.cert_path | "" | Server certificate for the collector's OTLP receiver. |
| lan_bind_mtls.key_path | "" | Server key for the collector's OTLP receiver. |
| lan_bind_mtls.client_ca_path | "" | CA bundle for OTLP client certificates. Required when LAN mTLS is enabled. |

Keep OTLP on loopback for normal workstation deployments.

sources

```yaml
sources:
  process:
    enabled: true
  otel:
    enabled: true
  transcript:
    enabled: false
  transcript_json:
    enabled: false
  vscdb:
    enabled: false
  agent_config:
    enabled: false
```

| Key | Default | What it does |
|---|---|---|
| sources.process.enabled | true | Observes same-user AI CLI process lifecycle. |
| sources.process.match_name_additions | [] | Extra executable basenames for internal AI tools. |
| sources.process.match_exe_path_additions | [] | Extra executable path substrings. |
| sources.process.match_cmdline_additions | [] | Extra cmdline substrings (useful for interpreter-launched shims). |
| sources.otel.enabled | true | Accepts local OTLP logs, metrics, and spans on both gRPC (127.0.0.1:4317) and HTTP (127.0.0.1:4318). |
| sources.transcript.enabled | false | Enables local JSONL transcript capture. |
| sources.transcript.roots | [] | Additional transcript roots to watch (additive). |
| sources.transcript.backfill_on_start | false | Opt-in historical replay for pre-existing transcript files on first startup or after a state wipe. Leave false for live-only collection. |
| sources.transcript_json.enabled | false | Enables rewrite-style JSON transcript capture. |
| sources.transcript_json.roots | [] | Additional transcript JSON roots to watch (additive). |
| sources.vscdb.enabled | false | Enables supported IDE SQLite chat-store and extension-state capture. |
| sources.vscdb.backfill_on_start | false | Opt-in historical replay for pre-existing IDE SQLite rows on first startup or after a state wipe. Leave false for live-only collection. |
| sources.vscdb.poll_interval_seconds | 30 | Cadence between full target sweeps, in seconds. |
| sources.vscdb.targets | [] | Optional additional stores or overrides. See the target field reference below. |
| sources.agent_config.enabled | false | Enables runtime config-file capture for supported agents (Unix only). |
| sources.agent_config.cooldown_seconds | 0 (uses internal default of 5s) | Per (runtime, uid, cwd) trigger cooldown. |
| sources.agent_config.per_file_bytes | 0 (uses internal default of 1 MiB) | Per-file read cap for captured config files. |
| sources.agent_config.per_snapshot_bytes | 0 (uses internal default of 5 MiB) | Total bytes cap per snapshot run. |

Transcript data and performance impact

Transcript sources watch configured roots and collect new supported records while the collector is running. Enabling transcript sources increases local disk reads and outbound ingest volume in proportion to the amount of new transcript activity on the endpoint.

Use the queue risk settings under buffer to alert on sustained local backlog. For resource-constrained endpoints, enable only the source types needed for the deployment.

sources.vscdb

Enable sources.vscdb to capture IDE and chat surfaces that store their data in local SQLite databases.

| Setting | Required for these surfaces | Not required for |
|---|---|---|
| sources.vscdb.enabled: true | Cursor IDE chat history, supported VS Code AI extension state, and other supported IDE SQLite chat stores | OTLP telemetry (sources.otel), process lifecycle (sources.process), JSONL transcript capture (sources.transcript) |

For most deployments, sources.vscdb.enabled: true is enough; no explicit targets are needed.

```yaml
sources:
  vscdb:
    enabled: true
```

The collector auto-discovers its built-in default targets, which cover the standard Cursor IDE and VS Code user-data locations of the user running the collector.

Adding non-canonical stores

If you have a custom IDE chat store, add an explicit target. The built-in canonical targets continue to apply alongside it, and canonical stores that are absent on the host are silently skipped:

```yaml
sources:
  vscdb:
    enabled: true
    targets:
      - label: my-store
        db_path: ~/path/to/state.vscdb
        table: ItemTable
```

db_path accepts ~ and $VAR expansion; the result must resolve to an absolute path. Use db_path_glob instead of db_path for stores that live under a wildcard path (for example, per-workspace databases). Each non-preset target requires exactly one of db_path or db_path_glob.

sources.vscdb.targets[*] field reference

| Key | Required | Default | What it does |
|---|---|---|---|
| preset | no | "" | Built-in preset name. If set, do not also set label, db_path, db_path_glob, or table. |
| label | yes, for non-preset targets | | Target name shown in emitted payload metadata. |
| db_path | exactly one of db_path or db_path_glob, for non-preset targets | | Absolute SQLite path (supports ~ and $VAR). |
| db_path_glob | exactly one of db_path or db_path_glob, for non-preset targets | | Absolute glob for matching one or more SQLite files. |
| table | yes, for non-preset targets | | Table to read from. |
| key_column | no | key | Key column name. |
| value_column | no | value | Value column name. |
| key_prefixes | no | all keys | Optional list of row-key prefixes to include. Use this to narrow broad stores such as VS Code ItemTable. |
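The field rules in this table can be expressed as a small Go validator. vscdbTarget, validateTarget, and includeKey are hypothetical names; the sketch checks preset exclusivity, the label and table requirements, and the exactly-one-of db_path/db_path_glob rule, fills the documented column defaults, and treats key_prefixes as a plain string-prefix match (an assumption).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// vscdbTarget mirrors sources.vscdb.targets[*]; the struct is hypothetical.
type vscdbTarget struct {
	Preset, Label, DBPath, DBPathGlob, Table string
	KeyColumn, ValueColumn                  string
	KeyPrefixes                             []string
}

// validateTarget applies the table's rules and fills column defaults.
func validateTarget(tg *vscdbTarget) error {
	if tg.Preset != "" {
		if tg.Label != "" || tg.DBPath != "" || tg.DBPathGlob != "" || tg.Table != "" {
			return errors.New("preset targets must not set label, db_path, db_path_glob, or table")
		}
		return nil
	}
	if tg.Label == "" || tg.Table == "" {
		return errors.New("non-preset targets require label and table")
	}
	if (tg.DBPath == "") == (tg.DBPathGlob == "") {
		return errors.New("set exactly one of db_path or db_path_glob")
	}
	if tg.KeyColumn == "" {
		tg.KeyColumn = "key"
	}
	if tg.ValueColumn == "" {
		tg.ValueColumn = "value"
	}
	return nil
}

// includeKey applies key_prefixes; an empty list includes every key.
func includeKey(tg vscdbTarget, key string) bool {
	if len(tg.KeyPrefixes) == 0 {
		return true
	}
	for _, p := range tg.KeyPrefixes {
		if strings.HasPrefix(key, p) {
			return true
		}
	}
	return false
}

func main() {
	tg := vscdbTarget{Label: "my-store", DBPath: "/home/dev/state.vscdb", Table: "ItemTable"}
	err := validateTarget(&tg)
	fmt.Println(err, tg.KeyColumn, tg.ValueColumn) // <nil> key value
	both := vscdbTarget{Label: "x", DBPath: "/a", DBPathGlob: "/b/*", Table: "ItemTable"}
	fmt.Println(validateTarget(&both)) // set exactly one of db_path or db_path_glob
}
```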

Overriding a canonical preset

To override a built-in entry, define a target with the same label and your desired path/table/columns.
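One way to picture the override rule is a merge keyed on label. In this hypothetical Go sketch, a user target whose label matches a built-in entry replaces it in place and all other user targets are appended. The ordering and the built-in labels builtin-a and builtin-b are assumptions for illustration, not the collector's real preset names.

```go
package main

import "fmt"

// target is a trimmed, hypothetical view of a vscdb target.
type target struct {
	Label, DBPath, Table string
}

// mergeTargets lets a user target override the built-in with the same label.
func mergeTargets(builtin, user []target) []target {
	byLabel := make(map[string]target, len(user))
	for _, u := range user {
		byLabel[u.Label] = u
	}
	merged := make([]target, 0, len(builtin)+len(user))
	seen := make(map[string]bool)
	for _, b := range builtin {
		if u, ok := byLabel[b.Label]; ok {
			merged = append(merged, u) // override keeps the built-in's position
		} else {
			merged = append(merged, b)
		}
		seen[b.Label] = true
	}
	for _, u := range user {
		if !seen[u.Label] {
			merged = append(merged, u) // new, non-canonical store
		}
	}
	return merged
}

func main() {
	builtin := []target{{"builtin-a", "/default/a.vscdb", "ItemTable"}, {"builtin-b", "/default/b.vscdb", "ItemTable"}}
	user := []target{{"builtin-b", "/custom/b.vscdb", "ItemTable"}, {"my-store", "/custom/state.vscdb", "ItemTable"}}
	for _, t := range mergeTargets(builtin, user) {
		fmt.Println(t.Label, t.DBPath)
	}
	// builtin-a /default/a.vscdb
	// builtin-b /custom/b.vscdb
	// my-store /custom/state.vscdb
}
```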

sre, audit, selfmetrics, and debug

| Key | Default | What it does |
|---|---|---|
| sre.allow_raw_introspection | false | Allows raw payload retrieval from local debug endpoints when audit logging succeeds. |
| sre.pprof | false | Enables local pprof endpoints. |
| audit.max_bytes | 1048576 (1 MiB) | Audit log rotation threshold, in bytes. |
| selfmetrics.interval_seconds | 5 | CPU/RSS/goroutine sampling interval, in seconds. |
| debug.allow_raw_introspection | false | Alias-style raw-introspection toggle used by the supervisor. |

Most operators should leave raw introspection off. aris-collector status does not need raw payload access.

SIGHUP and restarts

In v1, SIGHUP is validation-only: the collector checks the config file's syntax and compatibility, and the running config stays unchanged.

To apply config changes, restart the collector.