Collector configuration

`aris-collector` reads its configuration from a YAML file. Pass the path with `--config` to each command that uses the configuration:

aris-collector validate-config --config /etc/aris/collector/collector.yaml
aris-collector run --config /etc/aris/collector/collector.yaml
aris-collector status --config /etc/aris/collector/collector.yaml
aris-collector diag --config /etc/aris/collector/collector.yaml

Package builders and fleet tooling can inspect the stable installation contract with:

aris-collector runtime-contract

The command prints JSON containing the collector service name, default paths, service arguments, install inputs, config precedence, and local state schema policy.

For package-free Linux validation, a built collector binary can be installed with:

scripts/install-collector-local --binary core/bin/aris-collector

Use `--destdir <root>` to assemble the same layout under an image or test root without touching the live host. In `--destdir` mode the script creates paths and modes only; package tooling must apply the owner/group metadata reported by `aris-collector runtime-contract`.

Unknown YAML keys are rejected. An empty file loads, but it fails runtime validation because `forwarder.core_endpoint` is required.

This page focuses on operator outcomes: what each key changes for security, data capture, and performance.

Minimal secure config

paths:
  state_dir: /var/lib/aris/collector

forwarder:
  core_endpoint: core.aris.example.com:8443
  mtls:
    cert_path: /etc/aris/collector/collector.crt
    key_path: /etc/aris/collector/collector.key
    server_ca_path: /etc/aris/collector/core-ingest-ca.crt
    server_name: core.aris.example.com

management:
  core_url: https://core.aris.example.com
  server_name: core.aris.example.com
  desired_config_hash: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
  last_known_good_config_hash: bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

buffer:
  max_disk_usage_mb: 500
  max_record_age_seconds: 604800
  drop_newest_on_full: false
  queue_at_risk_bytes_percent: 80
  queue_at_risk_oldest_seconds: 86400

Top-level keys

Key	Required	Default	What it does
`host_id`	no	Secure mode: single DNS or URI SAN from the client certificate. Insecure mode: OS hostname.	Stable host identity stamped onto emitted envelopes. In secure mode, an explicit value must match the client certificate's single DNS or URI SAN identity.
`deployment`	no	see below	Privilege and host-mode guardrails.
`paths`	no	paths derived from `~/.aris/collector`	State, queue, audit, and SRE socket locations.
`otel`	no	loopback OTLP/gRPC	Local OTLP receiver listener and optional LAN mTLS.
`forwarder`	yes	none	Core ingest endpoint and client-side mTLS.
`management`	no	heartbeat every 300 seconds when `core_url` is set	HTTPS mTLS management calls such as fleet-health heartbeats and automatic renewal scheduling.
`buffer`	no	see below	Durable queue caps, age limits, and local queue-risk thresholds.
`sources`	no	process and OTel on; transcript, transcript_json, vscdb, and agent_config off	Source enablement and source-specific knobs.
`sre`	no	raw introspection off, pprof off	Local status and debugging surface.
`audit`	no	1 MiB rotation threshold	Append-only audit log settings.
`selfmetrics`	no	5 seconds	Collector process resource sampling.
`debug`	no	off	Debug toggles.

Required keys and conditional requirements

Use this section as the operator checklist for what must be set.

Condition	Required keys
All deployments	`forwarder.core_endpoint`
`forwarder.insecure: false` (default)	`forwarder.mtls.cert_path`, `forwarder.mtls.key_path`, `forwarder.mtls.server_ca_path`, `forwarder.mtls.server_name`
`forwarder.insecure: true`	`forwarder.core_endpoint` must be loopback (`127.0.0.1`, `::1`, `localhost`, `ip6-localhost`)
`deployment.allow_root: true`	`deployment.allow_root_acknowledged: I-UNDERSTAND-ROOT-MODE`
`otel.lan_bind: true`	`otel.lan_bind_mtls.enabled: true`, plus `otel.lan_bind_mtls.cert_path`, `otel.lan_bind_mtls.key_path`, `otel.lan_bind_mtls.client_ca_path`
`management.core_url` is set	Must be absolute `https://...`; `forwarder.insecure` must be `false`
`management.desired_config_hash` or `management.last_known_good_config_hash` is set	Value must be a 64-character lowercase SHA-256 hex string

Security and exposure warnings

Treat the following as advanced settings and keep them under change control in production.

Key	Risk if enabled or relaxed	Recommendation
`forwarder.insecure`	Disables mTLS to core (transport security downgrade).	Keep `false` in production. Use only for local loopback development.
`otel.lan_bind`	Exposes collector ingest beyond localhost (larger network attack surface).	Keep `false` unless you need network OTLP clients. If enabled, require strict mTLS and network ACLs.
`deployment.allow_root`	Collector can observe all-user host activity (larger privacy/compliance blast radius).	Keep `false` for workstation/user-scope deployments.
`sre.allow_raw_introspection`	Raw payloads can be read from local debug endpoints.	Keep `false` by default. Enable only for short-lived debugging windows.
`debug.allow_raw_introspection`	Same raw-payload exposure risk as `sre.allow_raw_introspection`.	Keep `false` by default.
`sre.pprof`	Exposes additional profiling/debug endpoints.	Keep `false` unless actively troubleshooting.
`deployment.windows_admin_access`	Broadens who can access SRE controls on Windows.	Keep `false` unless explicitly required by your ops model.

Data-collection scope warnings

These settings primarily affect what data is collected (not transport security):

Key group	Impact
`sources.transcript.*`	Captures local transcript file content; can include user prompts and tool context.
`sources.transcript_json.*`	Captures rewrite-style transcript/session files.
`sources.vscdb.*`	Captures supported IDE chat/store content from local SQLite data sources.
`sources.agent_config.*`	Captures runtime config files for supported agents.

Apply least privilege: enable only the sources that your policy and jurisdiction permit.

`deployment`

Key	Default	What it does
`allow_root`	`false`	Allows Linux/macOS execution as UID 0. Leave off for per-user collectors.
`allow_root_acknowledged`	`""`	Must equal `I-UNDERSTAND-ROOT-MODE` when `allow_root: true`.
`windows_admin_access`	`false`	Reserved for Windows service deployments. Ignored on Linux/macOS.

Root mode is a deliberate opt-in because it can observe all users on the host.
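
As an illustration of the opt-in described above, a root-mode fragment might look like the following sketch. It uses only the keys from the table. It is a fragment, so a real file still needs `forwarder.core_endpoint` and the other required keys.

```yaml
# Fragment only: merge into a complete collector.yaml.
deployment:
  # Opt in to running as UID 0 on Linux/macOS.
  allow_root: true
  # Acknowledgement string that must accompany allow_root: true.
  allow_root_acknowledged: I-UNDERSTAND-ROOT-MODE
```

Running `aris-collector validate-config --config <path>` before a restart is a reasonable way to confirm that the pair is accepted.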

`paths`

Key	Default	What it does
`state_dir`	`~/.aris/collector`	Collector home directory.
`queue_dir`	`$state_dir/queue`	Encrypted SQLite queue location.
`audit_path`	`$state_dir/audit.log`	Raw-introspection audit log.
`sre_socket_path`	`$state_dir/run/sre.sock`	Unix socket used by `aris-collector status`.

The collector creates these directories if needed and rejects group/world-readable directory modes on Unix. Use 0700 or stricter.
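
The derived defaults in the table can also be written out explicitly. The sketch below assumes the `state_dir` from the minimal secure config and spells out what each `$state_dir`-relative default would resolve to; setting the three derived keys is optional.

```yaml
# Fragment only: merge into a complete collector.yaml.
paths:
  state_dir: /var/lib/aris/collector
  # The next three lines repeat the documented defaults relative to state_dir.
  queue_dir: /var/lib/aris/collector/queue
  audit_path: /var/lib/aris/collector/audit.log
  sre_socket_path: /var/lib/aris/collector/run/sre.sock
```

If you pre-create these directories, give them mode 0700 or stricter so that the Unix mode check passes.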

`forwarder`

Key	Required	Default	What it does
`core_endpoint`	yes	—	`host:port` for core's ingest gRPC listener.
`insecure`	no	`false`	Disables mTLS. Only valid for loopback development endpoints.
`mtls.cert_path`	secure mode	—	Collector client certificate used to authenticate to core.
`mtls.key_path`	secure mode	—	Collector private key used to authenticate to core.
`mtls.server_ca_path`	secure mode	—	CA bundle used to verify core's ingest server certificate.
`mtls.server_name`	secure mode	—	Exact SAN expected on core's server certificate.

Production collectors should use mTLS so data in transit is mutually authenticated and encrypted.

Insecure mode is for local development only:

host_id: dev-laptop
forwarder:
  core_endpoint: 127.0.0.1:8443
  insecure: true

`management`

Key	Required	Default	What it does
`core_url`	no	disabled	HTTPS management base URL used for fleet-health heartbeats and automatic renewal.
`server_name`	no	URL host	TLS SAN override for the management endpoint.
`heartbeat_interval_seconds`	no	`300`	Heartbeat cadence.
`desired_config_hash`	no	unset	Optional reporting marker for fleet tooling (64-char lowercase SHA-256).
`last_known_good_config_hash`	no	unset	Optional reporting marker for fleet tooling (64-char lowercase SHA-256).

Management calls (heartbeats, renewal) require secure mode and reuse the `forwarder.mtls` credentials.

`buffer`

Key	Default	What it does
`max_disk_usage_mb`	`100`	Logical pending/in-flight queue byte cap in MiB; accepted range is 1 through 1,048,576.
`max_record_age_seconds`	`604800`	Drops active queued records older than this age (default: seven days); accepted range is 1 through 7,776,000 (90 days).
`drop_newest_on_full`	`false`	When false, lossless sources backpressure at the cap; when true, newest records are dropped and counted.
`queue_at_risk_bytes_percent`	`80`	Local and fleet-health risk threshold based on pending queue bytes.
`queue_at_risk_oldest_seconds`	`86400`	Local and fleet-health risk threshold based on oldest queued record age; accepted range is 1 through 2,592,000.

The default policy is to block and apply backpressure rather than lose data silently. `drop_oldest` is not supported because losing early session records can break downstream reconstruction.
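
To make the units concrete, here is a hedged sketch of a `buffer` block for a disk-constrained endpoint. The values are illustrative rather than recommendations, and the comment on `queue_at_risk_bytes_percent` assumes the percentage is measured against `max_disk_usage_mb`, which this page does not state explicitly.

```yaml
# Fragment only: merge into a complete collector.yaml.
buffer:
  # 200 MiB cap on pending/in-flight queue bytes (range 1 through 1,048,576).
  max_disk_usage_mb: 200
  # 3 days = 3 * 86,400 s = 259,200 s (range 1 through 7,776,000).
  max_record_age_seconds: 259200
  # Keep the lossless default: backpressure at the cap instead of dropping.
  drop_newest_on_full: false
  # Assumption: 75% of the 200 MiB cap, so risk is flagged near 150 MiB.
  queue_at_risk_bytes_percent: 75
  # 12 hours = 12 * 3,600 s = 43,200 s (range 1 through 2,592,000).
  queue_at_risk_oldest_seconds: 43200
```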

`otel`

Key	Default	What it does
`grpc_addr`	`127.0.0.1:4317`	Local OTLP/gRPC listener.
`http_addr`	`127.0.0.1:4318`	Local OTLP/HTTP listener.
`lan_bind`	`false`	Allows binding the OTLP receiver to non-loopback interfaces. Requires `lan_bind_mtls.enabled: true`.
`lan_bind_mtls.enabled`	`false`	Enables mTLS for LAN-bound OTLP clients.
`lan_bind_mtls.cert_path`	`""`	Server certificate for the collector's OTLP receiver.
`lan_bind_mtls.key_path`	`""`	Server key for the collector's OTLP receiver.
`lan_bind_mtls.client_ca_path`	`""`	CA bundle for OTLP client certificates. Required when LAN mTLS is enabled.

Keep OTLP on loopback for normal workstation deployments.
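
Where network OTLP clients are genuinely required, the conditional requirements for `otel.lan_bind` might be satisfied as in the sketch below. The bind addresses and certificate paths are hypothetical, and whether both listeners need a non-loopback address is an assumption to confirm with `aris-collector validate-config`.

```yaml
# Fragment only: merge into a complete collector.yaml.
otel:
  # Assumption: a wildcard bind address is accepted once lan_bind is true.
  grpc_addr: 0.0.0.0:4317
  http_addr: 0.0.0.0:4318
  lan_bind: true
  lan_bind_mtls:
    # lan_bind: true requires mTLS to be enabled with all three paths set.
    enabled: true
    cert_path: /etc/aris/collector/otlp-server.crt      # hypothetical path
    key_path: /etc/aris/collector/otlp-server.key       # hypothetical path
    client_ca_path: /etc/aris/collector/otlp-client-ca.crt  # hypothetical path
```

Pair a configuration like this with network ACLs, as recommended in the security and exposure warnings.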

`sources`

sources:
  process:
    enabled: true
  otel:
    enabled: true
  transcript:
    enabled: false
  transcript_json:
    enabled: false
  vscdb:
    enabled: false
  agent_config:
    enabled: false

Key	Default	What it does
`sources.process.enabled`	`true`	Observes same-user AI CLI process lifecycle.
`sources.process.match_name_additions`	`[]`	Extra executable basenames for internal AI tools.
`sources.process.match_exe_path_additions`	`[]`	Extra executable path substrings.
`sources.process.match_cmdline_additions`	`[]`	Extra cmdline substrings (useful for interpreter-launched shims).
`sources.otel.enabled`	`true`	Accepts local OTLP logs, metrics, and spans on both gRPC (`127.0.0.1:4317`) and HTTP (`127.0.0.1:4318`).
`sources.transcript.enabled`	`false`	Enables local JSONL transcript capture.
`sources.transcript.roots`	`[]`	Additional transcript roots to watch (additive).
`sources.transcript.backfill_on_start`	`false`	Opt-in historical replay for pre-existing transcript files on first startup or after a state wipe. Leave false for live-only collection.
`sources.transcript_json.enabled`	`false`	Enables rewrite-style JSON transcript capture.
`sources.transcript_json.roots`	`[]`	Additional transcript JSON roots to watch (additive).
`sources.vscdb.enabled`	`false`	Enables supported IDE SQLite chat-store and extension-state capture.
`sources.vscdb.backfill_on_start`	`false`	Opt-in historical replay for pre-existing IDE SQLite rows on first startup or after a state wipe. Leave false for live-only collection.
`sources.vscdb.poll_interval_seconds`	`30`	Cadence between full target sweeps.
`sources.vscdb.targets`	`[]`	Optional additional stores or overrides. See target fields below.
`sources.agent_config.enabled`	`false`	Enables runtime config-file capture for supported agents (Unix only).
`sources.agent_config.cooldown_seconds`	`0` (uses internal default of 5s)	Per `(runtime, uid, cwd)` trigger cooldown.
`sources.agent_config.per_file_bytes`	`0` (uses internal default of 1 MiB)	Per-file read cap for captured config files.
`sources.agent_config.per_snapshot_bytes`	`0` (uses internal default of 5 MiB)	Total bytes cap per snapshot run.
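
As a sketch of the `sources.process.match_*_additions` rows above, the fragment below extends process matching for an internal tool. The tool name, path, and script name are hypothetical placeholders.

```yaml
# Fragment only: merge into a complete collector.yaml.
sources:
  process:
    enabled: true
    # Extra executable basenames (hypothetical internal tool).
    match_name_additions:
      - acme-ai
    # Extra executable path substrings (hypothetical install prefix).
    match_exe_path_additions:
      - /opt/acme/ai/
    # Extra cmdline substrings, useful when an interpreter launches the tool.
    match_cmdline_additions:
      - acme_ai_cli.py
```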

Transcript data and performance impact

Transcript sources watch configured roots and collect new supported records while the collector is running. Enabling transcript sources increases local disk reads and outbound ingest volume in proportion to the amount of new transcript activity on the endpoint.

Use the queue risk settings under `buffer` to alert on sustained local backlog. For resource-constrained endpoints, enable only the source types needed for the deployment.
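
A live-only transcript setup, following the guidance above, could be sketched as follows. The extra root is a hypothetical path; `roots` is additive, so it is only needed for locations beyond those the collector already watches.

```yaml
# Fragment only: merge into a complete collector.yaml.
sources:
  transcript:
    enabled: true
    # Additional root to watch (hypothetical path).
    roots:
      - /srv/ai-sessions/transcripts
    # Keep false for live-only collection; true replays pre-existing files
    # on first startup or after a state wipe, which adds read and ingest load.
    backfill_on_start: false
```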

`sources.vscdb`

Enable `sources.vscdb` when you want to capture IDE/chat surfaces that store data in local SQLite databases.

Setting	Required for these surfaces	Not required for
`sources.vscdb.enabled: true`	Cursor IDE chat history, supported VS Code AI extension state, and other supported IDE SQLite chat stores	OTLP telemetry (`sources.otel`), process lifecycle (`sources.process`), JSONL transcript capture (`sources.transcript`)

For most deployments, `sources.vscdb.enabled: true` is enough and no explicit targets are needed.

sources:
  vscdb:
    enabled: true

Internal default targets are auto-discovered by the collector. This includes standard Cursor IDE and VS Code user-data locations for the user running the collector.

Adding non-canonical stores

If you have a custom IDE chat store, add an explicit target. The canonical entries continue to auto-apply alongside it; absent canonical stores are silently skipped:

sources:
  vscdb:
    enabled: true
    targets:
      - label: my-store
        db_path: ~/path/to/state.vscdb
        table: ItemTable

`db_path` accepts `~` and `$VAR` expansion; the result must resolve to an absolute path. Use `db_path_glob` instead of `db_path` for stores that live under a wildcard path (e.g. per-workspace databases). Exactly one of `db_path` / `db_path_glob` is required per non-preset target.
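
For the per-workspace case, a `db_path_glob` target might look like the sketch below. The label, directory layout, and key prefix are hypothetical, and the glob is written as an absolute path because this page does not say whether `~` expansion applies to globs.

```yaml
# Fragment only: merge into a complete collector.yaml.
sources:
  vscdb:
    enabled: true
    targets:
      - label: example-workspace-stores   # hypothetical label
        # One database per workspace directory (hypothetical layout).
        db_path_glob: "/home/alice/.config/ExampleIDE/User/workspaceStorage/*/state.vscdb"
        table: ItemTable
        # Optional: narrow a broad store to the rows of interest.
        key_prefixes:
          - "chat."                        # hypothetical prefix
```

Note that `db_path` is deliberately absent, since exactly one of the two path keys may be set.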

`sources.vscdb.targets[*]` field reference

Key	Required	Default	What it does
`preset`	no	`""`	Built-in preset name. If set, do not also set `label`, `db_path`, `db_path_glob`, or `table`.
`label`	yes for non-preset targets	—	Target name shown in emitted payload metadata.
`db_path`	exactly one of `db_path` or `db_path_glob` for non-preset targets	—	Absolute SQLite path (supports `~` and `$VAR`).
`db_path_glob`	exactly one of `db_path` or `db_path_glob` for non-preset targets	—	Absolute glob for matching one or more SQLite files.
`table`	yes for non-preset targets	—	Table to read from.
`key_column`	no	`key`	Key column name.
`value_column`	no	`value`	Value column name.
`key_prefixes`	no	all keys	Optional list of row-key prefixes to include. Use this to narrow broad stores such as VS Code `ItemTable`.

Overriding a canonical preset

To override a built-in entry, define a target with the same label and your desired path/table/columns.
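
A minimal sketch of such an override follows. This page does not list the built-in labels, so the label here is a placeholder to replace with the real one, and the path is hypothetical.

```yaml
# Fragment only: merge into a complete collector.yaml.
sources:
  vscdb:
    enabled: true
    targets:
      # Placeholder: use the exact label of the built-in entry being overridden.
      - label: example-canonical-label
        db_path: /opt/ide/data/state.vscdb   # hypothetical relocated store
        table: ItemTable
        # Shown with their documented defaults; change only if the store differs.
        key_column: key
        value_column: value
```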

`sre`, `audit`, `selfmetrics`, and `debug`

Key	Default	What it does
`sre.allow_raw_introspection`	`false`	Allows raw payload retrieval from local debug endpoints when audit logging succeeds.
`sre.pprof`	`false`	Enables local `pprof` endpoints.
`audit.max_bytes`	`1048576`	Audit log rotation threshold.
`selfmetrics.interval_seconds`	`5`	CPU/RSS/goroutine sampling interval.
`debug.allow_raw_introspection`	`false`	Alias-style raw-introspection toggle used by the supervisor.

Most operators should leave raw introspection off. `aris-collector status` does not need raw payload access.
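
Some teams prefer to pin these defaults explicitly so that a config review shows the hardened state at a glance. The sketch below assumes the dotted keys in the table map to nested YAML in the same way as the other sections on this page; every value is the documented default.

```yaml
# Fragment only: merge into a complete collector.yaml.
sre:
  allow_raw_introspection: false   # raw payload retrieval stays off
  pprof: false                     # no local pprof endpoints
audit:
  max_bytes: 1048576               # 1 MiB rotation threshold
selfmetrics:
  interval_seconds: 5              # CPU/RSS/goroutine sampling interval
debug:
  allow_raw_introspection: false   # supervisor-side toggle stays off
```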

SIGHUP and restarts

SIGHUP is validation-only in v1. It checks config syntax/compatibility and keeps the running config unchanged.

To apply config changes, restart the collector.
