Collector configuration¶
aris-collector reads a YAML file. Pass it on every command:
aris-collector validate-config --config /etc/aris/collector/collector.yaml
aris-collector run --config /etc/aris/collector/collector.yaml
aris-collector status --config /etc/aris/collector/collector.yaml
aris-collector diag --config /etc/aris/collector/collector.yaml
Package builders and fleet tooling can inspect the stable installation contract with the aris-collector runtime-contract command.
The command prints JSON containing the collector service name, default paths, service arguments, install inputs, config precedence, and local state schema policy.
For package-free Linux validation, a built collector binary can be installed with:
Use --destdir <root> to assemble the same layout under an image or test root without touching the live host.
In --destdir mode the script creates paths and modes only; package tooling must apply the owner/group metadata from aris-collector runtime-contract.
Unknown YAML keys are rejected. Empty files can be loaded, but they fail runtime validation because forwarder.core_endpoint is required.
This page focuses on operator outcomes: what each key changes for security, data capture, and performance.
Minimal secure config¶
paths:
  state_dir: /var/lib/aris/collector
forwarder:
  core_endpoint: core.aris.example.com:8443
  mtls:
    cert_path: /etc/aris/collector/collector.crt
    key_path: /etc/aris/collector/collector.key
    server_ca_path: /etc/aris/collector/core-ingest-ca.crt
    server_name: core.aris.example.com
management:
  core_url: https://core.aris.example.com
  server_name: core.aris.example.com
  desired_config_hash: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
  last_known_good_config_hash: bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
buffer:
  max_disk_usage_mb: 500
  max_record_age_seconds: 604800
  drop_newest_on_full: false
  queue_at_risk_bytes_percent: 80
  queue_at_risk_oldest_seconds: 86400
Top-level keys¶
| Key | Required | Default | What it does |
|---|---|---|---|
| host_id | no | Secure mode: single DNS or URI SAN from the client certificate. Insecure mode: OS hostname. | Stable host identity stamped onto emitted envelopes. In secure mode, an explicit value must match the client certificate's single DNS or URI SAN identity. |
| deployment | no | see below | Privilege and host-mode guardrails. |
| paths | no | ~/.aris/collector derived paths | State, queue, audit, and SRE socket locations. |
| otel | no | loopback OTLP/gRPC | Local OTLP receiver listener and optional LAN mTLS. |
| forwarder | yes | none | Core ingest endpoint and client-side mTLS. |
| management | no | heartbeat every 300 seconds when core_url is set | HTTPS mTLS management calls such as fleet-health heartbeats and automatic renewal scheduling. |
| buffer | no | see below | Durable queue caps, age limits, and local queue-risk thresholds. |
| sources | no | process and OTel on, log off | Source enablement and source-specific knobs. |
| sre | no | raw introspection off, pprof off | Local status and debugging surface. |
| audit | no | 1 MiB rotation threshold | Append-only audit log settings. |
| selfmetrics | no | 5 seconds | Collector process resource sampling. |
| debug | no | off | Debug toggles. |
Required keys and conditional requirements¶
Use this section as the operator checklist for what must be set.
| Condition | Required keys |
|---|---|
| All deployments | forwarder.core_endpoint |
| forwarder.insecure: false (default) | forwarder.mtls.cert_path, forwarder.mtls.key_path, forwarder.mtls.server_ca_path, forwarder.mtls.server_name |
| forwarder.insecure: true | forwarder.core_endpoint must be loopback (127.0.0.1, ::1, localhost, ip6-localhost) |
| deployment.allow_root: true | deployment.allow_root_acknowledged: I-UNDERSTAND-ROOT-MODE |
| otel.lan_bind: true | otel.lan_bind_mtls.enabled: true, plus otel.lan_bind_mtls.cert_path, otel.lan_bind_mtls.key_path, otel.lan_bind_mtls.client_ca_path |
| management.core_url is set | Must be absolute https://...; forwarder.insecure must be false |
| management.desired_config_hash or management.last_known_good_config_hash is set | Value must be a 64-character lowercase SHA-256 hex string |
Security and exposure warnings¶
Treat the following as advanced settings and keep them under change control in production.
| Key | Risk if enabled or relaxed | Recommendation |
|---|---|---|
| forwarder.insecure | Disables mTLS to core (transport security downgrade). | Keep false in production. Use only for local loopback development. |
| otel.lan_bind | Exposes collector ingest beyond localhost (larger network attack surface). | Keep false unless you need network OTLP clients. If enabled, require strict mTLS and network ACLs. |
| deployment.allow_root | Collector can observe all-user host activity (larger privacy/compliance blast radius). | Keep false for workstation/user-scope deployments. |
| sre.allow_raw_introspection | Raw payloads can be read from local debug endpoints. | Keep false by default. Enable only for short-lived debugging windows. |
| debug.allow_raw_introspection | Same raw-payload exposure risk as sre.allow_raw_introspection. | Keep false by default. |
| sre.pprof | Exposes additional profiling/debug endpoints. | Keep false unless actively troubleshooting. |
| deployment.windows_admin_access | Broadens who can access SRE controls on Windows. | Keep false unless explicitly required by your ops model. |
Data-collection scope warnings¶
These settings primarily affect what data is collected (not transport security):
| Key group | Impact |
|---|---|
| sources.transcript.* | Captures local transcript file content; can include user prompts and tool context. |
| sources.transcript_json.* | Captures rewrite-style transcript/session files. |
| sources.vscdb.* | Captures supported IDE chat/store content from local SQLite data sources. |
| sources.agent_config.* | Captures runtime config files for supported agents. |
Use least-privilege source enablement per policy and jurisdiction.
deployment¶
| Key | Default | What it does |
|---|---|---|
| allow_root | false | Allows Linux/macOS execution as UID 0. Leave off for per-user collectors. |
| allow_root_acknowledged | "" | Must equal I-UNDERSTAND-ROOT-MODE when allow_root: true. |
| windows_admin_access | false | Reserved for Windows service deployments. Ignored on Linux/macOS. |
Root mode is a deliberate opt-in because it can observe all users on the host.
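As an illustration of the opt-in described above, a root-mode fragment might look like the sketch below. Both keys and the acknowledgement string come from the deployment table; the rest of the config (forwarder, paths, and so on) is assumed to be set elsewhere in the same file.

```yaml
# Sketch: explicit root-mode opt-in for a host-wide collector.
deployment:
  allow_root: true
  # Must match this exact string, otherwise allow_root: true is rejected.
  allow_root_acknowledged: I-UNDERSTAND-ROOT-MODE
```

For per-user collectors, omit the whole deployment block so the defaults (allow_root: false) apply.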
paths¶
| Key | Default | What it does |
|---|---|---|
| state_dir | ~/.aris/collector | Collector home directory. |
| queue_dir | $state_dir/queue | Encrypted SQLite queue location. |
| audit_path | $state_dir/audit.log | Raw-introspection audit log. |
| sre_socket_path | $state_dir/run/sre.sock | Unix socket used by aris-collector status. |
The collector creates these directories if needed and rejects group/world-readable directory modes on Unix. Use 0700 or stricter.
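A minimal sketch of a paths block for a system-wide install, assuming the state directory from the minimal secure config and a queue placed on a separate volume. The /var/spool location is hypothetical; the remaining keys are left unset so they derive from state_dir as listed in the table.

```yaml
# Sketch: explicit state directory with the queue on its own volume.
paths:
  state_dir: /var/lib/aris/collector
  # Hypothetical location; any directory used here should be mode 0700 or stricter.
  queue_dir: /var/spool/aris/collector/queue
  # audit_path and sre_socket_path are omitted and fall back to
  # $state_dir/audit.log and $state_dir/run/sre.sock.
```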
forwarder¶
| Key | Required | Default | What it does |
|---|---|---|---|
| core_endpoint | yes | none | host:port for core's ingest gRPC listener. |
| insecure | no | false | Disables mTLS. Only valid for loopback development endpoints. |
| mtls.cert_path | secure mode | none | Collector client certificate used to authenticate to core. |
| mtls.key_path | secure mode | none | Collector private key used to authenticate to core. |
| mtls.server_ca_path | secure mode | none | CA bundle used to verify core's ingest server certificate. |
| mtls.server_name | secure mode | none | Exact SAN expected on core's server certificate. |
Production collectors should use mTLS so data in transit is mutually authenticated and encrypted.
Insecure mode is for local development only:
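A minimal sketch of such a development config, assuming a core instance listening on the local machine. The port number is a placeholder, not a documented default.

```yaml
# Sketch: loopback-only development config without mTLS.
forwarder:
  # Must be a loopback endpoint (127.0.0.1, ::1, localhost, ip6-localhost)
  # when insecure is true. The port here is hypothetical.
  core_endpoint: 127.0.0.1:8443
  insecure: true
# No forwarder.mtls block is needed in this mode.
# management.core_url must stay unset, because it requires insecure: false.
# host_id, if omitted, defaults to the OS hostname in insecure mode.
```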
management¶
| Key | Required | Default | What it does |
|---|---|---|---|
| core_url | no | disabled | HTTPS management base URL used for fleet-health heartbeats and automatic renewal. |
| server_name | no | URL host | TLS SAN override for the management endpoint. |
| heartbeat_interval_seconds | no | 300 | Heartbeat cadence. |
| desired_config_hash | no | unset | Optional reporting marker for fleet tooling (64-char lowercase SHA-256). |
| last_known_good_config_hash | no | unset | Optional reporting marker for fleet tooling (64-char lowercase SHA-256). |
Management calls (heartbeats, renewal) require secure mode and reuse forwarder.mtls credentials.
buffer¶
| Key | Default | What it does |
|---|---|---|
| max_disk_usage_mb | 100 | Logical pending/in-flight queue byte cap in MiB; accepted range is 1 through 1,048,576 (1 TiB). |
| max_record_age_seconds | 604800 | Drops active queued records older than this age (default: seven days); accepted range is 1 through 7,776,000 (90 days). |
| drop_newest_on_full | false | When false, lossless sources backpressure at the cap; when true, newest records are dropped and counted. |
| queue_at_risk_bytes_percent | 80 | Local and fleet-health risk threshold based on pending queue bytes. |
| queue_at_risk_oldest_seconds | 86400 | Local and fleet-health risk threshold based on oldest queued record age (default: one day); accepted range is 1 through 2,592,000 (30 days). |
The default policy is block/backpressure rather than silent loss. drop_oldest is not supported because losing early session records can break downstream reconstruction.
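To make the trade-off concrete, the sketch below shows a buffer block for an endpoint where bounded disk use matters more than completeness. The values are illustrative choices within the documented ranges, not recommendations from this page.

```yaml
# Sketch: bounded queue that drops newest records instead of blocking.
buffer:
  max_disk_usage_mb: 1024              # 1 GiB cap (1024 MiB)
  max_record_age_seconds: 259200       # 3 days = 3 * 24 * 3600
  drop_newest_on_full: true            # lossy: newest records are dropped and counted
  queue_at_risk_bytes_percent: 80      # same as the default
  queue_at_risk_oldest_seconds: 3600   # flag risk once the oldest record is 1 hour old
```

If the percentage is measured against max_disk_usage_mb (an assumption; this page only says it is based on pending queue bytes), the at-risk threshold here would be roughly 819 MiB. Leave drop_newest_on_full at false wherever downstream session reconstruction must stay lossless.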
otel¶
| Key | Default | What it does |
|---|---|---|
| grpc_addr | 127.0.0.1:4317 | Local OTLP/gRPC listener. |
| http_addr | 127.0.0.1:4318 | Local OTLP/HTTP listener. |
| lan_bind | false | Allows binding the OTLP receiver to non-loopback interfaces. Requires lan_bind_mtls.enabled: true. |
| lan_bind_mtls.enabled | false | Enables mTLS for LAN-bound OTLP clients. |
| lan_bind_mtls.cert_path | "" | Server certificate for the collector's OTLP receiver. |
| lan_bind_mtls.key_path | "" | Server key for the collector's OTLP receiver. |
| lan_bind_mtls.client_ca_path | "" | CA bundle for OTLP client certificates. Required when LAN mTLS is enabled. |
Keep OTLP on loopback for normal workstation deployments.
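For the less common case where network OTLP clients are required, a LAN-bound receiver might be configured as in the sketch below. The listen address uses the 192.0.2.0/24 documentation range and the certificate file names are hypothetical; only the key names come from the table above.

```yaml
# Sketch: OTLP receiver bound to a LAN interface with mandatory mTLS.
otel:
  grpc_addr: 192.0.2.10:4317   # placeholder LAN address
  http_addr: 192.0.2.10:4318   # placeholder LAN address
  lan_bind: true               # requires lan_bind_mtls.enabled: true
  lan_bind_mtls:
    enabled: true
    cert_path: /etc/aris/collector/otlp-server.crt          # hypothetical file name
    key_path: /etc/aris/collector/otlp-server.key           # hypothetical file name
    client_ca_path: /etc/aris/collector/otlp-client-ca.crt  # hypothetical file name
```

Pair a setup like this with network ACLs, as recommended in the security warnings.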
sources¶
sources:
  process:
    enabled: true
  otel:
    enabled: true
  transcript:
    enabled: false
  transcript_json:
    enabled: false
  vscdb:
    enabled: false
  agent_config:
    enabled: false
| Key | Default | What it does |
|---|---|---|
| sources.process.enabled | true | Observes same-user AI CLI process lifecycle. |
| sources.process.match_name_additions | [] | Extra executable basenames for internal AI tools. |
| sources.process.match_exe_path_additions | [] | Extra executable path substrings. |
| sources.process.match_cmdline_additions | [] | Extra cmdline substrings (useful for interpreter-launched shims). |
| sources.otel.enabled | true | Accepts local OTLP logs, metrics, and spans on both gRPC (127.0.0.1:4317) and HTTP (127.0.0.1:4318). |
| sources.transcript.enabled | false | Enables local JSONL transcript capture. |
| sources.transcript.roots | [] | Additional transcript roots to watch (additive). |
| sources.transcript.backfill_on_start | false | Opt-in historical replay for pre-existing transcript files on first startup or after a state wipe. Leave false for live-only collection. |
| sources.transcript_json.enabled | false | Enables rewrite-style JSON transcript capture. |
| sources.transcript_json.roots | [] | Additional transcript JSON roots to watch (additive). |
| sources.vscdb.enabled | false | Enables supported IDE SQLite chat-store and extension-state capture. |
| sources.vscdb.backfill_on_start | false | Opt-in historical replay for pre-existing IDE SQLite rows on first startup or after a state wipe. Leave false for live-only collection. |
| sources.vscdb.poll_interval_seconds | 30 | Cadence between full target sweeps. |
| sources.vscdb.targets | [] | Optional additional stores or overrides. See target fields below. |
| sources.agent_config.enabled | false | Enables runtime config-file capture for supported agents (unix only). |
| sources.agent_config.cooldown_seconds | 0 (uses internal default of 5s) | Per (runtime, uid, cwd) trigger cooldown. |
| sources.agent_config.per_file_bytes | 0 (uses internal default of 1 MiB) | Per-file read cap for captured config files. |
| sources.agent_config.per_snapshot_bytes | 0 (uses internal default of 5 MiB) | Total bytes cap per snapshot run. |
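As a sketch of how several of these keys combine, the fragment below extends process matching for an in-house tool and turns on live-only transcript capture. The tool name, shim path, and transcript root are hypothetical placeholders; the key names are taken from the table above.

```yaml
# Sketch: custom process matching plus live-only transcript capture.
sources:
  process:
    enabled: true
    match_name_additions:
      - acme-ai-cli                    # hypothetical executable basename
    match_cmdline_additions:
      - acme_ai/shim.py                # hypothetical interpreter-launched shim
  transcript:
    enabled: true
    roots:
      - /home/dev/.acme-ai/transcripts # hypothetical extra root (additive)
    backfill_on_start: false           # live-only: no historical replay
  agent_config:
    enabled: true
    cooldown_seconds: 10               # overrides the internal 5s default
    per_file_bytes: 524288             # 512 KiB per-file read cap
```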
Transcript data and performance impact¶
Transcript sources watch configured roots and collect new supported records while the collector is running. Enabling transcript sources increases local disk reads and outbound ingest volume in proportion to the amount of new transcript activity on the endpoint.
Use the queue risk settings under buffer to alert on sustained local backlog. For resource-constrained endpoints, enable only the source types needed for the deployment.
sources.vscdb¶
Enable sources.vscdb to capture IDE and chat surfaces that store their data in local SQLite databases.
| Setting | Required for these surfaces | Not required for |
|---|---|---|
| sources.vscdb.enabled: true | Cursor IDE chat history, supported VS Code AI extension state, and other supported IDE SQLite chat stores | OTLP telemetry (sources.otel), process lifecycle (sources.process), JSONL transcript capture (sources.transcript) |
For most deployments, vscdb.enabled: true is enough and no explicit targets are needed.
Internal default targets are auto-discovered by the collector. This includes standard Cursor IDE and VS Code user-data locations for the user running the collector.
Adding non-canonical stores¶
If you have a custom IDE chat store, add an explicit target. The canonical entries still apply automatically alongside it, and canonical stores that are absent on the host are silently skipped:
sources:
  vscdb:
    enabled: true
    targets:
      - label: my-store
        db_path: ~/path/to/state.vscdb
        table: ItemTable
db_path accepts ~ and $VAR expansion; the result must resolve to an absolute path. Use db_path_glob instead of db_path for stores that live under a wildcard path (e.g. per-workspace databases). Exactly one of db_path / db_path_glob is required per non-preset target.
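A sketch of the glob variant for per-workspace databases follows. The label and directory layout are hypothetical, and the pattern is written as an absolute path because this page documents ~ and $VAR expansion only for db_path.

```yaml
# Sketch: one target matching a database in every workspace directory.
sources:
  vscdb:
    enabled: true
    targets:
      - label: workspace-stores   # hypothetical label
        # Use db_path_glob instead of db_path, never both on one target.
        db_path_glob: "/home/dev/ide-data/workspaces/*/state.vscdb"
        table: ItemTable
```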
sources.vscdb.targets[*] field reference¶
| Key | Required | Default | What it does |
|---|---|---|---|
| preset | no | "" | Built-in preset name. If set, do not also set label, db_path, db_path_glob, or table. |
| label | yes for non-preset targets | none | Target name shown in emitted payload metadata. |
| db_path | exactly one of db_path or db_path_glob for non-preset targets | none | Absolute SQLite path (supports ~ and $VAR). |
| db_path_glob | exactly one of db_path or db_path_glob for non-preset targets | none | Absolute glob for matching one or more SQLite files. |
| table | yes for non-preset targets | none | Table to read from. |
| key_column | no | key | Key column name. |
| value_column | no | value | Value column name. |
| key_prefixes | no | all keys | Optional list of row-key prefixes to include. Use this to narrow broad stores such as VS Code ItemTable. |
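To show the optional fields together, the sketch below narrows a broad key/value store to selected key prefixes. The prefixes are hypothetical examples; key_column and value_column are written out only to make the defaults visible.

```yaml
# Sketch: non-preset target restricted to a subset of row keys.
sources:
  vscdb:
    enabled: true
    targets:
      - label: my-store
        db_path: ~/path/to/state.vscdb
        table: ItemTable
        key_column: key          # default, shown for clarity
        value_column: value      # default, shown for clarity
        key_prefixes:
          - "chat."              # hypothetical prefix
          - "assistant.history"  # hypothetical prefix
```

Without key_prefixes, all keys in the table are included.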
Overriding a canonical preset¶
To override a built-in entry, define a target with the same label and your desired path/table/columns.
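A sketch of such an override is shown below. This page does not list the labels of the built-in entries, so the label here is a hypothetical stand-in that must be replaced with the label of the entry being overridden; the path is also a placeholder.

```yaml
# Sketch: replace a built-in entry by reusing its label.
sources:
  vscdb:
    enabled: true
    targets:
      - label: canonical-store-label        # hypothetical: use the built-in entry's actual label
        db_path: /opt/ide/data/state.vscdb  # hypothetical relocated database
        table: ItemTable
        key_column: key
        value_column: value
```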
sre, audit, selfmetrics, and debug¶
| Key | Default | What it does |
|---|---|---|
| sre.allow_raw_introspection | false | Allows raw payload retrieval from local debug endpoints when audit logging succeeds. |
| sre.pprof | false | Enables local pprof endpoints. |
| audit.max_bytes | 1048576 | Audit log rotation threshold. |
| selfmetrics.interval_seconds | 5 | CPU/RSS/goroutine sampling interval. |
| debug.allow_raw_introspection | false | Alias-style raw-introspection toggle used by the supervisor. |
Most operators should leave raw introspection off. aris-collector status does not need raw payload access.
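For a short-lived debugging window, the relevant toggles might be set as in the sketch below. The audit and selfmetrics values simply restate the documented defaults; the fragment assumes the rest of the config is unchanged.

```yaml
# Sketch: temporary debugging settings. Revert after troubleshooting.
sre:
  allow_raw_introspection: true  # raw payload reads are audit-logged
  pprof: true                    # exposes local profiling endpoints
audit:
  max_bytes: 1048576             # 1 MiB rotation threshold (default)
selfmetrics:
  interval_seconds: 5            # default sampling interval
```

Because SIGHUP only validates config in v1, both enabling and reverting these settings require a collector restart.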
SIGHUP and restarts¶
SIGHUP is validation-only in v1. It checks config syntax/compatibility and keeps the running config unchanged.
To apply config changes, restart the collector.