Production hardening
The single-host Docker Compose install is intended for evaluation and small deployments. When you put ARIS in front of real operators, the items below matter.
TLS
Operators sign in through Google OIDC; their session cookies, the bridge token web sends to core, and the OIDC redirect must all flow over HTTPS. Run a TLS-terminating reverse proxy (nginx, Caddy, or your cloud load balancer) in front of web and update:
| Setting | Value |
|---|---|
| NEXTAUTH_URL | https://aris.your-domain.com |
| Google OAuth client redirect URI | https://aris.your-domain.com/api/auth/callback/google |
NextAuth automatically switches state cookies to the __Secure- prefix when NEXTAUTH_URL starts with https://.
core itself does not need to be reachable from the public internet — only web does. In Docker Compose terms, you can drop the 8080:8080 port mapping for core once you're confident the wiring works. Web reaches core over the compose network.
Collector ingest is separate from the web/API port. If you deploy collectors, expose ARIS_INGEST_GRPC_ADDR only on a private address or private load balancer reachable from the network segments where collector hosts run. Ingest requires mTLS: core verifies collector client certificates against ARIS_INGEST_CLIENT_CA_FILE, and collectors pin core's server CA plus exact server name.
The current collector install path uses manually provisioned mTLS certificates for evaluation and small pilots. For broad rollout, plan for per-device identity, bounded bootstrap credentials, certificate renewal and revocation, and fleet health monitoring. See Enterprise collector deployment.
Secret management
The development workflow puts secrets in .env. For production, use your secret manager:
- Kubernetes: render .env from a Secret and project it as environment variables, or use the _FILE form on each variable (e.g. ARIS_AUTH_TOKEN_SECRET_FILE) and project secrets as files.
- Docker Swarm / Compose: Docker secrets, mounted into both containers.
- Cloud-native: each cloud's KMS-backed env injection (Secrets Manager → ECS task env, GCP Secret Manager → Cloud Run, etc.).
ARIS reads <NAME>_FILE in preference to <NAME> for any secret-bearing variable; the file form is the recommended path.
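The precedence rule can be illustrated with a minimal Python sketch. This is not ARIS's own loader: the helper name read_secret is hypothetical, and stripping trailing whitespace from the file is an assumption about how mounted secrets are usually written.

```python
import os
from typing import Mapping, Optional


def read_secret(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the secret for `name`, preferring `<name>_FILE` over `<name>`."""
    file_path = env.get(f"{name}_FILE")
    if file_path:
        # The file form wins when both variables are set.
        # Assumption: trailing whitespace/newline in the file is not part of the secret.
        with open(file_path, encoding="utf-8") as handle:
            return handle.read().strip()
    # Fall back to the plain environment variable (None if it is unset too).
    return env.get(name)
```

With both ARIS_AUTH_TOKEN_SECRET and ARIS_AUTH_TOKEN_SECRET_FILE set, a loader following this rule returns the file's contents, so a stale value left in the plain variable is ignored.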
Database
The Compose-bundled Postgres is fine for evaluation. For production:
- Use a managed Postgres 16+ instance (Cloud SQL, RDS, Aiven, etc.).
- Point ARIS_DATABASE_URL at it.
- Drop the db service from your compose / k8s manifests.
- Enable point-in-time backups on the managed instance. ARIS's identity tables are the system of record for who-is-who in the platform.
ARIS expects sslmode=require on the database connection in production.
The container does not validate SSL mode itself; the Postgres driver does, on first connection.
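Because nothing checks the SSL mode before the first connection, a deploy-time preflight can catch a missing parameter earlier. The sketch below is an illustration in Python, not part of ARIS. It assumes ARIS_DATABASE_URL is a URL-style connection string with sslmode as a query parameter. The function names and the example host are hypothetical. Accepting the stricter libpq modes verify-ca and verify-full is also an assumption.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# "require" is the mode the text names; the verify-* modes are stricter
# libpq options and are accepted here as an assumption.
ENCRYPTED_SSLMODES = {"require", "verify-ca", "verify-full"}


def sslmode_of(database_url: str) -> Optional[str]:
    """Return the sslmode query parameter of a Postgres URL, or None if unset."""
    values = parse_qs(urlsplit(database_url).query).get("sslmode")
    return values[-1] if values else None


def check_database_url(database_url: str) -> None:
    """Raise ValueError unless the URL asks for an encrypted connection."""
    mode = sslmode_of(database_url)
    if mode not in ENCRYPTED_SSLMODES:
        raise ValueError(
            f"sslmode is {mode!r}; expected one of {sorted(ENCRYPTED_SSLMODES)}"
        )


# Example value only; the host and credentials are placeholders.
check_database_url("postgres://aris:secret@db.example.internal:5432/aris?sslmode=require")
```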
Sync interval
The default ARIS_DIRECTORY_SYNC_INTERVAL=1h is the worst-case role-staleness window. If you remove a user from aris-admin@... in Workspace, ARIS continues to grant admin scope to that user for up to one hour after the change.
For most Workspaces, leave this at 1h. Tighten to 15m if you have stricter revocation requirements; loosen to 4h if you have a very large directory and the API quota matters. Sub-1-minute intervals waste API quota and produce no security benefit (the bridge token TTL is 5 minutes — anyone holding a leaked token already has a 5-minute window before re-auth).
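The trade-off is simple arithmetic: the interval is the worst-case staleness window, and the number of sync runs per day is 24 hours divided by the interval. A small Python sketch makes it concrete. The parser is a hypothetical helper that only handles single-unit values such as 15m or 1h, and it makes no claim about the duration syntax ARIS itself accepts.

```python
import re
from datetime import timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_interval(value: str) -> timedelta:
    """Parse a single-unit duration such as '15m', '1h' or '4h'."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {value!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def sync_runs_per_day(value: str) -> float:
    """How many directory syncs a given interval produces in 24 hours."""
    return timedelta(days=1) / parse_interval(value)


for candidate in ("15m", "1h", "4h"):
    print(
        f"{candidate}: roles stale for up to {parse_interval(candidate)}, "
        f"{sync_runs_per_day(candidate):.0f} sync runs/day"
    )
```

The three settings discussed above work out to 96, 24, and 6 sync runs per day respectively.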
Backups
What's worth backing up:
- The Postgres database, specifically the identity tables (person, authentication_identity, employment_state, staged_identity). Everything else is rebuildable from the next sync.
- The .env file (in your secret manager, not in version control). Recovering ARIS_TENANT_ID after losing it without a backup means re-bootstrapping the entire identity table.
- The Google service-account JSON. A replacement key can be generated from the Google Cloud console, but it's worth keeping a copy of the current one in a sealed secret store.
- Collector CA and issued host certificates. Losing the CA means re-enrolling collectors. Losing one host key means rotating that host's collector certificate. The current ARIS ingest path does not enforce per-certificate revocation itself; immediate invalidation requires customer-enforced revocation at the TLS/PKI layer, an external proxy, or collector client CA rotation. Do not reuse one collector certificate across many hosts.
Service account keys, OAuth client secrets, and HMAC signing secrets should all be rotatable. ARIS reads them from env on boot — rotation is a config update + restart, not a code change.
Scaling
The MVP is single-instance per customer deployment. There is no horizontal-scale story for core today — running two instances against the same database would result in the directory sync running twice on the same cadence.
Web is stateless once you've handed it NEXTAUTH_SECRET; you can run multiple replicas behind a load balancer. Pin sticky sessions or rely on the JWT-strategy session — there is no shared session store to coordinate.
Collectors are independent per host. They queue locally when core is down and resume forwarding after reconnect.
For fleets of 100 or more hosts, validate reconnect behavior, queue growth, certificate rotation, stale-host detection, and package rollback in staged rollout rings before broad deployment.
Logs
Both containers emit structured JSON logs to stdout. Pipe them into your log aggregator the same way you do every other 12-factor app. The log fields most useful in alerting:
- event=auth.deny: every middleware rejection. Alert on a rate spike.
- event=directory.sync.run: every sync run. Alert on errors > 0 or on the absence of a run within 2 × ARIS_DIRECTORY_SYNC_INTERVAL.
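In practice the sync alert would live in your log aggregator's rule language. The Python sketch below only illustrates the logic of the two conditions. The event and errors fields come from the description above. The timestamp field name ts and its ISO 8601 format are assumptions, as is the function name.

```python
import json
from datetime import datetime, timedelta
from typing import Iterable, List, Optional

MISSING_RUN = "no directory sync run within 2 x the sync interval"


def sync_alerts(lines: Iterable[str], now: datetime, interval: timedelta) -> List[str]:
    """Return alert messages for directory.sync.run events in JSON log lines."""
    alerts: List[str] = []
    last_run: Optional[datetime] = None
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log line
        if not isinstance(record, dict) or record.get("event") != "directory.sync.run":
            continue
        # Assumption: each record carries an ISO 8601 timestamp in a "ts" field.
        ts = datetime.fromisoformat(record["ts"])
        if last_run is None or ts > last_run:
            last_run = ts
        if record.get("errors", 0) > 0:
            alerts.append(f"sync at {ts.isoformat()} reported {record['errors']} errors")
    # Absence check: the newest run must be younger than twice the interval.
    if last_run is None or now - last_run > 2 * interval:
        alerts.append(MISSING_RUN)
    return alerts
```

Pass a timezone-aware now if the log timestamps carry an offset; Python refuses to subtract naive and aware datetimes.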
Outbound network policy
If you run with strict egress rules, the only outbound destinations ARIS needs are:
| Host | Purpose |
|---|---|
| accounts.google.com | Operator OIDC login (web only). |
| oauth2.googleapis.com | OIDC token exchange (web), service-account token exchange (core). |
| admin.googleapis.com | Admin Directory API (core only). |
| www.googleapis.com | OIDC discovery + JWKS (web). |
| your core ingest endpoint | Collector envelope forwarding (aris-collector only). |
Nothing else. ARIS does not phone home, does not push telemetry to Ryora, and does not pull updates at runtime.
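If you generate egress rules from code, the table above maps onto a per-component allowlist. The following Python sketch shows one possible shape. The structure and the names EGRESS_ALLOWLIST and egress_allowed are illustrative, not something ARIS ships. The collector's destination is left out because it is your own core ingest endpoint.

```python
from typing import Dict, FrozenSet

# Hosts per component, taken from the table above. The collector entry is
# deployment-specific (your core ingest endpoint) and is intentionally omitted.
EGRESS_ALLOWLIST: Dict[str, FrozenSet[str]] = {
    "web": frozenset(
        {"accounts.google.com", "oauth2.googleapis.com", "www.googleapis.com"}
    ),
    "core": frozenset({"oauth2.googleapis.com", "admin.googleapis.com"}),
}


def egress_allowed(component: str, host: str) -> bool:
    """True if `component` is expected to open connections to `host`."""
    normalized = host.lower().rstrip(".")  # tolerate case and a trailing dot
    return normalized in EGRESS_ALLOWLIST.get(component, frozenset())
```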