Access Control (ACL)¶
DocBrain enforces source-system access controls at query time. When a user asks a question, results are filtered to only the content that user is authorised to read in the source system (Confluence, Slack, GitHub, Jira) — regardless of how DocBrain itself is configured.
Why this matters¶
Without ACL, a RAG system flattens every source's permission model into one shared index. A Confluence page restricted to your Finance team becomes readable by every DocBrain user. A private Slack channel discussion surfaces in a marketing intern's question. Sensitive GitHub repos leak through search results.
DocBrain solves this with source-system ACL mirroring: every chunk in the index carries the principals (users, groups, channels, roles) authorised to read it at the source. Every retrieval filters by the requesting user's resolved principal set.
Quick start¶
- Configure your IdP so SSO group claims reach DocBrain (see RBAC & SSO).
- Turn on ACL mirroring for the sources you care about. Default is off — fully backwards compatible.
acl:
  mode: enforce        # off | warn | enforce
  sources:
    confluence:
      mode: mirror     # extract real Confluence restrictions
    github:
      mode: mirror     # extract repo visibility + collaborators
    slack:
      mode: mirror     # extract channel membership
    jira:
      mode: mirror     # extract issue security level
- Backfill existing chunks (one-time, for content that pre-dates the ACL mirroring deploy):
kubectl create job --from=cronjob/docbrain-ingest acl-backfill \
-n docbrain --overrides '{"spec":{"template":{"spec":{"containers":[{"name":"ingest","command":["docbrain-acl-backfill"],"env":[{"name":"ACL_BACKFILL_SOURCE","value":"confluence"}]}]}}}}'
- Optional: validate before enforcing. Set mode: warn first. The system computes filters and logs would-have-denied chunks without actually changing user-visible results. Operators run this for a few days to validate ACL coverage before flipping to enforce.
How it works¶
At ingest¶
Per-source AclProvider implementations extract real permission data:
| Source | What it captures |
|---|---|
| Confluence | Page-level read restrictions; falls back to space permission scheme |
| GitHub | Repo visibility (public / internal / private) + collaborators + teams |
| Slack | Channel membership for private channels; workspace group for public |
| Jira | Issue security level + project-level Browse permission |
Each provider returns a DocumentAcl with the principal set. The
ingest pipeline persists it to Postgres (document_acl table) and
writes the canonical principal strings into the OpenSearch chunk
index (acl_principals keyword array).
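The ingest step above can be pictured with a minimal Python sketch. It is illustrative only: the real AclProvider lives in the Rust ingest crate, and the Principal and DocumentAcl shapes below are assumptions. The kind:namespace:identifier layout of the canonical string is inferred from the sample principals shown under Diagnostics, not from a published spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    kind: str        # e.g. "user", "sso_group", "public"
    namespace: str   # e.g. "sso", "slack", "system"
    identifier: str  # e.g. "alice@example.com"

    def canonical(self) -> str:
        # Assumed layout: kind:namespace:identifier
        return f"{self.kind}:{self.namespace}:{self.identifier}"


@dataclass(frozen=True)
class DocumentAcl:
    document_id: str
    principals: frozenset  # frozenset of Principal


def chunk_index_fields(acl: DocumentAcl) -> dict:
    """Build the ACL field each chunk of this document would carry in the index."""
    # Sorted so the stored array is deterministic across re-ingests.
    return {"acl_principals": sorted(p.canonical() for p in acl.principals)}
```

Every chunk of a document would receive the same acl_principals array, so the query-time filter never needs to join back to Postgres.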
At query¶
When a user asks a question:
- User identity → principal set. Their DocBrain user, email, linked external identities (GitHub login, Slack user ID), and SSO group memberships all become principals.
- Filter every retriever. Each retriever's hits are intersected with the user's principals. Chunks the user can't read are dropped before they reach the reranker, the LLM, or the response.
- Side-channel mitigations. When the filter wipes out the result set, the side-channel guard fires:
  - Answer text is replaced with the configured denial message
  - Confidence is forced to 0.0
  - Sources are emptied
  - Episodic memory persistence is skipped (no future leak via past-question lookup)
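The query-time steps above can be sketched as follows. This is a simplified model, not DocBrain's code: Chunk, Answer, DENIAL_MESSAGE and both function names are hypothetical, and the intersection test stands in for whatever filter the retrievers actually push down to the index.

```python
from dataclasses import dataclass

DENIAL_MESSAGE = "I don't have information on that."  # hypothetical configured message


@dataclass
class Chunk:
    chunk_id: str
    text: str
    acl_principals: frozenset  # canonical principal strings


@dataclass
class Answer:
    text: str
    confidence: float
    sources: list
    persist_episode: bool = True


def filter_hits(hits, user_principals):
    """Split retriever hits into (allowed, denied) by principal intersection."""
    allowed = [c for c in hits if c.acl_principals & user_principals]
    denied = [c for c in hits if not (c.acl_principals & user_principals)]
    return allowed, denied


def side_channel_guard(answer, hits, allowed):
    """Replace the answer when the filter emptied a non-empty result set."""
    if hits and not allowed:
        return Answer(text=DENIAL_MESSAGE, confidence=0.0,
                      sources=[], persist_episode=False)
    return answer
```

Filtering before the reranker and the LLM matters: a chunk that is dropped only at response time could still influence the generated text.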
Modes¶
| Mode | Behaviour |
|---|---|
| off (default) | ACL filter not applied. Backwards-compatible. |
| warn | Filter computed but not applied. Logs would-have-denied chunk_ids on every query. Use to validate coverage before enforcing. |
| enforce | Filter applied. Denied chunks dropped. Fail-closed on chunks lacking ACL data (configurable). |
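A small sketch of how the three modes could differ in behaviour. The function name, the (chunk_id, principals) tuple shape and the fail_closed flag are assumptions made for illustration; the table above only states that fail-closed handling of chunks without ACL data is configurable.

```python
import logging

log = logging.getLogger("acl")


def apply_mode(mode, hits, user_principals, fail_closed=True):
    """hits: list of (chunk_id, principals or None). Returns the chunk_ids kept."""
    if mode == "off":
        return [cid for cid, _ in hits]

    def readable(principals):
        if principals is None:  # chunk has no ACL data
            return not fail_closed
        return bool(set(principals) & user_principals)

    denied = [cid for cid, p in hits if not readable(p)]
    if mode == "warn":
        # Filter is computed and logged, but results are left unchanged.
        if denied:
            log.warning("would-have-denied chunk_ids=%s", denied)
        return [cid for cid, _ in hits]
    if mode == "enforce":
        return [cid for cid, p in hits if readable(p)]
    raise ValueError(f"unknown ACL mode: {mode!r}")
```

Running in warn first gives operators the would-have-denied log without any user-visible change, which is the validation window described in the quick start.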
Per-source policy¶
Operators control how each source's ACL is treated independently:
acl:
  sources:
    confluence:
      mode: mirror
      space_overrides:
        ENG: public          # treat Engineering as broadly shareable
        HR: admin_only       # restrict HR to admins only
    slack:
      mode: mirror
    github:
      mode: mirror
    jira:
      mode: mirror
| Per-source mode | Meaning |
|---|---|
| off | Don't extract ACL. Chunks have no ACL data; the query filter treats them per unknown_policy. |
| mirror | Extract real per-document ACL from the source (the default for production deployments). |
| public | Tag with the Public sentinel: any authenticated DocBrain user can read. |
| admin_only | Tag with the admin role principal: only admins can read. |
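To make the per-source policy concrete, here is a hedged sketch of how an effective policy might be resolved for one document. It assumes a space override takes precedence over the source-level mode and that an unlisted source behaves as off; neither rule is spelled out above. The PUBLIC string comes from the diagnostics sample, while ADMIN_ROLE is a made-up canonical form.

```python
PUBLIC = "public:system:public"   # sentinel as shown in the diagnostics sample
ADMIN_ROLE = "role:system:admin"  # hypothetical canonical form of the admin role


def effective_policy(acl_config, source, space=None):
    """Resolve the per-source mode, letting a space override win (assumption)."""
    source_cfg = acl_config.get("sources", {}).get(source, {})
    override = source_cfg.get("space_overrides", {}).get(space)
    return override or source_cfg.get("mode", "off")


def principals_to_tag(policy, mirrored):
    """Principals written onto a chunk; None means the chunk has no ACL data."""
    if policy == "off":
        return None
    if policy == "mirror":
        return set(mirrored)
    if policy == "public":
        return {PUBLIC}
    if policy == "admin_only":
        return {ADMIN_ROLE}
    raise ValueError(f"unknown per-source mode: {policy!r}")
```

With the configuration above, a page in the HR space would be tagged with the admin role only, regardless of its restrictions in Confluence.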
Denial UX (RFC-002)¶
When the filter denies content, the system communicates the denial in one of three modes — operator-configurable per deployment:
| Mode | What the user sees | When to use |
|---|---|---|
| silent | Generic "I don't have information"; no metadata leak | MNPI / SEC-regulated content; existence of restricted material is itself non-public |
| disclosed_no_count (default) | "This answer reflects only content you have access to. Some related material may exist but is restricted by your team's permissions. Contact your administrator..." | Standard enterprise — users know the system is filtering, no specifics leak |
| disclosed | Includes denied count + referral string verbatim | Open-collaboration orgs prioritising fast self-service unblocking |
Per-source override (strictest wins)¶
Different sources can have different denial policies. When a query touches multiple sources with different settings, the strictest mode wins to prevent leak by inclusion:
acl:
  denial:
    mode: disclosed_no_count
    source_overrides:
      confluence:
        space_overrides:
          FINANCE: { mode: silent }   # Finance space → always silent
          MNPI: { mode: silent }
      slack:
        channel_overrides:
          incidents-finance: { mode: silent }
When a query returns chunks from both ENG (disclosed_no_count) and
FINANCE (silent), the response uses silent mode. Mixing public
and MNPI content never weakens the response.
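The "strictest wins" rule can be sketched as a maximum over a strictness ranking. The ranking (disclosed, then disclosed_no_count, then silent) is inferred from the mode table; the function names and the (source, namespace) pair shape are hypothetical.

```python
# Higher number = stricter (assumed ordering based on what each mode reveals).
STRICTNESS = {"disclosed": 0, "disclosed_no_count": 1, "silent": 2}


def mode_for(denial_config, source, namespace):
    """Denial mode for one source namespace, falling back to the global mode."""
    overrides = denial_config.get("source_overrides", {}).get(source, {})
    for key in ("space_overrides", "channel_overrides"):
        entry = overrides.get(key, {}).get(namespace)
        if entry:
            return entry["mode"]
    return denial_config["mode"]


def resolve_denial_mode(denial_config, touched):
    """touched: iterable of (source, namespace) pairs the query hit."""
    modes = [mode_for(denial_config, s, n) for s, n in touched]
    return max(modes, key=STRICTNESS.__getitem__, default=denial_config["mode"])
```

Taking the maximum means one silent namespace is enough to make the whole response silent, which is what prevents leak by inclusion.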
Per-role override¶
Admins get full disclosure by default so they can debug:
acl:
  denial:
    role_overrides:
      admin: disclosed
      analyst: disclosed_no_count   # explicit override (default: inherit)
Audit log¶
For HIPAA / FedRAMP / SOC2 deployments, every full or partial denial can be persisted to a durable audit log.
Each row records:
- decided_at: when the denial fired
- user_id: the DocBrain user (set to NULL when the user is deleted, via ON DELETE SET NULL)
- query_hash: SHA-256 of the query (or raw text if audit_raw_query: true)
- decision: full_deny or partial_deny
- denial_mode: the resolved mode the response used
- denied_breakdown: [{source_type, namespace, count}, ...]
- denied_count: total across all sources
- policy_chain: provenance for the resolved decision
Queries are SHA-256-hashed by default because a query can itself be
MNPI ("what's the Q3 earnings number?"). Operators should set
audit_raw_query: true only with explicit compliance approval.
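As an illustration of the row described above, the sketch below assembles an audit record with the query hashed by default. The function names are hypothetical, the policy_chain field is omitted because its structure is not described here, and UTF-8 encoding plus a hex digest are assumptions about how the hash is stored.

```python
import hashlib
from datetime import datetime, timezone


def audit_query_field(query, audit_raw_query=False):
    """Return the raw query only when explicitly enabled; otherwise its SHA-256."""
    if audit_raw_query:
        return query
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


def audit_row(user_id, query, denial_mode, denied_breakdown,
              fully_denied, audit_raw_query=False):
    """Build one audit record for a full or partial denial."""
    return {
        "decided_at": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "query_hash": audit_query_field(query, audit_raw_query),
        "decision": "full_deny" if fully_denied else "partial_deny",
        "denial_mode": denial_mode,
        "denied_breakdown": denied_breakdown,
        # Total is derived from the breakdown so the two can never disagree.
        "denied_count": sum(item["count"] for item in denied_breakdown),
    }
```

Hashing keeps the log useful for correlating repeated queries without storing their text.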
Response shape¶
A /api/v1/ask response includes a structured access field
whenever the filter fires:
{
  "answer": "...",
  "sources": [...],
  "confidence": 0.85,
  "access": {
    "mode": "disclosed_no_count",
    "filter_applied": true,
    "fully_denied": false,
    "denied_count": 0,
    "referral": "your administrator"
  }
}
| Field | Meaning |
|---|---|
| mode | Resolved denial mode for this response |
| filter_applied | True iff at least one chunk was denied |
| fully_denied | True iff the filter emptied the result set entirely |
| denied_count | Number of denied chunks. Zeroed in silent and disclosed_no_count modes. |
| referral | Operator-configured referral string |
The access field is absent when the filter doesn't fire (no
denials happened), which keeps responses small for the common case.
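The field rules in the table can be expressed as a short builder. This is a sketch under assumptions: build_access and its parameters are hypothetical names, and returning None stands for "omit the access key from the response".

```python
def build_access(mode, denied_count, remaining_hits, referral):
    """Return the access object, or None when no chunk was denied."""
    if denied_count == 0:
        return None  # filter did not fire, so the field is left out
    return {
        "mode": mode,
        "filter_applied": True,
        "fully_denied": remaining_hits == 0,
        # Only the fully disclosed mode reveals how many chunks were denied.
        "denied_count": denied_count if mode == "disclosed" else 0,
        "referral": referral,
    }
```

For example, three denied chunks under disclosed_no_count with two hits remaining would yield the access object shown in the sample response, with denied_count reported as 0.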
Diagnostics¶
"What can I see?"¶
Authenticated users can hit GET /api/v1/me/acl to see their
resolved principal set:
{
  "user_id": "...",
  "email": "...",
  "principals": [
    {"canonical": "public:system:public", "kind": "public", "origin": "synthetic"},
    {"canonical": "user:sso:alice@example.com", "kind": "user", "origin": "email"},
    {"canonical": "sso_group:sso:engineering", "kind": "sso_group", "origin": "oidc_claim"},
    {"canonical": "user:slack:U123ABC", "kind": "user", "origin": "external_identity:slack"}
  ]
}
Useful for users to debug "why didn't I get this answer?"
Coverage report¶
Admins can check what fraction of the corpus has ACL data via
SQL on the document_acl table:
SELECT
  (SELECT COUNT(*) FROM documents WHERE deleted_at IS NULL) AS total,
  (SELECT COUNT(DISTINCT document_id)
     FROM document_acl da
     JOIN documents d ON d.id = da.document_id
    WHERE d.deleted_at IS NULL) AS with_acl;
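To turn the two counts into a coverage ratio, a script along these lines could be used. It is a sketch: it runs the same query through Python's built-in sqlite3 module only so the example is self-contained, whereas a real deployment would point a Postgres driver at the DocBrain database. The coverage function name is hypothetical.

```python
import sqlite3

COVERAGE_SQL = """
SELECT
  (SELECT COUNT(*) FROM documents WHERE deleted_at IS NULL) AS total,
  (SELECT COUNT(DISTINCT da.document_id)
     FROM document_acl da
     JOIN documents d ON d.id = da.document_id
    WHERE d.deleted_at IS NULL) AS with_acl
"""


def coverage(conn):
    """Return (total, with_acl, ratio) for live documents."""
    total, with_acl = conn.execute(COVERAGE_SQL).fetchone()
    ratio = with_acl / total if total else 0.0
    return total, with_acl, ratio
```

A ratio well below 1.0 before switching to enforce suggests the backfill has not finished, since chunks without ACL data are dropped when the filter fails closed.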
Threat model summary¶
| Threat | Mitigation |
|---|---|
| Cross-tenant chunk leak | Filter applied at every retriever output; defense-in-depth post-filter on by-id retrievers |
| Stale ACL leakage | max_age_at_query hard fence + periodic refresh + webhook subscriptions |
| Existence side-channel | Confidence forced to 0 on empty filtered result; episode/cache keyed on principal hash |
| Cross-user cache poisoning | Cache key includes principal set hash |
| Episode/feedback leakage | Persistence skipped when filter fully denies |
| Bootstrap-admin bypass | Configurable, defaults to [Public, World] only — bootstrap keys do NOT bypass ACL |
| Connector ACL bug | acl_capable() = false ⇒ admin-only ingest fallback |
| Mixing MNPI + public in one response | "Strictest wins" denial-mode resolver |
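The cache-poisoning mitigation in the table relies on the cache key including a hash of the principal set. A minimal sketch of one way to build such a key follows; the function names, the newline separator and the use of SHA-256 are assumptions, not DocBrain's documented scheme.

```python
import hashlib


def principal_set_hash(principals):
    """Order-independent hash of a user's canonical principal strings."""
    joined = "\n".join(sorted(set(principals)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def cache_key(query, principals):
    """Key a cached answer by both the query and who is allowed to see it."""
    query_hash = hashlib.sha256(query.encode("utf-8")).hexdigest()
    return f"{query_hash}:{principal_set_hash(principals)}"
```

Two users share a cache entry only when their resolved principal sets are identical, so an answer built from content one user can read is never served to a user with a different set.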
Reference¶
- Trait: AclProvider in docbrain-ingest/src/acl_provider.rs
- Postgres tables: principals, document_acl, user_principals, user_external_identities, acl_audit_log
- OpenSearch field: acl_principals (keyword array on each chunk)
- Diagnostics endpoint: GET /api/v1/me/acl
- Backfill binary: docbrain-acl-backfill
Configuration reference¶
See Configuration for the full env-var reference.