Configuration Reference

How Configuration Works

DocBrain uses a config-first architecture with a layered YAML + environment variable system. Understanding this prevents confusion about why a value isn't taking effect.

Loading Order (later = higher priority)

config/default.yaml         ← committed to repo — all non-secret defaults
config/{APP_ENV}.yaml       ← environment-specific overrides (development | production)
config/local.yaml           ← gitignored — your secrets and local overrides
Environment variables / .env ← always win — highest priority

Set APP_ENV=production for the production profile (this is the default in the Docker image). The server defaults to APP_ENV=development when running locally without Docker.
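
The layering above can be pictured as a sequence of dictionary merges in which later layers win. This is a minimal sketch, not DocBrain's loader: the function names are hypothetical, the sample values are made up, and merging nested sections key by key (rather than replacing a whole section) is an assumption.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` merged in, nested dicts key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def load_layers(layers: list) -> dict:
    """Merge layers in loading order, so later layers take priority."""
    result: dict = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Stand-ins for the parsed contents of each layer (hypothetical values).
default_yaml = {"rag": {"cache_ttl_hours": 24, "top_k": 10}, "server": {"port": 3000}}
production_yaml = {"server": {"port": 8080}}       # config/{APP_ENV}.yaml
local_yaml = {"rag": {"cache_ttl_hours": 1}}       # config/local.yaml
env_overrides = {"server": {"port": 9000}}         # e.g. SERVER_PORT=9000

config = load_layers([default_yaml, production_yaml, local_yaml, env_overrides])
print(config)
```

Under these assumptions the environment value wins for the port, config/local.yaml wins for the cache TTL, and top_k still comes from config/default.yaml.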

What Goes Where

Type	Where to put it
Infrastructure secrets (DB URL, LLM API keys, Redis, OpenSearch)	`.env` or environment variables
Ingest source credentials (Confluence token, GitHub token, Slack token, Jira token)	`config/local.yaml` (gitignored)
Deployment-specific values (URLs, ports, CORS origins)	`.env` or environment variables
Tuning (thresholds, intervals, cache TTLs)	`config/local.yaml` or env vars
Team-wide defaults you want committed	`config/default.yaml` (no secrets!)

The key distinction: .env is for infrastructure secrets that the runtime environment must inject (container orchestration, CI/CD, secrets managers). config/local.yaml is for user-managed source credentials and personal overrides — it's gitignored so it never gets committed, but it lives alongside the project where you can edit it easily.

Example `config/local.yaml`

# config/local.yaml — never committed (gitignored)
# Configure ingest sources and personal overrides here.

confluence:
  base_url: https://acme.atlassian.net/wiki
  user_email: you@acme.com
  api_token: ATATT3x...
  space_keys: DOCS,ENG

sources:
  github:
    token: ghp_...
    pull_requests:
      repos:
        - acme/platform
        - acme/docs
      lookback_days: 180
  jira:
    base_url: https://acme.atlassian.net
    user_email: you@acme.com
    api_token: ATATT3x...
    projects:
      - ENG
      - PLAT

# Local tuning overrides (optional)
autopilot:
  enabled: true
  cluster_threshold: 0.78

rag:
  cache_ttl_hours: 1

YAML Config Structure

Every YAML value supports ${ENV_VAR} and ${ENV_VAR:-default} substitution:

database:
  url: "${DATABASE_URL}"     # required — must come from env
  max_connections: "${DB_MAX_CONNECTIONS:-10}"
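
To make the two placeholder forms concrete, here is a small sketch of how such substitution could behave. It is not DocBrain's implementation: the function name is hypothetical, treating an empty variable as unset follows POSIX shell semantics for `:-` and is an assumption, and raising an error for a missing required variable is likewise assumed.

```python
import re

# Matches ${NAME} and ${NAME:-default}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute(value: str, env: dict) -> str:
    """Expand ${NAME} and ${NAME:-default} placeholders in one YAML string value."""

    def expand(match):
        name, default = match.group(1), match.group(2)
        current = env.get(name)
        if current:  # assumption: an empty value counts as unset, as in POSIX shells
            return current
        if default is not None:
            return default
        raise KeyError(f"required environment variable {name} is not set")

    return _PLACEHOLDER.sub(expand, value)


print(substitute("${DB_MAX_CONNECTIONS:-10}", {}))
print(substitute("${DB_MAX_CONNECTIONS:-10}", {"DB_MAX_CONNECTIONS": "25"}))
```

In real use the `env` argument would be something like `dict(os.environ)`; a plain dictionary keeps the sketch easy to test.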

Custom Config Directory

# Mount a ConfigMap in Kubernetes
DOCBRAIN_CONFIG_DIR=/etc/docbrain docbrain-server

# Or pass as CLI argument
docbrain-server --config-dir /etc/docbrain

Most settings are also available via environment variables, set in .env for Docker Compose or via ConfigMap/Secret for Kubernetes; a few (list values such as rag.freshness_source_types) are YAML only. Environment variables always override YAML values.

Infrastructure

Variable	Default	Description
`DATABASE_URL`	—	PostgreSQL connection string
`OPENSEARCH_URL`	`http://localhost:9200`	OpenSearch endpoint
`REDIS_URL`	`redis://localhost:6379`	Redis connection string
`SERVER_PORT`	`3000`	API server listen port
`SERVER_BIND`	`0.0.0.0`	API server bind address
`LOG_LEVEL`	`info`	Log verbosity: `trace`, `debug`, `info`, `warn`, `error`
`DB_MAX_CONNECTIONS`	`10`	Maximum PostgreSQL connection pool size
`DB_CONNECT_TIMEOUT_SECS`	`10`	Timeout (seconds) for initial PostgreSQL connection
`DB_ACQUIRE_TIMEOUT_SECS`	`10`	Timeout (seconds) to acquire a connection from the pool
`DB_IDLE_TIMEOUT_SECS`	`300`	Idle connection lifetime (seconds) before cleanup

LLM Provider

Variable	Default	Description
`LLM_PROVIDER`	`bedrock`	Provider: `bedrock`, `anthropic`, `openai`, `ollama`, `groq`, `openrouter`, `together`, `deepseek`, `mistral`, `xai`, `gemini`, `azure_openai`, `vertex_ai`, `cohere`
`LLM_MODEL_ID`	varies	Model identifier (provider-specific)
`FAST_MODEL_ID`	—	Fast/cheap model for background side-calls: intent classification, query rewriting, entity extraction. Falls back to `LLM_MODEL_ID` if not set. Recommended: Haiku (Bedrock/Anthropic), `gpt-4o-mini` (OpenAI), `qwen2.5:7b` (Ollama). Alias: `HAIKU_MODEL_ID` (deprecated).
`INGEST_LLM_MODEL_ID`	—	Model used only during ingest, for image extraction. Falls back to `LLM_MODEL_ID` if not set. Set this to a cheaper model, because image extraction fires for every page with images. Using Opus 4 with `LLM_THINKING_BUDGET` without this override will cause throttling errors during ingest.
`DRAFT_MODEL_ID`	—	Model used for autopilot draft generation (two-phase reasoning + writing). Falls back to `LLM_MODEL_ID` if not set. Use a high-capability model here — drafts benefit from stronger reasoning.
`DRAFT_LLM_PROVIDER`	—	Provider for draft generation. Falls back to `LLM_PROVIDER` if not set. Allows cross-provider drafting — e.g. use Gemini Flash for Q&A but Anthropic Claude for drafts.
`LLM_THINKING_BUDGET`	—	Extended thinking token budget (tokens). Unset or `0` = disabled. Only applies to the primary `LLM_MODEL_ID`, never to `FAST_MODEL_ID` or `INGEST_LLM_MODEL_ID`.
`ANTHROPIC_API_KEY`	—	API key (if `LLM_PROVIDER=anthropic`)
`OPENAI_API_KEY`	—	API key (if `LLM_PROVIDER=openai`)
`OLLAMA_BASE_URL`	`http://localhost:11434`	Ollama server URL
`OLLAMA_TIMEOUT_SECS`	`120`	HTTP timeout in seconds for Ollama requests. Increase for large/slow models (e.g. 70B) to avoid "error decoding response body" when the model takes longer than 2 minutes. Example: `300` or `600`. Allowed range: 60–900.
`OLLAMA_TLS_VERIFY`	`false`	Set to `true` to enforce TLS certificate validation for Ollama
`OLLAMA_VISION_ENABLED`	`true`	Set to `false` if your Ollama model doesn't support vision (skips image calls)
`AWS_REGION`	—	AWS region for Bedrock (e.g. `us-east-1`)
`AWS_ACCESS_KEY_ID`	—	AWS access key (optional — see credential chain below)
`AWS_SECRET_ACCESS_KEY`	—	AWS secret key (optional — see credential chain below)
`GROQ_API_KEY`	—	API key (if `LLM_PROVIDER=groq`)
`OPENROUTER_API_KEY`	—	API key (if `LLM_PROVIDER=openrouter`)
`TOGETHER_API_KEY`	—	API key (if `LLM_PROVIDER=together`)
`DEEPSEEK_API_KEY`	—	API key (if `LLM_PROVIDER=deepseek`)
`MISTRAL_API_KEY`	—	API key (if `LLM_PROVIDER=mistral`)
`XAI_API_KEY`	—	API key (if `LLM_PROVIDER=xai`)
`GEMINI_API_KEY`	—	API key (if `LLM_PROVIDER=gemini`)
`AZURE_OPENAI_API_KEY`	—	API key (if `LLM_PROVIDER=azure_openai`)
`AZURE_OPENAI_ENDPOINT`	—	Resource endpoint (if `LLM_PROVIDER=azure_openai`). e.g. `https://my-resource.openai.azure.com`
`AZURE_OPENAI_API_VERSION`	`2024-02-01`	API version (if `LLM_PROVIDER=azure_openai`)
`VERTEX_PROJECT`	—	GCP project ID (if `LLM_PROVIDER=vertex_ai`). Required.
`VERTEX_REGION`	`us-central1`	GCP region (if `LLM_PROVIDER=vertex_ai`)
`COHERE_API_KEY`	—	API key (if `LLM_PROVIDER=cohere`)
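
The fallback rules in the table can be summarised as "first variable that is set wins". The sketch below illustrates that reading only; `first_set` and `resolve_models` are hypothetical names, the model ids are placeholders, and giving `FAST_MODEL_ID` precedence over its deprecated alias `HAIKU_MODEL_ID` is an assumption.

```python
from typing import Optional


def first_set(env: dict, *names: str) -> Optional[str]:
    """Return the value of the first variable in `names` that is set and non-empty."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def resolve_models(env: dict) -> dict:
    """Apply the documented fallbacks to LLM_MODEL_ID / LLM_PROVIDER."""
    return {
        "primary": env.get("LLM_MODEL_ID"),
        # Assumption: the deprecated alias is consulted only when FAST_MODEL_ID is unset.
        "fast": first_set(env, "FAST_MODEL_ID", "HAIKU_MODEL_ID", "LLM_MODEL_ID"),
        "ingest": first_set(env, "INGEST_LLM_MODEL_ID", "LLM_MODEL_ID"),
        "draft": first_set(env, "DRAFT_MODEL_ID", "LLM_MODEL_ID"),
        "draft_provider": first_set(env, "DRAFT_LLM_PROVIDER", "LLM_PROVIDER"),
        # The thinking budget applies to the primary model only; 0 means disabled.
        "primary_thinking_budget": int(env.get("LLM_THINKING_BUDGET") or 0),
    }


# Placeholder model ids, not real identifiers.
models = resolve_models({
    "LLM_PROVIDER": "anthropic",
    "LLM_MODEL_ID": "big-model",
    "FAST_MODEL_ID": "small-model",
})
print(models)
```

With only `LLM_MODEL_ID` and `FAST_MODEL_ID` set, ingest and draft calls both resolve to the primary model, which is the situation the `INGEST_LLM_MODEL_ID` row warns about.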

AWS Credential Chain: Bedrock uses the AWS SDK default credential chain: env vars → ~/.aws/credentials → IRSA (EKS) → ECS Task Role → EC2 Instance Profile. In production, use IRSA or instance profiles, with no keys in env. Set serviceAccount.create=true and serviceAccount.annotations.eks.amazonaws.com/role-arn in Helm. The IAM role needs bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream permissions. See providers.md for full setup details.

GCP Credential Chain: Vertex AI uses gcp_auth which resolves credentials in this order: GOOGLE_APPLICATION_CREDENTIALS (service account key file) → Application Default Credentials (gcloud auth application-default login) → GKE Workload Identity → GCE/Cloud Run metadata service. In production on GKE, use Workload Identity — no keys needed in the cluster. See providers.md for Workload Identity setup details.

Ollama: model selection and tuning

Only use models with strong instruction-following capabilities. DocBrain's RAG pipeline requires the LLM to stay strictly grounded in retrieved documents. Models that default to training data instead of provided context will produce fabricated answers. Recommended: command-r:35b (purpose-built for RAG). See providers.md for the full model comparison table.

Recommended config: LLM_MODEL_ID=command-r:35b and FAST_MODEL_ID=qwen2.5:7b. The fast model handles intent classification and query rewriting; only the final answer uses the primary model.
"Error decoding response body" after 2–3 minutes: The default HTTP timeout is 120 seconds. If the model takes longer to generate the full response, the connection is cut and you get a decode error. Set OLLAMA_TIMEOUT_SECS=300 (or 600) so the client waits long enough.

Embedding Provider

Set EMBED_PROVIDER to choose your embedding model. One of: openai, bedrock, ollama.

Variable	Default	Description
`EMBED_PROVIDER`	`bedrock`	Provider: `bedrock`, `openai`, `ollama`
`EMBED_MODEL_ID`	varies	Embedding model identifier (e.g. `text-embedding-3-small`, `cohere.embed-v4:0`)

Switching Embedding Models

When you change EMBED_PROVIDER or EMBED_MODEL_ID to a model with different vector dimensions (e.g. Bedrock Cohere/1024 → Ollama nomic-embed-text/768), the server will refuse to start with a clear error:

Embedding dimension mismatch on index 'docbrain-chunks': existing=1024, required=768.

To migrate:

1. Set FORCE_REINDEX=true in your environment
2. Restart the server and run ingest; the old indexes are deleted and recreated
3. Remove FORCE_REINDEX after the migration completes

Variable	Default	Description
`FORCE_REINDEX`	`false`	Delete and recreate OpenSearch indexes when embedding dimensions change. Set once during migration, then remove.
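
The startup behaviour described in this section amounts to a small decision table. The sketch below is illustrative only: `plan_index_startup` and `DimensionMismatch` are hypothetical names, and the "create" branch for a missing index is an assumption; the error text mirrors the message quoted above.

```python
from typing import Optional


class DimensionMismatch(RuntimeError):
    """Raised when the existing index and the embedding model disagree on dimensions."""


def plan_index_startup(index: str, existing_dim: Optional[int],
                       required_dim: int, force_reindex: bool) -> str:
    """Decide what to do with an index at startup: create, reuse, or recreate."""
    if existing_dim is None:
        return "create"      # assumption: a missing index is simply created
    if existing_dim == required_dim:
        return "reuse"
    if force_reindex:
        return "recreate"    # FORCE_REINDEX=true: delete and recreate
    raise DimensionMismatch(
        f"Embedding dimension mismatch on index '{index}': "
        f"existing={existing_dim}, required={required_dim}."
    )


print(plan_index_startup("docbrain-chunks", 1024, 1024, force_reindex=False))
print(plan_index_startup("docbrain-chunks", 1024, 768, force_reindex=True))
```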

Retrieval Pipeline

DocBrain runs queries through a five-stage retrieval pipeline when a reranker is configured:

1. Query understanding: rewrites + entity → space mapping
2. Candidate generation: parallel retrievers (BM25, vector, entity-exact, freshness, procedural, semantic) fused with Reciprocal Rank Fusion (RRF)
3. Semantic reranking: a cross-encoder (e.g. Cohere Rerank on Bedrock) scores every (query, candidate) pair on a calibrated [0.0, 1.0] scale
4. Diversity + coverage: per-source and per-document caps so one dominant source can't crowd out the LLM's context window
5. Grounding floor: chunks below a configurable relevance floor are dropped before the LLM sees them, preventing confident hallucination on noise
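
Reciprocal Rank Fusion, used in the candidate-generation stage, gives each chunk the sum of 1 / (k + rank) over every retriever that returned it. The following is a generic sketch of that formula, not DocBrain's code; the function name and the sample rankings are hypothetical, and k = 60 corresponds to the `rag.rrf_k` default.

```python
from collections import defaultdict


def rrf_fuse(rankings: dict, k: int = 60) -> list:
    """Fuse per-retriever rankings (best first) into one list of (chunk_id, score)."""
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


fused = rrf_fuse({
    "bm25": ["a", "b", "c"],
    "vector": ["c", "a", "d"],
})
for chunk_id, score in fused:
    print(chunk_id, round(score, 5))
```

Chunk "a" ranks first here because both retrievers place it near the top, even though neither score is on a shared scale; only ranks enter the formula.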

Why it matters

Without a reranker, BM25 scoring systematically buries small specialised sources under corpus-dominant ones: a single captured PR with 11 chunks is structurally out-ranked by a 4000-page Confluence space that happens to mention the same keywords. The cross-encoder reranker scores each (query, chunk) pair directly, independent of corpus size, so a precise answer in a small source can outrank a tangentially relevant chunk in a huge one.

The pipeline is opt-in. Set rerank.provider = "none" (the default) and DocBrain runs the legacy single-hybrid-search path with byte-identical behaviour to before the feature existed. Set it to any configured provider to activate the five-stage pipeline. Rollback is a single env var flip — no code change, no rebuild, no data migration.

Reranker (`rerank.*`)

Stage 3 of retrieval rescores the candidate pool with a cross-encoder, producing calibrated [0, 1] scores that drive the grounding floors. DocBrain supports every major hosted rerank API through a single dialect-driven HTTP client — adding a new provider is typically a config change, not a code change.

Built-in providers: bedrock, cohere, voyage, jina, mixedbread, pinecone, ollama. Use custom to connect any other Cohere-family API without a rebuild.

# config/local.yaml — any hosted provider, one env var away
rerank:
  provider: cohere                    # or: bedrock | voyage | jina | mixedbread | pinecone | ollama | custom
  # model_id: rerank-v3.5             # provider default applies when unset
  top_n: 200                          # candidates scored per query
  batch_size: 100                     # docs per reranker call
  timeout_secs: 10                    # per-call timeout

Key	Env var	Default	Description
`rerank.provider`	`RAG_RERANK_PROVIDER`	`none`	`none` | `bedrock` | `cohere` | `voyage` | `jina` | `mixedbread` | `pinecone` | `ollama` | `custom`
`rerank.model_id`	`RAG_RERANK_MODEL_ID`	varies	Provider-specific model. Built-in defaults: Bedrock `cohere.rerank-v3-5:0`, Cohere `rerank-v3.5`, Voyage `rerank-2`, Jina `jina-reranker-v2-base-multilingual`, Mixedbread `mxbai-rerank-large-v1`, Pinecone `bge-reranker-v2-m3`, Ollama `nomic-embed-text`.
`rerank.top_n`	`RAG_RERANK_TOP_N`	`200`	How many candidates the reranker scores per query. Should match `rag.candidate_pool_size`.
`rerank.batch_size`	`RAG_RERANK_BATCH_SIZE`	`100`	Docs per reranker API call. Larger pools split into multiple batches. Clamped to `[1, 1000]`.
`rerank.timeout_secs`	`RAG_RERANK_TIMEOUT_SECS`	`10`	Per-request timeout. Tight because the reranker sits on the hot path of every `/api/v1/ask` request. On failure the pipeline falls back to RRF-only ranking.
`rerank.cohere_api_key`	`COHERE_RERANK_API_KEY`	—	Required when `provider = "cohere"`.
`rerank.voyage_api_key`	`VOYAGE_API_KEY`	—	Required when `provider = "voyage"`.
`rerank.jina_api_key`	`JINA_API_KEY`	—	Required when `provider = "jina"`.
`rerank.mixedbread_api_key`	`MIXEDBREAD_API_KEY`	—	Required when `provider = "mixedbread"`.
`rerank.pinecone_api_key`	`PINECONE_API_KEY`	—	Required when `provider = "pinecone"`. Uses `Api-Key` header, not Bearer.
`rerank.ollama_base_url`	`RAG_RERANK_OLLAMA_BASE_URL`	`http://localhost:11434`	Ollama endpoint for local reranking. Ollama is a bi-encoder approximation — see notes below.
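
The interaction between `rerank.top_n` and `rerank.batch_size` can be sketched as follows. This is an illustration under assumptions, not the actual client: `plan_rerank_batches` is a hypothetical name, and truncating the pool to `top_n` before batching is assumed from the description of `top_n` as the number of candidates scored per query.

```python
def plan_rerank_batches(candidates: list, top_n: int = 200, batch_size: int = 100) -> list:
    """Split the candidate pool into reranker calls of at most `batch_size` documents."""
    size = max(1, min(batch_size, 1000))   # batch_size is clamped to [1, 1000]
    pool = candidates[:top_n]              # only top_n candidates are scored
    return [pool[i:i + size] for i in range(0, len(pool), size)]


candidates = [f"chunk-{i}" for i in range(250)]
batches = plan_rerank_batches(candidates)
print([len(batch) for batch in batches])
```

With the defaults, a pool of 250 candidates is cut to 200 and sent as two calls of 100 documents each.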

Custom provider: plug-and-play for any rerank API

Set provider = "custom" and fill the fields below to wire a new rerank API without rebuilding DocBrain. Defaults match Cohere's request/response shape; override any JSON key that differs.

Key	Env var	Required	Default	Description
`rerank.custom_base_url`	`RAG_RERANK_CUSTOM_BASE_URL`	✅	—	Full POST URL, e.g. `https://rerank.mycorp.internal/v1/rerank`
`rerank.custom_api_key_env`	`RAG_RERANK_CUSTOM_API_KEY_ENV`	✅	—	Name of another env var that holds the API key (the key itself is never persisted in a YAML config file)
`rerank.model_id`	`RAG_RERANK_MODEL_ID`	✅	—	Model id to send in the request body
`rerank.custom_auth_style`	`RAG_RERANK_CUSTOM_AUTH_STYLE`		`bearer_token`	`bearer_token` or `custom_header`
`rerank.custom_auth_header_name`	`RAG_RERANK_CUSTOM_AUTH_HEADER_NAME`	only with `custom_header`	—	Header name, e.g. `Api-Key`
`rerank.custom_documents_field`	`RAG_RERANK_CUSTOM_DOCUMENTS_FIELD`		`documents`	Request JSON key for the documents array
`rerank.custom_top_n_field`	`RAG_RERANK_CUSTOM_TOP_N_FIELD`		`top_n`	Request JSON key for the top-N limit
`rerank.custom_results_field`	`RAG_RERANK_CUSTOM_RESULTS_FIELD`		`results`	Response JSON key for the results array
`rerank.custom_score_field`	`RAG_RERANK_CUSTOM_SCORE_FIELD`		`relevance_score`	Response JSON key for the score
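
To show how the field overrides in the table could shape a request and response, here is a hedged sketch. It is not DocBrain's client: the class and function names are hypothetical, the fixed `model`, `query`, and `index` keys are assumed from Cohere's rerank shape, and sending the raw key as the value of a custom header is an assumption. `MYCORP_RERANK_KEY` and `my-reranker` are made-up examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CustomRerankConfig:
    base_url: str
    api_key_env: str
    model_id: str
    auth_style: str = "bearer_token"
    auth_header_name: Optional[str] = None
    documents_field: str = "documents"
    top_n_field: str = "top_n"
    results_field: str = "results"
    score_field: str = "relevance_score"


def build_request(cfg: CustomRerankConfig, query: str, documents: list,
                  top_n: int, env: dict) -> tuple:
    """Return (headers, json_body) for one rerank call."""
    api_key = env[cfg.api_key_env]  # the key is read indirectly, via the named env var
    if cfg.auth_style == "bearer_token":
        headers = {"Authorization": f"Bearer {api_key}"}
    elif cfg.auth_style == "custom_header":
        if not cfg.auth_header_name:
            raise ValueError("custom_auth_header_name is required with custom_header")
        headers = {cfg.auth_header_name: api_key}
    else:
        raise ValueError(f"unknown auth style: {cfg.auth_style}")
    body = {
        "model": cfg.model_id,   # assumed fixed key (Cohere shape)
        "query": query,          # assumed fixed key (Cohere shape)
        cfg.documents_field: documents,
        cfg.top_n_field: top_n,
    }
    return headers, body


def parse_scores(cfg: CustomRerankConfig, response: dict) -> list:
    """Extract (document_index, score) pairs from a response body."""
    return [(item["index"], float(item[cfg.score_field]))
            for item in response[cfg.results_field]]


cfg = CustomRerankConfig(
    base_url="https://rerank.mycorp.internal/v1/rerank",
    api_key_env="MYCORP_RERANK_KEY",
    model_id="my-reranker",
    auth_style="custom_header",
    auth_header_name="Api-Key",
    score_field="score",
)
headers, body = build_request(cfg, "how do I rotate keys?", ["doc one", "doc two"], 2,
                              env={"MYCORP_RERANK_KEY": "secret"})
print(headers, body)
print(parse_scores(cfg, {"results": [{"index": 1, "score": 0.91}, {"index": 0, "score": 0.12}]}))
```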

See rerank-providers.md for the provider matrix, per-provider quick-starts, and the "add a new provider in 2 minutes" walkthrough.

Ollama caveat: Ollama has no first-class rerank endpoint. DocBrain approximates rerank by cosine-similarity over query + document embeddings from any Ollama embedding model — a bi-encoder, not a cross-encoder. Quality is meaningfully lower than hosted providers; it exists for local development and air-gapped deployments. For true cross-encoder quality locally, run bge-reranker or mxbai-rerank behind a small HTTP wrapper and use provider: custom.
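
The bi-encoder approximation mentioned in the caveat boils down to ranking documents by cosine similarity between a query embedding and each document embedding. A minimal sketch, assuming the embeddings have already been obtained from an embedding model; the function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def bi_encoder_rerank(query_vec: list, doc_vecs: list) -> list:
    """Return (document_index, similarity) pairs, most similar first."""
    scored = [(index, cosine(query_vec, vec)) for index, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = bi_encoder_rerank([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
print(ranked)
```

Because the query and each document are embedded separately, the score never sees the two texts together, which is the structural difference from a cross-encoder.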

Fail-loud: a missing API key or an incomplete custom_* block fails at server startup with a message naming both the config field and its env var. There is no silent fallback to none.

Pipeline knobs (`rag.*`)

Every pipeline parameter is configurable — nothing is hardcoded. These defaults are the canonical-paper / standard-practice values; tune them only when you have query latency or quality data to justify a change.

rag:
  cache_threshold: 0.95                # existing cache knob
  cache_ttl_hours: 24                   # existing cache knob
  top_k: 10                             # final chunks sent to the LLM
  bm25_boost: 1.0                       # BM25 vs vector weight in hybrid

  # New knobs for the five-stage pipeline:
  candidate_pool_size: 200              # pool size fed to reranker
  rrf_k: 60                             # RRF damping constant
  max_per_source: 3                     # per-source cap in final top_k
  max_per_document: 2                   # per-document cap in final top_k
  # Grounding floors — calibrated for a cross-encoder reranker.
  # See "Grounding floors" below for what each one does and what
  # lowering them actually costs you.
  min_relevance_score: 0.40             # retrieval floor
  display_floor: 0.50                   # display floor (user-visible citations)
  confidence_gate: 0.40                 # confidence gate (show-sources threshold)
  strong_answer_floor: 0.55             # high-confidence answer threshold
  freshness_window_days: 7              # freshness retriever window
  freshness_source_types:               # which source types count as "fresh"
    - github_capture
    - gitlab_capture
    - slack_capture
    - ms_teams_capture
  entity_cache_ttl_secs: 300            # entity → space cache TTL
  max_rewrites: 2                       # query rewrites per ask
  fresh_only_phrases:                   # time-sensitive question phrases (live-only answers)
    - "on call"
    - "current rotation"
    - "incident commander"

  # Retrieval ladder (experimental, off by default). When enabled, an
  # answer is synthesised TWICE in parallel — once from indexed documents
  # only, once also incorporating live tool (MCP) data — and a fast LLM
  # "judge" picks the better answer. Low-confidence winners are augmented
  # with knowledge-graph expert routing ("these people may know more").
  retrieval_ladder:
    enabled: false                      # master switch (default off = legacy single-synth)
    graph_append_threshold: 0.5         # below this confidence, append graph experts
    judge_timeout_ms: 1500              # hard timeout for the judge LLM call
    # judge_model_id: null              # null = use the configured fast model
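
As an illustration of how `top_k`, `max_per_source`, `max_per_document`, and `min_relevance_score` interact, the sketch below applies them in a single greedy pass over reranked chunks. It is an assumption-laden illustration rather than DocBrain's selection code: the `Chunk` type, the function name, and the sample scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    source: str      # e.g. a Confluence space or a repository
    document: str
    score: float     # reranker score in [0, 1]


def select_for_llm(chunks: list, top_k: int = 10, max_per_source: int = 3,
                   max_per_document: int = 2, min_relevance_score: float = 0.40) -> list:
    """Pick the chunks sent to the LLM: best first, capped per source and per document."""
    per_source: dict = {}
    per_document: dict = {}
    selected = []
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if chunk.score < min_relevance_score:
            break  # everything after this point is below the floor too
        if per_source.get(chunk.source, 0) >= max_per_source:
            continue
        if per_document.get(chunk.document, 0) >= max_per_document:
            continue
        selected.append(chunk)
        per_source[chunk.source] = per_source.get(chunk.source, 0) + 1
        per_document[chunk.document] = per_document.get(chunk.document, 0) + 1
        if len(selected) == top_k:
            break
    return selected


pool = [
    Chunk("a1", "confluence", "doc-a", 0.90),
    Chunk("a2", "confluence", "doc-a", 0.85),
    Chunk("a3", "confluence", "doc-a", 0.80),
    Chunk("b1", "confluence", "doc-b", 0.70),
    Chunk("p1", "github", "pr-1", 0.60),
    Chunk("p2", "github", "pr-2", 0.30),
]
picked = select_for_llm(pool)
print([chunk.chunk_id for chunk in picked])
```

The third chunk of doc-a is skipped by the per-document cap and the weakest GitHub chunk by the floor, so fewer than `top_k` chunks reach the LLM.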

Key	Env var	Default	Description
`rag.candidate_pool_size`	`RAG_CANDIDATE_POOL_SIZE`	`200`	How many candidates the candidate generator produces for the reranker. Larger = better recall, more reranker cost.
`rag.rrf_k`	`RAG_RRF_K`	`60`	Reciprocal Rank Fusion damping constant. 60 is the canonical paper default. Larger = more democratic across retrievers; smaller = concentrates weight at top ranks.
`rag.max_per_source`	`RAG_MAX_PER_SOURCE`	`3`	Max chunks from any single source in the final top-k. Prevents a dominant source from monopolising the LLM context. Set to `top_k` to disable.
`rag.max_per_document`	`RAG_MAX_PER_DOCUMENT`	`2`	Max chunks from any single document in the final top-k. Prevents one long document from crowding out other relevant docs. Set to `top_k` to disable.
`rag.min_relevance_score`	`RAG_MIN_RELEVANCE_SCORE`	`0.40`	Retrieval floor — reranker score required to survive diversity selection and reach the LLM. Chunks below this are dropped before the LLM sees them, even if it means returning fewer than `top_k` results. Lowering sends weaker evidence into the prompt, which raises hallucination risk — the LLM will try to answer from chunks that only tangentially match. Raising forces more "insufficient information" answers. Set to `0.0` to disable (required when `rerank.provider = "none"`, because raw BM25/vector scores are not calibrated to [0,1]).
`rag.display_floor`	`RAG_DISPLAY_FLOOR`	`0.50`	Display floor — reranker score required for a chunk to appear in the `sources` array attached to the answer. Must be `>= min_relevance_score`. The LLM may still have used a chunk to form its answer even if it is hidden here. Lowering surfaces more citations per answer, but includes tangentially-related docs that erode user trust — the main cause of "why is this GitHub PR cited, it has nothing to do with my question?" complaints. Raising narrows the visible citation set to only high-confidence matches.
`rag.confidence_gate`	`RAG_CONFIDENCE_GATE`	`0.40`	Confidence gate — minimum composite confidence score required to show any sources at all. When confidence is below this, DocBrain emits the answer with a "based on general knowledge" framing and no citations, instead of citing weak evidence. Lowering shows sources on lower-confidence answers (useful when operators want to see what the retriever found, even when it wasn't enough). Raising forces the UI to go source-less more often, which is safer for end users but hides the retriever's partial matches from debugging.
`rag.strong_answer_floor`	`RAG_STRONG_ANSWER_FLOOR`	`0.55`	Strong-answer floor: top-1 reranker score required before the answer is emitted without a "low confidence" disclaimer. Below this threshold the answer carries a visible uncertainty warning; below `min_relevance_score` the query short-circuits to "insufficient information" without calling the LLM at all. Lowering removes the uncertainty warning from more answers (less noise in the UI, but users can't tell strong answers from borderline ones). Raising makes DocBrain more openly uncertain about marginal matches.
`rag.freshness_window_days`	`RAG_FRESHNESS_WINDOW_DAYS`	`7`	Days back for the freshness retriever. Recent chunks in this window get a guaranteed slot in the candidate pool regardless of raw BM25/vector rank. Set to `0` to disable.
`rag.freshness_source_types`	— (YAML only)	capture types	Which `source_type` values count for the freshness retriever. Default is the four capture types. Env vars can't represent lists — configure in YAML.
—	`RAG_FRESHNESS_PRE_DIVERSITY`	`false`	Deprecated — legacy multiplier path that scaled rerank scores by a per-doc freshness multiplier before the retrieval floor. The path conflates relevance with freshness: an old-but-relevant doc (e.g. a rarely-touched runbook) gets multiplied below the floor even when it's the top semantic match. Freshness is now display metadata only, surfaced in source cards rather than gating retrieval. Setting this to `true` re-enables the deprecated behaviour and is not recommended; the path will be removed in a future release.
—	`RAG_RERANK_TITLE_ENRICH`	`true`	Pass chunk title + heading + source/space to the reranker alongside the content body. Title is the single strongest relevance signal and used to be discarded. Set to `false` to send content only (legacy behavior).
`rag.entity_cache_ttl_secs`	`RAG_ENTITY_CACHE_TTL_SECS`	`300`	TTL for the entity → space resolution cache. New spaces added to the index become discoverable within this window.
`rag.max_rewrites`	`RAG_MAX_REWRITES`	`2`	Maximum alternate queries produced by query rewriting. Each rewrite costs one extra embed call + one extra hybrid search. `0` disables rewriting.
`rag.retrieval_ladder.enabled`	—	`false`	Experimental. Master switch for the retrieval ladder. When `false` (default), DocBrain uses the standard single-synthesis path. When `true`, an answer is synthesised twice in parallel (indexed-only vs. indexed+live-tool data) and an LLM judge picks the winner; low-confidence winners are augmented with knowledge-graph expert routing. Costs an extra synthesis + a judge call per answer, and disables token streaming (the final answer is delivered once the judge decides).
`rag.retrieval_ladder.graph_append_threshold`	—	`0.5`	When the winning answer's confidence is below this, append knowledge-graph "these people may know more" expert routing to the answer. Only applies when the ladder is enabled.
`rag.retrieval_ladder.judge_timeout_ms`	—	`1500`	Hard timeout for the judge LLM call. On timeout the ladder falls back to the higher self-graded confidence between the two answers.
`rag.retrieval_ladder.judge_model_id`	—	`null`	Model id for the judge call. `null` uses the configured fast model.
`rag.max_chunks_per_doc_in_retriever`	`RAG_MAX_CHUNKS_PER_DOC`	`2`	Chunk-flood fix. Max chunks per document that any single retriever may contribute to RRF. Before this knob, BM25 could return 100 chunks of one dominant document, crowding out the real answer. Cap at 2 preserves the top chunk as the RRF anchor plus one more for context. Dedup is per-retriever; different retrievers can still independently vote for the same doc. Set to a large number to effectively disable.
—	`RAG_COMPOUND_DECOMPOSE`	`true`	Compound query decomposition. Split questions like "what is X and how is X deployed" into distinct sub-intents, rerank each independently against the full candidate pool, and fuse results by taking the max rerank score per chunk across sub-intents. Fixes the class of question where no single chunk answers every intent, so the cross-encoder scores every chunk mediocrely against the compound query. Short questions (<8 words) skip decomposition entirely. Set to `false` to revert to single-query rerank.
—	`RAG_CONFIDENCE_RETRY_ENABLED`	`false`	Confidence-retry fallback. Master switch. When `true`, `/ask` responses with very low confidence AND unused MCP tools in the user's eligible catalog are re-synthesized once with the picker in widen-mode (encouraging maximal tool selection). The retry's tool set is a strict superset of the first pass; the retry's answer always replaces the first-pass answer when the gate triggers. Default OFF; opt in per deployment. Doubles worst-case latency on the small fraction of queries that fall below threshold AND have unused tools. High-confidence answers, queries with all tools already dispatched, and queries that already exceeded the latency budget are never retried. Accepts `true | 1 | yes | on` (case-insensitive).
—	`RAG_CONFIDENCE_RETRY_THRESHOLD`	`0.25`	Confidence (strictly) below this triggers retry when the master switch is on. Bounded `0.0–1.0`; out-of-range values fall back to the default. Lower → fewer retries (only the very worst answers re-run). Higher → more retries (catches borderline answers but doubles latency on them).
—	`RAG_CONFIDENCE_RETRY_LATENCY_BUDGET_MS`	`12000`	Skip retry when the first pass already took this long. Bounded `1000–60000`; out-of-range values fall back to the default. Protects against pathologically slow queries getting hammered twice.
—	`RAG_AGENTIC_LOOP_ENABLED`	`false`	Agentic tool loop, master switch. Generalizes the confidence-retry above into a bounded multi-round tool loop: after each round of tool results, a pure stop-or-continue decision runs, bounded by per-surface round and wall-clock budgets. When `true`, this loop subsumes the confidence-retry: the loop runs instead of the single retry, and the `RAG_CONFIDENCE_RETRY_*` vars become the disabled-loop fallback. Default OFF; existing deployments are byte-identical until they opt in. Accepts `true | 1 | yes | on` (case-insensitive). Same env-validation contract as the confidence-retry: unset → silent default; set-but-invalid → `warn` log + default (a typo can never silently flip a deployment into an unexpected mode).
—	`RAG_AGENTIC_LOOP_MAX_ROUNDS_SLACK`	`5`	Hard cap on tool-dispatch rounds for the Slack surface. Slack posts an @mention when done, so the user isn't blocked synchronously — it tolerates more rounds. Bounded `1–10`; out-of-range values fall back to the default.
—	`RAG_AGENTIC_LOOP_MAX_ROUNDS_WEB`	`4`	Hard cap on tool-dispatch rounds for every non-Slack (web/api) surface. Synchronous HTTP — a client holds the connection open — so the cap is tighter than Slack. Sized to the canonical dependency-chain depth (a dead-source attempt, a search that surfaces a reference, the read that resolves it, then synthesis). Bounded `1–10`; out-of-range values fall back to the default.
—	`RAG_AGENTIC_LOOP_BUDGET_MS_SLACK`	`60000`	Overall wall-clock deadline (ms) for the Slack surface; the loop aborts and answers with partial results when exceeded. Bounded `1000–120000`; out-of-range values fall back to the default.
—	`RAG_AGENTIC_LOOP_BUDGET_MS_WEB`	`30000`	Overall wall-clock deadline (ms) for the web/api surface. Tighter than Slack because a human or client is holding a synchronous connection, but wide enough for a multi-step retrieval chain to complete. Bounded `1000–120000`; out-of-range values fall back to the default.
—	`RAG_AGENTIC_LOOP_CONFIDENCE_THRESHOLD`	`0.7`	Stop-when-confident bar: the loop continues while the best answer confidence is below this and rounds/budget remain, and stops once confidence reaches it (even with rounds left). Bounded `0.0–1.0`. Fallback: when unset, the loop reads the legacy `RAG_CONFIDENCE_RETRY_THRESHOLD` instead, so a deployment that already tuned the confidence-retry threshold keeps that exact value without a second knob; only if both are unset does it fall to `0.7`.
`rag.suppression.min_feedback_count`	`RAG_SUPPRESSION_MIN_FEEDBACK_COUNT`	`2`	Source-suppression learning loop — event gate. When a user marks a specific source within an answer as not-relevant (the per-source thumbs-down on web/CLI/Slack), DocBrain records the event and, once enough accumulates, down-ranks that document in retrieval for similar future questions. This is the minimum number of total not-relevant events on a document (across the episodes recalled for the live query) before it is suppressed. A document is suppressed when it crosses either this gate or the distinct-user gate, so a single click can never unilaterally bury a document. Set to `0` to disable this gate.
`rag.suppression.min_unique_users`	`RAG_SUPPRESSION_MIN_UNIQUE_USERS`	`2`	Source-suppression — distinct-user gate. Minimum number of distinct users who flagged a document not-relevant before it is suppressed. Anonymous (no user id) events count toward the event gate above but never toward this distinct-user quorum, so an anonymous click cannot manufacture a majority. Set to `0` to disable this gate.
`rag.suppression.rag_penalty_factor`	`RAG_SUPPRESSION_RAG_PENALTY_FACTOR`	`0.1`	Source-suppression penalty strength. Multiplier applied to a suppressed document's retrieval score. Range `(0, 1]`: `1.0` means no penalty, smaller is a stronger down-rank. It is a down-rank, never a hard drop, so a suppressed document that is the only available evidence still surfaces (with an empty-answer floor guard) rather than producing an empty answer. Even when every match is suppressed, a document is never resurrected above the relevance floor.
`rag.regen_loop_enabled`	`DOCBRAIN_REGEN_LOOP_ENABLED`	`true`	Regenerate-with-feedback loop, master kill-switch. The doc reviewer flags claims its evidence contradicts; this loop consumes those flags and regenerates the draft until they clear. The UI is human-in-loop (each "Regenerate" click is one round); the CLI/generate path auto-reviews. Reviewer feedback shapes the writing prompt only; it can never clear a flagged claim, because the reviewer re-derives flags from the evidence alone, so a "ship it anyway" instruction cannot override a contradicted claim. Disable only with `false | 0 | no | off`.
`rag.regen_loop_max_rounds`	`DOCBRAIN_REGEN_LOOP_MAX_ROUNDS`	`3`	Auto-review round cap (CLI / generate). Hard upper bound on automatic regenerate rounds on the non-interactive path. The loop also stops early once the flags stop decreasing (a plateau) and exits honestly with the unresolved flags attached, so it never runs forever and never silently ships flags. Range `1..=20`.
`rag.regen_loop_max_revisions`	`DOCBRAIN_REGEN_LOOP_MAX_REVISIONS`	`10`	Human-in-loop revision cap (UI). Maximum number of regenerations a reviewer can chain on a single draft from the UI. Each regeneration is preserved as a new revision linked to the original — the original is never overwritten — so the full feedback history stays auditable. Range `1..=100`.
`rag.regen_loop_min_budget_ms`	`DOCBRAIN_REGEN_LOOP_MIN_BUDGET_MS`	`20000`	Minimum wall-clock headroom to start a round. A regenerate round costs a gather + write + review pass; this guard refuses to start a round that cannot plausibly finish, so the loop degrades to an honest exit rather than a half-finished write. Set to `0` to disable the guard.
`rag.generate_hollow_min_sections`	`DOCBRAIN_GENERATE_HOLLOW_MIN_SECTIONS`	`3`	Hollow-document guard, section floor. Generation rejects (HTTP 422) a draft that consists mostly of unanswered `NEEDS INPUT:` placeholders, measured by placeholder density. The guard applies only when the draft has at least this many `##` sections; below the floor (short or free-form docs) the rule is off. Set to `0` to disable the guard entirely.
`rag.generate_hollow_ratio_pct`	`DOCBRAIN_GENERATE_HOLLOW_RATIO_PCT`	`50`	Hollow-document guard — density threshold. Refuse when at least this percentage of the draft's `##` sections are `NEEDS INPUT:` placeholders (`50` = half the doc). A draft that honestly flags 1-2 of N sections ships untouched; the guard keys on the output shape, not on whether sources were supplied, so a generic corpus match that still yields a hollow doc is caught.
`rag.support_critic_enabled`	`DOCBRAIN_SUPPORT_CRITIC_ENABLED`	`true`	Master kill-switch for the support critic — the doc-generation GROUNDING reviewer. It extracts the draft's claims and flags the ones no supplied source supports (fabrication), plus `NEEDS INPUT:` markers the sources actually cover (the inverted self-flag). Distinct from the freshness critic (which flags only CONTRADICTED claims): the support critic flags UNSUPPORTED ones. Advisory — never blocks generation; findings are surfaced in the grounding report for the reviewer. ON by default. Disable only with `false`/`0`/`no`/`off`. Fail-open: any critic error ships the draft unchanged.
`rag.support_critic_max_claims`	`DOCBRAIN_SUPPORT_CRITIC_MAX_CLAIMS`	`40`	Hard cap on the number of claims grounding-checked per draft. Bounds the worst-case LLM cost of the support critic. Range `1..=200`.
`rag.merge_enabled`	`DOCBRAIN_MERGE_ENABLED`	`true`	Master kill-switch for the merged-doc update. When generating against an existing target document, the output is the full merged document — unchanged sections preserved byte-exact, changed sections rewritten, new sections surfaced — plus a per-section change manifest, so the result can replace the whole doc with confidence about what changed. Defaults ON (unchanged sections are byte-exact, an unsafe splice bails, nothing is ever published). Disable only with `false`/`0`/`no`/`off`.
`rag.merge_max_sections`	`DOCBRAIN_MERGE_MAX_SECTIONS`	`60`	Max existing sections fed to the merge decision in one pass. Sections beyond the cap are kept verbatim, so the cap limits cost, not correctness. Range `1..=500`.
`rag.merge_max_tokens`	`DOCBRAIN_MERGE_MAX_TOKENS`	`4096`	Max output tokens for the merge decision call (the model returns only the changed spans). Range `512..=32768`.
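
The two hollow-document guard settings above combine into a single density check. The following is a minimal sketch of that arithmetic, not DocBrain's actual implementation; the function name `is_hollow` and the per-section boolean input are assumptions made for illustration.

```python
def is_hollow(placeholder_flags, min_sections=3, ratio_pct=50):
    """Return True when a draft would be refused as hollow.

    placeholder_flags holds one bool per `##` section, True when that
    section is an unanswered `NEEDS INPUT:` placeholder. How a section is
    classified as a placeholder is an assumption of this sketch.
    """
    # A section floor of 0 disables the guard entirely.
    if min_sections == 0:
        return False
    total = len(placeholder_flags)
    # Below the floor (short / free-form docs) the rule is off.
    if total < min_sections:
        return False
    placeholders = sum(1 for flag in placeholder_flags if flag)
    # Integer form of placeholders / total >= ratio_pct / 100.
    return placeholders * 100 >= ratio_pct * total
```

With the defaults, a four-section draft with two placeholders sits exactly on the 50% threshold and is refused, while a six-section draft with one placeholder ships untouched.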

Confidence-retry fallback — when to enable¶

DocBrain's standard /ask path makes a single picker decision: the fast LLM looks at the question and the user's eligible MCP tool catalog and decides which subset to invoke. That works for the vast majority of queries — the picker correctly invokes the relevant 1-3 tools and the synthesis produces a high-confidence answer.

The failure mode the retry fallback targets: the picker invokes a subset that doesn't find the answer (or invokes nothing), the synthesis returns very-low confidence, and the user gets a weak "I don't have enough information" answer when one of the unused tools in their catalog would have surfaced the data. This is most common when:

The user's question is phrased indirectly enough that the picker conservatively chose only one of several plausible tools.
A tool's manifest description doesn't match the question's keywords well, even though the underlying data is there.
Multiple loosely-related tools each could contribute, and the picker chose a single one rather than the union.

Default OFF. Existing deployments are byte-identical until they opt in. To enable, set RAG_CONFIDENCE_RETRY_ENABLED=true in the server's env (helm: server.env.RAG_CONFIDENCE_RETRY_ENABLED: "true").

Gate logic (ALL must hold for the retry to trigger):

Env flag is on.
First-pass confidence is known and strictly below RAG_CONFIDENCE_RETRY_THRESHOLD.
First-pass dispatched fewer tools than the eligible catalog (room to widen).
First-pass elapsed wall-clock ≤ RAG_CONFIDENCE_RETRY_LATENCY_BUDGET_MS.

If any condition is false, the retry is skipped and the first-pass answer is returned unchanged.
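
The gate can be read as one boolean expression. This is an illustrative sketch only; the function and parameter names are hypothetical, and the threshold and budget values are whatever the deployment has configured.

```python
from typing import Optional


def should_retry(
    enabled: bool,
    confidence: Optional[float],
    threshold: float,
    tools_dispatched: int,
    catalog_size: int,
    elapsed_ms: int,
    latency_budget_ms: int,
) -> bool:
    """All four gate conditions must hold for the retry to trigger."""
    return (
        enabled                                # env flag is on
        and confidence is not None             # first-pass confidence is known
        and confidence < threshold             # ...and strictly below threshold
        and tools_dispatched < catalog_size    # room to widen
        and elapsed_ms <= latency_budget_ms    # first pass stayed within budget
    )
```

Note the asymmetry in the comparisons: confidence must be strictly below the threshold, while elapsed time may equal the latency budget.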

Observability. A triggered retry emits two structured log lines: rag::retry triggered — re-synthesizing with all tools (with the first-pass confidence, tool count, catalog size, elapsed_ms, and configured threshold) and rag::retry completed (with the retry's confidence, tool count, and a retry_helped boolean comparing first-vs-retry confidence). Operators tune the threshold by measuring the ratio of triggered retries to retry_helped=true results; if a deployment's retries rarely improve answers, the threshold is too high and the retry is wasting budget. If too few queries trigger retry but reviewers see weak answers, the threshold is too low.

Latency. When the gate triggers, the request makes a second picker call + a second synthesis call. Median latency for the retry is similar to the first pass; worst case approximately doubles. The latency budget gate (RAG_CONFIDENCE_RETRY_LATENCY_BUDGET_MS) protects against the pathological case where the first pass already burned the user-tolerable budget — those queries skip retry and return the first-pass answer unchanged.

Agentic tool loop — when to enable¶

The confidence-retry above answers a one-shot question: "the first pass looked weak — should we re-run with all tools forced on, exactly once?" The agentic tool loop generalizes that into a bounded multi-round loop. After each round of tool dispatch, a pure stop-or-continue decision runs against the round's results, bounded by a per-surface budget (round count + wall-clock). The "high confidence → stop" insight from the confidence-retry becomes a precedence branch here: a confident answer stops the loop even with rounds left.

One mechanism, not two. When RAG_AGENTIC_LOOP_ENABLED=true, the loop subsumes the confidence-retry — the loop runs instead of the single retry, so you never get both. When the loop is disabled (the default), the RAG_CONFIDENCE_RETRY_* path remains the active fallback exactly as documented above. This is why the loop honors RAG_CONFIDENCE_RETRY_THRESHOLD as the fallback when RAG_AGENTIC_LOOP_CONFIDENCE_THRESHOLD is unset: a deployment that already tuned the retry threshold carries that value into the loop without a second knob.

Per-surface budgets. The loop is tuned per delivery surface because the latency contract differs:

Surface	Max rounds	Wall-clock budget	Why
Slack	`5` (`RAG_AGENTIC_LOOP_MAX_ROUNDS_SLACK`)	`60000` ms (`RAG_AGENTIC_LOOP_BUDGET_MS_SLACK`)	Slack posts an @mention when done — the user isn't blocked on a synchronous response, so a longer loop is tolerable.
Web / API	`4` (`RAG_AGENTIC_LOOP_MAX_ROUNDS_WEB`)	`30000` ms (`RAG_AGENTIC_LOOP_BUDGET_MS_WEB`)	Synchronous HTTP — a human or client holds the connection open. Sized to the canonical retrieval dependency-chain depth while keeping responses bounded.
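
A bounded loop of this shape can be sketched as follows. This is a simplified illustration under stated assumptions, not the server's code: `run_round` is a hypothetical callback that performs one round of tool dispatch plus synthesis and returns a confidence, the stop reasons are invented labels, and the "confident" test is assumed to be `>=` the threshold.

```python
import time

# Per-surface defaults copied from the table above: (max rounds, budget in ms).
SURFACE_BUDGETS = {"slack": (5, 60_000), "web": (4, 30_000)}


def run_loop(surface, run_round, confidence_threshold, clock=time.monotonic):
    """Run rounds until confident, out of rounds, or out of wall-clock budget.

    Returns (rounds_run, last_confidence, stop_reason).
    """
    max_rounds, budget_ms = SURFACE_BUDGETS[surface]
    start = clock()
    confidence = None
    rounds = 0
    reason = "max_rounds"
    while rounds < max_rounds:
        elapsed_ms = (clock() - start) * 1000
        # The first round always runs; later rounds need remaining budget.
        if rounds > 0 and elapsed_ms >= budget_ms:
            reason = "budget_exhausted"
            break
        confidence = run_round(rounds)
        rounds += 1
        # Precedence branch: a confident answer stops the loop early.
        if confidence is not None and confidence >= confidence_threshold:
            reason = "confident"
            break
    return rounds, confidence, reason
```

Passing the clock as a parameter keeps the sketch testable without sleeping; a real implementation would also need to bound the duration of each individual round.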

Default OFF. Existing deployments are byte-identical until they opt in. To enable, set RAG_AGENTIC_LOOP_ENABLED=true in the server's env (helm: server.env.RAG_AGENTIC_LOOP_ENABLED: "true").

Validation. Every var follows the same contract as the confidence-retry: an unset value silently falls back to its documented default; a value that is set but invalid (parse failure, out of range, NaN for the threshold) falls back to the default and emits a warn log, so a typo in a values file can never silently flip a deployment into an unexpected mode.
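
The validation contract can be illustrated with a small reader for a threshold-style variable. This is a sketch only: the helper name, the logger name, and the assumption that a valid threshold lies in `[0, 1]` are not taken from the DocBrain source, and the default passed in the usage example is a placeholder rather than the documented default.

```python
import logging
import math

log = logging.getLogger("config")


def read_threshold(env, name, default):
    """Read a float threshold from env, falling back to the default.

    Unset: silent fallback. Set but invalid (parse failure, NaN, or out
    of the assumed [0, 1] range): fallback plus a warning.
    """
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        value = math.nan
    if math.isnan(value) or not (0.0 <= value <= 1.0):
        log.warning("%s=%r is invalid; falling back to %s", name, raw, default)
        return default
    return value
```

For example, `read_threshold(os.environ, "RAG_CONFIDENCE_RETRY_THRESHOLD", default)` returns the configured value only when it parses cleanly and lies in range.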

Grounding floors — what lowering actually costs¶

The four floor values above (min_relevance_score, display_floor, confidence_gate, strong_answer_floor) are the single biggest quality lever in DocBrain. Three of them gate directly on the reranker's calibrated [0, 1] score, which is the output of stage 3 of the retrieval pipeline; confidence_gate gates on the composite answer confidence instead. Their defaults are tuned for a real cross-encoder (Cohere Rerank v3.5, Voyage rerank-2, Jina reranker-v2, or equivalent).

The calibration insight. A well-tuned cross-encoder's [0, 1] scores are not a percentage and not a uniform distribution. In practice, for Cohere Rerank v3.5 and similar models:

Score band	What this chunk means for the query
`> 0.70`	Directly answers the question. Should be cited.
`0.50 – 0.70`	Strongly related, useful supporting evidence. Should be cited.
`0.40 – 0.50`	Shares topical overlap. Probably useful context, not a standalone answer.
`0.30 – 0.40`	Tangentially related. Shares some keywords. Usually noise.
`< 0.30`	Unrelated. Safe to drop.

The recommended defaults (0.40 / 0.50 / 0.40 / 0.55) draw the line at "shares topical overlap" for retrieval and "strongly related" for citation display. That's deliberately asymmetric — the LLM can see weaker evidence than the user sees, so it can reason about it, but we don't surface marginal chunks as if they were endorsed sources.

The recall-precision knob. Lowering any floor improves recall (more answers surfaced) and costs precision (more noise in what reaches the user). Raising any floor does the opposite. The four floors target different failure modes:

min_relevance_score is the strongest lever for hallucination control. Every chunk above this reaches the LLM. If you set it to 0.0, the LLM sees the entire candidate pool, including tangentially related chunks scoring around 0.30, and will sometimes write confident-sounding answers grounded in chunks that don't actually support the claim. If you see hallucinations on questions where the retriever did find the right doc, this floor is too low.
display_floor is the strongest lever for citation trust. Every chunk above this gets shown to the user as a "source". If you see "why is this GitHub PR cited, it has nothing to do with my question?" complaints, this floor is too low. Raising it from 0.30 to 0.50 typically eliminates 60–80% of noisy citations without meaningfully changing answer quality, because the LLM still has access to those chunks internally.
confidence_gate controls whether sources render at all. It gates on the composite answer confidence, not the top rerank score — that's why it's separate from strong_answer_floor. Use it to hide sources on weak answers without killing the answer itself.
strong_answer_floor is a UX knob, not a retrieval knob. It only affects whether the answer carries a "low confidence" disclaimer. Lower it if your users find the disclaimer noisy; raise it to make DocBrain more openly uncertain about borderline matches.
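
The way the four floors interact can be sketched in a few lines. This is an illustration under assumptions, not the pipeline's code: the mapping of the recommended defaults to the four names follows the order in which they are listed, comparisons are assumed to be inclusive (`>=`), and strong_answer_floor is assumed to compare against the top rerank score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Floors:
    min_relevance_score: float = 0.40   # what the LLM may see
    display_floor: float = 0.50         # what the user sees as a source
    confidence_gate: float = 0.40       # whether sources render at all
    strong_answer_floor: float = 0.55   # whether a disclaimer is attached


def apply_floors(rerank_scores, answer_confidence, floors=None):
    """Split reranked chunk scores into LLM context and displayed citations."""
    floors = floors or Floors()
    llm_context = [s for s in rerank_scores if s >= floors.min_relevance_score]
    # confidence_gate keys on the composite answer confidence, not a chunk score.
    show_sources = answer_confidence >= floors.confidence_gate
    citations = (
        [s for s in llm_context if s >= floors.display_floor] if show_sources else []
    )
    top_score = max(rerank_scores, default=0.0)
    return {
        "llm_context": llm_context,
        "citations": citations,
        "low_confidence_disclaimer": top_score < floors.strong_answer_floor,
    }
```

With scores `[0.82, 0.45, 0.33]`, the LLM reasons over two chunks while the user is shown only the 0.82 one, which is the asymmetry described above.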

When rerank.provider = "none": these floors gate on raw BM25/vector scores, which are not calibrated to [0, 1]. A BM25 score of 0.40 means nothing comparable to a cross-encoder score of 0.40. Set all four floors to 0.0 in that mode and bound results with top_k instead. This is also what makes the plug-and-play rerank providers in rerank-providers.md so load-bearing — a real reranker is what makes these floors work at all.

How to debug a noisy citation. Run docbrain trace-query "your question" and look at the rerank log line in stage 3. Each cited chunk has its rerank score printed. If the noisy citation is scoring 0.30–0.45, it's a floor problem — raise display_floor and it goes away. If it's scoring > 0.50, the reranker actually thinks it's relevant and the issue is upstream (candidate pool, query decomposition, or title enrichment leaking metadata into the rerank input).

Observability¶

Every stage of the pipeline emits a structured log line so you can trace a single query's path through retrieval without attaching a debugger:

INFO stage="rag.staged.query_understanding" rewrites=2 sub_queries=2 entities=12 mapped_spaces=7
INFO stage="rag.staged.kg_doc_retriever" kg_entities=12 kg_doc_ids=47 hits=18
INFO stage="rag.staged.candidate_generation" retrievers=12 unique_chunks=348 pool_size=200
INFO stage="rag.staged.rrf_fusion" fused=200 rrf_k=60
INFO stage="rag.staged.rerank_sub_query" sub_query="what is payments-svc" top_score=0.82
INFO stage="rag.staged.rerank_sub_query" sub_query="how is payments-svc deployed" top_score=0.79
INFO stage="rag.staged.rerank" input_count=200 output_count=200 top_score=0.82 sub_queries=2 fusion="max_per_chunk"
INFO stage="rag.staged.freshness_pre_diversity" multipliers_fetched=264 reranked_count=200
INFO stage="rag.staged.diversity_select" candidates_in=200 selected=5 top_k=10 max_per_source=3 max_per_document=2 min_relevance_score=0.30
INFO stage="rag.staged.complete" final_count=5 elapsed_ms=7812

Stage meanings (in order):

query_understanding — classify intent, extract entities, build rewrites, decompose compound questions into sub-intents, resolve entities to spaces. sub_queries is the number of distinct sub-intents the decomposer produced (1 = no decomposition).
kg_doc_retriever — only fires when the knowledge graph has source_doc_ids edges for resolved entities. Pulls every chunk of those docs directly, bypassing BM25/vector.
candidate_generation — all retrievers finished. unique_chunks is total across the 6–12 retrievers after per-retriever chunk-flood dedup (see rag.max_chunks_per_doc_in_retriever).
rrf_fusion — Reciprocal Rank Fusion collapses the retriever outputs into one scored list.
rerank_sub_query — per-sub-query log line emitted in compound-query mode only. Shows the top score that each distinct sub-intent produced against the shared candidate pool.
rerank — cross-encoder scores every chunk against the query. top_score in [0, 1] is the calibrated highest-ranked hit. Title + heading + space are included in the rerank input when RAG_RERANK_TITLE_ENRICH=true (default). When sub_queries>1, carries fusion="max_per_chunk" indicating each chunk's final score is its best against any sub-intent.
freshness_pre_diversity — deprecated. Only fires when RAG_FRESHNESS_PRE_DIVERSITY=true (no longer the default). The legacy multiplier path scaled rerank scores by a per-doc freshness factor before the retrieval floor, which dropped old-but-relevant docs even when they were the top semantic match. Freshness is now display metadata, surfaced in source cards rather than gating retrieval.
diversity_select — enforces per-source + per-document caps and the retrieval floor. selected is the final top-k count.
complete — total wall clock, final_count sent to the LLM.
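
Because each stage line is a flat list of `key=value` pairs, the lines are straightforward to parse for ad-hoc analysis. The helper below is a hypothetical sketch that assumes the line looks exactly like the samples above (a level token followed by pairs, with double quotes around values that contain spaces); a real deployment's log formatter may add timestamps or other prefixes.

```python
import shlex


def parse_stage_line(line):
    """Parse one structured stage log line into a dict of strings."""
    # shlex honours the double quotes, so quoted values keep their spaces.
    level, *pairs = shlex.split(line)
    fields = {"level": level}
    for pair in pairs:
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields
```

All values come back as strings; convert numeric fields such as `top_score` or `elapsed_ms` explicitly before comparing them.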

Set RAG_TRACE_DETAIL=true to additionally log every chunk in the final top-k with its reranker score, space, and document_id. Turn this on when diagnosing "why didn't chunk X surface?" — the logs will show whether it was dropped at retrieval, reranking, or diversity selection.

Admin trace endpoint — `?trace=true`¶

Phase 3 adds a structured pipeline trace that admin users can request per query instead of grepping logs. Send POST /api/v1/ask with the body { "question": "...", "stream": false, "trace": true } and an admin API key. The response carries an extra pipeline_trace field:

{
  "answer": "...",
  "sources": [...],
  "confidence": 0.6,
  "pipeline_trace": {
    "query_id": "7c3a8f9b-...",
    "question": "how is payments-svc deployed in our env?",
    "retrievers_fired": ["literal", "rewrite_0", "entity_space_0", "kg_docs"],
    "pool_size": 200,
    "rerank_provider": "bedrock",
    "sub_queries": ["what is payments-svc", "how is payments-svc deployed in our env"],
    "stage_durations": {
      "query_understanding": 12,
      "kg_doc_retriever": 450,
      "candidate_generation": 1024,
      "rerank": 2870,
      "freshness_pre_diversity": 3,
      "diversity_select": 1,
      "total": 4360
    },
    "chunks": {
      "2217247499_2": {
        "chunk_id": "2217247499_2",
        "document_id": "2217247499",
        "title": "RFC - k8s deployments - A self-service approach of using helm charts",
        "space": "65673",
        "per_retriever_rank": [["kg_docs", 0], ["rewrite_0", 23]],
        "rrf_score": 0.234,
        "rerank_score": 0.72,
        "freshness_multiplier": 0.94,
        "post_freshness_score": 0.677,
        "passed_retrieval_floor": true,
        "passed_diversity": true,
        "final_rank": 0,
        "dropped_at": null
      }
    }
  }
}

Non-admin callers who send trace: true get pipeline_trace: null, or the field is omitted entirely (skipped during serde serialization). No error is returned, so the existence of the feature stays hidden from non-admins.

The admin CLI wraps this endpoint:

docbrain trace-query "how is payments-svc deployed?"

The command renders the trace as a table: query info, retrievers fired, per-stage timings, and the final top-k chunks with titles and scores. Add --json to dump the raw trace JSON for scripting.

Use this whenever you need to answer "why didn't chunk X surface?" instead of SSH'ing into the pod and running log-grep pipelines. The per-stage dropped_at field on each chunk names the exact stage that killed it: rrf_not_in_pool, rerank_below_floor, diversity_source_cap, diversity_document_cap, diversity_top_k_filled, freshness_penalty.
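
For scripting, the `dropped_at` values can be tallied to see where a query loses most of its candidates. The sketch below assumes the input is the `pipeline_trace` object in the shape shown in the example response above; the function name and the `"survived"` label for chunks with `dropped_at: null` are inventions of this example.

```python
from collections import Counter


def summarize_drops(pipeline_trace):
    """Count chunks by the stage that dropped them."""
    counts = Counter()
    for chunk in pipeline_trace["chunks"].values():
        # dropped_at is null (None) for chunks that reached the final top-k.
        counts[chunk["dropped_at"] or "survived"] += 1
    return counts
```

A result dominated by `rerank_below_floor` points at the retrieval floor, while one dominated by `diversity_source_cap` or `diversity_document_cap` points at the diversity caps.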

Rolling back¶

If the staged pipeline ever causes a problem in production, roll back by setting RAG_RERANK_PROVIDER=none in the runtime environment and restarting the server. No code change, no rebuild, no data migration — the legacy single-hybrid-search path is byte-identical to before this feature shipped.

Document Ingestion¶

Configure sources in config/local.yaml (gitignored). Put only infrastructure secrets in .env.

General¶

Setting (`config/local.yaml` key)	Env var equivalent	Default	Description
`ingest.self_ingest`	`DOCBRAIN_SELF_INGEST`	`true`	Auto-ingest DocBrain's own docs
`ingest.image_extraction_enabled`	`IMAGE_EXTRACTION_ENABLED`	`true`	Extract and describe images using vision LLM

Source enablement is structural — a sub-source runs when its block is present under sources: in YAML. There is no separate list or enable flag.

Local Files¶

# config/local.yaml
sources:
  local:
    path: /data/docs

Key	Env var	Default	Description
`sources.local.path`	`LOCAL_DOCS_PATH`	—	Directory path for local file ingestion

Confluence¶

Set credentials in config/local.yaml:

confluence:
  base_url: https://yourco.atlassian.net/wiki
  user_email: you@yourco.com
  api_token: ATATT3x...
  space_keys: ENG,DOCS

Key	Env var	Default	Description
`confluence.base_url`	`CONFLUENCE_BASE_URL`	—	Atlassian instance URL (must include `/wiki`)
`confluence.user_email`	`CONFLUENCE_USER_EMAIL`	—	Auth email (not required for v1 Data Center)
`confluence.api_token`	`CONFLUENCE_API_TOKEN`	—	API token (Cloud) or Personal Access Token (Data Center)
`confluence.space_keys`	`CONFLUENCE_SPACE_KEYS`	—	Comma-separated space keys to ingest
`confluence.page_limit`	`CONFLUENCE_PAGE_LIMIT`	`0` (unlimited)	Max pages per space. `0` = all pages.
`confluence.api_version`	`CONFLUENCE_API_VERSION`	`v2`	`v2` for Cloud, `v1` for Data Center 7.x+
`confluence.tls_verify`	`CONFLUENCE_TLS_VERIFY`	`true`	Set to `false` for self-signed certs
`confluence.webhook_secret`	`CONFLUENCE_WEBHOOK_SECRET`	—	HMAC secret for real-time webhook sync (set as env var)

Ingestion sources — nested umbrella configuration¶

All ingestion sources now live under a single top-level sources: block. Each provider has one umbrella entry (github, gitlab, slack, jira, linear, …) with its credentials at the top and optional sub-sources nested inside. A sub-source is enabled when its block is present in YAML — there is no separate INGEST_SOURCES env var, and no per-source enable flag.

Resource lists are always explicit. Every list-of-targets field (repos, projects, channels, teams, …) must contain at least one entry. An empty list is a startup error — DocBrain never silently falls back to "ingest everything the token can see."

Selector grammar (GitHub & GitLab)¶

Repositories are specified with a small selector grammar:

Syntax	Meaning
`acme/platform`	Exact repository, use the repo's default branch
`acme/platform:develop`	Exact repository, pinned to the `develop` branch
`acme/*`	All repositories in the `acme` organisation (default branches)
`acme/infra-*`	All `acme` repositories whose name starts with `infra-`
`acme/*:main`	Rejected at startup — wildcards must use default branches

Wildcards: the selector grammar parses wildcards today, but runtime expansion against the GitHub/GitLab APIs is not implemented yet, so wildcard selectors are rejected at startup with a clear error. List repositories explicitly until wildcard resolution lands.
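
The grammar in the table is small enough to sketch as a parser. This is an illustrative reading of the table, not DocBrain's parser: the `RepoSelector` type, the function name, and the error messages are hypothetical, and the sketch only enforces the grammar rule that wildcards cannot carry a branch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RepoSelector:
    owner: str
    name_pattern: str
    branch: Optional[str]  # None means "use the default branch"

    @property
    def is_wildcard(self) -> bool:
        return "*" in self.name_pattern


def parse_selector(text: str) -> RepoSelector:
    """Parse `owner/repo[:branch]`, allowing `*` in the repo part."""
    repo_part, colon, branch = text.partition(":")
    owner, slash, name = repo_part.partition("/")
    if not slash or not owner or not name:
        raise ValueError(f"expected owner/repo[:branch], got {text!r}")
    if colon and not branch:
        raise ValueError(f"empty branch in {text!r}")
    selector = RepoSelector(owner, name, branch if colon else None)
    if selector.is_wildcard and selector.branch is not None:
        raise ValueError(f"wildcards must use default branches: {text!r}")
    return selector
```

A caller that mirrors the current startup behaviour would additionally reject any selector whose `is_wildcard` is true.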

GitHub (code + pull requests)¶

# config/local.yaml
sources:
  github:
    token: ${GITHUB_TOKEN}                 # repo:read scope
    api_url: https://api.github.com         # override for GitHub Enterprise
    code:                                  # optional — ingest markdown from repos
      repos:
        - acme/platform
        - acme/docs:develop                 # pinned branch
    pull_requests:                         # optional — ingest PR discussions
      repos:
        - acme/platform
        - acme/backend
      lookback_days: 365
      min_comments: 1
      labels: []                            # empty = index all PRs

Key	Env var	Default	Description
`sources.github.token`	`GITHUB_TOKEN`	—	GitHub personal access token with `repo:read` scope
`sources.github.api_url`	`GITHUB_API_URL`	`https://api.github.com`	API host override for GitHub Enterprise
`sources.github.code.repos`	—	—	Required when `code` is set. Non-empty list of `owner/repo[:branch]` selectors
`sources.github.pull_requests.repos`	—	—	Required when `pull_requests` is set. Non-empty list of `owner/repo` selectors
`sources.github.pull_requests.lookback_days`	—	`365`	How far back to fetch merged PRs
`sources.github.pull_requests.min_comments`	—	`1`	Minimum total review/issue comments on a PR to be indexed
`sources.github.pull_requests.labels`	—	`[]`	Label filter — empty list indexes all PRs

GitLab (merge requests)¶

# config/local.yaml
sources:
  gitlab:
    token: ${GITLAB_TOKEN}                 # api scope
    base_url: https://gitlab.com            # override for self-hosted
    tls_verify: true                        # false for self-signed certs
    merge_requests:
      projects:
        - acme/platform
        - acme/infra
      lookback_days: 365
      min_notes: 1
      labels: []

Key	Env var	Default	Description
`sources.gitlab.token`	`GITLAB_TOKEN`	—	GitLab personal or project access token with `api` scope
`sources.gitlab.base_url`	`GITLAB_BASE_URL`	`https://gitlab.com`	Instance URL for self-hosted GitLab
`sources.gitlab.tls_verify`	`GITLAB_TLS_VERIFY`	`true`	Set to `false` for self-signed certs
`sources.gitlab.merge_requests.projects`	—	—	Required. Non-empty list of `group/project` paths
`sources.gitlab.merge_requests.lookback_days`	—	`365`	How far back to fetch merged MRs
`sources.gitlab.merge_requests.min_notes`	—	`1`	Minimum discussion notes on an MR to be indexed
`sources.gitlab.merge_requests.labels`	—	`[]`	Label filter — empty list indexes all MRs

Slack (threads)¶

# config/local.yaml
sources:
  slack:
    token: ${SLACK_INGEST_TOKEN}           # bot token: channels:history, channels:read, users:read
    threads:
      channels:                             # Slack channel names (not IDs)
        - "#incident-response"
        - "#eng-platform"
      min_replies: 3
      reactions:
        - white_check_mark
        - bookmark
      lookback_days: 90

Key	Env var	Default	Description
`sources.slack.token`	`SLACK_INGEST_TOKEN`	—	Bot token for ingestion (separate from `SLACK_BOT_TOKEN` used by @mentions)
`sources.slack.threads.channels`	—	—	Required. Non-empty list of channel names (leading `#` optional). The bot must be invited to every channel.
`sources.slack.threads.min_replies`	—	`3`	Minimum replies for a thread to be indexed
`sources.slack.threads.reactions`	—	`[white_check_mark, bookmark]`	Reactions that override the reply-count threshold
`sources.slack.threads.lookback_days`	—	`90`	How far back to scan for threads

Jira (issues)¶

# config/local.yaml
sources:
  jira:
    base_url: https://yourcompany.atlassian.net
    user_email: ${JIRA_USER_EMAIL}
    api_token: ${JIRA_API_TOKEN}
    projects:                               # required — no silent "all projects" fallback
      - ENG
      - PLAT
    # jql_filter: "resolution = Fixed"     # optional extra JQL clause
    lookback_days: 365
    issue_types:
      - Bug
      - Story
      - Task
      - Epic

Key	Env var	Default	Description
`sources.jira.base_url`	`JIRA_BASE_URL`	—	Jira instance URL
`sources.jira.user_email`	`JIRA_USER_EMAIL`	—	Service-account email for Basic auth
`sources.jira.api_token`	`JIRA_API_TOKEN`	—	Atlassian API token
`sources.jira.projects`	—	—	Required. Non-empty list of project keys (e.g. `ENG`, `PLAT`)
`sources.jira.jql_filter`	`JIRA_JQL_FILTER`	—	Additional JQL clause appended to the default query
`sources.jira.lookback_days`	`JIRA_LOOKBACK_DAYS`	`365`	How far back to fetch resolved issues
`sources.jira.issue_types`	—	`[Bug, Story, Task, Epic]`	Issue types to include

Linear (issues)¶

# config/local.yaml
sources:
  linear:
    api_key: ${LINEAR_API_KEY}
    teams:                                  # required — no silent "all teams" fallback
      - ENG
      - OPS
    lookback_days: 365
    states:
      - Done
      - Cancelled
      - Duplicate

Key	Env var	Default	Description
`sources.linear.api_key`	`LINEAR_API_KEY`	—	Linear personal API key
`sources.linear.teams`	—	—	Required. Non-empty list of team keys
`sources.linear.lookback_days`	`LINEAR_LOOKBACK_DAYS`	`365`	How far back to fetch completed/cancelled issues
`sources.linear.states`	—	`[Done, Cancelled, Duplicate]`	Issue states to include

Rate Limiting¶

DocBrain applies per-IP rate limiting to unauthenticated routes and per-API-key rate limiting to authenticated routes. Rate limiting is enabled by default.

Variable	Default	Description
`RATE_LIMIT_ENABLED`	`true`	Set to `false` to disable all rate limiting (not recommended for production)
`RATE_LIMIT_RPM`	`60`	Requests per minute per IP on unauthenticated routes
`RATE_LIMIT_AUTH_RPM`	`120`	Requests per minute per API key on authenticated routes
`RATE_LIMIT_WEBHOOK_RPM`	`30`	Requests per minute per IP on webhook endpoints (`/github/events`, `/gitlab/events`)

When a rate limit is exceeded, DocBrain returns 429 Too Many Requests with a Retry-After header.
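
A client can honour the Retry-After header when it receives a 429. The sketch below is a generic helper rather than anything shipped with DocBrain; the source does not say which form the header takes, so it handles both forms HTTP allows (delay in seconds, or an HTTP date), and the fallback delay is an arbitrary choice.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after, now=None, fallback=1.0):
    """Convert a Retry-After header value into a delay in seconds."""
    if retry_after is None:
        return fallback
    value = retry_after.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return fallback
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

Sleeping for the returned delay before retrying keeps a well-behaved client from repeatedly hitting the limiter.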

GitLab MR Capture Webhook¶

The GitLab capture feature lets engineers trigger immediate ingestion by commenting @docbrain capture on any merge request.

Variable	Default	Description
`GITLAB_CAPTURE_WEBHOOK_SECRET`	—	HMAC secret shared with GitLab for webhook signature verification
`GITLAB_CAPTURE_TOKEN`	—	GitLab personal access token with `api` scope (fetches MR notes and posts reply comments)
`GITLAB_CAPTURE_BASE_URL`	`https://gitlab.com`	GitLab instance base URL (override for self-hosted)
`GITLAB_CAPTURE_ALLOWED_USERS`	—	Comma-separated GitLab usernames allowed to trigger capture. Empty = all users.
`GITLAB_CAPTURE_ALLOWED_PROJECTS`	—	Comma-separated project paths allowed to trigger capture. Empty = all projects. e.g. `myorg/myrepo`

See the Ingestion Guide for full setup instructions.

GitHub Capture Security¶

These optional variables restrict which repos and users can trigger real-time GitHub PR/issue capture via @docbrain capture comments.

Variable	Default	Description
`GITHUB_CAPTURE_ALLOWED_REPOS`	—	Comma-separated `owner/repo` pairs allowed to trigger capture. Empty = all repos. e.g. `myorg/backend,myorg/frontend`
`GITHUB_CAPTURE_ALLOWED_USERS`	—	Comma-separated GitHub usernames allowed to trigger capture. Empty = all users. e.g. `alice,bob`

A 500 KB content-size guard applies to all capture requests. Oversized threads are rejected with a reply comment.

Confluence Webhooks (Real-Time Sync)¶

Variable	Default	Description
`CONFLUENCE_WEBHOOK_SECRET`	—	HMAC secret shared with Confluence. When set, DocBrain mounts `POST /confluence/events` and auto-ingests page changes in real time. Set as an environment variable (not in `config/local.yaml`).

When configured, DocBrain receives page_created, page_updated, page_restored, page_removed, and page_trashed events from Confluence and syncs changes automatically — no scheduled re-ingest needed.

This also requires confluence.base_url and confluence.api_token to be set in config/local.yaml (DocBrain needs API access to fetch the page content when a webhook fires).

See the Ingestion Guide for setup instructions.

Image Extraction¶

Variable	Default	Description
`IMAGE_EXTRACTION_ENABLED`	`true`	Extract and describe images from Confluence pages using vision LLM. Set to `false` to disable.
`INGEST_LLM_MODEL_ID`	—	Model used for image extraction during ingest. Falls back to `LLM_MODEL_ID` if not set. Set this to a cheaper model (Haiku, `gpt-4o-mini`) to avoid throttling and reduce cost.
`IMAGE_MAX_PER_PAGE`	`20`	Maximum images to process per Confluence page
`IMAGE_MIN_SIZE_BYTES`	`5120`	Skip images smaller than this in bytes (default: 5 KB) — filters out icons and decorative images
`IMAGE_MAX_SIZE_BYTES`	`10485760`	Skip images larger than this in bytes (default: 10 MB)
`IMAGE_DOWNLOAD_TIMEOUT`	`30`	HTTP download timeout in seconds per image
`IMAGE_LLM_TIMEOUT`	`120`	LLM vision call timeout in seconds (needs more time than download)

Image extraction requires a vision-capable LLM. Supported providers: Bedrock, Anthropic, OpenAI, and Ollama (with vision models like llava, llama3.2-vision, moondream). Text-only models (e.g. llama3.1) are auto-detected and images are skipped gracefully — no failures, no errors.
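
The size bounds and the per-page cap from the table combine roughly as follows. This is a sketch with assumed ordering: it filters by size first and then applies the cap in document order, which the source does not specify; the function name is hypothetical.

```python
def select_images(
    sizes_bytes,
    min_size=5120,          # IMAGE_MIN_SIZE_BYTES default
    max_size=10_485_760,    # IMAGE_MAX_SIZE_BYTES default
    max_per_page=20,        # IMAGE_MAX_PER_PAGE default
):
    """Return the indices of the images on a page that would be processed."""
    kept = [
        index
        for index, size in enumerate(sizes_bytes)
        # Skip images smaller than the minimum or larger than the maximum.
        if min_size <= size <= max_size
    ]
    return kept[:max_per_page]
```

An image of exactly 5120 bytes is kept in this sketch, since only images smaller than the minimum are skipped.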

Web UI / CORS¶

Variable	Default	Description
`CORS_ALLOWED_ORIGINS`	`http://localhost:3001`	Comma-separated origins allowed to call the API. Only needed if the web UI is served from a non-default origin (e.g. `http://10.0.0.5:3001`, `https://docbrain.internal`)

Note: The default works out of the box for Docker Compose. You only need this if you access the web UI via a different hostname or port — for example, http://127.0.0.1:3001 is a different origin than http://localhost:3001.

Auth / Sessions¶

Variable	Default	Description
`LOGIN_SESSION_TTL_HOURS`	`720`	Session lifetime after email/password login (default: 720 hours = 30 days). Set to `0` for no expiry.
`IDLE_TIMEOUT_HOURS`	`0`	When set to a positive value, API keys whose `last_used_at` is older than this window are rejected as expired — defense against stolen-laptop / forgotten-kiosk scenarios where the absolute TTL is too generous. Default `0` = disabled (preserves existing behaviour). Recommended `24` for production deployments.
`IP_LOGIN_MAX_FAILURES`	`100`	Per-IP cap on public auth attempts within `IP_LOGIN_LOCKOUT_WINDOW_SECS`. Higher than the per-email cap (10) because corporate NATs share an IP across many users. Triggers `429 Too Many Requests` when exceeded.
`IP_LOGIN_LOCKOUT_WINDOW_SECS`	`600`	Sliding window in seconds for the per-IP attempt counter. 10 minutes by default.
`TRUSTED_PROXY_HOPS`	`0`	Number of trusted proxy hops in front of DocBrain. When `0` (default), `X-Forwarded-For` is ignored and the raw socket address is used for IP-based rate limiting — wrong for deployments behind a load balancer. Set to `1` when running behind a single ALB / nginx / Cloudflare hop so the per-IP cap keys on the real client IP, not the proxy IP. Without this, 100 failed auth attempts from any combination of users behind the proxy will trigger a shared 429 for everyone.
`MAX_QUERY_LENGTH`	`4000`	Maximum characters allowed for question and description inputs
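
The effect of `TRUSTED_PROXY_HOPS` can be illustrated with the usual right-to-left reading of `X-Forwarded-For`. This is a generic sketch of that technique, assuming each trusted proxy appends the address it received the request from; it is not taken from the DocBrain source, and the function name is hypothetical.

```python
def client_ip(socket_addr, x_forwarded_for, trusted_hops=0):
    """Pick the address used as the per-IP rate-limit key."""
    # With no trusted hops the header is ignored entirely.
    if trusted_hops <= 0 or not x_forwarded_for:
        return socket_addr
    hops = [part.strip() for part in x_forwarded_for.split(",") if part.strip()]
    # Too few entries for the declared topology: distrust the header.
    if len(hops) < trusted_hops:
        return socket_addr
    # Entries further left than this were supplied by the client and can be forged.
    return hops[-trusted_hops]
```

Counting from the right is what makes the scheme resistant to spoofing: a client can prepend arbitrary entries to the header, but cannot control what the trusted proxies append.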

MCP Tool Platform¶

MCP_TOOLS_ENABLED is the master switch for the live-tool orchestrator. When disabled (the default), the synthesis path is byte-identical to the pre-MCP path: no orchestrator round-trip, no fast-LLM dispatch, no measurable overhead. Set it to true once MCP_OAUTH_ENCRYPTION_KEY and MCP_MANIFEST_DIR are configured to enable live tool fan-out at answer time.

Variable	Default	Description
`MCP_TOOLS_ENABLED`	`false`	Master switch. `true` = orchestrator runs after retrieval, injects live-tool blocks into the synthesis prompt. Requires `MCP_OAUTH_ENCRYPTION_KEY` + `MCP_MANIFEST_DIR` to also be configured (else falls back to disabled).
`MCP_OAUTH_ENCRYPTION_KEY`	—	Base64-encoded 32-byte key for at-rest encryption of per-user OAuth tokens stored in the `mcp_oauth_tokens` table. Required when `MCP_TOOLS_ENABLED=true`.
`MCP_MANIFEST_DIR`	—	Directory containing MCP tool manifests (YAML). In the Helm chart this is mounted from the `docbrain-mcp-manifests` ConfigMap.
`DOCBRAIN_INTERNAL_MCP_SECRET`	—	Bearer secret for the in-process `/internal/mcp/*` shim routes (e.g. `jira-rest`). The server checks this header on every internal shim call. Set via Helm `mcpTools.internalShimSecret`.
`MCP_REGISTRY_PUBKEY`	—	Base64-encoded 32-byte Ed25519 public key used to verify the signed registry index and per-manifest signatures. When unset, `/api/v1/admin/mcp/registry*` and `/install-from-registry` return `503` and the server boots normally; admins can still install via the paste/URL endpoint. No default.
`MCP_REGISTRY_URL`	`https://registry.docbrain-ai.com/v1/index.json`	URL of the signed registry index.
`MCP_REGISTRY_CACHE_PATH`	`/var/lib/docbrain/registry-cache/index.json`	Disk path for the cached registry index. Acts as the Tier 2 fallback when the network fetch fails.
`DOCBRAIN_K8S_SECRET_NAME`	—	Kubernetes Secret name embedded in the kubectl command rendered by `/api/v1/admin/mcp/secrets/audit/{id}`. Optional — when unset the rendered command shows a `<set DOCBRAIN_K8S_SECRET_NAME>` placeholder.
`DOCBRAIN_K8S_NAMESPACE`	—	Kubernetes namespace for the same audit endpoint. Optional — placeholder when unset.
`DOCBRAIN_SERVER_PORT`	`3000`	Port the `docbrain-server` listens on. Used by manifests that interpolate `${DOCBRAIN_SERVER_PORT}` into the shim endpoint URL.
`DOCBRAIN_DM_PERSIST_POLICY`	`strict`	MCP tool-result DM redactor policy. When `strict` (default), tool-result entries identified as DMs (`is_im: true`, `is_mpim: true`, or channel.id starting with `D`) are stripped before they reach the synthesis prompt, episode cache, or memory consolidation. When `warn`, the redactor logs a warning per dispatch but passes DM content through (staging only). When `allow`, the redactor is disabled entirely — explicit foot-gun for operators who fork the Slack manifest and want DM content in their corpus. See `docs/security/slack-dm-policy.md` for the threat model.

YAML equivalent:

mcp_tools:
  enabled: false
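The three DOCBRAIN_DM_PERSIST_POLICY modes from the table above can be sketched as follows. This is an illustrative Python sketch, not the shipped redactor: it assumes tool-result entries are dictionaries shaped like Slack payloads, and the names `is_dm` and `redact` are hypothetical.

```python
import logging

log = logging.getLogger("dm_redactor_sketch")


def is_dm(entry: dict) -> bool:
    """True when an entry matches any of the three documented DM markers."""
    channel = entry.get("channel")
    channel_id = str(channel.get("id", "")) if isinstance(channel, dict) else ""
    return (
        entry.get("is_im") is True
        or entry.get("is_mpim") is True
        or channel_id.startswith("D")
    )


def redact(entries: list, policy: str = "strict") -> list:
    """Apply the DM policy to a list of tool-result entries."""
    if policy == "allow":
        return list(entries)  # redactor disabled entirely
    if policy == "warn":
        dm_count = sum(1 for e in entries if is_dm(e))
        if dm_count:
            log.warning("passing %d DM entries through (policy=warn)", dm_count)
        return list(entries)
    if policy != "strict":
        raise ValueError(f"unknown DM policy: {policy!r}")
    # strict: DM entries never reach the prompt, cache, or consolidation.
    return [e for e in entries if not is_dm(e)]
```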

Helm values¶

The chart exposes these under mcpTools.* in values.yaml:

Helm value	Maps to env	Notes
`mcpTools.enabled`	`MCP_TOOLS_ENABLED`	Master switch.
`mcpTools.encryptionKey`	`MCP_OAUTH_ENCRYPTION_KEY`	Required when enabled.
`mcpTools.internalShimSecret`	`DOCBRAIN_INTERNAL_MCP_SECRET`	Required when any `internal:` manifest is loaded.
`mcpTools.manifestDir`	`MCP_MANIFEST_DIR`	Defaults to the mounted ConfigMap path.
`mcpTools.serviceAccount.jira.apiToken`	—	Service-account fallback token used by the `jira-rest` shim.
`mcpTools.serviceAccount.jira.cloudId`	—	Atlassian cloud-id for the shim's REST base URL.
`mcpTools.oauth.atlassian.clientId`	—	OAuth client ID for per-user Atlassian token exchange.
`mcpTools.oauth.atlassian.clientSecret`	—	OAuth client secret.
`mcpTools.dmPersistPolicy`	`DOCBRAIN_DM_PERSIST_POLICY`	DM redactor policy: `strict` (default) / `warn` / `allow`. See `docs/security/slack-dm-policy.md`.
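MCP_OAUTH_ENCRYPTION_KEY (Helm: mcpTools.encryptionKey) must be a Base64-encoded 32-byte value. One way to produce and sanity-check such a key is sketched below using only the Python standard library. The helper names are illustrative, and the standard (not URL-safe) Base64 alphabet is an assumption.

```python
import base64
import secrets


def generate_key() -> str:
    """Return 32 random bytes encoded as standard Base64."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def is_valid_key(value: str) -> bool:
    """Check that a candidate decodes cleanly to exactly 32 bytes."""
    try:
        raw = base64.b64decode(value, validate=True)
    except (ValueError, TypeError):
        # binascii.Error (bad padding or alphabet) is a ValueError subclass.
        return False
    return len(raw) == 32
```

Running `is_valid_key` on the value before placing it in a Secret catches truncated or double-encoded keys early.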

Two reference manifests ship in the chart:

jira — Teamwork Graph / Atlassian Remote MCP. External; depends on Atlassian's hosted MCP server.
jira-rest — Internal shim served at /internal/mcp/jira-rest, backed by the Atlassian REST v3 API. Preferred path; more reliable than the hosted MCP.

Dynamic tool discovery¶

For MCP servers that publish a tools/list endpoint, DocBrain can auto-populate the tool catalog instead of requiring every tool to be hand-declared in the manifest. Add a tool_discovery block:

id: my_mcp
display_name: My MCP
# ... rest of manifest ...
tools: []                           # may be empty when discovery is dynamic
tool_discovery:
  mode: dynamic                     # default: static — explicit "dynamic" enables auto-discovery
  refresh_seconds: 3600             # poll interval; must be 0 (boot-only) or >= 60
  per_tool_defaults:
    output_size_cap_bytes: 16384    # must not exceed DOCBRAIN_MCP_OUTPUT_CEILING_BYTES
    latency_budget_ms: 7000         # <= 8000 orchestrator ceiling; shim honours this value per call

Read-only invariant (D1). DocBrain only registers tools where the upstream declares annotations.readOnlyHint == true. Tools without the hint, or marked false, are silently dropped at probe time. DocBrain does not dispatch write operations via MCP; this is a platform-wide invariant enforced at three gates: the probe-time filter, the required read_only field on every static tool, and a final assertion in eligibility_for_user.

Static tool field — read_only. Every entry in tools: MUST declare read_only: true (or false, which will then be blocked by the D1 gate at eligibility time). This is a required field; manifests missing it fail to parse.
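The probe-time half of the read-only invariant amounts to a strict filter on the upstream tool list. A minimal sketch, assuming each discovered tool is a dictionary shaped like an MCP tools/list entry; the function name is hypothetical:

```python
def filter_read_only(discovered: list) -> list:
    """Keep only tools that explicitly declare readOnlyHint == true."""
    kept = []
    for tool in discovered:
        annotations = tool.get("annotations") or {}
        # A missing hint and an explicit false are treated the same: dropped.
        if annotations.get("readOnlyHint") is True:
            kept.append(tool)
    return kept
```

The check uses identity with `True` rather than truthiness, so a string such as `"true"` is dropped as well.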

Probe credentials.

Service-account or mixed auth: the manifest's service-account header is used for probes. No additional setup required.
OAuth-only auth: an admin must designate a probe user via PUT /api/v1/admin/mcp/manifests/{id}/probe-user. Until designated, the manifest stays in requires_probe_user status and serves no tools.

Static + dynamic name collisions. When a static tool and a discovered tool share a name:

If the static tool has override_discovered: true, the static entry wins and surfaces with tool_source: "static_override".
Otherwise BOTH entries are dropped from eligibility and the manifest's discovery status flips to degraded_collisions. Inspect via GET /api/v1/admin/mcp/manifests/{id}.
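The collision rule can be sketched as a merge over two tool lists. Only the `static_override` source label and the `degraded_collisions` status come from the description above; the function name and the `static` / `discovered` labels are assumptions made for illustration.

```python
def merge_tools(static: list, discovered: list):
    """Return (eligible_tools, discovery_status) after collision handling."""
    static_by_name = {t["name"]: t for t in static}
    discovered_by_name = {t["name"]: t for t in discovered}
    eligible = []
    status = "ok"
    for name, tool in static_by_name.items():
        if name not in discovered_by_name:
            eligible.append({**tool, "tool_source": "static"})
        elif tool.get("override_discovered") is True:
            # Static entry wins and is surfaced as an override.
            eligible.append({**tool, "tool_source": "static_override"})
        else:
            # Unresolved collision: neither entry is eligible.
            status = "degraded_collisions"
    for name, tool in discovered_by_name.items():
        if name not in static_by_name:
            eligible.append({**tool, "tool_source": "discovered"})
    return eligible, status
```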

Boot behaviour. Dynamic manifests are excluded from eligibility until the first successful probe completes. Status surfaces in the admin detail endpoint as pending → ok (or failed / requires_probe_user).

Rootly on-call shim¶

The rootly manifest is served by an internal shim that exposes two read-only tools — rootly.get_oncall (who is on call now) and rootly.list_overrides (scheduled overrides). Unlike OAuth manifests, the shim authenticates to Rootly's REST API with an org-level token it reads directly from its own env (it is not routed through config/default.yaml). Set these as env vars (e.g. in the Kubernetes Secret via mcpTools.serviceAccount.rootly.* in Helm):

Variable	Default	Description
`ROOTLY_API_TOKEN`	—	Org-level Rootly API token. Required for the on-call shim; when unset the manifest is absent and on-call questions fall back to other sources. Read-only.
`ROOTLY_BASE_URL`	`https://api.rootly.com`	Rootly REST API base URL. Override only for self-hosted Rootly.

Slack Integration (Optional)¶

Variable	Default	Description
`SLACK_BOT_TOKEN`	—	Slack bot OAuth token (`xoxb-...`)
`SLACK_SIGNING_SECRET`	—	Slack app signing secret
`SLACK_GAP_NOTIFICATION_CHANNEL`	—	Channel to post critical gap alerts after each analysis run (e.g. `#docs-alerts`). Only fires when new critical-severity gaps are found. Requires `SLACK_BOT_TOKEN`.

Notifications (Optional)¶

Variable	Default	Description
`NOTIFICATION_INTERVAL_HOURS`	`24`	How often to check for stale docs and send owner DMs
`NOTIFICATION_SPACE_FILTER`	—	Comma-separated spaces to limit notifications (e.g. `PLATFORM,SRE`). Empty = all spaces.

Documentation Autopilot (Optional)¶

Variable	Default	Description
`AUTOPILOT_ENABLED`	`false`	Enable the Documentation Autopilot (gap detection + draft generation)
`AUTOPILOT_GAP_ANALYSIS_INTERVAL_HOURS`	`6`	How often the background scheduler runs gap analysis
`AUTOPILOT_LOOKBACK_DAYS`	`30`	Days of query history to analyse for gaps
`AUTOPILOT_CLUSTER_THRESHOLD`	`0.82`	Cosine similarity threshold for grouping queries into a gap cluster (0.65 = loose, 0.85 = strict)
`AUTOPILOT_MIN_CLUSTER_SIZE`	`3`	Minimum episodes in a cluster to be considered a real gap
`AUTOPILOT_MIN_UNIQUE_USERS`	`2`	Minimum distinct users that must hit the same gap topic
`AUTOPILOT_MIN_NEGATIVE_RATIO`	`0.15`	Minimum fraction of queries on a topic that must have negative feedback
`AUTOPILOT_MAX_CLUSTERS`	`50`	Maximum gap clusters to persist per analysis run
`AUTOPILOT_MAX_EPISODES`	`500`	Maximum negative episodes to load per analysis run
`AUTOPILOT_AUTO_DRAFT`	`false`	Automatically generate drafts for qualifying gaps (no human trigger). Set to `true` to enable.
`AUTOPILOT_AUTO_DRAFT_SEVERITY`	`critical`	Minimum gap severity for auto-drafting: `critical`, `high`, `medium`, or `low`
`AUTOPILOT_CRITICAL_USERS`	`5`	Unique users needed for breadth score to reach 1.0. Lower for small teams.
`AUTOPILOT_CRITICAL_SIGNALS`	`15`	Negative signals needed for volume score to reach 1.0. Lower for low-traffic deployments.
`AUTOPILOT_CRITICAL_THRESHOLD`	`0.75`	Composite score cutoff for "critical" severity.
`AUTOPILOT_HIGH_THRESHOLD`	`0.55`	Composite score cutoff for "high" severity.
`AUTOPILOT_MEDIUM_THRESHOLD`	`0.35`	Composite score cutoff for "medium" severity.
`AUTOPILOT_TARGET_MIN_SCORE`	`45.0`	Corpus-probe relevance floor: minimum OpenSearch hybrid (BM25+kNN, unbounded) probe score a candidate target doc must reach before autopilot auto-picks it to augment a `poor_coverage` gap. Below this the cluster is marked "needs human pick". Distinct from `VERIFY_CORPUS_MIN_SCORE`.
`GENERATED_DOCS_RETENTION_DAYS`	`90`	Retention window (days) for persisted ad-hoc `generate` runs shown in the web `/generate` History view (`generated_documents` table). Rows older than this are purged by a daily job, and the History list/detail also filter to this window. Bounds the data-at-rest exposure of the persisted document body, which is owner-scoped (you see your own runs; admins see all; machine-key runs are admin-only). Set `0` to keep indefinitely (both the purge and the read-window filter no-op).

When enabled, Autopilot runs on the configured schedule, exposes management endpoints at /api/v1/autopilot/*, and posts critical gap alerts to SLACK_GAP_NOTIFICATION_CHANNEL if configured. See the API Reference for endpoint details.

Small teams / dev environments: Set AUTOPILOT_CRITICAL_USERS=1, AUTOPILOT_CRITICAL_SIGNALS=3, AUTOPILOT_CRITICAL_THRESHOLD=0.3 to see critical gaps with minimal signal. See autopilot.md for a full tuning guide.
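How the saturation and threshold settings interact can be sketched in Python. The formula that combines the breadth and volume scores into the composite score is not described here, so the sketch covers only the two documented pieces: scores that reach 1.0 at the configured targets, and the severity cutoffs. Function names and the inclusive (`>=`) comparison are assumptions.

```python
def saturating_score(observed: float, target: float) -> float:
    """Breadth or volume score: rises linearly and caps at 1.0 at `target`."""
    if target <= 0:
        raise ValueError("target must be positive")
    return min(observed / target, 1.0)


def severity(composite: float, critical: float = 0.75,
             high: float = 0.55, medium: float = 0.35) -> str:
    """Map a composite score to a severity using the default cutoffs."""
    if composite >= critical:
        return "critical"
    if composite >= high:
        return "high"
    if composite >= medium:
        return "medium"
    return "low"
```

For example, with AUTOPILOT_CRITICAL_USERS=5 a gap hit by 2 users has a breadth score of 0.4; lowering the target to 1 saturates the score with a single user, which is why the small-team settings above surface critical gaps sooner.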

Draft Publishing¶

Controls where AI-generated drafts are published. Supports Confluence (default), GitHub (PR-based), and GitLab (MR-based). Use per-space routing via the Publish Targets API to override the default target for specific spaces.

Variable	Default	Description
`DRAFT_PUBLISH_TARGET`	`confluence`	Default publish target: `confluence`, `github`, `gitlab`, or `none`
`DRAFT_PUBLISH_AUTO_INGEST`	`true`	Re-ingest published docs so DocBrain learns from its own output

GitHub Publishing¶

Publish drafts as Pull Requests containing markdown files with YAML frontmatter. Requires a GitHub token with repo scope.

Variable	Default	Description
`GITHUB_PUBLISH_TOKEN`	—	GitHub personal access token with `repo` scope (secret)
`GITHUB_PUBLISH_REPO`	—	Target repository in `owner/repo` format (e.g. `acme/docs`)
`GITHUB_PUBLISH_BRANCH`	`main`	Base branch for PRs
`GITHUB_PUBLISH_DOCS_PATH`	`docs`	Directory in repo where doc files are placed
`GITHUB_PUBLISH_PR_LABELS`	`docbrain,auto-generated`	Comma-separated labels applied to PRs
`GITHUB_PUBLISH_CREATE_PR`	`true`	`true` = create a PR for review; `false` = commit directly to branch
`GITHUB_PUBLISH_API_URL`	`https://api.github.com`	Override for GitHub Enterprise Server

GitLab Publishing¶

Publish drafts as Merge Requests containing markdown files. Requires a GitLab token with api scope.

Variable	Default	Description
`GITLAB_PUBLISH_TOKEN`	—	GitLab personal access token with `api` scope (secret)
`GITLAB_PUBLISH_PROJECT_ID`	—	Numeric project ID (find in Settings → General)
`GITLAB_PUBLISH_BASE_URL`	`https://gitlab.com`	Override for self-hosted GitLab instances
`GITLAB_PUBLISH_BRANCH`	`main`	Base branch for MRs
`GITLAB_PUBLISH_DOCS_PATH`	`docs`	Directory in project where doc files are placed
`GITLAB_PUBLISH_MR_LABELS`	`docbrain,auto-generated`	Comma-separated labels applied to MRs
`GITLAB_PUBLISH_CREATE_MR`	`true`	`true` = create an MR for review; `false` = commit directly to branch

Per-Space Routing¶

Use the Publish Targets API (/api/v1/publish-targets) to route specific spaces to different targets. For example, keep Confluence as the default but publish the PLATFORM space to GitHub:

# Create a GitHub target for the PLATFORM space
curl -X POST https://your.docbrain.example/api/v1/publish-targets \
  -H "Authorization: Bearer db_sk_..." \
  -H "Content-Type: application/json" \
  -d '{"space": "PLATFORM", "target_type": "github", "config": {"token_env": "GITHUB_PUBLISH_TOKEN", "repo": "acme/platform-docs"}, "priority": 10}'

When publishing, DocBrain resolves the target in priority order: space-specific DB target → default config target → Confluence fallback. Config stored in the publish_targets table uses token_env (env var name) instead of raw secrets for security.
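The resolution order can be sketched as a small lookup. This is an illustration under stated assumptions: the record shape mirrors the request body above, the function name is hypothetical, and treating a larger `priority` value as the winner among several targets for the same space is a guess rather than documented behaviour.

```python
from typing import Optional


def resolve_target(space: str, db_targets: list, default_target: Optional[str]) -> str:
    """Resolve where a draft for `space` is published."""
    candidates = [t for t in db_targets if t["space"] == space]
    if candidates:
        # 1. Space-specific target stored in the publish_targets table.
        return max(candidates, key=lambda t: t["priority"])["target_type"]
    if default_target:
        # 2. Default target from configuration (DRAFT_PUBLISH_TARGET).
        return default_target
    # 3. Confluence fallback.
    return "confluence"
```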

Freshness Scoring¶

Variable	Default	Description
`FRESHNESS_SCHEDULER_INTERVAL_HOURS`	`24`	How often freshness scores are recalculated for all documents
`CONTRADICTION_CHECKS_PER_PASS`	`10`	Max documents checked for contradictions per freshness run (LLM cost)
`CONTRADICTION_INCLUDE_RECENT_EVENT_DOCS`	`true`	Include recent Slack/PR/Jira docs in the contradiction pass alongside stalest docs
`CONTRADICTION_EVENT_DOC_MAX_AGE_DAYS`	`90`	Only event-based docs edited within this many days are eligible for contradiction checks
`FRESHNESS_LLM_CALLS_PER_PASS`	`50`	Max documents that get LLM content-currency analysis per scheduler tick. At 50/day, a 10k-doc corpus takes ~200 days to cover — raise as needed. Each call costs LLM tokens proportional to doc length.
`FRESHNESS_LINK_CHECKS_PER_PASS`	`20`	Max documents that get HTTP HEAD link-health checks per scheduler tick. Cheap compared to LLM — safe to raise for large corpora.
`FRESHNESS_ENGAGEMENT_V2_ENABLED`	`false`	Master switch for engagement signal v2. When `true`, the per-doc engagement score uses the Wilson lower bound on distinct-user-gated, recency-windowed votes (anti-brigade + bidirectional decay). When `false`, the legacy v1 path runs: `up / (up + down) * 100` over raw row counts with a `total >= 3 AND feedback_total >= 3` gate. Off by default; opt in per deployment. Existing `engagement_score` values in `freshness_scores` are recomputed on the next scoring pass after the flag flips; no migration needed. Accepts `true`, `1`, `yes`, or `on` (case-insensitive).
`FRESHNESS_ENGAGEMENT_MIN_RETRIEVERS`	`3`	v2 only. Minimum distinct users (NULL `user_id` excluded) who must have retrieved the doc within the recency window before any engagement signal is reported. Below this gate → `has_engagement_data = false` → doc stays in "Insufficient signals".
`FRESHNESS_ENGAGEMENT_MIN_VOTERS`	`3`	v2 only. Minimum distinct users who must have given thumbs-up or thumbs-down feedback within the window. Each user counts as at most one vote per doc (anti-brigade).
`FRESHNESS_ENGAGEMENT_WINDOW_DAYS`	`180`	v2 only. Days. Votes and retrievals older than this are ignored — this is what makes engagement decay automatically. A doc with no activity in this window has its engagement signal drop to neutral and returns to "Insufficient signals". Max `i32::MAX`; values above the cap fall back to default with a warn log.

Engagement v2 algorithm. The per-doc engagement score is the Wilson score lower bound at 95% confidence (z = 1.96) computed over the distinct-user up-vote ratio. Compared to the v1 raw ratio:

One user thumbs-up 10 times → counts as 1 vote (anti-brigade).
1 unanimous up-voter scores ~21, not 100 (false confidence at low n is suppressed).
100 unanimous up-voters scores ~96.
A user who later thumbs-down is treated by their NET vote — if the net sum is negative, counted as a down-voter; if zero, as a retriever-but-not-voter.
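The figures above follow directly from the Wilson lower-bound formula. A minimal Python sketch, assuming votes have already been restricted to the recency window and reduced to one net value per user; the function names are illustrative and the minimum-voter gates are left out:

```python
import math


def wilson_lower_bound(up_voters: int, total_voters: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for an up-vote ratio."""
    if total_voters == 0:
        return 0.0
    n = total_voters
    p = up_voters / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return (centre - margin) / (1 + z2 / n)


def engagement_score(net_votes_by_user: dict) -> float:
    """Score 0-100 from per-user NET votes; each user counts at most once."""
    up = sum(1 for v in net_votes_by_user.values() if v > 0)
    down = sum(1 for v in net_votes_by_user.values() if v < 0)
    # Users whose net vote is zero are retrievers, not voters.
    return 100 * wilson_lower_bound(up, up + down)
```

A single user with ten thumbs-up (`{"alice": 10}`) scores about 21, and 100 distinct unanimous up-voters score about 96, matching the list above.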

Bidirectional behaviour. When activity stops, old votes fall out of the recency window. The doc's has_engagement_data flips back to false on the next scoring pass and the doc returns to "Insufficient signals". This is the key difference from v1, where engagement was monotonically sticky.

Migration story. Flip-on is safe at any time: the legacy engagement_score column is recomputed in place by the next scheduled freshness pass (default 24h). Operators can flip back without rollback — the v1 code path is preserved verbatim and reused when FRESHNESS_ENGAGEMENT_V2_ENABLED=false.

Event-Based Source Types¶

Source types whose documents are permanent historical records — incident threads, merged PRs, support tickets — never go stale and shouldn't be evaluated for content currency or contradictions. The scorer pins their time_decay = 100 and skips LLM/link/contradiction passes.

This was a hardcoded list until v1.4; it's now configurable so operators can register custom permanent-record source types (e.g. a homegrown incident system) without rebuilding the image.

YAML key (under `freshness`)	Default	Description
`event_based_spaces`	`[slack_thread, github_pr, github, gitlab_mr, jira, linear, pagerduty, opsgenie, zendesk, intercom, fireflies]`	List of `documents.space` values treated as permanent historical records. Capture sources (`slack_capture`, `github_capture`, `gitlab_capture`) are intentionally NOT in the default — design discussions DO go stale.

Override in default.yaml (or via the helm value freshness.eventBasedSpaces) to add custom source types.

Excluding Documents from Freshness Reports¶

Documents that are intentionally frozen — archived project pages, retros, historical decision records, reference material — should not be evaluated for freshness. Old isn't the same as wrong. DocBrain detects these from source-system metadata at ingest and skips them in the scorer.

The Freshness page in the UI shows excluded counts via "View excluded (N)" in the page header. Excluded docs don't appear in the Total / Outdated / Stale / Review / Fresh rollups — they're not noise in the freshness view.

Quick recipe — exclude every doc tagged `retrospective` in Confluence¶

Helm-managed deployments (recommended — no image rebuild):

# values.yaml
freshness:
  exclusionRules:
    archived_labels:
      - archived          # defaults
      - historical
      - obsolete
      - deprecated
      - frozen
      - reference
      - retrospective     # ← your addition

helm upgrade <release> <chart> -f values.yaml

Then in the DocBrain UI:

1. Freshness → Reclassify lifecycle (or POST /api/v1/freshness/backfill-lifecycle): re-derives every auto-managed doc against the new rules. Existing retrospective-tagged docs become archived in seconds.
2. Freshness → Rescore All: refreshes the rollup numbers.

Future docs with the tag get caught automatically at ingest. No further action needed.

Direct config edits (when not using helm): edit config/default.yaml, restart the server pod. Same rule.

Per-doc override (just one specific document, not the whole tag):

curl -X PATCH https://your.docbrain.example/api/v1/documents/{doc_id}/lifecycle \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"status": "archived"}'

Or use the row action menu in the UI: ⋯ → Mark archived. Manual overrides are sticky — they survive future syncs even if the source-system label changes back.

How detection works¶

During Confluence ingestion DocBrain reads each page's labels and (for Confluence Cloud) page status. The lifecycle classifier matches against three independent signal sources — any match marks the doc archived:

YAML key (under `freshness.exclusion_rules`)	Helm value	Default	What it matches
`archived_labels`	`freshness.exclusionRules.archived_labels`	`[archived, historical, obsolete, deprecated, frozen, reference]`	Source labels, case-insensitive. Confluence page labels match here.
`archived_page_statuses`	`freshness.exclusionRules.archived_page_statuses`	`[archived, trashed]`	Confluence Cloud `status` field.
`archived_title_patterns`	`freshness.exclusionRules.archived_title_patterns`	`['^Archived ', '^\[ARCHIVED\]', '\(archived\)$']`	Regex against doc title; a safety net for un-labeled legacy docs.

These rules are list-shaped and configured in YAML only (env vars can't represent lists).
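The any-match classification can be sketched in Python. This is a simplified illustration: it uses the default label and status lists, only the two prefix title patterns from the defaults, and assumes status and title matching are case-sensitive. The function name is hypothetical.

```python
import re
from typing import Iterable, Optional

ARCHIVED_LABELS = {"archived", "historical", "obsolete", "deprecated", "frozen", "reference"}
ARCHIVED_PAGE_STATUSES = {"archived", "trashed"}
# Subset of the default title patterns, used here for illustration only.
ARCHIVED_TITLE_PATTERNS = [re.compile(r"^Archived "), re.compile(r"^\[ARCHIVED\]")]


def is_archived(title: str, labels: Iterable[str], page_status: Optional[str] = None) -> bool:
    """Return True if any one of the three signal sources matches."""
    if any(label.lower() in ARCHIVED_LABELS for label in labels):
        return True  # labels are matched case-insensitively
    if page_status in ARCHIVED_PAGE_STATUSES:
        return True
    return any(p.search(title) for p in ARCHIVED_TITLE_PATTERNS)
```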

Which lifecycle status to use¶

The PATCH /lifecycle API and the row action menu accept four values. All except active exclude the doc from scoring; pick the one that matches intent so your audit trail stays meaningful:

Status	Meaning
`active`	Default. Scored normally. Use this to un-archive a doc.
`archived`	Frozen historical record. Old by design.
`reference`	Evergreen content (style guides, glossaries). Don't nag, don't decay.
`deprecated`	Should eventually be deleted, but kept for now.

Reviewing what's been excluded¶

Click View excluded (N) in the Freshness page header. The modal groups docs by lifecycle status (archived / reference / deprecated), shows the source labels that triggered the classification, and exposes a Mark active button per row to un-archive a doc directly. Search filters by title, space, or tag.

Semantic Quality Scoring¶

LLM-based quality assessment that evaluates documents on four dimensions: accuracy, completeness, clarity, and actionability (each scored 0-25, total 0-100). Runs as a background sweep on documents that have already been structurally scored.

Variable	Default	Description
`SEMANTIC_QUALITY_ENABLED`	`true`	Enable LLM-based semantic quality scoring
`SEMANTIC_QUALITY_INTERVAL_HOURS`	`24`	How often the semantic scoring sweep runs
`SEMANTIC_QUALITY_BUDGET`	`50`	Maximum documents scored per sweep (controls LLM cost)
`SEMANTIC_QUALITY_STRUCTURAL_THRESHOLD`	`40.0`	Minimum structural score required before a document is eligible for semantic scoring

The composite quality score blends structural and semantic scores at 50/50 weighting. Documents below the structural threshold are skipped to avoid wasting LLM calls on obviously poor content.
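The arithmetic behind the composite score can be sketched as follows. The function names are illustrative, and reporting the structural score alone for documents that were skipped is an assumption, since the text above only states that they are not sent to the LLM.

```python
from typing import Optional


def semantic_score(accuracy: float, completeness: float,
                   clarity: float, actionability: float) -> float:
    """Sum of four dimensions, each 0-25, giving a total of 0-100."""
    dims = (accuracy, completeness, clarity, actionability)
    for d in dims:
        if not 0 <= d <= 25:
            raise ValueError("each dimension must be within 0-25")
    return sum(dims)


def composite_quality(structural: float, semantic: Optional[float],
                      structural_threshold: float = 40.0) -> float:
    """50/50 blend; docs below the structural threshold are not blended."""
    if structural < structural_threshold or semantic is None:
        return structural  # assumption: structural score stands alone
    return 0.5 * structural + 0.5 * semantic
```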

Capture Lifecycle¶

Captured content (GitHub PRs/issues, GitLab MRs, Slack threads) decays with age — unlike incident records (Jira, PagerDuty, Zendesk) which are permanent historical events. A 5-year-old PR discussing a replaced architecture should score low in freshness; a 2-week-old incident thread is always valid.

Cross-document references: During capture, DocBrain automatically extracts URLs from the description and comments — GitHub PRs, GitLab MRs, Jira tickets, Confluence pages, and other linked resources. These are stored as a reference graph in PostgreSQL and used to enrich RAG context at query time by fetching chunks from referenced documents. GitLab shorthand references (!123 for MRs, #123 for issues) are resolved to full URLs within the same project.

Space assignment: Captures are stored under a meaningful space name derived from the source:

GitHub captures → owner/repo (e.g., myorg/backend)
GitLab captures → group/project (e.g., platform/api)
Slack captures → channel name (e.g., platform-incidents)

This makes allowed_spaces ACL filtering work correctly — a key scoped to ["myorg/backend"] will include GitHub captures from that repo.

Age baseline: Freshness is calculated from the original content creation date (when the PR was opened, when the Slack thread started) — not the time DocBrain captured it. Re-capturing the same thread updates its content but preserves the original creation date as the staleness baseline.

Memory Consolidation¶

Variable	Default	Description
`CONSOLIDATION_INTERVAL_HOURS`	`6`	How often the memory consolidation job runs (merges episodic patterns into semantic/procedural memory)

RAG Pipeline¶

Variable	Default	Description
`RAG_TOP_K`	`10`	Chunks retrieved per query. Higher = more context passed to the LLM, at the cost of more tokens per call. Raise to `15`–`20` if answers are missing obvious information; lower to `5` to reduce cost on simple corpora.
`RAG_BM25_BOOST`	`1.0`	Weight of keyword (BM25) search relative to vector search in hybrid retrieval. Raise to `2.0`–`3.0` for corpora heavy with exact-match queries — error codes, CLI commands, ticket IDs, specific tool names. Leave at `1.0` for general prose documentation.
`SEARCH_MIN_SCORE`	`0.0`	Drop retrieved chunks below this relevance score before sending context to the LLM. `0.0` keeps everything. Set to `0.3`–`0.4` if you notice irrelevant chunks contaminating answers; leave at `0.0` for small corpora where recall matters more than precision.
`RAG_CACHE_TTL_HOURS`	`24`	How long to cache semantically identical answers
`RAG_CACHE_THRESHOLD`	`0.95`	Cosine similarity threshold for a query to count as a cache hit
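Two of these settings are simple threshold checks, sketched below in Python. The data shapes (embedding vectors as lists of floats, chunks as dictionaries with a `score` key) and the function names are assumptions for illustration.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_cache_hit(query_vec: list, cached_vec: list, threshold: float = 0.95) -> bool:
    """RAG_CACHE_THRESHOLD: reuse a cached answer only for near-identical queries."""
    return cosine(query_vec, cached_vec) >= threshold


def select_chunks(chunks: list, min_score: float = 0.0, top_k: int = 10) -> list:
    """SEARCH_MIN_SCORE then RAG_TOP_K: drop weak chunks, keep the best k."""
    kept = [c for c in chunks if c["score"] >= min_score]
    return sorted(kept, key=lambda c: c["score"], reverse=True)[:top_k]
```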

Chunking¶

Controls how documents are split before embedding. See Ingestion Guide for re-ingest instructions.

Variable	Default	Description
`CHUNK_SIZE`	`1500`	Target chunk size in characters. Dense API refs: `800`–`1200`. General docs: `1500`. Long-form prose: `2000`–`2500`.
`CHUNK_OVERLAP`	`200`	Overlap between adjacent paragraph-split chunks in characters.
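To make the interaction between CHUNK_SIZE and CHUNK_OVERLAP concrete, here is a deliberately simplified sketch. It slides a fixed character window and ignores paragraph boundaries, so it is not DocBrain's splitter; the function name is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1500, overlap: int = 200) -> list:
    """Split text into windows of `chunk_size` chars sharing `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # each window starts this far after the last
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks
```

With the defaults, a 3,000-character document yields three chunks of 1,500, 1,500, and 400 characters, and the last 200 characters of each chunk repeat at the start of the next.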

OpenSearch Index Names¶

Variable	Default	Description
`OPENSEARCH_INDEX`	`docbrain-chunks`	Index name for document chunks (vectors + BM25)
`OPENSEARCH_EPISODE_INDEX`	`docbrain-episodes`	Index name for episode vectors (used in episodic memory recall)
`DOCBRAIN_MCP_OUTPUT_CEILING_BYTES`	`32768`	Hard upper bound the live-tool manifest validator enforces on any per-tool output cap. A per-tool cap above this is rejected at load.
`DOCBRAIN_MCP_DEFAULT_TOOL_OUTPUT_CAP_BYTES`	`32768`	Output cap inherited by tools discovered dynamically that ship no per-tool cap (e.g. chat search). Must be ≤ the ceiling above.
`DOCBRAIN_EVIDENCE_BUFFER_CAP_BYTES`	`65536`	Shared evidence-text budget across all tools in one tool-loop round — the real bottleneck. 2× the per-tool cap leaves headroom for other sources. These three caps form a chain: all must rise together, since the smallest truncates regardless of the others.
`DOCBRAIN_MCP_JQL_RECENCY_BOUND_DAYS`	`180`	Recency window (in days) the gateway appends to an unbounded full-text issue-tracker search. When a search query uses the full-text operator with no time window and no project/key clause, the gateway adds a recency bound so the upstream hits its date index instead of scanning the whole instance (which times out at the tool budget). Already-bounded queries are left untouched.

Only change the two index names (OPENSEARCH_INDEX and OPENSEARCH_EPISODE_INDEX) if you run multiple DocBrain instances sharing the same OpenSearch cluster, to avoid index collisions.

Data Retention¶

Variable	Default	Description
`EPISODE_RETENTION_DAYS`	`90`	Episode (query history) rows older than this are pruned daily. Set to `0` to disable pruning.
`AUDIT_RETENTION_DAYS`	`365`	Audit log rows older than this are pruned daily. Set to `0` to disable pruning.

Self-Ingest (Optional)¶

Variable	Default	Description
`DOCBRAIN_SELF_INGEST`	`true`	Auto-ingest DocBrain's own docs so it can answer configuration questions about itself
`DOCBRAIN_DOCS_PATH`	`./docs`	Path to DocBrain's own documentation directory

SSO / OIDC (Enterprise)¶

Variable	Default	Description
`OIDC_ISSUER_URL`	—	OIDC provider URL (e.g. `https://accounts.google.com`)
`OIDC_CLIENT_ID`	—	OAuth client ID
`OIDC_CLIENT_SECRET`	—	OAuth client secret
`OIDC_REDIRECT_URI`	—	Callback URI (e.g. `https://docbrain.example.com/api/v1/auth/oidc/callback`)
`OIDC_WEB_UI_URL`	`http://localhost:3001`	Where to redirect after successful login
`DOCBRAIN_WEB_BASE_URL`	—	Public origin of the DocBrain web UI. Drives the MCP-OAuth landing redirect AND the "view in browser" deep link the CLI prints after a `generate` (plus the shareable per-document link). Set to the user-facing web origin — not the API host if they differ. Unset → no link is offered (never guessed or hardcoded). Trailing slash trimmed.
`OIDC_ACCEPT_INVALID_CERTS`	`false`	Set to `true` to skip TLS verification — use for corporate/self-signed CAs

GitLab OIDC¶

Variable	Default	Description
`GITLAB_OIDC_ISSUER_URL`	—	GitLab instance URL (e.g. `https://gitlab.com` or `https://gitlab.corp.example.com`)
`GITLAB_CLIENT_ID`	—	GitLab OAuth application client ID
`GITLAB_CLIENT_SECRET`	—	GitLab OAuth application client secret
`GITLAB_REDIRECT_URI`	—	Callback URL (e.g. `https://docbrain.example.com/api/v1/auth/gitlab/callback`)

Corporate GitLab: If your self-hosted GitLab uses an internal CA, set OIDC_ACCEPT_INVALID_CERTS=true.

RBAC Role Assignment¶

Role is computed at login time and stored on the user record. The hierarchy is: viewer (1) < editor (2) < analyst (3) < admin (4). Higher-priority rules win.

Variable	Helm key	Description
`OIDC_DEFAULT_ROLE`	`rbac.defaultRole`	Role assigned to new SSO users who match no group rule. Default: `viewer`.
`OIDC_ADMIN_EMAILS`	`rbac.adminEmails`	Comma-separated emails that always receive `admin`.
`OIDC_ADMIN_DOMAIN`	`rbac.adminDomain`	Email domain whose users receive `admin` (e.g. `acme.com`).
`OIDC_ADMIN_GROUPS`	`rbac.adminGroups`	Comma-separated IdP group names → `admin` role.
`OIDC_EDITOR_GROUPS`	`rbac.editorGroups`	Comma-separated IdP group names → `editor` role.
`OIDC_ALLOWED_GROUPS`	`rbac.allowedGroups`	Access gate: only these groups may log in (all others get 403).
`OIDC_ALLOWED_DOMAINS`	`rbac.allowedDomains`	Access gate: only these email domains may log in.

What every engineer can see¶

All authenticated users (including viewer) have full access to the intelligence dashboards:

Page	What it shows
Velocity	Documentation ROI — queries deflected, hours saved, cost saved, per-team breakdown
Predictive	Predicted documentation gaps from code changes, cascade staleness, seasonal patterns, onboarding risks
Maintenance	AI-generated fix proposals with apply/reject workflow
Stream	Live knowledge event feed — incident warnings, freshness decay alerts, trending gaps

These dashboards are visible to every engineer. The insight loop only works if the people who can act on it — the engineers — can actually see it.

Example — typical multi-team setup:

rbac:
  defaultRole: "viewer"
  adminGroups: "platform-team"
  editorGroups: "docs-writers"

# Equivalent env vars
OIDC_DEFAULT_ROLE=viewer
OIDC_ADMIN_GROUPS=platform-team
OIDC_EDITOR_GROUPS=docs-writers

Note: Role is evaluated at login time. Group changes in your IdP take effect on next login.
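The role rules can be sketched as "collect every matching rule, keep the highest role". This is an illustrative Python sketch, not the server's code: the function name and parameter names are hypothetical, and the access gates (OIDC_ALLOWED_GROUPS, OIDC_ALLOWED_DOMAINS) are not modelled.

```python
from typing import Iterable, Optional

ROLE_RANK = {"viewer": 1, "editor": 2, "analyst": 3, "admin": 4}


def compute_role(email: str, groups: Iterable[str], *,
                 default_role: str = "viewer",
                 admin_emails: Iterable[str] = (),
                 admin_domain: Optional[str] = None,
                 admin_groups: Iterable[str] = (),
                 editor_groups: Iterable[str] = ()) -> str:
    """Return the highest role granted by any matching rule."""
    email = email.lower()
    domain = email.rsplit("@", 1)[-1]
    user_groups = set(groups)
    matched = []
    if email in {e.lower() for e in admin_emails}:
        matched.append("admin")
    if admin_domain and domain == admin_domain.lower():
        matched.append("admin")
    if user_groups & set(admin_groups):
        matched.append("admin")
    if user_groups & set(editor_groups):
        matched.append("editor")
    if not matched:
        return default_role  # no rule matched
    return max(matched, key=ROLE_RANK.__getitem__)
```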

ACL¶

Mirrors source-system permissions (Confluence space restrictions, Slack private channels, GitHub repo visibility, Jira issue security levels) at query time. A user only sees retrieval results for documents they can read in the source.

For the conceptual guide, modes, denial UX, audit log, and threat model, see Access Control (ACL). The reference below is the env-var / YAML surface only.

Top-level¶

Variable	Default	Description
`ACL_MODE`	`off`	`off` (no filtering), `warn` (log denials, return all), `enforce` (filter + redact)
`ACL_RECALL_OVERFETCH`	`2.0`	Recall multiplier — pull this much extra from the index so post-filter results still hit `top_k`
`ACL_UNKNOWN_POLICY`	`deny`	What to do with chunks that have no ACL data: `deny` (fail-closed) or `allow` (legacy / migration mode)
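The three top-level settings combine roughly as sketched below. The chunk shape (an `acl` list of principals, or `None` when no ACL data exists) and the function names are assumptions for illustration.

```python
import math


def fetch_size(top_k: int, overfetch: float = 2.0) -> int:
    """ACL_RECALL_OVERFETCH: how many candidates to pull before filtering."""
    return math.ceil(top_k * overfetch)


def apply_acl(candidates: list, principals: set, top_k: int,
              mode: str = "enforce", unknown_policy: str = "deny") -> list:
    """Filter ranked candidates by the user's principals, then cut to top_k."""
    if mode not in ("off", "warn", "enforce"):
        raise ValueError(f"unknown ACL mode: {mode!r}")
    if mode in ("off", "warn"):
        # off: no filtering; warn: denials would be logged, results unchanged.
        return candidates[:top_k]
    visible = []
    for chunk in candidates:
        acl = chunk.get("acl")
        if acl is None:
            allowed = unknown_policy == "allow"  # deny = fail-closed
        else:
            allowed = bool(set(acl) & principals)
        if allowed:
            visible.append(chunk)
    return visible[:top_k]
```

Fetching `fetch_size(top_k)` candidates first is what lets the filtered list still reach `top_k` entries when some chunks are denied.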

Per-source policy (`acl.sources.*`)¶

Each connector slot accepts mirror (default — use real source ACLs), public (everyone in the workspace can see all docs from this source), or admin_only.

acl:
  sources:
    confluence: mirror
    slack: mirror
    github: mirror
    jira: mirror
    gitlab: public        # if your GitLab MRs are intentionally workspace-wide
    ms_teams: admin_only  # restrict until ACL provider lands
    linear: mirror

Per-namespace overrides (per Confluence space, per Slack channel, etc.) live under acl.denial.source_overrides.<source>.{space,channel,repo,project}_overrides.

Denial UX (`acl.denial.*`)¶

Variable	Default	Description
`ACL_DENIAL_MODE`	`disclosed_no_count`	`silent` (no hint), `disclosed_no_count` (acknowledge, hide count), `disclosed` (full count + breakdown)
`ACL_DENIAL_REFERRAL`	unset	Optional URL shown in denial messages (e.g. your access-request portal)
`ACL_DENIAL_PARTIAL_DENIAL`	`true`	Surface `access` metadata even when some results were returned
`ACL_AUDIT_ENABLED`	`false`	Write denial events to `acl_audit_log` (required for HIPAA / FedRAMP / SOC2 trails)
`ACL_AUDIT_RAW_QUERY`	`false`	Store the raw user query (default: SHA256 hash only — queries can carry MNPI / PII)

Per-role overrides (admin sees full disclosure, employee sees no count) and per-source overrides are YAML-only:

acl:
  denial:
    mode: disclosed_no_count
    role_overrides:
      admin: disclosed
    source_overrides:
      confluence:
        mode: disclosed
      slack:
        mode: silent

Strictest-wins: if any one denied source resolves to silent, the whole response goes silent. This prevents side-channel leaks where a user learns which source restricted them.
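Strictest-wins can be sketched as taking the minimum over an ordered list of modes. The text above only states the `silent` case; extending the ordering to `disclosed_no_count` over `disclosed` is an assumption, role overrides are left out, and the function name is hypothetical.

```python
# Strictest first; position in this list defines the ordering.
STRICTNESS = ["silent", "disclosed_no_count", "disclosed"]


def resolve_denial_mode(denied_sources: list, default_mode: str,
                        source_overrides: dict) -> str:
    """Pick the strictest mode among all sources that denied something."""
    modes = [source_overrides.get(src, default_mode) for src in denied_sources]
    if not modes:
        return default_mode
    return min(modes, key=STRICTNESS.index)
```

With the YAML above, a query denied by both `confluence` and `slack` resolves to `silent`, even though `confluence` alone would resolve to `disclosed`.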

Diagnostics¶

# What does ACL think this user can see?
GET /api/v1/me/acl

# Coverage report — how many indexed chunks have ACL principals attached?
SELECT source_type, COUNT(*) FROM document_acl GROUP BY source_type;

Documentation Analytics¶

Velocity & ROI variables¶

Variable	Default	Description
`VELOCITY_MINUTES_SAVED`	`15`	v1 only. Estimated minutes saved per deflected query (single point value).
`VELOCITY_HOURLY_RATE`	`75`	Effective hourly engineer cost (USD) used by both v1 and v2 ROI math.
`VELOCITY_ROI_V2_ENABLED`	`true`	Switch to v2 methodology (recommended for executive reporting). Set `false` to revert to v1.
`VELOCITY_ROI_MIN_MINUTES_LOW`	`5`	v2 only. Low end of the per-signal time-saved range, in minutes.
`VELOCITY_ROI_MIN_MINUTES_HIGH`	`25`	v2 only. High end of the per-signal time-saved range, in minutes.
`VELOCITY_ROI_MIN_DISTINCT_USERS`	`3`	v2 only. Minimum distinct non-admin users with positive feedback before a number is reported. Below this, the dashboard shows "Insufficient signal".
`VELOCITY_ROI_EXCLUDE_ADMIN`	`true`	v2 only. Exclude admin users from the ROI population (admins tend to vote on their own answers).
`VELOCITY_ROI_MAX_VOTES_PER_USER`	`10`	v2 only. Per-user cap on positive votes counted inside the window. Prevents one power-user from dominating the org-wide number.
`VELOCITY_TRIBAL_V2_ENABLED`	`true`	Switch to v2 tribal-knowledge methodology (real domain entities, configurable threshold, insufficient-signal gate). Set `false` for the legacy v1 formula.
`VELOCITY_TRIBAL_MAX_EXPERTS`	`2`	v2 only. Domains with ≤ this many distinct experts are counted as "tribal." Raise for larger orgs.
`VELOCITY_TRIBAL_MIN_DOMAINS`	`3`	v2 only. Minimum distinct domains with positive-feedback signal before the percentage is reported. Below this, the dashboard shows "Insufficient signal."
`VELOCITY_BULK_UPDATE_MULTIPLE`	`10.0`	Bulk re-ingest guard for net knowledge velocity. A week whose updated-doc count exceeds this multiple of the rolling weekly-update norm is treated as a bulk sweep (e.g. a full re-ingest) and capped to the norm, so it cannot inflate the velocity headline or flip the maintenance trend to "accelerating". Lower it on a corpus with very steady authoring to catch smaller sweeps; raise it if legitimate maintenance bursts are being mistaken for sweeps. Must be finite and `>= 1.0`. A value of 0, a negative value, or NaN would collapse the bulk-sweep threshold to 0 (every week misclassified as a sweep), so it is rejected at startup with a clear error.
`VELOCITY_SUBSTANTIVE_UPDATE_CEILING`	`2000`	Absolute ceiling on a single week's substantive (bulk-excluded) update contribution. Applied after the rolling-norm cap to guard the case where the entire history is inflated and the rolling median itself is poisoned. A genuine week of hand-authored doc updates does not exceed this. Must be `>= 1`. A value of 0 or a negative value would silently zero all substantive updates, so it is rejected at startup with a clear error.
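
The two guards in the last two rows can be pictured as a small function. This is a sketch under assumptions, not the shipped code: the function names are hypothetical, and `weekly_norm` stands in for whatever rolling weekly-update norm DocBrain computes.

```python
import math


def validate_guards(multiple, ceiling):
    """Mirror the documented startup checks for the two settings."""
    if not math.isfinite(multiple) or multiple < 1.0:
        raise ValueError("VELOCITY_BULK_UPDATE_MULTIPLE must be finite and >= 1.0")
    if ceiling < 1:
        raise ValueError("VELOCITY_SUBSTANTIVE_UPDATE_CEILING must be >= 1")


def substantive_updates(week_count, weekly_norm, multiple=10.0, ceiling=2000):
    """Cap a bulk-sweep week to the norm, then apply the absolute ceiling."""
    validate_guards(multiple, ceiling)
    is_bulk_sweep = week_count > multiple * weekly_norm
    capped = weekly_norm if is_bulk_sweep else week_count
    return min(capped, ceiling)


print(substantive_updates(40, weekly_norm=30))      # 40: an ordinary week
print(substantive_updates(5000, weekly_norm=30))    # 30: treated as a sweep
print(substantive_updates(2500, weekly_norm=1000))  # 2000: ceiling applies
```

The third call shows why the ceiling exists: when the norm itself is inflated, the multiple never trips, so the absolute cap is the only guard left.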

Documentation ROI — how the number is calculated¶

The "Documentation ROI" card on the dashboard tells you, in dollars and hours, how much time the knowledge base has saved your team.

This is the number you'll quote in board meetings and budget reviews, so it has to be honest. This section explains, in plain language, how DocBrain calculates it, why each knob exists, and how to tune the knobs for your organisation. You do not need to be a developer to follow this.

The simple story¶

Every time someone asks DocBrain a question and gives the answer a 👍, that's one "useful answer". DocBrain assumes a useful answer saved that person some amount of time they would have spent searching, asking colleagues, or rediscovering something they once knew.

hours saved  = (number of useful answers) × (minutes saved per answer) ÷ 60
money saved  = (hours saved) × (engineer hourly cost)

That's it. The rest of this page is just about counting "useful answers" honestly and picking a sensible "minutes saved" number.

Why honest counting matters (the v1 problem)¶

The first version of DocBrain ROI (called v1) counted every 👍 equally. That sounds fair, but it produces misleading numbers in practice:

One enthusiastic person can dominate the count. If the administrator clicks 👍 35 times and 6 other users click 👍 once each, the total is 41. But the system only really helped 7 people — and the admin was rating their own work.
A small deployment looks the same as a large one. Whether 3 people gave feedback or 300, v1 just reports the number. There's no way to tell "this is enough data to trust" from "this is two enthusiastic people".

If you report $693 saved to your CFO and they ask "how many actual people benefited?" and the honest answer is "basically one" — that's a credibility problem.

How v2 fixes it (the recommended default)¶

The current version (v2, on by default) fixes the four ways v1 can mislead. Each fix is one of the knobs you can turn:

Need enough people before reporting anything. If fewer than VELOCITY_ROI_MIN_DISTINCT_USERS different people gave positive feedback (default: 3 people), the dashboard shows "Insufficient signal" instead of a number. It's better to say "we don't know yet" than to invent a number from too little data.
Don't count the admin's own 👍. When VELOCITY_ROI_EXCLUDE_ADMIN is on (default: on), votes from administrators are ignored. You shouldn't get credit for rating your own answers.
Cap how many 👍 one person can contribute. Even with the admin excluded, one super-enthusiastic user could click 👍 a hundred times. With VELOCITY_ROI_MAX_VOTES_PER_USER (default: 10), we only count their first 10 — the rest still help the system learn, they just don't keep inflating the ROI number.
Report a range, not a single number. Some questions save you 30 seconds (looking up an env var). Others save you an hour (avoiding a wrong deployment). We don't know which it was, so we report a range: "between 5 minutes and 25 minutes saved per useful answer" (defaults — both adjustable). This gives an honest band, not a fake-precise single dollar figure.

A worked example¶

Suppose your DocBrain has these positive votes in the last 90 days:

Alice (engineer): 12 👍
Bob (engineer): 8 👍
Carol (engineer): 3 👍
You (admin): 18 👍

With v2 defaults:

Step	Calculation	Result
Exclude admin	Drop your 18 votes	12 + 8 + 3 = 23
Cap each user at 10	Alice 12 → 10, Bob 8 → 8, Carol 3 → 3	10 + 8 + 3 = 21 signals
Distinct user check	3 non-admin users, need ≥ 3	✅ pass
Hours saved (low)	21 × 5 min ÷ 60	1.75 h
Hours saved (high)	21 × 25 min ÷ 60	8.75 h
Money saved (at $75/h)	1.75 × 75 to 8.75 × 75	$131 – $656

The dashboard shows: 1.75 – 8.75 h saved · ~$131 – $656 · 3 users · 21 signals.

For comparison, v1 would have shown (12 + 8 + 3 + 18) × 15 min ÷ 60 = 10.25 h, and 10.25 h × $75 ≈ $769. That is above even the high end of the v2 range, and it is inflated by your own 18 votes and by Alice's extra 2 (above the cap).
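
If you prefer to check the arithmetic yourself, the worked example can be reproduced with a short Python sketch. The function and parameter names are made up for illustration and the real implementation may differ in detail (for example, in how the 90-day window is applied); the defaults mirror the documented defaults.

```python
def roi_v2(votes, admins, hourly_rate=75.0, minutes_low=5, minutes_high=25,
           min_distinct_users=3, max_votes_per_user=10, exclude_admin=True):
    """Return the v2 ROI band, or None when the signal is insufficient."""
    counted = {}
    for user, thumbs_up in votes.items():
        if exclude_admin and user in admins:
            continue  # VELOCITY_ROI_EXCLUDE_ADMIN
        if thumbs_up > 0:
            # VELOCITY_ROI_MAX_VOTES_PER_USER: cap each user's contribution.
            counted[user] = min(thumbs_up, max_votes_per_user)
    if len(counted) < min_distinct_users:
        return None  # dashboard would show "Insufficient signal"
    signals = sum(counted.values())
    hours = (signals * minutes_low / 60, signals * minutes_high / 60)
    dollars = (hours[0] * hourly_rate, hours[1] * hourly_rate)
    return {"users": len(counted), "signals": signals,
            "hours": hours, "dollars": dollars}


def roi_v1(votes, minutes_saved=15, hourly_rate=75.0):
    """v1: every vote counts equally, single point estimate."""
    return sum(votes.values()) * minutes_saved / 60 * hourly_rate


votes = {"alice": 12, "bob": 8, "carol": 3, "you": 18}
print(roi_v2(votes, admins={"you"}))
print(roi_v1(votes))
```

Run as written, the v2 result has 3 users, 21 signals, 1.75 to 8.75 hours and $131.25 to $656.25, while v1 returns $768.75.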

Which knob should I change?¶

This table tells you which environment variable to adjust for the situation you're in. You only need to set the ones you want to change — defaults work for most organisations.

Your situation	Knob to change	Suggested value
My engineers are expensive (FAANG, senior)	`VELOCITY_HOURLY_RATE`	Raise to `100`–`150`. Use loaded cost (salary + benefits + overhead), not just base salary.
My team is mostly junior / offshore	`VELOCITY_HOURLY_RATE`	Lower to `40`–`60`.
Most queries are quick lookups ("what's the staging URL?")	`VELOCITY_ROI_MIN_MINUTES_HIGH`	Lower to `10`. Don't claim 25 minutes saved on a 1-minute lookup.
Most queries are deep investigations (incident postmortems, architecture questions)	`VELOCITY_ROI_MIN_MINUTES_HIGH`	Raise to `45` or `60`.
I report this number to executives or customers	`VELOCITY_ROI_MIN_DISTINCT_USERS`	Raise to `10` so you have a more robust statistical base.
Tiny team (under 20 engineers total)	`VELOCITY_ROI_MIN_DISTINCT_USERS`	Keep at `3`. Lower is dishonest.
One or two power-users dominate adoption	`VELOCITY_ROI_MAX_VOTES_PER_USER`	Lower to `5`. Tighter cap = less skew.
Adoption is broad and even across the team	`VELOCITY_ROI_MAX_VOTES_PER_USER`	Raise to `20`. Caps rarely bind.
I want the old (inflated) number back	`VELOCITY_ROI_V2_ENABLED`	Set to `false`. v1 reactivates immediately. Not recommended.

Where to set these¶

In Helm (values.yaml):

velocity:
  hourlyRate: 100
  roiMinDistinctUsers: 10
  roiMaxVotesPerUser: 5

Or as environment variables (Docker / direct deploy):

export VELOCITY_HOURLY_RATE=100
export VELOCITY_ROI_MIN_DISTINCT_USERS=10
export VELOCITY_ROI_MAX_VOTES_PER_USER=5

What if v2 makes my number drop?¶

It probably will. That's the point — v1 was inflated. The v2 number is the one you can defend in a board meeting. Past snapshots are kept unchanged in the database; v2 only changes what the live dashboard shows. You can switch back to v1 at any time by setting VELOCITY_ROI_V2_ENABLED=false.

Tribal Knowledge — how the number is calculated¶

The "Tribal Knowledge" card tells you what share of your knowledge domains (Confluence spaces, Slack channels, GitHub repos) are dangerously concentrated — where only one or two people have the context to answer questions. A high number means key knowledge lives in a few people's heads; if they leave or go on vacation, work stalls.

This metric had the same v1 inflation problem as ROI:

The v1 problem¶

The original formula counted every user who gave positive feedback on a doc in that domain as an "expert." Three problems:

The admin was counted. When you (operating DocBrain) clicked 👍 on an answer in any domain, you registered as an expert in that domain. On a young deployment where you're the only feedback giver, every domain showed exactly one expert (you) — making 100% of domains "tribal" by the ≤ 2 threshold.
The threshold was hardcoded. "≤ 2 experts = tribal" is right for some orgs but absurd for others. A 5-person startup has tribal knowledge by definition (everyone wears many hats). A 500-person org probably wants ≥ 5 experts before considering a domain healthy.
No "insufficient signal" check. With only 2 domains showing any feedback, calling it "50% tribal" is meaningless — you'd need far more data to draw a conclusion. v1 showed the number anyway.

How v2 fixes it (the recommended default)¶

Three corrections:

Count experts from real knowledge domains. v2 reads the ownership substrate — real domain entities with their attributed contributors — instead of grouping feedback by raw source containers (a Confluence space or Slack channel masquerading as a "domain"). The expert count reflects genuine subject-matter ownership.
Make the threshold tunable. VELOCITY_TRIBAL_MAX_EXPERTS (default 2) sets the cutoff: domains with ≤ this many distinct experts are tribal. A small team might lower to 1; a large org might raise to 5.
Require enough domains to draw a conclusion. If fewer than VELOCITY_TRIBAL_MIN_DOMAINS domains have any positive-feedback signal (default 3), the dashboard shows "Insufficient signal" instead of a misleading percentage.
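
As a rough illustration of how the threshold and the gate interact, here is a minimal Python sketch. It assumes the input is simply a count of distinct experts per domain, that a domain "has signal" when that count is above zero, and that the percentage is taken over domains with signal. The domain names and the function name are hypothetical.

```python
def tribal_share(experts_per_domain, max_experts=2, min_domains=3):
    """Percentage of domains considered tribal, or None if signal is too thin."""
    with_signal = {d: n for d, n in experts_per_domain.items() if n > 0}
    if len(with_signal) < min_domains:
        return None  # dashboard would show "Insufficient signal"
    tribal = sum(1 for n in with_signal.values() if n <= max_experts)
    return 100.0 * tribal / len(with_signal)


domains = {"payments": 1, "deploys": 2, "search": 4, "auth": 6}
print(tribal_share(domains))                           # 50.0
print(tribal_share(domains, max_experts=5))            # 75.0
print(tribal_share({"payments": 1, "deploys": 2}))     # None
```

Raising `VELOCITY_TRIBAL_MAX_EXPERTS` from 2 to 5 moves the "search" domain into the tribal bucket, which is why the percentage rises for the same data.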

Which knob should I change?¶

Your situation	Knob	Suggested value
Small team (≤ 20 engineers)	`VELOCITY_TRIBAL_MAX_EXPERTS`	Keep at `2`. Tribal in small teams is normal but worth surfacing.
Large org (100+ engineers)	`VELOCITY_TRIBAL_MAX_EXPERTS`	Raise to `5`. Anything fewer than 5 active contributors is a bus-factor risk at scale.
Just rolled out DocBrain; only a handful of users	`VELOCITY_TRIBAL_MIN_DOMAINS`	Keep at `3`. Wait for adoption; "Insufficient signal" is the honest answer.
I want the old (inflated) number	`VELOCITY_TRIBAL_V2_ENABLED`	Set to `false`. Not recommended.

Where to set these¶

In Helm (values.yaml):

velocity:
  tribalMaxExpertsPerDomain: 5
  tribalMinDomainsWithSignal: 10

Or as environment variables:

export VELOCITY_TRIBAL_MAX_EXPERTS=5
export VELOCITY_TRIBAL_MIN_DOMAINS=10

Forecast Trend — how "Improving / Stable / Worsening" is decided¶

The dashboard's "Trend" label above Knowledge Health (homepage) classifies your gap-resolution velocity over the last 4 weeks. It reads from /api/v1/autopilot/forecast.

The v1 problem¶

The v1 formula reported a definitive verdict on any non-zero amount of data:

if avg_new == 0           → "stable"
ratio = avg_resolved / avg_new
ratio ≥ 0.75              → "improving"
ratio ≥ 0.40              → "stable"
otherwise                 → "worsening"

Two failure modes on real deployments:

Single-event fluke. One gap created last week, one resolved the same week → ratio = 1.0 → reported "improving" even though the sample is statistically meaningless.
"Stable" overloaded. Both "no gap activity at all" and "moderate resolution rate" map to "stable." Operators can't tell "healthy quiet corpus" from "we don't have enough data."

How v2 fixes it¶

Three corrections, mirroring ROI v2 and Tribal v2:

Insufficient-signal gate. When fewer than AUTOPILOT_TREND_MIN_EVENTS (default 5) total gap events (new + resolved) have occurred in the 4-week window, the dashboard shows "Trend: Insufficient signal" rather than guessing.
"No gaps open" as a distinct positive state. When the corpus has zero new gaps AND zero currently-open gaps in the window, that's actively healthy — reported as "Trend: No gaps open" (green), not the neutral "stable."
Configurable thresholds. The 0.75 and 0.40 cutoffs are now AUTOPILOT_TREND_IMPROVING_THRESHOLD and AUTOPILOT_TREND_WORSENING_THRESHOLD. A strict ops team might want improving ≥ 0.90; a lenient team ≥ 0.60.
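
Put together, the three corrections might look like the following Python sketch. Treat it as an approximation: the function name and return labels are invented, and the order of the checks (the "no gaps open" test before the insufficient-signal gate) is an assumption, since the text above does not say which takes precedence.

```python
def trend_v2(new_gaps, resolved_gaps, open_gaps,
             min_events=5, improving=0.75, worsening=0.40):
    """Classify the 4-week window from gap totals.

    new_gaps / resolved_gaps are totals over the window; open_gaps is the
    number currently open. The avg_resolved / avg_new ratio from v1 equals
    resolved_gaps / new_gaps because both averages share the same window.
    """
    if new_gaps == 0 and open_gaps == 0:
        return "no_gaps_open"           # distinct positive state
    if new_gaps + resolved_gaps < min_events:
        return "insufficient_signal"    # AUTOPILOT_TREND_MIN_EVENTS gate
    if new_gaps == 0:
        return "stable"                 # same fallback as v1
    ratio = resolved_gaps / new_gaps
    if ratio >= improving:
        return "improving"
    if ratio >= worsening:
        return "stable"
    return "worsening"


print(trend_v2(1, 1, 0))     # insufficient_signal (the v1 "fluke" case)
print(trend_v2(0, 0, 0))     # no_gaps_open
print(trend_v2(10, 9, 3))    # improving
print(trend_v2(10, 5, 8))    # stable
print(trend_v2(10, 2, 12))   # worsening
```

The first call is the single-event fluke described earlier: v1 would have reported "improving", while the gate withholds a verdict.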

Which knob should I change?¶

Your situation	Knob	Suggested value
Brand-new deployment; want to wait for real signal	`AUTOPILOT_TREND_MIN_EVENTS`	Keep at `5`. Lower to `3` if you want a verdict sooner.
Large org with high gap volume	`AUTOPILOT_TREND_MIN_EVENTS`	Raise to `20` so a few outlier weeks don't trigger early verdicts.
Strict definition of "improving"	`AUTOPILOT_TREND_IMPROVING_THRESHOLD`	Raise to `0.90`.
Generous "improving" definition	`AUTOPILOT_TREND_IMPROVING_THRESHOLD`	Lower to `0.60`.
I want the old (definitive-on-thin-data) formula	`AUTOPILOT_TREND_V2_ENABLED`	Set to `false`. Not recommended.

Where to set these¶

In Helm (values.yaml):

autopilot:
  trendMinEvents: 10
  trendImprovingThreshold: 0.80

Or as environment variables:

export AUTOPILOT_TREND_MIN_EVENTS=10
export AUTOPILOT_TREND_IMPROVING_THRESHOLD=0.80

Two "Trend" cards — what's the difference?¶

DocBrain shows trend labels in two places:

Home page "Gap Trend" — measures gap-cluster dynamics (autopilot's view of "are knowledge gaps growing or shrinking?"). Sources from the autopilot_gap_clusters table; tunable via AUTOPILOT_TREND_* env vars described in the section above.
/velocity "Maintenance Trend" — measures doc maintenance flow vs stale debt across the selected time window. Sources from the learning_velocity_snapshots table; tunable via the variable below.

The two can disagree honestly. Gaps can be quiet (no new questions that retrieval can't answer) while docs are quietly going stale, or vice versa. The labels are distinct so the operator never sees two unqualified "Trend:" verdicts that look contradictory.

Maintenance Trend — insufficient-signal gate¶

Variable	Default	Description
`VELOCITY_MAINTENANCE_TREND_MIN_SNAPSHOTS`	`4`	Minimum daily snapshots that carry any flow signal (docs created/updated, gaps opened/resolved > 0) before the Maintenance Trend reports an accelerating/stable/decelerating verdict. Below this, the card shows "Insufficient signal." Raise on noisy corpora; lower for tiny pilots.

Helm:

velocity:
  maintenanceTrendMinSnapshots: 7

Knowledge Stream¶

Variable	Default	Description
`STREAM_ENABLED`	`false`	Enable background knowledge stream emission
`STREAM_INTERVAL_MINUTES`	`30`	How often the stream background task runs
`STREAM_INCIDENT_WARNING_MIN_USERS`	`2`	Minimum unique users hitting an unanswered question to emit an incident warning
`STREAM_DECAY_THRESHOLD`	`0.5`	Freshness score below which a decay alert is emitted

Event Bus¶

The event bus is internal pub/sub infrastructure — always enabled, no opt-in required. Every significant action (document ingest, gap detection, draft generation, etc.) emits a typed event that subscribers can react to.

Variable	Default	Description
`EVENT_BUS_CAPACITY`	`4096`	Broadcast channel buffer size. Increase if subscribers lag under high event volume. Max: 65536.
`EVENT_LOG_RETENTION_DAYS`	`90`	Days to retain events in the `event_log` table before purging.

Admin API endpoints:

Method	Path	Description
`GET`	`/api/v1/events`	Query the persistent event log. Supports `?type=gap.detected&since=2026-03-01&limit=100&offset=0`.
`GET`	`/api/v1/events/stream`	SSE stream of real-time events. Max 10 concurrent connections.

Both endpoints require admin role.

Knowledge Fragments¶

Knowledge fragments are first-class units of knowledge — smaller than documents, richer than chunks. They capture decisions, facts, caveats, procedures, and context from PRs, commits, IDE annotations, conversations, CI/CD pipelines, and manual entry.

Fragments are routed by confidence score: high-confidence fragments are auto-indexed into search, medium-confidence go to a review queue, and low-confidence are auto-discarded.

Variable	Default	Description
`FRAGMENT_AUTO_INDEX_THRESHOLD`	`0.7`	Minimum confidence score to auto-index a fragment into OpenSearch.
`FRAGMENT_REVIEW_THRESHOLD`	`0.4`	Minimum confidence for the review queue. Fragments below this are auto-discarded.
`FRAGMENT_MAX_CONTENT_LENGTH`	`10000`	Maximum fragment content length in characters.
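
The routing described above amounts to two threshold comparisons. A minimal Python sketch, with a hypothetical function name and route labels, and assuming both thresholds are inclusive lower bounds (the table calls them minimums):

```python
def route_fragment(confidence, auto_index_threshold=0.7, review_threshold=0.4):
    """Decide where a fragment goes based on its confidence score."""
    if confidence >= auto_index_threshold:
        return "auto_index"      # FRAGMENT_AUTO_INDEX_THRESHOLD
    if confidence >= review_threshold:
        return "review_queue"    # FRAGMENT_REVIEW_THRESHOLD
    return "discard"


for score in (0.85, 0.7, 0.55, 0.39):
    print(score, route_fragment(score))
```

Raising `FRAGMENT_AUTO_INDEX_THRESHOLD` shifts borderline fragments from search into the review queue; raising `FRAGMENT_REVIEW_THRESHOLD` shrinks the queue by discarding more.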

Fragment Clustering & Auto-Composition¶

Semantic clustering groups related fragments by topic using embedding similarity (DBSCAN-style greedy algorithm). When a cluster meets composability criteria (5+ fragments, diverse sources, 500+ words), it can be auto-composed into a documentation draft via the API.

Variable	Default	Description
`FRAGMENT_CLUSTERING_ENABLED`	`true`	Enable or disable the fragment clustering endpoint.
`FRAGMENT_CLUSTER_THRESHOLD`	`0.80`	Cosine similarity threshold for grouping fragments (0.60 = loose, 0.90 = strict).
`FRAGMENT_MIN_CLUSTER_SIZE`	`3`	Minimum fragments required to form a cluster.
`FRAGMENT_MIN_SOURCE_DIVERSITY`	`2`	Minimum distinct source types for a cluster to be composable.
`FRAGMENT_MAX_PER_CLUSTERING_RUN`	`2000`	Maximum fragments loaded per clustering run (memory/cost control).

CI/CD Pipeline Capture¶

Automated knowledge extraction from merged PRs and deployments. When enabled, DocBrain provides API endpoints that CI/CD pipelines can call to extract knowledge fragments from pull requests and deployment events. Uses the fast/cheap LLM model to keep costs low at high volume.

Variable	Default	Description
`CI_ANALYZE_ENABLED`	`true`	Enable or disable the CI/CD capture endpoints (`/api/v1/ci/analyze` and `/api/v1/ci/deploy-capture`).

See the API Reference for endpoint details and the GitHub Action setup guide.

Conversation Auto-Distillation¶

Automatically extracts structured knowledge fragments from captured conversations — Slack threads (via message shortcut, @DocBrain capture, or /docbrain capture) and GitHub PR discussions (via @docbrain capture). After a successful capture, DocBrain runs LLM-powered distillation in the background to identify decisions, facts, caveats, procedures, and context embedded in the conversation.

Distillation is fire-and-forget: it never affects capture response time. Failures are logged and counted in metrics, but they don't block the capture path.

Variable	Default	Description
`DISTILLATION_ENABLED`	`true`	Enable or disable conversation auto-distillation.
`DISTILLATION_MAX_CONCURRENT`	`3`	Maximum concurrent LLM distillation calls (bounded by semaphore).
`DISTILLATION_MAX_CONTENT_CHARS`	`8000`	Maximum conversation characters sent to the LLM. Longer conversations are truncated (tail-biased — keeps the most recent messages).
`DISTILLATION_MAX_FRAGMENTS`	`5`	Maximum knowledge fragments extracted per conversation.
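
To make the concurrency and truncation settings concrete, here is a small Python sketch using `asyncio`. It is not DocBrain's code: `distill` is a stand-in for the real LLM call, truncation is done per character rather than per message, and error handling is left out.

```python
import asyncio


def truncate_tail(conversation, max_chars=8000):
    """Keep only the last max_chars characters (tail-biased truncation)."""
    if len(conversation) <= max_chars:
        return conversation
    return conversation[len(conversation) - max_chars:]


async def distill_all(conversations, distill, max_concurrent=3, max_chars=8000):
    """Run distill() over conversations, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def distill_one(text):
        async with semaphore:  # blocks while max_concurrent calls are running
            return await distill(truncate_tail(text, max_chars))

    return await asyncio.gather(*(distill_one(t) for t in conversations))


print(truncate_tail("oldest message ... newest message", max_chars=14))
```

The print call outputs `newest message`: when a conversation is too long, the oldest text is what gets dropped.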

Governance SLA Checker¶

The SLA checker runs as a periodic background task that detects breaches across four entity types: gap acknowledgment, gap resolution, draft review, and document freshness. SLA thresholds are stored in the database (per-space overridable via the API) — these settings control the checker's operational behavior.

Variable	Default	Description
`SLA_CHECKER_INTERVAL_HOURS`	`1`	How often the SLA breach checker runs (hours).
`SLA_CHECKER_QUERY_TIMEOUT_SECS`	`30`	Per-entity-type query timeout in seconds.
`SLA_CHECKER_MAX_CANDIDATES`	`5000`	Maximum candidate entities scanned per type per run.
`SLA_CHECKER_MAX_EVENTS_PER_RUN`	`50`	Maximum `SlaBreached` events emitted per run (prevents webhook flooding).

See the API Reference — Governance SLAs for endpoint documentation.

Expertise Ownership Gate¶

The expertise scorer attributes ownership of a subject area to a team based on captured signals (questions answered, documents authored, reviews, etc.). Before it publishes a (subject, team) attribution, it must clear several thresholds; if any fails, it abstains rather than guess. The defaults are deliberately abstain-heavy (high precision over recall) so a fresh deployment does not surface low-confidence attributions.

Variable	Default	Description
`EXPERTISE_GATE_V_MIN`	`1.0`	Minimum decayed team score (volume gate).
`EXPERTISE_GATE_N_MIN`	`5`	Minimum raw signal count (volume gate).
`EXPERTISE_GATE_M_ASKERS`	`2`	Minimum number of distinct people who asked about the subject.
`EXPERTISE_GATE_MARGIN_FRAC`	`0.25`	Minimum fraction by which the leading team must beat the runner-up.
`EXPERTISE_GATE_DIVERSITY_MIN`	`2`	Minimum number of distinct signal types supporting the attribution.

UI accuracy gate¶

A second gate controls whether confident ownership attributions are shown to end users at all. Confident attributions surface only when the measured (audited) confidently-wrong rate is within the configured bar, the gate is explicitly enabled, and there is enough audited evidence to trust the rate. The gate is disabled by default, so a new deployment abstains in the UI until an operator proves accuracy and sets the bar from the measured risk-coverage curve.

Variable	Default	Description
`EXPERTISE_GATE_UI_ENABLED`	`false`	Master switch. When `false`, the UI always abstains on confident attributions.
`EXPERTISE_GATE_UI_CONFIDENTLY_WRONG_BAR`	`0.0`	Maximum audited confidently-wrong rate at which confident attributions may be shown. At the default `0.0`, only a measured 0% wrong rate clears the gate.
`EXPERTISE_GATE_UI_MIN_AUDIT_SAMPLES`	`30`	Minimum number of audited labels required before the gate can open. Insufficient evidence never clears the gate — "no evidence" is not "0% wrong".
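
The three conditions combine as a simple conjunction. The Python sketch below is an illustration with hypothetical names; in particular, it assumes the confidently-wrong rate is the number of wrong labels divided by the number of audited labels.

```python
def ui_gate_open(enabled, confidently_wrong, audited_samples,
                 wrong_bar=0.0, min_audit_samples=30):
    """Return True only if confident attributions may be shown in the UI."""
    if not enabled:
        return False  # EXPERTISE_GATE_UI_ENABLED=false always abstains
    if audited_samples < max(min_audit_samples, 1):
        return False  # no evidence is not the same as 0% wrong
    wrong_rate = confidently_wrong / audited_samples
    return wrong_rate <= wrong_bar


print(ui_gate_open(False, 0, 100))                 # False: gate disabled
print(ui_gate_open(True, 0, 10))                   # False: too few audits
print(ui_gate_open(True, 0, 40))                   # True: 0% wrong, enough audits
print(ui_gate_open(True, 1, 40))                   # False: 2.5% > 0.0 bar
print(ui_gate_open(True, 1, 40, wrong_bar=0.05))   # True: 2.5% <= 5% bar
```

The last two calls show why the bar should be set from measured data: with the default `0.0`, a single audited mistake keeps the gate closed.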

Doc-Improvement Evidence Loop¶

The doc-improvement evidence chain reports how far each auto-published fix progressed along the proven path (published → content-changed → re-ingest-confirmed → human-approved → measured freshness/quality delta), with each link shown at its true strength rather than as a single "improved" flag.

The re-ingest-confirm timeout is load-bearing: a published fix whose re-ingest has not been confirmed live within this window is reported as "stale — published but never confirmed live" (signalling a downstream failure) rather than the hopeful "published, not yet confirmed live" (the normal in-flight state while the batch sync catches up). The default is long enough that a normal sync always lands first, so "stale" reliably indicates a real problem, not a slow pipeline.

Variable	Default	Description
`IMPROVEMENT_REINGEST_CONFIRM_TIMEOUT_HOURS`	`72`	Hours after publish, with no re-ingest confirmation, before a fix is reported "stale — published but never confirmed live" instead of "published, not yet confirmed live".
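
The timeout's effect on the reported state can be sketched as follows. The function name and the short status labels are hypothetical, and whether the boundary is inclusive or exclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone


def reingest_status(published_at, confirmed_at, now, timeout_hours=72):
    """Classify a published fix by its re-ingest confirmation state."""
    if confirmed_at is not None:
        return "confirmed"
    if now - published_at > timedelta(hours=timeout_hours):
        return "stale"      # published but never confirmed live
    return "in_flight"      # published, not yet confirmed live


published = datetime(2026, 3, 1, 9, 0, tzinfo=timezone.utc)
print(reingest_status(published, None, published + timedelta(hours=6)))    # in_flight
print(reingest_status(published, None, published + timedelta(hours=80)))   # stale
print(reingest_status(published, published + timedelta(hours=2),
                      published + timedelta(hours=80)))                    # confirmed
```

Lowering the timeout makes "stale" appear sooner but risks flagging fixes that are merely waiting for a slow sync.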

External Connectors (HTTP Connector Protocol)¶

External connectors are stateless HTTP servers that implement a simple REST contract (GET /health, POST /documents/list, POST /documents/fetch). DocBrain calls them on a configurable cron schedule to ingest documents from external systems. Connectors are registered and managed via the admin API.

The connector scheduler runs as a background task, polling every 60 seconds for connectors whose cron schedule is due. A circuit breaker automatically disables connectors after repeated failures.

Variable	Default	Description
`CONNECTOR_ENABLED`	`true`	Enable/disable the connector scheduler
`CONNECTOR_MAX_CONCURRENT_SYNCS`	`3`	Max connectors syncing simultaneously (1-20)
`CONNECTOR_MAX_PAGES_PER_SYNC`	`200`	Max list pages fetched per sync
`CONNECTOR_MAX_DOCUMENTS_PER_SYNC`	`5000`	Max documents ingested per sync
`CONNECTOR_FETCH_BATCH_SIZE`	`50`	Documents fetched per batch (1-200)
`CONNECTOR_REQUEST_TIMEOUT_SECS`	`30`	HTTP timeout for individual connector requests (5-300 seconds)
`CONNECTOR_SYNC_TIMEOUT_SECS`	`3600`	Overall sync timeout per connector (60-7200 seconds)
`CONNECTOR_MAX_RESPONSE_BYTES`	`10485760`	Max response body size from connector (10 MB)
`CONNECTOR_CIRCUIT_BREAKER_THRESHOLD`	`5`	Consecutive failures before auto-disabling a connector
`CONNECTOR_ALLOW_INTERNAL`	`false`	Allow connector URLs on private/internal IP addresses. Not recommended for production.

See the API Reference — Connectors for endpoint documentation and the connector protocol spec.

Webhooks (Outbound)¶

Outbound webhook subscriptions let you push DocBrain events to external systems — Slack bots, CI/CD pipelines, PagerDuty, custom dashboards, etc. DocBrain signs every delivery with HMAC-SHA256, retries with exponential backoff, and automatically disables subscriptions that fail repeatedly (circuit breaker).

Variable	Default	Description
`WEBHOOK_DELIVERY_TIMEOUT_SECONDS`	`10`	HTTP timeout per webhook delivery attempt (1-60 seconds)
`WEBHOOK_MAX_RETRIES`	`4`	Maximum delivery attempts before giving up (1-10)
`WEBHOOK_CIRCUIT_BREAKER_THRESHOLD`	`10`	Consecutive failures before auto-disabling a subscription (3-100)
`ALLOW_INTERNAL_WEBHOOKS`	`false`	Allow delivery to private/internal IP addresses (10.x, 172.16.x, 192.168.x). Not recommended for production.
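
On the receiving side, an HMAC-SHA256 signature is typically checked by recomputing it over the raw request body and comparing in constant time. The Python sketch below shows that general pattern only. It assumes the signature is hex-encoded and computed over the raw body with your subscription secret; the actual header name, encoding, and signed payload are defined in the API Reference, not here.

```python
import hashlib
import hmac


def compute_signature(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body (encoding assumed)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Compare in constant time to avoid leaking information via timing."""
    expected = compute_signature(secret, body)
    return hmac.compare_digest(expected, received)


secret = b"example-subscription-secret"          # hypothetical value
body = b'{"type": "gap.detected", "id": "123"}'  # hypothetical payload
signature = compute_signature(secret, body)

print(verify_signature(secret, body, signature))                 # True
print(verify_signature(secret, body + b" tampered", signature))  # False
```

Verify against the bytes exactly as received; re-serialising the JSON before hashing usually changes the digest.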

See the API Reference — Webhooks for endpoint documentation and event types.

Style Rules Engine¶

The style rules engine provides configurable linting for documentation consistency. Rules are always enabled — no opt-in required. Rules are managed via the API (CRUD + YAML import/export) and stored in PostgreSQL.

Rules are scoped either globally (space = null) or per-space. When linting, global rules apply to all content, and space-specific rules override global rules with the same (rule_type, name) key.
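
The override behaviour can be pictured as a dictionary merge keyed on (rule_type, name). This Python sketch is illustrative only: the function name is hypothetical, and rules are modelled as plain dicts with an assumed `severity` field.

```python
def effective_rules(global_rules, space_rules):
    """Merge rule sets; a space rule replaces the global rule with the same key."""
    merged = {(r["rule_type"], r["name"]): r for r in global_rules}
    merged.update({(r["rule_type"], r["name"]): r for r in space_rules})
    return list(merged.values())


global_rules = [
    {"rule_type": "terminology", "name": "avoid-just", "severity": "warning"},
    {"rule_type": "formatting", "name": "max-sentence-length", "severity": "info"},
]
space_rules = [
    # Same key as a global rule, so it overrides that rule in this space.
    {"rule_type": "formatting", "name": "max-sentence-length", "severity": "warning"},
]

for rule in effective_rules(global_rules, space_rules):
    print(rule["name"], rule["severity"])
```

In this example the space sees two rules, with `max-sentence-length` raised from `info` to `warning` while `avoid-just` is inherited unchanged.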

Five default rules are seeded on first migration:

Rule	Type	Default Severity
`avoid-simple`	terminology	warning
`avoid-just`	terminology	warning
`max-heading-depth` (H4)	formatting	warning
`max-sentence-length` (40 words)	formatting	info
`require-intro`	structure	warning

API endpoints: See API Reference — Style Rules Engine for full endpoint documentation.

Layered policy + file-based puller: For the full model — global vs. space overrides, overridable vs. mandatory enforcement, and the .docbrain/style.md file-based puller that lets teams version-control their style policy in a source repo — see Style Policy. A working example file lives at examples/style/.docbrain/style.md in this repo.

There are no environment variables for the in-database style rules engine — all limits are compile-time constants. The file-based puller has one environment variable: POLICY_FILE_SYNC_INTERVAL_SECS (default 900, set to 0 to disable the scheduled-pull background job).
