Architecture

System Overview

DocBrain is a Rust-based documentation intelligence platform that combines RAG (Retrieval-Augmented Generation) with a multi-tier memory system, freshness scoring, intent-adaptive responses, and an autonomous Documentation Autopilot that identifies gaps and generates draft content.

┌─────────────┐     ┌──────────────────────────┐     ┌───────────────┐
│   Web UI    │────>│       API Server          │────>│  LLM Provider │
│  (Next.js)  │     │   (Rust / Axum)           │     │  (pluggable)  │
└─────────────┘     └────────────┬──────────────┘     └───────────────┘
                                 │
                    ┌────────────┼────────────┐
                    │            │            │
              ┌─────┴────┐ ┌────┴─────┐ ┌───┴────┐
              │PostgreSQL│ │OpenSearch│ │ Redis  │
              │(memory + │ │(vectors +│ │(cache +│
              │ autopilot│ │  search) │ │session)│
              │ + scores)│ │          │ │        │
              └──────────┘ └──────────┘ └────────┘

Core Components

API Server (`docbrain-server`)

Axum-based HTTP server exposing the REST API. Handles authentication (API keys with RBAC), rate limiting, SSE streaming, and routes requests through the RAG pipeline. Serves both the Q&A endpoints and the Autopilot management API.

RAG Pipeline (`docbrain-core`)

The intelligence layer. For each query:

1. Intent Classification: Determines the query type (factual, procedural, troubleshooting, conceptual, comparative)
2. Query Rewriting: Reformulates the query for better retrieval using conversation context
3. Hybrid Search: Combines k-NN vector similarity with BM25 keyword matching
4. Memory Enrichment: Augments context from the 4-tier memory system
5. Reference Enrichment: Fetches chunks from cross-referenced documents (linked PRs, Jira tickets, Confluence pages) to provide broader context
6. Response Generation: The LLM synthesizes an answer with source attribution
7. Caching: Semantic cache for repeated questions
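The text above does not say how the k-NN and BM25 result lists are merged in the Hybrid Search step. As a minimal sketch, assuming reciprocal rank fusion (a common fusion technique, not confirmed as DocBrain's method), the merge could look like the following. The function name, the document IDs, and the smoothing constant `k` are illustrative assumptions.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion (assumed technique, not taken from the source).
/// Each ranked list contributes 1 / (k + rank) per document, with rank
/// starting at 1. A higher fused score means a better combined position.
fn reciprocal_rank_fusion(knn: &[&str], bm25: &[&str], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in [knn, bm25] {
        for (idx, doc_id) in ranking.iter().enumerate() {
            let rank = idx as f64 + 1.0;
            *scores.entry(doc_id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Sort by score descending; break ties by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

A document that appears near the top of both lists outranks one that is first in only a single list, which is the property a hybrid retriever usually wants.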

Documentation Autopilot (`docbrain-core/autopilot`)

The autonomous documentation improvement engine. Runs on a daily schedule:

- Gap Analyzer: Scans episodic memory for unanswered queries, negative feedback, and low-confidence answers from the past 30 days. It embeds the queries and groups them with greedy cosine-similarity clustering (threshold: 0.82); each cluster represents a documentation gap. Severity is derived from query volume (critical: 25+, high: 11-24, medium: 4-10, low: 1-3).
- Doc Drafter: Takes a gap cluster and generates a draft document. It uses sample queries from the cluster to search existing docs for partial context, classifies the needed content type (runbook, FAQ, guide, troubleshooting, reference) via the LLM, then generates a full draft with proper formatting and source attribution.
- Digest Builder: Compiles weekly documentation health reports that combine query volume, unanswered rate, top gap clusters, new drafts, and stale document count. Reports are formatted as Slack Block Kit messages for team delivery.
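The severity bands used by the Gap Analyzer can be sketched as a simple lookup on cluster size. This is an illustration only: the type and function names are hypothetical, and a volume of exactly 25 is treated as critical, which is an assumption about where the boundary falls.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    Low,
    Medium,
    High,
    Critical,
}

/// Maps the number of queries in a gap cluster to a severity band.
/// Returns None for an empty cluster, which is not a gap at all.
/// Assumption: exactly 25 queries counts as Critical.
fn severity_for(query_count: u32) -> Option<Severity> {
    match query_count {
        0 => None,
        1..=3 => Some(Severity::Low),
        4..=10 => Some(Severity::Medium),
        11..=24 => Some(Severity::High),
        _ => Some(Severity::Critical),
    }
}
```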

Ingestion Pipeline (`docbrain-ingest`)

Fetches documents from configured sources (Confluence, GitHub, local files), converts to Markdown, chunks with heading-aware splitting, generates embeddings, and indexes in OpenSearch. During ingestion, cross-document references (URLs to GitHub PRs, GitLab MRs, Jira tickets, Confluence pages, etc.) are automatically extracted from content and stored as a reference graph in PostgreSQL. Referenced document IDs are attached to each chunk in OpenSearch for enrichment at query time.
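To illustrate what heading-aware splitting might mean in practice, here is a minimal sketch that cuts Markdown at ATX headings and keeps the heading trail with each chunk. It is not DocBrain's chunker: the `Chunk` struct and function names are hypothetical, there is no size limit or overlap, and `#` lines inside fenced code blocks are not handled.

```rust
#[derive(Debug, PartialEq)]
struct Chunk {
    /// Headings from the document root down to this chunk, e.g. ["Deploy", "Canary"].
    heading_path: Vec<String>,
    body: String,
}

/// Splits Markdown into one chunk per heading section.
fn chunk_by_headings(markdown: &str) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut path: Vec<String> = Vec::new();
    let mut body = String::new();

    for line in markdown.lines() {
        // Count leading '#' characters; a heading has 1..=6 of them plus a space.
        let level = line.chars().take_while(|c| *c == '#').count();
        let is_heading = level >= 1 && level <= 6 && line[level..].starts_with(' ');
        if is_heading {
            flush(&mut chunks, &path, &mut body);
            // Drop headings at the same or deeper level, then record the new one.
            path.truncate(level - 1);
            path.push(line[level..].trim().to_string());
        } else {
            body.push_str(line);
            body.push('\n');
        }
    }
    flush(&mut chunks, &path, &mut body);
    chunks
}

/// Emits the accumulated body as a chunk if it contains any text.
fn flush(chunks: &mut Vec<Chunk>, path: &[String], body: &mut String) {
    let text = body.trim();
    if !text.is_empty() {
        chunks.push(Chunk { heading_path: path.to_vec(), body: text.to_string() });
    }
    body.clear();
}
```

Carrying the heading path alongside the body lets a retriever show where a chunk came from and gives the embedding step extra context to prepend if desired.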

MCP Server (`docbrain-mcp`)

Model Context Protocol server for integration with AI coding tools (Claude Code, Cursor). Includes IDE capture tools for annotating code, capturing commit intent, and surfacing documentation gaps directly in the editor.

CLI (`docbrain-cli`)

Command-line client for interactive Q&A sessions.

4-Tier Memory System

| Tier | Storage | Purpose | TTL |
|---|---|---|---|
| Working | Redis | Current conversation context | Session-scoped |
| Episodic | PostgreSQL + OpenSearch | Past Q&A episodes, feedback | Permanent |
| Semantic | PostgreSQL | Entity graph (services, teams, concepts) | Permanent |
| Procedural | PostgreSQL | Learned rules from feedback patterns | Permanent |

How Memory Works

- Working memory maintains conversation state across turns within a session
- Episodic memory finds similar past questions and their validated answers
- Semantic memory resolves entity references ("the auth service" -> specific service with known dependencies)
- Procedural memory applies learned rules (e.g., "when asked about deployments, always mention the canary process")
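The tier-to-storage mapping described in this section can be written down as a small data structure. The sketch below only restates the table; the enum and method names are hypothetical and do not come from the DocBrain codebase.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemoryTier {
    Working,
    Episodic,
    Semantic,
    Procedural,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Store {
    Redis,
    Postgres,
    OpenSearch,
}

impl MemoryTier {
    /// Backing stores for each tier, as listed in the memory table.
    fn stores(self) -> Vec<Store> {
        match self {
            MemoryTier::Working => vec![Store::Redis],
            MemoryTier::Episodic => vec![Store::Postgres, Store::OpenSearch],
            MemoryTier::Semantic | MemoryTier::Procedural => vec![Store::Postgres],
        }
    }

    /// Only working memory is session-scoped; the other tiers are permanent.
    fn is_permanent(self) -> bool {
        self != MemoryTier::Working
    }
}
```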

How Memory Feeds Autopilot

Episodic memory is the primary data source for gap detection. Every query that receives negative feedback, a `not_found` resolution, or a confidence score below 0.4 becomes a candidate for gap analysis. This creates a closed loop: user questions that expose documentation gaps are automatically surfaced and addressed.
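The candidate rule above reduces to a three-way OR over each stored episode. A minimal sketch follows, assuming a simplified `Episode` record; the struct fields and enum names are hypothetical, and the 30-day window filter is left out for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Feedback {
    Positive,
    Negative,
    Absent,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Resolution {
    Answered,
    NotFound,
}

/// Hypothetical, simplified view of a stored Q&A episode.
struct Episode {
    query: String,
    feedback: Feedback,
    resolution: Resolution,
    confidence: f32,
}

/// Confidence threshold stated in the text.
const LOW_CONFIDENCE: f32 = 0.4;

/// An episode is a gap candidate if any one of the three signals fires.
fn is_gap_candidate(e: &Episode) -> bool {
    e.feedback == Feedback::Negative
        || e.resolution == Resolution::NotFound
        || e.confidence < LOW_CONFIDENCE
}

/// Returns the query text of every candidate, in input order.
fn gap_candidates(episodes: &[Episode]) -> Vec<&str> {
    episodes
        .iter()
        .filter(|e| is_gap_candidate(e))
        .map(|e| e.query.as_str())
        .collect()
}
```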

5-Signal Freshness Scoring

Each document receives a freshness score (0-100) based on:

- Time Decay (30%): How recently the document was edited
- Engagement (20%): View count, query frequency, feedback ratio
- Content Currency (20%): LLM analysis of temporal language ("as of Q1 2024")
- Link Health (15%): Percentage of working links
- Contradiction Detection (15%): Cross-document consistency analysis

Documents are classified as: Fresh (80-100), Review (60-79), Stale (40-59), Outdated (<40).
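The weights and bands above amount to a weighted average followed by a range lookup. The sketch below assumes every signal has already been normalized to 0-100 with higher meaning fresher (the source does not describe the normalization), and it rounds to the nearest integer; all names are hypothetical.

```rust
/// Per-document signals, each assumed to be pre-normalized to 0..=100
/// where a higher value means fresher or healthier.
struct Signals {
    time_decay: u32,
    engagement: u32,
    content_currency: u32,
    link_health: u32,
    consistency: u32, // output of contradiction detection
}

#[derive(Debug, PartialEq, Eq)]
enum FreshnessClass {
    Fresh,
    Review,
    Stale,
    Outdated,
}

/// Weighted sum with the documented weights (30/20/20/15/15),
/// computed in integers and rounded to the nearest whole score.
fn freshness_score(s: &Signals) -> u32 {
    let weighted = 30 * s.time_decay.min(100)
        + 20 * s.engagement.min(100)
        + 20 * s.content_currency.min(100)
        + 15 * s.link_health.min(100)
        + 15 * s.consistency.min(100);
    (weighted + 50) / 100
}

/// Maps a 0..=100 score to the documented bands.
fn classify(score: u32) -> FreshnessClass {
    match score {
        80..=100 => FreshnessClass::Fresh,
        60..=79 => FreshnessClass::Review,
        40..=59 => FreshnessClass::Stale,
        _ => FreshnessClass::Outdated,
    }
}
```

For example, signals of 90, 50, 70, 100, and 80 give 27 + 10 + 14 + 15 + 12 = 78, which falls in the Review band.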

Freshness scores integrate with Autopilot: stale documents that also have high query volume are surfaced as high-priority gaps with draft updates.

Intent-Adaptive Responses

Different query types receive tailored response formats:

| Intent | Response Style |
|---|---|
| Factual | Direct answer with source citation |
| Procedural | Numbered step-by-step instructions |
| Troubleshooting | Diagnostic tree with common causes |
| Conceptual | Explanation with analogies and context |
| Comparative | Structured comparison with trade-offs |
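One way to wire the intent-to-style mapping into generation is to turn the classified intent into a formatting instruction for the LLM prompt. The sketch below is an assumption about how that could be done, not DocBrain's code; `format_instruction` and its prompt wording are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Intent {
    Factual,
    Procedural,
    Troubleshooting,
    Conceptual,
    Comparative,
}

impl Intent {
    /// Response style for each intent, taken from the table above.
    fn response_style(self) -> &'static str {
        match self {
            Intent::Factual => "Direct answer with source citation",
            Intent::Procedural => "Numbered step-by-step instructions",
            Intent::Troubleshooting => "Diagnostic tree with common causes",
            Intent::Conceptual => "Explanation with analogies and context",
            Intent::Comparative => "Structured comparison with trade-offs",
        }
    }
}

/// Hypothetical helper: builds the formatting instruction added to the prompt.
fn format_instruction(intent: Intent) -> String {
    format!("Respond in this format: {}.", intent.response_style())
}
```

Keeping the mapping in an exhaustive `match` means that adding a sixth intent will not compile until a response style is chosen for it.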

Data Flow

Q&A Pipeline

User Question
    │
    ▼
Intent Classification ──> Query Rewriting
    │                          │
    ▼                          ▼
Memory Lookup              Hybrid Search
(episodic, semantic,       (OpenSearch: k-NN + BM25)
 procedural)                   │
    │                          │
    └──────────┬───────────────┘
               │
               ▼
        Context Assembly
        (ranked chunks + memory)
               │
               ▼
        Reference Enrichment
        (fetch chunks from linked docs)
               │
               ▼
        LLM Generation
        (streaming SSE)
               │
               ▼
        Episode Storage
        (for future memory + gap analysis)

Autopilot Pipeline

Episodic Memory (30-day window)
    │
    ▼
Filter: negative feedback, not_found, low confidence
    │
    ▼
Embed queries ──> Greedy cosine clustering (threshold: 0.82)
    │
    ▼
Label clusters via LLM ──> Persist to autopilot_gap_clusters
    │
    ▼
On demand: Generate draft ──> Search existing docs for context
    │                          │
    ▼                          ▼
Classify content type       Assemble context from related docs
    │                          │
    └──────────┬───────────────┘
               │
               ▼
        LLM Draft Generation
               │
               ▼
        Review ──> Publish ──> Re-ingest ──> Better answers
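The "Greedy cosine clustering (threshold: 0.82)" step in the flow above can be sketched as a single pass over the query embeddings. The source does not say whether a new query is compared against a cluster's first member, its centroid, or every member; this sketch assumes the first member (the seed), and the function names are hypothetical.

```rust
/// Cosine similarity of two vectors; returns 0.0 if either has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Greedy single-pass clustering. Each embedding joins the first existing
/// cluster whose seed (first member) is at least `threshold` similar;
/// otherwise it starts a new cluster. Returns clusters of input indices.
fn greedy_cluster(embeddings: &[Vec<f32>], threshold: f32) -> Vec<Vec<usize>> {
    let mut clusters: Vec<Vec<usize>> = Vec::new();
    for (i, embedding) in embeddings.iter().enumerate() {
        let home = clusters
            .iter()
            .position(|members| cosine(&embeddings[members[0]], embedding) >= threshold);
        match home {
            Some(idx) => clusters[idx].push(i),
            None => clusters.push(vec![i]),
        }
    }
    clusters
}
```

Greedy clustering needs no preset cluster count and runs in one pass, at the cost of results that depend on input order. The size of each returned cluster is the query volume that drives severity.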

Database Schema (Autopilot)

| Table | Purpose |
|---|---|
| `autopilot_gap_clusters` | Detected documentation gaps with severity, sample queries, and status |
| `autopilot_drafts` | Generated draft documents linked to gap clusters |
| `autopilot_digests` | Weekly digest send history for deduplication |
| `document_references` | Cross-document reference graph (source → target edges with type and URL) |
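To make the `document_references` edges concrete, here is a rough sketch of how reference URLs could be pulled from document text and typed during ingestion. The struct fields are hypothetical (the actual column names are not given), and the URL patterns are common path conventions used as heuristics, not DocBrain's extraction rules.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RefKind {
    GithubPr,
    GitlabMr,
    JiraTicket,
    ConfluencePage,
    Other,
}

/// Hypothetical in-memory shape of one source -> target edge.
#[derive(Debug)]
struct ReferenceEdge {
    source_doc_id: String,
    target_url: String,
    kind: RefKind,
}

/// Heuristic typing based on well-known URL path conventions.
fn classify_url(url: &str) -> RefKind {
    if url.contains("github.com/") && url.contains("/pull/") {
        RefKind::GithubPr
    } else if url.contains("/-/merge_requests/") {
        RefKind::GitlabMr
    } else if url.contains("/browse/") {
        RefKind::JiraTicket
    } else if url.contains("/wiki/") || url.contains("/display/") {
        RefKind::ConfluencePage
    } else {
        RefKind::Other
    }
}

/// Finds http(s) URLs in whitespace-separated tokens and builds edges.
fn extract_references(source_doc_id: &str, content: &str) -> Vec<ReferenceEdge> {
    let mut edges = Vec::new();
    for token in content.split_whitespace() {
        let start = match token.find("https://").or_else(|| token.find("http://")) {
            Some(pos) => pos,
            None => continue,
        };
        // Strip trailing punctuation that commonly follows a URL in prose.
        let url = token[start..].trim_end_matches(|c: char| ".,;:)]>".contains(c));
        edges.push(ReferenceEdge {
            source_doc_id: source_doc_id.to_string(),
            target_url: url.to_string(),
            kind: classify_url(url),
        });
    }
    edges
}
```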