Architecture¶
System Overview¶
DocBrain is a Rust-based documentation intelligence platform that combines RAG (Retrieval-Augmented Generation) with a multi-tier memory system, freshness scoring, intent-adaptive responses, and an autonomous Documentation Autopilot that identifies gaps and generates draft content.
┌─────────────┐ ┌──────────────────────────┐ ┌───────────────┐
│ Web UI │────>│ API Server │────>│ LLM Provider │
│ (Next.js) │ │ (Rust / Axum) │ │ (pluggable) │
└─────────────┘ └────────────┬──────────────┘ └───────────────┘
│
┌────────────┼────────────┐
│ │ │
┌─────┴────┐ ┌────┴─────┐ ┌───┴────┐
│PostgreSQL│ │OpenSearch│ │ Redis │
│(memory + │ │(vectors +│ │(cache +│
│ autopilot│ │ search) │ │session)│
│ + scores)│ │ │ │ │
└──────────┘ └──────────┘ └────────┘
Core Components¶
API Server (docbrain-server)¶
Axum-based HTTP server exposing the REST API. Handles authentication (API keys with RBAC), rate limiting, SSE streaming, and routes requests through the RAG pipeline. Serves both the Q&A endpoints and the Autopilot management API.
RAG Pipeline (docbrain-core)¶
The intelligence layer. For each query:
- Intent Classification — Determines query type (factual, procedural, troubleshooting, conceptual, comparative)
- Query Rewriting — Reformulates for better retrieval using conversation context
- Hybrid Search — Combines k-NN vector similarity with BM25 keyword matching
- Memory Enrichment — Augments context from the 4-tier memory system
- Reference Enrichment — Fetches chunks from cross-referenced documents (linked PRs, Jira tickets, Confluence pages) to provide broader context
- Response Generation — LLM synthesizes an answer with source attribution
- Caching — Semantic cache for repeated questions
Documentation Autopilot (docbrain-core/autopilot)¶
The autonomous documentation improvement engine. Runs on a daily schedule:
-
Gap Analyzer — Scans episodic memory for unanswered queries, negative feedback, and low-confidence answers from the past 30 days. Embeds the queries and clusters them using greedy cosine similarity (threshold: 0.82). Each cluster represents a documentation gap. Severity is calculated from query volume (critical: 25+, high: 11-25, medium: 4-10, low: 1-3).
-
Doc Drafter — Takes a gap cluster and generates a draft document. Uses sample queries from the cluster to search existing docs for partial context. Classifies the needed content type (runbook, FAQ, guide, troubleshooting, reference) via LLM, then generates a full draft with proper formatting and source attribution.
-
Digest Builder — Compiles weekly documentation health reports combining: query volume, unanswered rate, top gap clusters, new drafts, and stale document count. Formats as Slack Block Kit messages for team delivery.
Ingestion Pipeline (docbrain-ingest)¶
Fetches documents from configured sources (Confluence, GitHub, local files), converts to Markdown, chunks with heading-aware splitting, generates embeddings, and indexes in OpenSearch. During ingestion, cross-document references (URLs to GitHub PRs, GitLab MRs, Jira tickets, Confluence pages, etc.) are automatically extracted from content and stored as a reference graph in PostgreSQL. Referenced document IDs are attached to each chunk in OpenSearch for enrichment at query time.
MCP Server (docbrain-mcp)¶
Model Context Protocol server for integration with AI coding tools (Claude Code, Cursor). Includes IDE capture tools for annotating code, capturing commit intent, and surfacing documentation gaps directly in the editor.
CLI (docbrain-cli)¶
Command-line client for interactive Q&A sessions.
4-Tier Memory System¶
| Tier | Storage | Purpose | TTL |
|---|---|---|---|
| Working | Redis | Current conversation context | Session-scoped |
| Episodic | PostgreSQL + OpenSearch | Past Q&A episodes, feedback | Permanent |
| Semantic | PostgreSQL | Entity graph (services, teams, concepts) | Permanent |
| Procedural | PostgreSQL | Learned rules from feedback patterns | Permanent |
How Memory Works¶
- Working memory maintains conversation state across turns within a session
- Episodic memory finds similar past questions and their validated answers
- Semantic memory resolves entity references ("the auth service" -> specific service with known dependencies)
- Procedural memory applies learned rules (e.g., "when asked about deployments, always mention the canary process")
How Memory Feeds Autopilot¶
Episodic memory is the primary data source for gap detection. Every query that receives negative feedback, a not_found resolution, or a confidence score below 0.4 becomes a candidate for gap analysis. This creates a closed loop: user questions that expose documentation gaps are automatically surfaced and addressed.
5-Signal Freshness Scoring¶
Each document receives a freshness score (0-100) based on:
- Time Decay (30%) — How recently the document was edited
- Engagement (20%) — View count, query frequency, feedback ratio
- Content Currency (20%) — LLM analysis of temporal language ("as of Q1 2024")
- Link Health (15%) — Percentage of working links
- Contradiction Detection (15%) — Cross-document consistency analysis
Documents are classified as: Fresh (80-100), Review (60-79), Stale (40-59), Outdated (<40).
Freshness scores integrate with Autopilot: stale documents that also have high query volume are surfaced as high-priority gaps with draft updates.
Intent-Adaptive Responses¶
Different query types receive tailored response formats:
| Intent | Response Style |
|---|---|
| Factual | Direct answer with source citation |
| Procedural | Numbered step-by-step instructions |
| Troubleshooting | Diagnostic tree with common causes |
| Conceptual | Explanation with analogies and context |
| Comparative | Structured comparison with trade-offs |
Data Flow¶
Q&A Pipeline¶
User Question
│
▼
Intent Classification ──> Query Rewriting
│ │
▼ ▼
Memory Lookup Hybrid Search
(episodic, semantic, (OpenSearch: k-NN + BM25)
procedural) │
│ │
└──────────┬───────────────┘
│
▼
Context Assembly
(ranked chunks + memory)
│
▼
Reference Enrichment
(fetch chunks from linked docs)
│
▼
LLM Generation
(streaming SSE)
│
▼
Episode Storage
(for future memory + gap analysis)
Autopilot Pipeline¶
Episodic Memory (30-day window)
│
▼
Filter: negative feedback, not_found, low confidence
│
▼
Embed queries ──> Greedy cosine clustering (threshold: 0.82)
│
▼
Label clusters via LLM ──> Persist to autopilot_gap_clusters
│
▼
On demand: Generate draft ──> Search existing docs for context
│ │
▼ ▼
Classify content type Assemble context from related docs
│ │
└──────────┬───────────────┘
│
▼
LLM Draft Generation
│
▼
Review ──> Publish ──> Re-ingest ──> Better answers
Database Schema (Autopilot)¶
| Table | Purpose |
|---|---|
autopilot_gap_clusters |
Detected documentation gaps with severity, sample queries, and status |
autopilot_drafts |
Generated draft documents linked to gap clusters |
autopilot_digests |
Weekly digest send history for deduplication |
document_references |
Cross-document reference graph (source → target edges with type and URL) |