GitLab MR Capture
GitLab MR Capture lets DocBrain learn directly from your engineering work in real time. When a team member comments @docbrain capture on a merge request, DocBrain ingests the MR title, description, and full discussion thread — turning institutional knowledge that lives in GitLab into searchable, citable documentation.
Beyond capture, DocBrain can also answer questions inline: comment @docbrain <question> on any MR and DocBrain will reply with a RAG-generated answer sourced from your knowledge base.
How It Works
1. Team member comments "@docbrain capture" on a GitLab MR
│
▼
2. GitLab fires a webhook → POST /api/v1/gitlab/events
│
▼
3. DocBrain verifies X-Gitlab-Token, checks user/project allowlists
│
▼
4. Fetches MR notes via GitLab API, builds structured document:
Title: "GitLab MR !42: Fix OAuth redirect (group/project)"
Content: description + thread of human comments
URL: https://gitlab.com/group/project/-/merge_requests/42
│
▼
5. Embeds + indexes all content (same pipeline as Confluence/Slack)
│
▼
6. Posts ✅ confirmation note on the MR with chunk count
│
▼
7. Content immediately available in Q&A and Autopilot gap analysis
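Steps 2 and 3 of the flow above can be sketched roughly as follows. This is a minimal illustration, not DocBrain's actual handler: the function names are hypothetical, and the payload field names (`object_kind`, `object_attributes.noteable_type`, `project.path_with_namespace`, `user.username`) are taken from GitLab's note-event webhook payload.

```python
import hmac
from typing import Optional


def verify_gitlab_token(header_value: Optional[str], expected_secret: str) -> bool:
    """Return True when the X-Gitlab-Token header matches the configured secret."""
    if not header_value or not expected_secret:
        return False
    # compare_digest keeps the comparison time independent of where the strings differ.
    return hmac.compare_digest(header_value.encode(), expected_secret.encode())


def extract_note_event(payload: dict) -> Optional[dict]:
    """Pull the fields capture needs out of a GitLab note event (hypothetical helper)."""
    if payload.get("object_kind") != "note":
        return None
    attrs = payload.get("object_attributes", {})
    if attrs.get("noteable_type") != "MergeRequest":
        return None  # the comment was on an issue, commit, or snippet
    mr = payload.get("merge_request", {})
    return {
        "username": payload.get("user", {}).get("username"),
        "project": payload.get("project", {}).get("path_with_namespace"),
        "mr_iid": mr.get("iid"),
        "mr_title": mr.get("title"),
        "note": attrs.get("note", ""),
    }
```

A handler would reject the request when `verify_gitlab_token` returns False and ignore it when `extract_note_event` returns None.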
What Gets Captured
- MR title
- MR description (markdown)
- All human comment notes (system notes are excluded)
- Cross-document references: URLs in the description and notes are automatically extracted and classified (GitHub PRs, Jira tickets, Confluence pages, other GitLab MRs/issues, etc.). GitLab shorthand references (!123 for MRs, #123 for issues) are resolved to full URLs within the same project. Referenced documents are linked in a reference graph and enriched at query time.
- Stored as source_type = "gitlab_capture" with metadata: project, mr_iid, mr_state, type: "merge_request"
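The shorthand resolution described above could look something like this. It is a sketch under assumptions: the regex and the `resolve_shorthand` name are illustrative, and the URL layout follows the `/-/merge_requests/<iid>` and `/-/issues/<iid>` paths GitLab uses for project pages.

```python
import re
from typing import List

# "!123" is a merge request, "#123" an issue. The lookbehind skips matches that are
# glued to a word or a path, such as the fragment in "https://host/page#3".
SHORTHAND = re.compile(r"(?<![\w/])([!#])(\d+)\b")


def resolve_shorthand(text: str, base_url: str, project: str) -> List[str]:
    """Expand same-project shorthand references into full URLs, in order, deduplicated."""
    kinds = {"!": "merge_requests", "#": "issues"}
    urls: List[str] = []
    for sigil, number in SHORTHAND.findall(text):
        url = f"{base_url.rstrip('/')}/{project}/-/{kinds[sigil]}/{number}"
        if url not in urls:
            urls.append(url)
    return urls
```

For example, `resolve_shorthand("Follow-up to !41, fixes #7", "https://gitlab.com", "group/project")` yields one merge request URL and one issue URL for `group/project`.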
Content Limits
- Maximum 500 KB — oversized MRs are skipped with a warning log
- MRs with no content at all (no title, no description, no notes, no changes) are silently skipped
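To make the capture rules and limits concrete, here is a hedged sketch of how a document might be assembled from an MR and its notes. The `build_document` function is hypothetical; it assumes notes carry the `body` and `system` fields returned by GitLab's notes API, reads "500 KB" as 500 × 1024 bytes of UTF-8, and does not model the "no changes" check.

```python
from typing import List, Optional

MAX_CONTENT_BYTES = 500 * 1024  # assumption: "500 KB" interpreted as 500 KiB


def build_document(base_url: str, project: str, mr: dict, notes: List[dict]) -> Optional[dict]:
    """Return a capture document, or None when the MR is empty or oversized."""
    # System notes ("approved this merge request", label changes, ...) are excluded.
    human_notes = [n["body"] for n in notes if not n.get("system", False) and n.get("body")]
    title = mr.get("title") or ""
    description = mr.get("description") or ""
    if not (title or description or human_notes):
        return None  # nothing to index: skipped silently
    content = "\n\n".join([description, *human_notes]).strip()
    if len(content.encode("utf-8")) > MAX_CONTENT_BYTES:
        return None  # oversized: a real handler would also log a warning here
    return {
        "title": f"GitLab MR !{mr['iid']}: {title} ({project})",
        "content": content,
        "url": f"{base_url.rstrip('/')}/{project}/-/merge_requests/{mr['iid']}",
        "source_type": "gitlab_capture",
        "metadata": {
            "project": project,
            "mr_iid": mr["iid"],
            "mr_state": mr.get("state"),
            "type": "merge_request",
        },
    }
```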
Setup
Step 1: Set Environment Variables
# Required: at least one of these must be set
GITLAB_CAPTURE_WEBHOOK_SECRET=<your-webhook-secret> # validates X-Gitlab-Token header
GITLAB_CAPTURE_TOKEN=<gitlab-pat> # PAT with api scope (for fetching MR notes)
# Optional
GITLAB_CAPTURE_BASE_URL=https://gitlab.com # default; change for self-hosted GitLab
GITLAB_CAPTURE_TLS_INSECURE=false # set true for self-signed certs (self-hosted GitLab)
GITLAB_CAPTURE_ALLOWED_USERS=alice,bob # only process commands from these usernames
GITLAB_CAPTURE_ALLOWED_PROJECTS=group/repo,org/app # only process these projects
GITLAB_CAPTURE_TOKEN (PAT) is needed to fetch the full MR note thread via the GitLab API. Without it, DocBrain can only read the triggering note body, not the full discussion.
Required PAT scopes: api (read MR notes and post replies).
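As an illustration of how these variables fit together, the following sketch parses them into a typed config. The `CaptureConfig` class and `load_config` function are hypothetical names, not part of DocBrain; the defaults mirror the values documented above, and the `enabled` property reflects the "at least one of these must be set" rule.

```python
import os
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional


def _csv(value: str) -> FrozenSet[str]:
    """Split a comma-separated value, dropping blanks and surrounding whitespace."""
    return frozenset(item.strip() for item in value.split(",") if item.strip())


@dataclass(frozen=True)
class CaptureConfig:
    webhook_secret: Optional[str]
    token: Optional[str]
    base_url: str
    tls_insecure: bool
    allowed_users: FrozenSet[str]      # empty means "allow all"
    allowed_projects: FrozenSet[str]   # lowercased; empty means "allow all"

    @property
    def enabled(self) -> bool:
        # Capture is active when at least one of the two required variables is set.
        return bool(self.webhook_secret or self.token)


def load_config(env: Mapping[str, str] = os.environ) -> CaptureConfig:
    return CaptureConfig(
        webhook_secret=env.get("GITLAB_CAPTURE_WEBHOOK_SECRET") or None,
        token=env.get("GITLAB_CAPTURE_TOKEN") or None,
        base_url=env.get("GITLAB_CAPTURE_BASE_URL", "https://gitlab.com").rstrip("/"),
        tls_insecure=env.get("GITLAB_CAPTURE_TLS_INSECURE", "false").strip().lower() == "true",
        allowed_users=_csv(env.get("GITLAB_CAPTURE_ALLOWED_USERS", "")),
        allowed_projects=frozenset(
            p.lower() for p in _csv(env.get("GITLAB_CAPTURE_ALLOWED_PROJECTS", ""))
        ),
    )
```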
Step 2: Configure the GitLab Webhook
- Go to your GitLab project (or group for org-wide capture) → Settings → Webhooks
- URL: the DocBrain webhook endpoint, /api/v1/gitlab/events on your DocBrain host
- Secret token: value of GITLAB_CAPTURE_WEBHOOK_SECRET
- Trigger: check Note events (comments on MRs)
- SSL verification: enabled (recommended)
- Click Add webhook
Test it: click Test → Note events in the webhook list. DocBrain will respond 200 (the test payload has no @docbrain mention so nothing gets captured, but you'll confirm connectivity).
Step 3: Restart DocBrain
DocBrain reads the env vars at startup, so restart it after changing them.
If the webhook returns NOT_FOUND, the capture endpoint is not active: check that GITLAB_CAPTURE_WEBHOOK_SECRET and GITLAB_CAPTURE_TOKEN are set.
Using Capture
Capture an MR
Comment @docbrain capture on any open or merged MR.
DocBrain replies within seconds:
✅ Captured by DocBrain — 7 chunks indexed and immediately searchable.
This MR will feed Autopilot's next gap analysis run.
The MR content is now searchable via the Q&A API and the web UI.
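Ask a Question on an MR
Comment @docbrain <question> on the MR, for example "@docbrain How does the new auth middleware handle token refresh?".
DocBrain replies with a sourced answer: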
DocBrain — answering: How does the new auth middleware handle token refresh?
The middleware checks the token expiry window (configurable via TOKEN_REFRESH_WINDOW_SECS)...
Sources: [Auth Architecture](https://...) · [Confluence: Auth Design] · Confidence: 87%
Commands Summary
| Comment | Effect |
|---|---|
| @docbrain capture | Index the full MR thread into DocBrain |
| @docbrain <question> | Run RAG on your knowledge base and post the answer |
Both triggers are case-insensitive. The @docbrain mention can appear anywhere in the note.
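The trigger rules can be sketched as a small parser. This is an illustration only: `parse_command` is a hypothetical name, and treating everything after the mention as the question (and requiring the remainder to be exactly "capture" for a capture) is an assumption about behavior the text does not spell out.

```python
import re
from typing import Optional, Tuple

# Match the mention anywhere in the note, ignoring case, and keep what follows it.
MENTION = re.compile(r"@docbrain\b\s*(.*)", re.IGNORECASE | re.DOTALL)


def parse_command(note: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return ("capture", None), ("ask", question), or None when there is no command."""
    match = MENTION.search(note)
    if match is None:
        return None
    rest = match.group(1).strip()
    if rest.lower() == "capture":
        return ("capture", None)
    if rest:
        return ("ask", rest)
    return None  # a bare mention with nothing after it
```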
Access Control
User Allowlist
By default, any commenter can trigger capture. To restrict capture to specific users, set GITLAB_CAPTURE_ALLOWED_USERS to a comma-separated list of usernames (for example, alice,bob).
Anyone not on this list who comments @docbrain capture is silently ignored: DocBrain logs an INFO message and posts no error reply to GitLab.
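Project Allowlist
To limit capture to specific projects, set GITLAB_CAPTURE_ALLOWED_PROJECTS to a comma-separated list of project paths (for example, group/repo,org/app).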
Use group/project format as shown in the GitLab URL. Case-insensitive.
Both allowlists can be combined — the request must pass both checks.
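A combined check along these lines would implement both allowlists. The `is_allowed` function is hypothetical; project paths are compared case-insensitively as described, while exact matching for usernames is an assumption, since the text does not say how username case is handled.

```python
from typing import Iterable


def is_allowed(
    username: str,
    project: str,
    allowed_users: Iterable[str],
    allowed_projects: Iterable[str],
) -> bool:
    """Apply both allowlists; an empty list means that check is not enforced."""
    users = {u.strip() for u in allowed_users if u.strip()}
    projects = {p.strip().lower() for p in allowed_projects if p.strip()}
    user_ok = not users or username in users
    project_ok = not projects or project.lower() in projects
    # The request must pass both checks.
    return user_ok and project_ok
```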
Self-Hosted GitLab
Set GITLAB_CAPTURE_BASE_URL to your instance URL.
If your instance uses a self-signed or internal CA certificate, disable TLS verification for DocBrain's API calls back to GitLab by setting GITLAB_CAPTURE_TLS_INSECURE=true.
Helm: set gitlabCapture.tlsInsecure to true.
Also uncheck "Enable SSL verification" on the GitLab webhook itself (Project → Settings → Webhooks → Edit) so GitLab can reach DocBrain.
⚠️ Disabling TLS verification exposes API tokens to interception. Use only in isolated/private networks.
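To show what an "insecure" setting typically means for an HTTP client, here is a sketch using Python's standard `ssl` module. It is not DocBrain's implementation; `build_ssl_context` is a hypothetical helper that only illustrates why the warning above matters: with verification off, the client accepts any certificate.

```python
import ssl


def build_ssl_context(tls_insecure: bool) -> ssl.SSLContext:
    """Return a TLS context; verification is disabled only when tls_insecure is True."""
    context = ssl.create_default_context()
    if tls_insecure:
        # check_hostname must be cleared before verify_mode can be set to CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

The returned context can be passed to `urllib.request.urlopen(url, context=...)` when calling the GitLab API.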
Batch Ingest (Without Webhooks)
To backfill existing MRs without waiting for new comments, use the batch ingest source:
# .env or environment
GITLAB_URL=https://gitlab.com
GITLAB_TOKEN=<pat-with-api-scope>
GITLAB_GROUPS=engineering,platform # comma-separated group names
GITLAB_MR_STATE=merged # open | merged | closed | all
GITLAB_MR_LOOKBACK_DAYS=90 # how far back to scan
Then trigger a manual ingest:
Batch-ingested MRs are stored as source_type = "gitlab_mr" (distinct from "gitlab_capture" for webhook-triggered capture).
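As a rough picture of what the batch settings translate to, the sketch below builds GitLab API URLs for listing group merge requests. Several things here are assumptions: the `batch_request_urls` name, the defaults used when a variable is missing, the use of `updated_after` for the lookback window, and the mapping of `open` to GitLab's `opened` state value. The endpoint itself (`/api/v4/groups/:id/merge_requests`) is GitLab's group merge requests API.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Mapping
from urllib.parse import quote, urlencode


def batch_request_urls(env: Mapping[str, str], now: datetime) -> List[str]:
    """Build one list-merge-requests URL per configured group (now must be UTC)."""
    base = env.get("GITLAB_URL", "https://gitlab.com").rstrip("/")
    state = env.get("GITLAB_MR_STATE", "merged")
    lookback_days = int(env.get("GITLAB_MR_LOOKBACK_DAYS", "90"))
    since = (now - timedelta(days=lookback_days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    groups = [g.strip() for g in env.get("GITLAB_GROUPS", "").split(",") if g.strip()]

    params = {"updated_after": since, "per_page": 100}
    if state != "all":
        # GitLab's API spells the open state "opened".
        params["state"] = {"open": "opened"}.get(state, state)

    urls = []
    for group in groups:
        # Group paths are URL-encoded when used as the :id path segment.
        group_id = quote(group, safe="")
        urls.append(f"{base}/api/v4/groups/{group_id}/merge_requests?{urlencode(params)}")
    return urls
```

Each URL would be fetched with the `GITLAB_TOKEN` value in a `PRIVATE-TOKEN` header, following pagination until no pages remain.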
Helm Chart Configuration
# values.yaml
gitlabCapture:
  enabled: true
  baseUrl: "https://gitlab.com"    # or your self-hosted URL
  tlsInsecure: false               # set true for self-signed certs
  allowedUsers: ""                 # comma-separated usernames; empty = allow all
  allowedProjects: ""              # comma-separated group/project paths; empty = allow all
  # Secrets: set via --set or external secret
  webhookSecret: ""                # GITLAB_CAPTURE_WEBHOOK_SECRET
  token: ""                        # GITLAB_CAPTURE_TOKEN (PAT with api scope)
Or set secrets separately via --set:
helm upgrade docbrain ./helm/docbrain \
--set gitlabCapture.enabled=true \
--set gitlabCapture.webhookSecret="$(openssl rand -hex 32)" \
--set gitlabCapture.token="glpat-xxxx"
Alternatively, manage secrets externally:
kubectl create secret generic docbrain-gitlab \
--from-literal=GITLAB_CAPTURE_WEBHOOK_SECRET=<secret> \
--from-literal=GITLAB_CAPTURE_TOKEN=<pat>
Then reference it in your values.yaml:
Monitoring
Logs to Watch
| Log message | Meaning |
|---|---|
| GitLab capture webhook enabled | Startup: capture is active |
| GitLab capture: MR !N in group/project — X chunks indexed | Successful capture |
| GitLab webhook: invalid or missing X-Gitlab-Token | Webhook secret mismatch |
| GitLab capture: MR !N in group/project exceeds 500KB | MR too large, skipped |
| GitLab capture: user 'X' not in GITLAB_CAPTURE_ALLOWED_USERS | Access control filtered |
Verify a Captured MR
# Check the document exists
GET /api/v1/admin/documents?source_type=gitlab_capture
# Or search
POST /api/v1/ask
{ "question": "content from the MR you captured" }
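The two checks above could be scripted along these lines. This is a sketch: the function names and the `base_url` argument are hypothetical, and the requests are built without any authentication header because the text does not say what these endpoints require.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def list_captured_request(base_url: str) -> Request:
    """Build the GET request that lists documents captured via the webhook."""
    query = urlencode({"source_type": "gitlab_capture"})
    return Request(f"{base_url.rstrip('/')}/api/v1/admin/documents?{query}", method="GET")


def ask_request(base_url: str, question: str) -> Request:
    """Build the POST request that asks a question against the knowledge base."""
    body = json.dumps({"question": question}).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/api/v1/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send either request with `urllib.request.urlopen(req)`, adding whatever credentials your deployment expects.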
FAQ
Q: DocBrain replied with ✅ but I can't find the content in search.
The content is indexed immediately but embedding propagation to OpenSearch can take a few seconds. Wait 5–10 seconds and retry. If it's still missing, check the docbrain-server logs for embedding errors.
Q: No reply was posted on the MR at all.
GITLAB_CAPTURE_TOKEN is likely not set. Without a PAT, DocBrain cannot call the GitLab API to post notes. Check the logs for "GitLab capture: ingest failed" or for "not in GITLAB_CAPTURE_ALLOWED_USERS"; the latter means the commenter was filtered out by the user allowlist.
Q: Can I capture MRs across an entire group (all projects)?
Yes — configure the webhook at the Group level in GitLab: Group → Settings → Webhooks. Use the same URL and secret. Leave GITLAB_CAPTURE_ALLOWED_PROJECTS empty to accept from all projects in the group.
Q: Does capture work on closed/merged MRs?
Yes. The @docbrain capture trigger works regardless of MR state. The MR state is stored in metadata and available in search results.
Q: How is this different from the GitLab batch ingest?
| | Webhook capture | Batch ingest |
|---|---|---|
| Trigger | @docbrain capture comment | Scheduled or manual |
| Latency | Real-time (seconds) | Next ingest cycle |
| Scope | One MR at a time | All MRs matching filters |
| Source type | gitlab_capture | gitlab_mr |
| Requires PAT | Yes (for notes API) | Yes |
| Requires webhook | Yes | No |
Use batch ingest to backfill history; use webhook capture for ongoing real-time knowledge capture.