Dump's RAG (Retrieval-Augmented Generation) interface lets you ask questions in natural language and get AI-generated answers grounded in your vault's content.
## How It Works

### 1. Query Embedding

Your question is vectorized using Gemini `text-embedding-004` to find semantically relevant content.

### 2. Vector Search

A pgvector approximate nearest neighbor (ANN) search finds the most relevant items. If vector search returns no results, Dump falls back to full-text search.

### 3. Context Building

Up to 4,000 characters of relevant content are assembled from the top matches, with source attribution.

### 4. Streaming Response

Gemini generates a response based on the context, streamed back via Server-Sent Events (SSE).
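The context-building step (assembling up to 4,000 characters from the top matches, with per-source attribution) can be sketched roughly as below. The `Match` shape and the `buildContext` helper are illustrative assumptions, not Dump's actual internals; only the 4,000-character cap and the attribution requirement come from the docs.

```typescript
// Hypothetical shape of a single search hit from the vector/full-text search.
interface Match {
  title: string;
  content: string;
}

const MAX_CONTEXT_CHARS = 4000; // cap stated above

// Assemble context from the top matches, attributing each chunk to its
// source title and stopping once the character budget is exhausted.
function buildContext(matches: Match[]): string {
  let context = "";
  for (const m of matches) {
    const chunk = `[${m.title}]\n${m.content}\n\n`;
    if (context.length + chunk.length > MAX_CONTEXT_CHARS) {
      // Take only what still fits in the budget, then stop.
      context += chunk.slice(0, MAX_CONTEXT_CHARS - context.length);
      break;
    }
    context += chunk;
  }
  return context;
}
```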
## API

`POST /api/rag`
### Request Body

```ts
{
  query: string   // Your question (required)
  limit?: number  // Max context items (1-20, default 10)
}
```

### Response (SSE Stream)
The response is a stream of Server-Sent Events:

```
data: {"status": "embedding"}
data: {"status": "searching"}
data: {"status": "thinking"}
data: {"text": "Based on your vault..."}
data: {"text": " the article mentions..."}
data: {"done": true, "sources": [...]}
```

### Status Events
| Status | Description |
|---|---|
| `embedding` | Vectorizing your question |
| `searching` | Finding relevant content via pgvector |
| `thinking` | Gemini is generating the answer |
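A minimal client for this endpoint might look like the sketch below. Only the `/api/rag` route, the request body fields, and the event shapes come from the docs above; the `askVault` helper and `parseSseChunk` are illustrative, and the parser is simplified in that it assumes each read ends on a line boundary.

```typescript
// Event payloads as documented: status updates, text chunks, and a final done event.
type RagEvent =
  | { status: string }
  | { text: string }
  | { done: true; sources: unknown[] };

// Parse raw SSE text into the JSON payloads carried on `data:` lines.
// Simplification: assumes a chunk never splits a line in the middle.
function parseSseChunk(chunk: string): RagEvent[] {
  return chunk
    .split("\n")
    .filter((line) => line.startsWith("data: "))
    .map((line) => JSON.parse(line.slice("data: ".length)) as RagEvent);
}

// Hypothetical client: POST the question, then stream and dispatch events.
async function askVault(query: string, limit = 10): Promise<void> {
  const res = await fetch("/api/rag", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, limit }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    for (const event of parseSseChunk(decoder.decode(value))) {
      if ("status" in event) console.log(`status: ${event.status}`);
      else if ("text" in event) process.stdout.write(event.text);
      else console.log("\nsources:", event.sources);
    }
  }
}
```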
### Source Attribution

The final `done` event includes the sources used for the answer:
```json
{
  "done": true,
  "sources": [
    {
      "id": "uuid",
      "title": "Article Title",
      "url": "https://example.com",
      "source_type": "article"
    }
  ]
}
```

## System Prompt
The RAG assistant (HYVER) follows these rules:

- Answers based only on the provided vault context
- Cites sources by title
- Responds "Nao encontrei isso no vault" ("I didn't find that in the vault") when the answer isn't in the context
- Responds in the same language as the question
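Dump's actual prompt text is not published in these docs; a prompt encoding the four rules above might look like the following sketch, whose wording is entirely an assumption (including the `{{context}}` placeholder):

```typescript
// Illustrative only: Dump's real system prompt is not shown in the docs.
// This template just encodes the four documented HYVER rules.
const HYVER_SYSTEM_PROMPT = `
You are HYVER, a retrieval-augmented assistant for the user's vault.

Rules:
1. Answer using ONLY the vault context provided below.
2. Cite the sources you used by their titles.
3. If the answer is not in the context, reply exactly: "Nao encontrei isso no vault".
4. Reply in the same language as the question.

Vault context:
{{context}}
`.trim();
```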
## Model Selection

Dump tries models in order of preference:

1. `gemini-3.1-pro-preview`
2. `gemini-2.0-flash` (fallback)

If both fail, the error is streamed back to the client.
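The fallback behavior can be sketched as a simple try-in-order loop. The `Generate` signature and `generateWithFallback` helper are assumptions for illustration; only the model order and the "stream the error back if both fail" behavior come from the docs.

```typescript
// Model order of preference, as documented above.
const MODELS = ["gemini-3.1-pro-preview", "gemini-2.0-flash"] as const;

// Hypothetical generator: calls one model and returns its text,
// throwing if that model is unavailable or errors.
type Generate = (model: string, prompt: string) => Promise<string>;

// Try each model in order; if all fail, rethrow the last error so the
// caller can stream it back to the client as an SSE event.
async function generateWithFallback(generate: Generate, prompt: string): Promise<string> {
  let lastError: unknown;
  for (const model of MODELS) {
    try {
      return await generate(model, prompt);
    } catch (err) {
      lastError = err; // try the next model in the list
    }
  }
  throw lastError;
}
```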