Dump's RAG (Retrieval-Augmented Generation) interface lets you ask questions in natural language and get AI-generated answers grounded in your vault's content.
## How It Works

### 1. Query Embedding

Your question is vectorized using Gemini `text-embedding-004` to find semantically relevant content.

### 2. Vector Search

A pgvector approximate nearest neighbor (ANN) search finds the most relevant items. If vector search returns no results, Dump falls back to full-text search.

### 3. Context Building

Up to 4,000 characters of relevant content are assembled from the top matches, with source attribution.

### 4. Streaming Response

Gemini generates a response based on the context, streamed back via Server-Sent Events (SSE).
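The context-building step (assembling up to 4,000 characters from the top matches, with per-source attribution) can be sketched roughly as below. The `Match` shape and the `buildContext` helper are illustrative assumptions, not Dump's actual internals; only the 4,000-character cap and the attribution requirement come from the docs.

```typescript
// Hypothetical shape of a single search hit from the vector/full-text search.
interface Match {
  title: string;
  content: string;
}

const MAX_CONTEXT_CHARS = 4000; // cap stated above

// Assemble context from the top matches, attributing each chunk to its
// source title and stopping once the character budget is exhausted.
function buildContext(matches: Match[]): string {
  let context = "";
  for (const m of matches) {
    const chunk = `[${m.title}]\n${m.content}\n\n`;
    if (context.length + chunk.length > MAX_CONTEXT_CHARS) {
      // Take only what still fits in the budget, then stop.
      context += chunk.slice(0, MAX_CONTEXT_CHARS - context.length);
      break;
    }
    context += chunk;
  }
  return context;
}
```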
## API

`POST /api/rag`
### Request Body

```ts
{
  query: string   // Your question (required)
  limit?: number  // Max context items (1-20, default 10)
}
```

### Response (SSE Stream)
The response is a stream of Server-Sent Events:

```
data: {"status": "embedding"}
data: {"status": "searching"}
data: {"status": "thinking"}
data: {"text": "Based on your vault..."}
data: {"text": " the article mentions..."}
data: {"done": true, "sources": [...]}
```

### Status Events
| Status | Description |
|---|---|
| `embedding` | Vectorizing your question |
| `searching` | Finding relevant content via pgvector |
| `thinking` | Gemini is generating the answer |
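A minimal client for this endpoint might look like the sketch below. Only the `/api/rag` route, the request body fields, and the event shapes come from the docs above; the `askVault` helper and `parseSseChunk` are illustrative, and the parser is simplified in that it assumes each read ends on a line boundary.

```typescript
// Event payloads as documented: status updates, text chunks, and a final done event.
type RagEvent =
  | { status: string }
  | { text: string }
  | { done: true; sources: unknown[] };

// Parse raw SSE text into the JSON payloads carried on `data:` lines.
// Simplification: assumes a chunk never splits a line in the middle.
function parseSseChunk(chunk: string): RagEvent[] {
  return chunk
    .split("\n")
    .filter((line) => line.startsWith("data: "))
    .map((line) => JSON.parse(line.slice("data: ".length)) as RagEvent);
}

// Hypothetical client: POST the question, then stream and dispatch events.
async function askVault(query: string, limit = 10): Promise<void> {
  const res = await fetch("/api/rag", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, limit }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    for (const event of parseSseChunk(decoder.decode(value))) {
      if ("status" in event) console.log(`status: ${event.status}`);
      else if ("text" in event) process.stdout.write(event.text);
      else console.log("\nsources:", event.sources);
    }
  }
}
```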
### Source Attribution

The final `done` event includes the sources used for the answer:
```json
{
  "done": true,
  "sources": [
    {
      "id": "uuid",
      "title": "Article Title",
      "url": "https://example.com",
      "source_type": "article"
    }
  ]
}
```

## System Prompt
The RAG assistant (HYVER) follows these rules:

- Answers based only on the provided vault context
- Cites sources by title
- Responds "Nao encontrei isso no vault" ("I didn't find that in the vault") when the answer isn't in the context
- Responds in the same language as the question
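Dump's actual prompt text is not published in these docs; a prompt encoding the four rules above might look like the following sketch, whose wording is entirely an assumption (including the `{{context}}` placeholder):

```typescript
// Illustrative only: Dump's real system prompt is not shown in the docs.
// This template just encodes the four documented HYVER rules.
const HYVER_SYSTEM_PROMPT = `
You are HYVER, a retrieval-augmented assistant for the user's vault.

Rules:
1. Answer using ONLY the vault context provided below.
2. Cite the sources you used by their titles.
3. If the answer is not in the context, reply exactly: "Nao encontrei isso no vault".
4. Reply in the same language as the question.

Vault context:
{{context}}
`.trim();
```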
## Model Selection

Dump tries models in order of preference:

1. `gemini-3.1-pro-preview`
2. `gemini-2.0-flash` (fallback)

If both fail, the error is streamed back to the client.
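The fallback behavior can be sketched as a simple try-in-order loop. The `Generate` signature and `generateWithFallback` helper are assumptions for illustration; only the model order and the "stream the error back if both fail" behavior come from the docs.

```typescript
// Model order of preference, as documented above.
const MODELS = ["gemini-3.1-pro-preview", "gemini-2.0-flash"] as const;

// Hypothetical generator: calls one model and returns its text,
// throwing if that model is unavailable or errors.
type Generate = (model: string, prompt: string) => Promise<string>;

// Try each model in order; if all fail, rethrow the last error so the
// caller can stream it back to the client as an SSE event.
async function generateWithFallback(generate: Generate, prompt: string): Promise<string> {
  let lastError: unknown;
  for (const model of MODELS) {
    try {
      return await generate(model, prompt);
    } catch (err) {
      lastError = err; // try the next model in the list
    }
  }
  throw lastError;
}
```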