
Knowledge Base

Document ingestion, RAG pipeline, and semantic search for agent-accessible knowledge.

The Knowledge Base lets you ingest documents (PDFs, Markdown, plain text) into a searchable vector store that agents can query. It powers RAG (Retrieval-Augmented Generation): when agents answer questions, they receive relevant context retrieved from your documents.

How It Works

Ingestion: Documents → Chunking → Embedding → SQLite Vector Store
Retrieval: Agent query → Semantic search → Context injection

  1. Ingest: Upload documents or point to files/URLs
  2. Chunk: Documents are split into semantic chunks
  3. Embed: Chunks are converted to vector embeddings
  4. Store: Vectors are stored in SQLite alongside a full-text search index
  5. Search: Agents query the knowledge base, and relevant chunks are injected into their context
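
To make the five steps concrete, here is a toy in-memory version of the pipeline. It is a sketch under loud assumptions, not Reeve's implementation: `embed` is a hypothetical word-hashing function standing in for a real embedding model, chunking is naive paragraph splitting, and a plain array stands in for the SQLite store. Search ranks chunks by cosine similarity, which reduces to a dot product because every vector is normalized.

```typescript
// Toy in-memory pipeline: chunk -> embed -> store -> search.
// Hypothetical sketch: the word-hashing "embedding" stands in for a real
// embedding model, and a plain array stands in for the SQLite vector store.

interface StoredChunk {
  source: string;
  content: string;
  vector: number[];
}

const DIM = 64; // hypothetical embedding dimension
const store: StoredChunk[] = [];

// Hypothetical embedding: hash each word into one of DIM buckets, then L2-normalize.
function embed(text: string): number[] {
  const v: number[] = new Array(DIM).fill(0);
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    let h = 0;
    for (let i = 0; i < word.length; i++) h = (h * 31 + word.charCodeAt(i)) >>> 0;
    v[h % DIM] += 1;
  }
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0)) || 1;
  return v.map((x) => x / norm);
}

// Steps 1-4: split the document into paragraphs, embed each one, and store it.
function ingest(source: string, text: string): void {
  const paragraphs = text.split(/\n\s*\n/).map((p) => p.trim()).filter((p) => p.length > 0);
  for (const content of paragraphs) {
    store.push({ source, content, vector: embed(content) });
  }
}

// Step 5: embed the query and return the k chunks with the highest cosine
// similarity (a plain dot product, since all vectors are normalized).
function search(query: string, k: number): { source: string; content: string; score: number }[] {
  const q = embed(query);
  return store
    .map((c) => ({
      source: c.source,
      content: c.content,
      score: c.vector.reduce((sum, x, i) => sum + x * q[i], 0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

ingest("pricing-guide.md", "Enterprise plans start at $499/month.\n\nStarter plans are free.");
ingest("faq.md", "Reset your password from the settings page.");
console.log(search("enterprise plans", 2));
```

In a real system, `embed` would call an embedding model and `store` would be persisted tables; the ranking step keeps the same shape.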

Agent Tools

Agents interact with the knowledge base through 4 tools:

knowledge_ingest

Add documents to the knowledge base:

knowledge({
  action: "ingest",
  source: "/path/to/document.pdf",
  metadata: {
    category: "product-specs",
    version: "2.0"
  }
})
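
The examples in this section pass a single object with an `action` field. One way to describe that shape in TypeScript is a discriminated union. The field names below are taken from the examples on this page; the types, the optional markers, and the `validateCall` helper are assumptions for illustration, not Reeve's published API.

```typescript
// Hypothetical TypeScript description of the `knowledge` tool arguments.
// Field names come from the examples on this page; the types, optional
// markers, and validateCall helper are assumptions.

type KnowledgeCall =
  | { action: "ingest"; source: string; metadata?: Record<string, unknown> }
  | { action: "search"; query: string; limit?: number }
  | { action: "list" }
  | { action: "delete"; documentId: string };

// Returns a list of problems found in a call (empty when the call looks valid).
function validateCall(call: KnowledgeCall): string[] {
  const errors: string[] = [];
  switch (call.action) {
    case "ingest":
      if (call.source.trim() === "") errors.push("source must not be empty");
      break;
    case "search":
      if (call.query.trim() === "") errors.push("query must not be empty");
      if (call.limit !== undefined && (!Number.isInteger(call.limit) || call.limit < 1)) {
        errors.push("limit must be a positive integer");
      }
      break;
    case "list":
      break;
    case "delete":
      if (call.documentId.trim() === "") errors.push("documentId must not be empty");
      break;
  }
  return errors;
}

const ingestCall: KnowledgeCall = {
  action: "ingest",
  source: "/path/to/document.pdf",
  metadata: { category: "product-specs", version: "2.0" },
};

console.log(validateCall(ingestCall)); // []
```

Because `action` is a literal-typed discriminant, the compiler knows inside each `case` which other fields exist, so a typo such as `documentID` is caught before the call is sent.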

knowledge_search

Search for relevant documents:

knowledge({
  action: "search",
  query: "What is the pricing for enterprise plans?",
  limit: 5
})

Returns ranked chunks with relevance scores:

{
  "results": [
    {
      "content": "Enterprise plans start at $499/month...",
      "source": "pricing-guide.md",
      "score": 0.92,
      "metadata": { "category": "product-specs" }
    }
  ]
}
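
An agent, or the runtime acting for it, then turns these results into prompt context. The sketch below assumes the response shape shown above; `SearchResult`, `toContext`, and the `minScore` cutoff are hypothetical names, and this page does not say whether Reeve applies any score threshold.

```typescript
// Hypothetical types mirroring the documented search response.
interface SearchResult {
  content: string;
  source: string;
  score: number;
  metadata?: Record<string, unknown>;
}

interface SearchResponse {
  results: SearchResult[];
}

// Build a context block for the agent's prompt: drop results below a
// hypothetical minScore cutoff, order by score, and label each with its source.
function toContext(response: SearchResponse, minScore = 0.5): string {
  return response.results
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .map((r) => `[${r.source}, score ${r.score.toFixed(2)}]\n${r.content}`)
    .join("\n\n");
}

const response: SearchResponse = JSON.parse(`{
  "results": [
    {
      "content": "Enterprise plans start at $499/month...",
      "source": "pricing-guide.md",
      "score": 0.92,
      "metadata": { "category": "product-specs" }
    }
  ]
}`);

console.log(toContext(response));
```

Labeling each chunk with its source lets the agent cite where an answer came from.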

knowledge_list

List all ingested documents:

knowledge({ action: "list" })

knowledge_delete

Remove a document and its chunks:

knowledge({ action: "delete", documentId: "doc_abc123" })

Deletion cascades — removing a document deletes all associated chunks and embeddings.
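
The cascade can be modeled as ownership: every chunk records the ID of its parent document, and deleting the document sweeps all chunks that point to it. The in-memory sketch below uses hypothetical names (`addDocument`, `deleteDocument`, `doc_def456`). In SQLite, a common way to get the same effect is a foreign key declared with `ON DELETE CASCADE`, which only takes effect when `PRAGMA foreign_keys = ON` is set on the connection; this page does not say which mechanism Reeve uses.

```typescript
// In-memory model of cascading deletion: a document owns its chunks,
// so deleting the document removes every chunk (and its embedding) with it.
// Hypothetical sketch; field and function names are illustrative.

interface Chunk {
  id: string;
  documentId: string;
  content: string;
  embedding: number[];
}

const documents = new Map<string, { source: string }>();
const chunks = new Map<string, Chunk>();

function addDocument(documentId: string, source: string, parts: string[]): void {
  documents.set(documentId, { source });
  parts.forEach((content, i) => {
    const id = `${documentId}#${i}`;
    chunks.set(id, { id, documentId, content, embedding: [] });
  });
}

// Returns the number of chunks removed, or -1 if the document does not exist.
function deleteDocument(documentId: string): number {
  if (!documents.delete(documentId)) return -1;
  let removed = 0;
  // Iterate over a snapshot so entries can be deleted during the loop.
  for (const [id, chunk] of Array.from(chunks.entries())) {
    if (chunk.documentId === documentId) {
      chunks.delete(id);
      removed++;
    }
  }
  return removed;
}

addDocument("doc_abc123", "pricing-guide.md", ["Enterprise plans...", "Starter plans..."]);
addDocument("doc_def456", "faq.md", ["Reset your password..."]);
console.log(deleteDocument("doc_abc123"), chunks.size); // 2 1
```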

Chunking Strategy

The semantic chunker splits documents along their structural boundaries:

  • Respects headers — Markdown headers create natural chunk boundaries
  • Paragraph-aware — Doesn't split mid-paragraph
  • Size limits — Chunks target ~500-1000 tokens for optimal retrieval
  • Overlap — Adjacent chunks overlap slightly to preserve context at boundaries
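
The four rules above can be combined in a short chunker. This is a sketch, not Reeve's semantic chunker: token counts are approximated by whitespace-separated words, a paragraph longer than the limit is kept whole as its own chunk, and overlap is implemented by prefixing each chunk with the last `overlapTokens` words of the previous chunk in the same section (so a chunk can exceed `maxTokens` by that amount). `chunkMarkdown` and `ChunkOptions` are hypothetical names.

```typescript
// Hypothetical sketch of a header- and paragraph-aware chunker with overlap.
// Token counts are approximated by whitespace-separated words (an assumption;
// a real chunker would use the embedding model's tokenizer).

interface ChunkOptions {
  maxTokens: number; // size limit for the paragraphs packed into one chunk
  overlapTokens: number; // words carried over from the previous chunk
}

const countTokens = (text: string): number => text.split(/\s+/).filter(Boolean).length;

// Split Markdown into sections, starting a new section at every header line.
function splitSections(markdown: string): string[] {
  const sections: string[] = [];
  let current: string[] = [];
  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      sections.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) sections.push(current.join("\n"));
  return sections.filter((s) => s.trim() !== "");
}

function chunkMarkdown(markdown: string, opts: ChunkOptions): string[] {
  const chunks: string[] = [];
  for (const section of splitSections(markdown)) {
    // Paragraphs are never split; an oversized paragraph becomes its own chunk.
    const paragraphs = section.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
    let buffer: string[] = [];
    let bufferTokens = 0;
    let carry = ""; // overlap text; reset per section so headers are hard boundaries
    const flush = (): void => {
      if (buffer.length === 0) return;
      const body = buffer.join("\n\n");
      chunks.push(carry ? `${carry}\n\n${body}` : body);
      const words = body.split(/\s+/).filter(Boolean);
      carry = opts.overlapTokens > 0 ? words.slice(-opts.overlapTokens).join(" ") : "";
      buffer = [];
      bufferTokens = 0;
    };
    for (const p of paragraphs) {
      const t = countTokens(p);
      if (bufferTokens + t > opts.maxTokens) flush();
      buffer.push(p);
      bufferTokens += t;
    }
    flush();
  }
  return chunks;
}

const doc = "# A\n\none two three\n\nfour five six\n\n# B\n\nseven eight";
console.log(chunkMarkdown(doc, { maxTokens: 5, overlapTokens: 2 }));
```

With `maxTokens: 5` the first section is packed into two chunks, the second beginning with the two-word overlap "two three", while the `# B` header starts a fresh chunk with no overlap carried across the boundary.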

Deduplication

The ingestion pipeline deduplicates documents by content hash. Re-ingesting the same document updates metadata without creating duplicate chunks.
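
A minimal sketch of hash-based deduplication follows. The page says only "content hash", so FNV-1a is used here as an assumption to keep the example dependency-free; a real pipeline would more likely use a cryptographic hash such as SHA-256, since a 32-bit hash makes accidental collisions far more likely. Merging new metadata into the old, and the `ingestDocument` and `byHash` names, are also hypothetical.

```typescript
// Hypothetical sketch of content-hash deduplication at ingestion time.
// FNV-1a (32-bit, over UTF-16 code units) is a dependency-free stand-in;
// the page only says "content hash".

interface StoredDocument {
  id: string;
  contentHash: string;
  metadata: Record<string, unknown>;
  chunkCount: number;
}

const byHash = new Map<string, StoredDocument>();
let nextId = 1;

function hashContent(content: string): string {
  let h = 0x811c9dc5; // FNV-1a offset basis
  for (let i = 0; i < content.length; i++) {
    h ^= content.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

// Re-ingesting identical content updates metadata (merging is an assumption)
// and creates no new chunks; new content is stored as a new document.
function ingestDocument(
  content: string,
  metadata: Record<string, unknown>,
): { doc: StoredDocument; created: boolean } {
  const contentHash = hashContent(content);
  const existing = byHash.get(contentHash);
  if (existing !== undefined) {
    existing.metadata = { ...existing.metadata, ...metadata };
    return { doc: existing, created: false };
  }
  const doc: StoredDocument = {
    id: `doc_${nextId++}`,
    contentHash,
    metadata,
    // Stand-in for real chunking: count blank-line-separated paragraphs.
    chunkCount: content.split(/\n\s*\n/).length,
  };
  byHash.set(contentHash, doc);
  return { doc, created: true };
}

const first = ingestDocument("Enterprise plans start at $499/month.", { version: "1.0" });
const again = ingestDocument("Enterprise plans start at $499/month.", { version: "2.0" });
console.log(first.doc.id === again.doc.id, again.created); // true false
```

The second call returns the same document with `created: false` and `version` updated to "2.0", and `byHash` still holds a single entry.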

Storage

Knowledge is stored in SQLite at ~/.reeve/memory/memory.db alongside Semantic Memory. The schema includes:

  • documents table — Source files with metadata
  • chunks table — Document chunks with embeddings
  • Full-text search index — For keyword fallback
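
The page does not describe when the keyword fallback is used. One plausible policy, sketched below under that assumption, is to return semantic hits above a score threshold and fall back to keyword matching when none qualify; this can help with exact identifiers such as error codes, which embedding similarity can miss. The threshold, the term-overlap scorer (a stand-in for SQLite's full-text index), and all names are hypothetical.

```typescript
// Hypothetical sketch of "semantic first, keyword fallback" retrieval.
// Semantic hits are supplied by the caller; the minScore threshold and the
// term-overlap scorer are assumptions standing in for the real search paths.

interface Row {
  source: string;
  content: string;
}

interface Hit extends Row {
  score: number;
  via: "semantic" | "keyword";
}

const terms = (text: string): string[] => text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Keyword score: fraction of distinct query terms that appear in the chunk.
function keywordSearch(rows: Row[], query: string, limit: number): Hit[] {
  const q = Array.from(new Set(terms(query)));
  if (q.length === 0) return [];
  return rows
    .map((row) => {
      const words = new Set(terms(row.content));
      const matched = q.filter((t) => words.has(t)).length;
      return { ...row, score: matched / q.length, via: "keyword" as const };
    })
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

function searchWithFallback(
  semanticHits: Hit[],
  rows: Row[],
  query: string,
  limit: number,
  minScore = 0.5,
): Hit[] {
  const good = semanticHits.filter((h) => h.score >= minScore);
  return good.length > 0 ? good.slice(0, limit) : keywordSearch(rows, query, limit);
}

const rows: Row[] = [
  { source: "errors.md", content: "Error code E42 means the disk is full." },
  { source: "pricing-guide.md", content: "Enterprise plans start at $499/month." },
];

// No semantic hit clears the threshold, so keyword matching takes over.
console.log(searchWithFallback([], rows, "E42 disk", 5));
```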

The Knowledge Base and Semantic Memory share the same SQLite database and search infrastructure but serve different purposes: the Knowledge Base stores external documents you upload, while Semantic Memory stores agent-generated observations and learnings.
