Document ingestion, RAG pipeline, and semantic search for agent-accessible knowledge.

Knowledge Base

The Knowledge Base lets you ingest documents (PDFs, Markdown, plain text) into a searchable vector store that agents can query. It powers RAG (Retrieval-Augmented Generation): when an agent answers a question, it receives relevant passages from your documents as context.

How It Works

Documents → Chunking → Embedding → SQLite Vector Store
                                          │
                              Agent query → Semantic search → Context injection

Ingest — Upload documents or point to files/URLs
Chunk — Documents are split into semantic chunks
Embed — Chunks are converted to vector embeddings
Store — Vectors stored in SQLite with full-text search
Search — Agents query the knowledge base; relevant chunks are injected into context
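The Embed, Store, and Search steps can be illustrated with a toy, self-contained sketch. The page does not say which embedding model or vector index Reeve uses, so the hashing-based `embed` function below is a hypothetical stand-in for a real model, and `StoredChunk` and `searchChunks` are illustrative names rather than part of the documented API. Search here is a brute-force cosine-similarity ranking over every stored chunk.

```typescript
// Hypothetical stand-in for a real embedding model: the "hashing trick".
// Each lowercase word is hashed into one of `dims` buckets, then the
// vector is L2-normalized so that a dot product equals cosine similarity.
function embed(text: string, dims = 64): number[] {
  const vec = new Array<number>(dims).fill(0);
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    let h = 0;
    for (let i = 0; i < word.length; i++) {
      h = (h * 31 + word.charCodeAt(i)) >>> 0; // unsigned 32-bit rolling hash
    }
    vec[h % dims] += 1;
  }
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vec : vec.map((x) => x / norm);
}

// Dot product; equal to cosine similarity because embed() normalizes.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}

interface StoredChunk {
  source: string;
  content: string;
  vector: number[]; // computed once, at ingest time
}

// Search step: embed the query, score every chunk, return the top `limit`.
function searchChunks(store: StoredChunk[], query: string, limit = 5) {
  const q = embed(query);
  return store
    .map(({ source, content, vector }) => ({ source, content, score: cosine(q, vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Ingest + Embed + Store for two one-line "chunks".
const store: StoredChunk[] = [
  "Enterprise plans start at $499/month.",
  "The API rate limit is 100 requests per minute.",
].map((content) => ({ source: "pricing-guide.md", content, vector: embed(content) }));

console.log(searchChunks(store, "enterprise plan pricing", 1));
```

A linear scan like this is only reasonable for small corpora; a production store would typically use a real embedding model and may index vectors differently.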

Agent Tools

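Agents interact with the knowledge base through four tools: `knowledge_ingest`, `knowledge_search`, `knowledge_list`, and `knowledge_delete`. In the examples below, each one is invoked as a call to `knowledge` with the matching `action`: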

`knowledge_ingest`

Add documents to the knowledge base:

knowledge({
  action: "ingest",
  source: "/path/to/document.pdf",
  metadata: {
    category: "product-specs",
    version: "2.0"
  }
})

`knowledge_search`

Search for relevant documents:

knowledge({
  action: "search",
  query: "What is the pricing for enterprise plans?",
  limit: 5
})

Returns ranked chunks with relevance scores:

{
  "results": [
    {
      "content": "Enterprise plans start at $499/month...",
      "source": "pricing-guide.md",
      "score": 0.92,
      "metadata": { "category": "product-specs" }
    }
  ]
}
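If you post-process search output yourself, a small helper can drop weak matches and format the remainder for a prompt. This is a hypothetical sketch: `KnowledgeResult` mirrors the JSON shape above, but `buildContext`, its default `0.5` threshold, and the `[source | score]` label format are illustrative choices, not a description of how Reeve actually injects context.

```typescript
// Mirrors one entry of the `results` array returned by a search.
interface KnowledgeResult {
  content: string;
  source: string;
  score: number;
  metadata?: Record<string, string>;
}

// Hypothetical helper: drop results below `minScore`, order the rest by score,
// and label each chunk with its source so the agent can cite it.
function buildContext(results: KnowledgeResult[], minScore = 0.5): string {
  return results
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .map((r) => `[${r.source} | score ${r.score.toFixed(2)}]\n${r.content}`)
    .join("\n\n");
}

const results: KnowledgeResult[] = [
  {
    content: "Enterprise plans start at $499/month...",
    source: "pricing-guide.md",
    score: 0.92,
    metadata: { category: "product-specs" },
  },
  { content: "Changelog for 1.4.0", source: "changelog.md", score: 0.31 },
];

console.log(buildContext(results));
// [pricing-guide.md | score 0.92]
// Enterprise plans start at $499/month...
```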

`knowledge_list`

List all ingested documents:

knowledge({ action: "list" })

`knowledge_delete`

Remove a document and its chunks:

knowledge({ action: "delete", documentId: "doc_abc123" })

Deletion cascades — removing a document deletes all associated chunks and embeddings.
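The cascade can be modeled with a hypothetical in-memory store in which each chunk record carries its embedding and a `documentId` pointing back to its parent document. `InMemoryKnowledgeStore` and `deleteDocument` are illustrative names. In SQLite, the same behavior is commonly declared with a foreign key using `ON DELETE CASCADE` (foreign-key enforcement must be enabled with `PRAGMA foreign_keys = ON`); the page does not say which mechanism Reeve uses.

```typescript
interface DocumentRecord {
  id: string;
  source: string;
}

interface ChunkRecord {
  id: string;
  documentId: string; // parent document (the "foreign key")
  content: string;
  embedding: number[];
}

// Hypothetical in-memory stand-in for the documents and chunks tables.
class InMemoryKnowledgeStore {
  private documents = new Map<string, DocumentRecord>();
  private chunks = new Map<string, ChunkRecord>();

  add(doc: DocumentRecord, chunks: ChunkRecord[]): void {
    this.documents.set(doc.id, doc);
    for (const chunk of chunks) this.chunks.set(chunk.id, chunk);
  }

  // Cascading delete: remove the document, then every chunk (and with it
  // the embedding) that references it. Returns the number of chunks removed.
  deleteDocument(documentId: string): number {
    if (!this.documents.delete(documentId)) return 0;
    let removed = 0;
    // Deleting the current entry inside Map.forEach is safe in JavaScript.
    this.chunks.forEach((chunk, id) => {
      if (chunk.documentId === documentId) {
        this.chunks.delete(id);
        removed++;
      }
    });
    return removed;
  }

  get chunkCount(): number {
    return this.chunks.size;
  }
}

const kb = new InMemoryKnowledgeStore();
kb.add({ id: "doc_abc123", source: "pricing-guide.md" }, [
  { id: "c1", documentId: "doc_abc123", content: "Enterprise plans...", embedding: [0.1, 0.2] },
  { id: "c2", documentId: "doc_abc123", content: "Team plans...", embedding: [0.3, 0.4] },
]);
kb.add({ id: "doc_def456", source: "faq.md" }, [
  { id: "c3", documentId: "doc_def456", content: "How do I reset...", embedding: [0.5, 0.6] },
]);
console.log(kb.deleteDocument("doc_abc123"), kb.chunkCount); // 2 1
```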

Chunking Strategy

The semantic chunker splits documents along their natural structure:

Respects headers — Markdown headers create natural chunk boundaries
Paragraph-aware — Doesn't split mid-paragraph
Size limits — Chunks target ~500-1000 tokens for optimal retrieval
Overlap — Adjacent chunks overlap slightly to preserve context at boundaries
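The four rules above can be approximated in a few dozen lines. This is a sketch under stated assumptions, not Reeve's chunker: `chunkMarkdown`, `approxTokens`, and `lastWords` are hypothetical names; tokens are approximated by whitespace-separated words (a real chunker would count with the embedding model's tokenizer); the default `maxTokens = 800` is an arbitrary value inside the stated 500-1000 range; and the 50-word overlap is a guess at "slightly".

```typescript
// Approximate token count by whitespace-separated words (an assumption).
function approxTokens(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

// Last `n` words of `text`, used to seed the next chunk for overlap.
function lastWords(text: string, n: number): string {
  if (n <= 0) return ""; // slice(-0) would return every word
  return text.split(/\s+/).filter((w) => w.length > 0).slice(-n).join(" ");
}

function chunkMarkdown(doc: string, maxTokens = 800, overlapWords = 50): string[] {
  const chunks: string[] = [];
  // Headers: start a new section before every line beginning with 1-6 '#'.
  for (const section of doc.split(/\n(?=#{1,6}\s)/)) {
    // Paragraphs: split on blank lines; a paragraph is never cut in half.
    const paragraphs = section
      .split(/\n\s*\n/)
      .map((p) => p.trim())
      .filter((p) => p.length > 0);
    let current: string[] = []; // paragraphs in the chunk being built
    let overlap = ""; // tail of the previous chunk in this section
    const render = (extra: string[] = []) =>
      [overlap, ...current, ...extra].filter((s) => s.length > 0).join("\n\n");
    for (const para of paragraphs) {
      // Size limit: close the chunk if adding this paragraph would overflow it.
      if (current.length > 0 && approxTokens(render([para])) > maxTokens) {
        chunks.push(render());
        // Overlap: carry the last few words into the next chunk.
        overlap = lastWords(current.join(" "), overlapWords);
        current = [];
      }
      current.push(para);
    }
    if (current.length > 0) chunks.push(render());
  }
  return chunks;
}

console.log(chunkMarkdown("# T\n\na b c d\n\ne f g h\n\ni j k l", 6, 2));
```

Two limitations of this sketch: a single paragraph longer than `maxTokens` becomes one oversized chunk, and `#` lines inside fenced code blocks are treated as headers.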

Deduplication

The ingestion pipeline deduplicates documents by content hash. Re-ingesting the same document updates metadata without creating duplicate chunks.
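A content-hash check of this kind can be sketched with Node's built-in `crypto` module. SHA-256 is an assumption (the page does not name the hash), as are the `ingest` helper, the counter-based IDs, and merging new metadata over old; only the behavior (same content, no new chunks, metadata updated) comes from the description above.

```typescript
import { createHash } from "node:crypto";

interface IngestedDoc {
  id: string;
  hash: string;
  metadata: Record<string, string>;
}

// Hypothetical index from content hash to the document already stored for it.
const byHash = new Map<string, IngestedDoc>();

// Hash the raw content; identical text always produces the same digest.
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

function ingest(
  content: string,
  metadata: Record<string, string>,
): { doc: IngestedDoc; created: boolean } {
  const hash = contentHash(content);
  const existing = byHash.get(hash);
  if (existing) {
    // Duplicate: refresh metadata only; no new chunks or embeddings are made.
    existing.metadata = { ...existing.metadata, ...metadata };
    return { doc: existing, created: false };
  }
  // New content: register it (chunking and embedding would happen here).
  const doc: IngestedDoc = { id: `doc_${byHash.size + 1}`, hash, metadata };
  byHash.set(hash, doc);
  return { doc, created: true };
}

const first = ingest("# Pricing\n\nEnterprise plans start at $499/month.", { version: "2.0" });
const again = ingest("# Pricing\n\nEnterprise plans start at $499/month.", { version: "2.1" });
console.log(first.created, again.created, again.doc.id === first.doc.id); // true false true
console.log(again.doc.metadata.version); // 2.1
```

Because the hash covers the exact text, even a whitespace change produces a different digest and is treated as a new document in this sketch.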

Storage

Knowledge is stored in SQLite at ~/.reeve/memory/memory.db alongside Semantic Memory. The schema includes:

documents table — Source files with metadata
chunks table — Document chunks with embeddings
Full-text search index — For keyword fallback
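One way a keyword fallback could work is sketched below: use the semantic hits when the best one is confident enough, and otherwise rank chunks by how many query terms they contain. Everything here is hypothetical (`searchWithFallback`, the `0.5` cutoff, the term-overlap scoring). A real full-text index in SQLite would more likely be an FTS5 virtual table queried with `MATCH` and ranked with its built-in `bm25()` function, but the page does not say which SQLite feature backs the index.

```typescript
interface Hit {
  chunkId: string;
  content: string;
  score: number;
}

// Lowercased alphanumeric terms of a string.
function terms(text: string): Set<string> {
  return new Set<string>(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Crude stand-in for a full-text query: fraction of query terms present in the chunk.
function keywordScore(query: string, content: string): number {
  const q = terms(query);
  if (q.size === 0) return 0;
  const words = terms(content);
  let matched = 0;
  q.forEach((t) => {
    if (words.has(t)) matched++;
  });
  return matched / q.size;
}

function searchWithFallback(
  query: string,
  semanticHits: Hit[],
  corpus: { chunkId: string; content: string }[],
  minScore = 0.5,
  limit = 5,
): { mode: "semantic" | "keyword"; hits: Hit[] } {
  const best = Math.max(0, ...semanticHits.map((h) => h.score));
  if (best >= minScore) {
    const hits = [...semanticHits].sort((a, b) => b.score - a.score).slice(0, limit);
    return { mode: "semantic", hits };
  }
  // Weak semantic match (e.g. an exact error code or ID): fall back to keywords.
  const hits = corpus
    .map((c) => ({ ...c, score: keywordScore(query, c.content) }))
    .filter((h) => h.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
  return { mode: "keyword", hits };
}

const corpus = [
  { chunkId: "c1", content: "Error E42 means the license key has expired." },
  { chunkId: "c2", content: "Enterprise plans start at $499/month." },
];
const result = searchWithFallback("E42 license", [{ ...corpus[0], score: 0.21 }], corpus);
console.log(result.mode, result.hits.map((h) => h.chunkId)); // keyword [ 'c1' ]
```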

The Knowledge Base and Semantic Memory share the same SQLite database and search infrastructure but serve different purposes: the Knowledge Base stores external documents you upload, while Semantic Memory stores agent-generated observations and learnings.
