Knowledge Base
Document ingestion, RAG pipeline, and semantic search for agent-accessible knowledge.
The Knowledge Base lets you ingest documents (PDFs, markdown, text) into a searchable vector store that agents can query. It powers RAG (Retrieval-Augmented Generation) — agents get relevant context from your documents when answering questions.
How It Works
Documents → Chunking → Embedding → SQLite Vector Store
│
Agent query → Semantic search → Context injection
- Ingest — Upload documents or point to files/URLs
- Chunk — Documents are split into semantic chunks
- Embed — Chunks are converted to vector embeddings
- Store — Vectors stored in SQLite with full-text search
- Search — Agents query the knowledge base; relevant chunks are injected into context
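The Search step ranks stored chunks by how close their embeddings are to the query's embedding. This page does not name the similarity metric or the format used for context injection. The following is only a minimal sketch that assumes cosine similarity over precomputed vectors. StoredChunk, searchChunks and toContext are hypothetical names, not Reeve APIs, and the three-dimensional vectors are toy values standing in for real embeddings.

```typescript
// Hypothetical stored chunk: text, its source file, and a precomputed embedding.
interface StoredChunk {
  content: string;
  source: string;
  embedding: number[];
}

interface ScoredChunk {
  content: string;
  source: string;
  score: number;
}

// Cosine similarity of two equal-length vectors; 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query vector and keep the `limit` best.
function searchChunks(query: number[], chunks: StoredChunk[], limit: number): ScoredChunk[] {
  return chunks
    .map(({ content, source, embedding }) => ({
      content,
      source,
      score: cosineSimilarity(query, embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Hypothetical injection format: each chunk prefixed with its source file.
function toContext(results: ScoredChunk[]): string {
  return results.map((r) => `[${r.source}] ${r.content}`).join("\n\n");
}

// Toy 3-dimensional vectors stand in for real embeddings.
const store: StoredChunk[] = [
  { content: "Enterprise plans start at $499/month...", source: "pricing-guide.md", embedding: [0.9, 0.1, 0] },
  { content: "Installation steps...", source: "setup.md", embedding: [0, 0.2, 0.9] },
  { content: "Team plan details...", source: "pricing-guide.md", embedding: [0.7, 0.3, 0.1] },
];

const top = searchChunks([1, 0, 0], store, 2);
console.log(toContext(top)); // the two pricing-guide.md chunks, Enterprise first
```

In a real store the query must be embedded with the same model used at ingestion time, because vectors from different embedding models are not comparable.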
Agent Tools
Agents interact with the knowledge base through four tools, each invoked in the examples below as an action of the knowledge call:
knowledge_ingest
Add documents to the knowledge base:
knowledge({
  action: "ingest",
  source: "/path/to/document.pdf",
  metadata: {
    category: "product-specs",
    version: "2.0"
  }
})
knowledge_search
Search for relevant documents:
knowledge({
  action: "search",
  query: "What is the pricing for enterprise plans?",
  limit: 5
})
Returns ranked chunks with relevance scores:
{
  "results": [
    {
      "content": "Enterprise plans start at $499/month...",
      "source": "pricing-guide.md",
      "score": 0.92,
      "metadata": { "category": "product-specs" }
    }
  ]
}
knowledge_list
List all ingested documents:
knowledge({ action: "list" })
knowledge_delete
Remove a document and its chunks:
knowledge({ action: "delete", documentId: "doc_abc123" })
Deletion cascades: removing a document deletes all of its associated chunks and embeddings.
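This page does not say how the cascade is implemented. In SQLite, one common way is a foreign key from chunks to documents declared with ON DELETE CASCADE (SQLite only enforces foreign keys when PRAGMA foreign_keys = ON is set on the connection). The sketch below models the same invariant in memory. KnowledgeStore and its methods are hypothetical, and the second document, doc_xyz789, is made up for illustration.

```typescript
interface Chunk {
  id: string;
  documentId: string;
  content: string;
  embedding: number[];
}

// Hypothetical in-memory stand-in for the documents and chunks tables.
class KnowledgeStore {
  private documents = new Map<string, { source: string }>();
  private chunks = new Map<string, Chunk>();

  addDocument(id: string, source: string, chunks: Chunk[]): void {
    this.documents.set(id, { source });
    for (const chunk of chunks) this.chunks.set(chunk.id, chunk);
  }

  // Removing a document also removes every chunk (and its embedding) that
  // references it, the same effect as ON DELETE CASCADE. Returns false if
  // the document did not exist.
  deleteDocument(id: string): boolean {
    if (!this.documents.delete(id)) return false;
    for (const [chunkId, chunk] of this.chunks) {
      // Deleting Map entries while iterating the Map is safe in JavaScript.
      if (chunk.documentId === id) this.chunks.delete(chunkId);
    }
    return true;
  }

  chunkCount(): number {
    return this.chunks.size;
  }
}

const store = new KnowledgeStore();
store.addDocument("doc_abc123", "pricing-guide.md", [
  { id: "c1", documentId: "doc_abc123", content: "Enterprise plans...", embedding: [0.9, 0.1] },
  { id: "c2", documentId: "doc_abc123", content: "Team plans...", embedding: [0.7, 0.3] },
]);
store.addDocument("doc_xyz789", "setup.md", [
  { id: "c3", documentId: "doc_xyz789", content: "Installation...", embedding: [0.1, 0.9] },
]);

store.deleteDocument("doc_abc123");
console.log(store.chunkCount()); // 1
```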
Chunking Strategy
The semantic chunker splits documents according to the following rules:
- Respects headers — Markdown headers create natural chunk boundaries
- Paragraph-aware — Doesn't split mid-paragraph
- Size limits — Chunks target ~500-1000 tokens for optimal retrieval
- Overlap — Adjacent chunks overlap slightly to preserve context at boundaries
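The four rules above can be sketched as a simplified chunker, under several assumptions. Tokens are approximated as whitespace-separated words. Markdown headers (# through ######) start a new section that no chunk crosses. Paragraphs are separated by blank lines and kept whole. Each follow-on chunk within a section is seeded with the last overlapTokens words of the previous chunk. The defaults (800 and 50) are illustrative and are not Reeve's actual settings, and chunkMarkdown is a hypothetical name.

```typescript
// Rough token estimate: whitespace-separated words. Real tokenizers differ.
function countTokens(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Split markdown into sections, each starting at a header line (# to ######).
function splitSections(markdown: string): string[] {
  const sections: string[] = [];
  let current: string[] = [];
  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      sections.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) sections.push(current.join("\n"));
  return sections;
}

// Pack whole paragraphs into chunks of roughly maxTokens, never crossing a
// section boundary, and seed each follow-on chunk with the previous chunk's
// last `overlapTokens` words.
function chunkMarkdown(markdown: string, maxTokens = 800, overlapTokens = 50): string[] {
  const chunks: string[] = [];
  for (const section of splitSections(markdown)) {
    const paragraphs = section
      .split(/\n\s*\n/)
      .map((p) => p.trim())
      .filter(Boolean);
    let current: string[] = [];
    let tokens = 0;
    for (const paragraph of paragraphs) {
      const size = countTokens(paragraph);
      if (current.length > 0 && tokens + size > maxTokens) {
        const chunk = current.join("\n\n");
        chunks.push(chunk);
        const words = chunk.split(/\s+/).filter(Boolean);
        const tail = overlapTokens > 0 ? words.slice(-overlapTokens).join(" ") : "";
        current = tail ? [tail] : [];
        tokens = countTokens(tail);
      }
      current.push(paragraph);
      tokens += size;
    }
    if (current.length > 0) chunks.push(current.join("\n\n"));
  }
  return chunks;
}

const doc = "# Pricing\n\nalpha beta gamma\n\ndelta epsilon\n\n# Setup\n\nzeta eta";
// Three chunks; the second starts with "gamma", overlapping the first.
console.log(chunkMarkdown(doc, 5, 1));
```

Because paragraphs are never split, a paragraph longer than maxTokens becomes an oversized chunk; a production chunker needs a rule for that case. This sketch also ignores code fences, so a # comment inside one would be mistaken for a header.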
Deduplication
The ingestion pipeline deduplicates documents by content hash. Re-ingesting the same document updates metadata without creating duplicate chunks.
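A minimal sketch of hash-based deduplication follows. It assumes SHA-256 over the raw text (the page says only "content hash") and assumes that re-ingesting overwrites the stored metadata. byHash and ingest are hypothetical names, and the in-memory Map stands in for the documents table. The hash comes from Node's built-in node:crypto module.

```typescript
import { createHash } from "node:crypto";

interface StoredDocument {
  source: string;
  metadata: Record<string, string>;
}

// Hypothetical stand-in for the documents table, keyed by content hash.
const byHash = new Map<string, StoredDocument>();

function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Returns true when the content is new (it would then be chunked and embedded),
// false when identical content already exists and only its metadata changes.
function ingest(source: string, content: string, metadata: Record<string, string>): boolean {
  const hash = contentHash(content);
  const existing = byHash.get(hash);
  if (existing) {
    existing.metadata = metadata; // assumption: new metadata replaces the old
    return false;
  }
  byHash.set(hash, { source, metadata });
  return true;
}

console.log(ingest("product-specs.md", "Enterprise plans...", { version: "2.0" })); // true
console.log(ingest("product-specs.md", "Enterprise plans...", { version: "2.1" })); // false
console.log(byHash.size); // 1
```

Since the key is derived from content alone, identical text ingested under a different file name would also be treated as a duplicate in this sketch.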
Storage
Knowledge is stored in SQLite at ~/.reeve/memory/memory.db alongside Semantic Memory. The schema includes:
- documents table — Source files with metadata
- chunks table — Document chunks with embeddings
- Full-text search index — For keyword fallback
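The page says only that the full-text index serves as a keyword fallback. It does not say when the fallback applies or how full-text results are scored. The sketch below assumes the fallback triggers when no semantic hit reaches a score threshold, and it uses a naive term-overlap scorer in place of SQLite's full-text search. keywordSearch, searchWithFallback and the 0.5 threshold are hypothetical.

```typescript
interface Hit {
  content: string;
  score: number;
}

// Naive keyword scorer standing in for a full-text index: the fraction of
// query terms that appear as words in each chunk.
function keywordSearch(query: string, chunks: string[], limit: number): Hit[] {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  if (terms.length === 0) return [];
  return chunks
    .map((content) => {
      const words = new Set(content.toLowerCase().split(/\W+/));
      const matched = terms.filter((term) => words.has(term)).length;
      return { content, score: matched / terms.length };
    })
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Prefer semantic hits (assumed already sorted by score); fall back to
// keyword hits if none reach `minScore`.
function searchWithFallback(
  semanticHits: Hit[],
  query: string,
  chunks: string[],
  limit: number,
  minScore = 0.5,
): Hit[] {
  const strong = semanticHits.filter((hit) => hit.score >= minScore);
  return strong.length > 0 ? strong.slice(0, limit) : keywordSearch(query, chunks, limit);
}

const chunks = ["Enterprise plans start at $499/month...", "Installation steps..."];
// Pretend the vector search found only a weak match (the score is made up).
const weak: Hit[] = [{ content: chunks[1], score: 0.2 }];
console.log(searchWithFallback(weak, "enterprise pricing", chunks, 3));
```

A keyword path is commonly used for exact terms such as identifiers or error codes, where embedding similarity can be less reliable.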
The Knowledge Base and Semantic Memory share the same SQLite database and search infrastructure, but serve different purposes. Knowledge Base stores external documents you upload. Semantic Memory stores agent-generated observations and learnings.