haiku.rag

Built by ggozad · 500 stars

What is haiku.rag?

Opinionated agentic RAG powered by LanceDB, Pydantic AI, and Docling

How to use haiku.rag?

1. Install the package (Python 3.12+): pip install haiku.rag
2. Start the MCP server: haiku-rag serve --mcp --stdio
3. Add the server to your MCP client configuration (e.g. Claude Desktop) and restart the client.
4. Verify the new tools are active.


haiku.rag FAQ

Q: Is haiku.rag safe?

haiku.rag follows standard Model Context Protocol security patterns and only executes tools with explicit, user-granted permissions.

Q: Is haiku.rag up to date?

haiku.rag is actively listed in the MCP registry and has 500 stars on GitHub.

Q: Are there any limits for haiku.rag?

Usage limits depend on your MCP server configuration and system resources. Refer to the official documentation below for technical details.

Official Documentation

View on GitHub

Haiku RAG


Agentic RAG built on LanceDB, Pydantic AI, and Docling.

Features

  • Hybrid search — Vector + full-text with Reciprocal Rank Fusion
  • Question answering — QA agents with citations (page numbers, section headings)
  • Reranking — MxBAI, Cohere, Zero Entropy, or vLLM
  • Research agents — Multi-agent workflows via pydantic-graph: plan, search, evaluate, synthesize
  • RLM agent — Complex analytical tasks via sandboxed Python code execution (aggregation, computation, multi-document analysis)
  • Conversational RAG — Chat TUI and web application for multi-turn conversations with session memory
  • Document structure — Stores full DoclingDocument, enabling structure-aware context expansion
  • Multiple providers — Embeddings: Ollama, OpenAI, VoyageAI, LM Studio, vLLM. QA/Research: any model supported by Pydantic AI
  • Local-first — Embedded LanceDB, no servers required. Also supports S3, GCS, Azure, and LanceDB Cloud
  • CLI & Python API — Full functionality from command line or code
  • MCP server — Expose as tools for AI assistants (Claude Desktop, etc.)
  • Visual grounding — View chunks highlighted on original page images
  • File monitoring — Watch directories and auto-index on changes
  • Time travel — Query the database at any historical point with --before
  • Inspector — TUI for browsing documents, chunks, and search results
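
The hybrid-search fusion listed above can be illustrated with a minimal, generic Reciprocal Rank Fusion sketch. This uses the conventional constant k = 60; the function name and example ids are illustrative, and haiku.rag's internal implementation may differ:

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    A document's fused score is the sum of 1 / (k + rank) over every
    list it appears in, so items ranked highly by multiple retrievers
    (e.g. vector and full-text search) rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["c3", "c1", "c7"]  # chunk ids from vector search, best first
fts_hits = ["c3", "c9", "c1"]     # chunk ids from full-text search
print(rrf_fuse([vector_hits, fts_hits]))  # → ['c3', 'c1', 'c9', 'c7']
```

Note that "c3", ranked first by both retrievers, wins even though neither list alone distinguishes it by score.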

Installation

Python 3.12 or newer required

Full Package (Recommended)

pip install haiku.rag

Includes all features: document processing, all embedding providers, and rerankers.

Using uv? uv pip install haiku.rag

Slim Package (Minimal Dependencies)

pip install haiku.rag-slim

Install only the extras you need. See the Installation documentation for available options.

Quick Start

Note: Requires an embedding provider (Ollama, OpenAI, etc.). See the Tutorial for setup instructions.

# Index a PDF
haiku-rag add-src paper.pdf

# Search
haiku-rag search "attention mechanism"

# Ask questions with citations
haiku-rag ask "What datasets were used for evaluation?" --cite

# Research mode — iterative planning and search
haiku-rag research "What are the limitations of the approach?"

# RLM mode — complex analytical tasks via code execution
haiku-rag rlm "How many documents mention transformers?"

# Interactive chat — multi-turn conversations with memory
haiku-rag chat

# Watch a directory for changes
haiku-rag serve --monitor

See Configuration for customization options.
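
What `haiku-rag serve --monitor` automates can be pictured as a simple polling loop over a watched directory. This is a generic sketch only: `scan_new_files`, the polling approach, and the extension list are assumptions, not haiku.rag internals.

```python
from pathlib import Path

def scan_new_files(root, seen, exts=(".pdf", ".md", ".txt")):
    """Return files under `root` not yet indexed; update `seen` in place.

    Each new path returned here is what a watcher would hand off to
    indexing (e.g. add-src); subsequent scans return only additions.
    """
    new = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix.lower() in exts and path not in seen:
            seen.add(path)
            new.append(path)
    return new
```

In practice a watcher would also handle modified and deleted files; this sketch covers only additions.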

Python API

import asyncio

from haiku.rag.client import HaikuRAG

async def main():
    async with HaikuRAG("research.lancedb", create=True) as rag:
        # Index documents
        await rag.create_document_from_source("paper.pdf")
        await rag.create_document_from_source("https://arxiv.org/pdf/1706.03762")

        # Search — returns chunks with provenance
        results = await rag.search("self-attention")
        for result in results:
            print(f"{result.score:.2f} | p.{result.page_numbers} | {result.content[:100]}")

        # QA with citations
        answer, citations = await rag.ask("What is the complexity of self-attention?")
        print(answer)
        for cite in citations:
            print(f"  [{cite.chunk_id}] p.{cite.page_numbers}: {cite.content[:80]}")

asyncio.run(main())

For research agents and chat, see the Agents docs.
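
The plan, search, evaluate, synthesize workflow mentioned under Features can be sketched generically. The `research` function below is an illustrative assumption, not the pydantic-graph implementation: its evaluation and synthesis steps are deliberately naive.

```python
def research(question, search, max_rounds=3):
    """Toy plan → search → evaluate → synthesize loop.

    `search` maps a query string to a list of text snippets.
    """
    queries = [question]  # plan: start from the user's question
    evidence = []
    for _ in range(max_rounds):
        if not queries:
            break
        hits = search(queries.pop(0))  # search one planned query
        # evaluate: a real agent would score hits and plan follow-up queries
        evidence.extend(hits)
    return " ".join(evidence)  # synthesize: naive concatenation

corpus = {"q": ["snippet one", "snippet two"]}
print(research("q", lambda query: corpus.get(query, [])))
# → snippet one snippet two
```

A real research agent would generate new queries from gathered evidence each round; this sketch only shows the loop's shape.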

MCP Server

Use with AI assistants like Claude Desktop:

haiku-rag serve --mcp --stdio

Add to your Claude Desktop configuration:

{
  "mcpServers": {
    "haiku-rag": {
      "command": "haiku-rag",
      "args": ["serve", "--mcp", "--stdio"]
    }
  }
}

Provides tools for document management, search, QA, and research directly in your AI assistant.

Examples

See the examples directory for working examples:

  • Docker Setup - Complete Docker deployment with file monitoring and MCP server
  • Web Application - Full-stack conversational RAG with CopilotKit frontend

Documentation

Full documentation at: https://ggozad.github.io/haiku.rag/

License

This project is licensed under the MIT License.

<!-- mcp-name is used by the MCP registry to identify this server -->

mcp-name: io.github.ggozad/haiku-rag
