haiku.rag

Built by ggozad · 500 stars

What is haiku.rag?

Opinionated agentic RAG powered by LanceDB, Pydantic AI, and Docling

How to use haiku.rag?

1. Install the package (Python 3.12+): pip install haiku.rag
2. Start the MCP server: haiku-rag serve --mcp --stdio
3. Add the server to your MCP client configuration (e.g. Claude Desktop) and restart the client.
4. Verify the new tools are active.


haiku.rag FAQ

Q: Is haiku.rag safe?

haiku.rag follows standard Model Context Protocol security patterns and only executes tools with explicit, user-granted permissions.

Q: Is haiku.rag up to date?

haiku.rag is actively listed in the MCP registry and has 500 stars on GitHub.

Q: Are there any limits for haiku.rag?

Usage limits depend on your MCP server configuration and system resources. Refer to the official documentation below for technical details.

Official Documentation

View on GitHub

Haiku RAG


Agentic RAG built on LanceDB, Pydantic AI, and Docling.

Features

  • Hybrid search — Vector + full-text with Reciprocal Rank Fusion
  • Question answering — QA agents with citations (page numbers, section headings)
  • Reranking — MxBAI, Cohere, Zero Entropy, or vLLM
  • Research agents — Multi-agent workflows via pydantic-graph: plan, search, evaluate, synthesize
  • RLM agent — Complex analytical tasks via sandboxed Python code execution (aggregation, computation, multi-document analysis)
  • Conversational RAG — Chat TUI and web application for multi-turn conversations with session memory
  • Document structure — Stores full DoclingDocument, enabling structure-aware context expansion
  • Multiple providers — Embeddings: Ollama, OpenAI, VoyageAI, LM Studio, vLLM. QA/Research: any model supported by Pydantic AI
  • Local-first — Embedded LanceDB, no servers required. Also supports S3, GCS, Azure, and LanceDB Cloud
  • CLI & Python API — Full functionality from command line or code
  • MCP server — Expose as tools for AI assistants (Claude Desktop, etc.)
  • Visual grounding — View chunks highlighted on original page images
  • File monitoring — Watch directories and auto-index on changes
  • Time travel — Query the database at any historical point with --before
  • Inspector — TUI for browsing documents, chunks, and search results
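
The hybrid-search fusion listed above can be illustrated with a minimal, generic Reciprocal Rank Fusion sketch. This uses the conventional constant k = 60; the function name and example ids are illustrative, and haiku.rag's internal implementation may differ:

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    A document's fused score is the sum of 1 / (k + rank) over every
    list it appears in, so items ranked highly by multiple retrievers
    (e.g. vector and full-text search) rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["c3", "c1", "c7"]  # chunk ids from vector search, best first
fts_hits = ["c3", "c9", "c1"]     # chunk ids from full-text search
print(rrf_fuse([vector_hits, fts_hits]))  # → ['c3', 'c1', 'c9', 'c7']
```

Note that "c3", ranked first by both retrievers, wins even though neither list alone distinguishes it by score.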

Installation

Python 3.12 or newer required

Full Package (Recommended)

pip install haiku.rag

Includes all features: document processing, all embedding providers, and rerankers.

Using uv? uv pip install haiku.rag

Slim Package (Minimal Dependencies)

pip install haiku.rag-slim

Install only the extras you need. See the Installation documentation for available options.

Quick Start

Note: Requires an embedding provider (Ollama, OpenAI, etc.). See the Tutorial for setup instructions.

# Index a PDF
haiku-rag add-src paper.pdf

# Search
haiku-rag search "attention mechanism"

# Ask questions with citations
haiku-rag ask "What datasets were used for evaluation?" --cite

# Research mode — iterative planning and search
haiku-rag research "What are the limitations of the approach?"

# RLM mode — complex analytical tasks via code execution
haiku-rag rlm "How many documents mention transformers?"

# Interactive chat — multi-turn conversations with memory
haiku-rag chat

# Watch a directory for changes
haiku-rag serve --monitor

See Configuration for customization options.
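
What `haiku-rag serve --monitor` automates can be pictured as a simple polling loop over a watched directory. This is a generic sketch only: `scan_new_files`, the polling approach, and the extension list are assumptions, not haiku.rag internals.

```python
from pathlib import Path

def scan_new_files(root, seen, exts=(".pdf", ".md", ".txt")):
    """Return files under `root` not yet indexed; update `seen` in place.

    Each new path returned here is what a watcher would hand off to
    indexing (e.g. add-src); subsequent scans return only additions.
    """
    new = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix.lower() in exts and path not in seen:
            seen.add(path)
            new.append(path)
    return new
```

In practice a watcher would also handle modified and deleted files; this sketch covers only additions.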

Python API

import asyncio

from haiku.rag.client import HaikuRAG

async def main():
    async with HaikuRAG("research.lancedb", create=True) as rag:
        # Index documents
        await rag.create_document_from_source("paper.pdf")
        await rag.create_document_from_source("https://arxiv.org/pdf/1706.03762")

        # Search — returns chunks with provenance
        results = await rag.search("self-attention")
        for result in results:
            print(f"{result.score:.2f} | p.{result.page_numbers} | {result.content[:100]}")

        # QA with citations
        answer, citations = await rag.ask("What is the complexity of self-attention?")
        print(answer)
        for cite in citations:
            print(f"  [{cite.chunk_id}] p.{cite.page_numbers}: {cite.content[:80]}")

asyncio.run(main())

For research agents and chat, see the Agents docs.
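
The plan, search, evaluate, synthesize workflow mentioned under Features can be sketched generically. The `research` function below is an illustrative assumption, not the pydantic-graph implementation: its evaluation and synthesis steps are deliberately naive.

```python
def research(question, search, max_rounds=3):
    """Toy plan → search → evaluate → synthesize loop.

    `search` maps a query string to a list of text snippets.
    """
    queries = [question]  # plan: start from the user's question
    evidence = []
    for _ in range(max_rounds):
        if not queries:
            break
        hits = search(queries.pop(0))  # search one planned query
        # evaluate: a real agent would score hits and plan follow-up queries
        evidence.extend(hits)
    return " ".join(evidence)  # synthesize: naive concatenation

corpus = {"q": ["snippet one", "snippet two"]}
print(research("q", lambda query: corpus.get(query, [])))
# → snippet one snippet two
```

A real research agent would generate new queries from gathered evidence each round; this sketch only shows the loop's shape.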

MCP Server

Use with AI assistants like Claude Desktop:

haiku-rag serve --mcp --stdio

Add to your Claude Desktop configuration:

{
  "mcpServers": {
    "haiku-rag": {
      "command": "haiku-rag",
      "args": ["serve", "--mcp", "--stdio"]
    }
  }
}

Provides tools for document management, search, QA, and research directly in your AI assistant.

Examples

See the examples directory for working examples:

  • Docker Setup - Complete Docker deployment with file monitoring and MCP server
  • Web Application - Full-stack conversational RAG with CopilotKit frontend

Documentation

Full documentation at: https://ggozad.github.io/haiku.rag/

License

This project is licensed under the MIT License.

<!-- mcp-name is used by the MCP registry to identify this server -->

mcp-name: io.github.ggozad/haiku-rag
