kreuzberg-dev/kreuzberg

Built by kreuzberg-dev 7,139 stars

What is kreuzberg-dev/kreuzberg?

A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 88+ formats. Available for Rust, Python, Ruby,

How to use kreuzberg-dev/kreuzberg?

1. Install a compatible MCP client (such as Claude Desktop).
2. Open your configuration settings.
3. Add kreuzberg-dev/kreuzberg using the following command: npx @modelcontextprotocol/kreuzberg-dev-kreuzberg
4. Restart the client and verify the new tools are active.

🛡️ Scoped (Restricted)
npx @modelcontextprotocol/kreuzberg-dev-kreuzberg --scope restricted
🔓 Unrestricted Access
npx @modelcontextprotocol/kreuzberg-dev-kreuzberg

Key Features

Native MCP Protocol Support
Real-time Tool Activation & Execution
Verified High-performance Implementation
Secure Resource & Context Handling

Optimized Use Cases

Extending AI models with custom local capabilities
Automating system workflows via natural language
Connecting external data sources to LLM context windows

kreuzberg-dev/kreuzberg FAQ

Q: Is kreuzberg-dev/kreuzberg safe?

Yes, kreuzberg-dev/kreuzberg follows the standardized Model Context Protocol security patterns and only executes tools with explicit user-granted permissions.

Q: Is kreuzberg-dev/kreuzberg up to date?

kreuzberg-dev/kreuzberg is currently active in the registry and has 7,139 stars on GitHub, a sign of community interest and support.

Q: Are there any limits for kreuzberg-dev/kreuzberg?

Usage limits depend on the specific implementation of the MCP server and your system resources. Refer to the official documentation below for technical details.

Official Documentation

Kreuzberg

Language Bindings:

  • Rust – https://crates.io/crates/kreuzberg
  • Elixir – https://hex.pm/packages/kreuzberg
  • Python – https://pypi.org/project/kreuzberg/
  • Node.js – https://www.npmjs.com/package/@kreuzberg/node
  • WASM – https://www.npmjs.com/package/@kreuzberg/wasm
  • Java – https://central.sonatype.com/artifact/dev.kreuzberg/kreuzberg
  • Go – https://github.com/kreuzberg-dev/kreuzberg/releases
  • C# – https://www.nuget.org/packages/Kreuzberg/
  • PHP – https://packagist.org/packages/kreuzberg/kreuzberg
  • Ruby – https://rubygems.org/gems/kreuzberg
  • R – https://kreuzberg-dev.r-universe.dev/kreuzberg
  • Docker – https://github.com/kreuzberg-dev/kreuzberg/pkgs/container/kreuzberg
  • C (FFI) – https://github.com/kreuzberg-dev/kreuzberg/releases

Project Info:

  • License (MIT) – https://github.com/kreuzberg-dev/kreuzberg/blob/main/LICENSE
  • Documentation – https://docs.kreuzberg.dev
  • Live Demo – https://docs.kreuzberg.dev/demo.html
  • Hugging Face – https://huggingface.co/Kreuzberg
  • Discord – https://discord.gg/xt9WY3GnKR

Extract text, metadata, and code intelligence from 91+ file formats and 248 programming languages at native speeds without needing a GPU.

Key Features

  • Code intelligence – Extract functions, classes, imports, symbols, and docstrings from 248 programming languages via tree-sitter
  • Extensible architecture – Plugin system for custom OCR backends, validators, post-processors, document extractors, and renderers
  • Polyglot – Native bindings for Rust, Python, TypeScript/Node.js, Ruby, Go, Java, C#, PHP, Elixir, R, and C
  • 91+ file formats – PDF, Office documents, images, HTML, XML, emails, archives, academic formats across 8 categories
  • OCR support – Tesseract (all bindings, including Tesseract-WASM for browsers), PaddleOCR (all native bindings), EasyOCR (Python), extensible via plugin API
  • High performance – Rust core with native PDFium, SIMD optimizations and full parallelism
  • Flexible deployment – Use as library, CLI tool, REST API server, or MCP server
  • TOON wire format – Token-efficient serialization for LLM/RAG pipelines, ~30-50% fewer tokens than JSON
  • Memory efficient – Streaming parsers for multi-GB files

Complete Documentation | Live Demo | Installation Guides

Installation

Each language binding provides comprehensive documentation with examples and best practices. Choose your platform to get started:

Scripting Languages:

  • Python – PyPI package, async/sync APIs, OCR backends (Tesseract, PaddleOCR, EasyOCR)
  • Ruby – RubyGems package, idiomatic Ruby API, native bindings
  • PHP – Composer package, modern PHP 8.4+ support, type-safe API, async extraction
  • Elixir – Hex package, OTP integration, concurrent processing
  • R – r-universe package, idiomatic R API, extendr bindings

JavaScript/TypeScript:

  • @kreuzberg/node – Native NAPI-RS bindings for Node.js/Bun, fastest performance
  • @kreuzberg/wasm – WebAssembly for browsers/Deno/Cloudflare Workers, full feature parity (PDF, Excel, OCR, archives)

Compiled Languages:

  • Go – Go module with FFI bindings, context-aware async
  • Java – Maven Central, Foreign Function & Memory API
  • C# – NuGet package, .NET 6.0+, full async/await support

Native:

  • Rust – Core library, flexible feature flags, zero-copy APIs
  • C (FFI) – C header + shared library, pkg-config/CMake support, cross-platform

Containers:

  • Docker – Official images with API, CLI, and MCP server modes (Core: ~1.0-1.3GB, Full: ~1.0-1.3GB with OCR + legacy format support)

Command-Line:

  • CLI – Cross-platform binary, batch processing, MCP server mode

All language bindings include precompiled binaries for x86_64 and aarch64 on Linux and for Apple Silicon (ARM64) on macOS.

Platform Support

Complete architecture coverage across all language bindings:

Language | Linux x86_64 | Linux aarch64 | macOS ARM64 | Windows x64
Python
Node.js
WASM
Ruby-
R
Elixir
Go
Java
C#
PHP
Rust
C (FFI)
CLI
Docker-

Note: ✅ = Precompiled binaries available with instant installation. WASM runs in any environment with WebAssembly support (browsers, Deno, Bun, Cloudflare Workers). All platforms are tested in CI. macOS support is Apple Silicon only.

Embeddings Support (Optional)

To use embeddings functionality:

  1. Install ONNX Runtime 1.24+:

  2. Use embeddings in your code - see Embeddings Guide

Note: Kreuzberg requires ONNX Runtime version 1.24+ for embeddings. All other Kreuzberg features work without ONNX Runtime.

Supported Formats

91+ file formats across 8 major categories with intelligent format detection and comprehensive metadata extraction.

Office Documents

| Category | Formats | Capabilities |
| --- | --- | --- |
| Word Processing | .docx, .docm, .dotx, .dotm, .dot, .odt, .pages | Full text, tables, lists, images, metadata, styles |
| Spreadsheets | .xlsx, .xlsm, .xlsb, .xls, .xla, .xlam, .xltm, .xltx, .xlt, .ods, .numbers | Sheet data, formulas, cell metadata, charts |
| Presentations | .pptx, .pptm, .ppsx, .potx, .potm, .pot, .key | Slides, speaker notes, images, metadata |
| PDF | .pdf | Text, tables, images, metadata, OCR support |
| eBooks | .epub, .fb2 | Chapters, metadata, embedded resources |
| Database | .dbf | Table data extraction, field type support |
| Hangul | .hwp, .hwpx | Korean document format, text extraction |

Images (OCR-Enabled)

| Category | Formats | Features |
| --- | --- | --- |
| Raster | .png, .jpg, .jpeg, .gif, .webp, .bmp, .tiff, .tif | OCR, table detection, EXIF metadata, dimensions, color space |
| Advanced | .jp2, .jpx, .jpm, .mj2, .jbig2, .jb2, .pnm, .pbm, .pgm, .ppm | Pure Rust decoders (JPEG 2000, JBIG2), OCR, table detection |
| Vector | .svg | DOM parsing, embedded text, graphics metadata |

Web & Data

| Category | Formats | Features |
| --- | --- | --- |
| Markup | .html, .htm, .xhtml, .xml, .svg | DOM parsing, metadata (Open Graph, Twitter Card), link extraction |
| Structured Data | .json, .yaml, .yml, .toml, .csv, .tsv | Schema detection, nested structures, validation |
| Text & Markdown | .txt, .md, .markdown, .djot, .mdx, .rst, .org, .rtf | CommonMark, GFM, Djot, MDX, reStructuredText, Org Mode, Rich Text |

Email & Archives

| Category | Formats | Features |
| --- | --- | --- |
| Email | .eml, .msg | Headers, body (HTML/plain), attachments, UTF-16 support |
| Archives | .zip, .tar, .tgz, .gz, .7z | Recursive extraction, nested archives, metadata |

Academic & Scientific

| Category | Formats | Features |
| --- | --- | --- |
| Citations | .bib, .ris, .nbib, .enw, .csl | BibTeX/BibLaTeX, RIS, PubMed/MEDLINE, EndNote XML, CSL JSON |
| Scientific | .tex, .latex, .typ, .typst, .jats, .ipynb | LaTeX, Typst, JATS journal articles, Jupyter notebooks |
| Publishing | .fb2, .docbook, .dbk, .opml | FictionBook, DocBook XML, OPML outlines |
| Documentation | .pod, .mdoc, .troff | Perl POD, man pages, troff |

Complete Format Reference →
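As a rough illustration of how the format tables above can be used on the caller's side, the sketch below maps a file extension to its table category before a file is handed to an extractor. It is a plain extension lookup over a subset of the listed formats, not Kreuzberg's own format detection; the names `CATEGORIES` and `categorize` are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Subset of the extensions listed in the tables above (illustrative only).
CATEGORIES = {
    "Word Processing": {".docx", ".odt", ".pages"},
    "Spreadsheets": {".xlsx", ".xls", ".ods", ".numbers"},
    "PDF": {".pdf"},
    "Raster": {".png", ".jpg", ".jpeg", ".tiff"},
    "Structured Data": {".json", ".yaml", ".yml", ".toml", ".csv", ".tsv"},
    "Email": {".eml", ".msg"},
    "Archives": {".zip", ".tar", ".tgz", ".gz", ".7z"},
}

# Invert the table so each extension points at its category.
EXTENSION_TO_CATEGORY = {
    ext: category for category, exts in CATEGORIES.items() for ext in exts
}


def categorize(filename: str) -> Optional[str]:
    """Return the table category for a filename, or None if it is not listed here."""
    return EXTENSION_TO_CATEGORY.get(Path(filename).suffix.lower())


for name in ["Report.PDF", "backup.tar.gz", "notes.xyz"]:
    print(name, "->", categorize(name))
```

Only the final suffix is inspected, so `backup.tar.gz` is classified by `.gz`. A lookup like this is useful for routing or filtering files up front; it says nothing about whether the file content actually matches its extension.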

Code Intelligence (248 Languages)

| Feature | Description |
| --- | --- |
| Structure Extraction | Functions, classes, methods, structs, interfaces, enums |
| Import/Export Analysis | Module dependencies, re-exports, wildcard imports |
| Symbol Extraction | Variables, constants, type aliases, properties |
| Docstring Parsing | Google, NumPy, Sphinx, JSDoc, RustDoc, and 10+ formats |
| Diagnostics | Parse errors with line/column positions |
| Syntax-Aware Chunking | Split code by semantic boundaries, not arbitrary byte offsets |

Powered by tree-sitter-language-pack with dynamic grammar download. See TSLP documentation for the full language list.
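To make the idea of syntax-aware chunking concrete, here is a minimal sketch that splits Python source at top-level statement boundaries and pulls out names and docstrings. It uses Python's standard `ast` module as a stand-in for tree-sitter, so it covers Python only and is not how Kreuzberg implements the feature; `chunk_by_definition` and the sample source are hypothetical.

```python
import ast

SOURCE = '''\
import os

def greet(name):
    """Return a greeting."""
    return f"Hello, {name}"

class Counter:
    def __init__(self):
        self.n = 0
'''

# Node types that can carry a docstring.
DEFINITIONS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def chunk_by_definition(source: str) -> list:
    """Split source into one chunk per top-level statement."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        # lineno/end_lineno are 1-based and inclusive.
        text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
        chunks.append({
            "kind": type(node).__name__,
            "name": getattr(node, "name", None),
            "docstring": ast.get_docstring(node) if isinstance(node, DEFINITIONS) else None,
            "text": text,
        })
    return chunks


chunks = chunk_by_definition(SOURCE)
for chunk in chunks:
    print(chunk["kind"], chunk["name"], repr(chunk["docstring"]))
```

Each chunk ends where a statement ends, so a function body is never cut in half the way a fixed byte-offset splitter might cut it.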

Key Features

<details> <summary><strong>OCR with Table Extraction</strong></summary>

Multiple OCR backends (Tesseract, EasyOCR, PaddleOCR) with intelligent table detection and reconstruction. Extract structured data from scanned documents and images with configurable accuracy thresholds.

OCR Backend Documentation →

</details> <details> <summary><strong>Batch Processing</strong></summary>

Process multiple documents concurrently with configurable parallelism. Optimize throughput for large-scale document processing workloads with automatic resource management.

Batch Processing Guide →

</details> <details> <summary><strong>Password-Protected PDFs</strong></summary>

Handle encrypted PDFs with single or multiple password attempts. Supports both RC4 and AES encryption with automatic fallback strategies.

PDF Configuration →

</details> <details> <summary><strong>Language Detection</strong></summary>

Automatic language detection in extracted text using fast-langdetect. Configure confidence thresholds and access per-language statistics.

Language Detection Guide →

</details> <details> <summary><strong>Metadata Extraction</strong></summary>

Extract comprehensive metadata from all supported formats: authors, titles, creation dates, page counts, EXIF data, and format-specific properties.

Metadata Guide →

</details>
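The batch processing pattern described above (concurrent extraction with configurable parallelism) can be sketched with the standard library alone. In this sketch `extract_text` is a hypothetical placeholder for a real extraction call and does not reflect Kreuzberg's API; `max_workers` plays the role of the parallelism setting.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_text(path: str) -> str:
    """Hypothetical stand-in for a real extraction call; not Kreuzberg's API."""
    return f"text of {path}"


def extract_batch(paths: list, max_workers: int = 4) -> dict:
    """Extract every path concurrently and return results keyed by path."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map() yields results in input order, so zip() pairs
        # each path with its own result.
        return dict(zip(paths, pool.map(extract_text, paths)))


results = extract_batch(["a.pdf", "b.docx", "c.png"], max_workers=2)
for path, text in results.items():
    print(path, "->", text)
```

A thread pool suits workloads where the heavy lifting happens in native code or I/O; the pool size caps how many documents are in flight at once.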

AI Coding Assistants

Kreuzberg ships with an Agent Skill that teaches AI coding assistants how to use the library correctly. It works with Claude Code, Codex, Gemini CLI, Cursor, VS Code, Amp, Goose, Roo Code, and any tool supporting the Agent Skills standard.

Install the skill into any project using the Vercel Skills CLI:

npx skills add kreuzberg-dev/kreuzberg

The skill is located at skills/kreuzberg/SKILL.md and is automatically discovered by supported AI coding tools once installed.

Documentation

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details. You can use Kreuzberg freely in both commercial and closed-source products with no obligations, no viral effects, and no licensing restrictions.

Global Ranking

8.5 Trust Score (MCPHub Index)

Based on codebase health & activity.

Manual Config

{
  "mcpServers": {
    "kreuzberg-dev-kreuzberg": {
      "command": "npx",
      "args": ["kreuzberg-dev-kreuzberg"]
    }
  }
}
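If your MCP client already has other servers configured, the entry above needs to be merged into the existing `mcpServers` object rather than pasted over it. The sketch below shows one way to do that merge in memory; the `existing` configuration and the `add_server` helper are hypothetical, and where the configuration file lives depends on your client.

```python
import json

# Entry copied from the manual configuration shown above.
KREUZBERG_ENTRY = {
    "command": "npx",
    "args": ["kreuzberg-dev-kreuzberg"],
}


def add_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with entry registered under mcpServers[name]."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing client configuration with one other server.
existing = {
    "mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}
}
updated = add_server(existing, "kreuzberg-dev-kreuzberg", KREUZBERG_ENTRY)
print(json.dumps(updated, indent=2))
```

The helper copies the dictionaries it changes, so the original configuration is left untouched and can be kept as a backup before writing the updated one to disk.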