spider-rs/spider

Built by spider-rs · 2,370 stars

What is spider-rs/spider?

Web crawler and scraper for Rust

How to use spider-rs/spider?

1. Install a compatible MCP client (such as Claude Desktop).
2. Open your configuration settings.
3. Add spider-rs/spider using the following command: npx @modelcontextprotocol/spider-rs-spider
4. Restart the client and verify the new tools are active.
🛡️ Scoped (Restricted)
npx @modelcontextprotocol/spider-rs-spider --scope restricted
🔓 Unrestricted Access
npx @modelcontextprotocol/spider-rs-spider

Key Features

Native MCP Protocol Support
Real-time Tool Activation & Execution
Verified High-performance Implementation
Secure Resource & Context Handling

Optimized Use Cases

Extending AI models with custom local capabilities
Automating system workflows via natural language
Connecting external data sources to LLM context windows

spider-rs/spider FAQ

Q: Is spider-rs/spider safe?

Yes, spider-rs/spider follows the standardized Model Context Protocol security patterns and only executes tools with explicit user-granted permissions.

Q: Is spider-rs/spider up to date?

spider-rs/spider is currently active in the registry, and its GitHub repository (2,370 stars) is under active development.

Q: Are there any limits for spider-rs/spider?

Usage limits depend on the specific implementation of the MCP server and your system resources. Refer to the official documentation below for technical details.

Official Documentation

View on GitHub

Spider


Website | Guides | API Docs | Examples | Discord

A high-performance web crawler and scraper for Rust. 200-1000x faster than popular alternatives, with HTTP, headless Chrome, and WebDriver rendering in a single library.

Quick Start

Command Line

cargo install spider_cli
spider --url https://example.com

Rust

[dependencies]
spider = "2"

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    website.crawl().await;
    println!("Pages found: {}", website.get_links().len());
}

Streaming

Process each page the moment it's crawled, not after:

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    let mut rx = website.subscribe(0).unwrap();

    tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            println!("- {}", page.get_url());
        }
    });

    website.crawl().await;
    website.unsubscribe();
}

Headless Chrome

Add one feature flag to render JavaScript-heavy pages:

[dependencies]
spider = { version = "2", features = ["chrome"] }

use spider::features::chrome_common::RequestInterceptConfiguration;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com")
        .with_chrome_intercept(RequestInterceptConfiguration::new(true))
        .with_stealth(true)
        .build()
        .unwrap();

    website.crawl().await;
}

Also supports WebDriver (Selenium Grid, remote browsers) and AI-driven automation. See examples for more.

Benchmarks

Crawling 185 pages (source, 10 samples averaged):

Apple M1 Max (10-core, 64 GB RAM):

| Crawler | Language | Time | vs Spider |
| --- | --- | --- | --- |
| spider | Rust | 73 ms | baseline |
| node-crawler | JavaScript | 15 s | 205x slower |
| colly | Go | 32 s | 438x slower |
| wget | C | 70 s | 959x slower |

Linux (2-core, 7 GB RAM):

| Crawler | Language | Time | vs Spider |
| --- | --- | --- | --- |
| spider | Rust | 50 ms | baseline |
| node-crawler | JavaScript | 3.4 s | 68x slower |
| colly | Go | 30 s | 600x slower |
| wget | C | 60 s | 1200x slower |

The gap grows with site size: Spider handles 100k+ pages in minutes where other crawlers take hours. This comes from Rust's async runtime (tokio), lock-free data structures, and optional io_uring on Linux. See the benchmark source for full details.

Why Spider?

Most crawlers force a choice between fast HTTP-only crawling and slow-but-flexible browser automation. Spider supports both, and you can mix them in the same crawl.

Supports HTTP, Chrome, and WebDriver. Switch rendering modes with a feature flag. Use HTTP for speed, Chrome CDP for JavaScript-heavy pages, and WebDriver for Selenium Grid or cross-browser testing.
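In Cargo terms, switching modes looks roughly like the following sketch (the `chrome` and `spider_cloud` feature names appear elsewhere in this README; pick one uncommented line per project):

```toml
[dependencies]
# HTTP-only (default): fastest, no browser required
spider = "2"

# Headless Chrome (CDP) rendering for JavaScript-heavy pages
# spider = { version = "2", features = ["chrome"] }

# Chrome plus managed unblocking via Spider Cloud
# spider = { version = "2", features = ["chrome", "spider_cloud"] }
```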

Built for production. Caching (memory, disk, hybrid), proxy rotation, anti-bot fingerprinting, ad blocking, depth budgets, cron scheduling, and distributed workers. All of this has been hardened through Spider Cloud.

AI automation included. spider_agent adds multimodal LLM-driven automation: navigate pages, fill forms, solve challenges, and extract structured data with OpenAI or any compatible API.

Features

<details> <summary><strong>Crawling</strong></summary>
  • Concurrent and streaming crawls with backpressure
  • Decentralized crawling for horizontal scaling
  • Caching: memory, disk (SQLite), or hybrid Chrome cache
  • Proxy support with rotation
  • Cron job scheduling
  • Depth budgeting, blacklisting, whitelisting
  • Smart mode that auto-detects JS-rendered content and upgrades to Chrome
</details>

<details> <summary><strong>Browser Automation</strong></summary> </details>

<details> <summary><strong>Data Processing</strong></summary> </details>

<details> <summary><strong>AI Agent</strong></summary>
  • spider_agent: concurrent-safe multimodal web automation agent
  • Multiple LLM providers (OpenAI, any OpenAI-compatible API, Chrome built-in AI)
  • Web research with search providers (Serper, Brave, Bing, Tavily)
  • 110 built-in automation skills for web challenges
</details>

Spider Cloud

For managed proxy rotation, anti-bot bypass, and CAPTCHA handling, Spider Cloud plugs in with one line:

let mut website = Website::new("https://protected-site.com")
    .with_spider_cloud("your-api-key")  // enable with features = ["spider_cloud"]
    .build()
    .unwrap();

| Mode | Strategy | Best For |
| --- | --- | --- |
| Proxy (default) | All traffic through Spider Cloud proxy | General crawling with IP rotation |
| Smart (recommended) | Proxy + auto-fallback on bot detection | Production (speed + reliability) |
| Fallback | Direct first, API on failure | Cost-efficient; most sites work without help |
| Unblocker | All requests through unblocker | Aggressive bot protection |

Free credits on signup. Get started at spider.cloud

Spider Browser Cloud

Connect to a remote Rust-based browser via CDP over WebSocket for automation, scraping, and AI extraction:

use spider::configuration::SpiderBrowserConfig;

// Simple — just an API key
let mut website = Website::new("https://example.com")
    .with_spider_browser("your-api-key")  // features = ["spider_cloud", "chrome"]
    .build()
    .unwrap();

// Full config — stealth, country targeting, custom options
let browser_cfg = SpiderBrowserConfig::new("your-api-key")
    .with_stealth(true)
    .with_country("us");

let mut website = Website::new("https://example.com")
    .with_spider_browser_config(browser_cfg)
    .build()
    .unwrap();

WebSocket endpoint: wss://browser.spider.cloud/v1/browser — supports CDP and WebDriver BiDi protocols.

Parallel Backends (LightPanda / Servo)

Race alternative browser engines alongside the primary crawl. The best HTML response wins — higher reliability and coverage for JS-heavy pages.

use spider::configuration::{BackendEndpoint, BackendEngine, ParallelBackendsConfig};

let mut website = Website::new("https://example.com");

// Race a remote LightPanda instance alongside the primary crawl.
website.configuration.parallel_backends = Some(ParallelBackendsConfig {
    backends: vec![BackendEndpoint {
        engine: BackendEngine::LightPanda,
        endpoint: Some("ws://127.0.0.1:9222".to_string()),
        binary_path: None,
        protocol: None,
        proxy: None, // inherits from website proxies config
    }],
    grace_period_ms: 500,       // wait up to 500ms for a better result
    fast_accept_threshold: 80,  // accept immediately if quality >= 80
    ..Default::default()
});

website.crawl().await;

Features: lightpanda (LightPanda via CDP), servo (Servo via WebDriver), parallel_backends_full (both).

Lock-free, zero overhead when disabled, automatic backend health tracking with auto-disable after consecutive failures.
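Enabling the backends follows the same feature-flag pattern as the rest of the crate; a manifest sketch using the feature names listed above:

```toml
[dependencies]
# LightPanda only (CDP)
spider = { version = "2", features = ["lightpanda"] }

# Servo only (WebDriver)
# spider = { version = "2", features = ["servo"] }

# Both engines
# spider = { version = "2", features = ["parallel_backends_full"] }
```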

Get Spider

| Package | Language | Install |
| --- | --- | --- |
| spider | Rust | cargo add spider |
| spider_cli | CLI | cargo install spider_cli |
| spider-nodejs | Node.js | npm i @spider-rs/spider-rs |
| spider-py | Python | pip install spider_rs |
| spider_agent | Rust | cargo add spider --features agent |
| spider_mcp | MCP | cargo install spider_mcp |

MCP Server

Use Spider as tools in Claude Code, Claude Desktop, or any MCP client:

cargo install spider_mcp
{ "mcpServers": { "spider": { "command": "spider-mcp" } } }

Then ask: "Scrape https://example.com as markdown" or "Crawl https://example.com up to 5 pages"

Cloud and Remote

| Package | Description |
| --- | --- |
| Spider Cloud | Managed crawling infrastructure, no setup needed |
| spider-clients | SDKs for Spider Cloud in multiple languages |
| spider-browser | Remote access to Spider's Rust browser |

Resources

Contributing

Contributions welcome. See CONTRIBUTING.md for setup and guidelines.

Spider has been actively developed for the past 4 years. Join the Discord for questions and discussion.

License

MIT

Global Ranking

Trust Score: 8.5 (MCPHub Index), based on codebase health & activity.

Manual Config

{ "mcpServers": { "spider-rs-spider": { "command": "npx", "args": ["spider-rs-spider"] } } }