
web infra dev/midscene

Built by web-infra-dev • 12,408 stars

What is web infra dev/midscene?

AI-powered, vision-driven UI automation for every platform.

How to use web infra dev/midscene?

1. Install a compatible MCP client (such as Claude Desktop).
2. Open your configuration settings.
3. Add web infra dev/midscene using the following command: npx @modelcontextprotocol/web-infra-dev-midscene
4. Restart the client and verify that the new tools are active.
🛡️ Scoped (Restricted)
npx @modelcontextprotocol/web-infra-dev-midscene --scope restricted
🔓 Unrestricted Access
npx @modelcontextprotocol/web-infra-dev-midscene

Key Features

Native MCP Protocol Support
Real-time Tool Activation & Execution
Verified High-performance Implementation
Secure Resource & Context Handling

Optimized Use Cases

Extending AI models with custom local capabilities
Automating system workflows via natural language
Connecting external data sources to LLM context windows

web infra dev/midscene FAQ

Q: Is web infra dev/midscene safe?

Yes, web infra dev/midscene follows the standardized Model Context Protocol security patterns and only executes tools with explicit user-granted permissions.

Q: Is web infra dev/midscene up to date?

web infra dev/midscene is currently active in the registry with 12,408 stars on GitHub, indicating its reliability and community support.

Q: Are there any limits for web infra dev/midscene?

Usage limits depend on the specific implementation of the MCP server and your system resources. Refer to the official documentation below for technical details.

Official Documentation

View on GitHub
<p align="center"> <img alt="Midscene.js" width="260" src="https://github.com/user-attachments/assets/f60de3c1-dd6f-4213-97a1-85bf7c6e79e4"> </p> <h1 align="center">Midscene.js</h1> <div align="center">

English | 简体中文

<strong>Official Website</strong>: <a href="https://midscenejs.com/">https://midscenejs.com/</a>

<a href="https://trendshift.io/repositories/12524" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12524" alt="web-infra-dev%2Fmidscene | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>

</div> <p align="center"> AI-powered, vision-driven UI automation for every platform. </p> <p align="center"> <a href="https://www.npmjs.com/package/@midscene/web"><img src="https://img.shields.io/npm/v/@midscene/web?style=flat-square&color=00a8f0" alt="npm version" /></a> <a href="https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B"><img src="https://img.shields.io/badge/UI%20TARS%20Models-yellow" alt="hugging face model" /></a> <a href="https://npm-compare.com/@midscene/web/#timeRange=THREE_YEARS"><img src="https://img.shields.io/npm/dm/@midscene/web.svg?style=flat-square&color=00a8f0" alt="downloads" /></a> <a href="https://github.com/web-infra-dev/midscene/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square&color=00a8f0" alt="License" /></a> <a href="https://discord.gg/2JyBHxszE4"><img src="https://img.shields.io/discord/1328277792730779648?style=flat-square&color=7289DA&label=Discord&logo=discord&logoColor=white" alt="discord" /></a> <a href="https://x.com/midscene_ai"><img src="https://img.shields.io/twitter/follow/midscene_ai?style=flat-square" alt="twitter" /></a> <a href="https://deepwiki.com/web-infra-dev/midscene"> <img alt="Ask DeepWiki.com" src="https://devin.ai/assets/deepwiki-badge.png" style="height: 18px; vertical-align: middle;" /> </a> </p>

📣 Midscene Skills is here!

Use Midscene Skills to control any platform with OpenClaw

Showcases

💡 Features

Write Automation with Natural Language

  • Describe your goals and steps, and Midscene will plan and operate the user interface for you.
  • Use the JavaScript SDK or YAML to write your automation script.

Web & Mobile App & Any Interface

  • Web Automation: Integrate with Puppeteer or Playwright, or use Bridge Mode to control your desktop browser.
  • Android Automation: Use the JavaScript SDK with adb to control your local Android device.
  • iOS Automation: Use the JavaScript SDK with WebDriverAgent to control your local iOS devices and simulators.
  • Any Interface Automation: Use the JavaScript SDK to control your own interface.

For Developers

  • Three kinds of APIs:
  • MCP: Midscene provides MCP services that expose atomic Midscene Agent actions as MCP tools, so upper-layer agents can inspect and operate UIs with natural language.
  • Caching for Efficiency: Replay your script with the cache and get results faster.
  • Debugging Experience: Midscene.js offers a visual replay report, a built-in playground, and a Chrome Extension to simplify debugging. These are the tools most developers truly need.
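
The caching idea in the list above can be illustrated with a small, generic sketch. This is not Midscene's implementation or API: the `PlannedStep` shape, the `withPlanCache` helper, and the fake planner are all hypothetical names introduced here. It only shows why replaying a script with a cache is faster: the expensive planning call is skipped when the same prompt has already been planned.

```typescript
// Hypothetical shape of a planned UI step (not Midscene's internal format).
interface PlannedStep {
  action: "tap" | "input";
  target: string;
  value?: string;
}

type Planner = (prompt: string) => Promise<PlannedStep[]>;

// Wraps a planner so that a repeated prompt reuses the stored plan
// instead of calling the (slow, costly) planner again.
function withPlanCache(
  planner: Planner,
  cache: Map<string, PlannedStep[]> = new Map(),
): Planner {
  return async (prompt) => {
    const hit = cache.get(prompt);
    if (hit !== undefined) return hit;
    const steps = await planner(prompt);
    cache.set(prompt, steps);
    return steps;
  };
}

// Stand-in for a model call; it counts how often it is invoked.
let plannerCalls = 0;
const fakePlanner: Planner = async (prompt) => {
  plannerCalls += 1;
  return [
    { action: "input", target: "search box", value: prompt },
    { action: "tap", target: "search button" },
  ];
};

const cachedPlanner = withPlanCache(fakePlanner);

async function main(): Promise<number> {
  await cachedPlanner("weather today"); // first run: calls the planner
  await cachedPlanner("weather today"); // replay: served from the cache
  return plannerCalls;
}

const done = main().then((calls) => {
  console.log(`planner calls: ${calls}`); // prints "planner calls: 1"
  return calls;
});
```

A real cache would also need to check that the interface still matches the stored plan before replaying it; that validation is left out of this sketch.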

👉 Zero-code Quick Experience

✨ Driven by Visual Language Model

Midscene.js is all-in on the pure-vision route for UI actions: element localization and interactions are based on screenshots only. It supports visual-language models like Qwen3-VL, Doubao-1.6-vision, gemini-3-pro, and UI-TARS. For data extraction and page understanding, you can still opt in to include DOM when needed.

  • Pure-vision localization for UI actions; the DOM extraction mode is removed.
  • Works across web, mobile, desktop, and even <canvas> surfaces.
  • Far fewer tokens by skipping DOM for actions, which cuts cost and speeds up runs.
  • DOM can still be included for data extraction and page understanding when needed.
  • Strong open-source options for self-hosting.

Read more about Model Strategy

📄 Resources

🤝 Community

🌟 Awesome Midscene

Community projects that extend Midscene.js capabilities:

📝 Credits

We would like to thank the following projects:

  • Rsbuild and Rslib for the build tool.
  • UI-TARS for the open-source agent model UI-TARS.
  • Qwen-VL for the open-source VL model Qwen-VL.
  • scrcpy and yume-chan for allowing us to control Android devices from the browser.
  • appium-adb for the JavaScript bridge to adb.
  • appium-webdriveragent for operating XCTest from JavaScript.
  • YADB for the yadb tool which improves the performance of text input.
  • libnut-core for the cross-platform native keyboard and mouse control.
  • Puppeteer for browser automation and control.
  • Playwright for browser automation, control, and testing.

📖 Citation

If you use Midscene.js in your research or project, please cite:

@software{Midscene.js,
  author = {Xiao Zhou and Tao Yu and YiBing Lin},
  title = {Midscene.js: Your AI Operator for Web, Android, iOS, Automation \& Testing.},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/web-infra-dev/midscene}
}


📝 License

Midscene.js is MIT licensed.


<div align="center"> If this project helps you or inspires you, please give us a star </div>

Global Ranking

Trust Score: 8.5 (MCPHub Index)

Based on codebase health & activity.

Manual Config

{
  "mcpServers": {
    "web-infra-dev-midscene": {
      "command": "npx",
      "args": ["web-infra-dev-midscene"]
    }
  }
}
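
If your client already has other servers configured, the entry above needs to be merged into the existing `mcpServers` object rather than replacing it. The sketch below shows one way to do that merge in memory. The `addMcpServer` helper, the type names, and the `some-other-server` entry are hypothetical and used only for illustration; where the config file lives and how it is read or written depends on your MCP client and is not covered here.

```typescript
// Shape of one server entry, mirroring the manual config above.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpClientConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Returns a new config with the server added; the input is not mutated
// and existing entries are preserved.
function addMcpServer(
  config: McpClientConfig,
  name: string,
  entry: McpServerEntry,
): McpClientConfig {
  return {
    ...config,
    mcpServers: { ...config.mcpServers, [name]: entry },
  };
}

// Hypothetical existing config with one unrelated server registered.
const existing: McpClientConfig = {
  mcpServers: {
    "some-other-server": { command: "node", args: ["server.js"] },
  },
};

// Entry copied from the manual config above.
const updated = addMcpServer(existing, "web-infra-dev-midscene", {
  command: "npx",
  args: ["web-infra-dev-midscene"],
});

console.log(JSON.stringify(updated, null, 2));
```

Returning a new object instead of editing the input makes it easy to compare the before and after states, or to discard the change if the client rejects it.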