中文 | English
📖 Overview | 🎬 Use Cases | 🔧 Skills | 🤖 Models | ⚗️ Approach | 🌟 Technical Advantages | 📄 Citation | ❓ FAQ
Mano-P: "Mano" means "hand" in Spanish, and "P" has dual meanings: Person and Party. We believe that both individuals and organizations can create their own personalized AI, and a bright future of human-machine collaboration is on the horizon.
Mano-P is a GUI-VLA agent project designed specifically for edge devices. It serves both as an open-source project and a hardware product solution. As an open-source project, Mano-P is being released in a phased, progressive manner, targeting three distinct groups of developers:

- Phase 1: Mano-CUA Skills. Aimed at Agent enthusiasts (such as users of OpenClaw or Claude Code), enabling them to leverage Mano-CUA Skills to build more intelligent CUA task workflows and overcome the bottlenecks of human intervention.
- Phase 2: The local-side models and SDK components of Mano-CUA. Targets developers with high security requirements, allowing them to build their own custom Skills, Tools, and more on top of GUI-VLA models that run inference locally on a Mac mini. Crucially, all your CUA operations are executed entirely on your local Mac mini and are never uploaded to external servers.
- Phase 3: The training methodology and the pruning and quantization techniques used for the Mano-P models. Designed for developers with specific model training needs, empowering them to apply our training methods to create their own on-device GUI-VLA models tailored to their unique requirements.
Our GUI-VLA models run inference directly on Mac mini and MacBook devices, and we currently support two deployment methods: first, direct deployment on Mac mini or MacBook models equipped with an M4 chip and 32GB or more of RAM; second, deployment via a compute stick connected over a USB 4.0 (or higher) port. Detailed instructions for both deployment methods will be released soon, and we plan to support additional deployment options in the future.
- Complex GUI Automation: Autonomously complete complex interface operations containing hundreds of interactive elements
- Cross-System Data Integration: Extract and integrate multi-source data through pure visual interaction without API interfaces
- Long-Task Planning Execution: Support enterprise-level business process automation of dozens to hundreds of steps
- Intelligent Report Generation: Automatically generate structured documents such as data analysis reports and work summaries
Mano-P builds upon the complete technical framework of the Mano project (see Mano Technical Report), employing the Mano-Action bidirectional self-reinforcement learning method, three-stage progressive training (SFT → Offline Reinforcement Learning → Online Reinforcement Learning), "think-act-verify" loop reasoning mechanism, and a closed-loop data circulation system to achieve high-precision GUI understanding and operation capabilities. The edge version is optimized through mixed-precision quantization, visual token pruning, and edge inference adaptation, enabling large-scale parameter models to run efficiently on edge devices like Mac mini/MacBook/computing sticks.
- #1 on OSWorld Benchmark: Mano-P 1.0-72B achieves a 58.2% success rate on OSWorld, ranking first among all specialized GUI agent models and outperforming the second-place OpenCUA-72B (45.0%) by 13.2 percentage points
- Leading on WebRetriever Protocol I: Mano-P 1.0 scores 41.7 NavEval, surpassing Gemini 2.5 Pro Computer Use (40.9) and Claude 4.5 Computer Use (31.3)
- Fully Local Execution: Runs inference locally on Apple M4 chip with 32GB RAM (Mac mini or MacBook). No cloud API calls required. All screenshots and task data stay on-device
- High-Performance Inference: The 4B quantized model (w4a16) achieves 476 tokens/s prefill and 76 tokens/s decode on Apple M4 Pro, with only 4.3GB peak memory usage
- Autonomous Long-Task Execution: Supports complex business processes with end-to-end automation without internet connectivity
Mano-P_AFK_EN.mp4
We demonstrate the fully automated application-construction process of Mano AFK. After receiving a natural-language requirement, the system sequentially completes requirement clarification, technical architecture design, code generation, local deployment, and multi-level testing (API interface testing, LLM-based visual page inspection, and end-to-end GUI automation testing driven by the VLA model). When a test fails, the system automatically locates the root cause, fixes the code, and redeploys for verification, iterating until all test cases pass. The entire process requires no manual intervention and ultimately delivers a runnable application together with complete requirement documents and build reports.
Mano-P.Commercial_video_intelligent_system_EN.mp4
We demonstrate the full workflow of a commercial video intelligence system. Starting from the user's command, the system automatically completes the entire pipeline of video generation, uploading, analysis, editing, and secondary evaluation. Along the way, it autonomously operates web pages and editing software, performs fine-grained operations such as file processing and subtitle modification, and generates analysis reports containing both subjective evaluations and objective metrics. A comparison between the initial and refined versions visually presents the system's overall capabilities and practical effects.
Mano-P._EN.mp4
Mano-P is a small edge-side GUI-VLA model that runs directly on your computer, supporting on-device inference on Mac mini/MacBook machines with an M4 chip or above, as well as on plug-and-play computing sticks. In CUA scenarios, it breaks through the bottleneck of human participation in the Agent workflow. Mano-P: the first step toward AI for Personal/Party.
Mano_._EN.mp4
Mano-P excels not only in enterprise-level business automation but also integrates seamlessly into daily life. This video demonstrates the system's application in Mahjong gameplay: through pure visual understanding of the game interface, it autonomously completes tile recognition, analysis, and decision-making. This case validates Mano-P's general-purpose capabilities beyond work scenarios—from office automation to leisure entertainment, from structured data processing to unstructured game interactions, truly realizing the vision of "AI for Personal." One model, adapting to every aspect of life and work.
Performance of the Mano series models in multiple benchmarks:
📊 Expand Evaluation Data
| Models | Protocol | CA Acc | CA F1 | CV Acc | CV F1 | PAR Acc | PAR F1 | KL↓ | CC↑ | SIM↑ | NSS↑ | AUC↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Random | P1 | 10.42 | 11.03 | 10.76 | 10.95 | 15.94 | 16.00 | 2.1789 | 0.0452 | 0.2852 | 0.1081 | 0.5340 |
| Random | P2 | 10.01 | 10.74 | 10.32 | 10.50 | 14.39 | 15.04 | 4.3378 | 0.0270 | 0.2274 | 0.0665 | 0.5273 |
| Zero-shot for MLLMs | | | | | | | | | | | | |
| GPT-4o | P1 | 15.17 | 6.57 | 16.11 | 9.58 | 16.71 | 10.34 | 1.9423 | 0.4660 | 0.4602 | 1.2842 | 0.7848 |
| GPT-4o | P2 | 10.26 | 4.77 | 12.16 | 7.66 | 15.00 | 8.55 | 2.2650 | 0.4097 | 0.4028 | 1.2418 | 0.7807 |
| Gemini 2.0 Flash | P1 | 17.18 | 5.13 | 25.06 | 8.39 | 24.94 | 9.52 | 1.4726 | 0.3380 | 0.3751 | 0.8629 | 0.7296 |
| Gemini 2.0 Flash | P2 | 10.45 | 4.26 | 12.60 | 4.95 | 15.96 | 7.90 | 1.6373 | 0.3542 | 0.3490 | 1.0027 | 0.7590 |
| GPT-5.2 | P1 | 17.83 | 7.67 | 22.22 | 12.55 | 16.17 | 9.74 | 1.3262 | 0.4852 | 0.4632 | 1.3078 | 0.7969 |
| GPT-5.2 | P2 | 15.31 | 5.14 | 19.88 | 10.27 | 13.56 | 7.42 | 1.5444 | 0.4379 | 0.4092 | 1.3006 | 0.7999 |
| Claude Sonnet 4.5 | P1 | 10.34 | 5.8 | 13.26 | 9.84 | 16.02 | 9.94 | 1.4235 | 0.4912 | 0.4213 | 1.2956 | 0.8042 |
| Claude Sonnet 4.5 | P2 | 10.34 | 5.55 | 13.27 | 7.08 | 16.02 | 9.6 | 1.2855 | 0.4564 | 0.4781 | 1.3112 | 0.7915 |
| Llama 4 Scout | P1 | 13.98 | 9.96 | 10.25 | 6.51 | 13.27 | 8.11 | 3.7166 | 0.3331 | 0.3849 | 0.8828 | 0.7238 |
| Llama 4 Scout | P2 | 10.00 | 7.33 | 11.10 | 8.49 | 14.35 | 7.42 | 3.7434 | 0.3019 | 0.3452 | 0.8848 | 0.7258 |
| Qwen2.5-VL-7B | P1 | 15.88 | 5.21 | 10.07 | 6.07 | 12.26 | 4.96 | 12.0586 | 0.0999 | 0.2154 | 0.2578 | 0.5852 |
| Qwen2.5-VL-7B | P2 | 10.25 | 3.95 | 10.89 | 5.83 | 14.39 | 5.73 | 12.7596 | 0.0762 | 0.1855 | 0.2195 | 0.5753 |
| InternVL3-8B | P1 | 13.35 | 7.78 | 14.71 | 8.02 | 10.20 | 6.95 | 12.6480 | 0.0572 | 0.1895 | 0.1140 | 0.5769 |
| InternVL3-8B | P2 | 10.58 | 6.70 | 10.94 | 8.12 | 12.68 | 6.32 | 12.1385 | 0.0604 | 0.1819 | 0.1395 | 0.5859 |
| Fine-tune for MLLMs | | | | | | | | | | | | |
| Qwen2.5-VL-7B | P1 | 22.51 | 19.11 | 23.39 | 10.83 | 32.06 | 25.88 | 1.5091 | 0.6953 | 0.6118 | 1.8937 | 0.8579 |
| Qwen2.5-VL-7B | P2 | 13.72 | 13.25 | 13.03 | 10.94 | 21.24 | 20.65 | 2.2496 | 0.5359 | 0.4793 | 1.6439 | 0.8221 |
| InternVL3-8B | P1 | 20.94 | 18.41 | 21.96 | 11.02 | 30.33 | 24.66 | 1.2551 | 0.7014 | 0.6340 | 1.9896 | 0.8670 |
| InternVL3-8B | P2 | 12.81 | 11.83 | 12.16 | 11.11 | 19.26 | 19.27 | 1.8759 | 0.6282 | 0.5467 | 2.0621 | 0.8627 |
| Mano-P 1.0 | | | | | | | | | | | | |
| Stage I | P1 | 31.27 | 30.53 | 27.31 | 25.18 | 35.16 | 34.45 | 0.6794 | 0.7670 | 0.7015 | 2.1347 | 0.8710 |
| Stage I | P2 | 21.89 | 22.06 | 18.27 | 18.57 | 23.77 | 23.87 | 1.5759 | 0.6482 | 0.6167 | 2.1021 | 0.8627 |
| Stage II | P1 | 32.59 | 31.46 | 27.57 | 25.76 | 37.73 | 35.79 | 0.6736 | 0.7686 | 0.7120 | 2.1688 | 0.8853 |
| Stage II | P2 | 20.55 | 21.26 | 15.37 | 15.15 | 25.36 | 25.83 | 0.5617 | 0.6440 | 0.6130 | 2.1090 | 0.8602 |
| Stage III | P1 | 34.58 | 33.99 | 31.92 | 28.37 | 39.42 | 37.63 | 0.6073 | 0.7853 | 0.7248 | 2.2103 | 0.8938 |
| Stage III | P2 | 25.29 | 25.83 | 20.21 | 19.29 | 26.49 | 26.54 | 1.4617 | 0.6725 | 0.6330 | 2.1788 | 0.8776 |
| Dataset | Method | KL↓ | CC↑ | SIM↑ | NSS↑ | AUC↑ |
|---|---|---|---|---|---|---|
| MIT1003 | FastSal | 1.036 | 0.590 | 0.478 | 2.008 | 0.875 |
| MIT1003 | SAM-Resnet | 1.247 | 0.746 | 0.597 | 2.752 | 0.902 |
| MIT1003 | DAV | 0.753 | 0.699 | 0.566 | 2.574 | 0.897 |
| MIT1003 | UNISAL | 1.014 | 0.734 | 0.597 | 2.759 | 0.902 |
| MIT1003 | Transalnet | 0.660 | 0.722 | 0.592 | 2.631 | 0.903 |
| MIT1003 | SUM | 0.563 | 0.768 | 0.630 | 2.839 | 0.913 |
| MIT1003 | Mano-P 1.0 | 0.648 | 0.770 | 0.698 | 2.950 | 0.902 |
| SalECI | SSM | 0.720 | 0.599 | 0.611 | 1.396 | 0.830 |
| SalECI | DeepGaze IIE | 0.995 | 0.560 | 0.399 | 1.327 | 0.842 |
| SalECI | EML-NET | 1.220 | 0.510 | 0.536 | 1.232 | 0.807 |
| SalECI | Transalnet | 0.873 | 0.717 | 0.534 | 1.723 | 0.824 |
| SalECI | Temp-Sal | 0.712 | 0.719 | 0.629 | 1.768 | 0.813 |
| SalECI | SSwinTransformer | 0.652 | 0.687 | 0.606 | 1.701 | 0.868 |
| SalECI | Mano-P 1.0 | 0.615 | 0.769 | 0.695 | 1.735 | 0.868 |
| Methods | CC↑ | SIM↑ | NSS↑ | AUC↑ |
|---|---|---|---|---|
| ACLNet | 0.477 | 0.329 | 2.36 | 0.915 |
| TASED-Net | 0.479 | 0.366 | 2.63 | 0.916 |
| STAViS | 0.569 | 0.425 | 2.94 | 0.931 |
| ViNet | 0.569 | 0.409 | 3.06 | 0.928 |
| CASP-Net | 0.620 | 0.478 | 3.34 | 0.940 |
| Mano-P 1.0 | 0.642 | 0.481 | 2.99 | 0.929 |
| Models | Valence Acc↑ | Valence Acc±1↑ | Arousal Acc↑ | Arousal Acc±1↑ |
|---|---|---|---|---|
| Qwen2.5-VL-7B | 13.3 | 38.1 | 10.8 | 35.5 |
| PRISM I | 15.8 | 39.4 | 12.9 | 40.4 |
| PRISM II | 18.9 | 43.2 | 16.6 | 46.4 |
| Mano-P 1.0 | 20.2 | 46.5 | 18.7 | 47.3 |
📊 Expand Evaluation Data
Comparison of task execution success rate (SR) on the Online-Mind2Web benchmark. Avg. Tokens/img represents the average visual token retention rate per image; lower values indicate more aggressive pruning.
GSPruning is a novel token pruning method designed for Vision-Language Models to efficiently process high-resolution web interfaces by preserving global spatial structure through anchor points and identifying semantic outliers for critical UI elements. It achieves 2-3× throughput speedup with minimal performance loss, enabling more efficient autonomous web agents.
| Model | Method | Avg. Tokens/img ↓ | Training samples/s ↑ | SR ↑ |
|---|---|---|---|---|
| Qwen3VL-2B | Baseline (w/o FT) | 100% | 5.08 | 0.290 |
| Qwen3VL-2B | Baseline (FT) | 100% | 5.09 | 0.390 |
| Qwen3VL-2B | TextGuide | 12.55% | 13.54 | 0.310 |
| Qwen3VL-2B | FlashVLM [4] | 12.55% | 17.01 | 0.343 |
| Qwen3VL-2B | Compressor-VLA [11] | 13.33% | 16.92 | 0.293 |
| Qwen3VL-2B | HiPrune [16] | 25.09% | 16.67 | 0.333 |
| Qwen3VL-2B | PDrop [33] | 41.47% | 10.43 | 0.330 |
| Qwen3VL-2B | IVC | 25.09% | 7.89 | 0.303 |
| Qwen3VL-2B | Mano-P 1.0 | 25.09% | 20.04 | 0.370 |
| Qwen3VL-2B | Mano-P 1.0 | 12.57% | 22.62 | 0.336 |
| Qwen3VL-4B | Baseline (FT) | 100% | 3.24 | 0.425 |
| Qwen3VL-4B | PDrop | 41.47% | 5.58 | 0.365 |
| Qwen3VL-4B | IVC | 25.09% | 4.67 | 0.343 |
| Qwen3VL-4B | GSPruning | 25.09% | 16.72 | 0.400 |
Mano-Skill is a desktop GUI automation tool based on the Mano model, driving cross-platform graphical interface operations through natural language. We provide three different usage forms for the same core capability to adapt to different usage scenarios and user groups.
- Natural Language Driven: Users describe tasks in natural language, and the system automatically executes GUI operations
- Flexible Inference Modes:
  - Local Mode: models run locally, data stays on device, fast response
    - Run directly on Mac mini/MacBook (M4 chip or above, 32GB+ RAM)
    - Or use the Mano-P computing stick (via USB 4.0 connection)
  - Cloud Mode: without local model configuration, uses the cloud API service (`mano.mininglamp.com`)
  - The system automatically detects local model configuration and seamlessly switches inference modes
- Comprehensive Interaction Support: Click, type, hotkey, scroll, drag, mouse movement, screenshot, wait, app launch, URL navigation
- Cross-Platform Support: macOS (stable), Windows, Linux (Beta)
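The "automatically detects and switches" behaviour above can be sketched as follows. This is a hedged illustration, not the actual SDK logic: the model directory, the `.gguf` weight-file extension, and the function name are assumptions; only the cloud endpoint comes from the docs.

```python
from pathlib import Path

CLOUD_ENDPOINT = "https://mano.mininglamp.com"  # cloud fallback named in the docs

def select_inference_mode(model_dir):
    """Prefer local inference when model weights are present on disk (sketch)."""
    model_dir = Path(model_dir)
    # ".gguf" is a hypothetical weight format used only for illustration.
    if model_dir.is_dir() and any(model_dir.glob("*.gguf")):
        return {"mode": "local", "model_dir": str(model_dir)}
    return {"mode": "cloud", "endpoint": CLOUD_ENDPOINT}

# With no local weights installed, the cloud endpoint is selected:
print(select_inference_mode("/nonexistent/models"))
```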
Local Mode
- Capture current screen screenshot
- Run Mano-P model on local device (Mac mini/MacBook) or computing stick for inference
- Local model analyzes and returns next action instruction
- Client executes operation (click, type, etc.)
- Loop execution until task completion
Cloud Mode (Default)
- Capture current screen screenshot
- Send screenshot and task description to the cloud vision model (`mano.mininglamp.com`)
- Cloud model analyzes and returns next action instruction
- Local client executes operation (click, type, etc.)
- Loop execution until task completion
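Both modes follow the same observe-act loop. The sketch below shows that control flow with a scripted stand-in for the model; `infer`, `capture`, `execute`, and the action dictionary format are illustrative assumptions, not the real Mano-P API.

```python
def run_agent(task, infer, capture, execute, max_steps=50):
    """Loop: screenshot -> model -> action, until the model signals completion."""
    for _ in range(max_steps):
        screenshot = capture()            # 1. capture current screen
        action = infer(screenshot, task)  # 2. local or cloud inference
        if action["type"] == "done":      # model reports task completion
            return True
        execute(action)                   # 3. client performs the action
    return False                          # step budget exhausted

# Tiny fake backend to show the control flow:
script = iter([{"type": "click", "x": 10, "y": 20}, {"type": "done"}])
done = run_agent(
    task="open settings",
    infer=lambda img, t: next(script),
    capture=lambda: b"fake-screenshot-bytes",
    execute=lambda a: None,
)
print(done)  # -> True
```

The `max_steps` cap is a safety valve: a real client needs some bound (steps or wall clock) so a confused model cannot loop forever.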
Local Mode (Mac mini/MacBook or Computing Stick):
- ✅ Fully Local Processing: All data processing is completed locally, screenshots and task descriptions never leave the device
- ✅ Data Stays on Device: Does not access or transmit any data to external servers
- ✅ Maximum Privacy Protection: Suitable for handling sensitive information and high-security scenarios
Cloud Mode:
- ⚠️ Data Sent: screenshots and task descriptions are sent to `mano.mininglamp.com` for real-time visual analysis
- ✅ Data Not Sent: does not access or transmit local files, clipboard contents, or system credentials
- ⚠️ Privacy Note: avoid displaying sensitive documents, chat logs, or credential information on screen when running tasks
General Assurance:
- ✅ Open Source Auditable: Complete source code publicly available for review
If you want to use Mano-P directly to accomplish GUI automation tasks, here are three different usage forms. Choose the one that best fits your use case.
Use Case: Developers, advanced users who need to quickly execute GUI automation tasks in terminal
Installation:
```bash
# Install via Homebrew
brew tap HanningWang/tap
brew install mano-cua
```

The installation process will automatically:
- Create an isolated Python 3.13 virtual environment
- Install required dependencies (including Tkinter GUI library)
- Configure the executable command to system PATH
Usage:
```bash
# Run tasks
mano-cua run "Open WeChat and tell FTY the meeting is postponed"
mano-cua run "Search for AI news on Xiaohongshu and display the first post"

# Stop current task
mano-cua stop
```

Features:
- ✅ Command-line interface, quick invocation
- ✅ Virtual environment isolation, no system Python pollution
- ✅ Suitable for script integration and batch processing
- ✅ Can be embedded in shell scripts
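As a sketch of shell-script embedding, the wrapper below batches several tasks through the CLI. The `run_task` helper and its dry-run fallback are illustrative additions (not part of mano-cua), so the script degrades gracefully on machines without the tool installed.

```shell
#!/bin/sh
# Batch GUI tasks through the mano-cua CLI.
# run_task is an illustrative helper: it echoes a dry-run line when
# mano-cua is not on PATH instead of failing.

run_task() {
    if command -v mano-cua >/dev/null 2>&1; then
        mano-cua run "$1"
    else
        echo "[dry-run] mano-cua run \"$1\""
    fi
}

run_task "Open WeChat and tell FTY the meeting is postponed"
run_task "Search for AI news on Xiaohongshu and display the first post"
```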
Project Resources:
- Homebrew Tap: github.com/Mininglamp-AI/homebrew-tap
Use Case: Python developers who need to integrate GUI automation capabilities into Python projects
Planned Features:
```python
from mano_client import ManoClient

# Create client instance
client = ManoClient()

# Run task
client.run("Open WeChat and tell FTY the meeting is postponed")

# Stop task
client.stop()
```

Planned Capabilities:
- Python API, easy integration into existing Python projects
- Supports asynchronous calls and callback functions
- Programmable control of task flow
- Suitable for building automated workflows
Development Status: Python SDK is under development. Please use CLI tool or Skill form for now.
Use Case: AI agents like Claude Code, OpenClaw that need to autonomously invoke GUI automation capabilities to complete user tasks
Installation:
Option 1: Install via Claude Code
In Claude Code, skills exist as "commands". Installation steps:
- Download the skill zip package from ClawHub
- After extraction, copy files to Claude Code's commands directory
- Restart Claude Code or in a new session, the skill will be automatically available
Option 2: Install via ClawHub CLI (Recommended)
Use the ClawHub CLI tool for one-click installation and skill management:
```bash
# Install skill
clawhub install mano-cua

# Install specific version
clawhub install mano-cua --version 1.0.0

# Update skill to latest version
clawhub update mano-cua
```

After installation, start a new Claude Code or OpenClaw session to use it.
Prerequisites: ClawHub CLI tool must be installed first. See: OpenClaw Documentation - ClawHub
Usage:
When users make requests to AI agents that require GUI operations, the agent will automatically invoke this skill:
User: "Help me open WeChat, find FTY's chat window, and tell him the meeting is postponed to tomorrow"
Agent: [Automatically invokes mano-skill to complete GUI operation]
Features:
- ✅ Autonomously invoked by AI agents, no manual command execution needed
- ✅ Deeply integrated with agent reasoning capabilities
- ✅ Suitable for complex multi-step task automation
- ✅ ClawHub ecosystem with version management and security scanning
Project Resources:
- Source Code: github.com/Mininglamp-AI/mano-skill
- ClawHub Home: clawhub.ai/HanningWang/mano-cua
- Version: v1.0.0
- License: MIT
- Screen Recording Permission
- Accessibility Permission (keyboard/mouse control)
- Grant permissions in System Preferences → Privacy & Security
- Sensitive or potentially dangerous operations require user confirmation before execution
- Users can stop tasks at any time
- Only one task can run on each device simultaneously
- Only supports primary display (multi-display environment)
When a task is running, a small status panel appears in the top-right corner of the screen to:
- Display real-time task status and progress
- Provide task management functions (pause/stop)
- Remind users that an automation task is running to avoid accidental interference
Beta Version Notice: Mano-Skill is currently in Beta testing phase.
- macOS: ✅ Preferred and most thoroughly tested platform, stable and ready for use
- Windows and Linux:
⚠️ Platform adaptations not yet fully completed, minor issues may occur
We are continuously improving cross-platform compatibility. Feedback is welcome.
If you want to integrate Mano-P's model capabilities into your own applications, this section provides performance metrics and usage guidelines. Models will be released soon.
Our Mano-P model, after pruning with our proprietary GS-Pruning algorithm, achieves real-time performance on 4K-context tasks on the Apple M4 Pro chip. The relevant models and methods will be released soon. The table below presents the actual benchmark results on the M4 Pro.
| Model | Chip | Bandwidth | Framework | Context Length | Pruning Rate | Prefill Speed (tokens/s) | Decode Speed (tokens/s) | Peak Memory (GB) | Prefill Time (s) | Decode Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|
| Mano-P 1.0-4B (w4a16) | Apple M4 Pro 64GB RAM | 273 GB/s | Mano-SDK | 4112 | 0.5 | 476.952 | 76.75 | 4.356 | 8.621 | 0.265 |
| Mano-P 1.0-4B (w4a16) | Apple M4 Pro 64GB RAM | 273 GB/s | Mano-SDK | 8208 | 0.5 | 331.0 | 70.946 | 5.1471 | 24.792 | 0.253 |
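The prefill-time column follows directly from context length divided by prefill speed, which is a quick way to sanity-check the table:

```python
# Prefill time ≈ context length / prefill speed (values from the table above).
prefill_time_4k = 4112 / 476.952   # ≈ 8.62 s, matching the reported 8.621 s
prefill_time_8k = 8208 / 331.0     # ≈ 24.80 s, matching the reported 24.792 s
assert abs(prefill_time_4k - 8.621) < 0.01
assert abs(prefill_time_8k - 24.792) < 0.01
```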
Model will be released soon
If you are a researcher or wish to train customized GUI Agent models based on your own data, we plan to open-source the complete Mano-Action training methodology and related tools.
Release Soon
Mano-Action is a bidirectional self-reinforcement training framework specifically designed for GUI Grounding. Unlike traditional unidirectional prediction methods, Mano-Action achieves more robust interface understanding through Text↔Action cycle consistency learning, enabling the model to master both "locating elements from descriptions" and "describing given elements" simultaneously.
- Bidirectional Cycle Learning: Mutual reinforcement between Text → Action and Action → Text
- Three-Stage Progressive Training: Supervised Learning → Offline RL → Online RL
- Closed-Loop Data Generation: Automatically generate high-quality training data for continuous model improvement
- Edge Optimization Adaptation: Includes quantization, pruning, and other edge deployment optimization techniques
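The Text↔Action cycle can be illustrated with a toy reward computation: a grounding prediction scores highly when describing the predicted action recovers the original instruction. Everything here is a stand-in (the two lambda "models", the token-overlap similarity, the two-button UI); the real framework uses learned models and rewards.

```python
def cycle_consistency_reward(text, text_to_action, action_to_text):
    """Text -> Action -> Text round trip, scored by token overlap (Jaccard)."""
    action = text_to_action(text)       # Text -> Action (grounding)
    recovered = action_to_text(action)  # Action -> Text (captioning)
    a, b = set(text.lower().split()), set(recovered.lower().split())
    return len(a & b) / max(1, len(a | b)), action

# Fake models over a two-button UI (illustrative only):
ui = {"submit": (120, 300), "cancel": (220, 300)}
t2a = lambda t: ("click", ui["submit" if "submit" in t else "cancel"])
a2t = lambda a: "click the %s button" % ("submit" if a[1] == ui["submit"] else "cancel")

reward, action = cycle_consistency_reward("click the submit button", t2a, a2t)
assert reward == 1.0 and action == ("click", (120, 300))
```

When the round trip drifts (the recovered description no longer matches the instruction), the reward drops, which is the signal the bidirectional training exploits.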
- 🎓 Academic Research: Explore new approaches to GUI understanding and multimodal interaction
- 🏢 Enterprise Customization: Train specialized models based on internal enterprise systems
- 🌐 Domain Adaptation: Fine-tune models for specific domains (healthcare, finance, etc.)
- 🔬 Algorithm Innovation: Develop new training techniques building on Mano-Action
| Feature | Mano-P | OpenClaw | Manus | Traditional RPA |
|---|---|---|---|---|
| Model Source | ✅ Built-in edge model | | | ❌ No model (rule-based engine) |
| Data Security | ✅ Local execution | ✅ Can be local | | |
| Control Method | ✅ Pure visual | ❌ HTML parsing+CLI | | ❌ System API |
| Use Scenarios | ✅ All-platform GUI | ✅ Cross-platform apps | | |
| Long Task Plan | ✅ Autonomous planning | ✅ Autonomous planning | ✅ Visual flow orchestration | ❌ Needs preset workflows |
| Response Speed | ✅ Instant response | ✅ Local/cloud execution | ✅ Instant response | |
| Deployment Cost | ✅ Low-cost entry | ✅ Open source & free | ✅ Low cost | |
| Robustness | ✅ UI change adaptive | ✅ LLM adaptive | | ❌ UI change needs reconfig |
- Edge Large Model + Flexible Deployment
  - 4B model runs directly on Mac (M4 chip + 32GB RAM)
  - Large-parameter models (72B) supported via computing stick
  - No API key configuration needed, ready out-of-the-box
  - Significant advantage over OpenClaw (requires user model configuration) and Manus (cloud calls)
- Universal Visual Understanding
  - Pure visual GUI interaction, not limited to browsers and web apps
  - Broader support than OpenClaw (CDP protocol mainly for browsers) and Manus (web apps only)
  - Supports desktop software, 3D applications, professional tools, and non-standard GUIs
- Offline Long-Task Autonomous Planning
  - Fully offline reasoning for complex business processes
  - Autonomous decision-making and error correction without an internet connection
  - Unique advantage over Manus (cloud latency) and traditional RPA (needs preset workflows)
- Integrated Hardware Deployment
  - Model + computing stick integrated solution, plug-and-play
  - Lower technical barrier than OpenClaw (open-source and free but requires self-deployment)
  - Cross-platform compatible, rapid deployment and launch
Mano-P is based on the following research work:
1. Mano Series Model Foundation Paper
```bibtex
@article{mano-2025,
  title={Mano Technical Report},
  author={Tianyu Fu and Anyang Su and Chenxu Zhao and Hanning Wang and Minghui Wu and Zhe Yu and Fei Hu and Mingjia Shi and Wei Dong and Jiayao Wang and Yuyang Chen and Ruiyang Yu and Siran Peng and Menglin Li and Nan Huang and Haitian Wei and Jiawei Yu and Yi Xin and Xilin Zhao and Kai Gu and Ping Jiang and Sifan Zhou and Shuo Wang},
  journal={arXiv preprint arXiv:2509.17336},
  year={2025},
  url={https://arxiv.org/abs/2509.17336}
}
```

2. WebRetriever Benchmark
```bibtex
@article{webretriever-2026,
  title={WebRetriever: A Large-Scale Comprehensive Benchmark for Efficient Web Agent Evaluation},
  author={Wei Dong and Tianyu Fu and Zhe Yu and Hanning Wang and Anyang Su and Zhizhou Fang and Yuyang Chen and Shuo Wang and Minghui Wu and Ping Jiang and Zhen Lei and Chenxu Zhao},
  year={2026},
  note={To be published},
  url={https://github.com/hhhhhhalf/WebRetriever}
}
```

We welcome collaboration with academia:
- 🔬 Dataset Contribution: Provide new GUI task datasets
- 🤝 Joint Research: Collaborate on edge deployment, quantization optimization, GUI understanding, etc.
- 📚 Benchmarking: Test Mano-P on new evaluation sets
For academic collaboration inquiries, please contact: model@mininglamp.com
🤖 What is Mano-P?
Mano-P is an open-source GUI-VLA (Vision-Language-Action) agent designed to run locally on Apple Silicon edge devices. It uses pure visual understanding to automate desktop GUI operations across platforms.
⚖️ How does Mano-P compare to Claude Computer Use?
Performance Comparison:
- OSWorld (all models): Claude Sonnet 4.6 72.1% vs Mano-P 1.0-72B 58.2%
- WebRetriever Protocol I: Mano-P 41.7 NavEval vs Claude 4.5 Computer Use 31.3
Key Difference:
- ✅ Mano-P runs entirely on-device; no data leaves the machine
- ⚠️ Claude Computer Use requires cloud API calls
Use Case: Mano-P is particularly suitable for high-security scenarios.
🔌 Can Mano-P run without internet?
Yes! In local mode, all model inference runs on the Apple M4 device. ✅ No screenshots or task descriptions are sent to external servers.
💻 What hardware do I need?
Minimum Requirements:
- Mac mini or MacBook
- Apple M4 chip
- 32GB RAM
Alternative:
- Any Mac + Mano-P computing stick (connected via USB 4.0+)
📌 We plan to support more devices in the future.
📦 How do I install Mano-P?
CLI Tool:

```bash
brew tap HanningWang/tap && brew install mano-cua
```

OpenClaw/Claude Code Skill: See ClawHub - Mano-CUA
🔒 Is my data safe?
Local Mode: ✅ All processing happens on-device
Cloud Mode:
- ⚠️ Only screenshots and task descriptions are sent to `mano.mininglamp.com`
- ✅ No local files, clipboard contents, or credentials are accessed
Transparency: Full client is open-source for audit
We welcome community contributions! If you want to contribute to the project:
- Fork this repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
- 🐛 Bug fixes and issue reporting
- 📝 Documentation improvements and translations
- 💡 New feature suggestions and implementations
- 🧪 Test cases and benchmarking
- 🎨 Application scenarios and demo contributions
This project is licensed under the Apache License 2.0.
License Highlights:
- ✅ Commercial use
- ✅ Modification and distribution
- ✅ Patent grant
- ⚠️ Must retain copyright notice
- ⚠️ Must state changes
- 📧 Email: model@mininglamp.com
- 🏠 Website: https://github.com/Mininglamp-AI/Mano-P
- 💬 Community: (To be added)
- 🐛 GitHub Issues: https://github.com/Mininglamp-AI/Mano-P/issues
Thanks to all developers and researchers who contributed to this project.
Special Thanks:
- Mano project team for providing the technical foundation
- DeepMiner platform for deep integration support
- Edge computing hardware partners
- Open source community contributors
Built with ❤️ by the Mano-P Team






