A surgical video world foundation model suite based on NVIDIA Cosmos and SurgWorld, part of the NVIDIA MedTech Open Models.
Cosmos-H-Surgical delivers high-quality video prediction and transfer for surgical scenes, including future-state simulation and control-conditioned generation across modalities. It comprises two sub-projects — Predict for image-to-video generation and Transfer for multi-modal control-based generation — both adapted from NVIDIA Cosmos 2.5 for surgical video data. For action-conditioned surgical simulation with robot kinematics, see the companion repo Cosmos-H-Surgical-Simulator.
This project was conducted by NVIDIA in collaboration with the Chinese University of Hong Kong, the National University of Singapore, and Shanghai Jiao Tong University.
- [March 2026] — Released SurgΣ: a large-scale multimodal surgical dataset and foundation model suite for surgical intelligence.
- [March 2026] — Released BSA: generalized recognition of basic surgical actions enabling skill assessment and VLM-based surgical planning.
- [March 2026] — Released Cosmos-H-Surgical-Predict and Cosmos-H-Surgical-Transfer as part of the NVIDIA MedTech Open Models.
| Model | Base Model | Params | Capability | Input | HuggingFace | License |
|---|---|---|---|---|---|---|
| Cosmos-H-Surgical-Predict | Cosmos-Predict2.5-2B | 2B | Future-state video prediction (Image2World) | text + image | Weights | NVIDIA-OneWay-Noncommercial-License |
| Cosmos-H-Surgical-Transfer | Cosmos-Transfer2.5-2B | 2B | Control-conditioned generation (depth, edge, seg, blur) | text + video + control maps | Weights | NVIDIA-OneWay-Noncommercial-License |
Cosmos-H-Surgical/
├── predict/ # Cosmos-H-Surgical-Predict
│ ├── cosmos_predict2/ # Predict package (experiments, datasets)
│ ├── packages/ # Workspace deps (cosmos-oss, cosmos-cuda, cosmos-gradio)
│ ├── docs/ # Setup, inference, post-training
│ ├── examples/ # Inference scripts
│ ├── assets/ # Example inputs (JSON configs, images)
│ ├── Dockerfile
│ └── pyproject.toml
├── transfer/ # Cosmos-H-Surgical-Transfer
│ ├── cosmos_transfer2/ # Transfer package
│ ├── packages/ # Workspace deps
│ ├── docs/ # Setup, inference, post-training
│ ├── examples/ # Inference scripts
│ ├── assets/ # Example inputs (JSON specs, control maps)
│ ├── Dockerfile
│ └── pyproject.toml
├── LICENSE # Apache 2.0 (source code)
└── LICENSE.weights # NVIDIA-OneWay-Noncommercial-License (weights)
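After cloning, one quick way to confirm that a checkout matches the layout above is to test for the listed paths. The sketch below is illustrative and not part of the repository: `EXPECTED_PATHS` and `missing_paths` are hypothetical names, and the path list is simply copied from the tree above.

```python
from pathlib import Path

# Paths copied from the directory tree above, relative to the repository root.
EXPECTED_PATHS = [
    "predict/cosmos_predict2",
    "predict/packages",
    "predict/docs",
    "predict/examples",
    "predict/assets",
    "predict/Dockerfile",
    "predict/pyproject.toml",
    "transfer/cosmos_transfer2",
    "transfer/packages",
    "transfer/docs",
    "transfer/examples",
    "transfer/assets",
    "transfer/Dockerfile",
    "transfer/pyproject.toml",
    "LICENSE",
    "LICENSE.weights",
]


def missing_paths(root: str = ".") -> list[str]:
    """Return the expected paths that do not exist under `root`."""
    base = Path(root)
    return [p for p in EXPECTED_PATHS if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing from checkout:")
        for p in missing:
            print(f"  {p}")
    else:
        print("All expected paths are present.")
```

Run it from the repository root. An empty result only means the paths exist; it does not verify that Git LFS objects were actually downloaded.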
System Requirements: NVIDIA Ampere+ GPU (A100, H100, B200), Linux x86-64, CUDA 12.8+, Python 3.10+
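Some of these requirements can be checked from Python before installing anything. The following is a minimal best-effort sketch, not a tool shipped with the project; `preflight` is a hypothetical helper. It only looks for `nvidia-smi` on the PATH and does not verify the GPU architecture or the CUDA version.

```python
import platform
import shutil
import sys


def preflight() -> dict[str, bool]:
    """Check the host against the stated requirements (best effort)."""
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        "linux": platform.system() == "Linux",
        "x86-64": platform.machine() in ("x86_64", "AMD64"),
        # nvidia-smi on PATH suggests an NVIDIA driver is installed. It does
        # not prove the GPU is Ampere or newer, or that CUDA 12.8+ is present.
        "nvidia-smi found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'OK  ' if ok else 'FAIL'}  {name}")
```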
git clone git@github.com:NVIDIA-Medtech/Cosmos-H-Surgical.git
cd Cosmos-H-Surgical
git lfs pull
Each sub-project has its own environment. For example, to set up Predict:
cd predict
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install && uv sync --extra=cu128
source .venv/bin/activate
See predict/docs/setup.md or transfer/docs/setup.md for full instructions, including Docker.
Predict (Image2World):
cd predict
python examples/inference.py -i assets/base/coagulation.json -o outputs/base_video2world --inference-type=video2world
Transfer (control-conditioned):
cd transfer
python examples/inference.py -i assets/coagulation_example/depth/coagulation_depth_spec.json -o outputs/depth

| Topic | Predict | Transfer |
|---|---|---|
| Setup | predict/docs/setup.md | transfer/docs/setup.md |
| Inference | predict/docs/inference.md | transfer/docs/inference.md |
| Post-training | predict/docs/post-training.md | transfer/docs/post-training.md |
| Troubleshooting | predict/docs/troubleshooting.md | transfer/docs/troubleshooting.md |
| GPU Hardware | Generation Time | End-to-End Time |
|---|---|---|
| NVIDIA B200 | 92.25 sec | 186.92 sec |
| NVIDIA H100 NVL | 445.52 sec | 895.33 sec |
| NVIDIA H100 PCIe | 264.13 sec | 533.58 sec |
| NVIDIA H20 | 683.65 sec | 1370.39 sec |
End-to-end time was measured on a 121-frame input video, generated as two 93-frame chunks. Guardrails were disabled.
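The relationship between the two timing columns can be checked with a little arithmetic. The sketch below uses only the figures from the table and computes, per GPU, the ratio of end-to-end time to generation time and the time left over after subtracting two generations. Reading "Generation Time" as a per-chunk figure is an assumption; the source does not state this explicitly.

```python
# (generation_sec, end_to_end_sec) copied from the table above.
TIMINGS = {
    "NVIDIA B200": (92.25, 186.92),
    "NVIDIA H100 NVL": (445.52, 895.33),
    "NVIDIA H100 PCIe": (264.13, 533.58),
    "NVIDIA H20": (683.65, 1370.39),
}

CHUNKS = 2  # the note above states two chunk generations per run


def summarize(timings: dict[str, tuple[float, float]]) -> dict[str, dict[str, float]]:
    """Return the end-to-end/generation ratio and the residual time for each GPU."""
    rows = {}
    for gpu, (gen, e2e) in timings.items():
        rows[gpu] = {
            "ratio": e2e / gen,
            # Time not explained by CHUNKS generations (assumed per-chunk timing).
            "residual_sec": e2e - CHUNKS * gen,
        }
    return rows


if __name__ == "__main__":
    for gpu, row in summarize(TIMINGS).items():
        print(f"{gpu:<18} e2e/gen = {row['ratio']:.2f}  residual = {row['residual_sec']:.2f} s")
```

On every GPU in the table the ratio comes out slightly above 2, which is consistent with the per-chunk reading, with a few seconds of remaining overhead per run.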
| Component | License |
|---|---|
| Source code | Apache 2.0 |
| Cosmos-H-Surgical-Predict weights | NVIDIA-OneWay-Noncommercial-License |
| Cosmos-H-Surgical-Transfer weights | NVIDIA-OneWay-Noncommercial-License |
This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
@misc{he2026cosmoshsurgicallearningsurgicalrobot,
title={Cosmos-H-Surgical: Learning Surgical Robot Policies from Videos via World Modeling},
author={Yufan He and Pengfei Guo and Mengya Xu and Zhaoshuo Li and Andriy Myronenko and Dillan Imans and Bingjie Liu and Dongren Yang and Mingxue Gu and Yongnan Ji and Yueming Jin and Ren Zhao and Baiyong Shen and Daguang Xu},
year={2026},
eprint={2512.23162},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2512.23162},
}
@misc{zeng2026surgsigma,
title={Surg$\Sigma$: A Spectrum of Large-Scale Multimodal Data and Foundation Models for Surgical Intelligence},
author={Zhitao Zeng and Mengya Xu and Jian Jiang and Pengfei Guo and Yunqiu Xu and Zhu Zhuo and Chang Han Low and Yufan He and Dong Yang and Chenxi Lin and Yiming Gu and Jiaxin Guo and Yutong Ban and Daguang Xu and Qi Dou and Yueming Jin},
year={2026},
eprint={2603.16822},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2603.16822},
}
@misc{xu2026generalizedrecognitionbasicsurgicalactions,
title={Generalized Recognition of Basic Surgical Actions Enables Skill Assessment and Vision-Language-Model-based Surgical Planning},
author={Mengya Xu and Daiyun Shen and Jie Zhang and Hon Chi Yip and Yujia Gao and Cheng Chen and Dillan Imans and Yonghao Long and Yiru Ye and Yixiao Liu and Rongyun Mai and Kai Chen and Hongliang Ren and Yutong Ban and Guangsuo Wang and Francis Wong and Chi-Fai Ng and Kee Yuan Ngiam and Russell H. Taylor and Daguang Xu and Yueming Jin and Qi Dou},
year={2026},
eprint={2603.12787},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.12787},
}
- Cosmos-H-Surgical Paper (arXiv)
- Basic Surgical Actions Dataset Paper (arXiv)
- HuggingFace Collection
- NVIDIA Cosmos Platform
- Cosmos-H-Surgical-Simulator — Sister repo (action-conditioned surgical simulation)
- NVIDIA MedTech Open Models
