Code accompanying the manuscript:
Adoption of MMPose, a general purpose pose estimation library, for animal tracking
Jessica D. Choi and Vivek Kumar
The Jackson Laboratory
Journal and DOI to be confirmed upon acceptance, 2026
This repository contains the configuration files, data conversion scripts, and evaluation code used to benchmark multiple pose estimation architectures on mouse behavior datasets using the MMPose framework.
We systematically compare several pose estimation architectures — both top-down and bottom-up — trained and evaluated on laboratory mouse datasets. The study examines accuracy (PCK, RMSE), inference speed, and cross-dataset generalization. Baseline comparisons include SLEAP, DeepLabCut, and SuperAnimals.
Models evaluated:
| Model | Approach | Detector | Keypoint model |
|---|---|---|---|
| TD Def-DETR HRNet | Top-down | DeformableDETR | HRNet |
| TD Def-DETR DeepPose | Top-down | DeformableDETR | DeepPose |
| TD RetinaNet HRNet | Top-down | RetinaNet | HRNet |
| TD RetinaNet DeepPose | Top-down | RetinaNet | DeepPose |
| TD YOLO HRNet | Top-down | YOLOv3 | HRNet |
| TD YOLO DeepPose | Top-down | YOLOv3 | DeepPose |
| BU DEKR | Bottom-up | — | DEKR |
| DLC | Baseline | — | DeepLabCut |
| SLEAP | Baseline | — | SLEAP |
Note: The plain DETR configs in this repository are archival experiment records. They were explored (including the expected single-class detector settings where appropriate) but were excluded from the manuscript because performance and convergence remained poor on these data.
Datasets:
| Dataset | Description | # Keypoints |
|---|---|---|
| Kumar Lab Maze | Overhead (top-view) video of mice in a maze (The Jackson Laboratory) | 2 (nose, tail base) |
| Kumar Lab Maze Corners | Maze corner landmarks used for prototyping and spatial normalization in the inference pipeline; not included in the manuscript | 4 (top-left, top-right, bottom-right, bottom-left) |
| OFA | Open Field Arena recordings (The Jackson Laboratory) | 12 |
| TopviewMouse5K | Large-scale top-view mouse dataset (Ye et al. 2024) | 27 (SuperAnimals format) |
| TopviewMouse-OFA | Cross-dataset: TopviewMouse model retrained in MMPose, evaluated on OFA | — |
| TopviewMouse-Maze | Cross-dataset: TopviewMouse model retrained in MMPose, evaluated on Maze | — |
Data availability:
- Kumar Lab Maze: Available on Zenodo — [INSERT ZENODO DOI].
- OFA: Available on Zenodo at https://zenodo.org/records/6380163.
- TopviewMouse5K: Not our dataset — see Ye et al. 2024 for access.
```
mmpose-experiments/
│
├── configs-maze-mouse/           # MMPose configs: mouse keypoints in maze
├── configs-maze-corners/         # MMPose configs: maze corner detection (used in inference pipeline; not reported in manuscript)
├── configs-ofa/                  # MMPose configs: Open Field Arena (OFA) dataset
├── configs-topviewmouse/         # MMPose configs: TopviewMouse dataset
├── configs-topview-maze/         # MMPose configs: topview model, maze split
├── configs-topview-ofa/          # MMPose configs: topview model, OFA split
│
├── OF-data/                      # Convert OFA HDF5 data → COCO format
├── demo/                         # Annotation conversion, dataset splits, inference demos
├── convert_maze/                 # Merge maze annotations with SuperAnimals 27-kpt format
│
├── ground-truth/                 # Core evaluation pipeline
│   ├── models_utils.py           # Shared utilities: metrics, filtering, plotting
│   ├── *_to_gt_format.py         # Convert model outputs → standard comparison format
│   └── compare-*-models.py       # Compute PCK/RMSE and generate figures
│
├── vm/                           # Singularity container definitions + SLURM training scripts
│
├── bottomup_demo.py              # Demo: bottom-up inference on a single image/video
├── topdown_demo_with_mmdet.py    # Demo: top-down inference with MMDet detector
└── pck_schematic.py              # Generate PCK metric schematic figure
```
Models were trained and evaluated inside a Singularity container. The definition file is at vm/mmpose.def.
Key versions:
- PyTorch 2.3.1 + CUDA 12.1
- MMCV 2.2.0
- MMPose (cloned from open-mmlab/mmpose, main branch, June 2024)
- MMDet (patched for MMCV 2.2.0 compatibility; see note in vm/mmpose.def)
Build the container:

```
singularity build mmpose.sif vm/mmpose.def
```

The ground-truth/ evaluation scripts run outside the container and require:

```
pip install -r requirements-analysis.txt
```

See requirements-analysis.txt for the full list (pandas, numpy, plotnine, scipy, pycocotools).
Baseline containers:
- vm/sleap.def — SLEAP pose estimation baseline
- vm/deeplabcut.def — DeepLabCut pose estimation baseline
The full pipeline runs in five stages:
1. Convert raw annotations (SLEAP .slp files, DeepLabCut pickles, HDF5) to COCO-format JSON.

```
OF-data/ofa.py                                   # OFA HDF5 → COCO
demo/sleap_to_coco.py                            # SLEAP annotations → COCO (maze/corners)
demo/split_annotations.py                        # Split COCO dataset into train/val by experiment
convert_maze/combine_maze_with_superanimals.py   # Merge with SuperAnimals 27-kpt format
```
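For reference, the converters above target the standard COCO keypoint JSON layout. A minimal sketch of that structure, using the 2-keypoint maze schema (file names and coordinate values here are illustrative, not taken from the actual datasets):

```python
import json

# Minimal COCO-keypoints skeleton. Keypoints are flat [x, y, v] triplets,
# where v = 2 means labeled and visible; bbox is [x, y, width, height].
coco = {
    "images": [
        {"id": 1, "file_name": "frame_000001.png", "width": 800, "height": 800},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "keypoints": [412.0, 305.5, 2,   # nose
                          398.0, 461.0, 2],  # tail base
            "num_keypoints": 2,
            "bbox": [350.0, 280.0, 120.0, 210.0],
            "area": 120.0 * 210.0,
            "iscrowd": 0,
        }
    ],
    "categories": [
        {"id": 1, "name": "mouse",
         "keypoints": ["nose", "tail_base"], "skeleton": [[1, 2]]},
    ],
}
print(json.dumps(coco["categories"][0]["keypoints"]))
```

The train/val splitting step then operates on this JSON, grouping images by experiment before dividing them.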
2. Train each architecture using the corresponding config file and the SLURM training scripts in vm/.

```
# Example: submit training job on Sumner2 cluster
sbatch vm/training-mmpose.sh
```

Config files follow the naming pattern configs-{dataset}/{architecture}-config.py.
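The naming pattern makes it easy to enumerate config paths programmatically, e.g. when scripting a sweep. A small sketch (the architecture slugs below are made-up placeholders, not the repository's actual file names):

```python
from pathlib import Path

# Enumerate expected config paths from the repository's naming pattern:
# configs-{dataset}/{architecture}-config.py
datasets = ["maze-mouse", "ofa", "topviewmouse"]
architectures = ["td-defdetr-hrnet", "bu-dekr"]  # hypothetical slugs

config_paths = [
    Path(f"configs-{ds}") / f"{arch}-config.py"
    for ds in datasets
    for arch in architectures
]
for p in config_paths:
    print(p.as_posix())
```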
3. Run inference on the test split using the trained model checkpoints.

```
bash vm/extract-bottom-up.sh       # Bottom-up (DEKR) predictions
bash demo/batch_infer_topview.sh   # Top-down batch inference
```

4. Convert all model predictions and ground-truth annotations to a standard per-model CSV format for evaluation.
```
cd ground-truth/
python gt_to_gt_format.py              # SLEAP ground truth → CSV
python mmpose_to_gt_format.py          # MMPose predictions → CSV
python sleap_to_gt_format.py           # SLEAP predictions → CSV
python dlc_to_gt_format.py             # DeepLabCut → CSV
python superanimals_posev6_to_maze.py  # SuperAnimals → maze CSV
python superanimals_posev6_to_ofa.py   # SuperAnimals → OFA CSV
python topview-on-maze.py              # Topview model predictions on maze frames → CSV
```
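The exact column schema of the standardized CSV is defined by these scripts. Purely as an illustration, a long-format layout with one row per (frame, keypoint) might look like the following; the column names here are hypothetical, not the repository's actual schema:

```python
import csv
import io

# Hypothetical long-format layout for a standardized per-model CSV:
# one row per (frame, keypoint). Column names are illustrative only.
rows = [
    {"frame": "frame_000001.png", "keypoint": "nose",      "x": 412.0, "y": 305.5, "confidence": 0.98},
    {"frame": "frame_000001.png", "keypoint": "tail_base", "x": 398.0, "y": 461.0, "confidence": 0.91},
]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["frame", "keypoint", "x", "y", "confidence"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

A shared flat layout like this lets one comparison script join predictions from any model against the ground-truth CSV on (frame, keypoint).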
5. Evaluate. The evaluation pipeline runs in two stages: Stage 1 scripts read the per-model CSVs from Step 4, compute metrics, and write both plots and summary metric CSVs; Stage 2 scripts read those metric CSVs to produce the cross-dataset comparison figures.
```
cd ground-truth/

# Stage 1
python compare-models.py           # → Figure 1 B/C/D (maze model comparison, PCK curves, speed vs accuracy)
python compare-ofa-models.py       # → OFA per-condition plots; metric CSVs used by Stage 2
python compare-topview-models.py   # → TopviewMouse5K plots; metric CSVs
python compare-topview-maze.py     # → topview-on-maze plots; metric CSVs used by Stage 2

# Stage 2
python compare-datasets-maze.py    # → Figure 2 (Maze / Topview / Maze+Topview trained)
python compare-datasets-ofa.py     # → Figure 3 (OFA / Topview / OFA+Topview trained)

cd ..
python pck_schematic.py            # → Figure 1A (PCK evaluation visualization panel)
```

Metrics:
- PCK (Percentage of Correct Keypoints): Fraction of predicted keypoints within a threshold distance of the ground truth. Reported at both pixel thresholds (10–100 px) and body-length-normalized thresholds (0.1–1.0×).
- RMSE: Root mean square pixel error, averaged across keypoints and frames.
See ground-truth/models_utils.py for the full metric implementations.
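As a concrete reference, the two metrics can be computed along these lines. This is a minimal sketch only; the actual implementations, including keypoint filtering, live in ground-truth/models_utils.py:

```python
import math

def pck(pred, gt, threshold):
    """Fraction of keypoints whose prediction lies within `threshold`
    (pixels, or body lengths if coordinates are pre-normalized) of ground truth."""
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(d <= threshold for d in dists) / len(dists)

def rmse(pred, gt):
    """Root mean square pixel error across keypoints (and, in practice, frames)."""
    sq = [(p[0] - g[0]) ** 2 + (p[1] - g[1]) ** 2 for p, g in zip(pred, gt)]
    return math.sqrt(sum(sq) / len(sq))

# Toy example: two keypoints (nose, tail base) on one frame.
pred = [(100.0, 100.0), (200.0, 206.0)]
gt   = [(103.0, 104.0), (200.0, 200.0)]
print(pck(pred, gt, threshold=5.0))  # nose is 5 px off, tail base 6 px
print(rmse(pred, gt))
```

Sweeping `threshold` over a range of values produces the PCK curves reported in the figures.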
If you use this code, please cite:
```
@article{choi2026adoption,
  title   = {Adoption of MMPose, a general purpose pose estimation library, for animal tracking},
  author  = {Choi, Jessica D. and Kumar, Vivek},
  journal = {TBD},
  year    = {2026},
  doi     = {TBD}
}
```

License: This repository is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license (CC BY-NC-SA 3.0). See LICENSE for details.
Contact:
- Jessica D. Choi · jaycee.choi@jax.org
- Vivek Kumar · vivek.kumar@jax.org