Code accompanying the manuscript:
Adoption of MMPose, a general purpose pose estimation library, for animal tracking
Jessica D. Choi and Vivek Kumar
The Jackson Laboratory
Journal and DOI to be confirmed upon acceptance, 2026
This repository contains the configuration files, data conversion scripts, and evaluation code used to benchmark multiple pose estimation architectures on mouse behavior datasets using the MMPose framework.
We systematically compare several pose estimation architectures — both top-down and bottom-up — trained and evaluated on laboratory mouse datasets. The study examines accuracy (PCK, RMSE), inference speed, and cross-dataset generalization. Baseline comparisons include SLEAP, DeepLabCut, and SuperAnimals.
Models evaluated:
| Model | Approach | Detector | Keypoint model |
|---|---|---|---|
| TD Def-DETR HRNet | Top-down | DeformableDETR | HRNet |
| TD Def-DETR DeepPose | Top-down | DeformableDETR | DeepPose |
| TD RetinaNet HRNet | Top-down | RetinaNet | HRNet |
| TD RetinaNet DeepPose | Top-down | RetinaNet | DeepPose |
| TD YOLO HRNet | Top-down | YOLOv3 | HRNet |
| TD YOLO DeepPose | Top-down | YOLOv3 | DeepPose |
| BU DEKR | Bottom-up | — | DEKR |
| DLC | Baseline | — | DeepLabCut |
| SLEAP | Baseline | — | SLEAP |
Note: The plain DETR configs in this repository are archival experiment records. They were explored (including the expected single-class detector settings where appropriate) but were excluded from the manuscript because performance and convergence remained poor on these data.
Datasets:
| Dataset | Description | # Keypoints |
|---|---|---|
| Kumar Lab Maze | Overhead (top-view) video of mice in a maze (The Jackson Laboratory) | 2 (nose, tail base) |
| Kumar Lab Maze Corners | Maze corner landmarks used for prototyping and spatial normalization in the inference pipeline; not included in the manuscript | 4 (top-left, top-right, bottom-right, bottom-left) |
| OFA | Open Field Arena recordings (The Jackson Laboratory) | 12 |
| TopviewMouse5K | Large-scale top-view mouse dataset (Ye et al. 2024) | 27 (SuperAnimals format) |
| TopviewMouse-OFA | Cross-dataset: TopviewMouse model retrained in MMPose, evaluated on OFA | — |
| TopviewMouse-Maze | Cross-dataset: TopviewMouse model retrained in MMPose, evaluated on Maze | — |
Data availability:
- Kumar Lab Maze: Available on Zenodo — [INSERT ZENODO DOI].
- OFA: Available on Zenodo at https://zenodo.org/records/6380163.
- TopviewMouse5K: Not our dataset — see Ye et al. 2024 for access.
```
mmpose-experiments/
│
├── configs-maze-mouse/           # MMPose configs: mouse keypoints in maze
├── configs-maze-corners/         # MMPose configs: maze corner detection (used in inference pipeline; not reported in manuscript)
├── configs-ofa/                  # MMPose configs: Open Field Arena (OFA) dataset
├── configs-topviewmouse/         # MMPose configs: TopviewMouse dataset
├── configs-topview-maze/         # MMPose configs: topview model, maze split
├── configs-topview-ofa/          # MMPose configs: topview model, OFA split
│
├── OF-data/                      # Convert OFA HDF5 data → COCO format
├── demo/                         # Annotation conversion, dataset splits, inference demos
├── convert_maze/                 # Merge maze annotations with SuperAnimals 27-kpt format
│
├── ground-truth/                 # Core evaluation pipeline
│   ├── models_utils.py           # Shared utilities: metrics, filtering, plotting
│   ├── *_to_gt_format.py         # Convert model outputs → standard comparison format
│   └── compare-*-models.py       # Compute PCK/RMSE and generate figures
│
├── vm/                           # Singularity container definitions + SLURM training scripts
│
├── bottomup_demo.py              # Demo: bottom-up inference on a single image/video
├── topdown_demo_with_mmdet.py    # Demo: top-down inference with MMDet detector
└── pck_schematic.py              # Generate PCK metric schematic figure
```
Models were trained and evaluated inside a Singularity container. The definition file is at vm/mmpose.def.
Key versions:
- PyTorch 2.3.1 + CUDA 12.1
- MMCV 2.2.0
- MMPose (cloned from open-mmlab/mmpose, main branch, June 2024)
- MMDet (patched for MMCV 2.2.0 compatibility; see note in vm/mmpose.def)
Build the container:

```
singularity build mmpose.sif vm/mmpose.def
```

The ground-truth/ evaluation scripts run outside the container and require:

```
pip install -r requirements-analysis.txt
```

See requirements-analysis.txt for the full list (pandas, numpy, plotnine, scipy, pycocotools).
Baseline containers:
- vm/sleap.def — SLEAP pose estimation baseline
- vm/deeplabcut.def — DeepLabCut pose estimation baseline
The full pipeline runs in five stages:
1. Convert raw annotations (SLEAP .slp files, DeepLabCut pickles, HDF5) to COCO-format JSON.

```
OF-data/ofa.py                                   # OFA HDF5 → COCO
demo/sleap_to_coco.py                            # SLEAP annotations → COCO (maze/corners)
demo/split_annotations.py                        # Split COCO dataset into train/val by experiment
convert_maze/combine_maze_with_superanimals.py   # Merge with SuperAnimals 27-kpt format
```
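For reference, the converters above target the standard COCO keypoint JSON layout. A minimal sketch of that structure, using the 2-keypoint maze schema (file names and coordinate values here are illustrative, not taken from the actual datasets):

```python
import json

# Minimal COCO-keypoints skeleton. Keypoints are flat [x, y, v] triplets,
# where v = 2 means labeled and visible; bbox is [x, y, width, height].
coco = {
    "images": [
        {"id": 1, "file_name": "frame_000001.png", "width": 800, "height": 800},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "keypoints": [412.0, 305.5, 2,   # nose
                          398.0, 461.0, 2],  # tail base
            "num_keypoints": 2,
            "bbox": [350.0, 280.0, 120.0, 210.0],
            "area": 120.0 * 210.0,
            "iscrowd": 0,
        }
    ],
    "categories": [
        {"id": 1, "name": "mouse",
         "keypoints": ["nose", "tail_base"], "skeleton": [[1, 2]]},
    ],
}
print(json.dumps(coco["categories"][0]["keypoints"]))
```

The train/val splitting step then operates on this JSON, grouping images by experiment before dividing them.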
2. Train each architecture using the corresponding config file and the SLURM training scripts in vm/.

```
# Example: submit training job on Sumner2 cluster
sbatch vm/training-mmpose.sh
```

Config files follow the naming pattern configs-{dataset}/{architecture}-config.py.
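The naming pattern makes it easy to enumerate config paths programmatically, e.g. when scripting a sweep. A small sketch (the architecture slugs below are made-up placeholders, not the repository's actual file names):

```python
from pathlib import Path

# Enumerate expected config paths from the repository's naming pattern:
# configs-{dataset}/{architecture}-config.py
datasets = ["maze-mouse", "ofa", "topviewmouse"]
architectures = ["td-defdetr-hrnet", "bu-dekr"]  # hypothetical slugs

config_paths = [
    Path(f"configs-{ds}") / f"{arch}-config.py"
    for ds in datasets
    for arch in architectures
]
for p in config_paths:
    print(p.as_posix())
```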
3. Run inference on the test split using the trained model checkpoints.

```
bash vm/extract-bottom-up.sh       # Bottom-up (DEKR) predictions
bash demo/batch_infer_topview.sh   # Top-down batch inference
```

4. Convert all model predictions and ground-truth annotations to a standard per-model CSV format for evaluation.
```
cd ground-truth/
python gt_to_gt_format.py              # SLEAP ground truth → CSV
python mmpose_to_gt_format.py          # MMPose predictions → CSV
python sleap_to_gt_format.py           # SLEAP predictions → CSV
python dlc_to_gt_format.py             # DeepLabCut → CSV
python superanimals_posev6_to_maze.py  # SuperAnimals → maze CSV
python superanimals_posev6_to_ofa.py   # SuperAnimals → OFA CSV
python topview-on-maze.py              # Topview model predictions on maze frames → CSV
```
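The exact column schema of the standardized CSV is defined by these scripts. Purely as an illustration, a long-format layout with one row per (frame, keypoint) might look like the following; the column names here are hypothetical, not the repository's actual schema:

```python
import csv
import io

# Hypothetical long-format layout for a standardized per-model CSV:
# one row per (frame, keypoint). Column names are illustrative only.
rows = [
    {"frame": "frame_000001.png", "keypoint": "nose",      "x": 412.0, "y": 305.5, "confidence": 0.98},
    {"frame": "frame_000001.png", "keypoint": "tail_base", "x": 398.0, "y": 461.0, "confidence": 0.91},
]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["frame", "keypoint", "x", "y", "confidence"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

A shared flat layout like this lets one comparison script join predictions from any model against the ground-truth CSV on (frame, keypoint).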
5. Evaluate. The evaluation pipeline runs in two stages: Stage 1 scripts read the per-model CSVs from Step 4, compute metrics, and write both plots and summary metric CSVs; Stage 2 scripts read those metric CSVs to produce the cross-dataset comparison figures.
```
cd ground-truth/

# Stage 1
python compare-models.py           # → Figure 1 B/C/D (maze model comparison, PCK curves, speed vs accuracy)
python compare-ofa-models.py       # → OFA per-condition plots; metric CSVs used by Stage 2
python compare-topview-models.py   # → TopviewMouse5K plots; metric CSVs
python compare-topview-maze.py     # → topview-on-maze plots; metric CSVs used by Stage 2

# Stage 2
python compare-datasets-maze.py    # → Figure 2 (Maze / Topview / Maze+Topview trained)
python compare-datasets-ofa.py     # → Figure 3 (OFA / Topview / OFA+Topview trained)

cd ..
python pck_schematic.py            # → Figure 1A (PCK evaluation visualization panel)
```

Metrics:
- PCK (Percentage of Correct Keypoints): Fraction of predicted keypoints within a threshold distance of the ground truth. Reported at both pixel thresholds (10–100 px) and body-length-normalized thresholds (0.1–1.0×).
- RMSE: Root mean square pixel error, averaged across keypoints and frames.
See ground-truth/models_utils.py for the full metric implementations.
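As a concrete reference, the two metrics can be computed along these lines. This is a minimal sketch only; the actual implementations, including keypoint filtering, live in ground-truth/models_utils.py:

```python
import math

def pck(pred, gt, threshold):
    """Fraction of keypoints whose prediction lies within `threshold`
    (pixels, or body lengths if coordinates are pre-normalized) of ground truth."""
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(d <= threshold for d in dists) / len(dists)

def rmse(pred, gt):
    """Root mean square pixel error across keypoints (and, in practice, frames)."""
    sq = [(p[0] - g[0]) ** 2 + (p[1] - g[1]) ** 2 for p, g in zip(pred, gt)]
    return math.sqrt(sum(sq) / len(sq))

# Toy example: two keypoints (nose, tail base) on one frame.
pred = [(100.0, 100.0), (200.0, 206.0)]
gt   = [(103.0, 104.0), (200.0, 200.0)]
print(pck(pred, gt, threshold=5.0))  # nose is 5 px off, tail base 6 px
print(rmse(pred, gt))
```

Sweeping `threshold` over a range of values produces the PCK curves reported in the figures.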
If you use this code, please cite:
```
@article{choi2026adoption,
  title   = {Adoption of MMPose, a general purpose pose estimation library, for animal tracking},
  author  = {Choi, Jessica D. and Kumar, Vivek},
  journal = {TBD},
  year    = {2026},
  doi     = {TBD}
}
```

License: This repository is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license (CC BY-NC-SA 3.0). See LICENSE for details.
Contact:
- Jessica D. Choi · jaycee.choi@jax.org
- Vivek Kumar · vivek.kumar@jax.org