Parcae: Scaling Laws For Stable Looped Language Models
Hayden Prairie, Zachary Novack, Taylor Berg-Kirkpatrick, Daniel Y. Fu
Paper: https://arxiv.org/abs/2604.12946
Parcae is a new looped architecture that uses a handful of techniques to stabilize training. Parcae enables stable, hassle-free training of looped models, which we use to derive the first scaling laws for looping; we find that compute-optimal training scales looping and data in tandem.
Just want to use off-the-shelf models? We make that easy with a PyPI package. Install it with:
pip install parcae-lm

If you are training models, clone the GitHub repository:
git clone https://github.com/SandyResearch/parcae.git
cd parcae

Then follow the setup instructions below.
Our launch scripts handle everything automatically. Set PROJECT_DIR and DOCKER_IMAGE at the top of launch_job.slurm or launch_interactive.sh, then:
# Interactive development shell
bash launch_interactive.sh
# Submit a training job
CONFIG=launch_configs/parcae-small-140m.yaml sbatch launch_job.slurm

The Docker image is hosted publicly at ghcr.io/sandyresearch/parcae and will be pulled automatically.
Requires Python 3.11+ and PyTorch 2.4+. Install PyTorch first, following pytorch.org, then:
pip install -e .

We provide three ways to instantiate models: load pretrained weights with from_pretrained, build from a built-in config with create_model, or customize a config before building with create_config.
import parcae_lm
# Load a pretrained model from HuggingFace
model = parcae_lm.from_pretrained("SandyResearch/parcae-140m")
# Create a model from a built-in config
model = parcae_lm.create_model("parcae-small-140m")
# Or get the config, customize it, then build
config = parcae_lm.create_config("parcae-small-140m")
config.mean_recurrence = 16
model = config.construct_model()
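Once instantiated, the model can be used like a standard PyTorch module. The snippet below is a minimal sketch that assumes the model maps a batch of token ids to next-token logits; check the repository for the exact forward signature and tokenizer.

import torch
import parcae_lm

# Minimal usage sketch (assumes the model acts like a standard PyTorch module
# mapping token ids to logits; the exact forward signature may differ).
model = parcae_lm.from_pretrained("SandyResearch/parcae-140m")
model.eval()

input_ids = torch.randint(0, 32768, (1, 16))  # dummy token ids for illustration
with torch.no_grad():
    logits = model(input_ids)
print(logits.shape)  # expected: (batch, sequence, vocab)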
Download pretraining data with scripts/download_data.py:

python scripts/download_data.py fineweb-100bt # FineWeb-Edu 100B tokens
python scripts/download_data.py fineweb-350bt # FineWeb-Edu 350B tokens
python scripts/download_data.py huginn # Huginn dataset

Train a GPT-4-style BPE tokenizer on your data:
python scripts/tok_train.py --data-dir fineweb --output-dir tokenizer/ --vocab-size 32768

Evaluate compression ratios against the GPT-2 and GPT-4 tokenizers:
python scripts/tok_eval.py --tokenizer tokenizer/parcae_tokenizer --data-dir fineweb
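Compression ratio here is typically measured as bytes of raw text per produced token (higher means the tokenizer compresses better). A minimal sketch of that metric, with encode standing in for whichever tokenizer you load; this is not the eval script's exact implementation:

# Hedged sketch of a bytes-per-token compression metric (higher is better).
# `encode` is a placeholder for any tokenizer's encode function.
def compression_ratio(text: str, encode) -> float:
    n_bytes = len(text.encode("utf-8"))
    n_tokens = len(encode(text))
    return n_bytes / max(n_tokens, 1)

# Toy call with whitespace splitting standing in for a real tokenizer:
print(compression_ratio("Parcae trains stable looped language models.", str.split))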
Training is configured via YAML files in launch_configs/. Available configs:

| Config | Architecture | Parameters |
|---|---|---|
| parcae-small-140m.yaml | Parcae | 140M |
| parcae-medium-370m.yaml | Parcae | 370M |
| parcae-large-770m.yaml | Parcae | 770M |
| parcae-xlarge-1_3b.yaml | Parcae | 1.3B |
| gpt-small-140m.yaml | GPT | 140M |
| gpt-medium-370m.yaml | GPT | 370M |
| gpt-large-770m.yaml | GPT | 770M |
| gpt-xlarge-1_3b.yaml | GPT | 1.3B |
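If you prefer to derive a custom config programmatically instead of editing YAML by hand, here is a minimal sketch (it assumes PyYAML is installed and makes no assumptions about the specific field names inside the files; the output filename is hypothetical):

import yaml

# Hedged sketch: read an existing launch config, inspect its fields, and write a
# custom variant next to it.
with open("launch_configs/parcae-small-140m.yaml") as f:
    cfg = yaml.safe_load(f)

print(sorted(cfg))  # top-level fields exposed by the config

with open("launch_configs/my-custom-140m.yaml", "w") as f:  # hypothetical filename
    yaml.safe_dump(cfg, f)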
Single node:
bash runs/run_training.sh launch_configs/parcae-small-140m.yaml parcae-small 8

Multi-node (Slurm):
CONFIG=launch_configs/parcae-large-770m.yaml sbatch launch_job.slurm

Evaluate models using scripts/eval.py. Supports loading from HuggingFace or local checkpoints.
# Evaluate a pretrained model from HuggingFace
python scripts/eval.py --hf_repo SandyResearch/parcae-140m --eval_tasks core
# Evaluate a local checkpoint
bash runs/run_eval.sh outputs/parcae-small-140m eval_configs/eval-core.yaml 8
# Evaluate validation loss
python scripts/eval.py --hf_repo SandyResearch/parcae-140m --eval_tasks bpb \
    --tasks.bpb.val_data_dir /path/to/val/data

Available eval configs in eval_configs/:
- eval-core.yaml — Core benchmark suite
- eval-core-extended.yaml — Extended core benchmarks
- eval-val-loss.yaml — Validation loss / bits-per-byte
- eval-lambada.yaml — LAMBADA evaluation
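For the bits-per-byte evaluation, converting a mean next-token loss (in nats per token) into bits per byte is just a change of units: bpb = (loss / ln 2) * (tokens / bytes). A minimal sketch, assuming you already know the token and byte counts of the validation set; this is not the eval script's code:

import math

# Hedged sketch of the standard nats-per-token -> bits-per-byte conversion.
def bits_per_byte(mean_loss_nats: float, num_tokens: int, num_bytes: int) -> float:
    bits_per_token = mean_loss_nats / math.log(2)
    return bits_per_token * (num_tokens / num_bytes)

print(bits_per_byte(2.8, num_tokens=1_000_000, num_bytes=4_200_000))  # toy numbers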
Pretrained models are uploaded to Hugging Face: parcae-140m, parcae-370m, parcae-770m, parcae-1.3b, trained on the FineWeb-Edu dataset. Models will be auto-downloaded when using from_pretrained.
The model dimensions are listed below (a short sketch after the table shows how Prelude, Core, Coda, and Recurrence fit together):
| Model | Parameters | Prelude | Core | Coda | Model dim. | Recurrence |
|---|---|---|---|---|---|---|
| Parcae-140M | 140M | 2 | 2 | 2 | 768 | 8 |
| Parcae-370M | 370M | 4 | 4 | 4 | 1024 | 8 |
| Parcae-770M | 770M | 6 | 6 | 6 | 1280 | 8 |
| Parcae-1.3B | 1.3B | 8 | 8 | 8 | 1536 | 8 |
Note: these are base models without any form of downstream modification (instruction tuning, etc.).
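For intuition about the Prelude / Core / Coda split: the prelude layers run once, the core block is iterated for the configured number of recurrences, and the coda layers run once at the end. The sketch below is illustrative only, using generic transformer blocks with the Parcae-140M sizes from the table; it is not the repository's implementation.

import torch.nn as nn

# Illustrative prelude -> (core x R) -> coda forward pass with generic blocks.
class LoopedSketch(nn.Module):
    def __init__(self, d_model=768, n_prelude=2, n_core=2, n_coda=2, recurrence=8):
        super().__init__()

        def block():
            return nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)

        self.prelude = nn.ModuleList(block() for _ in range(n_prelude))
        self.core = nn.ModuleList(block() for _ in range(n_core))
        self.coda = nn.ModuleList(block() for _ in range(n_coda))
        self.recurrence = recurrence

    def forward(self, x):
        for layer in self.prelude:
            x = layer(x)
        for _ in range(self.recurrence):  # the "loop": core layers are reused R times
            for layer in self.core:
                x = layer(x)
        for layer in self.coda:
            x = layer(x)
        return x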
The sweep scripts in runs/ reproduce the scaling law experiments from the paper. See runs/sweep_recurrence.sh for recurrence scaling and runs/sweep_flops.sh for compute-optimal scaling.
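To fit your own curves to the sweep outputs, here is a minimal power-law fit of loss against compute, done as a least-squares line in log-log space (the arrays are toy placeholders for the (FLOPs, loss) pairs you collect; this is not the paper's fitting procedure):

import numpy as np

# Hedged sketch: fit loss ≈ a * C^b by linear regression on log-log values.
flops = np.array([1e18, 3e18, 1e19, 3e19, 1e20])  # toy compute budgets
loss = np.array([3.4, 3.2, 3.0, 2.85, 2.7])       # toy final losses
b, log_a = np.polyfit(np.log(flops), np.log(loss), deg=1)
a = np.exp(log_a)
print(f"loss ≈ {a:.3g} * C^{b:.3f}")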
If you use Parcae in your work, please cite:

@misc{prairie2026parcaescalinglawsstable,
title={Parcae: Scaling Laws For Stable Looped Language Models},
author={Hayden Prairie and Zachary Novack and Taylor Berg-Kirkpatrick and Daniel Y. Fu},
year={2026},
eprint={2604.12946},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2604.12946},
}

This codebase was built on karpathy/nanochat, seal-rg/recurrent-pretraining, and Lightning-AI/litgpt. While most of the code has been thoroughly adapted, we greatly appreciate the work that went into developing each of these training libraries.
