Rust implementation scaffold of RVC (Retrieval-based Voice Conversion) in ~/repos/maolan/vocal.
This project implements a functional RVC-style pipeline in pure Rust:
- WAV I/O
- Content feature extraction (HuBERT-like proxy features)
- F0 extraction (frame-wise proxy)
- Retrieval blending (nearest-neighbor feature mixing)
- Voice conversion synthesis (baseline resynthesis/pitch-shift proxy)
- Output post-processing (RMS mix)
- Burnpack model artifact loading (`.bpk`) in the same style used by `maolan/generate`
It is designed to mirror RVC's architecture and CLI flow while remaining dependency-light and easy to extend.
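
As an illustration of the retrieval blending stage listed above, here is a minimal sketch of nearest-neighbor feature mixing. The function names (`sq_dist`, `blend_with_nearest`), the brute-force search, and the linear blend are assumptions made for this example; they are not taken from `src/retrieval.rs`, which may organize this differently.

```rust
/// Squared Euclidean distance between two feature vectors.
/// Extra elements in the longer vector are ignored by `zip`.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical helper: blend every frame with its nearest index entry.
/// `mix` = 0.0 keeps the original frame, `mix` = 1.0 replaces it with the neighbor.
fn blend_with_nearest(frames: &[Vec<f32>], index: &[Vec<f32>], mix: f32) -> Vec<Vec<f32>> {
    let mix = mix.clamp(0.0, 1.0);
    frames
        .iter()
        .map(|frame| {
            // Brute-force nearest-neighbor search over the whole index.
            let nearest = index
                .iter()
                .min_by(|a, b| sq_dist(frame, a).total_cmp(&sq_dist(frame, b)));
            match nearest {
                Some(neighbor) => frame
                    .iter()
                    .zip(neighbor)
                    .map(|(f, n)| (1.0 - mix) * f + mix * n)
                    .collect::<Vec<f32>>(),
                // An empty index leaves the frame untouched.
                None => frame.clone(),
            }
        })
        .collect()
}
```

Under this sketch, a mix of 0.2 moves each frame 20% of the way toward its closest stored vector. A real index would typically replace the linear scan with an approximate search structure.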

    cargo run --release -- new-config --output config.json

    cargo run --release -- infer \
      --input-wav input.wav \
      --output-wav output.wav \
      --burnpack rvc_model.bpk \
      --config config.json \
      --pitch-shift-semitones 3 \
      --index-mix 0.2 \
      --rms-mix-rate 0.3 \
      --target-sr 48000

    cargo run --release -- inspect-burnpack --model rvc_model.bpk

    cargo run --release -- convert-pth \
      --input model.pth \
      --output rvc_model.bpk \
      --top-level-key state_dict

- `src/main.rs`: CLI (`infer`, `new-config`)
- `src/config.rs`: RVC config schema
- `src/audio.rs`: WAV I/O + linear resampler + RMS
- `src/rvc.rs`: model components (`HubertEncoder`, `F0Extractor`, `VoiceConverter`)
- `src/retrieval.rs`: retrieval index + nearest-neighbor blending
- `src/pipeline.rs`: end-to-end inference pipeline

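
To make the `--pitch-shift-semitones` and `--rms-mix-rate` options more concrete, the sketch below shows one plausible interpretation of each. The semitone-to-ratio formula is standard equal temperament. The `rms_mix` helper is hypothetical: it applies a single whole-buffer gain and assumes the convention that a rate of 1.0 keeps the converted loudness while 0.0 matches the source loudness. Whether this project uses the same convention, or a frame-wise envelope instead, is an assumption.

```rust
/// Frequency ratio for a pitch shift in semitones (12-tone equal temperament).
fn semitone_ratio(semitones: f32) -> f32 {
    2.0_f32.powf(semitones / 12.0)
}

/// Root-mean-square level of a buffer; 0.0 for an empty buffer.
fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    (samples.iter().map(|s| s * s).sum::<f32>() / samples.len() as f32).sqrt()
}

/// Hypothetical whole-buffer RMS mix (not the project's actual code).
/// `rate` = 1.0 leaves `converted` unchanged; `rate` = 0.0 scales it to the source RMS.
fn rms_mix(source: &[f32], converted: &mut [f32], rate: f32) {
    let rate = rate.clamp(0.0, 1.0);
    let (src, out) = (rms(source), rms(converted));
    // Skip silent buffers to avoid dividing by zero.
    if src <= f32::EPSILON || out <= f32::EPSILON {
        return;
    }
    let gain = (src / out).powf(1.0 - rate);
    for s in converted.iter_mut() {
        *s *= gain;
    }
}
```

With these definitions, a shift of 3 semitones scales F0 by roughly 1.19, and a shift of 12 semitones doubles it.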
This is a Rust-native baseline and is not yet bit-exact with upstream Python/PyTorch RVC. To reach full parity, the next steps are:
- Full RVC synth graph in Burn modules loaded from burnpack tensors.
- RMVPE/Harvest/DIO-grade F0 extraction.
- FAISS-compatible retrieval index loading and querying.
- Exact frame slicing, protection, and infer-time blending behavior.
The current code is a clean foundation for implementing those pieces incrementally in Rust.
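
One way such incremental replacement could work is to keep each stage behind a small trait, so a proxy implementation can later be swapped for a higher-quality one. The sketch below does this for F0 extraction. The `F0Estimator` trait and the `AutocorrF0` struct are hypothetical names for illustration and do not describe the actual `F0Extractor` API in `src/rvc.rs`. The estimator is a naive autocorrelation peak picker, far simpler than RMVPE, Harvest, or DIO.

```rust
/// Hypothetical interface; an RMVPE-grade backend could implement it later.
trait F0Estimator {
    /// Returns an F0 estimate in Hz for one frame, or 0.0 when unvoiced.
    fn estimate(&self, frame: &[f32], sample_rate: u32) -> f32;
}

/// Naive autocorrelation peak picker over a fixed pitch range.
struct AutocorrF0 {
    min_hz: f32,
    max_hz: f32,
}

impl F0Estimator for AutocorrF0 {
    fn estimate(&self, frame: &[f32], sample_rate: u32) -> f32 {
        let sr = sample_rate as f32;
        // Lag bounds derived from the pitch range, limited by the frame length.
        let min_lag = (sr / self.max_hz).floor().max(1.0) as usize;
        let max_lag = ((sr / self.min_hz).ceil() as usize).min(frame.len().saturating_sub(1));
        let energy: f32 = frame.iter().map(|s| s * s).sum();
        if energy <= f32::EPSILON || min_lag > max_lag {
            return 0.0;
        }
        // Track the lag with the largest (unnormalized) autocorrelation.
        let mut best = (0usize, 0.0f32);
        for lag in min_lag..=max_lag {
            let corr: f32 = frame[lag..].iter().zip(frame).map(|(a, b)| a * b).sum();
            if corr > best.1 {
                best = (lag, corr);
            }
        }
        // Treat weakly periodic frames as unvoiced (the 0.3 threshold is arbitrary).
        if best.1 < 0.3 * energy {
            0.0
        } else {
            sr / best.0 as f32
        }
    }
}
```

Code that depends only on the trait would not need to change when a more accurate estimator is added.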