This repository contains an example implementation of the RoPE (Rotary Position Embedding) algorithm for the Ryzen Neural Processing Unit (NPU). The implementation utilizes the RoPE kernel to demonstrate the capabilities of the NPU in handling advanced embedding techniques.
The RoPE kernel offers two methods for applying embedding: a two-halves method (default) and an interleaved method. The two-halves method is what's used in Hugging Face's transformer library, while the interleaved method is used in Meta's official repo. Using the two-halves method is necessary when using Llama weights from Hugging Face, as the parameters of some layers are re-permuted while converting the Llama weights to Hugging Face. See Issue #25199 for reference.