
Pull requests: ggml-org/llama.cpp

[CUDA] Write an optimized flash_attn_stream_k_fixup kernel
Labels: ggml, Nvidia GPU
#21159 opened Mar 29, 2026 by gaugarg-nv
ggml-cpu: fix fallback for RVV kernels without zvfh
Labels: ggml
#21157 opened Mar 29, 2026 by taimur-10x
ci : bump ty to 0.0.26
Labels: devops, examples, python, script, server
#21156 opened Mar 29, 2026 by CISC
examples : add llama-eval
Labels: examples, python
#21152 opened Mar 29, 2026 by ggerganov (Draft, 5 tasks)
Support for DeepseekV32ForCausalLM with DeepSeek Sparse Attention (DSA)
Labels: ggml, model, Nvidia GPU, python, testing
#21149 opened Mar 29, 2026 by fairydreaming (Draft)
chore(docs): update list of UIs
#21148 opened Mar 29, 2026 by sbhjt-gr
ggml-webgpu: Add the support of MUL_MAT_ID
Labels: documentation, ggml, WebGPU
#21147 opened Mar 29, 2026 by yomaytk
CI: Fix docker multiarch overwrite
Labels: devops
#21144 opened Mar 29, 2026 by Ts-sound
Multi-backend profiler
Labels: Apple Metal, Ascend NPU, examples, ggml, Hexagon, IBM zDNN, Nvidia GPU, OpenCL, OpenVINO, python, SYCL, Vulkan, WebGPU
#21138 opened Mar 29, 2026 by pwilkin (Draft)
CI: Enable CUDA and Vulkan ARM64 runners and fix CI/CD
Labels: devops, documentation
#21122 opened Mar 28, 2026 by ehfd
metal: add opt-in V skip for negligible attention weights
Labels: Apple Metal, ggml
#21119 opened Mar 28, 2026 by TheTom
convert: Add compressed-tensors NVFP4 conversion
Labels: python
#21095 opened Mar 28, 2026 by michaelw9999
ggml : add CPU TurboQuant KV cache types (TBQ3_0 / TBQ4_0)
Labels: examples, ggml, server, testing
#21089 opened Mar 27, 2026 by elusznik
[CUDA] Reduce the number of stream-k blocks to reduce the overhead of the flash_attn_stream_k_fixup kernel
Labels: ggml, Nvidia GPU
#21086 opened Mar 27, 2026 by gaugarg-nv
fix cmake problem to exclude CCAN
Labels: Ascend NPU, ggml, need more info
#21075 opened Mar 27, 2026 by sunqingn7
ggml-cuda: Add generic NVFP4 MMQ kernel
Labels: ggml, Nvidia GPU, python
#21074 opened Mar 27, 2026 by michaelw9999
hexagon: optimize HMX matmul operations
Labels: ggml, Hexagon
#21071 opened Mar 27, 2026 by chraac
ggml: allow prefetching tensor overrides
Labels: Ascend NPU, examples, ggml, IBM zDNN, Nvidia GPU, OpenCL, OpenVINO, SYCL, WebGPU
#21067 opened Mar 27, 2026 by am17an (Draft)