Pull requests: NVIDIA/TransformerEngine

[PyTorch] Propagate FP8 graph weight update flag in GroupedLinear (labels: community-contribution)
#3052 opened May 28, 2026 by allenphilipj
[PyTorch] Integrate the cuBLAS MXFP8 NN, NT support for sm120
#3050 opened May 28, 2026 by KshitijLakhani (Collaborator) · Draft
7 of 13 tasks
[PyTorch] Allocate grouped linear wgrads as tensor views (labels: 2.16.0, bug, cpu_overhead)
#3049 opened May 28, 2026 by timmoon10 (Member)
8 of 13 tasks
Enable NVFP4 fused grouped MLP (labels: community-contribution, org-contribution)
#3048 opened May 27, 2026 by sraman-rgb (Contributor)
1 of 13 tasks
Feat/selective offload on srelu fuser (labels: community-contribution)
#3047 opened May 27, 2026 by lhb8125 (Contributor)
13 tasks
Add NVFP4 per-token quantization recipe (labels: community-contribution)
#3045 opened May 26, 2026 by cael-ling (Contributor) · Draft
13 tasks
[PyTorch Debug] Add scale_inv_std stat and skip NVFP4 layers in LogFp8TensorStats
#3044 opened May 26, 2026 by pggPL (Collaborator)
9 of 13 tasks
docs: expand comm gemm overlap guidance (labels: community-contribution)
#3043 opened May 26, 2026 by omribz156
5 of 13 tasks
Use cuDNN for row-scaled NVFP4 grouped GEMM (labels: community-contribution)
#3042 opened May 26, 2026 by zianglih (Contributor) · Draft
[PyTorch Debug] Fix scale_inv_min returning 0 for MXFP8/NVFP4
#3041 opened May 25, 2026 by pggPL (Collaborator)
6 of 13 tasks
[PyTorch debug] FakeQuant: support Float8BlockScaling and fix MoE / w… (labels: community-contribution)
#3040 opened May 25, 2026 by shangxiaokang · Draft
13 tasks
TE_DType in python
#3039 opened May 22, 2026 by vthumbe1503 (Collaborator) · Draft
13 tasks
[PyTorch] Make modules.GroupedLinear graph-safe (labels: org-contribution)
#3038 opened May 22, 2026 by yaox12 (Member)
1 of 13 tasks
[fix] Fix CUTLASS grouped GEMM segfault for empty groups (labels: community-contribution)
#3037 opened May 22, 2026 by Baibaifan
[JAX] Expert Parallelism: JAX primitives + VJPs
#3036 opened May 22, 2026 by phu0ngng (Collaborator)
8 of 13 tasks
Expert Parallelism: common C API + NCCL EP backend
#3034 opened May 22, 2026 by phu0ngng (Collaborator)
8 of 13 tasks
Add MXFP8 attention unit test with linear and rope layers (labels: community-contribution)
#3033 opened May 22, 2026 by layalir
[Common] Enable NVFP4 2D block scaling in columnwise only
#3027 opened May 21, 2026 by negvet (Collaborator)
1 of 13 tasks
[PyT] Reduce test sizes in fused attn fp8 vs fp16 to avoid OOM (labels: attention, community-contribution)
#3020 opened May 21, 2026 by vedaanta
1 of 13 tasks
Add the getter and setter of skip_fp8_weight_update_tensor (labels: community-contribution)
#3015 opened May 20, 2026 by xrennvidia (Collaborator)
6 of 13 tasks
adding NVIDIA_TF32_OVERRIDE=0 to test_numerics.py
#3014 opened May 20, 2026 by francesco-bertolotti (Contributor)
[PyTorch] NVFP4 RHT cast-fusion: emit GEMM-swizzled scale factors directly
#3011 opened May 19, 2026 by cael-ling (Contributor)
8 of 13 tasks