Pull requests: NVIDIA/TransformerEngine
Pull requests list
[PyTorch] Propagate FP8 graph weight update flag in GroupedLinear
community-contribution
PRs from external contributor outside the core maintainers, representing community-driven work.
#3052
opened May 28, 2026 by
allenphilipj
[PyTorch] Integrate the cuBLAS MXFP8 NN, NT support for sm120
#3050
opened May 28, 2026 by
KshitijLakhani
Collaborator
Draft
7 of 13 tasks
[PyTorch] Allocate grouped linear wgrads as tensor views
2.16.0
bug
Something isn't working
cpu_overhead
#3049
opened May 28, 2026 by
timmoon10
Member
8 of 13 tasks
Enable NVFP4 fused grouped MLP
community-contribution
org-contribution
#3048
opened May 27, 2026 by
sraman-rgb
Contributor
1 of 13 tasks
Feat/selective offload on srelu fuser
community-contribution
#3047
opened May 27, 2026 by
lhb8125
Contributor
13 tasks
Add NVFP4 per-token quantization recipe
community-contribution
[PyTorch Debug] Add scale_inv_std stat and skip NVFP4 layers in LogFp8TensorStats
#3044
opened May 26, 2026 by
pggPL
Collaborator
9 of 13 tasks
docs: expand comm gemm overlap guidance
community-contribution
#3043
opened May 26, 2026 by
omribz156
5 of 13 tasks
Use cuDNN for row-scaled NVFP4 grouped GEMM
community-contribution
[PyTorch Debug] Fix scale_inv_min returning 0 for MXFP8/NVFP4
#3041
opened May 25, 2026 by
pggPL
Collaborator
6 of 13 tasks
[PyTorch debug] FakeQuant: support Float8BlockScaling and fix MoE / w…
community-contribution
#3040
opened May 25, 2026 by
shangxiaokang
Draft
13 tasks
[PyTorch] Make modules.GroupedLinear graph-safe
org-contribution
#3038
opened May 22, 2026 by
yaox12
Member
1 of 13 tasks
[fix] Fix CUTLASS grouped GEMM segfault for empty groups
community-contribution
#3037
opened May 22, 2026 by
Baibaifan
[JAX] Expert Parallelism: JAX primitives + VJPs
#3036
opened May 22, 2026 by
phu0ngng
Collaborator
8 of 13 tasks
Expert Parallelism: common C API + NCCL EP backend
#3034
opened May 22, 2026 by
phu0ngng
Collaborator
8 of 13 tasks
Add MXFP8 attention unit test with linear and rope layers
community-contribution
#3033
opened May 22, 2026 by
layalir
[Common] Enable NVFP4 2D block scaling in columnwise only
#3027
opened May 21, 2026 by
negvet
Collaborator
1 of 13 tasks
[PyT] Reduce test sizes in fused attn fp8 vs fp16 to avoid OOM
attention
community-contribution
#3020
opened May 21, 2026 by
vedaanta
1 of 13 tasks
fix(grouped_linear): handle all-zero-token forward and backward
org-contribution
#3019
opened May 21, 2026 by
jubick1337
13 tasks
Add the getter and setter of skip_fp8_weight_update_tensor
community-contribution
#3015
opened May 20, 2026 by
xrennvidia
Collaborator
6 of 13 tasks
adding NVIDIA_TF32_OVERRIDE=0 to test_numerics.py
#3014
opened May 20, 2026 by
francesco-bertolotti
Contributor
[Common] Optimize fused router forward/backward kernels
org-contribution
#3012
opened May 19, 2026 by
harryzhou2000
Member
[PyTorch] NVFP4 RHT cast-fusion: emit GEMM-swizzled scale factors directly
#3011
opened May 19, 2026 by
cael-ling
Contributor
8 of 13 tasks