Adds NVIDIA PixelDiT and PiD support #1393
```markdown
# PixelDiT

- NVIDIA's [PixelDiT](<https://huggingface.co/Comfy-Org/PixelDiT>) is supported in SwarmUI!
- Or the smaller FP8 version: [Comfy-Org/PixelDiT - mxfp8](<https://huggingface.co/Comfy-Org/PixelDiT/resolve/main/diffusion_models/pixeldit_1300m_1024px_mxfp8.safetensors>)
```
```csharp
if (doUpscale && upscaleMethod.StartsWith("pidmodel-"))
{
    string pidModelName = upscaleMethod.After("pidmodel-");
    T2IModel pidModel = Program.MainSDModels.GetModel(pidModelName);
```
Check T2IPromptHandling for "lora": there's a special-case pattern for reading indirectly specified models that accommodates both white/blacklisting of models and user typing issues (e.g. including or omitting `.safetensors`).
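To illustrate the kind of tolerance meant here, below is a minimal, hypothetical sketch. It is NOT SwarmUI's actual T2IPromptHandling code; `ModelNameMatcher`, `Normalize`, `Resolve`, and the `isAllowed` predicate (standing in for white/blacklist checks) are all illustrative names. It matches a typed name against known model names case-insensitively, with or without the `.safetensors` extension.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, NOT SwarmUI's actual T2IPromptHandling logic.
public static class ModelNameMatcher
{
    const string Ext = ".safetensors";

    // Trim whitespace, unify path separators, and drop a trailing ".safetensors".
    public static string Normalize(string name)
    {
        string clean = name.Trim().Replace('\\', '/');
        if (clean.EndsWith(Ext, StringComparison.OrdinalIgnoreCase))
        {
            clean = clean.Substring(0, clean.Length - Ext.Length);
        }
        return clean;
    }

    // Return the first known name that matches the typed name (case-insensitive,
    // extension optional) and passes the allow/deny predicate; otherwise null.
    public static string Resolve(string typed, IEnumerable<string> known, Func<string, bool> isAllowed)
    {
        string target = Normalize(typed);
        foreach (string candidate in known)
        {
            if (string.Equals(Normalize(candidate), target, StringComparison.OrdinalIgnoreCase) && isAllowed(candidate))
            {
                return candidate;
            }
        }
        return null;
    }
}
```

Returning the canonical stored name (rather than the typed one) means later lookups always use the exact on-disk model path.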
```csharp
string pidSampled = g.CreateKSampler(g.CurrentModel.Path, [pidCond, 0], pidNeg, [pidEmptyLatent, 0], pidCfg, pidSteps, 0, 10000,
    g.UserInput.Get(T2IParamTypes.Seed) + 2, false, true, defsampler: "lcm", defscheduler: "simple", explicitSampler: pidSampler, explicitScheduler: pidScheduler, sectionId: T2IParamInput.SectionID_PixelDecoder);
g.CurrentMedia = g.CurrentMedia.WithPath([pidSampled, 0], WGNodeData.DT_LATENT_IMAGE, pidModel.ModelClass?.CompatClass);
g.CurrentMedia.Width = pidWidth;
```
For Refiner Upscale, the target size is user-specified, so honor it by doing a post-rescale in pixel space; see how the `ImageUpscaleWithModel` path above does it.
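The size bookkeeping behind that suggestion can be sketched as follows. This is a hypothetical illustration, not the PR's code: `PidRescalePlan`, `Plan`, and `TargetSize` are made-up names, and the only fact taken from the PR is that PiD is locked to 4x. The sketch computes PiD's fixed output size and flags when a pixel-space rescale to the user's target is still required.

```csharp
using System;

// Hypothetical sketch, not the PR's implementation.
public static class PidRescalePlan
{
    // PiD is locked to 4x per the PR description.
    public const int PidFactor = 4;

    // User-requested refiner upscale target for one dimension (simple rounding assumed).
    public static int TargetSize(int size, double userScale) => (int)Math.Round(size * userScale);

    // PiD output size, and whether a pixel-space post-rescale is needed to hit the target.
    public static (int PidWidth, int PidHeight, bool NeedsRescale) Plan(int inWidth, int inHeight, int targetWidth, int targetHeight)
    {
        int pidWidth = inWidth * PidFactor;
        int pidHeight = inHeight * PidFactor;
        bool needsRescale = pidWidth != targetWidth || pidHeight != targetHeight;
        return (pidWidth, pidHeight, needsRescale);
    }
}
```

For example, a 1024x1024 image with a 2x refiner upscale comes out of PiD at 4096x4096, so a pixel-space downscale to 2048x2048 would follow; only a requested 4x needs no extra rescale.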
```csharp
bool isHiDreamO1Lora(JObject h) => hasLoraKey(h, "final_layer2.linear") && hasLoraKey(h, "language_model.layers.0.self_attn.q_proj");
bool isChroma(JObject h) => h.ContainsKey("distilled_guidance_layer.in_proj.bias") && h.ContainsKey("double_blocks.0.img_attn.proj.bias");
bool isChromaRadiance(JObject h) => h.ContainsKey("nerf_image_embedder.embedder.0.bias");
bool isPiD(JObject h) => h.ContainsKey("net.lq_proj.latent_proj.0.weight");
```
Could you pick another key or two for each, just to narrow it? The list is getting long enough that we're seeing occasional surprise overlaps.
Added `net.pixel_blocks.0.attn.q_norm.weight` for `isPiD()` and `core.pixel_blocks.0.attn.q_norm.weight` for `isPixelDiT()`. I figure keys with `pixel_` in them aren't very common (yet). Re-detecting after clearing metadata works cleanly.
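A minimal sketch of the narrowed check for PiD, using the two keys named above. Assumptions: a plain `ISet<string>` of tensor names stands in for the safetensors header `JObject`, and `ArchDetect`/`HasAll` are illustrative names, not SwarmUI's. `isPixelDiT()` is left out because only one of its keys is given here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative stand-in for header-key architecture detection (not SwarmUI code).
public static class ArchDetect
{
    // True only if every required key is present, so a single shared key
    // from an unrelated architecture cannot trigger a false match.
    public static bool HasAll(ISet<string> header, params string[] required) =>
        required.All(k => header.Contains(k));

    // PiD check narrowed with the additional pixel_blocks key.
    public static bool IsPiD(ISet<string> header) => HasAll(header,
        "net.lq_proj.latent_proj.0.weight",
        "net.pixel_blocks.0.attn.q_norm.weight");
}
```

Requiring all keys trades a slightly longer check for fewer surprise overlaps as more architectures get added.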

Depends on Comfy-Org/ComfyUI#14103
Not included: docs updates.
PixelDiT is an image generation model. Its output is not that great.
The interesting part of this PR is PiD, an upscaler locked to 4x that can now take the place of the refiner stage's upscaler. The upscale happens after the refiner's SwarmKSampler node.
[Image: PixelDiT workflow]

[Image: PiD upscale workflow]