
feat(render): [3/3] sensor pipeline, per-camera RT toggle, ini-first quality tiers#9702

Draft
JArmandoAnaya wants to merge 9 commits into carla-simulator:ue5-dev from JArmandoAnaya:feat/render-pipeline-and-tiers

Conversation


@JArmandoAnaya JArmandoAnaya commented Apr 28, 2026

PR blocked on #9698

Description

This PR is the third and final of a three-PR series aimed at making CARLA on UE5 (0.10.0) usable on graphics cards with less VRAM than the 16 GB announced for the release, without sacrificing the visual quality the project currently ships. Across the series, the work comprises bug fixes, refactoring around Unreal Engine resource ownership, and a redesigned scalability tier system that gives users explicit Low / Medium / High / Epic presets with predictable VRAM envelopes.

After the full series, on a 16 GB card in Epic mode the simulator runs at roughly 80 % GPU compute utilisation and averages ~9.5 GB VRAM steady-state with stable frame pacing, with no sudden spikes that previously crashed the process even on 16 GB cards. Initial testing also suggests that an 8 GB card may be able to run the simulator (likely with a perceptible quality drop in Epic mode), though that is not a guarantee yet and needs broader hardware coverage. Demo run with the full series applied: https://www.youtube.com/watch?v=Z9U6aGwcoBo

PR series overview
| # | PR | Theme | Status |
| --- | --- | --- | --- |
| 1/3 | fix/preexisting-bugs | Pre-existing UE4-era latent bugs | In review (#9697) |
| 2/3 | refactor/uobject-hardening | UE5 UObject hardening: TObjectPtr migration, factory mesh caches, async heightmap on tile cross, soft-reference catalogs | In review (#9698) |
| 3/3 | feat/render-pipeline-and-tiers (this PR) | Sensor pipeline optimisation, per-camera RT toggle + selective temporal-history opt-in, ini-first quality tier system | In review (stacked on PR 2/3) |

This PR (3/3) closes the chain by combining the sensor-side rendering work and the per-tier scalability rewrite into a single review unit. The sensor pipeline changes (commit 1) and the per-camera RT toggle (commit 3) both depend on the per-tier CVar plumbing introduced in the scalability rewrite (commit 2); shipping them as one PR keeps the rendering surface coherent and avoids a transient quality regression between intermediate commits.

What this PR (3/3) changes

The PR is organised in three commits to make review easier; each commit is independently buildable and the LibCarla suite passes after each one.

Commit 1 — perf(sensors): RHI readback pool, lazy GBuffer capture, streaming prewarm
  • RHIGPUReadbackPool.{h,cpp} (new) — a small ref-counted pool of FRHIGPUTextureReadback objects keyed by (width, height, EPixelFormat). Replaces the per-frame allocation that every camera was doing in ImageUtil::ReadSensorImageDataAsync*. Reused readbacks recycle their staging memory across frames; the pool is bounded and falls back to a fresh allocation if the cached entry is still busy on the RHI thread.
  • Sensor/ImageUtil.{h,cpp} — route every async readback path (ReadSensorImageDataAsyncFColor, ReadSensorImageDataAsyncFLinearColor) through the pool. The lambda body that previously default-constructed an FRHIGPUTextureReadback now pulls one from the pool and returns it on completion.
  • Sensor/PixelReader.{h,cpp} — same routing for the synchronous read path used by GBuffer captures. The legacy fence-flush path is preserved behind a CVar (carla.Sensor.UseLegacyFenceFlush, default 0) so a deployment that needs the old behaviour can roll back without recompiling.
  • Sensor/SceneCaptureSensor.{h,cpp} — gate every GBuffer slot capture behind a "is anyone listening?" check (IsAnyGBufferClientListening). Previously the scene-capture pipeline rendered all 13 GBuffer textures on every frame regardless of whether a client had subscribed; now slot N is captured only when at least one client has called listen_to_gbuffer(N). Also adds the carla.Camera.ForceAllGBuffers console variable to force the legacy "always capture all" behaviour for debugging.
  • Sensor/SceneCaptureCamera.cpp / Sensor/OpticalFlowCamera.cpp — small streaming-prewarm pass: trigger a one-shot ForceLoadAllStreamableAssets at the first tick to amortise the level-load streaming storm on BeginPlay, instead of letting it bleed across the first ~30 frames.
  • Settings/CarlaSettings.cpp + Settings/CarlaSettingsDelegate.{h,cpp} — add the helper accessors used by the new render-side gates above.
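The readback-pool idea from commit 1 can be sketched outside the engine. The following is a minimal, self-contained model, not the actual CARLA code: plain std types stand in for FRHIGPUTextureReadback, and the Acquire/Release names are illustrative. It mirrors the described behaviour of recycling staging memory per (width, height, format) key, keeping the pool bounded, and falling back to a fresh allocation when every cached entry is still busy:

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <tuple>
#include <vector>

// Stand-in for FRHIGPUTextureReadback: owns staging memory sized to the request.
struct Readback {
    std::vector<uint8_t> staging;
    bool busy = false;  // "still in flight on the RHI thread"
};

// Pool keyed by (width, height, pixel format); bounded per key.
class ReadbackPool {
    using Key = std::tuple<uint32_t, uint32_t, int>;
    std::map<Key, std::vector<std::shared_ptr<Readback>>> cache;
    static constexpr size_t MaxPerKey = 4;

public:
    std::shared_ptr<Readback> Acquire(uint32_t w, uint32_t h, int fmt, size_t bpp) {
        auto& bucket = cache[{w, h, fmt}];
        for (auto& rb : bucket)
            if (!rb->busy) { rb->busy = true; return rb; }  // recycle staging memory
        auto fresh = std::make_shared<Readback>();
        fresh->staging.resize(size_t(w) * h * bpp);
        fresh->busy = true;
        if (bucket.size() < MaxPerKey) bucket.push_back(fresh);  // keep the pool bounded
        return fresh;  // otherwise an unpooled one-shot allocation
    }

    static void Release(const std::shared_ptr<Readback>& rb) { rb->busy = false; }
};
```

The point of the shape is that a camera rendering at a fixed resolution hits the same key every frame, so after warm-up the hot path performs no allocation at all.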
Commit 2 — refactor(scalability): ini-first quality tier system via DeviceProfiles, fix Xid 109 on High
  • Config/DefaultDeviceProfiles.ini — one CarlaQuality_<Tier> profile per quality level (Low / Medium / High / Epic). Each profile contains the tier's non-Scalability CVars and the sg.*Quality selectors that pick which DefaultScalability.ini buckets fire.
  • Config/DefaultScalability.ini — bucket-section coverage for the four tiers. ECVF_Scalability CVars live here; non-Scalability per-tier CVars live in the DeviceProfile.
  • Plugins/CarlaDeviceProfileSelector/ (new project module) — runtime module that reads -quality-level=<Tier> from FCommandLine at engine init (PostConfigInit) and returns the matching CarlaQuality_<Tier> profile name. The profile's CVars apply before any compute dispatch, which is the key change vs the old runtime burst.
  • Settings/CarlaSettingsDelegate.cpp — LaunchLowQualityCommands / LaunchMediumQualityCommands / LaunchHighQualityCommands / LaunchEpicQualityCommands now have empty bodies. The per-tier CVar burst that used to sit there is gone — it was triggering NVIDIA Xid 109 GR CTX SWITCH TIMEOUT on Blackwell + recent NVIDIA OKM regardless of the burst's content. Flipping Lumen / RT / pool / streaming CVars after RHI is up provoked a compute dispatch that tripped the GPU hardware scheduler's context-switch timeout. Engine-init application via DefaultDeviceProfiles.ini does not trigger the fault.
  • Settings/CarlaSettingsDelegate.cpp::ApplyPerActorQualitySettings — flatten the prior nested actor-list traversal into a single pass.
  • Settings/QualityLevelUE.h + LibCarla rpc::QualityLevel — add Medium and High tier values so the wire encoding stays consistent across the four-tier ladder.
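A hypothetical shape for one such profile, to make the DeviceProfile/Scalability split concrete. The section name matches the convention described above, but the keys and CVar values below are illustrative, not copied from this PR's DefaultDeviceProfiles.ini:

```ini
[CarlaQuality_High DeviceProfile]
DeviceType=Linux
BaseProfileName=Linux
; non-Scalability CVars for the tier (illustrative values)
+CVars=r.Streaming.PoolSize=3000
+CVars=r.Lumen.HardwareRayTracing=1
; sg.* selectors pick which DefaultScalability.ini buckets fire
+CVars=sg.ShadowQuality=3
+CVars=sg.GlobalIlluminationQuality=3
```

Because the selector module resolves the profile at PostConfigInit, everything in the profile is applied before the first compute dispatch, which is the whole point of the migration.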
Commit 3 — feat(sensors): per-camera ray-tracing toggle + selective temporal-history opt-in
  • Sensor/SceneCaptureSensor.{h,cpp} — add bUseRayTracing UPROPERTY (default true, opt-out), SetUseRayTracing / GetUseRayTracing UFUNCTIONs, and a private ApplyRayTracingSetting helper that writes CaptureComponent2D->bUseRayTracingIfEnabled from the per-sensor attribute or the global CVar override.
  • carla.Camera.UseRayTracing console variable: -1 respect the per-sensor attribute (default), 0 force off across every camera, 1 force on across every camera.
  • Actor/ActorBlueprintFunctionLibrary.cpp — use_ray_tracing blueprint attribute (default "true") wired through MakeCameraDefinition, MakeNormalsCameraDefinition, and the SetCamera dispatch. Power users can spawn a camera with use_ray_tracing=false from Python to skip the ~700 MiB-1 GiB VRAM cost of per-camera HW-RT for sensors that do not need it (depth, semantic, lidar).
  • Selective temporal-history pattern — base ASceneCaptureSensor ctor now sets bAlwaysPersistRenderingState = false, saving ~150-300 MiB per camera by dropping per-frame Lumen / TSR history. ASceneCaptureCamera (the RGB sensor used by manual_control.py and most consumers) opts back in to true via its constructor, preserving temporal AA quality on the RGB output. Non-RGB sensors (depth, semantic, normals, instance, optical flow, DVS) drop temporal history without a visible quality cost since their pipelines do not consume the TSR state.
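The tri-state resolution between the global CVar and the per-sensor attribute is simple enough to pin down in an engine-free sketch. The function name here is illustrative; in the PR this logic lives inside ApplyRayTracingSetting:

```cpp
// Tri-state resolution modeled on the description above: the global CVar
// (carla.Camera.UseRayTracing) overrides the per-sensor attribute unless
// it is left at its default of -1. Plain C++ stand-in, not the engine code.
inline bool ResolveUseRayTracing(int globalCVar, bool sensorAttribute) {
    switch (globalCVar) {
        case 0:  return false;            // force off across every camera
        case 1:  return true;             // force on across every camera
        default: return sensorAttribute;  // -1: respect the per-sensor attribute
    }
}
```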

Where has this been tested?

  • Platform(s): Ubuntu 22.04.
  • Python version(s): 3.10.
  • Unreal Engine version(s): UE 5.5.
  • Tested on: NVIDIA GeForce RTX 5070 Ti 16 GB, NVIDIA GeForce RTX 4070 12 GB.

Possible Drawbacks

  • The RHIGPUReadbackPool recycles staging memory across frames. If a deployment was relying on the per-frame allocate / free pattern as an implicit cache invalidation (it shouldn't, but worth noting), the legacy path is available behind carla.Sensor.UseLegacyFenceFlush 1.
  • The lazy GBuffer capture (IsAnyGBufferClientListening gate) changes the GPU cost from "always 13 captures per frame" to "N captures per frame where N is the number of subscribed slots". For a deployment that subscribes all 13 slots the steady-state cost is identical; for the typical case (0-1 slots) the saving is significant. carla.Camera.ForceAllGBuffers 1 restores the previous behaviour.
  • The DefaultDeviceProfiles.ini migration changes when per-tier CVars apply: from runtime (after BeginPlay) to engine init (before any compute dispatch). Any deployment that was relying on the runtime burst order to override a project-wide CVar will need to move that override into the matching CarlaQuality_<Tier> profile.
  • The empty LaunchHighQualityCommands body is a behavioural change: on -quality-level=High, the prior runtime burst fired ~12 GEngine->Exec(...) lines after RHI init. Those lines now live in [CarlaQuality_High DeviceProfile] and apply at engine init instead. Functionally equivalent, but the CVar trace shows SetByDeviceProfile instead of SetByConsole. Root cause for the move: NVIDIA Xid 109 GR CTX SWITCH TIMEOUT on Blackwell + recent NVIDIA OKM was triggered by the runtime burst itself.
  • The selective temporal-history pattern (bAlwaysPersistRenderingState = false on the base, true on the RGB subclass) drops Lumen / TSR history on non-RGB sensors. No visible quality cost is expected (those pipelines do not consume TSR state), but a deployment that adds a new scene-capture subclass and observes flicker can opt back in by setting Capture->bAlwaysPersistRenderingState = true in the subclass ctor, mirroring ASceneCaptureCamera.
  • The per-camera ray-tracing toggle defaults to true (opt-out) so default users match upstream behaviour out of the box. Setting use_ray_tracing=false on a camera spawn (or carla.Camera.UseRayTracing 0 globally) is the explicit opt-out path; the default behaviour is unchanged from ue5-dev.
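The cost model of the lazy GBuffer gate ("N captures per frame where N is the number of subscribed slots") can be illustrated with a minimal, engine-free sketch. The member names (ListenTo, ShouldCaptureSlot) are stand-ins, not the actual CARLA symbols:

```cpp
#include <bitset>

// Model of the lazy-capture gate: 13 GBuffer slots, a slot is captured only
// when at least one client is listening, unless the debug "force all"
// override (the carla.Camera.ForceAllGBuffers analogue) is set.
struct GBufferGate {
    static constexpr int NumSlots = 13;
    std::bitset<NumSlots> listeners;   // set by listen_to_gbuffer(N)
    bool forceAll = false;             // restores the legacy always-capture path

    void ListenTo(int slot) { listeners.set(slot); }
    bool IsAnyClientListening() const { return forceAll || listeners.any(); }
    bool ShouldCaptureSlot(int slot) const { return forceAll || listeners.test(slot); }
    int CapturesPerFrame() const {
        return forceAll ? NumSlots : int(listeners.count());
    }
};
```

With all 13 slots subscribed the steady-state cost matches the legacy path; with the typical 0-1 subscriptions almost all capture work disappears.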


Bundles together independent UE4-era latent bugs that surfaced during
the UE5 stability work. Each fix is small, self-contained, and verified
under the LibCarla GoogleTest suite plus a clean package build.

LibCarla
- LidarData / SemanticLidarData: relax the boundary check in ResetMemory
  from strict greater-than to greater-or-equal so callers supplying one
  slot per channel do not trip the Debug assertion. New
  test_lidar_data.cpp pins both equality and under-one-per-channel paths
  (Debug only; DEBUG_ASSERT is a no-op under NDEBUG). Server suite goes
  from 44 to 48 tests, all passing.

Carla plugin (UE5)
- Sensor/DVSCamera.cpp: PostPhysTick early-out had the IsValid(this)
  check inverted, returning on the live path and ticking the dead path.
  Flip the condition and gate on AreClientsListening() to match every
  other camera.
- Sensor/SceneCaptureCamera.cpp: gate the per-frame
  ENQUEUE_RENDER_COMMAND(MeasureTime) behind STATS or CSV_PROFILER. In
  Shipping builds the command runs every frame for every RGB camera
  with no observable output and no measurement sink.
- Sensor/UE4_Overridden/SceneCaptureComponent2D_CARLA.h: wrap ViewActor
  in UPROPERTY() and TObjectPtr so the captured actor pointer is
  GC-visible. Currently safe because the field is always assigned this,
  but the raw pointer is a latent foot-gun.
- Sensor/ImageUtil.cpp: ReadImageData was building a populated
  PixelData and then returning a default-constructed null TUniquePtr,
  throwing the work away. Return the populated pointer.
- Sensor/ShaderBasedSensor.cpp: AddPostProcessingMaterial used
  ConstructorHelpers::FObjectFinder, which is documented as
  constructor-only and has undefined behavior at runtime. Switch to
  FSoftObjectPath::TryLoad, which is the runtime-safe path resolver
  and accepts both plain object paths and the class-prefixed
  export-text form callers were already passing in. Adds an explicit
  Error log so a bad catalog path is observable in PackageLog instead
  of silently swallowed.

Examples
- PythonAPI/examples/manual_control.py: rapid camera toggles segfaulted
  the Python client. Bare sensor.destroy() races the streaming session
  while frames are still in flight. Call sensor.stop() and sleep
  briefly before destroy() so the listener detaches and queued
  callbacks drain. Same idiom already used in visualize_depth.py.

Build hygiene
- .gitignore: ignore Build-Tests/ output directory used by
  -DBUILD_LIBCARLA_TESTS=ON when the alternate Debug test build tree
  is configured.

Verification
- cmake --preset Release -DENABLE_ROS2=ON -DBUILD_LIBCARLA_TESTS=ON
  -DCARLA_UNREAL_PACKAGE_NO_COMPRESSION=ON, cold rebuild from scratch:
  carla-client, carla-python-api, carla-server,
  libcarla_test_server, libcarla_test_client, package all green.
- libcarla_test_server: 48/48 PASSED.
- libcarla_test_client: 62/62 PASSED.
- Total 110/110 across 25 test suites.
- Manual smoke: rapid camera toggles in manual_control.py no longer
  segfault on teardown.
…sync heightmap, soft-ref catalog

Fold the UObject hardening work into a single 36-file PR. Pure
storage-shape and lifecycle changes on UE5 reflection types; implicit
conversions cover existing callsites. Pairs with the upcoming sensor
pipeline (PR 3/4) and scalability ini-first (PR 4/4) refactors.

UPROPERTY raw-ptr migration:
- Wrap UPROPERTY raw pointers in TObjectPtr<> across MapGen, OpenDrive,
  Traffic, Trigger, Util, Vegetation, Vehicle, Sensor and Actor headers.
- Touch only the storage-shape; behavior is preserved by implicit
  conversion. SceneCaptureSensor.h gets the wraps for CaptureRenderTarget
  and CaptureComponent2D so PR 3/4 can layer the readback pool and
  per-camera ray-tracing toggle on top.
- ShaderBasedSensor.h wraps MaterialsFound; ActorDefinition.h wraps
  FVehicleActorDefinition::mesh.

Factory mesh caches:
- PropActorFactory and StaticMeshFactory cache LoadObject<UStaticMesh>
  results keyed by the soft-object path. The cache is seeded eagerly at
  GetDefinitions() time and falls back to a synchronous LoadObject on
  cache miss with a warning log, so per-spawn loads are eliminated on
  the hot path while preserving correctness.
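The seed-eagerly, fall-back-synchronously cache shape can be sketched without the engine. Mesh and the injected loader below are stand-ins for UStaticMesh / LoadObject, and the class name is illustrative:

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct Mesh { std::string path; };  // stand-in for UStaticMesh

class MeshCache {
    std::unordered_map<std::string, std::shared_ptr<Mesh>> cache;
    std::function<std::shared_ptr<Mesh>(const std::string&)> load;
    int syncLoads = 0;

public:
    explicit MeshCache(std::function<std::shared_ptr<Mesh>(const std::string&)> loader)
        : load(std::move(loader)) {}

    // Seeded eagerly at GetDefinitions() time so per-spawn lookups hit the cache.
    void Seed(const std::vector<std::string>& paths) {
        for (const auto& p : paths) cache[p] = load(p);
    }

    // Hot path: a hit costs a map lookup; a miss falls back to a synchronous
    // load (the real code logs a warning here) and backfills the cache.
    std::shared_ptr<Mesh> Get(const std::string& path) {
        auto it = cache.find(path);
        if (it != cache.end()) return it->second;
        ++syncLoads;  // unexpected cold load; correctness preserved regardless
        return cache[path] = load(path);
    }

    int SyncLoadCount() const { return syncLoads; }
};
```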

Async heightmap on tile cross:
- CustomTerrainPhysicsComponent now loads UHeightMapDataAsset via
  FStreamableManager (UAssetManager::GetStreamableManager()) on tile
  crossings instead of issuing a blocking StaticLoadObject on the game
  thread. Gated by carla.Terrain.AsyncHeightmapLoad (default 1); set to
  0 to force the legacy synchronous path for rollback. The pending
  TSharedPtr<FStreamableHandle> cannot be a UPROPERTY, so it is held as
  a plain member with explicit cancel-on-EndPlay.
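The async-load-with-explicit-cancel lifecycle can be modeled in plain C++; here a shared atomic flag plays the role the FStreamableHandle's cancel call plays in the real component, and all names are illustrative:

```cpp
#include <atomic>
#include <future>
#include <memory>

struct HeightMap { int tileX, tileY; };  // stand-in for UHeightMapDataAsset

// Held as a plain member (the real handle cannot be a UPROPERTY either),
// with Cancel() invoked from the EndPlay path.
struct AsyncLoadHandle {
    std::shared_ptr<std::atomic<bool>> cancelled =
        std::make_shared<std::atomic<bool>>(false);
    std::future<std::shared_ptr<HeightMap>> result;
    void Cancel() { cancelled->store(true); }
};

AsyncLoadHandle LoadHeightmapAsync(int tileX, int tileY) {
    AsyncLoadHandle h;
    auto cancelled = h.cancelled;  // copy the flag into the worker, not `this`
    h.result = std::async(std::launch::async,
        [=]() -> std::shared_ptr<HeightMap> {
            // ... asset/disk work would happen here, off the game thread ...
            if (cancelled->load()) return nullptr;  // honour cancel-on-EndPlay
            return std::make_shared<HeightMap>(HeightMap{tileX, tileY});
        });
    return h;
}
```

Capturing the flag by value rather than a back-pointer to the component is what makes the cancel race-free when the component is torn down mid-load.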

Catalog soft-ref storage:
- PropParameters.Mesh becomes TSoftObjectPtr<UStaticMesh>;
  VehicleParameters.Class and PedestrianParameters.Class become
  TSoftClassPtr<ACarlaWheeledVehicle> / TSoftClassPtr<ACharacter>.
  CarlaBlueprintRegistry, VehicleActorFactory, and WalkerActorFactory
  resolve the soft references via LoadSynchronous() at definition-build
  time, so the JSON parse no longer force-loads every catalog entry.
  ActorBlueprintFunctionLibrary mirrors the change in
  MakeVehicleDefinition / MakePedestrianDefinition / MakePropDefinition.

Validation:
- Package builds clean with -DCARLA_UNREAL_PACKAGE_NO_COMPRESSION=ON.
- LibCarla unit tests pass: 48/48 server, 62/62 client.
- Stacked on carla-simulator#9697 (PR 1/4 of the upstream chain).
…es, fix Xid 109 on High

Move every per-tier rendering CVar from the runtime CarlaSettingsDelegate
burst to engine-init DeviceProfile selection. The runtime burst itself was
triggering NVIDIA Xid 109 GR CTX SWITCH TIMEOUT on Blackwell + recent NVIDIA
OKM, regardless of the burst's content; flipping Lumen / RT / pool /
streaming CVars after RHI is up provoked a compute dispatch that tripped the
GPU hardware scheduler's context-switch timeout.

- DefaultDeviceProfiles.ini carries one CarlaQuality_<Tier> profile per
  quality level (Low/Medium/High/Epic). Each profile contains the tier's
  non-Scalability CVars and the sg.*Quality selectors that pick which
  DefaultScalability.ini buckets fire.
- CarlaDeviceProfileSelector runtime module reads -quality-level=<Tier> from
  FCommandLine and returns the matching CarlaQuality_<Tier> profile name.
  Runs at engine init (PostConfigInit) so the profile's CVars apply before
  any compute dispatch.
- LaunchMediumQualityCommands / LaunchHighQualityCommands ship with empty
  bodies; the per-tier CVar burst is gone.
- ApplyPerActorQualitySettings replaces the prior nested traversal with a
  single pass over the actor list.
- Medium and High tier values added to QualityLevelUE.h and the LibCarla
  rpc::QualityLevel enum so the wire encoding stays consistent.
…tory opt-in

Per-camera hardware ray-tracing toggle. Defaults to true so default users
match upstream behaviour; setting use_ray_tracing=false on a camera spawn
(or carla.Camera.UseRayTracing 0 globally) skips the ~700 MiB-1 GiB VRAM
cost of per-camera HW-RT. Useful for sensors that do not need RT (depth,
semantic, lidar).

- bUseRayTracing UPROPERTY on ASceneCaptureSensor (default true).
- SetUseRayTracing / GetUseRayTracing UFUNCTIONs.
- ApplyRayTracingSetting writes CaptureComponent2D->bUseRayTracingIfEnabled
  from the per-sensor attribute or the global CVar override.
- carla.Camera.UseRayTracing console variable: -1 respect attribute (default),
  0 force off, 1 force on.
- use_ray_tracing blueprint attribute (default "true") wired through
  ActorBlueprintFunctionLibrary::MakeCameraDefinition,
  MakeNormalsCameraDefinition, and the SetCamera dispatch.

Selective temporal-history pattern: base ASceneCaptureSensor ctor now sets
bAlwaysPersistRenderingState = false, saving 150-300 MiB per camera by
dropping per-frame Lumen/TSR history. ASceneCaptureCamera (the RGB sensor
used by manual_control and most consumers) opts back in to true via its
constructor, preserving temporal AA quality on the RGB output. Non-RGB
sensors (depth, semantic, normals, instance, optical flow, DVS) drop
temporal history without a visible quality cost since their pipelines do
not consume the TSR state.
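The base-default / subclass-opt-in pattern reduces to constructor defaults, which can be shown with plain inheritance (the real types are UObjects; the names here mirror the description, and the defaults are the point of the sketch):

```cpp
// Base sensor drops per-frame Lumen/TSR history by default (~150-300 MiB
// saved per camera, per the measurements above).
struct SceneCaptureSensorBase {
    bool bAlwaysPersistRenderingState;
    SceneCaptureSensorBase() : bAlwaysPersistRenderingState(false) {}
};

// The RGB camera opts back in so temporal AA quality is preserved.
struct SceneCaptureCameraRGB : SceneCaptureSensorBase {
    SceneCaptureCameraRGB() { bAlwaysPersistRenderingState = true; }
};

// Non-RGB sensors inherit the opt-out default and pay nothing.
struct DepthCamera : SceneCaptureSensorBase {};
```

A new scene-capture subclass that does consume TSR state would opt back in exactly the way the RGB camera does, by flipping the flag in its constructor.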
JArmandoAnaya force-pushed the feat/render-pipeline-and-tiers branch from 73e99e0 to 939080d on May 5, 2026 at 18:35.