
feat(render): [3/3] sensor pipeline, per-camera RT toggle, ini-first quality tiers#9702

Draft
JArmandoAnaya wants to merge 9 commits into carla-simulator:ue5-dev from JArmandoAnaya:feat/render-pipeline-and-tiers

Conversation


@JArmandoAnaya JArmandoAnaya commented Apr 28, 2026

PR blocked on #9698

Description

This PR is the third and final of a three-PR series aimed at making CARLA on UE5 (0.10.0) usable on graphics cards with less VRAM than the 16 GB announced for the release, without sacrificing the visual quality the project currently ships. Across the series, the work comprises bug fixes, refactoring around Unreal Engine resource ownership, and a redesigned scalability tier system that gives users explicit Low / Medium / High / Epic presets with predictable VRAM envelopes.

After the full series, on a 16 GB card in Epic mode the simulator runs at roughly 80 % GPU compute utilisation and averages ~9.5 GB VRAM steady-state with stable frame pacing, with no sudden spikes that previously crashed the process even on 16 GB cards. Initial testing also suggests that an 8 GB card may be able to run the simulator (likely with a perceptible quality drop in Epic mode), though that is not a guarantee yet and needs broader hardware coverage. Demo run with the full series applied: https://www.youtube.com/watch?v=Z9U6aGwcoBo

PR series overview
| # | PR | Theme | Status |
| --- | --- | --- | --- |
| 1/3 | fix/preexisting-bugs | Pre-existing UE4-era latent bugs | In review (#9697) |
| 2/3 | refactor/uobject-hardening | UE5 UObject hardening: TObjectPtr migration, factory mesh caches, async heightmap on tile cross, soft-reference catalogs | In review (#9698) |
| 3/3 | feat/render-pipeline-and-tiers (this PR) | Sensor pipeline optimisation, per-camera RT toggle + selective temporal-history opt-in, ini-first quality tier system | In review (stacked on PR 2/3) |

This PR (3/3) closes the chain by combining the sensor-side rendering work and the per-tier scalability rewrite into a single review unit. The sensor pipeline changes (commit 1) and the per-camera RT toggle (commit 3) both depend on the per-tier CVar plumbing introduced in the scalability rewrite (commit 2); shipping them as one PR keeps the rendering surface coherent and avoids a transient quality regression between intermediate commits.

What this PR (3/3) changes

The PR is organised in three commits to make review easier; each commit is independently buildable and the LibCarla suite passes after each one.

Commit 1 — perf(sensors): RHI readback pool, lazy GBuffer capture, streaming prewarm
  • RHIGPUReadbackPool.{h,cpp} (new) — a small ref-counted pool of FRHIGPUTextureReadback objects keyed by (width, height, EPixelFormat). Replaces the per-frame allocation that every camera was doing in ImageUtil::ReadSensorImageDataAsync*. Reused readbacks recycle their staging memory across frames; the pool is bounded and falls back to a fresh allocation if the cached entry is still busy on the RHI thread.
  • Sensor/ImageUtil.{h,cpp} — route every async readback path (ReadSensorImageDataAsyncFColor, ReadSensorImageDataAsyncFLinearColor) through the pool. The lambda body that previously default-constructed an FRHIGPUTextureReadback now pulls one from the pool and returns it on completion.
  • Sensor/PixelReader.{h,cpp} — same routing for the synchronous read path used by GBuffer captures. The legacy fence-flush path is preserved behind a CVar (carla.Sensor.UseLegacyFenceFlush, default 0) so a deployment that needs the old behaviour can roll back without recompiling.
  • Sensor/SceneCaptureSensor.{h,cpp} — gate every GBuffer slot capture behind a "is anyone listening?" check (IsAnyGBufferClientListening). Previously the scene-capture pipeline rendered all 13 GBuffer textures on every frame regardless of whether a client had subscribed; now slot N is captured only when at least one client has called listen_to_gbuffer(N). Also adds the carla.Camera.ForceAllGBuffers console variable to force the legacy "always capture all" behaviour for debugging.
  • Sensor/SceneCaptureCamera.cpp / Sensor/OpticalFlowCamera.cpp — small streaming-prewarm pass: trigger a one-shot ForceLoadAllStreamableAssets at the first tick to amortise the level-load streaming storm on BeginPlay, instead of letting it bleed across the first ~30 frames.
  • Settings/CarlaSettings.cpp + Settings/CarlaSettingsDelegate.{h,cpp} — add the helper accessors used by the new render-side gates above.
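The readback-pool idea from commit 1 can be sketched outside the engine. The following is a minimal, self-contained model, not the actual CARLA code: plain std types stand in for FRHIGPUTextureReadback, and the Acquire/Release names are illustrative. It mirrors the described behaviour of recycling staging memory per (width, height, format) key, keeping the pool bounded, and falling back to a fresh allocation when every cached entry is still busy:

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <tuple>
#include <vector>

// Stand-in for FRHIGPUTextureReadback: owns staging memory sized to the request.
struct Readback {
    std::vector<uint8_t> staging;
    bool busy = false;  // "still in flight on the RHI thread"
};

// Pool keyed by (width, height, pixel format); bounded per key.
class ReadbackPool {
    using Key = std::tuple<uint32_t, uint32_t, int>;
    std::map<Key, std::vector<std::shared_ptr<Readback>>> cache;
    static constexpr size_t MaxPerKey = 4;

public:
    std::shared_ptr<Readback> Acquire(uint32_t w, uint32_t h, int fmt, size_t bpp) {
        auto& bucket = cache[{w, h, fmt}];
        for (auto& rb : bucket)
            if (!rb->busy) { rb->busy = true; return rb; }  // recycle staging memory
        auto fresh = std::make_shared<Readback>();
        fresh->staging.resize(size_t(w) * h * bpp);
        fresh->busy = true;
        if (bucket.size() < MaxPerKey) bucket.push_back(fresh);  // keep the pool bounded
        return fresh;  // otherwise an unpooled one-shot allocation
    }

    static void Release(const std::shared_ptr<Readback>& rb) { rb->busy = false; }
};
```

The point of the shape is that a camera rendering at a fixed resolution hits the same key every frame, so after warm-up the hot path performs no allocation at all.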
Commit 2 — refactor(scalability): ini-first quality tier system via DeviceProfiles, fix Xid 109 on High
  • Config/DefaultDeviceProfiles.ini — one CarlaQuality_<Tier> profile per quality level (Low / Medium / High / Epic). Each profile contains the tier's non-Scalability CVars and the sg.*Quality selectors that pick which DefaultScalability.ini buckets fire.
  • Config/DefaultScalability.ini — bucket-section coverage for the four tiers. ECVF_Scalability CVars live here; non-Scalability per-tier CVars live in the DeviceProfile.
  • Plugins/CarlaDeviceProfileSelector/ (new project module) — runtime module that reads -quality-level=<Tier> from FCommandLine at engine init (PostConfigInit) and returns the matching CarlaQuality_<Tier> profile name. The profile's CVars apply before any compute dispatch, which is the key change vs the old runtime burst.
  • Settings/CarlaSettingsDelegate.cpp — LaunchLowQualityCommands / LaunchMediumQualityCommands / LaunchHighQualityCommands / LaunchEpicQualityCommands now have empty bodies. The per-tier CVar burst that used to sit there is gone — it was triggering NVIDIA Xid 109 GR CTX SWITCH TIMEOUT on Blackwell + recent NVIDIA OKM regardless of the burst's content. Flipping Lumen / RT / pool / streaming CVars after RHI is up provoked a compute dispatch that tripped the GPU hardware scheduler's context-switch timeout. Engine-init application via DefaultDeviceProfiles.ini does not trigger the fault.
  • Settings/CarlaSettingsDelegate.cpp::ApplyPerActorQualitySettings — flatten the prior nested actor-list traversal into a single pass.
  • Settings/QualityLevelUE.h + LibCarla rpc::QualityLevel — add Medium and High tier values so the wire encoding stays consistent across the four-tier ladder.
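A hypothetical shape for one such profile, to make the DeviceProfile/Scalability split concrete. The section name matches the convention described above, but the keys and CVar values below are illustrative, not copied from this PR's DefaultDeviceProfiles.ini:

```ini
[CarlaQuality_High DeviceProfile]
DeviceType=Linux
BaseProfileName=Linux
; non-Scalability CVars for the tier (illustrative values)
+CVars=r.Streaming.PoolSize=3000
+CVars=r.Lumen.HardwareRayTracing=1
; sg.* selectors pick which DefaultScalability.ini buckets fire
+CVars=sg.ShadowQuality=3
+CVars=sg.GlobalIlluminationQuality=3
```

Because the selector module resolves the profile at PostConfigInit, everything in the profile is applied before the first compute dispatch, which is the whole point of the migration.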
Commit 3 — feat(sensors): per-camera ray-tracing toggle + selective temporal-history opt-in
  • Sensor/SceneCaptureSensor.{h,cpp} — add bUseRayTracing UPROPERTY (default true, opt-out), SetUseRayTracing / GetUseRayTracing UFUNCTIONs, and a private ApplyRayTracingSetting helper that writes CaptureComponent2D->bUseRayTracingIfEnabled from the per-sensor attribute or the global CVar override.
  • carla.Camera.UseRayTracing console variable: -1 respect the per-sensor attribute (default), 0 force off across every camera, 1 force on across every camera.
  • Actor/ActorBlueprintFunctionLibrary.cpp — use_ray_tracing blueprint attribute (default "true") wired through MakeCameraDefinition, MakeNormalsCameraDefinition, and the SetCamera dispatch. Power users can spawn a camera with use_ray_tracing=false from Python to skip the ~700 MiB-1 GiB VRAM cost of per-camera HW-RT for sensors that do not need it (depth, semantic, lidar).
  • Selective temporal-history pattern — base ASceneCaptureSensor ctor now sets bAlwaysPersistRenderingState = false, saving ~150-300 MiB per camera by dropping per-frame Lumen / TSR history. ASceneCaptureCamera (the RGB sensor used by manual_control.py and most consumers) opts back in to true via its constructor, preserving temporal AA quality on the RGB output. Non-RGB sensors (depth, semantic, normals, instance, optical flow, DVS) drop temporal history without a visible quality cost since their pipelines do not consume the TSR state.
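The tri-state resolution between the global CVar and the per-sensor attribute is simple enough to pin down in an engine-free sketch. The function name here is illustrative; in the PR this logic lives inside ApplyRayTracingSetting:

```cpp
// Tri-state resolution modeled on the description above: the global CVar
// (carla.Camera.UseRayTracing) overrides the per-sensor attribute unless
// it is left at its default of -1. Plain C++ stand-in, not the engine code.
inline bool ResolveUseRayTracing(int globalCVar, bool sensorAttribute) {
    switch (globalCVar) {
        case 0:  return false;            // force off across every camera
        case 1:  return true;             // force on across every camera
        default: return sensorAttribute;  // -1: respect the per-sensor attribute
    }
}
```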

Where has this been tested?

  • Platform(s): Ubuntu 22.04.
  • Python version(s): 3.10.
  • Unreal Engine version(s): UE 5.5.
  • Tested on: NVIDIA GeForce RTX 5070 Ti 16 GB, NVIDIA GeForce RTX 4070 12 GB.

Possible Drawbacks

  • The RHIGPUReadbackPool recycles staging memory across frames. If a deployment was relying on the per-frame allocate / free pattern as an implicit cache invalidation (it shouldn't, but worth noting), the legacy path is available behind carla.Sensor.UseLegacyFenceFlush 1.
  • The lazy GBuffer capture (IsAnyGBufferClientListening gate) changes the GPU cost from "always 13 captures per frame" to "N captures per frame where N is the number of subscribed slots". For a deployment that subscribes all 13 slots the steady-state cost is identical; for the typical case (0-1 slots) the saving is significant. carla.Camera.ForceAllGBuffers 1 restores the previous behaviour.
  • The DefaultDeviceProfiles.ini migration changes when per-tier CVars apply: from runtime (after BeginPlay) to engine init (before any compute dispatch). Any deployment that was relying on the runtime burst order to override a project-wide CVar will need to move that override into the matching CarlaQuality_<Tier> profile.
  • The empty LaunchHighQualityCommands body is a behavioural change: on -quality-level=High, the prior runtime burst fired ~12 GEngine->Exec(...) lines after RHI init. Those lines now live in [CarlaQuality_High DeviceProfile] and apply at engine init instead. Functionally equivalent, but the CVar trace shows SetByDeviceProfile instead of SetByConsole. Root cause for the move: NVIDIA Xid 109 GR CTX SWITCH TIMEOUT on Blackwell + recent NVIDIA OKM was triggered by the runtime burst itself.
  • The selective temporal-history pattern (bAlwaysPersistRenderingState = false on the base, true on the RGB subclass) drops Lumen / TSR history on non-RGB sensors. No visible quality cost is expected (those pipelines do not consume TSR state), but a deployment that adds a new scene-capture subclass and observes flicker can opt back in by setting Capture->bAlwaysPersistRenderingState = true in the subclass ctor, mirroring ASceneCaptureCamera.
  • The per-camera ray-tracing toggle defaults to true (opt-out) so default users match upstream behaviour out of the box. Setting use_ray_tracing=false on a camera spawn (or carla.Camera.UseRayTracing 0 globally) is the explicit opt-out path; the default behaviour is unchanged from ue5-dev.
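The cost model of the lazy GBuffer gate ("N captures per frame where N is the number of subscribed slots") can be illustrated with a minimal, engine-free sketch. The member names (ListenTo, ShouldCaptureSlot) are stand-ins, not the actual CARLA symbols:

```cpp
#include <bitset>

// Model of the lazy-capture gate: 13 GBuffer slots, a slot is captured only
// when at least one client is listening, unless the debug "force all"
// override (the carla.Camera.ForceAllGBuffers analogue) is set.
struct GBufferGate {
    static constexpr int NumSlots = 13;
    std::bitset<NumSlots> listeners;   // set by listen_to_gbuffer(N)
    bool forceAll = false;             // restores the legacy always-capture path

    void ListenTo(int slot) { listeners.set(slot); }
    bool IsAnyClientListening() const { return forceAll || listeners.any(); }
    bool ShouldCaptureSlot(int slot) const { return forceAll || listeners.test(slot); }
    int CapturesPerFrame() const {
        return forceAll ? NumSlots : int(listeners.count());
    }
};
```

With all 13 slots subscribed the steady-state cost matches the legacy path; with the typical 0-1 subscriptions almost all capture work disappears.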


Bundles together independent UE4-era latent bugs that surfaced during
the UE5 stability work. Each fix is small, self-contained, and verified
under the LibCarla GoogleTest suite plus a clean package build.

LibCarla
- LidarData / SemanticLidarData: relax the boundary check in ResetMemory
  from strict greater-than to greater-or-equal so callers supplying one
  slot per channel do not trip the Debug assertion. New
  test_lidar_data.cpp pins both equality and under-one-per-channel paths
  (Debug only; DEBUG_ASSERT is a no-op under NDEBUG). Server suite goes
  from 44 to 48 tests, all passing.

Carla plugin (UE5)
- Sensor/DVSCamera.cpp: PostPhysTick early-out had the IsValid(this)
  check inverted, returning on the live path and ticking the dead path.
  Flip the condition and gate on AreClientsListening() to match every
  other camera.
- Sensor/SceneCaptureCamera.cpp: gate the per-frame
  ENQUEUE_RENDER_COMMAND(MeasureTime) behind STATS or CSV_PROFILER. In
  Shipping builds the command runs every frame for every RGB camera
  with no observable output and no measurement sink.
- Sensor/UE4_Overridden/SceneCaptureComponent2D_CARLA.h: wrap ViewActor
  in UPROPERTY() and TObjectPtr so the captured actor pointer is
  GC-visible. Currently safe because the field is always assigned this,
  but the raw pointer is a latent foot-gun.
- Sensor/ImageUtil.cpp: ReadImageData was building a populated
  PixelData and then returning a default-constructed null TUniquePtr,
  throwing the work away. Return the populated pointer.
- Sensor/ShaderBasedSensor.cpp: AddPostProcessingMaterial used
  ConstructorHelpers::FObjectFinder, which is documented as
  constructor-only and has undefined behavior at runtime. Switch to
  FSoftObjectPath::TryLoad, which is the runtime-safe path resolver
  and accepts both plain object paths and the class-prefixed
  export-text form callers were already passing in. Adds an explicit
  Error log so a bad catalog path is observable in PackageLog instead
  of silently swallowed.

Examples
- PythonAPI/examples/manual_control.py: rapid camera toggles segfaulted
  the Python client. Bare sensor.destroy() races the streaming session
  while frames are still in flight. Call sensor.stop() and sleep
  briefly before destroy() so the listener detaches and queued
  callbacks drain. Same idiom already used in visualize_depth.py.

Build hygiene
- .gitignore: ignore Build-Tests/ output directory used by
  -DBUILD_LIBCARLA_TESTS=ON when the alternate Debug test build tree
  is configured.

Verification
- cmake --preset Release -DENABLE_ROS2=ON -DBUILD_LIBCARLA_TESTS=ON
  -DCARLA_UNREAL_PACKAGE_NO_COMPRESSION=ON, cold rebuild from scratch:
  carla-client, carla-python-api, carla-server,
  libcarla_test_server, libcarla_test_client, package all green.
- libcarla_test_server: 48/48 PASSED.
- libcarla_test_client: 62/62 PASSED.
- Total 110/110 across 25 test suites.
- Manual smoke: rapid camera toggles in manual_control.py no longer
  segfault on teardown.
…sync heightmap, soft-ref catalog

Fold the UObject hardening work into a single 36-file PR. Pure
storage-shape and lifecycle changes on UE5 reflection types; implicit
conversions cover existing callsites. Pairs with the upcoming sensor
pipeline (PR 3/4) and scalability ini-first (PR 4/4) refactors.

UPROPERTY raw-ptr migration:
- Wrap UPROPERTY raw pointers in TObjectPtr<> across MapGen, OpenDrive,
  Traffic, Trigger, Util, Vegetation, Vehicle, Sensor and Actor headers.
- Touch only the storage-shape; behavior is preserved by implicit
  conversion. SceneCaptureSensor.h gets the wraps for CaptureRenderTarget
  and CaptureComponent2D so PR 3/4 can layer the readback pool and
  per-camera ray-tracing toggle on top.
- ShaderBasedSensor.h wraps MaterialsFound; ActorDefinition.h wraps
  FVehicleActorDefinition::mesh.

Factory mesh caches:
- PropActorFactory and StaticMeshFactory cache LoadObject<UStaticMesh>
  results keyed by the soft-object path. The cache is seeded eagerly at
  GetDefinitions() time and falls back to a synchronous LoadObject on
  cache miss with a warning log, so per-spawn loads are eliminated on
  the hot path while preserving correctness.
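The seed-eagerly, fall-back-synchronously cache shape can be sketched without the engine. Mesh and the injected loader below are stand-ins for UStaticMesh / LoadObject, and the class name is illustrative:

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct Mesh { std::string path; };  // stand-in for UStaticMesh

class MeshCache {
    std::unordered_map<std::string, std::shared_ptr<Mesh>> cache;
    std::function<std::shared_ptr<Mesh>(const std::string&)> load;
    int syncLoads = 0;

public:
    explicit MeshCache(std::function<std::shared_ptr<Mesh>(const std::string&)> loader)
        : load(std::move(loader)) {}

    // Seeded eagerly at GetDefinitions() time so per-spawn lookups hit the cache.
    void Seed(const std::vector<std::string>& paths) {
        for (const auto& p : paths) cache[p] = load(p);
    }

    // Hot path: a hit costs a map lookup; a miss falls back to a synchronous
    // load (the real code logs a warning here) and backfills the cache.
    std::shared_ptr<Mesh> Get(const std::string& path) {
        auto it = cache.find(path);
        if (it != cache.end()) return it->second;
        ++syncLoads;  // unexpected cold load; correctness preserved regardless
        return cache[path] = load(path);
    }

    int SyncLoadCount() const { return syncLoads; }
};
```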

Async heightmap on tile cross:
- CustomTerrainPhysicsComponent now loads UHeightMapDataAsset via
  FStreamableManager (UAssetManager::GetStreamableManager()) on tile
  crossings instead of issuing a blocking StaticLoadObject on the game
  thread. Gated by carla.Terrain.AsyncHeightmapLoad (default 1); set to
  0 to force the legacy synchronous path for rollback. The pending
  TSharedPtr<FStreamableHandle> cannot be a UPROPERTY, so it is held as
  a plain member with explicit cancel-on-EndPlay.
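The async-load-with-explicit-cancel lifecycle can be modeled in plain C++; here a shared atomic flag plays the role the FStreamableHandle's cancel call plays in the real component, and all names are illustrative:

```cpp
#include <atomic>
#include <future>
#include <memory>

struct HeightMap { int tileX, tileY; };  // stand-in for UHeightMapDataAsset

// Held as a plain member (the real handle cannot be a UPROPERTY either),
// with Cancel() invoked from the EndPlay path.
struct AsyncLoadHandle {
    std::shared_ptr<std::atomic<bool>> cancelled =
        std::make_shared<std::atomic<bool>>(false);
    std::future<std::shared_ptr<HeightMap>> result;
    void Cancel() { cancelled->store(true); }
};

AsyncLoadHandle LoadHeightmapAsync(int tileX, int tileY) {
    AsyncLoadHandle h;
    auto cancelled = h.cancelled;  // copy the flag into the worker, not `this`
    h.result = std::async(std::launch::async,
        [=]() -> std::shared_ptr<HeightMap> {
            // ... asset/disk work would happen here, off the game thread ...
            if (cancelled->load()) return nullptr;  // honour cancel-on-EndPlay
            return std::make_shared<HeightMap>(HeightMap{tileX, tileY});
        });
    return h;
}
```

Capturing the flag by value rather than a back-pointer to the component is what makes the cancel race-free when the component is torn down mid-load.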

Catalog soft-ref storage:
- PropParameters.Mesh becomes TSoftObjectPtr<UStaticMesh>;
  VehicleParameters.Class and PedestrianParameters.Class become
  TSoftClassPtr<ACarlaWheeledVehicle> / TSoftClassPtr<ACharacter>.
  CarlaBlueprintRegistry, VehicleActorFactory, and WalkerActorFactory
  resolve the soft references via LoadSynchronous() at definition-build
  time, so the JSON parse no longer force-loads every catalog entry.
  ActorBlueprintFunctionLibrary mirrors the change in
  MakeVehicleDefinition / MakePedestrianDefinition / MakePropDefinition.

Validation:
- Package builds clean with -DCARLA_UNREAL_PACKAGE_NO_COMPRESSION=ON.
- LibCarla unit tests pass: 48/48 server, 62/62 client.
- Stacked on carla-simulator#9697 (PR 1/4 of the upstream chain).
…es, fix Xid 109 on High

Move every per-tier rendering CVar from the runtime CarlaSettingsDelegate
burst to engine-init DeviceProfile selection. The runtime burst itself was
triggering NVIDIA Xid 109 GR CTX SWITCH TIMEOUT on Blackwell + recent NVIDIA
OKM, regardless of the burst's content; flipping Lumen / RT / pool /
streaming CVars after RHI is up provoked a compute dispatch that tripped the
GPU hardware scheduler's context-switch timeout.

- DefaultDeviceProfiles.ini carries one CarlaQuality_<Tier> profile per
  quality level (Low/Medium/High/Epic). Each profile contains the tier's
  non-Scalability CVars and the sg.*Quality selectors that pick which
  DefaultScalability.ini buckets fire.
- CarlaDeviceProfileSelector runtime module reads -quality-level=<Tier> from
  FCommandLine and returns the matching CarlaQuality_<Tier> profile name.
  Runs at engine init (PostConfigInit) so the profile's CVars apply before
  any compute dispatch.
- LaunchMediumQualityCommands / LaunchHighQualityCommands ship with empty
  bodies; the per-tier CVar burst is gone.
- ApplyPerActorQualitySettings replaces the prior nested traversal with a
  single pass over the actor list.
- Medium and High tier values added to QualityLevelUE.h and the LibCarla
  rpc::QualityLevel enum so the wire encoding stays consistent.
…tory opt-in

Per-camera hardware ray-tracing toggle. Defaults to true so default users
match upstream behaviour; setting use_ray_tracing=false on a camera spawn
(or carla.Camera.UseRayTracing 0 globally) skips the ~700 MiB-1 GiB VRAM
cost of per-camera HW-RT. Useful for sensors that do not need RT (depth,
semantic, lidar).

- bUseRayTracing UPROPERTY on ASceneCaptureSensor (default true).
- SetUseRayTracing / GetUseRayTracing UFUNCTIONs.
- ApplyRayTracingSetting writes CaptureComponent2D->bUseRayTracingIfEnabled
  from the per-sensor attribute or the global CVar override.
- carla.Camera.UseRayTracing console variable: -1 respect attribute (default),
  0 force off, 1 force on.
- use_ray_tracing blueprint attribute (default "true") wired through
  ActorBlueprintFunctionLibrary::MakeCameraDefinition,
  MakeNormalsCameraDefinition, and the SetCamera dispatch.

Selective temporal-history pattern: base ASceneCaptureSensor ctor now sets
bAlwaysPersistRenderingState = false, saving 150-300 MiB per camera by
dropping per-frame Lumen/TSR history. ASceneCaptureCamera (the RGB sensor
used by manual_control and most consumers) opts back in to true via its
constructor, preserving temporal AA quality on the RGB output. Non-RGB
sensors (depth, semantic, normals, instance, optical flow, DVS) drop
temporal history without a visible quality cost since their pipelines do
not consume the TSR state.
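The base-default / subclass-opt-in pattern reduces to constructor defaults, which can be shown with plain inheritance (the real types are UObjects; the names here mirror the description, and the defaults are the point of the sketch):

```cpp
// Base sensor drops per-frame Lumen/TSR history by default (~150-300 MiB
// saved per camera, per the measurements above).
struct SceneCaptureSensorBase {
    bool bAlwaysPersistRenderingState;
    SceneCaptureSensorBase() : bAlwaysPersistRenderingState(false) {}
};

// The RGB camera opts back in so temporal AA quality is preserved.
struct SceneCaptureCameraRGB : SceneCaptureSensorBase {
    SceneCaptureCameraRGB() { bAlwaysPersistRenderingState = true; }
};

// Non-RGB sensors inherit the opt-out default and pay nothing.
struct DepthCamera : SceneCaptureSensorBase {};
```

A new scene-capture subclass that does consume TSR state would opt back in exactly the way the RGB camera does, by flipping the flag in its constructor.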
JArmandoAnaya force-pushed the feat/render-pipeline-and-tiers branch from 73e99e0 to 939080d on May 5, 2026 at 18:35.