feat(render): [3/3] sensor pipeline, per-camera RT toggle, ini-first quality tiers #9702
Draft
JArmandoAnaya wants to merge 9 commits into carla-simulator:ue5-dev
Bundles together independent UE4-era latent bugs that surfaced during the UE5 stability work. Each fix is small, self-contained, and verified under the LibCarla GoogleTest suite plus a clean package build.

LibCarla
- LidarData / SemanticLidarData: relax the boundary check in ResetMemory from strict greater-than to greater-or-equal so callers supplying one slot per channel do not trip the Debug assertion. New test_lidar_data.cpp pins both equality and under-one-per-channel paths (Debug only; DEBUG_ASSERT is a no-op under NDEBUG). Server suite goes from 44 to 48 tests, all passing.

Carla plugin (UE5)
- Sensor/DVSCamera.cpp: PostPhysTick early-out had the IsValid(this) check inverted, returning on the live path and ticking the dead path. Flip the condition and gate on AreClientsListening() to match every other camera.
- Sensor/SceneCaptureCamera.cpp: gate the per-frame ENQUEUE_RENDER_COMMAND(MeasureTime) behind STATS or CSV_PROFILER. In Shipping builds the command runs every frame for every RGB camera with no observable output and no measurement sink.
- Sensor/UE4_Overridden/SceneCaptureComponent2D_CARLA.h: wrap ViewActor in UPROPERTY() and TObjectPtr so the captured actor pointer is GC-visible. Currently safe because the field is always assigned this, but the raw pointer is a latent foot-gun.
- Sensor/ImageUtil.cpp: ReadImageData was building a populated PixelData and then returning a default-constructed null TUniquePtr, throwing the work away. Return the populated pointer.
- Sensor/ShaderBasedSensor.cpp: AddPostProcessingMaterial used ConstructorHelpers::FObjectFinder, which is documented as constructor-only and has undefined behavior at runtime. Switch to FSoftObjectPath::TryLoad, which is the runtime-safe path resolver and accepts both plain object paths and the class-prefixed export-text form callers were already passing in. Adds an explicit Error log so a bad catalog path is observable in PackageLog instead of silently swallowed.

Examples
- PythonAPI/examples/manual_control.py: rapid camera toggles segfaulted the Python client. Bare sensor.destroy() races the streaming session while frames are still in flight. Call sensor.stop() and sleep briefly before destroy() so the listener detaches and queued callbacks drain. Same idiom already used in visualize_depth.py.

Build hygiene
- .gitignore: ignore Build-Tests/ output directory used by -DBUILD_LIBCARLA_TESTS=ON when the alternate Debug test build tree is configured.

Verification
- cmake --preset Release -DENABLE_ROS2=ON -DBUILD_LIBCARLA_TESTS=ON -DCARLA_UNREAL_PACKAGE_NO_COMPRESSION=ON, cold rebuild from scratch: carla-client, carla-python-api, carla-server, libcarla_test_server, libcarla_test_client, package all green.
- libcarla_test_server: 48/48 PASSED.
- libcarla_test_client: 62/62 PASSED.
- Total 110/110 across 25 test suites.
- Manual smoke: rapid camera toggles in manual_control.py no longer segfault on teardown.
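The stop, wait, destroy order described for manual_control.py can be sketched as a small helper. This is a minimal sketch rather than the code from the commit: `safe_destroy` and `drain_seconds` are hypothetical names, the 0.1 s default is an assumed value (the commit message only says to sleep briefly), and the helper only assumes an object exposing `stop()` and `destroy()` the way `carla.Sensor` does.

```python
import time


def safe_destroy(sensor, drain_seconds=0.1):
    """Stop a sensor, wait briefly, then destroy it.

    `sensor` is any object with stop() and destroy() methods, such as a
    carla.Sensor. `drain_seconds` is an assumed value, not one taken from
    the PR.
    """
    if sensor is None:
        return False
    sensor.stop()               # detach the listener so no new callbacks are queued
    time.sleep(drain_seconds)   # give frames that are already in flight time to drain
    return sensor.destroy()     # carla.Actor.destroy() reports success as a bool
```

A camera toggle would call `safe_destroy(old_camera)` before spawning the replacement sensor, instead of calling `old_camera.destroy()` directly.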
…sync heightmap, soft-ref catalog

Fold the UObject hardening work into a single 36-file PR. Pure storage-shape and lifecycle changes on UE5 reflection types; implicit conversions cover existing callsites. Pairs with the upcoming sensor pipeline (PR 3/4) and scalability ini-first (PR 4/4) refactors.

UPROPERTY raw-ptr migration:
- Wrap UPROPERTY raw pointers in TObjectPtr<> across MapGen, OpenDrive, Traffic, Trigger, Util, Vegetation, Vehicle, Sensor and Actor headers.
- Touch only the storage-shape; behavior is preserved by implicit conversion. SceneCaptureSensor.h gets the wraps for CaptureRenderTarget and CaptureComponent2D so PR 3/4 can layer the readback pool and per-camera ray-tracing toggle on top.
- ShaderBasedSensor.h wraps MaterialsFound; ActorDefinition.h wraps FVehicleActorDefinition::mesh.

Factory mesh caches:
- PropActorFactory and StaticMeshFactory cache LoadObject<UStaticMesh> results keyed by the soft-object path. The cache is seeded eagerly at GetDefinitions() time and falls back to a synchronous LoadObject on cache miss with a warning log, so per-spawn loads are eliminated on the hot path while preserving correctness.

Async heightmap on tile cross:
- CustomTerrainPhysicsComponent now loads UHeightMapDataAsset via FStreamableManager (UAssetManager::GetStreamableManager()) on tile crossings instead of issuing a blocking StaticLoadObject on the game thread. Gated by carla.Terrain.AsyncHeightmapLoad (default 1); set to 0 to force the legacy synchronous path for rollback. The pending TSharedPtr<FStreamableHandle> cannot be a UPROPERTY, so it is held as a plain member with explicit cancel-on-EndPlay.

Catalog soft-ref storage:
- PropParameters.Mesh becomes TSoftObjectPtr<UStaticMesh>; VehicleParameters.Class and PedestrianParameters.Class become TSoftClassPtr<ACarlaWheeledVehicle> / TSoftClassPtr<ACharacter>. CarlaBlueprintRegistry, VehicleActorFactory, and WalkerActorFactory resolve the soft references via LoadSynchronous() at definition-build time, so the JSON parse no longer force-loads every catalog entry. ActorBlueprintFunctionLibrary mirrors the change in MakeVehicleDefinition / MakePedestrianDefinition / MakePropDefinition.

Validation:
- Package builds clean with -DCARLA_UNREAL_PACKAGE_NO_COMPRESSION=ON.
- LibCarla unit tests pass: 48/48 server, 62/62 client.
- Stacked on carla-simulator#9697 (PR 1/4 of the upstream chain).
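The factory mesh cache behaviour (eager seeding, then a logged synchronous fallback on a miss) can be modelled in a few lines. This is a language-neutral sketch written in Python, not the C++ from the commit: `MeshCache`, `seed`, `get` and `load_fn` are hypothetical names, `load_fn` stands in for a blocking `LoadObject<UStaticMesh>` call, and storing the fallback result back into the cache is an assumption.

```python
import logging

log = logging.getLogger("mesh_cache_sketch")


class MeshCache:
    """Cache keyed by asset path, seeded eagerly, with a logged fallback load."""

    def __init__(self, load_fn):
        self._load = load_fn   # stand-in for a blocking asset load
        self._meshes = {}      # asset path -> loaded mesh

    def seed(self, paths):
        """Load every known path up front (the GetDefinitions() step)."""
        for path in paths:
            self._meshes[path] = self._load(path)

    def get(self, path):
        """Return the cached mesh; on a miss, load synchronously and warn."""
        mesh = self._meshes.get(path)
        if mesh is None:
            log.warning("mesh cache miss for %s, loading synchronously", path)
            mesh = self._load(path)
            # Assumption of this sketch: keep the fallback result for next time.
            self._meshes[path] = mesh
        return mesh
```

With the cache seeded at definition time, a spawn only pays for a dictionary lookup; the warning makes any path that slipped past seeding visible in the log.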
…es, fix Xid 109 on High

Move every per-tier rendering CVar from the runtime CarlaSettingsDelegate burst to engine-init DeviceProfile selection. The runtime burst itself was triggering NVIDIA Xid 109 GR CTX SWITCH TIMEOUT on Blackwell + recent NVIDIA OKM, regardless of the burst's content; flipping Lumen / RT / pool / streaming CVars after RHI is up provoked a compute dispatch the GPU hardware scheduler kicked.

- DefaultDeviceProfiles.ini carries one CarlaQuality_<Tier> profile per quality level (Low/Medium/High/Epic). Each profile contains the tier's non-Scalability CVars and the sg.*Quality selectors that pick which DefaultScalability.ini buckets fire.
- CarlaDeviceProfileSelector runtime module reads -quality-level=<Tier> from FCommandLine and returns the matching CarlaQuality_<Tier> profile name. Runs at engine init (PostConfigInit) so the profile's CVars apply before any compute dispatch.
- LaunchMediumQualityCommands / LaunchHighQualityCommands ship with empty bodies; the per-tier CVar burst is gone.
- ApplyPerActorQualitySettings replaces the prior nested traversal with a single pass over the actor list.
- Medium and High tier values added to QualityLevelUE.h and the LibCarla rpc::QualityLevel enum so the wire encoding stays consistent.
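The mapping from the `-quality-level=<Tier>` flag to a `CarlaQuality_<Tier>` profile name can be illustrated with a short model. This is a sketch in Python of the selection logic only, not the CarlaDeviceProfileSelector module: `select_profile` and `default_tier` are hypothetical names, and both the case-insensitive match and the fallback to a default tier for a missing or unknown value are assumptions, since the commit message does not describe them.

```python
TIERS = ("Low", "Medium", "High", "Epic")
FLAG = "-quality-level="


def select_profile(argv, default_tier="Epic"):
    """Return the CarlaQuality_<Tier> profile name for a command line.

    Falling back to `default_tier` when the flag is missing or names an
    unknown tier is an assumption of this sketch.
    """
    tier = default_tier
    for arg in argv:
        if arg.lower().startswith(FLAG):
            value = arg[len(FLAG):]
            for known in TIERS:
                # Assumed: tier names are matched without regard to case.
                if known.lower() == value.lower():
                    tier = known
    return "CarlaQuality_" + tier
```

For example, `select_profile(["-quality-level=High"])` yields the profile name `CarlaQuality_High`, which is the section the engine would then read from DefaultDeviceProfiles.ini.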
…tory opt-in

Per-camera hardware ray-tracing toggle. Defaults to true so default users match upstream behaviour; setting use_ray_tracing=false on a camera spawn (or carla.Camera.UseRayTracing 0 globally) skips the ~700 MiB-1 GiB VRAM cost of per-camera HW-RT. Useful for sensors that do not need RT (depth, semantic, lidar).

- bUseRayTracing UPROPERTY on ASceneCaptureSensor (default true).
- SetUseRayTracing / GetUseRayTracing UFUNCTIONs.
- ApplyRayTracingSetting writes CaptureComponent2D->bUseRayTracingIfEnabled from the per-sensor attribute or the global CVar override.
- carla.Camera.UseRayTracing console variable: -1 respect attribute (default), 0 force off, 1 force on.
- use_ray_tracing blueprint attribute (default "true") wired through ActorBlueprintFunctionLibrary::MakeCameraDefinition, MakeNormalsCameraDefinition, and the SetCamera dispatch.

Selective temporal-history pattern: base ASceneCaptureSensor ctor now sets bAlwaysPersistRenderingState = false, saving 150-300 MiB per camera by dropping per-frame Lumen/TSR history. ASceneCaptureCamera (the RGB sensor used by manual_control and most consumers) opts back in to true via its constructor, preserving temporal AA quality on the RGB output. Non-RGB sensors (depth, semantic, normals, instance, optical flow, DVS) drop temporal history without a visible quality cost since their pipelines do not consume the TSR state.
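The interaction between the per-sensor attribute and the global console variable can be written down as a small decision function, together with the client-side opt-out. Both pieces are a sketch under stated assumptions: `resolve_use_ray_tracing` and `disable_ray_tracing` are hypothetical names, the first models in Python what the commit message says `ApplyRayTracingSetting` decides, and treating any CVar value other than 0 or 1 as "respect the attribute" is an assumption. The second helper relies only on `has_attribute` and `set_attribute`, which `carla.ActorBlueprint` provides.

```python
def resolve_use_ray_tracing(sensor_attribute, cvar_override=-1):
    """Combine the per-sensor flag with the carla.Camera.UseRayTracing CVar.

    -1 respects the sensor attribute, 0 forces ray tracing off, 1 forces it on.
    """
    if cvar_override == 0:
        return False
    if cvar_override == 1:
        return True
    # Assumed: every other value behaves like -1.
    return bool(sensor_attribute)


def disable_ray_tracing(blueprint):
    """Opt a camera blueprint out of hardware ray tracing, if it supports it.

    Returns False on builds whose blueprint has no use_ray_tracing attribute.
    """
    if blueprint.has_attribute("use_ray_tracing"):
        blueprint.set_attribute("use_ray_tracing", "false")
        return True
    return False
```

A client would call `disable_ray_tracing(bp)` on a depth or semantic camera blueprint before `world.spawn_actor(...)`; the guard keeps the same script usable against a server that does not carry this change.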
73e99e0 to 939080d
PR blocked on #9698
Description
This PR is the third and final of a three-PR series aimed at making CARLA on UE5 (0.10.0) usable on graphics cards with less VRAM than the 16 GB announced for the release, without sacrificing the visual quality the project currently ships. The series covers bug fixes, refactoring around Unreal Engine resource ownership, and a redesigned scalability tier system that gives users explicit Low / Medium / High / Epic presets with predictable VRAM envelopes.
After the full series, on a 16 GB card in Epic mode the simulator runs at roughly 80 % GPU compute utilisation and averages ~9.5 GB VRAM steady-state with stable frame pacing, without the sudden spikes that previously crashed the process even on 16 GB cards. Initial testing also suggests that an 8 GB card may be able to run the simulator (likely with a perceptible quality drop in Epic mode), though that is not yet guaranteed and needs broader hardware coverage. Demo run with the full series applied: https://www.youtube.com/watch?v=Z9U6aGwcoBo
PR series overview
- `fix/preexisting-bugs`
- `refactor/uobject-hardening`: `TObjectPtr` migration, factory mesh caches, async heightmap on tile cross, soft-reference catalogs
- `feat/render-pipeline-and-tiers` (this PR)

This PR (3/3) closes the chain by combining the sensor-side rendering work and the per-tier scalability rewrite into a single review unit. The sensor pipeline changes (commit 1) and the per-camera RT toggle (commit 3) both depend on the per-tier CVar plumbing introduced in the scalability rewrite (commit 2); shipping them as one PR keeps the rendering surface coherent and avoids a transient quality regression between intermediate commits.
What this PR (3/3) changes
The PR is organised in three commits to make review easier; each commit is independently buildable and the LibCarla suite passes after each one.
Commit 1: `perf(sensors): RHI readback pool, lazy GBuffer capture, streaming prewarm`

- `RHIGPUReadbackPool.{h,cpp}` (new): a small ref-counted pool of `FRHIGPUTextureReadback` objects keyed by `(width, height, EPixelFormat)`. Replaces the per-frame allocation that every camera was doing in `ImageUtil::ReadSensorImageDataAsync*`. Reused readbacks recycle their staging memory across frames; the pool is bounded and falls back to a fresh allocation if the cached entry is still busy on the RHI thread.
- `Sensor/ImageUtil.{h,cpp}`: route every async readback path (`ReadSensorImageDataAsyncFColor`, `ReadSensorImageDataAsyncFLinearColor`) through the pool. The lambda body that previously default-constructed an `FRHIGPUTextureReadback` now pulls one from the pool and returns it on completion.
- `Sensor/PixelReader.{h,cpp}`: same routing for the synchronous read path used by GBuffer captures. The legacy fence-flush path is preserved behind a CVar (`carla.Sensor.UseLegacyFenceFlush`, default `0`) so a deployment that needs the old behaviour can roll back without recompiling.
- `Sensor/SceneCaptureSensor.{h,cpp}`: gate every GBuffer slot capture behind an "is anyone listening?" check (`IsAnyGBufferClientListening`). Previously the scene-capture pipeline rendered all 13 GBuffer textures on every frame regardless of whether a client had subscribed; now slot N is captured only when at least one client has called `listen_to_gbuffer(N)`. Also adds the `carla.Camera.ForceAllGBuffers` console variable to force the legacy "always capture all" behaviour for debugging.
- `Sensor/SceneCaptureCamera.cpp` / `Sensor/OpticalFlowCamera.cpp`: small streaming-prewarm pass that triggers a one-shot `ForceLoadAllStreamableAssets` at the first tick to amortise the level-load streaming storm on `BeginPlay`, instead of letting it bleed across the first ~30 frames.
- `Settings/CarlaSettings.cpp` + `Settings/CarlaSettingsDelegate.{h,cpp}`: add the helper accessors used by the new render-side gates above.

Commit 2: `refactor(scalability): ini-first quality tier system via DeviceProfiles, fix Xid 109 on High`

- `Config/DefaultDeviceProfiles.ini`: one `CarlaQuality_<Tier>` profile per quality level (Low / Medium / High / Epic). Each profile contains the tier's non-Scalability CVars and the `sg.*Quality` selectors that pick which `DefaultScalability.ini` buckets fire.
- `Config/DefaultScalability.ini`: bucket-section coverage for the four tiers. `ECVF_Scalability` CVars live here; non-Scalability per-tier CVars live in the DeviceProfile.
- `Plugins/CarlaDeviceProfileSelector/` (new project module): runtime module that reads `-quality-level=<Tier>` from `FCommandLine` at engine init (`PostConfigInit`) and returns the matching `CarlaQuality_<Tier>` profile name. The profile's CVars apply before any compute dispatch, which is the key change vs the old runtime burst.
- `Settings/CarlaSettingsDelegate.cpp`: `LaunchLowQualityCommands` / `LaunchMediumQualityCommands` / `LaunchHighQualityCommands` / `LaunchEpicQualityCommands` now have empty bodies. The per-tier CVar burst that used to sit there is gone: it was triggering NVIDIA Xid 109 GR CTX SWITCH TIMEOUT on Blackwell + recent NVIDIA OKM regardless of the burst's content. Flipping Lumen / RT / pool / streaming CVars after RHI is up provoked a compute dispatch the GPU hardware scheduler kicked. Engine-init application via `DefaultDeviceProfiles.ini` does not trigger the fault.
- `Settings/CarlaSettingsDelegate.cpp::ApplyPerActorQualitySettings`: flatten the prior nested actor-list traversal into a single pass.
- `Settings/QualityLevelUE.h` + `LibCarla rpc::QualityLevel`: add `Medium` and `High` tier values so the wire encoding stays consistent across the four-tier ladder.

Commit 3: `feat(sensors): per-camera ray-tracing toggle + selective temporal-history opt-in`

- `Sensor/SceneCaptureSensor.{h,cpp}`: add `bUseRayTracing` `UPROPERTY` (default `true`, opt-out), `SetUseRayTracing` / `GetUseRayTracing` `UFUNCTION`s, and a private `ApplyRayTracingSetting` helper that writes `CaptureComponent2D->bUseRayTracingIfEnabled` from the per-sensor attribute or the global CVar override.
- `carla.Camera.UseRayTracing` console variable: `-1` respect the per-sensor attribute (default), `0` force off across every camera, `1` force on across every camera.
- `Actor/ActorBlueprintFunctionLibrary.cpp`: `use_ray_tracing` blueprint attribute (default `"true"`) wired through `MakeCameraDefinition`, `MakeNormalsCameraDefinition`, and the `SetCamera` dispatch. Power users can spawn a camera with `use_ray_tracing=false` from Python to skip the ~700 MiB-1 GiB VRAM cost of per-camera HW-RT for sensors that do not need it (depth, semantic, lidar).
- Selective temporal-history pattern: base `ASceneCaptureSensor` ctor now sets `bAlwaysPersistRenderingState = false`, saving ~150-300 MiB per camera by dropping per-frame Lumen / TSR history. `ASceneCaptureCamera` (the RGB sensor used by `manual_control.py` and most consumers) opts back in to `true` via its constructor, preserving temporal AA quality on the RGB output. Non-RGB sensors (depth, semantic, normals, instance, optical flow, DVS) drop temporal history without a visible quality cost since their pipelines do not consume the TSR state.

Where has this been tested?
Possible Drawbacks
- `RHIGPUReadbackPool` recycles staging memory across frames. If a deployment was relying on the per-frame allocate / free pattern as an implicit cache invalidation (it shouldn't, but worth noting), the legacy path is available behind `carla.Sensor.UseLegacyFenceFlush 1`.
- The lazy GBuffer capture (`IsAnyGBufferClientListening` gate) changes the GPU cost from "always 13 captures per frame" to "N captures per frame where N is the number of subscribed slots". For a deployment that subscribes all 13 slots the steady-state cost is identical; for the typical case (0-1 slots) the saving is significant. `carla.Camera.ForceAllGBuffers 1` restores the previous behaviour.
- The `DefaultDeviceProfiles.ini` migration changes when per-tier CVars apply: from runtime (after `BeginPlay`) to engine init (before any compute dispatch). Any deployment that was relying on the runtime burst order to override a project-wide CVar will need to move that override into the matching `CarlaQuality_<Tier>` profile.
- The empty `LaunchHighQualityCommands` body is a behavioural change: on `-quality-level=High`, the prior runtime burst fired ~12 `GEngine->Exec(...)` lines after RHI init. Those lines now live in `[CarlaQuality_High DeviceProfile]` and apply at engine init instead. Functionally equivalent, but the CVar trace shows `SetByDeviceProfile` instead of `SetByConsole`. Root cause for the move: NVIDIA Xid 109 GR CTX SWITCH TIMEOUT on Blackwell + recent NVIDIA OKM was triggered by the runtime burst itself.
- The selective temporal-history pattern (`bAlwaysPersistRenderingState = false` on the base, `true` on the RGB subclass) drops Lumen / TSR history on non-RGB sensors. No visible quality cost is expected (those pipelines do not consume TSR state), but a deployment that adds a new scene-capture subclass and observes flicker can opt back in by setting `Capture->bAlwaysPersistRenderingState = true` in the subclass ctor, mirroring `ASceneCaptureCamera`.
- The per-camera ray-tracing toggle defaults to `true` (opt-out) so default users match upstream behaviour out of the box. Setting `use_ray_tracing=false` on a camera spawn (or `carla.Camera.UseRayTracing 0` globally) is the explicit opt-out path; the default behaviour is unchanged from `ue5-dev`.
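The recycling behaviour of the readback pool, which the first drawback refers to, can be pictured with a small model. This is a sketch in Python of the pooling idea only, not the `RHIGPUReadbackPool` implementation: `ReadbackPool`, `make_readback` and `max_idle_per_key` are hypothetical names, the bound of 4 is an assumed value, and "busy" is modelled simply as "not currently in the idle list". The model is single-threaded, whereas the real pool has to coordinate with the RHI thread.

```python
class ReadbackPool:
    """Bounded pool of reusable objects keyed by (width, height, pixel_format)."""

    def __init__(self, make_readback, max_idle_per_key=4):
        self._make = make_readback          # stand-in for allocating a readback
        self._max_idle = max_idle_per_key   # assumed bound, not taken from the PR
        self._idle = {}                     # key -> list of idle objects

    def acquire(self, width, height, pixel_format):
        """Reuse an idle entry if one exists, otherwise allocate a fresh one."""
        idle = self._idle.get((width, height, pixel_format))
        if idle:
            return idle.pop()
        # Every cached entry for this key is still in use: fall back to a new one.
        return self._make(width, height, pixel_format)

    def release(self, readback, width, height, pixel_format):
        """Return an entry after its copy completes; drop it if the pool is full."""
        idle = self._idle.setdefault((width, height, pixel_format), [])
        if len(idle) < self._max_idle:
            idle.append(readback)
            return True
        return False
```

A camera that renders at a fixed resolution and format allocates on its first frames and then keeps cycling through the same few entries, which is where the per-frame allocation saving comes from.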