[CI Testing Only] Capture offloader.exe crash dumps via WER (Debug build)#1199
Draft
alsepkow wants to merge 10 commits into
Both AMD and NVIDIA DirectX configurations have been stable and have higher pass rates than the existing Tier 1 Intel target. Promote them to Tier 1 so they run on every PR. Qualcomm and the Vulkan IHV configurations remain experimental and continue to require the 'test-all' label.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Match the tier change in docs/CI.md and pr-matrix.yaml so the README status table reflects that these targets now run on every PR.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per Bob's review feedback, switch from listing AMD/NVIDIA D3D12 combinations via 'include' to a cross-product with 'exclude' for the AMD/NVIDIA Vulkan combinations. As future targets get promoted out of experimental, we can simply remove exclusions rather than adding inclusions.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
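A minimal sketch of the cross-product + exclude pattern described above. The matrix key names (SKU, TestTarget), the windows-intel and windows-nvidia SKU names, and the Vulkan target names are assumptions for illustration; only windows-amd and the two check-hlsl-*-d3d12 targets appear elsewhere in this PR.

```yaml
# Fragment only: the job body (steps or a reusable-workflow call) is omitted.
jobs:
  Exec-Tests-Windows:
    strategy:
      fail-fast: false
      matrix:
        # Full cross-product of every SKU with every test target.
        SKU: [windows-intel, windows-amd, windows-nvidia]                # assumed names
        TestTarget:
          - check-hlsl-d3d12
          - check-hlsl-clang-d3d12
          - check-hlsl-vk                                                # assumed name
          - check-hlsl-clang-vk                                          # assumed name
        # Carve out the combinations that are still experimental.
        exclude:
          - SKU: windows-amd
            TestTarget: check-hlsl-vk
          - SKU: windows-amd
            TestTarget: check-hlsl-clang-vk
          - SKU: windows-nvidia
            TestTarget: check-hlsl-vk
          - SKU: windows-nvidia
            TestTarget: check-hlsl-clang-vk
```

Under this layout, promoting a target out of experimental amounts to deleting its exclude entry rather than adding a new include entry.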
Apply the same cross-product + exclude pattern to the experimental Exec-Tests-Extra job for consistency. As targets are promoted out of experimental, exclusions can be added here in lockstep with their removal from the Tier 1 Exec-Tests-Windows job.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This reverts commit 1eec3eb.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This change is for the draft AMD-testing PR only and should NOT be merged.
Strips the matrix down to only windows-amd x {check-hlsl-d3d12, check-hlsl-clang-d3d12}
so we can quickly iterate on AMD D3D12 stability investigation without spending
CI on Intel/NVIDIA/MacOS/WARP/Vulkan jobs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Companion to the RelWithDebInfo testing draft PR. This branch runs the same windows-amd D3D12 jobs but with BuildType=Debug to confirm whether the previously observed Debug-only failures still reproduce.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds Windows-only steps around 'Run HLSL Tests' to:
1. Configure HKLM WER LocalDumps for offloader.exe (full memory dumps
to C:\CrashDumps, max 20 retained)
2. Copy any captured dumps into llvm-project/build/test-results/
CrashDumps so they live alongside test output
3. Upload them as a per-run GitHub artifact (14 day retention)
4. Clean up the registry key and dump folder after upload, so the
runner is left in its original state regardless of run outcome
Goal: capture full-memory crash dumps of the AMD amdxc64.dll PSO
compilation crashes for AMD's debug analysis. The offloader's existing
SEH stack-trace printout shows offsets only; full dumps give AMD
register state, thread context, and complete module list.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
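As a rough illustration of step 1, a configure step could look like the following. This is a sketch, not the step from this PR: the step name is made up, while the registry path and the DumpFolder / DumpType / DumpCount value names are the standard WER LocalDumps per-application settings (DumpType 2 selects a full dump).

```yaml
- name: Configure WER LocalDumps for offloader.exe   # hypothetical step name
  if: runner.os == 'Windows'
  shell: powershell
  run: |
    $key = 'HKLM:\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps\offloader.exe'
    $dumpDir = 'C:\CrashDumps'

    # Make sure the dump folder exists before WER tries to write into it.
    New-Item -ItemType Directory -Path $dumpDir -Force | Out-Null

    # Per-application key: applies only to processes named offloader.exe.
    New-Item -Path $key -Force | Out-Null
    New-ItemProperty -Path $key -Name DumpFolder -Value $dumpDir -PropertyType ExpandString -Force | Out-Null
    New-ItemProperty -Path $key -Name DumpType   -Value 2        -PropertyType DWord        -Force | Out-Null  # 2 = full dump
    New-ItemProperty -Path $key -Name DumpCount  -Value 20       -PropertyType DWord        -Force | Out-Null  # keep at most 20
```

Writing under HKLM requires the runner account to have administrator rights.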
The self-hosted HLSLPC-AMD01 runner only has Windows PowerShell (powershell.exe) installed, not PowerShell 7+ (pwsh). All three WER-related steps were failing immediately with 'pwsh: command not found', which short-circuited the job before tests ran. Switch shell: pwsh -> shell: powershell to match the convention used by the existing dxdiag step in the same workflow.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous cleanup step ran with -ErrorAction SilentlyContinue and no
verification, which meant a partial failure could silently leave the
HLSLPC-AMD01 machine with the LocalDumps registry key or the
C:\CrashDumps folder still present after the job.
Improvements:
* Configure step now also scrubs any stale state at the start (in
case a prior aborted run left the regkey or folder behind).
* Cleanup step uses try/catch around each removal so one failure
cannot skip subsequent cleanup operations.
* Cleanup step verifies after the fact that both the registry key
and the dump folder are gone, and emits an ##[warning] if not, so
we can spot lingering state in the run summary.
Goal: this PR must never permanently reconfigure the AMD test machine,
regardless of whether tests pass, fail, crash, or are cancelled.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
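The hardened cleanup described above might be sketched like this. The step name and the exact messages are hypothetical; the structure (independent try/catch per removal, then a verification pass that emits ##[warning]) follows the commit message.

```yaml
- name: Clean up WER LocalDumps configuration   # hypothetical step name
  if: always() && runner.os == 'Windows'
  shell: powershell
  run: |
    $key = 'HKLM:\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps\offloader.exe'
    $dumpDir = 'C:\CrashDumps'

    # Each removal gets its own try/catch so one failure cannot skip the other.
    foreach ($path in @($key, $dumpDir)) {
      try {
        if (Test-Path -LiteralPath $path) {
          Remove-Item -LiteralPath $path -Recurse -Force -ErrorAction Stop
        }
      } catch {
        Write-Host "##[warning]Failed to remove ${path}: $_"
      }
    }

    # Verify afterwards and surface any lingering state in the run log.
    foreach ($path in @($key, $dumpDir)) {
      if (Test-Path -LiteralPath $path) {
        Write-Host "##[warning]$path still exists after cleanup"
      }
    }
```

The same scrub loop could be reused at the top of the configure step to clear state left behind by an aborted run.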
[CI testing only; do not merge]

Adds Windows-only crash dump capture for offloader.exe so we can attach full memory dumps to the AMD amdxc64.dll PSO crash bug report.

What this changes

Adds 4 Windows-only steps around Run HLSL Tests in build-and-test-callable.yaml:
1. Configure WER LocalDumps for offloader.exe → C:\CrashDumps (full memory dump, DumpType=2, max 20 retained)
2. Copy any captured dumps into llvm-project/build/test-results/CrashDumps/ so they sit alongside other test output
3. Upload them as a per-run artifact named crash-dumps-<run_id>-<attempt>-<sku>-<target> (14 day retention)
4. Remove the registry key and the C:\CrashDumps folder after upload; this step runs with if: always() so the runner is left in its original state regardless of outcome

Why

The offloader's existing SEH stack-trace printout (LLVM's PrintStackTrace) only gives module-relative offsets. AMD needs register state, thread context, the faulting-thread call stack with their internal symbols, and the complete module list, all of which require a full memory dump.

This PR is based on pr-1187-testing-debug (Debug build, debug layer ON, parallel), the configuration that has reproduced the crashes most reliably.

Branch

alsepkow/offload-test-suite:pr-1187-testing-debug-dumps

What you should see if a crash happens

* A .dmp file (a few hundred MB) appears as a downloadable artifact on the GitHub Actions run page

Notes

* HKLM registry writes assume the self-hosted runner has admin rights (typical for HLSLPC-AMD01)
* Cleanup runs with if: always(), so the registry key is removed even on test failure or cancellation
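For reference, the copy and upload steps could be sketched as below. This is an illustration under assumptions: the step names, the actions/upload-artifact major version, and the inputs.SKU / inputs.TestTarget expressions standing in for <sku> and <target> are all hypothetical.

```yaml
- name: Collect crash dumps                      # hypothetical step name
  if: always() && runner.os == 'Windows'
  shell: powershell
  run: |
    $dest = 'llvm-project/build/test-results/CrashDumps'
    # Only create the destination when WER actually wrote something.
    if (Test-Path 'C:\CrashDumps\*.dmp') {
      New-Item -ItemType Directory -Path $dest -Force | Out-Null
      Copy-Item -Path 'C:\CrashDumps\*.dmp' -Destination $dest -Force
    }

- name: Upload crash dumps                       # hypothetical step name
  if: always() && runner.os == 'Windows'
  uses: actions/upload-artifact@v4               # version is an assumption
  with:
    # inputs.SKU / inputs.TestTarget are placeholder input names.
    name: crash-dumps-${{ github.run_id }}-${{ github.run_attempt }}-${{ inputs.SKU }}-${{ inputs.TestTarget }}
    path: llvm-project/build/test-results/CrashDumps/
    if-no-files-found: ignore                    # no crash means no artifact
    retention-days: 14
```

Including the run attempt in the artifact name keeps re-runs of the same workflow run from colliding on the same artifact.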