fix: detect and log WASM guard traps; mark module failed after panic#2698
fix: detect and log WASM guard traps; mark module failed after panic#2698
Conversation
…t W-2 Agent-Logs-Url: https://github.com/github/gh-aw-mcpg/sessions/420f2df5-ad00-457a-869c-8f0e41731e6f Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR improves the MCP gateway’s WASM-based guard reliability by detecting WASM execution traps (e.g., Rust panics), logging them at the normal error level, and preventing further use of a potentially corrupted module state.
Changes:
- Track a “failed” state on
WasmGuardafter a detected WASM trap and short-circuit subsequent calls with an error referencing the original trap. - Add
isWasmTrap(err)to identify trap-style failures via thewasm error:prefix and log traps vialogger.LogError. - Add unit tests for trap-string detection and for the post-failure short-circuit behavior; update the integrity audit workflow to grep for these indicators.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
internal/guard/wasm.go |
Detects/logs WASM traps and marks a guard instance permanently failed after a trap to prevent state corruption. |
internal/guard/wasm_test.go |
Adds tests for trap detection and for the guard’s “failed state” short-circuit behavior. |
.github/workflows/integrity-filtering-audit.md |
Updates audit instructions/grep patterns to surface WASM traps/panics in routine log review. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // policy context stored by label_agent), so all subsequent calls are | ||
| // unsafe until the guard is reloaded. | ||
| if g.failed { | ||
| return nil, fmt.Errorf("WASM guard '%s' is unavailable after a previous trap: %w", g.name, g.failedErr) |
There was a problem hiding this comment.
In the g.failed early-return, fmt.Errorf(... %w, g.failedErr) can accidentally return a nil error if g.failedErr is ever nil (Go's fmt.Errorf returns nil when wrapping a nil error). That would make callWasmFunction return (nil, nil) even though the guard is failed. Consider ensuring failedErr is always non-nil when failed is true, or avoid %w when failedErr == nil (e.g., use a sentinel error).
| return nil, fmt.Errorf("WASM guard '%s' is unavailable after a previous trap: %w", g.name, g.failedErr) | |
| if g.failedErr != nil { | |
| return nil, fmt.Errorf("WASM guard '%s' is unavailable after a previous trap: %w", g.name, g.failedErr) | |
| } | |
| return nil, fmt.Errorf("WASM guard '%s' is unavailable after a previous trap", g.name) |
| ctx := context.Background() | ||
| _, err := g.callWasmFunction(ctx, "label_response", []byte(`{}`)) | ||
| require.Error(t, err) |
There was a problem hiding this comment.
callWasmFunction now documents the precondition that g.mu must be held by the caller, but these tests call it directly without taking the mutex. To keep tests aligned with the contract (and avoid future races if the test expands), consider locking g.mu around the call or switching the test to exercise the behavior via LabelAgent/LabelResponse which already hold the lock.
| t.Run("failed guard returns error immediately for callWasmFunction", func(t *testing.T) { | ||
| // Build a minimal valid WasmGuard by hand to exercise the failed-state path | ||
| // without needing a full WASM binary. | ||
| originalErr := errors.New("WASM function call failed: wasm error: unreachable") | ||
| g := &WasmGuard{ |
There was a problem hiding this comment.
The new trap-handling behavior in callWasmFunction (detect trap, log, set g.failed=true, persist failedErr) isn't directly exercised: the tests only validate isWasmTrap and the pre-set failed-state short-circuit. Consider adding an end-to-end test using a tiny WASM module whose exported function executes unreachable so you can assert that a real trap flips failed and that subsequent calls immediately fail with an error that unwraps to the original trap.
The integrity audit (W-2) observed a
wasm error: unreachable(Rust guard panic) that was only visible in logs underDEBUG=*, and after the trap the guard's internal state (global policy context set bylabel_agent) could be silently corrupted for subsequent calls.Changes
internal/guard/wasm.gofailed/failedErrfields toWasmGuard(accessed only whilemuis held) to track post-trap stateisWasmTrap(err error) bool— detects errors containingwasm error:(the prefix wazero/Rust produces on any trap)callWasmFunctionnow:failedat entry and returns immediately with a wrapped error referencing the original traplogger.LogError("backend", ...)(visible inmcp-gateway.logwithoutDEBUG=*), setsfailed = trueinternal/guard/wasm_test.goTestIsWasmTrap: nil, generic, and various trap error stringsTestWasmGuardFailedState: module refuses further calls once failed; original error is reachable viaerrors.Is.github/workflows/integrity-filtering-audit.mdwasm error:/WASM guard trapto Step 3 grep patternsWarning
Firewall rules blocked me from connecting to one or more addresses (expand for details)
I tried to connect to the following addresses, but was blocked by firewall rules:
example.com/tmp/go-build2016825021/b334/launcher.test /tmp/go-build2016825021/b334/launcher.test -test.testlogfile=/tmp/go-build2016825021/b334/testlog.txt -test.paniconexit0 -test.timeout=10m0s /tmp/go-build2016825021/b297/vet.cfg ache/go/1.25.8/x-c=4 ache/go/1.25.8/x-nolocalimports x_amd64/vet .rs ssions.rs s hUvueDyGfjP- go_.�� 64/src/net ternal/boring/bbig/big.go 64/pkg/tool/linux_amd64/compile -I /tmp/go-build389-atomic -I 64/pkg/tool/linu-buildtags(dns block)invalid-host-that-does-not-exist-12345.com/tmp/go-build2016825021/b319/config.test /tmp/go-build2016825021/b319/config.test -test.testlogfile=/tmp/go-build2016825021/b319/testlog.txt -test.paniconexit0 -test.timeout=10m0s --no�� Xs_UNqwZq /home/REDACTED/work/gh-aw-mcpg/gh--c=4 x_amd64/vet(dns block)nonexistent.local/tmp/go-build2016825021/b334/launcher.test /tmp/go-build2016825021/b334/launcher.test -test.testlogfile=/tmp/go-build2016825021/b334/testlog.txt -test.paniconexit0 -test.timeout=10m0s /tmp/go-build2016825021/b297/vet.cfg ache/go/1.25.8/x-c=4 ache/go/1.25.8/x-nolocalimports x_amd64/vet .rs ssions.rs s hUvueDyGfjP- go_.�� 64/src/net ternal/boring/bbig/big.go 64/pkg/tool/linux_amd64/compile -I /tmp/go-build389-atomic -I 64/pkg/tool/linu-buildtags(dns block)slow.example.com/tmp/go-build2016825021/b334/launcher.test /tmp/go-build2016825021/b334/launcher.test -test.testlogfile=/tmp/go-build2016825021/b334/testlog.txt -test.paniconexit0 -test.timeout=10m0s /tmp/go-build2016825021/b297/vet.cfg ache/go/1.25.8/x-c=4 ache/go/1.25.8/x-nolocalimports x_amd64/vet .rs ssions.rs s hUvueDyGfjP- go_.�� 64/src/net ternal/boring/bbig/big.go 64/pkg/tool/linux_amd64/compile -I /tmp/go-build389-atomic -I 64/pkg/tool/linu-buildtags(dns block)this-host-does-not-exist-12345.com/tmp/go-build2016825021/b343/mcp.test /tmp/go-build2016825021/b343/mcp.test -test.testlogfile=/tmp/go-build2016825021/b343/testlog.txt -test.paniconexit0 -test.timeout=10m0s -I /opt/hostedtoolcache/go/1.25.8/x-c=4 cfg x_amd64/vet --gdwarf-5 --64(dns block)If you need me to access, download, or install something from one of these locations, you can either:
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.