Fix race in tunnel waitGroup causing negative counter panic#586
Merged
Conversation
Add() bumped its atomic counter before calling inner.Add(), so a concurrent DoneAll() could observe n>0 and call inner.Done() before inner.Add() ran, panicking with "sync: negative WaitGroup counter". Serialize Add/Done/DoneAll under a mutex; Wait stays outside to avoid deadlock. Fixes #585 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes a concurrency race in the tunnel’s custom waitGroup wrapper that could panic with sync: negative WaitGroup counter under concurrent Add() and DoneAll() usage (as reported in #585).
Changes:
- Replace the lock-free atomic counter with a
sync.Mutex-protected counter to makeAdd/Done/DoneAllobserve consistent state. - Adjust
Done/DoneAllto update the counter and underlyingsync.WaitGroupwhile holding the same mutex. - Add tests covering idempotent
Done,DoneAlldrain behavior, and a concurrent Add/DoneAll stress scenario.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
share/tunnel/wg.go |
Makes the wrapper’s counter updates and underlying sync.WaitGroup operations mutually consistent via a mutex, eliminating the observed race window. |
share/tunnel/wg_test.go |
Adds regression/stress tests to exercise the previously failing concurrent behavior and validate wrapper semantics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+13
to
+15
| w.n += n | ||
| w.inner.Add(n) | ||
| w.mu.Unlock() |
Comment on lines
+28
to
+33
| w.mu.Lock() | ||
| for w.n > 0 { | ||
| w.n-- | ||
| w.inner.Done() | ||
| } | ||
| w.mu.Unlock() |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
atomic.Int32counter inshare/tunnel/wg.gowith async.MutexsoAdd/Done/DoneAllobserve consistent stateAdd()previously bumped its atomic counter before callinginner.Add(). A concurrentDoneAll()(fromBindSSH's ctx-cancel goroutine) could observen>0and callinner.Done()in that window, panicking withsync: negative WaitGroup counter— exactly the stack trace in panic: sync: negative WaitGroup counter #585Wait()stays outside the mutex to avoid deadlock against blocked waitersFixes #585
Test plan
go test -race ./...passesshare/tunnel/wg_test.gocovering: idempotentDone,DoneAlldrain, and concurrent Add/DoneAll stress under-race🤖 Generated with Claude Code