perf(dso): release bpfAllocationMutex during binaryParser.Parse — eliminate serialization #100

Open
KorsarOfficial wants to merge 1 commit into yandex:main from KorsarOfficial:perf/parse-outside-lock

@KorsarOfficial

Closes #94

Problem

populateDSO holds bpfAllocationMutex for the entire function body, including the expensive binaryParser.Parse(ctx, f) call. When multiple goroutines load different DSOs concurrently, they serialize on this lock even though each parses an independent ELF file.

Concurrency analysis

Let k = number of concurrent DSO loads, T_parse = ELF parse latency, T_fast = time spent in the fast-path check, and T_store = time spent storing the result.

| | Before | After |
|---|---|---|
| Lock hold time | T_fast + T_parse + T_store | T_fast + T_store |
| Concurrent Parse throughput | 1 (serialized) | k (parallel) |
| Total wall time for k DSOs | k · T_parse | max(T_parse) + O(T_store) |

For k = 8 concurrent loads, T_parse ≈ 50 ms each:

ΔT ≈ 8 × 50 ms − 50 ms = 350 ms saved

Implementation

Split into two critical sections with an unlocked Parse gap:

Section 1 (fast path):
    Lock()
    if bpfAllocation exists → MoveFromCache → Unlock, return
    if stale → Release, nil out
    Unlock()

    ← Parse runs here, no lock held →

Section 2 (store result):
    Lock(); defer Unlock()
    if bpfAllocation != nil → return  // double-check guard
    dso.BinaryClass = binaryClass     // set under lock
    bpfAllocation = bpfBinaryManager.Add(...)

Key correctness details:

  • BinaryClass is computed as a local variable during the unlocked gap, then written to dso.BinaryClass only under the lock — no data race.
  • Double-check guard prevents duplicate bpfBinaryManager.Add when two goroutines race through Parse.
  • MoveFromCache / Release remain under exclusive lock (they mutate BPF manager state).
  • sync.Mutex retained (not RWMutex — no read-lock callers exist).
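
The two-section pattern and its double-check guard can be sketched in isolation as below. This is a simplified illustration, not the PR's code. The `registry`, `dso`, and `allocation` types, the `parse` stand-in, and the `adds` counter are all hypothetical, and the stale-allocation branch is left out.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// allocation is a hypothetical stand-in for the BPF allocation handle.
type allocation struct{ class string }

// dso is a simplified, hypothetical stand-in for the DSO record.
type dso struct {
	binaryClass string
	alloc       *allocation
}

// registry owns the shared mutex (the role of bpfAllocationMutex) and counts
// how often the stand-in for bpfBinaryManager.Add runs.
type registry struct {
	mu   sync.Mutex
	adds int
}

// parse stands in for binaryParser.Parse: slow, and touches no shared state.
func parse() string {
	time.Sleep(10 * time.Millisecond)
	return "elf64"
}

func (r *registry) populate(d *dso) {
	// Section 1: fast path under the lock.
	r.mu.Lock()
	if d.alloc != nil {
		r.mu.Unlock()
		return
	}
	r.mu.Unlock()

	// Unlocked gap: the result is kept in a local variable only.
	class := parse()

	// Section 2: store the result under the lock.
	r.mu.Lock()
	defer r.mu.Unlock()
	if d.alloc != nil {
		return // double-check guard: another goroutine stored first
	}
	d.binaryClass = class
	d.alloc = &allocation{class: class}
	r.adds++
}

// run races k goroutines on one DSO and reports how many stores happened.
func run(k int) int {
	r, d := &registry{}, &dso{}
	var wg sync.WaitGroup
	for i := 0; i < k; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.populate(d)
		}()
	}
	wg.Wait()
	return r.adds
}

func main() {
	fmt.Println(run(8)) // prints 1: only one goroutine stores its result
}
```

The goroutines that lose the race still pay for a redundant parse. That trade-off seems acceptable when same-DSO races are rare compared with loads of different DSOs, which is the case the PR targets.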

@KorsarOfficial
Author

📄 Full analysis report (PDF): 08-perforator-optimizations.pdf

Covers complexity analysis, concurrency audit, and verification for all 7 optimizations.

Development

Successfully merging this pull request may close these issues.

perf(agent): release bpfAllocationMutex before expensive binaryParser.Parse
