
Fix graph orphans for omitted event types#3038

Open
liquidsec wants to merge 2 commits into 3.0 from fix-graph-orphan


@liquidsec
Contributor

Summary

Closes #1151

_is_graph_important() was rejecting events with _omit=True even when they were marked as _graph_important=True. This caused graph orphans in JSON/neo4j output when an omitted event type (e.g. a JS URL_UNVERIFIED) was the parent of a non-omitted event (e.g. DNS_NAME).
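To make the failure mode concrete, here is a minimal sketch of how dropping an omitted parent leaves its child dangling. The Event dataclass, its id/parent_id/omit fields, and find_orphans are hypothetical simplifications for illustration, not bbot's actual event model or output code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    # Hypothetical, simplified stand-in for a bbot event (not the real class).
    id: str
    type: str
    parent_id: Optional[str]
    omit: bool = False


def find_orphans(emitted: list[Event]) -> list[Event]:
    """Return emitted events whose parent was not emitted (roots have parent_id None)."""
    emitted_ids = {e.id for e in emitted}
    return [e for e in emitted if e.parent_id is not None and e.parent_id not in emitted_ids]


scan = [
    Event("scan", "SCAN", None),
    Event("js", "URL_UNVERIFIED", "scan", omit=True),
    Event("dns", "DNS_NAME", "js"),
]

# Dropping omitted events without re-linking their children leaves DNS_NAME dangling.
naive_output = [e for e in scan if not e.omit]
print([e.type for e in find_orphans(naive_output)])  # ['DNS_NAME']
```

With the full, unfiltered event list, find_orphans returns an empty list; the orphan only appears once the omitted URL_UNVERIFIED parent is filtered out.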

The _omit check in _is_graph_important() was redundant: output modules without _preserve_graph (stdout, txt, csv) already get False from _is_graph_important() because their self.preserve_graph is False. Only modules with _preserve_graph=True (json, neo4j) use graph importance, and those are exactly the modules that need to accept omitted parents to prevent orphans.

Fix

Remove the condition and not getattr(event, "_omit", False) from _is_graph_important().
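The effect of the change can be sketched as a before/after predicate. Only the removed `and not getattr(event, "_omit", False)` clause comes from this PR; the OutputModuleSketch class, its method names, and the assumption that the rest of the check is "preserve_graph and the event's _graph_important flag" are hypothetical simplifications.

```python
from types import SimpleNamespace


class OutputModuleSketch:
    """Hypothetical stand-in for an output module; models only the graph-importance check."""

    def __init__(self, preserve_graph: bool):
        self.preserve_graph = preserve_graph

    def is_graph_important_before(self, event) -> bool:
        # Pre-fix shape (assumed): omitted events are rejected even if graph-important.
        return (
            self.preserve_graph
            and getattr(event, "_graph_important", False)
            and not getattr(event, "_omit", False)
        )

    def is_graph_important_after(self, event) -> bool:
        # Post-fix shape (assumed): the _omit clause is gone; preserve_graph still gates it.
        return self.preserve_graph and getattr(event, "_graph_important", False)


omitted_parent = SimpleNamespace(_graph_important=True, _omit=True)
json_like = OutputModuleSketch(preserve_graph=True)
txt_like = OutputModuleSketch(preserve_graph=False)

print(json_like.is_graph_important_before(omitted_parent))  # False
print(json_like.is_graph_important_after(omitted_parent))   # True
print(txt_like.is_graph_important_after(omitted_parent))    # False
```

Because preserve_graph is checked first, a module like txt_like returns False whether or not the _omit clause is present, which is the redundancy argument made in the summary.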

Test

Added a test verifying that a _preserve_graph output module accepts graph-important omitted events, while a non-_preserve_graph module still rejects them.
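A self-contained sketch of what such a test checks is shown below. It is not the PR's actual test: the Event dataclass, the accepts filter, and the rule "accept if graph-important or not omitted" are hypothetical simplifications, and the graph_important flag is set by hand here rather than propagated from a child event.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    # Hypothetical, simplified event (not bbot's real event class).
    id: str
    type: str
    parent_id: Optional[str]
    omit: bool = False
    graph_important: bool = False


def accepts(preserve_graph: bool, event: Event) -> bool:
    """Hypothetical output filter modeled on the post-fix behavior."""
    graph_important = preserve_graph and event.graph_important  # no _omit check
    return graph_important or not event.omit


def test_omitted_graph_important_parent():
    scan = [
        Event("scan", "SCAN", None),
        Event("js", "URL_UNVERIFIED", "scan", omit=True, graph_important=True),
        Event("dns", "DNS_NAME", "js"),
    ]
    json_like = [e for e in scan if accepts(True, e)]
    txt_like = [e for e in scan if accepts(False, e)]

    # A _preserve_graph module keeps the omitted parent, so DNS_NAME is not orphaned.
    json_ids = {e.id for e in json_like}
    assert "URL_UNVERIFIED" in [e.type for e in json_like]
    assert all(e.parent_id is None or e.parent_id in json_ids for e in json_like)

    # A module without _preserve_graph still drops the omitted event.
    assert "URL_UNVERIFIED" not in [e.type for e in txt_like]


test_omitted_graph_important_parent()
```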

@github-actions
Contributor

📊 Performance Benchmark Report

Comparing 3.0 (baseline) vs fix-graph-orphan (current)

📈 Detailed Results (All Benchmarks)

📋 Complete results for all benchmarks - includes both significant and insignificant changes

| 🧪 Test Name | 📏 Base | 📏 Current | 📈 Change | 🎯 Status |
|---|---|---|---|---|
| Bloom Filter Dns Mutation Tracking Performance | 4.15ms | 4.14ms | -0.3% | |
| Bloom Filter Large Scale Dns Brute Force | 17.39ms | 17.21ms | -1.0% | |
| Large Closest Match Lookup | 357.18ms | 358.66ms | +0.4% | |
| Realistic Closest Match Workload | 188.25ms | 189.45ms | +0.6% | |
| Event Memory Medium Scan | 1783 B/event | 1784 B/event | +0.1% | |
| Event Memory Large Scan | 1768 B/event | 1768 B/event | -0.0% | |
| Event Validation Full Scan Startup Small Batch | 422.74ms | 408.67ms | -3.3% | |
| Event Validation Full Scan Startup Large Batch | 585.50ms | 581.18ms | -0.7% | |
| Make Event Autodetection Small | 31.13ms | 30.43ms | -2.2% | |
| Make Event Autodetection Large | 316.66ms | 310.13ms | -2.1% | |
| Make Event Explicit Types | 13.89ms | 13.59ms | -2.2% | |
| Excavate Single Thread Small | 3.932s | 3.912s | -0.5% | |
| Excavate Single Thread Large | 9.619s | 9.378s | -2.5% | |
| Excavate Parallel Tasks Small | 4.171s | 4.071s | -2.4% | |
| Excavate Parallel Tasks Large | 7.265s | 7.178s | -1.2% | |
| Is Ip Performance | 3.32ms | 3.19ms | -3.9% | |
| Make Ip Type Performance | 11.84ms | 11.40ms | -3.7% | |
| Mixed Ip Operations | 4.66ms | 4.47ms | -4.0% | |
| Memory Use Web Crawl | 66.8 MB | 48.4 MB | -27.5% | 🟢🟢🟢 🚀 |
| Memory Use Subdomain Enum | 19.3 MB | 19.3 MB | +0.0% | |
| Scan Throughput 100 | 7.380s | 8.044s | +9.0% | |
| Scan Throughput 1000 | 42.082s | 43.076s | +2.4% | |
| Typical Queue Shuffle | 65.01µs | 63.42µs | -2.5% | |
| Priority Queue Shuffle | 724.96µs | 733.79µs | +1.2% | |

🎯 Performance Summary

+ 1 improvement 🚀
  23 unchanged ✅

🔍 Significant Changes (>10%)

  • Memory Use Web Crawl: 27.5% 🚀 less memory

🐍 Python Version 3.11.15

@codecov

codecov Bot commented Apr 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91%. Comparing base (677f7c0) to head (5f8c8c3).

Additional details and impacted files
@@          Coverage Diff          @@
##             3.0   #3038   +/-   ##
=====================================
- Coverage     91%     91%   -0%     
=====================================
  Files        440     440           
  Lines      37550   37563   +13     
=====================================
+ Hits       33987   33992    +5     
- Misses      3563    3571    +8     


@TheTechromancer
Collaborator

> _is_graph_important() was rejecting events with _omit=True even when they were marked as _graph_important=True

This was intended functionality. Graph importance is meant to override internal events, but not omitted ones. Omitted events should always be excluded from output, and that should never create orphans, because the omitted event is skipped while the parent chain is preserved.

The mystery is why JS URLs specifically seem to be slipping through this system.
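One way to realize the behavior described in the comment above (skip the omitted event but keep the chain intact) is to re-link each surviving event to its nearest non-omitted ancestor. This is only an illustrative sketch; the comment does not show bbot's actual mechanism, and the Event dataclass and relink_past_omitted function are hypothetical. It also assumes every ancestor is present in the input list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    # Hypothetical, simplified event model (not bbot's real class).
    id: str
    type: str
    parent_id: Optional[str]
    omit: bool = False


def relink_past_omitted(events: list[Event]) -> list[Event]:
    """Drop omitted events and point each survivor at its nearest non-omitted ancestor."""
    by_id = {e.id: e for e in events}
    output = []
    for e in events:
        if e.omit:
            continue
        parent_id = e.parent_id
        # Walk upward through any chain of omitted ancestors.
        while parent_id is not None and by_id[parent_id].omit:
            parent_id = by_id[parent_id].parent_id
        output.append(Event(e.id, e.type, parent_id, e.omit))
    return output


scan = [
    Event("scan", "SCAN", None),
    Event("js", "URL_UNVERIFIED", "scan", omit=True),
    Event("dns", "DNS_NAME", "js"),
]
out = relink_past_omitted(scan)
print([(e.type, e.parent_id) for e in out])  # [('SCAN', None), ('DNS_NAME', 'scan')]
```

Under this design, omitted events never reach output and no child is orphaned, which is why an orphaned DNS_NAME behind a JS URL points to the omitted event escaping this re-linking step rather than to the omit check itself.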
