
Source location pipeline and debug info emission #1444

Closed
micahscopes wants to merge 23 commits into argotorg:master from micahscopes:debug-info-mvp


@micahscopes commented May 15, 2026

Resolves HIR spans during MIR lowering, carries them through Sonatina codegen, and populates PcMapEntry.frontend_provenance. Adds DWARF, ethdebug, and Datalog emitters, Salsa-cached source tables, and cross-level MIR ↔ IR ↔ bytecode mappings.

Add a gimli-based DWARF emitter that generates .debug_line, .debug_info,
and .debug_abbrev sections from compilation provenance data. The emitter
consumes ResolvedProvenanceEntry records (PC range + file:line:col) and
produces standard DWARF v5 sections as byte vectors.
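
Before any opcodes are encoded, the emitter's core step amounts to turning provenance records into rows of the DWARF line-number matrix. The std-only sketch below shows that step without gimli; the field names on ResolvedProvenanceEntry, the LineRow type, and build_line_rows are hypothetical. In a gimli-based emitter, rows like these would then be fed to gimli's write::LineProgram to produce the .debug_line bytes.

```rust
/// Hypothetical stand-in for ResolvedProvenanceEntry: a PC range plus a
/// resolved source position. Field names are assumptions.
#[derive(Clone, Debug)]
pub struct ResolvedProvenanceEntry {
    pub pc_start: u64,
    pub pc_end: u64, // exclusive
    pub file: String,
    pub line: u64,
    pub column: u64,
}

/// One row of a DWARF line-number matrix, before opcode encoding.
#[derive(Debug, PartialEq)]
pub struct LineRow {
    pub address: u64,
    pub file_index: u64,
    pub line: u64,
    pub column: u64,
    pub end_sequence: bool,
}

/// One row per entry at its start PC, then an end_sequence row at the
/// highest end PC. `files` collects the file table as names are seen.
pub fn build_line_rows(entries: &[ResolvedProvenanceEntry], files: &mut Vec<String>) -> Vec<LineRow> {
    let mut sorted: Vec<&ResolvedProvenanceEntry> = entries.iter().collect();
    // Addresses within a DWARF line sequence must be non-decreasing.
    sorted.sort_by_key(|e| e.pc_start);
    let mut rows = Vec::new();
    let mut max_end: u64 = 0;
    for e in sorted {
        // Intern the file name so each row carries a file-table index.
        let pos = files.iter().position(|f| *f == e.file);
        let idx = match pos {
            Some(i) => i,
            None => {
                files.push(e.file.clone());
                files.len() - 1
            }
        };
        rows.push(LineRow {
            address: e.pc_start,
            file_index: idx as u64,
            line: e.line,
            column: e.column,
            end_sequence: false,
        });
        max_end = max_end.max(e.pc_end);
    }
    // Close the sequence one past the last covered byte.
    let tail = rows.last().map(|r| (r.file_index, r.line, r.column));
    if let Some((file_index, line, column)) = tail {
        rows.push(LineRow { address: max_end, file_index, line, column, end_sequence: true });
    }
    rows
}
```
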

Also convert the provenance format from byte offsets to line:col by resolving
offsets against the file text during MIR lowering.

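
Resolving a byte offset against file text comes down to counting newlines before the offset. A minimal sketch, assuming 1-based lines and columns with columns counted in bytes (the PR does not say whether columns are bytes or characters); offset_to_line_col is a hypothetical name. A real implementation would precompute line-start offsets once per file instead of rescanning for every lookup.

```rust
/// Convert a byte offset into 1-based (line, column), with the column
/// counted in bytes from the start of its line. Returns None if the
/// offset lies past the end of the text.
pub fn offset_to_line_col(text: &str, offset: usize) -> Option<(u32, u32)> {
    if offset > text.len() {
        return None;
    }
    let before = &text.as_bytes()[..offset];
    // Each '\n' before the offset advances the line number by one.
    let line = before.iter().filter(|&&b| b == b'\n').count() as u32 + 1;
    // The current line starts right after the last '\n' (or at byte 0).
    let line_start = before.iter().rposition(|&b| b == b'\n').map_or(0, |i| i + 1);
    let col = (offset - line_start) as u32 + 1;
    Some((line, col))
}
```
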
Strengthen the provenance test to validate the file:line:col-line:col format
structure (numeric line/col values, dash separator). Add a coverage test
asserting that more than 30% of PcMapEntry records have source attribution.

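
The format check and the coverage threshold can be sketched as a parser plus a ratio. This assumes provenance strings look exactly like file:line:col-line:col; SourceRange, parse_provenance, and attribution_ratio are hypothetical names, not the PR's test code. Splitting from the right keeps file paths containing ':' or '-' intact.

```rust
/// Parsed form of a provenance string such as "src/main.fe:3:5-3:18".
#[derive(Debug, PartialEq)]
pub struct SourceRange {
    pub file: String,
    pub start: (u32, u32),
    pub end: (u32, u32),
}

/// Parse "file:line:col-line:col"; every line/col must be numeric.
pub fn parse_provenance(s: &str) -> Option<SourceRange> {
    // The last '-' is the range separator; the file path may contain '-'.
    let (head, tail) = s.rsplit_once('-')?;
    let (end_line, end_col) = tail.split_once(':')?;
    // Peel "col" and "line" off the right; the remainder is the file path.
    let mut parts = head.rsplitn(3, ':');
    let start_col = parts.next()?;
    let start_line = parts.next()?;
    let file = parts.next()?;
    if file.is_empty() {
        return None;
    }
    Some(SourceRange {
        file: file.to_string(),
        start: (start_line.parse().ok()?, start_col.parse().ok()?),
        end: (end_line.parse().ok()?, end_col.parse().ok()?),
    })
}

/// Fraction of entries carrying a well-formed provenance string.
pub fn attribution_ratio(entries: &[Option<&str>]) -> f64 {
    if entries.is_empty() {
        return 0.0;
    }
    let ok = entries
        .iter()
        .filter_map(|e| *e)
        .filter(|s| parse_provenance(s).is_some())
        .count();
    ok as f64 / entries.len() as f64
}
```

A coverage test in this style would then assert something like `attribution_ratio(&provs) > 0.30`.
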
Implement EthdebugBuilder, which generates ethdebug/format-compatible JSON
with compilation metadata, source files, and per-instruction source
annotations. It consumes provenance data from the ObjectArtifact observability
output to map bytecode offsets to source file:line:col ranges.

Supports the Info schema (compilation + programs array) and the Program schema
(contract name, environment, instruction-level source mapping).

SourceOrd(0) was both the default (no source) and the first valid
ordinal, which was ambiguous. Use u32::MAX as the sentinel value so that
ordinal 0 can be a valid source table index.

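
The sentinel change can be pictured as follows. SourceOrd(u32) is named in the PR, but the NONE constant and the is_none/index helpers here are hypothetical; the point is that Default now yields the sentinel rather than ordinal 0.

```rust
/// Index into a function's source table. u32::MAX is reserved as the
/// "no source" sentinel, so ordinal 0 remains a valid index.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct SourceOrd(pub u32);

impl SourceOrd {
    /// Hypothetical name for the sentinel value.
    pub const NONE: SourceOrd = SourceOrd(u32::MAX);

    pub fn is_none(self) -> bool {
        self == Self::NONE
    }

    /// The source-table index, or None for the sentinel.
    pub fn index(self) -> Option<usize> {
        if self.is_none() {
            None
        } else {
            Some(self.0 as usize)
        }
    }
}

impl Default for SourceOrd {
    // "No source" is the default; ordinal 0 is a real table entry.
    fn default() -> Self {
        Self::NONE
    }
}
```
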
Generate Datalog-style facts from compilation provenance:
- bytecode_range(object, section, pc_start, pc_end, func, block)
- ir_inst_at(func, inst, pc_start, pc_end)
- source_at(func, inst, file, start_line, start_col, end_line, end_col)
- pc_source(pc_start, pc_end, file, line, col)
- unmapped(pc_start, pc_end, reason)

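
To show what emitting two of these relations could look like as text, here is a sketch that prints pc_source and unmapped facts one per line in Soufflé-style syntax (quoted strings, trailing period). The textual syntax, the PcRecord type, and the "no_provenance" reason string are assumptions; the PR does not specify the output encoding.

```rust
use std::fmt::Write;

/// Hypothetical record: a PC range with an optional (file, line, col).
pub struct PcRecord<'a> {
    pub pc_start: u32,
    pub pc_end: u32,
    pub source: Option<(&'a str, u32, u32)>,
}

/// Render pc_source / unmapped facts, one fact per line.
pub fn emit_pc_facts(records: &[PcRecord<'_>]) -> String {
    let mut out = String::new();
    for r in records {
        match r.source {
            Some((file, line, col)) => {
                // {:?} wraps the file name in quotes and escapes it.
                writeln!(out, "pc_source({}, {}, {:?}, {}, {}).", r.pc_start, r.pc_end, file, line, col).unwrap();
            }
            None => {
                // Hypothetical reason string for bytes with no provenance.
                writeln!(out, "unmapped({}, {}, \"no_provenance\").", r.pc_start, r.pc_end).unwrap();
            }
        }
    }
    out
}
```
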
Expose per-function source tables via a dedicated Salsa-tracked query,
following the SCIP/doc-gen pattern. Consumers (LSP, web debugger, agent
tooling) can read source tables without triggering a full compilation.

Add a FeTypeDesc enum covering Fe's type system (UInt, Int, Bool, Address,
Bytes, String, Struct, Enum, Array) and generate the corresponding DWARF
DW_TAG_base_type DIEs (with DW_AT_encoding attributes) and
DW_TAG_structure_type DIEs.

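
The split between base types and aggregates can be sketched as a function that returns a DW_AT_encoding value and byte size for scalar types and None for everything else. The variant payloads of FeTypeDesc below are hypothetical, and encoding Address as a 20-byte unsigned integer is an assumption; the DW_ATE_* constants are the values defined by the DWARF specification.

```rust
/// Hypothetical shape of FeTypeDesc; variant payloads are assumptions.
#[derive(Debug, Clone)]
pub enum FeTypeDesc {
    UInt { bits: u16 },
    Int { bits: u16 },
    Bool,
    Address,
    Bytes,
    String,
    Struct { name: String, fields: Vec<(String, FeTypeDesc)> },
    Enum { name: String, variants: Vec<String> },
    Array { elem: Box<FeTypeDesc>, len: u64 },
}

// DW_ATE_* encoding values from the DWARF specification.
pub const DW_ATE_BOOLEAN: u8 = 0x02;
pub const DW_ATE_SIGNED: u8 = 0x05;
pub const DW_ATE_UNSIGNED: u8 = 0x08;

/// For types emitted as DW_TAG_base_type, return (DW_AT_encoding,
/// DW_AT_byte_size). Aggregates and dynamically sized types return None;
/// structs would become DW_TAG_structure_type instead.
pub fn base_type_encoding(ty: &FeTypeDesc) -> Option<(u8, u64)> {
    match ty {
        FeTypeDesc::UInt { bits } => Some((DW_ATE_UNSIGNED, u64::from(*bits) / 8)),
        FeTypeDesc::Int { bits } => Some((DW_ATE_SIGNED, u64::from(*bits) / 8)),
        FeTypeDesc::Bool => Some((DW_ATE_BOOLEAN, 1)),
        // Assumption: an EVM address is modelled as a 160-bit unsigned int.
        FeTypeDesc::Address => Some((DW_ATE_UNSIGNED, 20)),
        _ => None,
    }
}
```
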
Add a to_elf() method on DwarfDebugInfo that wraps the debug sections in a
standard ELF object file. The ELF contains only debug sections (.debug_info,
.debug_abbrev, .debug_line, etc.) and no code sections, so it is readable by
llvm-dwarfdump, readelf, and other standard DWARF tools.

Add fe_ty_to_type_desc(), which converts Fe's TyId → FeTypeDesc by
walking TyData/TyBase/PrimTy/AdtDef. It handles primitive types (UInt,
Int, Bool, Address, String), struct types (with field enumeration
via AdtDef fields), enum types (variant listing), and tuples.

Add FeFuncDesc and add_subprogram_dies() for generating DW_TAG_subprogram
entries with function names, parameters, and declaration lines.

Add a provenance_works_with_contract_bytecode test that compiles a real
Fe contract (simple_contract.fe) through the full Sonatina pipeline,
verifying that the provenance-enabled compilation produces valid bytecode.

Wire Fe's type system into the ethdebug JSON output: add_type_from_desc()
converts a FeTypeDesc to ethdebug type schema entries with kind, bits,
and contains fields. It supports uint, int, bool, address, string, bytes,
struct (with typed fields), enum (with variants), and array types.

Compile a multi-function Fe program, extract provenance from the
observability output, and verify that executable source lines are
attributed to bytecode ranges. This tests the full pipeline from
source through HIR/MIR/Sonatina to bytecode attribution.

Add EthdebugPointer (storage/memory/stack/calldata locations with
slot/offset/length), EthdebugVariable (name + type + pointer), and
per-instruction variable context emission in the Program schema.

Add DW_TAG_variable DIEs with DW_AT_location expressions that use
DW_OP_constu for storage slots, memory offsets, and stack depths.
FeVarLocation describes where a variable lives at runtime.

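
A DW_OP_constu location expression is the opcode byte 0x10 followed by its operand in unsigned LEB128. The sketch below encodes one; the FeVarLocation variants are hypothetical mirrors of the three location kinds mentioned, and the sketch deliberately leaves open how a consumer tells storage, memory, and stack apart, since a bare DW_OP_constu only pushes a number.

```rust
/// DW_OP_constu opcode from the DWARF specification (operand: ULEB128).
pub const DW_OP_CONSTU: u8 = 0x10;

/// Hypothetical mirror of FeVarLocation; variant names are assumptions.
pub enum FeVarLocation {
    Storage { slot: u64 },
    Memory { offset: u64 },
    Stack { depth: u64 },
}

/// Unsigned LEB128: 7 bits per byte, low bits first, high bit = "more".
pub fn uleb128(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80);
    }
}

/// Encode a DW_AT_location expression consisting of one DW_OP_constu.
pub fn location_expr(loc: &FeVarLocation) -> Vec<u8> {
    let n = match loc {
        FeVarLocation::Storage { slot } => *slot,
        FeVarLocation::Memory { offset } => *offset,
        FeVarLocation::Stack { depth } => *depth,
    };
    let mut expr = vec![DW_OP_CONSTU];
    uleb128(n, &mut expr);
    expr
}
```
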
Persist MIR→Sonatina IR instruction mappings (MirToIrEntry) alongside
the FrontendProvenanceMap. The datalog emitter now produces complete
cross-level facts:

- mir_source(func, ord, file, start_line, start_col, end_line, end_col)
- mir_to_ir(func, ir_inst, mir_block, mir_stmt)
- ir_source_ord(func, ir_inst, source_ord)
- ir_to_pc(func, ir_inst, pc_start, pc_end)
- source_at(func, ir_inst, file, line, col, end_line, end_col)
- pc_source(pc_start, pc_end, file, line, col)
- bytecode_range(object, section, pc_start, pc_end, func, block)
- unmapped(pc_start, pc_end, reason)

Any consumer can now traverse the full chain:
  EVM PC → Sonatina IR InstId → MIR (block, stmt) → source file:line:col
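
The traversal described above is a chain of lookups over the emitted relations. This sketch loads simplified copies of ir_to_pc, mir_to_ir, ir_source_ord, and mir_source into memory and walks from a PC to a source position. The Facts and Trace types, integer function IDs, and the truncated mir_source tuple are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory copies of the emitted facts.
pub struct Facts {
    /// ir_to_pc(func, ir_inst, pc_start, pc_end), pc_end exclusive
    pub ir_to_pc: Vec<(u32, u32, u32, u32)>,
    /// mir_to_ir keyed by (func, ir_inst) -> (mir_block, mir_stmt)
    pub mir_to_ir: HashMap<(u32, u32), (u32, u32)>,
    /// ir_source_ord keyed by (func, ir_inst) -> source_ord
    pub ir_source_ord: HashMap<(u32, u32), u32>,
    /// mir_source keyed by (func, ord) -> (file, start_line, start_col)
    pub mir_source: HashMap<(u32, u32), (String, u32, u32)>,
}

/// Result of walking EVM PC -> IR inst -> MIR (block, stmt) -> source.
#[derive(Debug, PartialEq)]
pub struct Trace {
    pub func: u32,
    pub ir_inst: u32,
    pub mir: Option<(u32, u32)>,
    pub source: Option<(String, u32, u32)>,
}

pub fn trace_pc(facts: &Facts, pc: u32) -> Option<Trace> {
    // Step 1: the IR instruction whose PC range contains `pc`.
    let &(func, ir_inst, _, _) = facts
        .ir_to_pc
        .iter()
        .find(|&&(_, _, start, end)| start <= pc && pc < end)?;
    // Step 2: the MIR statement that produced it, if recorded.
    let mir = facts.mir_to_ir.get(&(func, ir_inst)).copied();
    // Step 3: IR inst -> SourceOrd -> resolved source location.
    let source = facts
        .ir_source_ord
        .get(&(func, ir_inst))
        .and_then(|ord| facts.mir_source.get(&(func, *ord)))
        .cloned();
    Some(Trace { func, ir_inst, mir, source })
}
```
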
…observability

Add SourceOrd(u32) and FunctionSourceTable to carry source locations
through the compilation pipeline. During MIR lowering, resolve SemOrigin
(ExprId/StmtId) via BodySourceMap + LazySpan to concrete file:line:col
positions. During Sonatina lowering, track which IR instructions each MIR
statement produces and build a FrontendProvenanceMap mapping (FuncRef,
InstId) to source location strings.
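
One way to "track which IR instructions each MIR statement produces" is to note the next instruction ID before and after lowering a statement and attribute the whole range to that statement's source location. Everything below (FunctionSourceTable's fields, ProvenanceBuilder, lower_stmt, the emit closure standing in for real lowering, and integer IDs in place of FuncRef/InstId) is a hypothetical sketch, not the PR's code.

```rust
use std::collections::HashMap;

/// Hypothetical per-function table: ordinal i indexes `locations[i]`.
#[derive(Default)]
pub struct FunctionSourceTable {
    pub locations: Vec<String>, // "file:line:col-line:col"
}

impl FunctionSourceTable {
    /// Append a resolved location and return its ordinal.
    pub fn push(&mut self, loc: &str) -> u32 {
        self.locations.push(loc.to_string());
        (self.locations.len() - 1) as u32
    }
}

/// Hypothetical builder for a (func_ref, inst_id) -> location map.
#[derive(Default)]
pub struct ProvenanceBuilder {
    pub next_inst: u32,
    pub map: HashMap<(u32, u32), String>,
}

impl ProvenanceBuilder {
    /// Lower one MIR statement. `emit` stands in for real lowering and
    /// returns how many IR instructions it produced; each new instruction
    /// is attributed to the statement's location. An out-of-range ordinal
    /// (such as the u32::MAX sentinel) records nothing.
    pub fn lower_stmt(&mut self, func: u32, table: &FunctionSourceTable, ord: u32, emit: impl FnOnce() -> u32) {
        let first = self.next_inst;
        self.next_inst += emit();
        if let Some(loc) = table.locations.get(ord as usize) {
            for inst in first..self.next_inst {
                self.map.insert((func, inst), loc.clone());
            }
        }
    }
}
```
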

The provenance map is applied to the ObjectArtifact observability output,
populating PcMapEntry.frontend_provenance with file:line:col-line:col
format strings. Coverage is ~87% of code bytes on the test fixtures.

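
A "% of code bytes" figure weights entries by their byte length rather than counting records. The sketch assumes PcMapEntry exposes a byte range; pc_start and pc_end are hypothetical field names, and only frontend_provenance comes from the PR. Bytes are marked once so overlapping ranges are not double-counted.

```rust
/// Hypothetical simplification of PcMapEntry: a byte range in the code
/// section plus its optional frontend provenance string.
pub struct PcMapEntry {
    pub pc_start: usize,
    pub pc_end: usize, // exclusive
    pub frontend_provenance: Option<String>,
}

/// Share of code bytes covered by at least one attributed entry.
pub fn byte_coverage(entries: &[PcMapEntry], code_len: usize) -> f64 {
    if code_len == 0 {
        return 0.0;
    }
    let mut covered = vec![false; code_len];
    for e in entries.iter().filter(|e| e.frontend_provenance.is_some()) {
        // Clamp to the code length and keep start <= end.
        let start = e.pc_start.min(code_len);
        let end = e.pc_end.min(code_len).max(start);
        for b in &mut covered[start..end] {
            *b = true;
        }
    }
    covered.iter().filter(|&&b| b).count() as f64 / code_len as f64
}
```
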
A web component renders a 4-panel debug view (Fe source, MIR, Sonatina IR,
EVM bytecode) with CSS-driven cross-representation hover highlighting.
Hovering over any region highlights the corresponding regions in all panels.

It follows the same pattern as <fe-code-block> + SCIP store: a structured
JSON payload provides cross-level group mappings, the component annotates
spans with shared CSS classes, and hovering applies highlights globally.

The debug explorer shows two panels (three on wide screens), each with a
dropdown to select a representation. It is backed by a multi-representation
SCIP index in which each "file" is a compilation level (source, MIR,
Sonatina IR, optimized IR, bytecode). Symbol strings shared across files
enable cross-representation hover highlighting via the same djb2 hashing
mechanism used by fe-code-block + SCIP store.
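
The shared-symbol trick works because hashing the same symbol string always yields the same CSS class, whichever panel the span lives in. The browser side is JavaScript; this Rust sketch only illustrates the classic additive djb2 (h = h * 33 + byte, seeded with 5381, wrapping at 32 bits). Which djb2 variant fe-code-block uses and its class-name format are not stated, so symbol_class and the "sym-" prefix are hypothetical. For ASCII symbols, hashing bytes and hashing UTF-16 code units feed the same sequence into the hash.

```rust
/// Classic djb2 string hash with wrapping 32-bit arithmetic.
pub fn djb2(s: &str) -> u32 {
    s.bytes()
        .fold(5381u32, |h, b| h.wrapping_mul(33).wrapping_add(u32::from(b)))
}

/// Hypothetical class-name scheme: every span tagged with the same symbol,
/// in any panel, gets the same class, so one hover rule highlights them all.
pub fn symbol_class(symbol: &str) -> String {
    format!("sym-{:08x}", djb2(symbol))
}
```
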

Symbol format: funcname$ord:N connects the same SourceOrd across
all representation levels.

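
Building and splitting these symbols is a small string operation. ord_symbol and parse_ord_symbol are hypothetical helpers; splitting at the last "$ord:" keeps function names that themselves contain '$' intact.

```rust
/// Build the cross-representation symbol for a SourceOrd in a function.
pub fn ord_symbol(func: &str, ord: u32) -> String {
    format!("{func}$ord:{ord}")
}

/// Split a symbol back into (function name, ordinal).
pub fn parse_ord_symbol(sym: &str) -> Option<(&str, u32)> {
    let (func, ord) = sym.rsplit_once("$ord:")?;
    if func.is_empty() {
        return None;
    }
    Some((func, ord.parse().ok()?))
}
```
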
An example in fe-web/examples/debug-explorer.html loads a multi-representation
SCIP index JSON via drag-and-drop or a URL parameter, with two configurable
panels (three on wide screens) and dropdown selectors for the representation.

An end-to-end test compiles a Fe program through the full pipeline, dumps
all IR levels, generates the SCIP index, and writes the explorer files
to /tmp/fe-debug-explorer/ for manual inspection in a browser.

The datalog emitter now accepts the Sonatina Module and emits full
instruction-level facts:

- inst_type(func, inst, type) — semantic classification (external_call,
  sstore, sload, call, branch, add, keccak, log, context, etc.)
- cfg_edge(func, from_block, to_block) — control flow graph
- block_entry/block_exit(func, block, inst) — block boundaries
- inst_result(func, inst, idx, value) — instruction outputs
- inst_operand(func, inst, idx, value) — instruction inputs
- value_def(func, value, inst) — SSA value definitions

Together with the existing cross-level mappings, these facts form a
queryable program model. Analyses such as reentrancy detection, gas
attribution, dead-code analysis, and taint tracking can be expressed as
Datalog queries over these facts.
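
To make the Datalog claim concrete, the sketch below evaluates one recursive rule over cfg_edge by naive fixpoint iteration (what a Datalog engine does, minus semi-naive optimization) and uses it in a simplified "an external call can reach a storage write" check. The relation names mirror the facts above, but the single-function scope, integer IDs, and the inst_block membership relation are assumptions. This is not the PR's analysis; a real reentrancy check would also need ordering within a block and call-target information.

```rust
use std::collections::HashSet;

/// Naive fixpoint for:
///   reach(A, B) :- cfg_edge(A, B).
///   reach(A, C) :- reach(A, B), cfg_edge(B, C).
pub fn reach(cfg_edge: &[(u32, u32)]) -> HashSet<(u32, u32)> {
    let mut closure: HashSet<(u32, u32)> = cfg_edge.iter().copied().collect();
    loop {
        let mut new = Vec::new();
        for &(a, b) in &closure {
            for &(b2, c) in cfg_edge {
                if b == b2 && !closure.contains(&(a, c)) {
                    new.push((a, c));
                }
            }
        }
        // Stop once an iteration derives no new facts.
        if new.is_empty() {
            return closure;
        }
        closure.extend(new);
    }
}

/// suspicious(Call, Store) :- inst_type(Call, "external_call"),
///     inst_type(Store, "sstore"), in_block(Call, B1),
///     in_block(Store, B2), reach(B1, B2).
/// Same-block ordering is ignored in this sketch.
pub fn call_then_store(
    inst_type: &[(u32, &str)],
    inst_block: &[(u32, u32)],
    cfg_edge: &[(u32, u32)],
) -> Vec<(u32, u32)> {
    let r = reach(cfg_edge);
    let block_of = |i: u32| inst_block.iter().find(|&&(x, _)| x == i).map(|&(_, b)| b);
    let mut out = Vec::new();
    for &(call, t1) in inst_type {
        for &(store, t2) in inst_type {
            if t1 != "external_call" || t2 != "sstore" {
                continue;
            }
            if let (Some(b1), Some(b2)) = (block_of(call), block_of(store)) {
                if r.contains(&(b1, b2)) {
                    out.push((call, store));
                }
            }
        }
    }
    out
}
```
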