Source location pipeline and debug info emission #1444
Closed
micahscopes wants to merge 23 commits into
Add gimli-based DWARF emitter that generates .debug_line, .debug_info, and .debug_abbrev sections from compilation provenance data. The emitter consumes ResolvedProvenanceEntry records (PC range + file:line:col) and produces standard DWARF v5 sections as byte vectors. Also convert provenance format from byte offsets to line:col by resolving against file text during MIR lowering.
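The byte-offset to line:col conversion mentioned above can be sketched with a precomputed table of line starts. This is a minimal illustration, not the PR's code. `LineIndex` and `line_col` are hypothetical names. The description does not say which convention the provenance strings use, so 1-based lines and columns counted in chars (not bytes) are assumptions.

```rust
/// Hypothetical line index: byte offsets at which each line of a file starts.
pub struct LineIndex {
    line_starts: Vec<usize>,
}

impl LineIndex {
    pub fn new(text: &str) -> Self {
        // Line 1 starts at byte 0; every '\n' begins a new line right after it.
        let mut line_starts = vec![0];
        for (i, b) in text.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        LineIndex { line_starts }
    }

    /// Convert a byte offset (which must lie on a char boundary) into a
    /// 1-based (line, col) pair. Columns count chars, not bytes (an assumption).
    pub fn line_col(&self, text: &str, offset: usize) -> (usize, usize) {
        // Index of the last line start that is <= offset.
        let line = match self.line_starts.binary_search(&offset) {
            Ok(i) => i,
            Err(i) => i - 1, // i >= 1 because line_starts[0] == 0
        };
        let start = self.line_starts[line];
        let col = text[start..offset].chars().count();
        (line + 1, col + 1)
    }
}
```

A binary search over line starts keeps each lookup at O(log lines), which matters when every MIR statement's span is resolved.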
Strengthen provenance test to validate file:line:col-line:col format structure (numeric line/col values, dash separator). Add coverage test asserting >30% of PcMapEntry records have source attribution.
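A structural check on the `file:line:col-line:col` format could look like the parser below. `SourceRange` and `parse_provenance` are hypothetical names. Splitting from the right is a design choice so that a file path containing `:` or `-` still parses.

```rust
#[derive(Debug, PartialEq)]
pub struct SourceRange {
    pub file: String,
    pub start_line: u32,
    pub start_col: u32,
    pub end_line: u32,
    pub end_col: u32,
}

/// Parse a provenance string of the form `file:line:col-line:col`.
/// Returns None if the dash separator is missing or any line/col is non-numeric.
pub fn parse_provenance(s: &str) -> Option<SourceRange> {
    // The last '-' separates start from end; the end part is "line:col".
    let (start, end) = s.rsplit_once('-')?;
    let (end_line, end_col) = end.split_once(':')?;
    // Split the start part from the right: col, then line, then the file path.
    let mut parts = start.rsplitn(3, ':');
    let start_col = parts.next()?;
    let start_line = parts.next()?;
    let file = parts.next()?;
    if file.is_empty() {
        return None;
    }
    Some(SourceRange {
        file: file.to_string(),
        start_line: start_line.parse().ok()?,
        start_col: start_col.parse().ok()?,
        end_line: end_line.parse().ok()?,
        end_col: end_col.parse().ok()?,
    })
}
```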
Implement EthdebugBuilder that generates ethdebug/format-compatible JSON with compilation metadata, source files, and per-instruction source annotations. Consumes provenance data from ObjectArtifact observability to map bytecode offsets to source file:line:col ranges. Supports Info schema (compilation + programs array) and Program schema (contract name, environment, instruction-level source mapping).
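Per-instruction annotation comes down to finding the provenance range that contains a given bytecode offset. A sketch follows. It assumes half-open `[pc_start, pc_end)` ranges that are sorted by start and do not overlap. `PcEntry` and `source_for_pc` are hypothetical names, not the PR's types.

```rust
/// Hypothetical provenance record: a half-open PC range plus a
/// `file:line:col-line:col` source string.
pub struct PcEntry {
    pub pc_start: u32,
    pub pc_end: u32, // exclusive
    pub source: String,
}

/// Look up the source string for an instruction offset.
/// Assumes `entries` is sorted by `pc_start` and the ranges do not overlap.
pub fn source_for_pc(entries: &[PcEntry], pc: u32) -> Option<&str> {
    // Number of entries whose range starts at or before `pc`.
    let idx = entries.partition_point(|e| e.pc_start <= pc);
    if idx == 0 {
        return None;
    }
    let e = &entries[idx - 1];
    // `pc` may fall in a gap after that entry, i.e. unmapped bytecode.
    (pc < e.pc_end).then(|| e.source.as_str())
}
```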
SourceOrd(0) was both the default (no source) and the first valid ordinal, causing ambiguity. Use u32::MAX as the sentinel value so ordinal 0 can be a valid source table index.
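Here is a sketch of the sentinel approach. `SourceOrd(u32)` appears in the PR, but the `NONE` constant and the `is_none`/`index` helpers are hypothetical names used for illustration.

```rust
/// Ordinal into a per-function source table. u32::MAX means "no source",
/// so ordinal 0 is a valid table index.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct SourceOrd(pub u32);

impl SourceOrd {
    /// Hypothetical name for the sentinel.
    pub const NONE: SourceOrd = SourceOrd(u32::MAX);

    pub fn is_none(self) -> bool {
        self == Self::NONE
    }

    /// The source table index, or None for the sentinel.
    pub fn index(self) -> Option<usize> {
        if self.is_none() { None } else { Some(self.0 as usize) }
    }
}

impl Default for SourceOrd {
    // The default is "no source" rather than ordinal 0.
    fn default() -> Self {
        SourceOrd::NONE
    }
}
```

`Option<SourceOrd>` would occupy 8 bytes for a `u32` payload. The sentinel keeps the ordinal at 4 bytes, at the cost of making `u32::MAX` unusable as an index.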
Generate Datalog-style facts from compilation provenance:
- bytecode_range(object, section, pc_start, pc_end, func, block)
- ir_inst_at(func, inst, pc_start, pc_end)
- source_at(func, inst, file, start_line, start_col, end_line, end_col)
- pc_source(pc_start, pc_end, file, line, col)
- unmapped(pc_start, pc_end, reason)
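Each fact is a relation name applied to a tuple of constants. The renderer below prints facts in Datalog surface syntax (`name(arg, ...).`). The PR does not state its emitter's output format, so this format is an assumption, and `Arg`/`render_fact` are hypothetical. String escaping uses Rust's `{:?}` formatting, which may not match every Datalog engine's escaping rules.

```rust
use std::fmt::Write;

/// A fact argument: a number or a quoted symbol.
pub enum Arg {
    Num(u64),
    Sym(String),
}

/// Render one fact, e.g. `pc_source(0, 5, "a.fe", 1, 1).`
pub fn render_fact(name: &str, args: &[Arg]) -> String {
    let mut out = String::new();
    out.push_str(name);
    out.push('(');
    for (i, a) in args.iter().enumerate() {
        if i > 0 {
            out.push_str(", ");
        }
        match a {
            Arg::Num(n) => write!(out, "{n}").unwrap(),
            // Debug formatting wraps the string in quotes and escapes it.
            Arg::Sym(s) => write!(out, "{s:?}").unwrap(),
        }
    }
    out.push_str(").");
    out
}
```

For example, an unmapped range with a hypothetical reason tag renders as `unmapped(5, 8, "no_provenance").`.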
Expose per-function source tables via a dedicated salsa-tracked query, following the SCIP/doc gen pattern. Consumers (LSP, web debugger, agent tooling) can read source tables without triggering full compilation.
Add FeTypeDesc enum covering Fe's type system (UInt, Int, Bool, Address, Bytes, String, Struct, Enum, Array) and generate corresponding DWARF DW_TAG_base_type and DW_TAG_structure_type DIEs with proper encoding attributes.
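The sketch below shows how a type description might map to the `DW_AT_encoding` and `DW_AT_byte_size` attributes of a `DW_TAG_base_type` DIE. The `DW_ATE_*` values come from the DWARF specification. `TypeDesc` is a simplified, hypothetical stand-in for `FeTypeDesc`, and the byte sizes chosen for `Bool` and `Address` are assumptions.

```rust
// DWARF base-type encodings (values from the DWARF specification).
pub const DW_ATE_BOOLEAN: u8 = 0x02;
pub const DW_ATE_SIGNED: u8 = 0x05;
pub const DW_ATE_UNSIGNED: u8 = 0x08;

/// Hypothetical, simplified stand-in for FeTypeDesc (subset of variants).
pub enum TypeDesc {
    UInt { bits: u32 },
    Int { bits: u32 },
    Bool,
    Address,
    Struct { name: String, fields: Vec<(String, TypeDesc)> },
}

/// (DW_AT_encoding, DW_AT_byte_size) for types emitted as DW_TAG_base_type;
/// None for aggregates, which would become DW_TAG_structure_type instead.
pub fn base_type_attrs(ty: &TypeDesc) -> Option<(u8, u64)> {
    match ty {
        TypeDesc::UInt { bits } => Some((DW_ATE_UNSIGNED, u64::from(*bits) / 8)),
        TypeDesc::Int { bits } => Some((DW_ATE_SIGNED, u64::from(*bits) / 8)),
        // Assumption: bool reported as 1 byte, although EVM stack slots are 32 bytes.
        TypeDesc::Bool => Some((DW_ATE_BOOLEAN, 1)),
        // Assumption: a 160-bit address modeled as a 20-byte unsigned integer.
        TypeDesc::Address => Some((DW_ATE_UNSIGNED, 20)),
        TypeDesc::Struct { .. } => None,
    }
}
```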
Add to_elf() method on DwarfDebugInfo that wraps debug sections in a standard ELF object file. The ELF contains only the debug sections (.debug_info, .debug_abbrev, .debug_line, etc.) and no code sections. It is readable by llvm-dwarfdump, readelf, and other standard DWARF tools.
Add fe_ty_to_type_desc() that converts Fe's TyId → FeTypeDesc by walking TyData/TyBase/PrimTy/AdtDef. Handles primitive types (UInt, Int, Bool, Address, String), struct types (with field enumeration via AdtDef fields), enum types (variant listing), and tuples.
Add FeFuncDesc + add_subprogram_dies() for generating DW_TAG_subprogram entries with function names, parameters, and declaration lines. Add provenance_works_with_contract_bytecode test that compiles a real Fe contract (simple_contract.fe) through the full Sonatina pipeline, verifying the provenance-enabled compilation produces valid bytecode.
Wire Fe's type system into ethdebug JSON output: add_type_from_desc() converts FeTypeDesc to ethdebug type schema entries with kind, bits, and contains fields. Supports uint, int, bool, address, string, bytes, struct (with typed fields), enum (with variants), and array types.
Compile a multi-function Fe program, extract provenance from observability output, verify that executable source lines are attributed to bytecode ranges. Tests the full pipeline from source through HIR/MIR/Sonatina to bytecode attribution.
Add EthdebugPointer (storage/memory/stack/calldata locations with slot/offset/length), EthdebugVariable (name + type + pointer), and per-instruction variable context emission in the Program schema.
Add DW_TAG_variable DIEs with DW_AT_location expressions using DW_OP_constu for storage slots, memory offsets, and stack depths. FeVarLocation describes where a variable lives at runtime.
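`DW_OP_constu` (opcode `0x10`) takes a single ULEB128-encoded operand, so a location expression of this kind is one opcode byte followed by the encoded number. `VarLocation` and `location_expr` are hypothetical names, loosely modeled on the `FeVarLocation` described above.

```rust
/// DW_OP_constu opcode (DWARF spec): pushes a ULEB128-encoded unsigned constant.
pub const DW_OP_CONSTU: u8 = 0x10;

/// Append `value` as unsigned LEB128: 7 bits per byte, low bits first,
/// high bit set on every byte except the last.
pub fn write_uleb128(out: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Hypothetical variable locations.
pub enum VarLocation {
    Storage { slot: u64 },
    Memory { offset: u64 },
    Stack { depth: u64 },
}

/// Build a DW_AT_location expression that pushes the slot, offset, or depth
/// as a constant. The expression alone does not say which address space the
/// number refers to; a consumer needs a separate convention for that.
pub fn location_expr(loc: &VarLocation) -> Vec<u8> {
    let n = match loc {
        VarLocation::Storage { slot } => *slot,
        VarLocation::Memory { offset } => *offset,
        VarLocation::Stack { depth } => *depth,
    };
    let mut expr = vec![DW_OP_CONSTU];
    write_uleb128(&mut expr, n);
    expr
}
```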
Persist MIR→Sonatina IR instruction mappings (MirToIrEntry) alongside the FrontendProvenanceMap. The datalog emitter now produces complete cross-level facts:
- mir_source(func, ord, file, start_line, start_col, end_line, end_col)
- mir_to_ir(func, ir_inst, mir_block, mir_stmt)
- ir_source_ord(func, ir_inst, source_ord)
- ir_to_pc(func, ir_inst, pc_start, pc_end)
- source_at(func, ir_inst, file, line, col, end_line, end_col)
- pc_source(pc_start, pc_end, file, line, col)
- bytecode_range(object, section, pc_start, pc_end, func, block)
- unmapped(pc_start, pc_end, reason)
Any consumer can now traverse the full chain: EVM PC → Sonatina IR InstId → MIR (block, stmt) → source file:line:col
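Traversing the chain amounts to a sequence of joins over the fact relations. The sketch below holds three of them in memory for a single function, dropping the `func` column. It resolves a PC to its IR instruction, MIR location, and source string. `Facts`, `Trace`, and `trace_pc` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory form of three relations, for one function.
pub struct Facts {
    /// ir_to_pc rows as (pc_start, pc_end, ir_inst), pc_end exclusive.
    pub ir_to_pc: Vec<(u32, u32, u32)>,
    /// mir_to_ir: ir_inst -> (mir_block, mir_stmt)
    pub mir_to_ir: HashMap<u32, (u32, u32)>,
    /// source_at: ir_inst -> "file:line:col"
    pub source_at: HashMap<u32, String>,
}

#[derive(Debug)]
pub struct Trace<'a> {
    pub ir_inst: u32,
    pub mir: Option<(u32, u32)>,
    pub source: Option<&'a str>,
}

impl Facts {
    /// PC -> IR instruction -> MIR (block, stmt) and source location.
    pub fn trace_pc(&self, pc: u32) -> Option<Trace<'_>> {
        // Find the IR instruction whose PC range contains `pc`.
        let &(_, _, ir_inst) = self
            .ir_to_pc
            .iter()
            .find(|&&(start, end, _)| start <= pc && pc < end)?;
        Some(Trace {
            ir_inst,
            mir: self.mir_to_ir.get(&ir_inst).copied(),
            source: self.source_at.get(&ir_inst).map(String::as_str),
        })
    }
}
```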
…observability
Add SourceOrd(u32) and FunctionSourceTable to carry source locations through the compilation pipeline. During MIR lowering, resolve SemOrigin (ExprId/StmtId) via BodySourceMap + LazySpan to concrete file:line:col positions. During Sonatina lowering, track which IR instructions each MIR statement produces and build a FrontendProvenanceMap mapping (FuncRef, InstId) to source location strings. The provenance map is applied to ObjectArtifact observability output, populating PcMapEntry.frontend_provenance with file:line:col-line:col format strings. Coverage is ~87% of code bytes on test fixtures.
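The ~87% figure is a fraction of code bytes, not of entries. The sketch below shows that measurement. It assumes each entry carries a half-open PC range and an optional `frontend_provenance` string. `PcRange` and `byte_coverage` are hypothetical names.

```rust
/// Hypothetical simplified PC map entry.
pub struct PcRange {
    pub pc_start: u32,
    pub pc_end: u32, // exclusive
    pub frontend_provenance: Option<String>,
}

/// Fraction of code bytes covered by entries that carry frontend provenance,
/// weighted by range length rather than by entry count.
pub fn byte_coverage(entries: &[PcRange]) -> f64 {
    let mut total = 0u64;
    let mut covered = 0u64;
    for e in entries {
        let len = u64::from(e.pc_end.saturating_sub(e.pc_start));
        total += len;
        if e.frontend_provenance.is_some() {
            covered += len;
        }
    }
    if total == 0 { 0.0 } else { covered as f64 / total as f64 }
}
```

Because this weights by range length, a byte-coverage figure can differ from a per-record threshold such as the >30% check in the coverage test.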
Force-pushed from 5642816 to 6df6368.
Web component renders 4-panel debug view (Fe source, MIR, Sonatina IR, EVM bytecode) with CSS-driven cross-representation hover highlighting. Hovering over any region highlights corresponding regions in all panels. Follows the same pattern as <fe-code-block> + SCIP store: structured JSON payload provides cross-level group mappings, component annotates spans with shared CSS classes, hover applies highlights globally.
Two panels (three on wide screens), each with a dropdown to select representation. Backed by a multi-representation SCIP index where each "file" is a compilation level (source, MIR, Sonatina IR, optimized IR, bytecode). Shared symbol strings across files enable cross-representation hover highlighting via the same djb2 hashing mechanism used by fe-code-block + SCIP store. Symbol format: funcname$ord:N connects the same SourceOrd across all representation levels.
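Cross-panel highlighting relies on every panel hashing the same symbol string to the same value. The Rust sketch below implements djb2 (`h = h * 33 + c`, seeded with 5381) and the `funcname$ord:N` symbol format. The `sym-` class prefix and the wrapping 32-bit arithmetic are assumptions. The web component may hash UTF-16 code units rather than bytes; the two agree for ASCII symbols.

```rust
/// djb2 string hash over bytes: h = h * 33 + byte, starting from 5381,
/// with wrapping 32-bit arithmetic (an assumption about the JS side).
pub fn djb2(s: &str) -> u32 {
    s.bytes()
        .fold(5381u32, |h, b| h.wrapping_mul(33).wrapping_add(u32::from(b)))
}

/// Symbol string in the `funcname$ord:N` format.
pub fn ord_symbol(func: &str, ord: u32) -> String {
    format!("{func}$ord:{ord}")
}

/// Hypothetical CSS class derived from a symbol. Every panel that renders
/// the same symbol gets the same class, so one hover rule highlights all.
pub fn highlight_class(symbol: &str) -> String {
    format!("sym-{:x}", djb2(symbol))
}
```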
Example in fe-web/examples/debug-explorer.html: drag-and-drop or URL param to load a multi-rep SCIP index JSON. Two configurable panels (three on wide screens) with dropdown selectors for representation. End-to-end test compiles a Fe program through the full pipeline, dumps all IR levels, generates the SCIP index, and writes the explorer files to /tmp/fe-debug-explorer/ for manual browser inspection.
The datalog emitter now accepts the Sonatina Module and emits full instruction-level facts:
- inst_type(func, inst, type): semantic classification (external_call, sstore, sload, call, branch, add, keccak, log, context, etc.)
- cfg_edge(func, from_block, to_block): control flow graph
- block_entry/block_exit(func, block, inst): block boundaries
- inst_result(func, inst, idx, value): instruction outputs
- inst_operand(func, inst, idx, value): instruction inputs
- value_def(func, value, inst): SSA value definitions
These facts, plus the existing cross-level mappings, form a queryable model of the compiled program. Reentrancy detection, gas attribution, dead code analysis, and taint tracking are all expressible as Datalog queries over these facts.
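Dead-code analysis over `cfg_edge` is one example of such a query: it is a reachability fixpoint. The sketch evaluates the two rules naively in Rust for a single function, dropping the `func` column. A Datalog engine would typically use semi-naive evaluation instead. The function names are hypothetical.

```rust
use std::collections::{BTreeSet, HashSet};

/// Naive fixpoint for the rules:
///   reachable(b) :- entry(b).
///   reachable(t) :- reachable(f), cfg_edge(f, t).
pub fn reachable(entry: u32, cfg_edge: &[(u32, u32)]) -> HashSet<u32> {
    let mut reached = HashSet::from([entry]);
    loop {
        let mut changed = false;
        for &(from, to) in cfg_edge {
            // insert returns true only when `to` is newly derived.
            if reached.contains(&from) && reached.insert(to) {
                changed = true;
            }
        }
        if !changed {
            return reached;
        }
    }
}

/// Blocks never derived as reachable are dead-code candidates.
pub fn dead_blocks(entry: u32, blocks: &[u32], cfg_edge: &[(u32, u32)]) -> BTreeSet<u32> {
    let live = reachable(entry, cfg_edge);
    blocks.iter().copied().filter(|b| !live.contains(b)).collect()
}
```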
Resolves HIR spans during MIR lowering, carries them through Sonatina codegen, and populates PcMapEntry.frontend_provenance. Adds DWARF, ethdebug, and datalog emitters, salsa-cached source tables, and cross-level MIR <> IR <> bytecode mappings.