fix: sort instruction discovery order for deterministic Build IDs across platforms #468
Coolomina wants to merge 1 commit into microsoft:main
…oss platforms

os.walk() returns directory entries in filesystem-native order (APFS on macOS, ext4 on Linux), causing instructions within each pattern group to be written in a different sequence per platform. This makes the Build ID hash non-deterministic, breaking CI checks and preventing apm compile from being used in pre-commit hooks.

Sort dirs and files in _get_all_files() and sort pattern_instructions by file_path in template_builder, distributed_compiler, and claude_formatter before writing.

Fixes microsoft#467
@microsoft-github-policy-service agree
Pull request overview
Fixes non-deterministic Build ID generation by making instruction discovery and rendering order deterministic, so apm compile produces byte-for-byte stable outputs across filesystems/OSes.
Changes:
- Sort `os.walk()` directory and file traversal in `ContextOptimizer._get_all_files()`.
- Sort pattern groups and instructions in the single-file `AGENTS.md` template output.
- Sort instructions within pattern groups for distributed `AGENTS.md` and `CLAUDE.md` outputs.
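
A minimal sketch of the traversal change, assuming a helper shaped roughly like `_get_all_files()`. The name `walk_sorted` and the choice to return relative POSIX strings are illustrative, not the project's actual code. The in-place sort of `dirs` is the important detail: in top-down mode `os.walk()` reads that same list to decide which subdirectories to visit next.

```python
import os
from pathlib import Path


def walk_sorted(root):
    """Return every file under root as a relative POSIX path, in a stable order.

    Hypothetical stand-in for ContextOptimizer._get_all_files().
    """
    root = Path(root)
    found = []
    for current, dirs, files in os.walk(root):
        # In-place sort: os.walk() (top-down) descends into `dirs` in this order.
        dirs.sort()
        for name in sorted(files):
            found.append((Path(current) / name).relative_to(root).as_posix())
    return found
```

The result lists the files of each directory before those of its subdirectories, so it is not a flat lexicographic sort of all paths, but it no longer depends on the order the filesystem reports entries in.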
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/apm_cli/compilation/context_optimizer.py | Sorts dirs/files during os.walk() to make discovery deterministic. |
| src/apm_cli/compilation/template_builder.py | Sorts pattern groups and instructions when building conditional sections for single-file compilation. |
| src/apm_cli/compilation/distributed_compiler.py | Sorts instructions by file_path within each applyTo group when generating distributed AGENTS.md. |
| src/apm_cli/compilation/claude_formatter.py | Sorts instructions by file_path within each applyTo group when generating CLAUDE.md. |

      # Combine content from all instructions for this pattern
    - for instruction in pattern_instructions:
    + for instruction in sorted(pattern_instructions, key=lambda i: str(i.file_path)):

The instruction sort key uses str(i.file_path), which is OS-dependent (e.g., path separators and drive prefixes differ on Windows vs POSIX). Since this is meant to make Build IDs deterministic across platforms, consider sorting by a normalized POSIX form (e.g., i.file_path.as_posix() or the same portable_relpath(...) representation used for the source comments) so ordering is identical across Windows/macOS/Linux.
Suggested change:

    - for instruction in sorted(pattern_instructions, key=lambda i: str(i.file_path)):
    + for instruction in sorted(pattern_instructions, key=lambda i: i.file_path.as_posix()):

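
To illustrate the reviewer's point, the sketch below uses `pathlib`'s pure path classes so that both platforms can be simulated on one machine. The directory names are made up for the example. They were picked because a digit sorts between `/` and `\`, which is exactly where a `str()` key and an `as_posix()` key diverge.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical instruction paths, chosen to expose the separator difference.
names = ["docs2/x.md", "docs/x.md"]

posix_paths = [PurePosixPath(n) for n in names]
windows_paths = [PureWindowsPath(n) for n in names]


def order(paths, key):
    """Sort paths with the given key and report them in POSIX form."""
    return [p.as_posix() for p in sorted(paths, key=key)]


# str() keeps the native separator: "/" (0x2F) sorts before "2" (0x32),
# while "\" (0x5C) sorts after it, so the two platforms disagree.
posix_by_str = order(posix_paths, str)
windows_by_str = order(windows_paths, str)

# as_posix() always uses "/", so both platforms agree.
posix_by_posix = order(posix_paths, lambda p: p.as_posix())
windows_by_posix = order(windows_paths, lambda p: p.as_posix())
```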

      for pattern, pattern_instructions in sorted(pattern_groups.items()):
          sections.append(f"## Files matching `{pattern}`")
          sections.append("")

          # Combine content from all instructions for this pattern
    -     for instruction in pattern_instructions:
    +     for instruction in sorted(pattern_instructions, key=lambda i: str(i.file_path)):
              content = instruction.content.strip()

There are existing unit tests for build_conditional_sections, but they don't assert deterministic ordering within a pattern group. Since this change is specifically about fixing non-deterministic output, it would be good to add a regression assertion that instructions are emitted in sorted file_path order (and that pattern groups are emitted in sorted pattern order) given a deliberately shuffled input list.
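
One possible shape for such a regression test is sketched below. The signature of `build_conditional_sections` does not appear in this conversation, so `render_sections` and the `Instruction` dataclass are hypothetical stand-ins that only mirror the loop shown in the diff. A real test would call the project's function instead.

```python
import random
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Instruction:
    # Hypothetical minimal model, not the project's instruction class.
    file_path: PurePosixPath
    content: str


def render_sections(pattern_groups):
    """Stand-in for build_conditional_sections, mirroring the loop in the diff."""
    sections = []
    for pattern, pattern_instructions in sorted(pattern_groups.items()):
        sections.append(f"## Files matching `{pattern}`")
        sections.append("")
        for instruction in sorted(pattern_instructions, key=lambda i: i.file_path.as_posix()):
            sections.append(instruction.content.strip())
            sections.append("")
    return "\n".join(sections)


def test_output_ignores_input_order():
    instructions = [
        Instruction(PurePosixPath(f"instructions/{n}.md"), f"rule {n}") for n in "abcde"
    ]
    py_group = instructions[:]
    ts_group = instructions[:2]
    baseline = render_sections({"**/*.py": py_group, "**/*.ts": ts_group})

    rng = random.Random(0)  # fixed seed keeps the test itself reproducible
    for _ in range(20):
        rng.shuffle(py_group)
        rng.shuffle(ts_group)
        # Insert the groups in reverse order to exercise pattern sorting too.
        groups = {"**/*.ts": ts_group, "**/*.py": py_group}
        assert render_sections(groups) == baseline
```

Comparing every shuffled rendering against one baseline checks both properties at once: instruction order within a group and the order of the groups themselves.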

      sections.append("")

    - for instruction in pattern_instructions:
    + for instruction in sorted(pattern_instructions, key=lambda i: str(i.file_path)):

The instruction sort key uses str(i.file_path), which can differ across platforms (notably Windows vs POSIX) due to path separator/drive formatting. To fully guarantee deterministic ordering across platforms, sort by a normalized representation (e.g., portable_relpath(instruction.file_path, self.base_dir) or instruction.file_path.as_posix()) instead.
Suggested change:

    - for instruction in sorted(pattern_instructions, key=lambda i: str(i.file_path)):
    + for instruction in sorted(
    +     pattern_instructions,
    +     key=lambda i: portable_relpath(i.file_path, self.base_dir),
    + ):

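
The project's `portable_relpath` helper is not shown in this conversation, so the sketch below uses a hypothetical `relative_sort_key` to illustrate why a relative, POSIX-style key is attractive: it removes both the separator difference and the machine-specific checkout prefix. The example paths are invented.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def relative_sort_key(file_path: PurePath, base_dir: PurePath) -> str:
    """Hypothetical sort key: path relative to base_dir, with "/" separators.

    Not the project's portable_relpath(); only an illustration of the idea.
    """
    try:
        return file_path.relative_to(base_dir).as_posix()
    except ValueError:
        # file_path is outside base_dir: fall back to the full POSIX form.
        return file_path.as_posix()


# The same repository file as seen from two different checkouts.
linux_key = relative_sort_key(
    PurePosixPath("/home/ci/repo/instructions/style.md"),
    PurePosixPath("/home/ci/repo"),
)
windows_key = relative_sort_key(
    PureWindowsPath(r"C:\work\repo\instructions\style.md"),
    PureWindowsPath(r"C:\work\repo"),
)
```

Both calls yield the same key, so a sort based on it cannot be influenced by drive letters or by where the repository was cloned.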

      sections.append("")

    - for instruction in pattern_instructions:
    + for instruction in sorted(pattern_instructions, key=lambda i: str(i.file_path)):

The instruction sort key uses str(i.file_path), which is OS-dependent (Windows vs POSIX path formatting). To ensure instruction ordering (and therefore Build IDs) are deterministic across platforms, sort by a normalized POSIX form such as portable_relpath(instruction.file_path, self.base_dir) or instruction.file_path.as_posix() instead of str(...).
Suggested change:

    - for instruction in sorted(pattern_instructions, key=lambda i: str(i.file_path)):
    + for instruction in sorted(
    +     pattern_instructions,
    +     key=lambda i: portable_relpath(i.file_path, self.base_dir),
    + ):

What

Fixes non-deterministic `Build ID` generation in `apm compile` when the same project is compiled on macOS vs Linux (closes #467).

Why

`os.walk()` returns directory entries in filesystem-native order: APFS on macOS uses insertion order, ext4 on Linux uses inode/hash order. Instructions within each `## Files matching` pattern group were written in that discovery order without sorting. This caused the final `AGENTS.md` content to differ between platforms, producing a different SHA-256 Build ID hash even when no instruction content had changed.

Practical consequences:

- `AGENTS.md` when the file was compiled by a macOS developer
- `apm compile` cannot be used reliably in pre-commit hooks

Changes

- `context_optimizer.py`: sort `dirs` and `files` in `_get_all_files()` so `os.walk()` traversal order is consistent across filesystems
- `template_builder.py`: sort pattern groups and instructions by `file_path` in the single-file compilation path (`--single-agents`)
- `distributed_compiler.py`: sort instructions by `file_path` within each pattern group in the distributed AGENTS.md path
- `claude_formatter.py`: same fix for the CLAUDE.md path

Verification

Compiled the same 5-instruction project on macOS and inside a `python:3.12-slim` Docker container (ext4). Both now produce `<!-- Build ID: e316dee8ef7f -->`.

Test plan

- `uv run pytest tests/unit tests/test_console.py -x`: 3080 passed
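
The effect described under "Why" can be reproduced in a few lines. The sketch assumes the Build ID is the first 12 hex characters of a SHA-256 digest over the compiled text; the 12-character length is inferred from the ID quoted above, and the exact input to the hash is not shown in this PR. The instruction paths and contents are invented.

```python
import hashlib


def build_id(content: str) -> str:
    """Assumed scheme: first 12 hex characters of the SHA-256 of the content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]


def compile_group(instructions, sort=False):
    """Join (file_path, content) pairs under one pattern heading."""
    if sort:
        instructions = sorted(instructions, key=lambda pair: pair[0])
    body = "\n\n".join(content for _, content in instructions)
    return "## Files matching `**/*.py`\n\n" + body + "\n"


# The same two instructions, discovered in different orders on two machines.
macos_order = [
    ("instructions/b.md", "Use type hints."),
    ("instructions/a.md", "Prefer pathlib."),
]
linux_order = list(reversed(macos_order))

# Without sorting, the discovery order leaks into the text and into the hash.
unsorted_ids = {
    build_id(compile_group(macos_order)),
    build_id(compile_group(linux_order)),
}

# With sorting, both machines render identical text and share one ID.
sorted_ids = {
    build_id(compile_group(macos_order, sort=True)),
    build_id(compile_group(linux_order, sort=True)),
}
```

Here `unsorted_ids` ends up with two distinct values and `sorted_ids` with one, which is the behavior change this PR aims for.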