
fix: sort instruction discovery order for deterministic Build IDs across platforms #468

Open
Coolomina wants to merge 1 commit into microsoft:main from Coolomina:fix/build-id-nondeterminism

Conversation

@Coolomina

What

Fixes non-deterministic Build ID generation in apm compile when the same project is compiled on macOS vs Linux (closes #467).

Why

os.walk() returns directory entries in filesystem-native order — APFS on macOS uses insertion order, ext4 on Linux uses inode/hash order. Instructions within each ## Files matching pattern group were written in that discovery order without sorting. This caused the final AGENTS.md content to differ between platforms, producing a different SHA-256 Build ID hash even when no instruction content had changed.
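A minimal sketch of why ordering alone changes the hash. The `build_id` helper here is a hypothetical stand-in, not the function apm uses: it assumes the Build ID is the first 12 hex characters of a SHA-256 digest over the concatenated instruction content.

```python
import hashlib


def build_id(chunks):
    """Hypothetical stand-in: first 12 hex chars of SHA-256 over the joined content."""
    joined = "\n".join(chunks).encode("utf-8")
    return hashlib.sha256(joined).hexdigest()[:12]


# Same two instructions, discovered in a different order on each platform.
macos_order = ["# Python rules", "# Testing rules"]
linux_order = ["# Testing rules", "# Python rules"]

# Identical content in a different sequence hashes differently.
unsorted_differs = build_id(macos_order) != build_id(linux_order)

# Sorting before hashing removes the dependence on discovery order.
sorted_matches = build_id(sorted(macos_order)) == build_id(sorted(linux_order))
```

Nothing about the instruction text changes between the two lists; only the sequence does, which is enough to change the digest.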

Practical consequences:

  • CI (Ubuntu) always detects a dirty AGENTS.md when the file was compiled by a macOS developer
  • apm compile cannot be used reliably in pre-commit hooks

Changes

  • context_optimizer.py — sort dirs and files in _get_all_files() so os.walk() traversal order is consistent across filesystems
  • template_builder.py — sort pattern groups and instructions by file_path in the single-file compilation path (--single-agents)
  • distributed_compiler.py — sort instructions by file_path within each pattern group in the distributed AGENTS.md path
  • claude_formatter.py — same fix for the CLAUDE.md path
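The traversal fix in the first bullet can be sketched as follows. This is an illustration of the technique, not the actual body of `_get_all_files()`; `walk_sorted` and `demo` are hypothetical names.

```python
import os
import tempfile


def walk_sorted(root):
    """Yield file paths under root in an order that does not depend on the filesystem."""
    for current, dirs, files in os.walk(root):
        # Sort in place: os.walk() reads this same list to decide where to descend next.
        dirs.sort()
        for name in sorted(files):
            yield os.path.join(current, name)


def demo():
    """Create files in a scrambled order and return the sorted traversal result."""
    with tempfile.TemporaryDirectory() as root:
        for rel in ["b/z.md", "a/y.md", "a/x.md", "c.md"]:
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w"):
                pass
        return [os.path.relpath(p, root).replace(os.sep, "/") for p in walk_sorted(root)]
```

The in-place `dirs.sort()` matters: assigning a new list (`dirs = sorted(dirs)`) would not affect the order in which `os.walk()` visits subdirectories.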

Verification

Compiled the same 5-instruction project on macOS and inside a python:3.12-slim Docker container (ext4). Both now produce <!-- Build ID: e316dee8ef7f -->.

Test plan

  • uv run pytest tests/unit tests/test_console.py -x — 3080 passed

…oss platforms

os.walk() returns directory entries in filesystem-native order (APFS on macOS,
ext4 on Linux), causing instructions within each pattern group to be written in
a different sequence per platform. This makes the Build ID hash non-deterministic,
breaking CI checks and preventing apm compile from being used in pre-commit hooks.

Sort dirs and files in _get_all_files() and sort pattern_instructions by file_path
in template_builder, distributed_compiler, and claude_formatter before writing.

Fixes microsoft#467
Copilot AI review requested due to automatic review settings March 26, 2026 16:12
@Coolomina
Author

@microsoft-github-policy-service agree

Contributor

Copilot AI left a comment

Pull request overview

Fixes non-deterministic Build ID generation by making instruction discovery and rendering order deterministic, so apm compile produces byte-for-byte stable outputs across filesystems/OSes.

Changes:

  • Sort os.walk() directory and file traversal in ContextOptimizer._get_all_files().
  • Sort pattern groups and instructions in the single-file AGENTS.md template output.
  • Sort instructions within pattern groups for distributed AGENTS.md and CLAUDE.md outputs.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/apm_cli/compilation/context_optimizer.py | Sorts dirs/files during os.walk() to make discovery deterministic. |
| src/apm_cli/compilation/template_builder.py | Sorts pattern groups and instructions when building conditional sections for single-file compilation. |
| src/apm_cli/compilation/distributed_compiler.py | Sorts instructions by file_path within each applyTo group when generating distributed AGENTS.md. |
| src/apm_cli/compilation/claude_formatter.py | Sorts instructions by file_path within each applyTo group when generating CLAUDE.md. |


      # Combine content from all instructions for this pattern
    - for instruction in pattern_instructions:
    + for instruction in sorted(pattern_instructions, key=lambda i: str(i.file_path)):

Copilot AI Mar 26, 2026

The instruction sort key uses str(i.file_path), which is OS-dependent (e.g., path separators and drive prefixes differ on Windows vs POSIX). Since this is meant to make Build IDs deterministic across platforms, consider sorting by a normalized POSIX form (e.g., i.file_path.as_posix() or the same portable_relpath(...) representation used for the source comments) so ordering is identical across Windows/macOS/Linux.

Suggested change

    - for instruction in sorted(pattern_instructions, key=lambda i: str(i.file_path)):
    + for instruction in sorted(pattern_instructions, key=lambda i: i.file_path.as_posix()):
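A small sketch of the separator issue raised in this comment. It uses `pathlib`'s pure path classes so both flavours can be compared on any OS; the two file names are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Two illustrative paths whose relative order depends on the separator character:
# "/" (0x2F) sorts before "0" (0x30), but "\" (0x5C) sorts after it.
names = ["docs/style.md", "docs0/style.md"]
posix_paths = [PurePosixPath(n) for n in names]
windows_paths = [PureWindowsPath(n) for n in names]

# Sorting by str() uses the native separator, so the two flavours disagree.
by_str_posix = [p.as_posix() for p in sorted(posix_paths, key=str)]
by_str_windows = [p.as_posix() for p in sorted(windows_paths, key=str)]

# Sorting by as_posix() compares forward-slash strings on both flavours.
by_posix_windows = [p.as_posix() for p in sorted(windows_paths, key=lambda p: p.as_posix())]
```

With `key=str` the Windows flavour puts `docs0/style.md` first, while `as_posix()` gives the same order as the POSIX flavour.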

Comment on lines +37 to 43
    + for pattern, pattern_instructions in sorted(pattern_groups.items()):
          sections.append(f"## Files matching `{pattern}`")
          sections.append("")

          # Combine content from all instructions for this pattern
    -     for instruction in pattern_instructions:
    +     for instruction in sorted(pattern_instructions, key=lambda i: str(i.file_path)):
              content = instruction.content.strip()

Copilot AI Mar 26, 2026

There are existing unit tests for build_conditional_sections, but they don't assert deterministic ordering within a pattern group. Since this change is specifically about fixing non-deterministic output, it would be good to add a regression assertion that instructions are emitted in sorted file_path order (and that pattern groups are emitted in sorted pattern order) given a deliberately shuffled input list.
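One way such a regression test could look, as a sketch only. `FakeInstruction` and `render_sections` are hypothetical stand-ins written for this example; the real test would call `build_conditional_sections` with the project's own instruction type.

```python
import random
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class FakeInstruction:
    """Hypothetical stand-in for the project's instruction object."""
    file_path: PurePosixPath
    content: str


def render_sections(pattern_groups):
    """Hypothetical stand-in that mirrors the sorted loops shown in the diff."""
    sections = []
    for pattern, instructions in sorted(pattern_groups.items()):
        sections.append(f"## Files matching `{pattern}`")
        sections.append("")
        for instruction in sorted(instructions, key=lambda i: i.file_path.as_posix()):
            sections.append(instruction.content.strip())
            sections.append("")
    return "\n".join(sections)


def test_output_is_independent_of_input_order():
    py_rules = [FakeInstruction(PurePosixPath(f"rules/{n}.md"), f"rule {n}") for n in "abcde"]
    md_rules = [FakeInstruction(PurePosixPath("docs/style.md"), "doc style")]
    expected = render_sections({"**/*.md": md_rules, "**/*.py": py_rules})

    rng = random.Random(0)  # fixed seed keeps the test itself reproducible
    for _ in range(10):
        shuffled = py_rules[:]
        rng.shuffle(shuffled)
        # Groups are also inserted in the opposite order from the expected run.
        assert render_sections({"**/*.py": shuffled, "**/*.md": md_rules}) == expected

    # Instructions appear in file_path order, and pattern groups in pattern order.
    assert expected.index("rule a") < expected.index("rule b") < expected.index("rule c")
    assert expected.index("**/*.md") < expected.index("**/*.py")
```

Shuffling with a seeded `random.Random` exercises several input orders without making the test flaky.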

      sections.append("")

    - for instruction in pattern_instructions:
    + for instruction in sorted(pattern_instructions, key=lambda i: str(i.file_path)):

Copilot AI Mar 26, 2026

The instruction sort key uses str(i.file_path), which can differ across platforms (notably Windows vs POSIX) due to path separator/drive formatting. To fully guarantee deterministic ordering across platforms, sort by a normalized representation (e.g., portable_relpath(instruction.file_path, self.base_dir) or instruction.file_path.as_posix()) instead.

Suggested change

    - for instruction in sorted(pattern_instructions, key=lambda i: str(i.file_path)):
    + for instruction in sorted(
    +     pattern_instructions,
    +     key=lambda i: portable_relpath(i.file_path, self.base_dir),
    + ):
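To illustrate what a base-relative, forward-slash sort key buys, here is a sketch with a hypothetical helper. `relative_posix_key` is not the project's `portable_relpath`, whose implementation is not shown in this PR; the checkout locations are invented.

```python
from pathlib import PurePosixPath, PureWindowsPath


def relative_posix_key(file_path, base_dir):
    """Hypothetical sort key: path relative to base_dir, written with forward slashes."""
    return file_path.relative_to(base_dir).as_posix()


# The same project file checked out in two different places on two platforms.
linux_key = relative_posix_key(
    PurePosixPath("/home/ci/proj/rules/python.md"),
    PurePosixPath("/home/ci/proj"),
)
windows_key = relative_posix_key(
    PureWindowsPath(r"C:\Users\dev\proj\rules\python.md"),
    PureWindowsPath(r"C:\Users\dev\proj"),
)
```

Both keys come out as `rules/python.md`, so neither the drive prefix nor the checkout directory takes part in the comparison.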

      sections.append("")

    - for instruction in pattern_instructions:
    + for instruction in sorted(pattern_instructions, key=lambda i: str(i.file_path)):

Copilot AI Mar 26, 2026

The instruction sort key uses str(i.file_path), which is OS-dependent (Windows vs POSIX path formatting). To ensure instruction ordering (and therefore Build IDs) are deterministic across platforms, sort by a normalized POSIX form such as portable_relpath(instruction.file_path, self.base_dir) or instruction.file_path.as_posix() instead of str(...).

Suggested change

    - for instruction in sorted(pattern_instructions, key=lambda i: str(i.file_path)):
    + for instruction in sorted(
    +     pattern_instructions,
    +     key=lambda i: portable_relpath(i.file_path, self.base_dir),
    + ):


Development

Successfully merging this pull request may close these issues.

[BUG] apm compile produces different Build IDs on macOS vs Linux

3 participants