Skip to content

[codex] Hash-truncate long cache path segments#2279

Open
hansent wants to merge 5 commits into
mainfrom
codex/hash-truncate-cache-paths
Open

[codex] Hash-truncate long cache path segments#2279
hansent wants to merge 5 commits into
mainfrom
codex/hash-truncate-cache-paths

Conversation

@hansent
Copy link
Copy Markdown
Collaborator

@hansent hansent commented Apr 27, 2026

Summary

  • Add deterministic hash truncation for path segments longer than the Linux NAME_MAX limit.
  • Route Roboflow metadata and model artifact cache paths through the safe join helper.
  • Update direct cache path users and air-gapped cache detection to use the same bounded layout.
  • Add regression coverage for long instant-model slugs.

Why

Serverless deployments were seeing OSError: [Errno 36] File name too long when instant model IDs produced very long cache directory names. The issue is per path segment, so the fix canonicalizes each segment before writing or reading cache files.

Validation

  • .venv/bin/python -m black --check inference/core/utils/file_system.py inference/core/cache/model_artifacts.py inference/core/registries/roboflow.py inference/core/models/roboflow.py inference/models/transformers/transformers.py inference/models/grounding_dino/grounding_dino.py inference/models/easy_ocr/easy_ocr.py inference/core/cache/air_gapped.py tests/inference/unit_tests/core/utils/test_file_system.py tests/inference/unit_tests/core/cache/test_model_artifacts.py tests/inference/unit_tests/core/registries/test_roboflow.py
  • .venv/bin/python -m isort --check-only inference/core/utils/file_system.py inference/core/cache/model_artifacts.py inference/core/registries/roboflow.py inference/core/models/roboflow.py inference/models/transformers/transformers.py inference/models/grounding_dino/grounding_dino.py inference/models/easy_ocr/easy_ocr.py inference/core/cache/air_gapped.py tests/inference/unit_tests/core/utils/test_file_system.py tests/inference/unit_tests/core/cache/test_model_artifacts.py tests/inference/unit_tests/core/registries/test_roboflow.py
  • .venv/bin/pytest tests/inference/unit_tests/core/utils/test_file_system.py tests/inference/unit_tests/core/cache/test_model_artifacts.py tests/inference/unit_tests/core/registries/test_roboflow.py tests/unit/core/cache/test_air_gapped.py

Comment thread inference/core/utils/file_system.py Outdated
def hash_truncate_path_segment(path_segment: str) -> str:
if len(os.fsencode(path_segment)) <= MAX_PATH_SEGMENT_BYTES:
return path_segment
encoded_segment = os.fsencode(path_segment)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double op os.fsencode(path_segment)

Copy link
Copy Markdown
Collaborator

@PawelPeczek-Roboflow PawelPeczek-Roboflow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with necessity of change, but I am not sure if that solves all issues:

  • what if I just have plenty of segments which in total would exceed limit on path size?
  • is it required that all segments could be altered with this function - especially with our cache purging strategy, in some cases (in theory) we could land outside desired directory, no?
  • would the approach from inference_models work? we leave prefix as is and just slugify model_id to be size constraint plus add hash?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants